README_REPRODUCE.txt
========================================================================

OneCharacterCode Real-World Benchmark Demo - reproducibility guide

This benchmark is meant to be reproducible. Any reader with PowerShell on
Windows (or PowerShell 7+ on macOS / Linux) can run the same script on the
same inputs and verify the same hashes and byte counts.

------------------------------------------------------------------------
WHAT'S IN THIS FOLDER
------------------------------------------------------------------------

  run_benchmark.ps1             The benchmark script.
  inputs\                       Three input files used by the script:
                                  SIMPLE_HTML_SAMPLE.html
                                  JSON_AGENT_SAMPLE.json
                                  TEXT_ARTICLE_SAMPLE.txt
  outputs\                      Per-file compressed and reconstructed
                                byproducts (.gz / .br / .occ /
                                .reconstructed).
  benchmark-results.json        Machine-readable results.
  benchmark-test-run.txt        Human-readable run report.
  benchmark-source-sample.html  Copy of the HTML input for download.
  benchmark-reconstructed-output.html
                                Copy of the reconstructed HTML.
  benchmark.html / .js / .css   The public web demo page.
  SHA256_MANIFEST.txt           SHA-256 of every file in this folder.
  README_REPRODUCE.txt          This file.

------------------------------------------------------------------------
HOW TO RUN
------------------------------------------------------------------------

Prerequisites:
  - Windows PowerShell 5.1+ (already installed on Windows 10/11).
    PowerShell 7+ also works.
  - No internet connection required.
  - No third-party tools required.

Steps:
  1. Open PowerShell.
  2. Change to this folder:
       cd "Y:\OneCharacterCode\BENCHMARK_DEMO_2026-05-11"
  3. Run:
       powershell -File run_benchmark.ps1
     Or on PowerShell 7+:
       pwsh -File run_benchmark.ps1
  4. The script prints per-file results and writes:
       benchmark-results.json
       benchmark-test-run.txt

Optional:
  Brotli compression is part of .NET Core 2.1+ and is available in
  PowerShell 7+.
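
  To find out ahead of time which case applies on your machine, a check
  along the following lines may help. This is only a sketch: the function
  name Test-BrotliAvailable is hypothetical, and run_benchmark.ps1 may
  detect Brotli support in a different way.

```powershell
# Hypothetical helper (not part of run_benchmark.ps1).
# Returns $true when the current runtime can resolve the .NET Brotli stream
# type, $false otherwise. The -as operator yields $null instead of throwing
# when the type name cannot be resolved.
function Test-BrotliAvailable {
    $brotliType = 'System.IO.Compression.BrotliStream' -as [type]
    return ($null -ne $brotliType)
}

if (Test-BrotliAvailable) {
    "br : available (PowerShell $($PSVersionTable.PSVersion))"
} else {
    "br : not available in this runtime"
}
```
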
  On Windows PowerShell 5.1 (.NET Framework 4.x) Brotli is not available;
  the script prints "br : not available in this runtime" and the Brotli
  column in the results reads "n/a".

------------------------------------------------------------------------
HOW TO VERIFY HASHES
------------------------------------------------------------------------

Every file in this folder is listed in SHA256_MANIFEST.txt with its
SHA-256 hash. To re-verify after a copy or transfer, run:

  Get-ChildItem -File | ForEach-Object {
      $h = (Get-FileHash -Algorithm SHA256 -Path $_.FullName).Hash
      "$($_.Name) $h"
  }

Compare each line against the corresponding entry in SHA256_MANIFEST.txt.
If you trust the manifest, you can also use:

  Get-Content SHA256_MANIFEST.txt

and match by eye for the file you care about.

The benchmark script itself performs the most important verification:
for each input file it computes the SHA-256 of the original bytes,
encodes the file with the prototype OCC encoder, decodes the OCC output
back to bytes, and recomputes the SHA-256. The reconstruction status is
PASS only when the two hashes are identical. Any compression number that
comes with a FAIL is meaningless and should be discarded.

------------------------------------------------------------------------
HOW TO INTERPRET THE RESULTS
------------------------------------------------------------------------

For each test file the results table shows:

  Raw Bytes           Size of the original input.
  Gzip Bytes          Size after gzip (standard).
  Brotli Bytes        Size after Brotli (standard). Reads n/a on runtimes
                      without Brotli support.
  OCC Symbolic Bytes  Size after the prototype symbolic encoder.
  OCC Reduction       Percent of original removed by OCC. Positive means
                      smaller; negative means larger.
  Reconstruction      PASS if OCC decoded back to the original bytes,
                      FAIL otherwise.
  SHA-256 Match       The first 12 hex digits of the matching hash.

The prototype symbolic encoder is a straightforward substring-dictionary
replacer.
It is NOT the final patented OneCharacterCode engine. It is included so
the reconstruction round-trip can be verified locally. On short,
English-like inputs gzip and Brotli are mature and very hard to beat; a
simple substring dictionary usually will not match them. The honest
result is reported as-is.

------------------------------------------------------------------------
WHAT'S DELIBERATELY NOT IN THIS BENCHMARK
------------------------------------------------------------------------

  - The production OCC engine. Replacement of the prototype with the
    production engine is the natural next step.
  - Larger inputs. These three samples are KB-scale. MB-scale and
    GB-scale corpora will be added in future runs.
  - Other compressors. zstd, xz, lzma should be added for a fairer
    landscape view.
  - Streaming / chunked / random-access compression. This test only
    measures full-file round-trip.

------------------------------------------------------------------------
QUESTIONS AND BUG REPORTS
------------------------------------------------------------------------

The intent is reproducibility. If you ran the script and got different
byte counts or a FAIL where this run got a PASS, the machine, the
runtime version, or the input file may differ. Compare your SHA-256
hashes of the input files against the entries in SHA256_MANIFEST.txt.

End of README_REPRODUCE.txt.
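
The comparison against SHA256_MANIFEST.txt can also be automated. The
sketch below is not part of the benchmark: the function name
Compare-ManifestHashes is hypothetical, and it assumes each manifest line
has the form "<file name> <64 hex digits>", which this guide does not
confirm. Adjust the pattern if the manifest is laid out differently.

```powershell
# Hypothetical helper (not part of run_benchmark.ps1).
# ASSUMPTION: each manifest line is "<file name><whitespace><64 hex digits>".
# Lines that do not fit that shape are skipped.
function Compare-ManifestHashes {
    param(
        [string]$Folder = '.',
        [string]$ManifestName = 'SHA256_MANIFEST.txt'
    )
    $manifestPath = Join-Path $Folder $ManifestName
    foreach ($line in Get-Content -LiteralPath $manifestPath) {
        if ($line -notmatch '^\s*(?<name>.+?)\s+(?<hash>[0-9A-Fa-f]{64})\s*$') {
            continue
        }
        $name     = $Matches['name']
        $expected = $Matches['hash']
        $path     = Join-Path $Folder $name
        if (-not (Test-Path -LiteralPath $path -PathType Leaf)) {
            $status = 'MISSING'
        } else {
            $actual = (Get-FileHash -Algorithm SHA256 -LiteralPath $path).Hash
            # -eq compares strings case-insensitively, so hex case is ignored.
            $status = if ($actual -eq $expected) { 'MATCH' } else { 'MISMATCH' }
        }
        # Emit one result object per manifest entry.
        [pscustomobject]@{ File = $name; Status = $status }
    }
}
```

Run from this folder, for example:

  Compare-ManifestHashes | Where-Object { $_.Status -ne 'MATCH' }

Under the assumed manifest format, an empty result means every listed
file was found and its hash matched.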