README_REPRODUCE.txt
========================================================================
OneCharacterCode Real-World Benchmark Demo  -  reproducibility guide

This benchmark is meant to be reproducible.  Any reader with PowerShell
on Windows (or PowerShell 7+ on macOS / Linux) can run the same script
on the same inputs and verify the same hashes and byte counts.

------------------------------------------------------------------------
WHAT'S IN THIS FOLDER
------------------------------------------------------------------------

  run_benchmark.ps1               The benchmark script.
  inputs\                         Three input files used by the script:
                                    SIMPLE_HTML_SAMPLE.html
                                    JSON_AGENT_SAMPLE.json
                                    TEXT_ARTICLE_SAMPLE.txt
  outputs\                        Per-file compressed and reconstructed
                                  outputs (.gz / .br / .occ /
                                  .reconstructed).
  benchmark-results.json          Machine-readable results.
  benchmark-test-run.txt          Human-readable run report.
  benchmark-source-sample.html    Copy of the HTML input for download.
  benchmark-reconstructed-output.html   Copy of the reconstructed HTML.
  benchmark.html / .js / .css     The public web demo page.
  SHA256_MANIFEST.txt             SHA-256 of every file in this folder.
  README_REPRODUCE.txt            This file.

------------------------------------------------------------------------
HOW TO RUN
------------------------------------------------------------------------

Prerequisites:

  - Windows PowerShell 5.1 (preinstalled on Windows 10/11), or
    PowerShell 7+ on Windows, macOS, or Linux.
  - No internet connection required.
  - No third-party tools required.

Steps:

  1.  Open PowerShell.
  2.  Change to this folder (adjust the path to wherever you copied
      it), for example:
        cd "Y:\OneCharacterCode\BENCHMARK_DEMO_2026-05-11"
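  3.  Run:
        powershell -File run_benchmark.ps1
      Or on PowerShell 7+:
        pwsh -File run_benchmark.ps1
      If the Windows execution policy blocks the script, allow it for
      this one invocation only:
        powershell -ExecutionPolicy Bypass -File run_benchmark.ps1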
  4.  The script prints per-file results and writes:
        benchmark-results.json
        benchmark-test-run.txt

Note:  Brotli is optional.  BrotliStream ships with .NET Core 2.1 and
later, so it is available in PowerShell 7+.  On Windows PowerShell 5.1
(.NET Framework 4.x) it is not available; the script prints "br : not
available in this runtime" and the Brotli column in the results reads
"n/a".
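
One way to probe for Brotli support is to ask the runtime whether the
BrotliStream type resolves.  This is only a sketch of such a check, not
necessarily how run_benchmark.ps1 does it; the variable names are
illustrative.

```powershell
# Probe for BrotliStream without throwing on runtimes that lack it.
# "-as [type]" yields $null when the type name cannot be resolved.
$brotliType = 'System.IO.Compression.BrotliStream' -as [type]

if ($null -ne $brotliType) {
    $brotliStatus = 'available'
} else {
    $brotliStatus = 'not available in this runtime'
}

"br : $brotliStatus"
```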

------------------------------------------------------------------------
HOW TO VERIFY HASHES
------------------------------------------------------------------------

Every file in this folder is listed in SHA256_MANIFEST.txt with its
SHA-256 hash.  To re-verify after a copy or transfer, run:

  Get-ChildItem -File | ForEach-Object {
    $h = (Get-FileHash -Algorithm SHA256 -Path $_.FullName).Hash
    "$($_.Name) $h"
  }

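Compare each line against the corresponding entry in
SHA256_MANIFEST.txt.  Get-FileHash prints uppercase hex, so compare
case-insensitively if the manifest uses lowercase.  The listing covers
only the top-level files; add -Recurse to Get-ChildItem to include
inputs\ and outputs\.  To check a single file, print the manifest:

  Get-Content SHA256_MANIFEST.txt

and match the entry for that file by eye.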
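
The comparison can also be scripted.  The sketch below assumes each
manifest line has the form "<file name> <hash>", the same shape the
listing command above prints; the actual layout of SHA256_MANIFEST.txt
may differ.  Compare-ManifestHashes is a hypothetical helper name, not
part of the benchmark.

```powershell
# Assumption: each manifest line is "<file name> <hex hash>".
# Adjust the parsing if the real manifest uses another layout.
function Compare-ManifestHashes {
    param(
        [string]$ManifestPath = 'SHA256_MANIFEST.txt',
        [string]$Folder = '.'
    )
    foreach ($line in Get-Content -LiteralPath $ManifestPath) {
        $parts = $line.Trim() -split '\s+'
        if ($parts.Count -lt 2) { continue }   # skip blank or odd lines

        # The hash is the last field; everything before it is the name.
        $expected = $parts[-1]
        $name = $parts[0..($parts.Count - 2)] -join ' '
        $path = Join-Path $Folder $name

        if (-not (Test-Path -LiteralPath $path)) {
            $status = 'MISSING'
        } else {
            $actual = (Get-FileHash -Algorithm SHA256 -LiteralPath $path).Hash
            # -eq on strings is case-insensitive, so hex case is ignored.
            if ($actual -eq $expected) { $status = 'OK' } else { $status = 'MISMATCH' }
        }
        [pscustomobject]@{ Name = $name; Status = $status }
    }
}
```

Run it from this folder as "Compare-ManifestHashes" and look for any
row whose Status is not OK.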

The benchmark script itself performs the most important verification:
for each input file it computes the SHA-256 of the original bytes,
encodes the file with the prototype OCC encoder, decodes the OCC
output back to bytes, and recomputes the SHA-256.  The reconstruction
status is PASS only when the two hashes are identical.  Any
compression number that comes with a FAIL is meaningless and should
be discarded.
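
The round-trip check described above can be sketched as follows.  The
prototype OCC encoder is not reproduced here, so gzip stands in as the
codec purely for illustration; the function names are hypothetical and
do not come from run_benchmark.ps1.

```powershell
# Hex-encoded SHA-256 of a byte array.
function Get-Sha256Hex {
    param([byte[]]$Bytes)
    $sha = [System.Security.Cryptography.SHA256]::Create()
    try {
        $digest = $sha.ComputeHash($Bytes)
        return ([System.BitConverter]::ToString($digest) -replace '-', '').ToLowerInvariant()
    } finally {
        $sha.Dispose()
    }
}

# Stand-in encoder (gzip), NOT the OCC encoder.
function Compress-GzipBytes {
    param([byte[]]$Bytes)
    $out = [System.IO.MemoryStream]::new()
    $gz = [System.IO.Compression.GZipStream]::new($out, [System.IO.Compression.CompressionMode]::Compress)
    $gz.Write($Bytes, 0, $Bytes.Length)
    $gz.Dispose()              # flushes the gzip trailer into $out
    return ,$out.ToArray()     # leading comma keeps the result a byte[]
}

# Stand-in decoder matching the encoder above.
function Expand-GzipBytes {
    param([byte[]]$Bytes)
    $source = [System.IO.MemoryStream]::new($Bytes)
    $gz = [System.IO.Compression.GZipStream]::new($source, [System.IO.Compression.CompressionMode]::Decompress)
    $out = [System.IO.MemoryStream]::new()
    $gz.CopyTo($out)
    $gz.Dispose()
    return ,$out.ToArray()
}

# Hash, encode, decode, hash again; PASS only if both hashes agree.
function Test-RoundTrip {
    param([byte[]]$Original)
    $before  = Get-Sha256Hex -Bytes $Original
    $encoded = Compress-GzipBytes -Bytes $Original
    $decoded = Expand-GzipBytes -Bytes $encoded
    $after   = Get-Sha256Hex -Bytes $decoded
    if ($before -eq $after) { $status = 'PASS' } else { $status = 'FAIL' }
    [pscustomobject]@{
        RawBytes       = $Original.Length
        EncodedBytes   = $encoded.Length
        Reconstruction = $status
        Sha256Prefix   = $before.Substring(0, 12)
    }
}

$sampleBytes = [System.Text.Encoding]::UTF8.GetBytes('<p>hello</p>' * 20)
Test-RoundTrip -Original $sampleBytes
```

Comparing hashes of the decoded bytes, rather than trusting the codec,
is what makes the PASS status meaningful for any encoder plugged in.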

------------------------------------------------------------------------
HOW TO INTERPRET THE RESULTS
------------------------------------------------------------------------

For each test file the results table shows:

  Raw Bytes             Size of the original input.
  Gzip Bytes            Size after gzip (standard).
  Brotli Bytes          Size after Brotli (standard).  Reads n/a on
                        runtimes without Brotli support.
  OCC Symbolic Bytes    Size after the prototype symbolic encoder.
  OCC Reduction         Percent of original removed by OCC.  Positive
                        means smaller; negative means larger.
  Reconstruction        PASS if OCC decoded back to the original bytes,
                        FAIL otherwise.
  SHA-256 Match         The first 12 hex digits of the matching hash.
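
The OCC Reduction column is consistent with the usual formula
(raw - encoded) / raw * 100.  A minimal sketch, assuming that formula
and rounding to two decimals (the script's exact rounding is not
documented here); the byte counts below are made-up illustrations, not
benchmark results.

```powershell
# Percent of the original size removed by an encoder.
# Positive = output is smaller; negative = output is larger.
function Get-ReductionPercent {
    param([long]$RawBytes, [long]$EncodedBytes)
    if ($RawBytes -le 0) { throw 'RawBytes must be positive.' }
    [math]::Round(100.0 * ($RawBytes - $EncodedBytes) / $RawBytes, 2)
}

Get-ReductionPercent -RawBytes 1000 -EncodedBytes 750     # 25
Get-ReductionPercent -RawBytes 1000 -EncodedBytes 1200    # -20
```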

The prototype symbolic encoder is a straightforward substring-
dictionary replacer.  It is NOT the final patented OneCharacterCode
engine; it is included so the reconstruction round-trip can be
verified locally.  gzip and Brotli are mature and very hard to beat on
short, English-like inputs, so a simple substring dictionary will
usually not match them.  Results are reported as measured, including
cases where the prototype makes a file larger.
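
To make "substring-dictionary replacer" concrete, here is a toy
version.  It is an illustration only and is not the prototype encoder
from run_benchmark.ps1: the dictionary entries, the use of private-use
characters as tokens, and the function names are all assumptions.  It
works on text rather than raw bytes.

```powershell
# Hypothetical toy dictionary: phrase -> one private-use character.
$dictionary = [ordered]@{
    '<div class=' = [string][char]0xE000
    '</div>'      = [string][char]0xE001
}

function ConvertTo-ToyEncoding {
    param([string]$Text, [System.Collections.IDictionary]$Dictionary)
    # Tokens must not already occur in the input, or decoding would
    # expand them into phrases that were never there.
    foreach ($token in $Dictionary.Values) {
        if ($Text.Contains($token)) {
            throw 'Input already contains a reserved token character.'
        }
    }
    $result = $Text
    foreach ($phrase in $Dictionary.Keys) {
        $result = $result.Replace($phrase, $Dictionary[$phrase])
    }
    return $result
}

function ConvertFrom-ToyEncoding {
    param([string]$Text, [System.Collections.IDictionary]$Dictionary)
    $result = $Text
    foreach ($phrase in $Dictionary.Keys) {
        $result = $result.Replace($Dictionary[$phrase], $phrase)
    }
    return $result
}

$sample  = '<div class="a">one</div><div class="b">two</div>'
$encoded = ConvertTo-ToyEncoding -Text $sample -Dictionary $dictionary
$decoded = ConvertFrom-ToyEncoding -Text $encoded -Dictionary $dictionary

$utf8 = [System.Text.Encoding]::UTF8
"raw bytes     : $($utf8.GetByteCount($sample))"
"encoded bytes : $($utf8.GetByteCount($encoded))"
"round trip    : $(if ($decoded -ceq $sample) { 'PASS' } else { 'FAIL' })"
```

A fixed dictionary like this only helps when its phrases actually
occur in the input, which is one reason general-purpose compressors
that build their dictionary from the data tend to do better.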

------------------------------------------------------------------------
WHAT'S DELIBERATELY NOT IN THIS BENCHMARK
------------------------------------------------------------------------

  - The production OCC engine.  Replacing the prototype with the
    production engine is the natural next step.
  - Larger inputs.  These three samples are KB-scale.  MB-scale and
    GB-scale corpora will be added in future runs.
  - Other compressors.  zstd, xz, and lzma should be added for a
    fairer comparison.
  - Streaming / chunked / random-access compression.  This test only
    measures full-file round-trip.

------------------------------------------------------------------------
QUESTIONS AND BUG REPORTS
------------------------------------------------------------------------

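The intent is reproducibility.  If you ran the script and got
different byte counts or a FAIL where this run got a PASS, the
machine, the runtime version, or the input files may differ.  First
compare the SHA-256 hashes of your input files against the entries in
SHA256_MANIFEST.txt.  If they match, the difference lies in the
runtime or machine; include your PowerShell and .NET versions (see
$PSVersionTable) when reporting it.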

End of README_REPRODUCE.txt.