OneCharacterCode benchmark V2 - test run report
=================================================

Started   : 2026-05-11T12:45:11
Finished  : 2026-05-11T12:45:18
Duration  : 6.63 seconds
Machine   : LIMITLESS
PowerShell: 5.1.26100.8115

Command run:
  powershell -File run_benchmark_v2.ps1

Optimizations:
- Tier 1 = 1-byte tokens for top 8 highest-savings entries
- Tier 2 = 2-byte tokens (ESC 0x0E + index) for next 256 entries
- Net-gain threshold per entry: skip if (savings_per_use * count) - dict_cost <= 0
- Greedy savings-ranked acceptance with overlap rejection
- Multiple phrase lengths: 32, 24, 16, 12, 10, 8, 6, 5, 4, 3
- Reserved-byte escape (0x0F prefix) for source bytes in 0x01-0x08, 0x0E, 0x0F
- gzip(OCC V2) measured separately so the reader can see whether OCC helped gzip

Inputs tested:
  JSON_AGENT_SAMPLE.json    (5,458 bytes)
  SIMPLE_HTML_SAMPLE.html   (6,904 bytes)
  TEXT_ARTICLE_SAMPLE.txt   (8,824 bytes)

Results table (all sizes in bytes; V2 red% = (Raw - OCCv2) / Raw):

File                         Raw    Gzip   OCCv1   OCCv2  gz(v2)  Recon  V2 red%
--------------------------------------------------------------------------------
JSON_AGENT_SAMPLE.json     5,458   2,529   6,188   4,324   2,774  PASS    20.78%
SIMPLE_HTML_SAMPLE.html    6,904   2,929   8,738   5,359   3,315  PASS    22.38%
TEXT_ARTICLE_SAMPLE.txt    8,824   3,529  10,714   6,800   4,000  PASS    22.94%

V2 vs V1 (delta = (v1 - v2) / v1; a positive delta means V2 is smaller than V1, i.e. better):
  JSON_AGENT_SAMPLE.json    v1=  6,188  v2= 4,324  delta=30.12%
  SIMPLE_HTML_SAMPLE.html   v1=  8,738  v2= 5,359  delta=38.67%
  TEXT_ARTICLE_SAMPLE.txt   v1= 10,714  v2= 6,800  delta=36.53%

Reconstruction status (SHA-256 round-trip):
  JSON_AGENT_SAMPLE.json    PASS (raw=0dd8cf66dc4a8e96... recon=0dd8cf66dc4a8e96...)
  SIMPLE_HTML_SAMPLE.html   PASS (raw=c26af82d440daab5... recon=c26af82d440daab5...)
  TEXT_ARTICLE_SAMPLE.txt   PASS (raw=f410ca5401c070c2... recon=f410ca5401c070c2...)

Limitations:
- V2 is still a prototype symbolic dictionary encoder, NOT the final patented
  OneCharacterCode engine. Honest results only.
- On short KB-scale inputs gzip and Brotli are mature and very hard to beat.
  A simple dictionary-substitution prototype, even an improved one, typically
  will not match them. Results are reported as-is.
- V2 improves on V1 in candidate selection, dictionary overhead, and token
  width. The improvement is visible in the "V2 vs V1" section. Compare OCCv2
  against Gzip (gzip of the raw input) to see the gap to standard compression.
- gzip(OCC V2) is included so the reader can see whether the prototype's
  output is more or less compressible by gzip than the raw input. In this run
  gz(v2) is larger than Gzip for all three files, so the V2 pre-pass did not
  help gzip.

Next engine improvements (recommended order):
- Iterative refinement: after each accepted entry, recompute candidate counts
  in the working text and re-rank the remaining candidates.
- Variable-width tokens (3 tiers): single-byte for the top, two-byte for the
  middle, three-byte for the long tail.
- Structural pattern templates: HTML tag opens/closes, JSON key:value headers,
  common punctuation runs, indentation runs.
- Replace the prototype with the production OneCharacterCode engine and rerun
  the same harness for an apples-to-apples comparison.
- Independent third-party reproduction on the same inputs.

All inputs and outputs are hashed in SHA256_MANIFEST_V2.txt.
Reproducibility instructions: README_REPRODUCE_V2.txt.
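
Illustrative sketch (not part of the benchmark harness):
The Optimizations list describes the token layout and the net-gain rule only
in words. The Python sketch below shows one way those pieces could fit
together. It is an assumption-laden illustration, not the code in
run_benchmark_v2.ps1: the function names, the definition of savings_per_use
as (phrase length - token length), the first-match scan order, and the
dict_cost values are all hypothetical. Only the byte values 0x01-0x08, 0x0E
and 0x0F, the 8 + 256 entry limits, and the net-gain formula come from this
report. Dictionary serialization is left out.

```python
import hashlib

ESC_TIER2 = 0x0E    # prefix byte for 2-byte tokens (from the report)
ESC_LITERAL = 0x0F  # prefix byte for escaped source bytes (from the report)
TIER1_COUNT = 8     # 1-byte tokens occupy 0x01-0x08
RESERVED = set(range(0x01, 0x09)) | {ESC_TIER2, ESC_LITERAL}


def net_gain(phrase_len, count, token_len, dict_cost):
    """(savings_per_use * count) - dict_cost; entries with a result <= 0 are skipped.

    Assumption: savings_per_use = phrase_len - token_len.
    """
    return (phrase_len - token_len) * count - dict_cost


def token_for(index):
    """Map a dictionary rank (0 = highest savings) to its token bytes."""
    if index < TIER1_COUNT:
        return bytes([0x01 + index])                 # Tier 1: one byte
    return bytes([ESC_TIER2, index - TIER1_COUNT])   # Tier 2: 0x0E + index


def encode(data, phrases):
    """Replace phrases with tokens; escape reserved source bytes with 0x0F."""
    assert len(phrases) <= TIER1_COUNT + 256
    out = bytearray()
    i = 0
    while i < len(data):
        for idx, phrase in enumerate(phrases):  # assumed: first match in rank order
            if phrase and data.startswith(phrase, i):
                out += token_for(idx)
                i += len(phrase)
                break
        else:
            b = data[i]
            if b in RESERVED:
                out += bytes([ESC_LITERAL, b])  # literal byte that collides with a token
            else:
                out.append(b)
            i += 1
    return bytes(out)


def decode(blob, phrases):
    """Invert encode(): expand tokens and unescape literals."""
    out = bytearray()
    i = 0
    while i < len(blob):
        b = blob[i]
        if b == ESC_LITERAL:
            out.append(blob[i + 1])
            i += 2
        elif b == ESC_TIER2:
            out += phrases[TIER1_COUNT + blob[i + 1]]
            i += 2
        elif 0x01 <= b <= 0x08:
            out += phrases[b - 1]
            i += 1
        else:
            out.append(b)
            i += 1
    return bytes(out)


# Round-trip check in the style of the report's SHA-256 comparison.
sample = b'{"name": "a", "name": "b"}\x01 the end'
ranked = [b'"name": ', b"the "]
restored = decode(encode(sample, ranked), ranked)
assert hashlib.sha256(restored).digest() == hashlib.sha256(sample).digest()
```

The escape step is what keeps the round trip lossless: a source byte such as
0x01 would otherwise be read back as the first Tier 1 token.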