OneCharacterCode Benchmark V4

Persistent dictionary, delta updates, system-level data savings — three separate claims, reported separately.

Loading test-run metadata…

Required disclosure. “V4 separates standalone file compression from system-level data savings. Standalone gzip may still win on individual files. OneCharacterCode's stronger target is repeated-use local execution: after installation, the device can reuse persistent dictionaries and receive only symbolic or delta updates instead of re-downloading full payloads.”

Three tables on this page report three different claims. Reconstruction (SHA-256 round-trip) must PASS for any number on this page to count.

1. What V4 tests

Mode A — Standalone file compression. Per-file V4 encoder vs gzip. Cumulative bytes per file set.
Mode B — Persistent shared dictionary. Build one dictionary from a training subset, install it once, then ship later files as body-only symbolic packets. The dictionary is not re-transmitted. Compares cumulative wire bytes against cumulative raw-download cost.
Mode C — Delta updates. For versioned files, transmit only the changed middle bytes between common prefix and common suffix, inside an OCC carrier. Receiver applies the delta locally and verifies SHA-256.

2. Standalone compression results (Mode A)

Cumulative bytes per set. OCC V4 may not beat gzip here — this column is the same kind of comparison the V1/V2/V3 pages made.

Per-file encode summed across the set. Winner is whichever cumulative is smaller. Reconstruction must PASS.
File Set	Files	Raw Bytes	Gzip(raw) Bytes	OCC V4 Bytes	Winner	OCC %	Reconstruction
Loading standalone results…

3. Persistent dictionary results (Mode B)

Scope of this test. This is not file compression. This models a device that has already installed a shared dictionary. From that point on, every transferred file ships as a body-only symbolic packet that references the installed dictionary. The dictionary is not re-transmitted on later transfers.

Wire bytes (sender side) = one-time install + sum of per-file packets. Baseline = sum of raw file bytes pulled on every session.

File Set	Files	Normal Full Download (bytes)	Initial Install + Symbolic Packets (bytes)	Bytes Saved	Percent Saved	Reconstruction
Loading persistent-dictionary results…

4. Delta update results (Mode C)

Scope of this test. This is delta-sync system bandwidth, not file compression. The receiver already has the previous version of the file. For each new version, only the changed middle bytes are transmitted inside an OCC carrier. The receiver splices: common prefix + decoded middle + common suffix.

File Set	Versions	Full Download (bytes)	Delta Sync (bytes)	Bytes Saved	Percent Saved	Reconstruction
Loading delta-update results…

5. Reconstruction verification

Every row in the three tables above is gated by SHA-256 reconstruction. For every transmitted carrier, packet, or delta, the receiver decodes back to bytes and the hash is compared to the original. If reconstruction FAILs, the byte counts are reported but the row does not constitute a claim.

Mode A: per-file self-contained OCC4 carrier → decode → SHA-256 match.
Mode B: install dictionary, OCC4P packet → decode using installed dict → SHA-256 match per file.
Mode C: previous version + delta → apply delta + decode middle → SHA-256 match against the new version.

6. Why this matters for offline apps

The honest finding from V1, V2, V3, and the stacked test was that on short, single-file inputs, gzip's LZ77 + Huffman pipeline already extracts most of the redundancy; a dictionary substitution prototype can match or slightly help on cumulative numbers but cannot generally beat gzip on per-file size. That is still true in V4 Mode A.

The much stronger claim — and the reason OneCharacterCode is built the way it is — is what Modes B and C measure. Once a device has installed the shared dictionary, it does not need to re-receive that dictionary on subsequent transfers. Repeated structural payloads (UI chrome, JSON schemas, log line frames, agent state envelopes) can ship as small symbolic packets, and versioned files can ship as deltas. The cumulative wire savings across many sessions can be very high even though no individual file shrinks dramatically.

This is also why these tables are kept physically separate. “X percent saved” in Mode B is a sync-model result; it is not a compression ratio for any one file. Reading the page that way would be a misreading.

7. Honest limitations

Mode A standalone compression: still a prototype encoder with no entropy coder downstream. Gzip is mature and usually wins on short inputs.
Mode B persistent dictionary: the savings depend on how much cross-file redundancy the training corpus actually shares with the later corpus. Heterogeneous corpora will show much smaller gains than the structured ones generated here.
Mode C delta updates: the prefix/suffix delta scheme is intentionally simple. Sophisticated delta encoders (xdelta, bsdiff) would do better on files where the change is in the middle of repeated regions.
All inputs here are synthetic (deterministic generators) plus the V3 sample inputs. Real-world corpora will differ.
Only gzip was used as the raw-side baseline. zstd / xz / lzma still need to be compared.
The encoder is the V4 prototype, not the production OneCharacterCode engine.

8. Reproducibility downloads

benchmark-results-v4.jsonMode A (standalone) results system-persistent-results-v4.jsonMode B (persistent dictionary) results delta-update-results-v4.jsonMode C (delta update) results benchmark-test-run-v4.txtHuman-readable run report run_benchmark_v4.ps1The V4 benchmark script (PS 5.1 compatible) README_REPRODUCE_V4.txtReproducibility guide SHA256_MANIFEST_V4.txtHashes of every V4 file ← V3 stacked test(gzip(raw) vs gzip(OCC V3) transport) ← V3 page(file compression + bandwidth simulation)