OneCharacterCode V3 Stacked Compression Test
OCC + Gzip Transport Size — the fair comparison: gzip(raw) vs gzip(OCC V3 carrier), with full receiver-side reconstruction verified.
What is measured
stacked transport: raw → OCC V3 → gzip(OCC V3)
receiver: gzip(OCC V3) → gunzip → decode V3 → raw bytes
verification: SHA-256(original) ≡ SHA-256(reconstructed)
For each file we report the actual transmitted byte counts on both paths, the winner, and the receiver-side roundtrip status. The roundtrip must PASS and the SHA-256 hashes must match exactly for any number on this page to count.
Rule. An OCC win is declared only when gzip(OCC V3) is strictly smaller than gzip(raw) and the receiver-side roundtrip passes. Otherwise the page reports “Gzip(raw) still wins for this file.”
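The measurement and the win rule above can be sketched as a small harness. This is a minimal sketch under stated assumptions, not the V3 prototype: `toy_encode`, `toy_decode`, and `TOY_DICTIONARY` are hypothetical stand-ins for the OCC V3 codec (a plain phrase-to-byte substitution that assumes the reserved code bytes never occur in the input), and `stacked_comparison` is an illustrative name. Only the gzip and SHA-256 steps use real standard-library calls.

```python
import gzip
import hashlib

# Hypothetical stand-in for the OCC V3 dictionary; NOT the real codec.
# Each phrase maps to one reserved byte that is assumed absent from the input.
TOY_DICTIONARY = {
    b"compression": b"\x01",
    b"the ": b"\x02",
    b"and ": b"\x03",
}


def toy_encode(raw: bytes) -> bytes:
    """Replace dictionary phrases with their reserved one-byte codes."""
    for code in TOY_DICTIONARY.values():
        if code in raw:
            raise ValueError("input contains a reserved code byte")
    carrier = raw
    for phrase, code in TOY_DICTIONARY.items():
        carrier = carrier.replace(phrase, code)
    return carrier


def toy_decode(carrier: bytes) -> bytes:
    """Expand every reserved code byte back into its phrase."""
    raw = carrier
    for phrase, code in TOY_DICTIONARY.items():
        raw = raw.replace(code, phrase)
    return raw


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def stacked_comparison(raw: bytes) -> dict:
    # Path A: gzip the raw bytes directly.
    gzip_raw = gzip.compress(raw, mtime=0)
    # Path B: raw -> carrier -> gzip(carrier).
    carrier = toy_encode(raw)
    gzip_occ = gzip.compress(carrier, mtime=0)
    # Receiver side: gunzip -> decode -> raw bytes.
    reconstructed = toy_decode(gzip.decompress(gzip_occ))
    sha_match = sha256_hex(raw) == sha256_hex(reconstructed)
    # Win rule: strictly smaller on the wire AND a verified roundtrip.
    occ_wins = sha_match and len(gzip_occ) < len(gzip_raw)
    return {
        "raw_bytes": len(raw),
        "gzip_raw_bytes": len(gzip_raw),
        "occ_bytes": len(carrier),
        "gzip_occ_bytes": len(gzip_occ),
        "winner": "Gzip(OCC V3)" if occ_wins else "Gzip(raw)",
        "roundtrip": "PASS" if reconstructed == raw else "FAIL",
        "sha256_match": sha_match,
    }


if __name__ == "__main__":
    sample = b"the quick brown fox and the lazy dog " * 40
    print(stacked_comparison(sample))
```

The returned keys mirror the columns of the results table. Passing `mtime=0` keeps the gzip header deterministic, so repeated runs on the same input report the same byte counts.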
Results
| File | Raw Bytes | Gzip(raw) Bytes | OCC V3 Bytes | Gzip(OCC V3) Bytes | Best Transport Winner | Gzip(OCC) vs Gzip(raw) | Roundtrip | SHA-256 Match |
|---|---|---|---|---|---|---|---|---|
| Loading stacked-compression results… | | | | | | | | |
Flow
3-tier dictionary
Downloads
Technical notes
Why this test exists. The earlier V3 page reported file-compression numbers (OCC V3 carrier vs. raw) and a separately labeled bandwidth simulation. It did not answer the natural follow-up question: what happens in real transport, where you don't ship the OCC carrier bare but also gzip it on the wire? The fair comparison is therefore gzip(raw) vs. gzip(OCC carrier). That is what this page measures.
What the result means if Gzip(raw) wins. The OCC V3 dictionary substitutions consume some of the redundancy that gzip would otherwise have exploited. When the input is short and English-like, gzip alone already captures most of the redundancy with its LZ77 window + Huffman coding; running OCC first removes structure that gzip needed, and the second-stage gzip cannot make up the gap. This is a real cost and we report it as a real loss.
What it would take for OCC to win this test. Inputs where the dictionary captures redundancy that gzip's 32 KB window cannot see (very long-range repeats, larger inputs, or structured corpora) should narrow or flip the comparison. So should an entropy coder downstream of the dictionary that does not collide with gzip's redundancy model. Both are listed under “next optimization” below.
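The 32 KB window limit mentioned above can be observed directly. The following is a minimal sketch with hypothetical names (`gzip_size_with_gap`, `random_bytes`) and made-up synthetic data: it places two copies of the same random block either adjacent or more than 32 KB apart, and compares the gzip output sizes. It illustrates the window limit only and says nothing about how OCC V3 behaves on real inputs.

```python
import gzip
import random

WINDOW = 32 * 1024  # DEFLATE's maximum back-reference distance in bytes


def random_bytes(rng: random.Random, n: int) -> bytes:
    """Return n pseudo-random (effectively incompressible) bytes."""
    return bytes(rng.getrandbits(8) for _ in range(n))


def gzip_size_with_gap(gap: int, block_len: int = 1000,
                       filler_len: int = 40_000, seed: int = 0) -> int:
    """Gzip size of: block + filler[:gap] + block + filler[gap:].

    The total input length and content are the same for every gap;
    only the distance between the two copies of `block` changes.
    """
    rng = random.Random(seed)
    block = random_bytes(rng, block_len)
    filler = random_bytes(rng, filler_len)
    data = block + filler[:gap] + block + filler[gap:]
    return len(gzip.compress(data, mtime=0))


if __name__ == "__main__":
    near = gzip_size_with_gap(0)        # second copy is inside the window
    far = gzip_size_with_gap(40_000)    # second copy is beyond WINDOW
    print("repeat inside window :", near, "bytes")
    print("repeat outside window:", far, "bytes")
```

With the copies adjacent, gzip can encode the second one as back-references; with 40,000 bytes between them, the first copy has left the window and the repeat is paid for again in full. A dictionary stage that recognizes such distant repeats is the kind of case where the stacked path could come out ahead.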
What this test is not. Not a comparison against zstd / xz / lzma. Not a claim about larger or differently structured inputs. Not the production OneCharacterCode engine. The encoder used here is the exact V3 prototype that produced the public V3 file-compression numbers.