OneCharacterCode Real-World Benchmark Demo
A reproducible compression and reconstruction test comparing raw files, standard compression, and OneCharacterCode symbolic encoding.
Results
| Test File | Raw Bytes | Gzip Bytes | Brotli Bytes | OCC Symbolic Bytes | OCC Reduction | Reconstruction | SHA-256 Match |
|---|---|---|---|---|---|---|---|
Flow
Downloads
Technical notes
Symbolic encoding. A symbolic encoder finds recurring patterns in the input and assigns each pattern a short symbol. The output is a small dictionary plus a body in which each occurrence of a pattern is replaced by its symbol. At read time, the decoder expands the symbols back into the original. Standard compressors such as gzip and Brotli already combine this idea with several others: gzip pairs LZ77 sliding-window matching with Huffman coding, and Brotli adds context modeling and a built-in static dictionary. That maturity makes them hard to beat on short, English-like inputs.
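The dictionary-plus-body idea can be sketched in a few lines. The Python below is a toy illustration only, not the OneCharacterCode encoder or the page's PowerShell script: the names `encode`, `decode`, and `SYMBOL_BASE` are hypothetical, the patterns are simply repeated whitespace-separated words, and the symbols are drawn from the Unicode Private Use Area on the assumption that the input does not already contain such characters.

```python
from collections import Counter

# Hypothetical choice: symbols start at U+E000 (Unicode Private Use Area).
SYMBOL_BASE = 0xE000

def encode(text, min_len=4, max_entries=32):
    """Return (dictionary, body) with repeated words replaced by one-character symbols."""
    if any(0xE000 <= ord(ch) <= 0xF8FF for ch in text):
        raise ValueError("input already uses private-use characters")
    counts = Counter(w for w in text.split() if len(w) >= min_len)
    # Keep only words that actually repeat, most frequent first.
    dictionary = [w for w, n in counts.most_common(max_entries) if n > 1]
    body = text
    for i, pattern in enumerate(dictionary):
        body = body.replace(pattern, chr(SYMBOL_BASE + i))
    return dictionary, body

def decode(dictionary, body):
    """Expand every symbol back into the pattern it stands for."""
    for i, pattern in enumerate(dictionary):
        body = body.replace(chr(SYMBOL_BASE + i), pattern)
    return body

sample = "the quick brown fox jumps over the lazy dog; the quick brown fox sleeps"
dictionary, body = encode(sample)
restored = decode(dictionary, body)
```

Decoding is order-independent here because no pattern contains a symbol, so expanding one symbol can never create another. Note that the dictionary itself has to be stored alongside the body, which is why a scheme like this only pays off when patterns repeat often enough to cover that overhead.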
Spiral-inspired mapping. The full OneCharacterCode design treats the symbol space as a navigable structure (think of an index that spirals outward by frequency) so that the carrier file references a much larger external lexicon rather than carrying its own dictionary. That design is experimental architecture; this page does not demonstrate it. This page demonstrates only the simpler prototype substring-dictionary step that the production engine builds on.
Local reconstruction. Every test on this page is verified end to end: the encoded file is decoded back into bytes, and the SHA-256 hash of those bytes is compared against the SHA-256 hash of the original input. A byte count from a run whose reconstruction fails is meaningless, so that run is reported as FAIL.
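The round-trip check described above might look like the following minimal Python sketch. It is not the page's PowerShell script: `verify_round_trip` and `sha256_hex` are hypothetical helper names, and gzip stands in for whichever encoder is under test.

```python
import gzip
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

def verify_round_trip(original: bytes, encode, decode) -> dict:
    """Encode, decode, and compare hashes; a decode error counts as a failure."""
    encoded = encode(original)
    try:
        restored = decode(encoded)
    except Exception:
        restored = None
    match = restored is not None and sha256_hex(restored) == sha256_hex(original)
    return {
        "raw_bytes": len(original),
        "encoded_bytes": len(encoded),
        "reconstruction": "PASS" if match else "FAIL",
        "sha256_match": match,
    }

data = b"A reproducible compression and reconstruction test. " * 20
ok = verify_round_trip(data, gzip.compress, gzip.decompress)
# A deliberately broken decoder (drops the last byte) must be reported as FAIL.
bad = verify_round_trip(data, gzip.compress, lambda blob: gzip.decompress(blob)[:-1])
```

Comparing hashes rather than printing the decoded bytes keeps the result compact enough to publish next to each byte count, so a reader can check it against their own run.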
Benchmark limitations. The three inputs here are short (kilobyte scale), and compression ratios change substantially on longer inputs and on other content types. Brotli support ships with .NET Core 2.1 and later but not with Windows PowerShell 5.1 / .NET Framework 4.x, so the Brotli column reads n/a when the runtime does not provide it. Future runs should include zstd, xz, and lzma, as well as larger and more diverse corpora.
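The "n/a when the runtime does not provide it" behavior can be illustrated with a small Python sketch. This is an assumption-laden stand-in, not the page's PowerShell script: `baseline_sizes` is a hypothetical helper, the `brotli` module is the optional third-party Python binding, and the `xz` entry uses the standard-library `lzma` module as an example of one of the additional baselines mentioned above.

```python
import gzip
import lzma

try:
    import brotli  # optional third-party package; may not be installed
except ImportError:
    brotli = None

def baseline_sizes(data: bytes) -> dict:
    """Compressed size in bytes per codec, or "n/a" when a codec is unavailable."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data)),
        "brotli": len(brotli.compress(data)) if brotli else "n/a",
        "xz": len(lzma.compress(data)),
    }

sizes = baseline_sizes(b"abc" * 1000)
```

Reporting a missing codec as n/a rather than skipping the column keeps result tables from different runtimes directly comparable.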
Independent testing. The point of this page is reproducibility. Any reader can download README_REPRODUCE.txt, run the same PowerShell script on the same inputs, and confirm the same hashes and the same byte counts.