Stas Kulesh — x.com/staskulesh
April 21, 2026 · short summary
Abstract. Tamp v0.8 ships a seventeen-stage HTTP-proxy compression pipeline exposed as a nine-level ladder (L1–L9). In a controlled live sweep of 12 scenarios × 18 configurations = 216 A/B comparisons routed through OpenRouter and judged by Claude Haiku 4.5, every configuration preserves task-completion quality on 100% (216/216) of tasks. At the balanced default (L5) we measure 45.34% bytes / 47.56% tokens saved; the top of the ladder (L9) reaches 45.39% / 47.61%. The v0.5 baseline reaches 45.15% / 47.35%, so the L9 delta is +0.24 percentage points in bytes (+0.26 in tokens): honest evidence that four of v0.8's new stages are session-scoped and invisible to single-turn micro-fixtures.
| Level | Headline Stages | Bytes Saved | Tokens Saved | Lossy |
|---|---|---|---|---|
| L1 | minify | 18.84% | 25.92% | no |
| L2 | + whitespace, strip-lines | 19.37% | 26.49% | no |
| L3 | + cmd-strip | 19.57% | 26.71% | no |
| L4 | + dedup, diff | 19.57% | 26.71% | no |
| L5 | + read-diff, prune, toon | 45.34% | 47.56% | no* |
| L6 | + llmlingua | 45.34% | 47.56% | yes |
| L7 | + graph, br-cache | 45.34% | 47.56% | yes |
| L8 | + strip-comments, textpress | 45.39% | 47.61% | yes |
| L9 | + disclosure, bm25-trim, foundation-models | 45.39% | 47.61% | yes |
Figure 1. Tokens saved by ladder level. The lossless floor (L1–L4) caps at ~27%; L5 unlocks the 47.6% tier by enabling TOON, prune, and read-diff simultaneously.
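The step structure of the ladder is easier to see as level-to-level differences. The sketch below is illustrative only: it copies the tokens-saved column from the table and computes the gain of each level over the one before it. The variable names are ours and are not part of Tamp.

```python
# Tokens-saved percentages copied from the ladder table (L1 through L9).
tokens_saved = {
    "L1": 25.92, "L2": 26.49, "L3": 26.71, "L4": 26.71, "L5": 47.56,
    "L6": 47.56, "L7": 47.56, "L8": 47.61, "L9": 47.61,
}

# Gain of each level over the previous one, in percentage points.
levels = list(tokens_saved)
steps = {
    f"{prev}->{curr}": round(tokens_saved[curr] - tokens_saved[prev], 2)
    for prev, curr in zip(levels, levels[1:])
}

# The transition that contributes the most on this corpus.
largest = max(steps, key=steps.get)

for transition, gain in steps.items():
    print(f"{transition}: +{gain:.2f} pp")
print("largest step:", largest)
```

On these numbers, the L4 to L5 transition accounts for nearly all of the gain above the lossless floor, while four of the eight transitions add nothing on this single-turn corpus.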
We evaluated 18 compression configurations (3 presets, 1 v0.5 whitelist baseline, 5 leave-one-out variants, 9 ladder levels) against 12 scenarios in live A/B mode. Each (config, scenario) pair issued two real calls (control and treatment) through OpenRouter, with anthropic/claude-haiku-4.5 as the judge, so that token accounting and quality verdicts come from a model separate from the compression target. Payloads covered small and large JSON, tabular data, source code, line-numbered Read output, errors, multi-turn dialogues, lockfiles, and a duplicate-read fixture. The full protocol, fixtures, and 216 A/B outcomes are committed at bench/results/level-sweep.json.
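The size of the sweep follows directly from the counts above. As a minimal sketch, the grid can be enumerated as below; the group counts come from the text, while the configuration and scenario labels are placeholders of ours, not the identifiers used in bench/results/level-sweep.json.

```python
from itertools import product

# Configuration groups and their sizes, as counted in the text.
config_groups = {
    "preset": 3,
    "v0.5-baseline": 1,
    "leave-one-out": 5,
    "ladder-level": 9,
}

# Placeholder labels; the real names in the benchmark may differ.
configs = [
    f"{group}-{i + 1}"
    for group, count in config_groups.items()
    for i in range(count)
]
scenarios = [f"scenario-{i + 1}" for i in range(12)]

# One A/B comparison per (config, scenario) pair.
pairs = list(product(configs, scenarios))

# Each pair issues two real calls: one control, one treatment.
CALLS_PER_PAIR = 2
total_calls = len(pairs) * CALLS_PER_PAIR

print(len(configs), "configs x", len(scenarios), "scenarios =", len(pairs), "pairs")
print("underlying calls:", total_calls)
```

At two calls per pair, the 216 comparisons would correspond to 432 underlying requests.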
Configurations were arranged in three orthogonal cuts. The level ladder (L1–L9) builds up stage sets cumulatively. The leave-one-out sweep drops one of the new-in-v0.8 stages at a time from the aggressive preset to measure marginal contribution. The v0.5 baseline reproduces the v0.5-era stage list as a regression anchor. Quality is scored by a qualityOK match function applied to control and treatment responses; any failure would be reported here, and none occurred (216/216).
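The leave-one-out construction can be sketched as follows. This is an illustration under stated assumptions, not Tamp's configuration code: the contents of the aggressive preset are not listed in this summary, so the sketch uses the seventeen headline stage names from the ladder table as a stand-in, and the helper name `leave_one_out` and the variant labels are hypothetical. The five candidate stages are the ones reported in the leave-one-out results.

```python
# ASSUMPTION: stand-in for the aggressive preset, built from the headline
# stage names in the ladder table. The real preset may differ.
AGGRESSIVE_STAGES = [
    "minify", "whitespace", "strip-lines", "cmd-strip", "dedup", "diff",
    "read-diff", "prune", "toon", "llmlingua", "graph", "br-cache",
    "strip-comments", "textpress", "disclosure", "bm25-trim",
    "foundation-models",
]

# The five new-in-v0.8 stages named in the leave-one-out results.
NEW_STAGES = ["cmd-strip", "read-diff", "br-cache", "disclosure", "bm25-trim"]


def leave_one_out(base_stages, candidates):
    """Build one variant per candidate, each omitting exactly that stage."""
    variants = {}
    for stage in candidates:
        if stage not in base_stages:
            continue  # nothing to drop, so no variant is produced
        label = f"aggressive-minus-{stage}"  # hypothetical label
        variants[label] = [s for s in base_stages if s != stage]
    return variants


variants = leave_one_out(AGGRESSIVE_STAGES, NEW_STAGES)
for label, stages in variants.items():
    print(label, len(stages), "stages")
```

Comparing each variant's savings against the full preset gives the marginal contribution of the dropped stage, holding everything else fixed.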
Only cmd-strip shows a measurable marginal contribution (−0.20% bytes when dropped from aggressive). read-diff, br-cache, disclosure, and bm25-trim all register exactly 0.00% delta. This is not evidence they do nothing — they are session-scoped stages that only fire across multiple requests (cache hits, re-reads, progressive reveal, ranked retrieval from history). Single-turn A/B fixtures cannot exercise those pathways.
The 12 scenarios are single-turn and below ~18 KB. Four of the five new v0.8 stages are session-scoped and cannot be measured on this corpus, so the L9 vs v0.5 baseline delta (+0.24 percentage points in bytes saved) is an artifact of that measurement gap, not a claim that the new stages are idle. Session-replay fixtures that exercise re-reads, Brotli-cache hits, and multi-turn disclosure are the top priority for the next benchmark iteration. Results also depend on a single judge (Claude Haiku 4.5) and a single provider route (OpenRouter); cross-judge triangulation remains future work.
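For reference, the headline delta is a plain difference of the reported percentages. The short sketch below reproduces it for both bytes and tokens from the figures in the abstract; the helper name `delta_pp` is ours.

```python
# (bytes saved %, tokens saved %) as reported in the abstract.
V05_BASELINE = (45.15, 47.35)
L9 = (45.39, 47.61)


def delta_pp(treatment, baseline):
    """Difference in percentage points, rounded to two decimals."""
    return tuple(round(t - b, 2) for t, b in zip(treatment, baseline))


bytes_delta, tokens_delta = delta_pp(L9, V05_BASELINE)
print(f"bytes: +{bytes_delta:.2f} pp, tokens: +{tokens_delta:.2f} pp")
```

Both differences are a fraction of a percentage point, which is what one would expect if the session-scoped stages never fire on single-turn fixtures.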