Why XZ Is Still the King of Compression Ratio
When people talk about practical compression formats, the usual tradeoff story goes something like this:
- `gzip` is old, simple, and fast enough
- `zstd` is the modern sweet spot
- `xz` is slow, but gives the smallest files
That story is broadly true.
But it leaves out the most interesting question:
Why does xz get files so much smaller than the others?
It is tempting to answer with a vague statement like:
- “xz just compresses harder”
That is not wrong, but it is not very satisfying.
I wanted a more technical answer.
This post is about an experiment I built to figure out where xz’s advantage really comes from. Is it mostly because xz finds better matches? Or because it encodes the same matches more efficiently? Or both?
The short answer is:
- xz has the strongest parser in this experiment
- xz also has the strongest backend
- xz wins on both sides
That is why it remains the king of compression ratio.
At A Glance
| Question | Short answer |
|---|---|
| Why is xz so small? | Because it wins at both parsing and backend coding |
| What was the biggest surprise? | Even after forcing xz into a Deflate-like 32 KiB window, its parser was still much stronger |
| What was the main lesson? | Window size matters a lot, but parser quality matters far more than I expected |
| What was still fastest overall? | zstd |
Background: These Formats Are More Related Than They First Appear
Not everyone reading this needs to be deep into compression internals, so it is worth starting with the common ground.
All three families in this experiment:
- `gzip` / Deflate
- `zstd`
- `xz` / LZMA2
are built on the same broad idea: LZ77-style matching.
The basic concept is simple:
- sometimes emit a literal byte
- sometimes say “copy `length` bytes from `distance` bytes ago”
That is the core LZ77 model.
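A minimal sketch of how such a token stream is replayed (a toy illustration of the idea, not any format’s actual decoder):

```python
# Replay a toy LZ77 token stream: ("lit", bytes) emits raw bytes,
# ("match", length, distance) copies length bytes starting distance
# bytes back. Copying one byte at a time makes overlapping copies
# (distance < length) work naturally.
def lz77_decode(tokens):
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out += tok[1]
        else:
            _, length, distance = tok
            start = len(out) - distance
            for i in range(length):
                out.append(out[start + i])
    return bytes(out)

# One literal run, then an overlapping copy: "abc" + 6 bytes from 3 back.
print(lz77_decode([("lit", b"abc"), ("match", 6, 3)]))  # b'abcabcabc'
```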
So even though these formats feel very different in practice, they all spend a large part of their work doing some variant of:
- scan the input
- find repeated substrings
- turn the input into a stream of literals and matches
That common structure is why I could build a shared experiment.
I used a generic intermediate representation, or IR, that stores:
- literal runs
- matches of the form `(length, distance)`
At a high level, that IR is just a format-neutral LZ77 token stream.
That gave me a common language all three compressors could share.
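A rough sketch of that IR (the names here are mine, not taken from any of the three codebases):

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Literals:
    data: bytes        # a run of raw bytes, emitted as-is

@dataclass
class Match:
    length: int        # how many bytes to copy
    distance: int      # how far back the copy starts

# A format-neutral LZ77 token stream is just a list of these.
Token = Union[Literals, Match]

# Decodes to "hello hello": a literal run, then a 5-byte copy from 6 back.
stream: List[Token] = [Literals(b"hello "), Match(length=5, distance=6)]
```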
Parser vs Backend
These are not standard end-user terms, so before going further, I should define what I mean.
When I say parser, I mean:
- the part of the compressor that looks at the input bytes
- finds repeated substrings
- decides where to emit literals versus matches
- produces an LZ77-style token stream
In plain English, the parser answers:
What repeated structure is in this file, and how should I describe it?
When I say backend, I mean:
- the part of the compressor that takes that token stream
- turns it into the final compressed bitstream
- decides how efficiently literals, lengths, distances, repeats, and related symbols are encoded
In plain English, the backend answers:
Given this token stream, how do I encode it as compactly as possible?
So once I had that shared IR, I could split each compressor into two conceptual halves:
- a parser
- input bytes -> generic LZ77-style IR
- a backend
- generic LZ77-style IR -> final compressed bytes
That distinction is the whole point of the post.
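In code, the split looks roughly like this (a hypothetical sketch; the `Parser` and `Backend` names are mine):

```python
from typing import List, Protocol, Tuple, Union

# Token stream: ("lit", bytes) or ("match", length, distance).
Token = Union[Tuple[str, bytes], Tuple[str, int, int]]

class Parser(Protocol):
    def parse(self, data: bytes) -> List[Token]:
        """Input bytes -> generic LZ77-style IR."""

class Backend(Protocol):
    def encode(self, tokens: List[Token]) -> bytes:
        """Generic LZ77-style IR -> final compressed bytes."""

# Because both halves meet at the IR, any parser pairs with any backend:
def compress(data: bytes, parser: Parser, backend: Backend) -> bytes:
    return backend.encode(parser.parse(data))
```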
When people say one compressor is better than another, they usually mean the whole package. But that whole package actually contains two different sources of strength:
- how good it is at finding and arranging matches
- how good it is at encoding those matches once it has them
I wanted to measure those separately.
The Experiment
Once each family had a parser and a backend, I could build a 3 x 3 matrix:
| Parser | gzip backend | zstd backend | xz backend |
|---|---|---|---|
| gzip parser | gzip -> gzip | gzip -> zstd | gzip -> xz |
| zstd parser | zstd -> gzip | zstd -> zstd | zstd -> xz |
| xz parser | xz -> gzip | xz -> zstd | xz -> xz |
That matrix lets me ask two very clean questions:
- If I keep the parser fixed, which backend is best?
- If I keep the backend fixed, which parser is best?
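Building that matrix is then a small nested loop (a sketch; `parse_fns` and `encode_fns` are hypothetical wrappers around each family’s two halves):

```python
# Pair every parser with every backend on the same input and record
# the compressed size of each combination.
def size_matrix(data, parse_fns, encode_fns):
    results = {}
    for pname, parse in parse_fns.items():
        tokens = parse(data)                     # parse once per family
        for bname, encode in encode_fns.items():
            results[(pname, bname)] = len(encode(tokens))
    return results

# Toy stand-ins just to show the shape of the result.
parsers = {"gzip": lambda d: [("lit", d)]}
backends = {"xz": lambda toks: b"".join(t[1] for t in toks)}
print(size_matrix(b"hello", parsers, backends))  # {('gzip', 'xz'): 5}
```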
The Important Constraint
This was not a comparison of fully native, unconstrained gzip vs zstd vs xz.
To make the cross-family swaps possible, I forced all three into a shared Deflate-like envelope:
- window size: about `32 KiB`
- maximum match length: `258`
- maximum replay-safe match distance for the current gzip backend: `32506`
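As code, that envelope looks roughly like this (the constants come from the list above; treating `3` as the minimum match length is my assumption, borrowed from Deflate):

```python
# Shared Deflate-like envelope applied to all three families.
MAX_WINDOW = 32 * 1024      # ~32 KiB window
MAX_MATCH_LEN = 258         # Deflate's maximum match length
MAX_GZIP_DISTANCE = 32506   # replay-safe distance for the gzip backend
MIN_MATCH_LEN = 3           # assumed Deflate-style minimum

def is_legal_match(length: int, distance: int) -> bool:
    """True if every backend in the matrix can represent this match."""
    return (MIN_MATCH_LEN <= length <= MAX_MATCH_LEN
            and 1 <= distance <= min(MAX_WINDOW, MAX_GZIP_DISTANCE))

print(is_legal_match(258, 32506))  # True
print(is_legal_match(259, 100))    # False: longer than Deflate allows
```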
That matters a lot.
It means this experiment is not asking:
- which stock compressor wins with all of its native advantages?
Instead it is asking:
- under roughly the same match-space constraints, which parser is better?
- under the same token stream, which backend is better?
That is a much sharper question.
The Result: XZ Wins Everywhere
On a 10 MiB prefix of linux.tar, the default-level matrix produced:
| Parser | gzip backend | zstd backend | xz backend |
|---|---|---|---|
| gzip | 2,502,165 | 2,499,651 | 2,411,412 |
| zstd | 2,920,678 | 2,883,543 | 2,725,860 |
| xz | 2,468,361 | 2,431,131 | 2,291,404 |
This table is the whole story in one glance.
Two facts stand out immediately:
- the `xz` parser wins every backend column
- the `xz` backend wins every parser row
That means xz does not win for just one reason. It wins for two reasons at once:
- it produces a better LZ77 token stream
- it also encodes that token stream better
This was the central conclusion of the whole project.
Why That Result Is So Interesting
The xz backend result is impressive, but not especially shocking.
LZMA-style backend coding is rich and expensive. Most people already expect xz to be strong once it has a good parse.
The surprising part was the parser result.
I expected xz to lose much more of its edge after I cut it down to a Deflate-like 32 KiB window.
It did not.
That means xz’s ratio advantage is not just “it has a huge dictionary.”
Large window helps, absolutely. But parser quality is also doing a huge amount of work.
What The Experiment Says About Each Family
Gzip
At its strongest settings, gzip is not weak. It searches hard:
- long chain search
- lazy matching
- full `258`-byte match length
But it is still fundamentally a local parser. It makes strong local decisions, not deep global optimization.
That makes gzip better than a naive greedy parser, but still far from what xz is doing.
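The greedy-versus-lazy distinction can be sketched in a few lines. This is a toy version of the idea, not zlib’s implementation; `find_longest_match` here is a brute-force stand-in for gzip’s hash-chain search.

```python
# Lazy matching: after finding a match at position i, peek at i + 1.
# If the next position offers a strictly longer match, emit one literal
# and defer, instead of greedily taking the match at i.
def find_longest_match(data, i):
    """Brute-force search: returns (length, distance), or (0, 0)."""
    best_len, best_dist = 0, 0
    for dist in range(1, i + 1):
        length = 0
        while i + length < len(data) and data[i + length - dist] == data[i + length]:
            length += 1
        if length > best_len:
            best_len, best_dist = length, dist
    return best_len, best_dist

def lazy_parse(data):
    tokens, i = [], 0
    while i < len(data):
        length, dist = find_longest_match(data, i)
        if length >= 3:
            next_len, _ = find_longest_match(data, i + 1)
            if next_len > length:                    # tomorrow looks better:
                tokens.append(("lit", data[i:i+1]))  # pay one literal today
                i += 1
                continue
            tokens.append(("match", length, dist))
            i += length
        else:
            tokens.append(("lit", data[i:i+1]))
            i += 1
    return tokens

print(lazy_parse(b"abcabcabc"))
```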
Zstd
zstd is the most interesting middle case.
In practice, zstd is an excellent compressor. But in this constrained experiment, its parser was often weaker than I expected.
My read is that this comes from two design choices:
- zstd is intentionally speed-conscious in parsing
- zstd’s native parser assumptions are not centered on a Deflate-like `32 KiB` world
There is also a concrete detail that matters here:
- the zstd parser configuration used in this experiment naturally works with `minMatch = 5`
That means many short matches that gzip and xz can still exploit are simply not part of zstd’s normal search space here. This is an intentional speed/ratio tradeoff, not a bug.
Using `minMatch = 5` cuts parser work down a lot, because a 5-byte prefix is much more selective than a 3-byte or 4-byte prefix.
Fewer positions look like plausible matches, so zstd spends less time chasing weak short candidates that do not pay off very well in its native coding model.
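A quick way to see that selectivity effect (a rough counting model, not zstd’s actual hash tables): count how many other positions share each position’s k-byte prefix, since those are the candidates a parser has to probe.

```python
from collections import Counter

def avg_candidates(data: bytes, k: int) -> float:
    """Average number of other positions sharing a position's k-byte prefix."""
    counts = Counter(data[i:i + k] for i in range(len(data) - k + 1))
    total = sum(counts.values())
    return sum(c * (c - 1) for c in counts.values()) / total

text = b"the theory of the thermal theme held there, then they gathered"
print(avg_candidates(text, 3))  # 3-byte prefixes collide often
print(avg_candidates(text, 5))  # 5-byte prefixes are far more selective
```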
There is also an important second point from the earlier gzip vs zstd work.
When I isolated the main zstd constraints at default level 3, almost all of the size loss came from shrinking zstd’s native window down to the shared Deflate-like window.
At level 3, zstd normally uses about a 2 MiB window.
In the constrained experiment, I forced that down to 32 KiB.
That single change explained almost all of the final size increase.
So zstd is being hit by two things in this setup:
- a much smaller window than it normally wants
- a parser tuned around `minMatch = 5` rather than aggressive short-match capture
So zstd remains a very good real-world compressor, but this specific apples-to-apples setup is not especially kind to its parser.
Xz
xz is much more aggressive in parsing.
At a high level, it does something like this:
- estimate the cost of literals
- estimate the cost of normal matches
- estimate the cost of repeat-based matches
- simulate future states
- choose the path with the lowest estimated total cost
That is not perfect foresight. It does not know the exact final bitstream in advance.
But it is much closer to a real optimum parser than gzip’s local lazy search or zstd’s more speed-balanced strategies.
That is why xz still dominates even after removing much of its native large-window advantage.
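A toy version of that cost-based search (this captures only the shape of the idea; real LZMA prices depend on adaptive coder state, and the constants here are invented):

```python
# Price-based optimal parse as dynamic programming: best[i] is the
# cheapest estimated cost (in bits) to encode data[:i]; at each position
# we compare paying for a literal against paying for each available match.
LIT_COST = 9     # assumed flat price for a literal
MATCH_COST = 24  # assumed flat price for any match

def optimal_parse(data, find_matches):
    n = len(data)
    best = [float("inf")] * (n + 1)
    best[0] = 0
    choice = [None] * (n + 1)
    for i in range(n):
        if best[i] + LIT_COST < best[i + 1]:
            best[i + 1] = best[i] + LIT_COST
            choice[i + 1] = ("lit", 1)
        for length, dist in find_matches(data, i):
            if best[i] + MATCH_COST < best[i + length]:
                best[i + length] = best[i] + MATCH_COST
                choice[i + length] = ("match", length, dist)
    tokens, i = [], n        # walk back to recover the chosen token path
    while i > 0:
        tok = choice[i]
        tokens.append(tok)
        i -= tok[1]          # a literal covers 1 byte, a match covers length
    return tokens[::-1]
```

xz’s real search additionally models repeat distances and simulates future coder states, but the backbone is this same cheapest-path structure.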
A Concrete Example Of Better Parsing
I did not want the conclusion to rest only on summary tables, so I also looked at local regions where gzip and xz parsed the exact same bytes differently.
In one text-heavy region around byte offset 7,191,744 of the test input, the two parsers behaved like this over roughly 520 bytes:
- gzip parser: `500` matched bytes, `20` literal bytes, `74` tokens
- xz parser: `517` matched bytes, `3` literal bytes, `67` tokens
That is a clean picture of parser quality:
- xz turned more bytes into matches
- xz emitted fewer literal breaks
- xz did it with fewer total tokens
This is important because it is not a backend effect. The difference is already visible before the backend gets involved.
That is what I mean by “xz has a stronger parser.”
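Those three numbers fall directly out of the token stream. A sketch, assuming a stream of `("lit", bytes)` and `("match", length, distance)` tuples:

```python
# Summarize a parse the way the per-region comparison does:
# total matched bytes, total literal bytes, and token count.
def parse_stats(tokens):
    return {
        "matched_bytes": sum(t[1] for t in tokens if t[0] == "match"),
        "literal_bytes": sum(len(t[1]) for t in tokens if t[0] == "lit"),
        "tokens": len(tokens),
    }

stream = [("lit", b"ab"), ("match", 10, 4), ("match", 7, 12)]
print(parse_stats(stream))  # {'matched_bytes': 17, 'literal_bytes': 2, 'tokens': 3}
```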
The Backend Matters Too
Now fix the parser and only vary the backend.
Every row still prefers xz backend:
| Fixed parser | gzip backend | zstd backend | xz backend |
|---|---|---|---|
| gzip parser | 2,502,165 | 2,499,651 | 2,411,412 |
| zstd parser | 2,920,678 | 2,883,543 | 2,725,860 |
| xz parser | 2,468,361 | 2,431,131 | 2,291,404 |
This tells me something equally important:
- even if another format handed xz its token stream, xz would still usually compress it smaller
So xz’s advantage is not just “find better matches.” It is also:
- “encode those matches better once I have them”
That is why I think it is fair to say xz is king of compression ratio, not just king of one particular component.
Timing Keeps The Story Honest
Of course, ratio is not everything. Compression time matters too.
On the same 10 MiB input, using median steady-state times for the same-family path:
| Family | Parser median | Backend median | Total median |
|---|---|---|---|
| gzip | 0.328415s | 0.206880s | 0.535295s |
| zstd | 0.148028s | 0.268817s | 0.416845s |
| xz | 3.802965s | 1.929538s | 5.732503s |
This is the tradeoff in one table:
- zstd was fastest overall
- gzip was also fast
- xz was dramatically slower, especially on the parser side
So the right way to read this post is not:
- “xz is best, therefore use xz for everything”
It is:
- “xz earns its ratio lead with genuinely stronger parsing and backend coding, and it pays for that with much more compute”
- gzip: small ratio gain, moderate cost
- zstd: best speed/ratio balance in this experiment
- xz: best ratio, much higher compute cost
The Real Takeaway
This project changed my mental model in three ways.
1. Window size is extremely important, but not the whole story
Deflate’s small window really is a major handicap. But even after I equalized the window, parser quality still differed dramatically.
2. Parser quality is a first-class source of compression ratio
This sounds obvious in theory. In practice, I think many people still underestimate it.
The xz -> * results show that a stronger parser can dominate even under the same match-space rules.
3. Backend quality is also a first-class source of compression ratio
The * -> xz results show that some backends simply code the same IR better than others.
So if one format wins in practice, the useful question is often not:
Is it the parser or the backend?
It is:
How much is each side contributing?
This experiment gave me a concrete way to start answering that.
| Layer | What this experiment says |
|---|---|
| Window size | A huge deal, but not enough to explain everything |
| Parser | One of the biggest sources of compression-ratio difference |
| Backend | Also a major source of compression-ratio difference |
| Compression level | Only meaningful relative to the surrounding constraints |
So Why Is XZ The King?
If I had to compress the entire article down to one sentence, it would be this:
xz gets smaller files because it is better both at deciding what to encode and at deciding how to encode it.
That is the deepest thing I learned from the experiment.
It is not just:
- bigger window
- better entropy coding
- slower search
It is all of those ideas interacting.
But the experiment makes one point especially clear:
- xz’s advantage is not a single trick
- it is a system-level advantage
- and that advantage survives even after some of its most obvious native benefits are taken away
That is why I think xz is still the king of compression ratio.