Why XZ Is Still the King of Compression Ratio
When people talk about practical compression formats, the usual tradeoff story goes something like this:
- `gzip` is old, simple, and fast enough
- `zstd` is the modern sweet spot
- `xz` is slow, but gives the smallest files
That story is broadly true.
But it leaves out the most interesting question:
Why does xz get files so much smaller than the others?
It is tempting to answer with a vague statement like:
- “xz just compresses harder”
That is not wrong, but it is not very satisfying.
I wanted a more technical answer.
This post is about an experiment I built to figure out where xz’s advantage really comes from. Is it mostly because xz finds better matches? Or because it encodes the same matches more efficiently? Or both?
The short answer is:
- xz has the strongest parser in this experiment
- xz also has the strongest backend
- xz wins on both sides
That is why it remains the king of compression ratio.
At A Glance
| Question | Short answer |
|---|---|
| Why is xz so small? | Because it wins at both parsing and backend coding |
| What was the biggest surprise? | Even after forcing xz into a Deflate-like 32 KiB window, its parser was still much stronger |
| What was the main lesson? | Window size matters a lot, but parser quality matters far more than I expected |
| What was still fastest overall? | zstd |
Background: These Formats Are More Related Than They First Appear
Not everyone reading this needs to be deep into compression internals, so it is worth starting with the common ground.
All three families in this experiment:
- `gzip` / Deflate
- `zstd`
- `xz` / LZMA2
are built on the same broad idea: LZ77-style matching.
The basic concept is simple:
- sometimes emit a literal byte
- sometimes say “copy `length` bytes from `distance` bytes ago”
That is the core LZ77 model.
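A minimal sketch of how such a token stream is replayed (a toy illustration of the idea, not any format’s actual decoder):

```python
# Replay a toy LZ77 token stream: ("lit", bytes) emits raw bytes,
# ("match", length, distance) copies length bytes starting distance
# bytes back. Copying one byte at a time makes overlapping copies
# (distance < length) work naturally.
def lz77_decode(tokens):
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out += tok[1]
        else:
            _, length, distance = tok
            start = len(out) - distance
            for i in range(length):
                out.append(out[start + i])
    return bytes(out)

# One literal run, then an overlapping copy: "abc" + 6 bytes from 3 back.
print(lz77_decode([("lit", b"abc"), ("match", 6, 3)]))  # b'abcabcabc'
```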
So even though these formats feel very different in practice, they all spend a large part of their work doing some variant of:
- scan the input
- find repeated substrings
- turn the input into a stream of literals and matches
That common structure is why I could build a shared experiment.
I used a generic intermediate representation, or IR, that stores:
- literal runs
- matches of the form `(length, distance)`
At a high level, that IR is just a format-neutral LZ77 token stream.
That gave me a common language all three compressors could share.
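A rough sketch of that IR (the names here are mine, not taken from any of the three codebases):

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Literals:
    data: bytes        # a run of raw bytes, emitted as-is

@dataclass
class Match:
    length: int        # how many bytes to copy
    distance: int      # how far back the copy starts

# A format-neutral LZ77 token stream is just a list of these.
Token = Union[Literals, Match]

# Decodes to "hello hello": a literal run, then a 5-byte copy from 6 back.
stream: List[Token] = [Literals(b"hello "), Match(length=5, distance=6)]
```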
Parser vs Backend
These are not standard end-user terms, so before going further, I should define what I mean.
When I say parser, I mean:
- the part of the compressor that looks at the input bytes
- finds repeated substrings
- decides where to emit literals versus matches
- produces an LZ77-style token stream
In plain English, the parser answers:
What repeated structure is in this file, and how should I describe it?
When I say backend, I mean:
- the part of the compressor that takes that token stream
- turns it into the final compressed bitstream
- decides how efficiently literals, lengths, distances, repeats, and related symbols are encoded
In plain English, the backend answers:
Given this token stream, how do I encode it as compactly as possible?
So once I had that shared IR, I could split each compressor into two conceptual halves:
- a parser
- input bytes -> generic LZ77-style IR
- a backend
- generic LZ77-style IR -> final compressed bytes
That distinction is the whole point of the post.
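In code, the split looks roughly like this (a hypothetical sketch; the `Parser` and `Backend` names are mine):

```python
from typing import List, Protocol, Tuple, Union

# Token stream: ("lit", bytes) or ("match", length, distance).
Token = Union[Tuple[str, bytes], Tuple[str, int, int]]

class Parser(Protocol):
    def parse(self, data: bytes) -> List[Token]:
        """Input bytes -> generic LZ77-style IR."""

class Backend(Protocol):
    def encode(self, tokens: List[Token]) -> bytes:
        """Generic LZ77-style IR -> final compressed bytes."""

# Because both halves meet at the IR, any parser pairs with any backend:
def compress(data: bytes, parser: Parser, backend: Backend) -> bytes:
    return backend.encode(parser.parse(data))
```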
When people say one compressor is better than another, they usually mean the whole package. But that whole package actually contains two different sources of strength:
- how good it is at finding and arranging matches
- how good it is at encoding those matches once it has them
I wanted to measure those separately.
The Experiment
Once each family had a parser and a backend, I could build a 3 x 3 matrix:
| Parser | gzip backend | zstd backend | xz backend |
|---|---|---|---|
| gzip parser | gzip -> gzip | gzip -> zstd | gzip -> xz |
| zstd parser | zstd -> gzip | zstd -> zstd | zstd -> xz |
| xz parser | xz -> gzip | xz -> zstd | xz -> xz |
That matrix lets me ask two very clean questions:
- If I keep the parser fixed, which backend is best?
- If I keep the backend fixed, which parser is best?
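Building that matrix is then a small nested loop (a sketch; `parse_fns` and `encode_fns` are hypothetical wrappers around each family’s two halves):

```python
# Pair every parser with every backend on the same input and record
# the compressed size of each combination.
def size_matrix(data, parse_fns, encode_fns):
    results = {}
    for pname, parse in parse_fns.items():
        tokens = parse(data)                     # parse once per family
        for bname, encode in encode_fns.items():
            results[(pname, bname)] = len(encode(tokens))
    return results

# Toy stand-ins just to show the shape of the result.
parsers = {"gzip": lambda d: [("lit", d)]}
backends = {"xz": lambda toks: b"".join(t[1] for t in toks)}
print(size_matrix(b"hello", parsers, backends))  # {('gzip', 'xz'): 5}
```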
The Important Constraint
This was not a comparison of fully native, unconstrained gzip vs zstd vs xz.
To make the cross-family swaps possible, I forced all three into a shared Deflate-like envelope:
- window size: about `32 KiB`
- maximum match length: `258`
- maximum replay-safe match distance for the current gzip backend: `32506`
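As code, that envelope looks roughly like this (the constants come from the list above; treating `3` as the minimum match length is my assumption, borrowed from Deflate):

```python
# Shared Deflate-like envelope applied to all three families.
MAX_WINDOW = 32 * 1024      # ~32 KiB window
MAX_MATCH_LEN = 258         # Deflate's maximum match length
MAX_GZIP_DISTANCE = 32506   # replay-safe distance for the gzip backend
MIN_MATCH_LEN = 3           # assumed Deflate-style minimum

def is_legal_match(length: int, distance: int) -> bool:
    """True if every backend in the matrix can represent this match."""
    return (MIN_MATCH_LEN <= length <= MAX_MATCH_LEN
            and 1 <= distance <= min(MAX_WINDOW, MAX_GZIP_DISTANCE))

print(is_legal_match(258, 32506))  # True
print(is_legal_match(259, 100))    # False: longer than Deflate allows
```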
That matters a lot.
It means this experiment is not asking:
- which stock compressor wins with all of its native advantages?
Instead it is asking:
- under roughly the same match-space constraints, which parser is better?
- under the same token stream, which backend is better?
That is a much sharper question.
The Result: XZ Wins Everywhere
On a 10 MiB prefix of linux.tar, the default-level matrix produced:
| Parser | gzip backend | zstd backend | xz backend |
|---|---|---|---|
| gzip | 2,502,165 | 2,499,651 | 2,411,412 |
| zstd | 2,920,678 | 2,883,543 | 2,725,860 |
| xz | 2,468,361 | 2,431,131 | 2,291,404 |
This table is the whole story in one glance.
Two facts stand out immediately:
- the `xz` parser wins every backend column
- the `xz` backend wins every parser row
That means xz does not win for just one reason. It wins for two reasons at once:
- it produces a better LZ77 token stream
- it also encodes that token stream better
This was the central conclusion of the whole project.
Why That Result Is So Interesting
The xz backend result is impressive, but not especially shocking.
LZMA-style backend coding is rich and expensive. Most people already expect xz to be strong once it has a good parse.
The surprising part was the parser result.
I expected xz to lose much more of its edge after I cut it down to a Deflate-like 32 KiB window.
It did not.
That means xz’s ratio advantage is not just “it has a huge dictionary.”
Large window helps, absolutely. But parser quality is also doing a huge amount of work.
What The Experiment Says About Each Family
Gzip
At its strongest settings, gzip is not weak. It searches hard:
- long chain search
- lazy matching
- full `258`-byte match length
But it is still fundamentally a local parser. It makes strong local decisions, not deep global optimization.
That makes gzip better than a naive greedy parser, but still far from what xz is doing.
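The greedy-versus-lazy distinction can be sketched in a few lines. This is a toy version of the idea, not zlib’s implementation; `find_longest_match` here is a brute-force stand-in for gzip’s hash-chain search.

```python
# Lazy matching: after finding a match at position i, peek at i + 1.
# If the next position offers a strictly longer match, emit one literal
# and defer, instead of greedily taking the match at i.
def find_longest_match(data, i):
    """Brute-force search: returns (length, distance), or (0, 0)."""
    best_len, best_dist = 0, 0
    for dist in range(1, i + 1):
        length = 0
        while i + length < len(data) and data[i + length - dist] == data[i + length]:
            length += 1
        if length > best_len:
            best_len, best_dist = length, dist
    return best_len, best_dist

def lazy_parse(data):
    tokens, i = [], 0
    while i < len(data):
        length, dist = find_longest_match(data, i)
        if length >= 3:
            next_len, _ = find_longest_match(data, i + 1)
            if next_len > length:                    # tomorrow looks better:
                tokens.append(("lit", data[i:i+1]))  # pay one literal today
                i += 1
                continue
            tokens.append(("match", length, dist))
            i += length
        else:
            tokens.append(("lit", data[i:i+1]))
            i += 1
    return tokens

print(lazy_parse(b"abcabcabc"))
```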
Zstd
zstd is the most interesting middle case.
In practice, zstd is an excellent compressor. But in this constrained experiment, its parser was often weaker than I expected.
My read is that this comes from two design choices:
- zstd is intentionally speed-conscious in parsing
- zstd’s native parser assumptions are not centered on a Deflate-like `32 KiB` world
There is also a concrete detail that matters here:
- the zstd parser configuration used in this experiment naturally works with `minMatch = 5`
That means many short matches that gzip and xz can still exploit are simply not part of zstd’s normal search space here. This is an intentional speed/ratio tradeoff, not a bug.
Using `minMatch = 5` cuts parser work down a lot, because a 5-byte prefix is much more selective than a 3-byte or 4-byte prefix.
Fewer positions look like plausible matches, so zstd spends less time chasing weak short candidates that do not pay off very well in its native coding model.
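A quick way to see that selectivity effect (a rough counting model, not zstd’s actual hash tables): count how many other positions share each position’s k-byte prefix, since those are the candidates a parser has to probe.

```python
from collections import Counter

def avg_candidates(data: bytes, k: int) -> float:
    """Average number of other positions sharing a position's k-byte prefix."""
    counts = Counter(data[i:i + k] for i in range(len(data) - k + 1))
    total = sum(counts.values())
    return sum(c * (c - 1) for c in counts.values()) / total

text = b"the theory of the thermal theme held there, then they gathered"
print(avg_candidates(text, 3))  # 3-byte prefixes collide often
print(avg_candidates(text, 5))  # 5-byte prefixes are far more selective
```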
There is also an important second point from the earlier gzip vs zstd work.
When I isolated the main zstd constraints at default level 3, almost all of the size loss came from shrinking zstd’s native window down to the shared Deflate-like window.
At level 3, zstd normally uses about a 2 MiB window.
In the constrained experiment, I forced that down to 32 KiB.
That single change explained almost all of the final size increase.
So zstd is being hit by two things in this setup:
- a much smaller window than it normally wants
- a parser tuned around `minMatch = 5` rather than aggressive short-match capture
So zstd remains a very good real-world compressor, but this specific apples-to-apples setup is not especially kind to its parser.
Xz
xz is much more aggressive in parsing.
At a high level, it does something like this:
- estimate the cost of literals
- estimate the cost of normal matches
- estimate the cost of repeat-based matches
- simulate future states
- choose the path with the lowest estimated total cost
That is not perfect foresight. It does not know the exact final bitstream in advance.
But it is much closer to a real optimum parser than gzip’s local lazy search or zstd’s more speed-balanced strategies.
That is why xz still dominates even after removing much of its native large-window advantage.
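A toy version of that cost-based search (this captures only the shape of the idea; real LZMA prices depend on adaptive coder state, and the constants here are invented):

```python
# Price-based optimal parse as dynamic programming: best[i] is the
# cheapest estimated cost (in bits) to encode data[:i]; at each position
# we compare paying for a literal against paying for each available match.
LIT_COST = 9     # assumed flat price for a literal
MATCH_COST = 24  # assumed flat price for any match

def optimal_parse(data, find_matches):
    n = len(data)
    best = [float("inf")] * (n + 1)
    best[0] = 0
    choice = [None] * (n + 1)
    for i in range(n):
        if best[i] + LIT_COST < best[i + 1]:
            best[i + 1] = best[i] + LIT_COST
            choice[i + 1] = ("lit", 1)
        for length, dist in find_matches(data, i):
            if best[i] + MATCH_COST < best[i + length]:
                best[i + length] = best[i] + MATCH_COST
                choice[i + length] = ("match", length, dist)
    tokens, i = [], n        # walk back to recover the chosen token path
    while i > 0:
        tok = choice[i]
        tokens.append(tok)
        i -= tok[1]          # a literal covers 1 byte, a match covers length
    return tokens[::-1]
```

xz’s real search additionally models repeat distances and simulates future coder states, but the backbone is this same cheapest-path structure.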
A Concrete Example Of Better Parsing
I did not want the conclusion to rest only on summary tables, so I also looked at local regions where gzip and xz parsed the exact same bytes differently.
In one text-heavy region around byte offset 7,191,744 of the test input, the two parsers behaved like this over roughly 520 bytes:
- gzip parser: `500` matched bytes, `20` literal bytes, `74` tokens
- xz parser: `517` matched bytes, `3` literal bytes, `67` tokens
That is a clean picture of parser quality:
- xz turned more bytes into matches
- xz emitted fewer literal breaks
- xz did it with fewer total tokens
This is important because it is not a backend effect. The difference is already visible before the backend gets involved.
That is what I mean by “xz has a stronger parser.”
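Those three numbers fall directly out of the token stream. A sketch, assuming a stream of `("lit", bytes)` and `("match", length, distance)` tuples:

```python
# Summarize a parse the way the per-region comparison does:
# total matched bytes, total literal bytes, and token count.
def parse_stats(tokens):
    return {
        "matched_bytes": sum(t[1] for t in tokens if t[0] == "match"),
        "literal_bytes": sum(len(t[1]) for t in tokens if t[0] == "lit"),
        "tokens": len(tokens),
    }

stream = [("lit", b"ab"), ("match", 10, 4), ("match", 7, 12)]
print(parse_stats(stream))  # {'matched_bytes': 17, 'literal_bytes': 2, 'tokens': 3}
```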
The Backend Matters Too
Now fix the parser and only vary the backend.
Every row still prefers xz backend:
| Fixed parser | gzip backend | zstd backend | xz backend |
|---|---|---|---|
| gzip parser | 2,502,165 | 2,499,651 | 2,411,412 |
| zstd parser | 2,920,678 | 2,883,543 | 2,725,860 |
| xz parser | 2,468,361 | 2,431,131 | 2,291,404 |
This tells me something equally important:
- even if another format handed xz its token stream, xz would still usually compress it smaller
So xz’s advantage is not just “find better matches.” It is also:
- “encode those matches better once I have them”
That is why I think it is fair to say xz is king of compression ratio, not just king of one particular component.
Timing Keeps The Story Honest
Of course, ratio is not everything. Compression time matters too.
On the same 10 MiB input, using median steady-state times for the same-family path:
| Family | Parser median | Backend median | Total median |
|---|---|---|---|
| gzip | 0.328415s | 0.206880s | 0.535295s |
| zstd | 0.148028s | 0.268817s | 0.416845s |
| xz | 3.802965s | 1.929538s | 5.732503s |
This is the tradeoff in one table:
- zstd was fastest overall
- gzip was also fast
- xz was dramatically slower, especially on the parser side
So the right way to read this post is not:
- “xz is best, therefore use xz for everything”
It is:
- “xz earns its ratio lead with genuinely stronger parsing and backend coding, and it pays for that with much more compute”
- gzip: small ratio gain, moderate cost
- zstd: best speed/ratio balance in this experiment
- xz: best ratio, much higher compute cost
The Real Takeaway
This project changed my mental model in three ways.
1. Window size is extremely important, but not the whole story
Deflate’s small window really is a major handicap. But even after I equalized the window, parser quality still differed dramatically.
2. Parser quality is a first-class source of compression ratio
This sounds obvious in theory. In practice, I think many people still underestimate it.
The xz -> * results show that a stronger parser can dominate even under the same match-space rules.
3. Backend quality is also a first-class source of compression ratio
The * -> xz results show that some backends simply code the same IR better than others.
So if one format wins in practice, the useful question is often not:
Is it the parser or the backend?
It is:
How much is each side contributing?
This experiment gave me a concrete way to start answering that.
| Layer | What this experiment says |
|---|---|
| Window size | A huge deal, but not enough to explain everything |
| Parser | One of the biggest sources of compression-ratio difference |
| Backend | Also a major source of compression-ratio difference |
| Compression level | Only meaningful relative to the surrounding constraints |
So Why Is XZ The King?
If I had to compress the entire article down to one sentence, it would be this:
xz gets smaller files because it is better both at deciding what to encode and at deciding how to encode it.
That is the deepest thing I learned from the experiment.
It is not just:
- bigger window
- better entropy coding
- slower search
It is all of those ideas interacting.
But the experiment makes one point especially clear:
- xz’s advantage is not a single trick
- it is a system-level advantage
- and that advantage survives even after some of its most obvious native benefits are taken away
That is why I think xz is still the king of compression ratio.