Rust — write gunzip from scratch part 13
In this series, we will be writing a gunzip decompressor from scratch in Rust. We want to write it ourselves not only to learn Rust but also to understand how .gz compression works under the hood. For full source code, check out this GitHub repo.
You can find all articles of the series below:
- part 1: main() function and skeletal structure
- part 2: bitread module for reading bits from a byte stream
- part 3: gzip header & footer for parsing the metadata and checksum
- part 4: inflate for block type 0 (uncompressed) data
- part 5: codebook and huffman_decoder modules for decoding Huffman codes
- part 6: lz77 and sliding_window modules for decompressing LZ77-encoded data
- part 7: inflate for block type 1 and 2 (compressed) data using fixed or dynamic Huffman codes
- part 8: checksum_write module for verifying the decompressed data
- part 9: performance optimization
- part 10: multithread support
- part 11: streaming support
- part 12: memory optimization
- part 13: bug fix to reject non-compliant .gz file

Let’s first briefly review the format of a .gz file. As we have discussed here, a .gz file is a concatenation of one or more members. A member is the minimal unit of compressed data. It consists of a header, one or more blocks, and a footer.

When a .gz file consists of multiple members, each member should be completely independent. More specifically, LZ77 back-references in one member cannot point into data from previous members. See here for a quick refresher.
Our implementation, however, keeps using the same sliding window across members, so the window still holds all the past context from previous members. This is not a problem for compliant .gz files, which our program decompresses just fine. However, our program fails to reject a non-compliant .gz file that has data dependencies between members. Other gunzip implementations, such as GNU gzip or zlib, abort when they detect a non-compliant .gz file, so we should do the same.
Fix
The actual fix is very straightforward. We reset our sliding window every time a footer is encountered, which marks the end of a member.
--- src/producer.rs before
+++ src/producer.rs after
@@ -84,6 +84,7 @@
     State::Inflate(is_final) => self.inflate(is_final)?,
     State::Footer => {
         self.state = State::Header;
+        self.window = SlidingWindow::new(); // reset history
         Produce::Footer(Footer::read(&mut self.reader)?)
     }
 };
We already have logic to detect whether an LZ77 distance is greater than the history length, so this one-liner is all we need.
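To see why the reset is enough, here is a minimal sketch of the interaction between the history-length check and the reset. The SlidingWindow below is a hypothetical, simplified stand-in rather than the one from the repo: the names push_literal and copy_match and the String error type are assumptions for illustration, and the 32 KiB window limit is ignored.

```rust
/// Hypothetical, simplified stand-in for the series' SlidingWindow.
/// It keeps the whole history and ignores the 32 KiB window limit.
#[derive(Default)]
struct SlidingWindow {
    history: Vec<u8>,
}

impl SlidingWindow {
    fn new() -> Self {
        Self::default()
    }

    /// Append a literal byte to the history.
    fn push_literal(&mut self, byte: u8) {
        self.history.push(byte);
    }

    /// Copy `length` bytes starting `distance` bytes back, as an LZ77
    /// back-reference does. Fails if the distance reaches past the history.
    fn copy_match(&mut self, distance: usize, length: usize) -> Result<(), String> {
        if distance == 0 || distance > self.history.len() {
            return Err(format!(
                "invalid distance {} with only {} bytes of history",
                distance,
                self.history.len()
            ));
        }
        // Copy byte by byte so that overlapping matches (length > distance) work.
        for _ in 0..length {
            let byte = self.history[self.history.len() - distance];
            self.history.push(byte);
        }
        Ok(())
    }
}

fn main() {
    // First member: three literals followed by a back-reference.
    let mut window = SlidingWindow::new();
    for &b in b"abc" {
        window.push_literal(b);
    }
    assert!(window.copy_match(3, 6).is_ok());
    assert_eq!(window.history, b"abcabcabc".to_vec());

    // Footer reached: reset the history before the next member starts.
    window = SlidingWindow::new();

    // A back-reference into the previous member is now rejected.
    assert!(window.copy_match(3, 3).is_err());
}
```

Without the reset, the last copy_match would succeed silently, which is exactly the non-compliant case we want to reject.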
Bonus
We are technically done here, but we can go a bit further and improve our program while we are at it. One thing I noticed is that the program does not print a detailed error message, which makes it difficult to debug. Let’s fix this by explicitly printing the error before we exit.
--- src/bin/gunzip.rs before
+++ src/bin/gunzip1.rs after
@@ -27,6 +27,11 @@
     let mut writer = std::io::stdout().lock();
     let mut decompressor = Decompressor::new(reader, multithread);
-    std::io::copy(&mut decompressor, &mut writer)?;
-    Ok(())
+    match std::io::copy(&mut decompressor, &mut writer) {
+        Ok(_) => Ok(()),
+        Err(e) => {
+            eprintln!("{:?}", e);
+            std::process::exit(-1);
+        }
+    }
 }
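As a side note, the same pattern can be tried in isolation. The sketch below is not code from the repo: FailingReader is a hypothetical stand-in for the Decompressor, and the error message is made up. It also returns std::process::ExitCode instead of calling std::process::exit, a variation that lets main return normally so destructors on its stack still run.

```rust
use std::io::{self, Read};
use std::process::ExitCode;

/// Hypothetical stand-in for the Decompressor: yields a few bytes, then fails.
struct FailingReader {
    sent: bool,
}

impl Read for FailingReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if !self.sent {
            self.sent = true;
            let data = b"ok";
            let n = data.len().min(buf.len());
            buf[..n].copy_from_slice(&data[..n]);
            return Ok(n);
        }
        // Made-up message; the real program reports its own errors.
        Err(io::Error::new(io::ErrorKind::InvalidData, "invalid LZ77 distance"))
    }
}

/// io::copy propagates the reader's error to the caller.
fn run() -> io::Result<u64> {
    let mut reader = FailingReader { sent: false };
    let mut writer = io::sink();
    io::copy(&mut reader, &mut writer)
}

fn main() -> ExitCode {
    match run() {
        Ok(_) => ExitCode::SUCCESS,
        Err(e) => {
            // Print the error in Debug form before reporting failure.
            eprintln!("{:?}", e);
            ExitCode::FAILURE
        }
    }
}
```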
Alright, this is all for now. If you find any other bugs in the program, please don’t hesitate to leave a comment or directly submit a PR here.
previous in series: https://medium.com/@techhara/rust-write-gunzip-from-scratch-12-f29e26679884