Rust— write gunzip from scratch 3 Link to heading
In this series, we will be writing gunzip decompressor from scratch in Rust. We want to write it ourselves not only to learn Rust but also to understand how it .gz compression works under the hood. For full source code, check out this Github repo. You can find all articles of the series below
- part 1:
main()function and skeletal structure - part 2:
bitreadmodule for reading bits from a byte stream - part 3:
gzipheader & footer for parsing the metadata and checksum - part 4:
inflatefor block type 0 (uncompressed) data - part 5:
codebookandhuffman_decodermodules for decoding Huffman codes - part 6:
lz77andsliding_windowmodules for decompressing LZ77-encoded data - part 7:
inflatefor block type 1 and 2 (compressed) data using fixed or dynamic Huffman codes - part 8:
checksum_writemodule for verifying the decompressed data - part 9: performance optimization
- part 10: multithread support
- part 11: streaming support
- part 12: memory optimization
- part 13: bug fix to reject non-compliant
.gzfile
Today, let’s write out skeletal control flow of the program.

RFC1952
A .gz file consists of one or more members. Each member consists of a header, one or more blocks, and a footer. With this in mind, we can fill in gunzip() method as below

// append to src/lib.rs
pub mod header; // todo
pub mod footer; // todo
use crate::error::Error;
use bitread::{BitReader, BitRead};
use header::Header; // todo
use footer::Footer; // todo
// replace
pub fn gunzip(read: impl Read, write: impl Write) -> Result<()> {
let mut reader = BitReader::new(read);
let mut member_idx = 0;
while reader.has_data_left()? {
Header::read(&mut reader)?;
member_idx += 1;
// TODO: read one or more blocks
let footer = Footer::read(&mut reader)?;
// TODO: do something with footer
}
if member_idx == 0 {
Err(Error::EmptyInput)
} else {
Ok(())
}
}
We continuously read and process a member until the input is empty. Here, Header and Footer modules will simply read a header and a footer section. We will deal with the blocks later.
Let’s first tackle Footer, which is the easiest. A footer consists of total 8-bytes, where the first 4-bytes is CRC32 checksum and the last 4-bytes is ISIZE, the size of the total decompressed data up to 32-bits.
// src/footer.rs
use crate::error::Result;
use std::io::Read;
pub struct Footer {
pub crc32: u32,
pub size: u32,
}
impl Footer {
pub fn read(mut read: impl Read) -> Result<Self> {
let mut buf = [0u8; 8];
read.read_exact(&mut buf)?;
let (a, b) = buf.split_at(4);
let crc32 = u32::from_le_bytes(a.try_into().unwrap());
let size = u32::from_le_bytes(b.try_into().unwrap());
Ok(Self { crc32, size })
}
}
Next is Header. This is a bit more complex than Footer but still relatively straightforward. One thing to note is that we may need to read null-terminated string, which requires read_until() method from BufRead trait. Since our BitReader struct implements a large buffer, we can easily achieve it. Let’s first define ReadUntil trait and implement it for BitReader.
// append to src/bitread.rs
pub trait ReadUntil {
fn read_until(&mut self, byte: u8, buf: &mut Vec<u8>) -> std::io::Result<usize>;
}
impl<R: Read> ReadUntil for BitReader<R> {
fn read_until(&mut self, byte: u8, buf: &mut Vec<u8>) -> std::io::Result<usize> {
self.byte_align();
let mut n = 0;
loop {
match self.buffer().iter().position(|x| *x == byte) {
Some(pos) => {
buf.extend_from_slice(&self.buffer()[..pos + 1]);
n += pos + 1;
self.begin += pos + 1;
return Ok(n);
}
None => {
buf.extend_from_slice(self.buffer());
n += self.buffer().len();
self.begin = self.cap;
if self.fill_buf()? == 0 {
return Ok(n);
}
}
}
}
}
}
We continue to search for the byte within our buffer. If not found, we refill the buffer and repeat. With read_until() method, we let’s go ahead and write Header struct according to RFC1952

RFC1952
// src/header.rs
use crate::bitread::ReadUntil;
use crate::error::{Result, Error}; // will add
use std::io::Read;
const ID1: u8 = 0x1f;
const ID2: u8 = 0x8b;
const DEFLATE: u8 = 8;
const FTEXT: u8 = 1;
const FHCRC: u8 = 2;
const FEXTRA: u8 = 4;
const FNAME: u8 = 8;
const FCOMMENT: u8 = 16;
pub struct Header {
pub header: [u8; 10],
pub extra_field: Option<Vec<u8>>,
pub name: Option<Vec<u8>>,
pub comment: Option<Vec<u8>>,
pub crc16: Option<u16>,
pub size: usize, // header size
}
impl Header {
pub fn read(mut reader: impl Read + ReadUntil) -> Result<Self> {
let mut buf = [0; 10];
reader.read_exact(&mut buf)?;
if buf[0..3] != [ID1, ID2, DEFLATE] || buf[3] & 0b11100000 != 0 {
return Err(Error::InvalidGzHeader);
}
let mut header = Self {
header: buf,
extra_field: None,
name: None,
comment: None,
crc16: None,
size: buf.len(),
};
if header.get_flg() & FEXTRA != 0 {
let mut buf = [0u8; 2];
reader.read_exact(&mut buf)?;
header.size += buf.len();
let n = u16::from_le_bytes(buf);
let mut buf = vec![0u8; n as usize];
reader.read_exact(&mut buf)?;
header.size += buf.len();
header.extra_field = Some(buf);
}
if header.get_flg() & FNAME != 0 {
let mut buf = Vec::new();
header.size += reader.read_until(0, &mut buf)?;
header.name = Some(buf);
}
if header.get_flg() & FCOMMENT != 0 {
let mut buf = Vec::new();
header.size += reader.read_until(0, &mut buf)?;
header.comment = Some(buf);
}
if header.get_flg() & FHCRC != 0 {
let mut buf = [0u8, 2];
reader.read_exact(&mut buf)?;
header.size += buf.len();
header.crc16 = Some(u16::from_le_bytes(buf));
}
Ok(header)
}
fn get_flg(&self) -> u8 {
self.header[3]
}
}
Notice how Header::read() requires Read and ReadUntil traits but not BitRead trait. Since the header consists of full-bytes, we do not need BitRead trait. It is a good practice to require minimal trait, which is Read + ReadUntil.
Unlike footer, header must check to see if it complies with the spec. If not, we return InvalidGzHeader. Let’s add this into our Error struct. While at it, let’s also add EmptyInput as well, which indicates .gz input is completely empty.
// replace src/error.rs
pub enum Error {
StdIoError(ErrorKind),
EmptyInput,
InvalidGzHeader,
}
OK, let’s see if our program compiles.
$ cargo build
error[E0277]: the trait bound `&mut BitReader<impl Read>`: ReadUntil is not satisfied
--> src/lib.rs:22:22
|
22 | Header::read(&mut reader)?;
| ------------ ^^^^^^^^^^^ the trait `ReadUntil` is not implemented for `&mut BitReader<impl Read>`
| |
| required by a bound introduced by this call
|
note: required by a bound in `Header::read`
--> src/header.rs:25:41
|
25 | pub fn read(mut reader: impl Read + ReadUntil) -> Result<Self> {
| ^^^^^^^^^ required by this bound in `Header::read`
help: consider removing the leading `&`-reference
|
22 | Header::read(reader)?;
The compiler is complaining that &mut reader does not satisfy ReadUntil. It also suggest that we feed in reader itself rather than &mut reader. We can’t do this because we have to keep using reader. We could modify the signature of Header::read() method to take in a reference, but then it is too limiting— sometimes a caller may actually want to pass the owned object itself.
Let’s first analyze what is happening here. The root cause is that BitReader itself implements ReadUntil but &mut BitReader does not. To resolve, we could explicitly implement ReadUntil trait for &mut BitReader, but there is a better, more generic approach— blanket implement ReadUntil for all &mut ReadUntil. This is exactly how Read is implemented for any &mut Read in the standard library. I went over in more depth in this article, so feel free to check it out.
// append to src/bitread.rs
impl<R: ReadUntil> ReadUntil for &mut R {
fn read_until(&mut self, byte: u8, buf: &mut Vec<u8>) -> std::io::Result<usize> {
(**self).read_until(byte, buf)
}
}
With the fix, the build is successful! In the next series, we will finally go into the actual meat, which is to process each block, so stay tuned!
Previous in series: https://towardsdev.com/rust-write-gunzip-from-scratch-2-31dcab43791
Next in series: https://towardsdev.com/rust-write-gunzip-from-scratch-4-d20f9da26f86