Rust — write gunzip from scratch 12 Link to heading

In this series, we will be writing gunzip decompressor from scratch in Rust. We want to write it ourselves not only to learn Rust but also to understand how it .gz compression works under the hood. For full source code, check out this Github repo. You can find all articles of the series below

  • part 1: main() function and skeletal structure
  • part 2: bitread module for reading bits from a byte stream
  • part 3: gzip header & footer for parsing the metadata and checksum
  • part 4: inflate for block type 0 (uncompressed) data
  • part 5: codebook and huffman_decoder modules for decoding Huffman codes
  • part 6: lz77 and sliding_window modules for decompressing LZ77-encoded data
  • part 7: inflate for block type 1 and 2 (compressed) data using fixed or dynamic Huffman codes
  • part 8: checksum_write module for verifying the decompressed data
  • part 9: performance optimization
  • part 10: multithread support
  • part 11: streaming support
  • part 12: memory optimization
  • part 13: bug fix to reject non-compliant .gz file

I thought I was done with the series as of part 11, but I figured I could continue to improve it. I won’t go into every detail anymore; instead, I will go over some of the major update/improvement to the program. Today, we will optimize our implementation to reduce its memory footprint.

Profile memory usage Link to heading

First, let’s measure how much memory the program consumes. A simple proxy is to use GNU time command to output maximum resident set size (RSS)

# time is bash built-in command, so we have to invoke GNU time by its full path
$ /usr/bin/time -v target/release/gunzip < linux.tar.gz > linux.tar
Command being timed: "target/release/gunzip"
User time (seconds): 2.11
System time (seconds): 0.40
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.52
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 10072
...

Our program is consuming about 10MB, which seems OK at first. Let’s compare with a reference, say GNU Gzip

$ /usr/bin/time -v gunzip < /mnt/ramdisk/linux.tar.gz > /mnt/ramdisk/linux.tar
Command being timed: "gunzip"
User time (seconds): 4.61
System time (seconds): 0.26
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.88
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1816
...

Though GNU Gzip is much slower than ours, its memory footprint is 5x less than ours at 1.8MB. If GNU Gzip can do this with less than 2MB, we’d better do something to reduce our program’s memory footprint to a comparable size.

Locate the issue Link to heading

Now, we need to figure out why the program is consuming so much memory. The best tool for this is valgrind. Let’s install the packages we will need today

$ sudo apt install -y valgrind massif-visualizer

We will use massif to record time evolution of memory footprint as the program is executing. Then we will use massif-visualizer to locate which part of the program is consuming the most memory

# run valgrind to generate massif data file
$ valgrind --tool=massif target/release/gunzip < linux.tar.gz > linux.tar
==20788== Massif, a heap profiler
==20788== Copyright (C) 2003-2017, and GNU GPL'd, by Nicholas Nethercote
==20788== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==20788== Command: target/release/gunzip
==20788== 

==20788== 


# run massif-visualizer to easily navigate the data
$ massif-visualizer massif.out.20788

massif-visualizer massif-visualizer

On the left, we have time-evolution of memory heap consumption, where the x-axis is time in unit of # instructions and y-axis is memory heap size. On the right, we see each snapshot at different time instances. Expanding the snapshot at the peak, we see 4MB of memory from producer.rs at line 117.

peak memory culprit peak memory culprit

Exactly at line 117, we indeed see buf variable, which is accumulating data until the inflate block is complete. Deflate specification does not limit how large a compressed block (type 1 or type 2) can be, so this buf variable can grow as large as the entire uncompressed file size—you can imagine a file that consists of only a single compressed block.

Fix the issue Link to heading

Now that we know precisely where the problem is, it is time to fix the issue. An obvious fix is to return partial block data each time the window is full, rather than accumulating. The fix, along with other minor code changes, can be found here. I won’t go into the code changes in detail here, as that is not the main focus of this article, but feel free to check out the patch. Below shows the inflate() method after the fix; we no longer accumulate the data.

inflate() method after the fix inflate() method after the fix

Evaluate the fix Link to heading

Let’s re-compile and run the same valgrind-massif evaluation. Below shows the new report after the fix has been applied

massif-visualizer after the fix massif-visualizer after the fix

Hooray! The total heap memory consumption is consistently staying below 300kB for the entire run. This is a massive difference compared to the previous report. Now, let’s measure maximum RSS

$ /usr/bin/time -v target/release/gunzip < linux.tar.gz > linux.tar
Command being timed: "target/release/gunzip"
User time (seconds): 2.17
System time (seconds): 0.35
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.52
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2616
...

It has reduced to 2.6MB, again a massive reduction in memory footprint without any impact on the runtime. Now, I guess we can officially claim that our gunzip program is not only performant (runtime) but also memory-efficient!

Previous in series: https://medium.com/@techhara/rust-write-gunzip-from-scratch-11-1b4e7cbcc0ef

Next in series: https://medium.com/@techhara/rust-write-gunzip-from-scratch-part-13-57509b73ad1a