Rust implementation challenge— hash table part 1 Link to heading

Let’s implement a hash table data structure in Rust. Hash tables are incredibly important in data structures due to their efficiency and versatility. By implementing it from scratch, we can gain insights into the underlying algorithms and data structures involved. Not only that, we will also have a chance to improve our Rust skills.

Speaking of algorithm, we are going to implement linear probing open addressing hash table. There are other variants, but I think this is probably the easiest to get started.

I will take the top-down approach, where we start with higher level abstractions and walk our way down to lower level abstractions and implementations. Let’s start with the top-most abstraction: API. We probably don’t want to support the entire API of the Rust standard library std::collections::HashMap just yet. For starters, let’s trim it down to the core functionalities.

use std::borrow::Borrow;
use std::hash::Hash;

pub struct HashMap<Key, Val> {
    // todo
}

impl<Key: Eq + Hash, Val> HashMap<Key, Val> {
    pub fn new() -> Self {
        todo!()
    }

    pub fn len(&self) -> usize {
        todo!()
    }

    pub fn insert(&mut self, key: Key, val: Val) -> Option<Val> {
        todo!()
    }

    pub fn get<Q>(&self, key: &Q) -> Option<&Val>
    where
        Key: Borrow<Q>,
        Q: Eq + Hash,
    {
        todo!()
    }

    pub fn get_mut<Q>(&mut self, key: &Q) -> Option<&mut Val>
    where
        Key: Borrow<Q>,
        Q: Eq + Hash,
    {
        todo!()
    }

    pub fn remove<Q>(&mut self, key: &Q) -> Option<Val>
    where
        Key: Borrow<Q>,
        Q: Eq + Hash,
    {
        todo!()
    }
}

If you are already familiar with hash table algorithm, feel free to go ahead and try to implement it yourself— Rust playground. By the way, if you are not familiar with Borrow trait appearing in the API, please refer to my previous post. These methods have the exact same signature as provided in the standard HashMap library. In a nutshell, the Borrow API allows us to, for example, supply either &str or &String to a hash table of key type String.

Hash table is basically an array of items whose index is the key’s hash value modulo size of the array. We will naturally use Vec<T> for this array, but what is this type T? Let’s call it Entry and think about what Entry should capture. We know that Entry can be either empty or occupied. When it is occupied, it should contain both Key and Val. In order to support remove() method, we need a third state: Tombstone. This state represents an entry that was once occupied but currently empty. We will get to why we need such distinction later, but for now, here is our helper struct:

enum Entry<Key, Val> {
    Vacant,
    Tombstone,
    Occupied { key: Key, val: Val },
}

We need two usize fields for the hash table— one for keeping track of occupied entries, and another for keeping track of vacant entries. The first field is what we return for len() method. The second field lets us decide when to resize our array. With that, we can add some implementation details to our skeleton:

--- before
+++ after
@@ -1,17 +1,29 @@
 use std::borrow::Borrow;
 use std::hash::Hash;
 
+enum Entry<Key, Val> {
+    Vacant,
+    Tombstone,
+    Occupied { key: Key, val: Val },
+}
+
 pub struct HashMap<Key, Val> {
-    // todo
+    xs: Vec<Entry<Key, Val>>,
+    n_occupied: usize,
+    n_vacant: usize,
 }
 
 impl<Key: Eq + Hash, Val> HashMap<Key, Val> {
     pub fn new() -> Self {
-        todo!()
+        Self {
+            xs: Vec::new(),
+            n_occupied: 0,
+            n_vacant: 0,
+        }
     }
 
     pub fn len(&self) -> usize {
-        todo!()
+        self.n_occupied
     }

Now it’s time to write some helper method— we need a method that evaluates the hash value from the key and apply modulo array size to get the index.

--- before
+++ after
@@ -1,5 +1,6 @@
 use std::borrow::Borrow;
-use std::hash::Hash;
+use std::collections::hash_map::DefaultHasher;
+use std::hash::{Hash, Hasher};
 
 enum Entry<Key, Val> {
     Vacant,
@@ -14,6 +15,16 @@
 }
 
 impl<Key: Eq + Hash, Val> HashMap<Key, Val> {
+    fn get_index<Q>(&self, key: &Q) -> usize
+    where
+        Key: Borrow<Q>,
+        Q: Eq + Hash,
+    {
+        let mut hasher = DefaultHasher::new();
+        key.hash(&mut hasher);
+        hasher.finish() as usize % self.xs.len()
+    }
+
     pub fn new() -> Self {
         Self {
             xs: Vec::new(),

With this, now let’s implement get() method. All we need to do is to iterate through the entries starting at index and compare keys, until the match is found, or until we encounter a vacant entry. During the search, we simply ignore and skip tombstone entry.

--- before
+++ after
     pub fn get<Q>(&self, key: &Q) -> Option<&Val>
     where
         Key: Borrow<Q>,
         Q: Eq + Hash,
     {
-        todo!()
+        if self.len() == 0 {
+            return None;
+        }
+        let mut idx = self.get_index(key);
+        loop {
+            match &self.xs[idx] {
+                Entry::Vacant => {
+                    break None;
+                }
+                Entry::Occupied { key: k, val } if k.borrow() == key => {
+                    break Some(val);
+                }
+                _ => {
+                    idx = (idx + 1) % self.xs.len();
+                }
+            }
+        }
     }
 
     pub fn get_mut<Q>(&mut self, key: &Q) -> Option<&mut Val>

OK, it was not too bad. Now, can we do the same with get_mut() method? If you simply update immutables to mutables, you will get a few compiler errors (Rust playground). The fix requires a bit more involved change. For that, I will continue in the next story, so stay tuned!