Rust implementation challenge — hash table part 2

Let’s continue from where we left off in the previous story. We want to implement get_mut() method. This should work the same as get() method but compiler won’t let us simply update immutable to mutable variants. The solution is to loop through entries via an iterator, rather than old-school index counting. Since we need to start at a given index and cycle through the entire array, ending at index — 1, this itself is a bit tricky but can be done using Iterator::split_at_mut() method. See my previous post for more details. With that, we can finally implement get_mut() method

--- before
+++ after
@@ -15,6 +15,11 @@ pub struct HashMap<Key, Val> {
 }
 
 impl<Key: Eq + Hash, Val> HashMap<Key, Val> {
+    fn iter_mut_starting_at(&mut self, idx: usize) -> impl Iterator<Item = &mut Entry<Key, Val>> {
+        let (s1, s2) = self.xs.split_at_mut(idx);
+        s2.iter_mut().chain(s1.iter_mut())
+    }
+
     fn get_index<Q>(&self, key: &Q) -> usize
     where
         Key: Borrow<Q>,
@@ -70,7 +75,22 @@ impl<Key: Eq + Hash, Val> HashMap<Key, Val> {
         Key: Borrow<Q>,
         Q: Eq + Hash,
     {
-        todo!()
+        if self.len() == 0 {
+            return None;
+        }
+        let idx = self.get_index(key);
+        for entry in self.iter_mut_starting_at(idx) {
+            match entry {
+                Entry::Vacant => {
+                    return None;
+                }
+                Entry::Occupied { key: k, val } if (k as &Key).borrow() == key => {
+                    return Some(val);
+                }
+                _ => {}
+            }
+        }
+        panic!("fatal: unreachable");
     }
 
     pub fn remove<Q>(&mut self, key: &Q) -> Option<Val>

We should never reach the final line of get_mut() method by design. The iteration should always terminate. We will guarantee this by resizing our array during insert() method later.

We are left with just two methods: insert() and remove(). Now is time to discuss the role of Tombstone variant of our Entry enum. Our hash-collision resolution is to start at index and probe the chain of occupied entries until either we find a matching key or until we reach a vacant one, marking the end of the chain of hash-collision keys. If we were to find the matching key in the middle of the chain and remove it by marking it as Vacant, then are severing the chain into two parts. Next time we search through the same chain, we won’t be able to search the second half of the chain, as we will reach the Vacant entry in the middle.

This is why we can’t simply remove an entry and mark it as Vacant. The Tombstone variant lets us know that there is nothing there but we still need to continue probing. With that, we can implement remove() method — if we find an entry with the matching key, we mark it Tomtsbone and decrement our counter. Otherwise, it is a no-op. This sounds simple, but its implementation is a bit tricky in Rust. Let’s first implement a helper method that takes a value from an Entry.

--- before
+++ after
@@ -1,6 +1,7 @@
 use std::borrow::Borrow;
 use std::collections::hash_map::DefaultHasher;
 use std::hash::{Hash, Hasher};
+use std::mem::swap;
 
 enum Entry<Key, Val> {
     Vacant,
@@ -8,6 +9,23 @@ enum Entry<Key, Val> {
     Occupied { key: Key, val: Val },
 }
 
+impl<Key, Val> Entry<Key, Val> {
+    fn take(&mut self) -> Option<Val> {
+        match self {
+            Self::Occupied { key, val } => {
+                let mut occupied = Self::Tombstone;
+                swap(self, &mut occupied);
+                if let Self::Occupied { key, val } = occupied {
+                    Some(val)
+                } else {
+                    panic!("fatal: unreachable");
+                }
+            }
+            _ => None,
+        }
+    }
+}
+

With this, we can implement remove() method.

--- before
+++ after
     pub fn remove<Q>(&mut self, key: &Q) -> Option<Val>
     where
         Key: Borrow<Q>,
         Q: Eq + Hash,
     {
-        todo!()
+        if self.len() == 0 {
+            return None;
+        }
+        let idx = self.get_index(key);
+        let mut result = None;
+        for entry in self.iter_mut_starting_at(idx) {
+            match entry {
+                Entry::Occupied { key: k, .. } if (k as &Key).borrow() == key => {
+                    result = entry.take();
+                    break;
+                }
+                Entry::Vacant => {
+                    result = None;
+                    break;
+                }
+                _ => {}
+            }
+        }
+        result.map(|val| {
+            self.n_occupied -= 1;
+            val
+        })
     }
 }

We are left with one final method: insert(). In high level, this method first examines the load factor and resize the array if necessary. After all this, it will insert the key, value pair into the table. Let’s create a helper method insert_helper() that does the insertion part, without checking the load factor for resizing. We also need methods for calculating the load factor and resizing.

--- before
+++ after
@@ -33,6 +33,18 @@ pub struct HashMap<Key, Val> {
 }
 
 impl<Key: Eq + Hash, Val> HashMap<Key, Val> {
+    fn load_factor(&self) ->  f64 {
+        todo!()
+    }
+    
+    fn resize(&mut self) {
+        todo!()
+    }
+
+    fn insert_helper(&mut self, key: Key, val: Val) -> Option<Val> {
+        todo!()
+    }
+
     fn iter_mut_starting_at(&mut self, idx: usize) -> impl Iterator<Item = &mut Entry<Key, Val>> {
         let (s1, s2) = self.xs.split_at_mut(idx);
         s2.iter_mut().chain(s1.iter_mut())
@@ -61,7 +73,11 @@ impl<Key: Eq + Hash, Val> HashMap<Key, Val> {
     }
 
     pub fn insert(&mut self, key: Key, val: Val) -> Option<Val> {
-        todo!()
+        if self.load_factor() >= 0.75 {
+            self.resize();
+        }
+
+        self.insert_helper(key, val)
     }
 
     pub fn get<Q>(&self, key: &Q) -> Option<&Val>

The load_factor() method is easy, but there is an edge case — since we initialized hash table with an empty array, we want to take care of this case explicitly. As for resize() method, we want to double the size of the array, unless the current size is just too small. The easiest implementation is to create a new instance of a hash table, and simply insert every element from the current hash table, and finally swap the two.

--- before
+++ after
@@ -1,7 +1,8 @@
 use std::borrow::Borrow;
+use std::cmp::max;
 use std::collections::hash_map::DefaultHasher;
 use std::hash::{Hash, Hasher};
-use std::mem::swap;
+use std::mem::{swap, take};
 
 enum Entry<Key, Val> {
     Vacant,
@@ -34,11 +35,27 @@ pub struct HashMap<Key, Val> {
 
 impl<Key: Eq + Hash, Val> HashMap<Key, Val> {
     fn load_factor(&self) ->  f64 {
-        todo!()
+        if self.xs.is_empty() {
+            1.0
+        } else {
+            1.0 - self.n_vacant as f64 / self.xs.len() as f64
+        }
     }
     
     fn resize(&mut self) {
-        todo!()
+        let new_size = max(64, self.xs.len() * 2);
+        let mut new_table = Self {
+            xs: (0..new_size).map(|_| Entry::Vacant).collect(),
+            n_occupied: 0,
+            n_vacant: new_size,
+        };
+        for entry in take(&mut self.xs) {
+            if let Entry::Occupied { key, val } = entry {
+                new_table.insert_helper(key, val);
+            }
+        }
+
+        swap(self, &mut new_table);
     }
 
     fn insert_helper(&mut self, key: Key, val: Val) -> Option<Val> {

OK, we are left with insert_helper() method. Conceptually, this is not too difficult. We probe the entries and look for the matching key. If found, overwrite the value. If not, insert to the vacant entry. Don’t forget to update the counters accordingly.

--- before
+++ after
@@ -11,6 +11,16 @@ enum Entry<Key, Val> {
 }
 
 impl<Key, Val> Entry<Key, Val> {
+    fn replace(&mut self, mut x: Val) -> Option<Val> {
+        match self {
+            Self::Occupied { key, val } => {
+                swap(&mut x, val);
+                Some(x)
+            }
+            _ => None,
+        }
+    }
+    
     fn take(&mut self) -> Option<Val> {
         match self {
             Self::Occupied { key, val } => {
@@ -59,7 +69,26 @@ impl<Key: Eq + Hash, Val> HashMap<Key, Val> {
     }
 
     fn insert_helper(&mut self, key: Key, val: Val) -> Option<Val> {
-        todo!()
+        let idx = self.get_index(&key);
+        let mut result = None;
+        for entry in self.iter_mut_starting_at(idx) {
+            match entry {
+                Entry::Occupied { key: k, .. } if (k as &Key).borrow() == &key => {
+                    result = entry.replace(val);
+                    break;
+                }
+                Entry::Vacant => {
+                    *entry = Entry::Occupied { key, val };
+                    break;
+                }
+                _ => {}
+            }
+        }
+        if result.is_none() {
+            self.n_occupied += 1;
+            self.n_vacant -= 1;
+        }
+        result
     }
 
     fn iter_mut_starting_at(&mut self, idx: usize) -> impl Iterator<Item = &mut Entry<Key, Val>> {

That’s it! We are now complete with our implementation of a hash table in Rust. Here is the full code on Rust playground. It was a long journey, but we learned a lot.

In the next story, I will write some test and benchmark functions to measure how fast or slow our implementation is compared to the standard library std::collections::HashMap.