Algorithm — Friend or Foe Link to heading
Consider n-people who strictly follow three rules:
I am a friend of myself
If I am a friend of you, you are also a friend of mine
If you are a friend of mine, then your friends are friends of mine
We let them chat with each other and develop friendship. As people develop friend-relations, they will naturally form teams of friends. During this team-forming process, we want to ask some questions
- how many “teams” have been formed?
- is A a friend of B?
- which “team” is A in?
At first, all individuals start off as individual teams, i.e., n-teams. As they form friendship, the teams will merge and union, like set-operations.
OK. This is my lousy attempt to describe “Union-Find” algorithm using the “friend” analogy. In algorithmic terms, the aforementioned three rules translate to
reflexivity — one is a friend of oneself
symmetry — if A is a friend of B, then B is a friend of A
transitivity — if A and B are friends and B and C are friends, then A and C are also friends
So, here is how the API might look like:
trait FriendOrFoe {
// declare p is now a friend of q
fn friend(&mut self, p: usize, q: usize);
// are p and q friends?
fn is_friend(&mut self, p: usize, q: usize) -> bool;
// returns a unique id for the team p belongs to
fn team_id(&mut self, p: usize) -> usize;
// # of disjoint teams
fn num_teams(&self) -> usize;
}
For now, let’s not worry about the possibility of de-friending. We want to find algorithms to efficiently implement the API above.
How should we go about implementing this? Well, given that we are dealing with teams, we can observe how an organization is typically structured in real-life — i.e., chain of commands. Each member of the team, except one, will have one manager and may have direct reports. The one who does not have a manager is the head of the team and represents the team ID. This is just like a typical org-chart of any company, which is a tree structure.
Let’s start with team_id(). To identify the team of a person, we follow the manager chain until we reach the head. For a well-balanced tree, this should be O(log(n)) but in general this is O(d) where d is the maximum depth of the tree.
What about is_friend() method? Well, we can do team_id() for each person and see if they are the same. Still O(d) as before.
How about friend() method? Again, we call team_id() to find the head of each person. If they are the same, then nothing needs to be done, since they are already friends. If not, however, we need to merge the two teams. That is, one of the team’s head must be a direct report of the other team’s head. To balance the tree, we better make sure the smaller team is merged into the larger team. This is a very important condition. I will go into why in a bit. The runtime is still O(d).
Finally, num_teams() can be tracked separately, starting off with n and decrementing whenever two teams merge. The runtime is O(1).
So, writing this in code would look something like this
struct FofImpl {
manager_id: Vec<usize>, // if equal to the index, then is the head of the team
org_size: Vec<usize>, // size of the org lead by this index
n: usize, // number of distinct teams
}
impl FofImpl {
fn new(n: usize) -> Self {
// n-teams each having a single member
Self {
manager_id: (0..n).collect(),
org_size: vec![1; n],
n,
}
}
}
impl FriendOrFoe for FofImpl {
fn friend(&mut self, p: usize, q: usize) {
let head1 = self.team_id(p);
let head2 = self.team_id(q);
if head1 == head2 {
return;
}
if self.org_size[head1] <= self.org_size[head2] {
self.manager_id[head1] = head2;
self.org_size[head2] += self.org_size[head1];
} else {
self.manager_id[head2] = head1;
self.org_size[head1] += self.org_size[head2];
}
self.n -= 1;
}
fn is_friend(&mut self, p: usize, q: usize) -> bool {
self.team_id(p) == self.team_id(q)
}
fn team_id(&mut self, p: usize) -> usize {
let mut id = p;
loop {
let manager = self.manager_id[id];
if manager == id {
break id;
}
id = manager;
}
}
fn num_teams(&self) -> usize {
self.n
}
}
Let’s now do the complexity analysis. We know that all the methods run in O(d), except for num_teams() which runs in O(1). What is this d? Can we prove that d <= log(n)? — as in any algorithm context, log here represents log2.
Consider some person, say p. When does the chain of commands depth increase by 1 for this person p ? That is when p’s team P is merged into another team Q. We imposed a condition that P is merged into Q if team size of P is smaller than that of Q. Hence, the depth increases by 1 only if p’s team grows in size by at least 2x. Therefore, the depth d is maxed at log(n), since n is the max size of any team. Lastly, this analysis applies to anyone in any team, so the maximum depth of any tree is capped at log(n).
This is already quite an efficient implementation, having O(log(n)) complexity, but we can actually do better than this. In fact, we can do almost linear time complexity!
You might have noticed something strange with the API — methods is_friend() and team_id() are non-constant methods, i.e., they take the first argument as &mut self rather than &self. This is because these methods can do extra work, modifying the struct, in order to make subsequent calls more efficient. In the team_id() method, we climb up the report chain to find the head. Next time it is called, we do the same. That is waste of computation. Instead, every time team_id() is called, we can flatten out our org chart — bring everyone along the search path to be a direct report of the head. This ensures that we do not need to repeat this computation later on. This is called the path compression technique. In Rust, we can implement this with composition.
struct FofPathCompressionImpl {
imp: FofImpl,
}
impl FofPathCompressionImpl {
fn new(n: usize) -> Self {
Self {
imp: FofImpl::new(n),
}
}
}
impl FriendOrFoe for FofPathCompressionImpl {
fn friend(&mut self, p: usize, q: usize) {
self.imp.friend(p, q)
}
fn is_friend(&mut self, p: usize, q: usize) -> bool {
self.imp.is_friend(p, q)
}
fn team_id(&mut self, p: usize) -> usize {
let manager = self.imp.manager_id[p];
if manager != p {
// flatten the report chain
self.imp.manager_id[p] = self.team_id(manager);
self.imp.manager_id[p]
} else {
p
}
}
fn num_teams(&self) -> usize {
self.imp.num_teams()
}
}
For anyone who wants to learn more about this, I highly recommend an excellent free course by Kevin Wayne and Robert Sedgewick on Coursera.