# Disjoint Sets using union by rank and path compression Graph Algorithm

On January 14, 2020 by Raul Dinwiddie

Hello friends my name is Tushar and today

I’m going to talk about disjoint sets using union by rank and path compression. So

what is disjoint sets? Disjoint sets is a data structure which supports 3 operations

makeSet,union and findSet. makeSet is an operation to create a set with only

1 element in there. Union is an operation to take 2 different sets and

merge them into 1 set and findSet is an operation to return an identity of

a set which is usually an element in a set which acts as a representative of

that set. So what are the use cases of disjoint sets.Kruskal’s algorithm uses

disjoint set. We also use disjoint sets for finding cycle in an undirected graph.

There are few implementations of disjoint sets. The most popular one is

the one which uses union by rank and path compression to run this operations

efficiently. So in the next section let’s understand these operations and let’s

understand the algorithm which uses union by rank and path compression. Let’s

understand disjoint set operations with an example here. Suppose I have 7

different elements and all of them are in their own set so basically we have

called makeSet on each individual elements. So first operation I do is

union 1 and 2. So it means that 1 and 2 will be in

a set of its own and let’s say 1 is a representative of this set so basically

saying findSet 1 will return 1 and findSet 2 will also return 1.Next

we’ll say union of 2 and 3. So right now this is in one set and this

is in another set, so we merge these two sets together so this becomes something

like this and let’s say 1 continues to be the representative of this set so

findSet 1, findSet 2 or findSet 3, all of them will return 1. Next we’re

saying union of 4 and 5. So here 4 and 5 are in one set and let’s say

4 is the representative of this set. Then we are saying union of 6 and 7. So this is in a set of its

own and let’s say 6 is the representative of this set so findSet 6 and findSet,

findSet 7 will all return 6. Then we’re saying union of 5 and 6. So

merge the 2 sets in which,one of which is 5 and one of which is 6 so

basically we’ll merge these two sets together and this will become something

like this and we decide who will be the representative. So let’s say 4

continues to be the representative of this merged set. So findSet 4,findSet 5

findSet 6 and findSet 7 all of them will return 4 because 4 is the

representative of this combined set. Then we are saying merge sets 3 and 7. So

3 in this set,4 is in this set so we merge them together so this will

become something like this. So at this point all the elements are in the same

set and now we can decide who will be the representative of this set. So let’s say 4

continues to be the representative of this set. So findSet 1, findSet 2,

findSet 3,4,5,6 and 7 all of them should return 4 because 4 is the

representative of this combined set. So hopefully this helps you understand what

makeSet,union and findSet means for disjoint set. In the next section let’s

look at the implementation detail. So we’re going to represent our disjoint sets

using tree. Every node of the tree will have 3 three properties- rank,data and

parent. Rank is going to store their approximate depth of the tree, data is

going to store the actual data elements and parent is going to be the pointer from

child to parent. So let’s try again the same example. So first we call makeSet

on all the elements,7 elements we have. So it results in 7 different sets,all

the roots of the set are pointing to itself so the parent is pointing to itself, all

the ranks are 0 and the data is 1234567. At this point this nodes are not

connected to each other. So first we do is union of 1,2. So we come here and we’re saying

combine 1 and 2 into 1 set. So what we do is we check the rank of 1 and 2. The rank of

1 and 2 is same. So in this,in this case it doesn’t matter who becomes parent of whom.

So what union by rank says that make the guy who has higher rank the parent

and the guy who has lower rank the child. The reason behind that being that the guy,

it will keep the depth of the tree as minimum as possible which will improve

our time. So in this case they are same so it

doesn’t matter. So let’s make 1 as their,as the root

and 2 as a child so this is this and the parent of 1 continues to point itself.

Since if we combine 2 different,2 different sets of same rank,the new,the

new root will have a 1 rank higher. So this will become 0 plus 1 so this

becomes 1 and this is 0. Rank of non-root node doesn’t matter. Then,

then next we are saying union of 2,3. So we start from 2. We go to,

we find which set 2 belongs to so we follow the parent pointer so we reach 1 and

then we follow it’s parent pointer so that’s also 1. So 1 is the root of this

tree and 3 is the root of this tree.We check which has a higher rank, so union by rank says

that the one with the higher rank becomes a parent and other becomes

child. So 1 should be the parent and 3 should be the child. So this becomes

something like this. 1,2 this is this way and then 3’s parent is also 1

and 1’s parent is itself and then the rank for 1 does not change. So

rank for 1 continues to be 1. Next we’re saying union of 4,5. So again their

ranks are same so it doesn’t matter who becomes a parent so let’s convert that

into a tree,into,let’s combine them together. So rank here is 1,rank here is 0 and

this guy is pointing to itself. Then union of 6,7. So again by Union

by rank it doesn’t matter who becomes parent of whom. So let’s make 6 as a parent.

7 point in this direction and rank here is 0 and rank here is 1. Then we’re saying

union of 5,6. So at this point we go to the 5’s,we find the

representative of this set which is 4 so we follow the parent pointer so we

reach this guy and this pointer is pointing to itself so it means this is the

representative of this set and this is the representative of this set. Then we check who

has the higher,who has the higher rank. Again they have same rank so we merge them

together and it doesn’t matter who is the parent but let’s keep 4 as a

parent so this becomes something like this. 4,5,6,7 and since we merge two nodes,two sets

of same rank then the new rank will be 1 greater. So this is 2,this is 0.

Again for non-root node it doesn’t belong,doesn’t matter what the rank is

and then this is pointing to itself. At this point if someone says findSet 7,

so basically give me the representative of the,representative

element of the set which 7 belongs to. So we follow the parent pointer, we reach 6,again

we follow the parent pointer,we reach 4 and then we follow the parent pointer and

then it’s 4 itself so we know that 4 is the representive of this set. Again

there is a,here we do a small optimization. What we do is we do path

compression which means that now what we do is we make 7 point directly to

4. So we decrease the depth as much as possible and then make this guy point

directly to 4 so next time there’s a query for 7,7 can directly reach 4 and we

can find the root very, so we can find the representative element very quickly. So

7 now starts pointing to 4. Then we say union of 3,7. So we come here and we

come here,we start with 3,we find the representative element of this set.

So we follow the parent pointer,we reach this point and for 7 we follow the parent pointer

and reach this point.So we check who has a higher rank. So 4 has a higher rank.

So this becomes the child of this guy so let’s merge them together so this becomes

something like 4,5,6,7 all of them have parent pointer pointing

to 4 and then we will have this so that’s 1 and then 1 has 2 and 1 has 3

and this is 0,this is 0,0 and this is rank 2 and the pointer is in this

direction. So this our merged set at this point. So the rank at the route is

2 and the rank of the non-root node doesn’t matter but you can see the

pointer direction,direction of the parent pointers 2 to 0,0 to 4,3 to 0,0 to 4.

Now if someone says findSet 2 so give me the set which 2 belongs

to. So 2 goes to 0,following the, following the pattern pointer,then 0

goes to 4 and then 4 is pointing to itself so 4 is the set in which this

belongs to but also has a path of as a, as a,as a improvement we also apply path

compression so what happens is this starts pointing to 4. So next time if

there is a query for 2,2 can directly go to 4 instead of going via this route.Also

if there’s a findSet 3 so we go to 0 and we go to 4 and then

ultimately this will start pointing to 4 so that as an optimization. So in

the next section let’s look at the time and space complexity. Space complexity is

pretty straightforward. If there are n elements in the set,the space complexity

will be O(n). What about time complexity. So if there are ‘m’ operations

and if there are n elements,the time complexity will be O(m alpha n). So

there’s a proof in the CLRS book describing how this will be the time

complexity. Here alpha n is a very slow, slowly growing function and for all

practices,practical purposes alpha n is less than equal to 4. So this

will pretty much become O(4m) where m is the total number of operations which is as

good as O(m).So next let’s look at the code for this disjoint sets.I have a

class disjointSet. In there I have an inner class Node.Node has data,parent

and rank and we already talked about what each of them does and then I have a

map of data to know this will help us quickly and efficiently find the node

given the data. So I have 3 methods here, makesSet,union and findSet. So

what makeSet does is it creates set with single node so it assigns the data

in the node and then it points a parent to itself and rank is 0 and then we put

this data and node into the map. Then let’s understand what union and findSet

does using this examples. So I have two sets,one with rank 3 and 1 with rank 2.

The representatives of the set are 1 and 11. So now I’m saying union of 7

and 13. So we go to this method union,so first we do is we get the nodes for 7

and 13. So we go to the map and we get node 1 which is the node pointing to 7

and we go to map and get the node 2 for node pointing to 13. Then we call findSet

and try to get the parent for each of these nodes. So we go to findSet,so we

come to this point and then we get the parent so we go to 7.7’s parent is 5 and

then 5’s parent is,so 5 is not same as node so then we again go into a

recursion and findSet again so 5’s parent is 2 and then 2′ s parent is,again we

go into the recursion and so we call findSet again and 2’s parent is 1. At

this point 1’s parent is also 1 so we get into this if condition and we return

parent so by returning parent we are setting the parent of every node in the

path to 1 so that is a path compression so 2’s parent will become one 1,then

5’s parent will become 1 and then 7’s parent will also become 1 and we already

discussed this as a part of path compression. After applying path

compression this is how my set 1 looks. like where 5 points directly to 1 and 7

also points directly to 1. So then it, the line number 49 also returns

parent 1 which is 1 so then we go to, execute line number 50 and node 2 is,

node 2 is 13 so we call findSet and try to get the representative of set

2.So we again go to findSet and here parent is 11 and 11 is not same as 13 so

we again go into the recursion and here we pass node as 11 and the parent of 11

is 11 so we return parent and then it will come back here and we set 13’s

parent as 11 and then return 11. So we come here,come back to line number 50 and

parent2 is 11 so then we apply our logic for union by rank.This statement here

just makes sure that if the parents are same we do nothing because they are

already in the same set otherwise we check a parent whose rank is higher so

in this case parent1’s rank is 3 and parent2’s rank is 2. So parent1’s

rank is higher so we go into this if condition and we check did parent

1 and parent 2 have same rank so in this case parent1 and parent2 do not

have seen rank so we do not change the rank of parent1 and then we also set

parent2’s parent as parent1. So 11,now 11’s parent now instead of pointing to

itself now starts pointing to 1. This is how the final picture looks like

after we do the union of 7 and 13 so 11’s parent is now 1. Next let’s try

to do a find on 14. So we come to find findSet and then we pass 14 so in the

data so first thing we do is we get the node for 14 from the map and then we go

to findSet here. So parent of 14 is 12, so

parent is not same as node so we again go into the recursion and this time node

is 12,again parent of 12 is 11 so they are not same so again we go to findSet

and we go to 11. Parent of 11 is 1, again parent is not same as node so

we go on to findSet and then this time node is 1 and parent of 1 is 1 so

at this time we go into this if condition and return parent. So basically

we return 1. So while going back what we do is we set 11’s parent as 1

and then we set 12’s parent as 1 and then we also set 14’s parent is 1.So

this is how my tree looks like after findSet 14 is executed.14 points to 1 and

also 12 points to 1 and this is as per as we discussed for path compression. So

finally let’s run this code. So in the main I have created 7 distinct sets

and then I perform a union of (1,2),(2,3), (4,5),(6,7),(5,6) and (3,7) which means that

all the sets,all this distinct sets should be merged into 1 set. So when I perform

findSet from 1 to 7 all of them should give one representative element. Let me

quickly run this code. So as you can see I get all 4,so all the findSets

are returning,returning me 4,it means that all of them are in one set now. So

this is all I have to talk about disjoint sets. Please like this video,share

this video,comment on this video,check out my facebook page and also check out

my github link at github.com/mission- peace/interview/wiki.Thanks again for

watching this video.

## Leave a Reply