One Tip Per Day: Finding connected components of a graph

Monday, August 06, 2012

Finding connected components of a graph

I recently was bothered by a seemly-easy problem: how to merge lines with common number(s)?

For example, if the original file looks like:

1;2;3;4
4;5
7;5
10;13
22;34
11;13

Based on the rule, it will be merged into:
1;2;3;4;5;7
10;11;13
22;34

This is actually derived from a common problem in graph theory: how to detect the connect components?

Imagine each number in the txt file is the label of nodes, one line is a sub-graph (more extremely, it can be two numbers/nodes for just one edge). So if two lines have common number, they have common node connected, therefore should be merged together. In this case, I don't need to consider the topology of final result, which means the order of number in the final files does not matter.

Of course, there are more sophisticated ways to do so. But can we do it using bash command?

Here is it (../data/IDs is the node label file):

sed 's/;/\n/g' ../data/IDs | sort -u > ../data/uniqIDs

cp ../data/IDs a.temp

while IFS= read -r line

grep $line a.temp | tr ';' '\n' | sort -u | tr '\n' ';' | sed 's/;$/\n/' > b.temp

grep -v $line a.temp >> b.temp

cp b.temp a.temp

done < ../data/uniqIDs

One Tip Per Day

Pages

Monday, August 06, 2012

Finding connected components of a graph

No comments:

Post a Comment