Monday, June 08, 2020

Be careful with "sort -k1,1 -k2,2n -u"

When you sort and extract unique genomic regions with "sort -k1,1 -k2,2n -u", you can silently lose regions that share the same chr and start but have a different end position. With -u, sort treats two lines as duplicates when the listed keys match, and here the keys cover only chr and start.

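The right way is "sort -k1,1 -k2,2n -k3,3n -u", which compares the three coordinate columns, or "sort -k1,1 -k2,2n | uniq", which compares whole lines. Avoid ending the pipeline with a plain "sort -u": it does remove the duplicates, but it re-sorts whole lines as text, so the numeric order of the start column is lost.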
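Here is a minimal sketch of the difference, assuming a made-up tab-separated, three-column input (chr, start, end). The variable names and coordinates are hypothetical and only for illustration. Two of the regions share chr and start but differ in end, and one line is an exact duplicate.

```shell
# Hypothetical BED-like input: chr, start, end (tab-separated).
bed=$(printf 'chr1\t100\t200\nchr1\t100\t300\nchr1\t20\t50\nchr1\t100\t200\n')

# Keys cover only chr and start, so -u keeps a single chr1:100 line
# and the region with the other end position is lost.
wrong=$(printf '%s\n' "$bed" | sort -k1,1 -k2,2n -u)

# Adding the end column as a key keeps both chr1:100 regions
# and drops only the exact duplicate.
right=$(printf '%s\n' "$bed" | sort -k1,1 -k2,2n -k3,3n -u)

# Whole-line deduplication after the coordinate sort gives the same result.
also_right=$(printf '%s\n' "$bed" | sort -k1,1 -k2,2n | uniq)

printf 'wrong:\n%s\n' "$wrong"   # 2 regions
printf 'right:\n%s\n' "$right"   # 3 regions
```

On a real file, comparing the line counts of the two outputs is a quick way to check whether any regions were dropped.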
