As Heng said, "BAM is compressed. Sorting helps to give a better compression ratio because similar sequences are grouped together." So the smaller size is not due to the removal of unmapped reads (which are simply placed at the end).
The tip is: always sort the output BAM after converting from SAM, e.g. (samtools 0.1.x syntax, where the last argument to sort is an output prefix)
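To see the effect without any BAM file at hand, here is a toy sketch. It assumes plain gzip on text lines is a fair stand-in for BAM's deflate-based BGZF blocks, and the "reads" are made-up random strings, not real data. Copies of the same read are spread far apart in the unsorted file and end up adjacent after sorting.

```shell
#!/bin/sh
# Toy illustration only: gzip on text lines stands in for BAM compression.
tmp=$(mktemp -d)

# 500 made-up 100-base reads, each written 20 times in round-robin order,
# so identical reads are about 50 KB apart (beyond gzip's 32 KB window).
awk 'BEGIN {
  srand(1); split("A C G T", base, " ")
  for (i = 0; i < 500; i++) {
    s = ""
    for (j = 0; j < 100; j++) s = s base[int(rand() * 4) + 1]
    reads[i] = s
  }
  for (r = 0; r < 20; r++)
    for (i = 0; i < 500; i++) print reads[i]
}' > "$tmp/unsorted.txt"

# Sorting puts identical reads next to each other.
sort "$tmp/unsorted.txt" > "$tmp/sorted.txt"

unsorted_size=$(gzip -c "$tmp/unsorted.txt" | wc -c | tr -d ' ')
sorted_size=$(gzip -c "$tmp/sorted.txt" | wc -c | tr -d ' ')
echo "unsorted: $unsorted_size bytes, sorted: $sorted_size bytes"

rm -rf "$tmp"
```

Both files hold exactly the same lines; only the order differs, which is the same situation as a sorted versus unsorted BAM.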
samtools view -Sbu in.sam | samtools sort - in.sorted
mv in.sorted.bam in.bam
A coordinate-sorted BAM is smaller, and it can be indexed (samtools index) for fast lookup by region.
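The commands above use the old prefix-style sort. On newer samtools (1.3 and later, as far as I know), sort takes -o with a full output file name and reads SAM directly, so the same tip might look like the sketch below. The function name sam_to_sorted_bam is just a hypothetical helper of mine, not part of samtools.

```shell
# Sketch, assuming samtools 1.3 or newer is on PATH.
# sam_to_sorted_bam is a hypothetical helper name, not a samtools command.
# Usage: sam_to_sorted_bam in.sam in.bam
sam_to_sorted_bam() {
  in_sam=$1
  out_bam=$2
  # Sort by coordinate and write BAM in one step, then build the index.
  samtools sort -o "$out_bam" "$in_sam" &&
  samtools index "$out_bam"
}
```

With this form there is no in.sorted.bam to rename afterwards, because the output name is given in full.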
---------------------------
Other tips are:
1. Converting SAM to BAM does not require a sorted header, or any header at all.
If there is a header: samtools view -Sb in.sam > in.bam
If there is no header: samtools view -Sbt genome.fa.fai in.sam > in.bam
But the reads in in.bam will not follow the order of the sequence dictionary (genome.fa.fai) unless you sort the file with samtools sort.
2. If the input is BAM, cufflinks requires a proper header in the BAM file, especially this line (the fields are tab-separated in the actual file):
@HD VN:1.0 SO:coordinate
Without this line, cufflinks cannot tell that your BAM (or SAM) is sorted, even if it is; the @HD line is the only way to give it that information. So, I guess
@HD VN:1.0 SO:unsorted
won't work.
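Before running cufflinks, it may be worth checking what the header actually says. Here is a small sketch that reads a SAM header on stdin and prints the SO value of the @HD line. The function name sam_sort_order is my own hypothetical helper, and it assumes the header fields are tab-separated as in a normal SAM file.

```shell
# Print the SO (sort order) value from the @HD line of a SAM header on stdin.
# Prints "missing" when there is no @HD line or the line has no SO tag.
# sam_sort_order is a hypothetical helper name, not part of samtools.
sam_sort_order() {
  awk -F '\t' '
    $1 == "@HD" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
      exit
    }
    END { if (!found) print "missing" }
  '
}
```

For a real file the header can be piped in with samtools view -H in.bam | sam_sort_order; anything other than "coordinate" suggests the header needs fixing first.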