Tuesday, September 18, 2012

--report-secondary-alignments in Tophat has nothing to do with the 0x100 SAM flag in the SAM output

The --report-secondary-alignments option in Tophat is off by default, which means only the best (primary) alignments are reported. So if a read maps to multiple loci with a perfect match (or an equal best score), all of them should be reported by default, up to the limit set by -g/--max-multihits. For example, with "-g 100" and a read that maps to 40 positions with a perfect match, all 40 alignments are output in the SAM file. (With "-g 40" and a read that maps perfectly to 100 positions, only 40 hits, chosen at random, are output.)

However, if --report-secondary-alignments is on, Tophat keeps looking for next-best alignments to fill the -g limit. In the "-g 100" example it outputs 100 hits, of which 40 are the best/primary alignments.
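To make the two cases concrete, here is a toy model of the behaviour described above. It is only a sketch of my understanding, not Tophat's actual code: the function name reported_hits, the "higher score is better" convention, and the seed parameter are all made up for illustration.

```python
import random

def reported_hits(scores, g, report_secondary=False, seed=0):
    """Toy model (NOT Tophat's implementation) of which hits get reported.

    scores: one alignment score per candidate locus, higher is better.
    g: the -g/--max-multihits limit.
    Returns the indices of the candidate loci that would be reported.
    """
    if not scores:
        return []
    rng = random.Random(seed)
    # Rank candidates from best to worst score (stable for ties).
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    best = scores[ranked[0]]
    best_hits = [i for i in ranked if scores[i] == best]
    if len(best_hits) > g:
        # More equally good hits than -g allows: keep a random subset.
        return sorted(rng.sample(best_hits, g))
    if not report_secondary:
        # Default mode: only the equally best hits are reported.
        return best_hits
    # With secondary alignments on: top up with next-best hits until g.
    rest = [i for i in ranked if scores[i] != best]
    return best_hits + rest[: g - len(best_hits)]

# 40 perfect matches plus 100 weaker candidates, as in the example above.
scores = [0] * 40 + [-1] * 100
print(len(reported_hits(scores, 100)))                         # default mode
print(len(reported_hits(scores, 100, report_secondary=True)))  # option on
```

Under these assumptions the default call reports the 40 perfect matches, and turning the option on tops the output up to the -g limit of 100.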

In either case, the SAM file has only one hit per read without the 0x100 FLAG bit; all the rest have 0x100 set and are therefore called "secondary alignments" by the SAM specification.
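One way to check this on real output is to count, for each read, how many alignment lines have the 0x100 bit set and how many do not. The sketch below parses plain SAM text without any library. The function name count_primary_secondary is made up for illustration, and it assumes single-end reads (with paired-end data each mate would have its own primary line under the same read name).

```python
from collections import defaultdict

UNMAPPED = 0x4     # SAM FLAG bit: segment unmapped
SECONDARY = 0x100  # SAM FLAG bit: secondary alignment

def count_primary_secondary(sam_lines):
    """Return {read_name: (n_primary, n_secondary)} for mapped alignments.

    Assumes single-end reads and tab-separated SAM text lines.
    """
    counts = defaultdict(lambda: [0, 0])
    for line in sam_lines:
        # Skip header lines and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & UNMAPPED:
            continue
        # Slot 0 counts lines without 0x100, slot 1 counts lines with it.
        counts[qname][1 if flag & SECONDARY else 0] += 1
    return {name: tuple(c) for name, c in counts.items()}

# Made-up SAM records: read1 maps to three loci, read2 to one.
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "read1\t0\tchr1\t100\t1\t50M\t*\t0\t0\t*\t*",
    "read1\t256\tchr2\t500\t1\t50M\t*\t0\t0\t*\t*",
    "read1\t272\tchr3\t900\t1\t50M\t*\t0\t0\t*\t*",
    "read2\t16\tchr1\t300\t50\t50M\t*\t0\t0\t*\t*",
]
print(count_primary_secondary(example))
```

If the claim holds, every multi-mapped read should show exactly one line without 0x100, whether or not --report-secondary-alignments was used.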

This is from discussion with Jessie. I will check to confirm.

2 comments:

  1. As I understand it:

    -g sets the max number of reported alignments

    In default mode, more than one alignment per read (up to g) is reported only if all have the same, optimal score.

    --report-secondary-alignments may result in reporting some alignments with suboptimal scores.

    The "secondary" 0x100 SAM flag means something else: all suboptimal alignments and all but one of the optimal alignments are "secondary".

    Do you know how Tophat chooses the SAM primary (i.e. non-0x100) alignment? Is it the first in terms of coordinates, the first found depending on a random seed for where to start aligning, or something else?

    1. Hi Michael,

      Yes, your understanding is correct.
      How does Tophat choose the primary alignment? I don't know. That's a good question. Please let me know if you get the answer. Also, why do you care?

      -Xianjun
