Thursday, August 30, 2012

Extract unmapped / unpaired reads using bowtie2

Bowtie2 has two options for writing out unpaired reads according to whether they aligned:
--un <path>
Write unpaired reads that fail to align to file at <path>. These reads correspond to the SAM records with the FLAGS 0x4 bit set and neither the 0x40 nor 0x80 bits set.
--al <path>
Write unpaired reads that align at least once to file at <path>. These reads correspond to the SAM records with the FLAGS 0x4, 0x40, and 0x80 bits unset.
--un gives the path where unmapped reads are saved (in FASTQ format), and --al gives the path where all aligned reads are saved (including multi-mappers). Note that --un only works when the --no-unal option is unset (i.e. if --no-unal is set, the --un file will be empty).
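
To make the FLAG description concrete, here is a minimal Python sketch that decides which of the two files an unpaired SAM record would correspond to, based only on the bits quoted above. The function name unpaired_output is my own, not part of Bowtie2, and the sketch only mirrors the manual's wording; it does not reproduce what Bowtie2 does internally.

```python
# SAM FLAG bits mentioned in the option descriptions above.
FLAG_UNMAPPED = 0x4   # read failed to align
FLAG_MATE1 = 0x40     # read is mate #1 of a pair
FLAG_MATE2 = 0x80     # read is mate #2 of a pair


def unpaired_output(flag):
    """Return "--un" or "--al" for an unpaired record, or None for a mate.

    Hypothetical helper: it only checks the three bits named in the
    Bowtie2 manual text and ignores every other FLAG bit.
    """
    if flag & (FLAG_MATE1 | FLAG_MATE2):
        # Either mate bit is set, so this is a paired-end record.
        return None
    if flag & FLAG_UNMAPPED:
        return "--un"
    return "--al"
```

For example, a record with FLAG 4 falls under --un, while FLAG 0 or FLAG 16 (reverse strand, still aligned) falls under --al.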

Both options apply to single reads (SR). For paired-end (PE) reads, there are two corresponding options:
--un-conc <path>
Write paired-end reads that fail to align concordantly to file(s) at <path>. These reads correspond to the SAM records with the FLAGS 0x4 bit set and either the 0x40 or 0x80 bit set (depending on whether it's mate #1 or #2). .1 and .2 strings are added to the filename to distinguish which file contains mate #1 and mate #2. If a percent symbol, %, is used in <path>, the percent symbol is replaced with 1 or 2 to make the per-mate filenames. Otherwise, .1 or .2 are added before the final dot in <path> to make the per-mate filenames.
--al-conc <path>
Write paired-end reads that align concordantly at least once to file(s) at <path>. These reads correspond to the SAM records with the FLAGS 0x4 bit unset and either the 0x40 or 0x80 bit set (depending on whether it's mate #1 or #2). .1 and .2 strings are added to the filename to distinguish which file contains mate #1 and mate #2. If a percent symbol, %, is used in <path>, the percent symbol is replaced with 1 or 2 to make the per-mate filenames. Otherwise, .1 or .2 are added before the final dot in <path> to make the per-mate filenames.
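
The per-mate filename rule is easy to get wrong, so here is a small Python sketch of how I read it. The function per_mate_paths is a hypothetical helper, not something Bowtie2 provides. Two points are my assumptions, since the manual text does not cover them: every % in the path is replaced, and when the filename has no dot at all the .1 / .2 suffix is simply appended.

```python
import os


def per_mate_paths(path):
    """Return the (mate #1, mate #2) filenames derived from <path>."""
    if "%" in path:
        # Assumption: every percent symbol is replaced, not just the first.
        return path.replace("%", "1"), path.replace("%", "2")
    # splitext only looks at the final path component, so a dot in a
    # directory name is not mistaken for the "final dot" of the filename.
    stem, ext = os.path.splitext(path)
    # Assumption: with no extension, ext is "" and the suffix goes at the end.
    return stem + ".1" + ext, stem + ".2" + ext
```

So a path of unconc_%.fq would give unconc_1.fq and unconc_2.fq, while unconc.fq would give unconc.1.fq and unconc.2.fq.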

As above, --un-conc is for improperly paired reads (aligned 'discordantly', e.g. both mates are uniquely mapped but are not in the expected relative orientation, or not within the expected distance range, or both, as controlled by the --fr/--rf/--ff, -I, and -X options), and --al-conc is for concordantly paired reads (including multi-mappers). Correspondingly, I expect --un-conc to work only when --no-discordant is unset (this is my guess; I have not tested it yet).
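
Putting the paired-end options together, a run could be assembled roughly as in the Python sketch below. The helper bowtie2_paired_args and all file names are placeholders of my own; -x, -1, -2 and -S are the usual Bowtie2 options for the index, the two mate files and the SAM output. The sketch only builds the argument list and does not run anything.

```python
import shlex


def bowtie2_paired_args(index, mate1, mate2, sam_out,
                        un_conc=None, al_conc=None):
    """Build a bowtie2 argument list for a paired-end run.

    Hypothetical helper: only the options discussed in this post are
    handled, and no validation of the paths is attempted.
    """
    args = ["bowtie2", "-x", index, "-1", mate1, "-2", mate2, "-S", sam_out]
    if un_conc:
        # Pairs that fail to align concordantly.
        args += ["--un-conc", un_conc]
    if al_conc:
        # Pairs that align concordantly at least once.
        args += ["--al-conc", al_conc]
    return args


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    cmd = bowtie2_paired_args("genome_index", "reads_1.fq", "reads_2.fq",
                              "out.sam", un_conc="unconc_%.fq",
                              al_conc="alconc_%.fq")
    print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run as is, which avoids shell quoting problems with the % in the paths.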


