Just one learning note:
If the insert is shorter than the read length, R1 will be contaminated with Rd2 SP (e.g. AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC) and R2 will be contaminated with the reverse complement of Rd1 SP (e.g. AGATCGGAAGAGCG).
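As a quick sanity check, the two contaminant sequences quoted above can be compared directly. This is only a minimal sketch: the variable names and the `reverse_complement` helper are made up for illustration, and the R2 sequence is just the 14 bases given in this note, not the full adaptor.

```python
import os

# Contaminant sequences exactly as quoted in the note above.
r1_contaminant = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # Rd2 SP, seen in R1
r2_contaminant = "AGATCGGAAGAGCG"  # start of the reverse complement of Rd1 SP, seen in R2


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.translate(table)[::-1]


# Longest prefix shared by both contaminants.
shared = os.path.commonprefix([r1_contaminant, r2_contaminant])
print(shared, len(shared))

# Reverse-complementing the R2 contaminant gives back the 3' end of Rd1 SP.
print(reverse_complement(r2_contaminant))
```

Both sequences start with the same 13 bases and only diverge afterwards, which is why a single short sequence is enough to catch the contamination in either read.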
So, basically, you just need to check whether the shared complementary part of Rd1 SP and Rd2 SP (AGATCGGAAGAGC) occurs in the reads. If it does, simply trim it together with everything that follows it (if anything).
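To make the idea concrete, here is a rough sketch of that check-and-trim step in Python. It is not what fastq-mcf does internally: it only looks for an exact, full-length match of the 13-base sequence, so it would miss adaptors with sequencing errors or adaptors that are cut off at the very end of a read. The function name `trim_adaptor` and the example read are invented for illustration.

```python
ADAPTOR = "AGATCGGAAGAGC"  # shared part of Rd1 SP and Rd2 SP, from the note above


def trim_adaptor(seq, qual, adaptor=ADAPTOR):
    """Cut the read at the first exact adaptor match and drop everything after it.

    Returns the (sequence, quality) pair unchanged if the adaptor is absent.
    """
    pos = seq.find(adaptor)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


# Made-up read: a 19-base insert followed by the adaptor and more adaptor bases.
read = "ACGTACGTACGTTTGACCA" + ADAPTOR + "ACACGTCTGA"
qual = "I" * len(read)

trimmed_seq, trimmed_qual = trim_adaptor(read, qual)
print(trimmed_seq)
print(len(trimmed_seq), len(trimmed_qual))
```

The quality string is cut at the same position as the sequence so the FASTQ record stays consistent. For real data a dedicated trimmer is the safer choice.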
Note: if you don't understand the "shared complementary part", please refer to my previous post on the Illumina adaptor: http://onetipperday.blogspot.com/2013/06/illumina-hiseq2000-adaptor.html
Here is one way to remove the adaptor contamination:
1. Save the complementary part into a FASTA file, e.g. adaptor.fa:
>adaptor_complementary_part
AGATCGGAAGAGC
2. Run fastq-mcf to remove the adaptor:
fastq-mcf -o filted -x 10 -l 15 -w 4 -u adaptor.fa input.fq.gz
Thanks, Xianjun! This is very helpful.
Thanks for the feedback.
Hi Xianjun,
Why is the reverse complement of Rd1 SP AGATCGGAAGAGCG? Is it the same as Rd2 SP?
Thanks
Hi Shicheng, you may want to refer to my previous post on this: http://onetipperday.blogspot.com/2013/06/illumina-hiseq2000-adaptor.html. I think the "shared complementary part" is designed that way because it makes it easy for the adaptor to form a Y shape with floppy overhangs.