Tuesday, December 03, 2013

Illumina HiSeq2000 adaptor and sequencing


I referred to the following material when making the figure:

Just one learning note:

If the insert is not long enough (i.e. shorter than the read length), the sequencer reads through the insert into the adaptor on the far side: R1 will have contamination from Rd2 SP (e.g. AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC) and R2 will have contamination from the reverse complement of Rd1 SP (e.g. AGATCGGAAGAGCG).
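To make the read-through concrete, here is a minimal Python sketch, not anything from the Illumina documentation. The helper `simulate_pair` and the 20 nt insert are made up for illustration; the two contaminant strings are the ones quoted above. It is simplified: a real read that runs past the end of the adaptor keeps going, whereas here the read simply ends.

```python
# Adaptor sequences as quoted in the note above, as they appear in the reads.
R1_CONTAMINANT = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # from Rd2 SP, seen in R1
R2_CONTAMINANT = "AGATCGGAAGAGCG"  # start of revcomp(Rd1 SP), seen in R2

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_pair(insert, read_len):
    """Hypothetical helper: the read pair produced from one insert.

    Each read starts at one end of the insert. If the insert is shorter
    than read_len, the read continues into the adaptor on the far side.
    """
    r1 = (insert + R1_CONTAMINANT)[:read_len]
    r2 = (revcomp(insert) + R2_CONTAMINANT)[:read_len]
    return r1, r2


insert = "ACGTTGCAAGGCTTAACCGG"  # made-up 20 nt insert
r1, r2 = simulate_pair(insert, read_len=30)
print(r1)  # first 20 nt are the insert, the remaining 10 nt are adaptor
print(r2)  # first 20 nt are revcomp(insert), the remaining 10 nt are adaptor
```

With an insert of 30 nt or more and the same read length, neither read would reach the adaptor.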

So, basically, you just need to check whether the shared complementary part of Rd1 SP and Rd2 SP (which is AGATCGGAAGAGC) occurs in the reads. If it does, simply trim it together with everything that follows it (if anything).
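The check-and-trim rule can be sketched in a few lines of Python. This is only an illustration under a strong assumption: it looks for an exact, full-length match of the shared part, so it will miss an adaptor that is cut off at the 3' end of the read or that contains a sequencing error. The function name `trim_adapter` is my own; dedicated trimmers handle those cases.

```python
SHARED_MOTIF = "AGATCGGAAGAGC"  # shared part of Rd1 SP and Rd2 SP, from the text


def trim_adapter(read, qual=None, motif=SHARED_MOTIF):
    """Cut the read at the first exact occurrence of the motif.

    Returns (sequence, quality). The quality string, if given, is cut at
    the same position. Reads without the motif are returned unchanged.
    """
    pos = read.find(motif)
    if pos == -1:
        return read, qual
    trimmed_qual = qual[:pos] if qual is not None else None
    return read[:pos], trimmed_qual


# An 8 nt insert followed by the start of the R1 contaminant.
seq, _ = trim_adapter("ACGTACGTAGATCGGAAGAGCACACG")
print(seq)
```

Note that a read can end up very short, or even empty, after trimming, which is why trimmers usually also apply a minimum length filter.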

Note: if you don't understand the "shared complementary part", please refer to my previous post on the Illumina adaptor. Here is the link: http://onetipperday.blogspot.com/2013/06/illumina-hiseq2000-adaptor.html

Here is one way to remove the adaptor contamination:
1. Save the complementary part into a FASTA file, e.g. adaptor.fa:
>adaptor_complementary_part
AGATCGGAAGAGC
2. Run fastq-mcf to remove the adaptor:
fastq-mcf -o filted -x 10 -l 15 -w 4 -u adaptor.fa input.fq.gz
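As a quick sanity check after trimming, you can count how many reads still contain the shared part. The sketch below is my own helper, not part of fastq-mcf. It assumes standard 4-line FASTQ records (no wrapped sequence lines), treats a file name ending in .gz as gzip-compressed, and again only looks for exact matches.

```python
import gzip

SHARED_MOTIF = "AGATCGGAAGAGC"  # shared part of Rd1 SP and Rd2 SP, from the text


def count_motif_reads(path, motif=SHARED_MOTIF):
    """Return (total_reads, reads_containing_motif) for a FASTQ file.

    Assumes 4 lines per record, with the sequence on the second line.
    """
    opener = gzip.open if path.endswith(".gz") else open
    total = 0
    hits = 0
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # sequence line of each record
                total += 1
                if motif in line:
                    hits += 1
    return total, hits
```

For example, calling `count_motif_reads("input.fq.gz")` and then `count_motif_reads("filted")` (the file names used in the command above) lets you compare the counts before and after trimming.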

4 comments:

  1. Thanks, Xianjun! This is very helpful.

  2. Hi Xianjun,

    Why is the reverse complement of Rd1 SP AGATCGGAAGAGCG? Is it the same as Rd2 SP?

    Thanks

    Replies
    1. Hi Shicheng, you may want to refer to my previous post on this: http://onetipperday.blogspot.com/2013/06/illumina-hiseq2000-adaptor.html. The "shared complementary part" is designed like that, I think, because it makes it easy for the adaptor to form a Y shape with floppy overhangs.
