Since the smallRNA is generally shorter, the usual RNA-seq protocol does not capture them properly (only the 'tail' of the distribution of fragment length). A good solution is to use smallRNA RNA-seq protocol. I meant using short reads length and without size selection.
Based on that, we may be able to use tools like cufflinks/DESeq to calculate the RPKM/FPKM. But again, I am not sure cufflinks will still apply the bias-correction to the small RNA genes when the reads length is short enough. As this post discussed, "The Best way to normalize miRNA HTS data", methods can be:
- RPM ( Read Per Million )
- RPKM (Reads per kilobase per million mapped) : Mortazi et al, Nat. Methods, 2008 (http://www.ncbi.nlm.nih.gov/pubmed/18516045)
- Trimmed mean of M-values : Robinson, Oshlack, Genome Biology 2010 (http://genomebiology.com/2010/11/3/R25)
- upper-quantile : http://www.ncbi.nlm.nih.gov/pubmed/20167110
Simon (one of the authors of DESeq) suggested to use TMM or DESeq (which are reads count-based methods) for differential expression analysis, at least for small RNA genes.
As the DEseq document described, to get reads count you need alignment (SAM/BAM) and the annotation file as input for some tools, like
- the function summarizeOverlaps in the BioConductor GenomicRanges package,
- the geneCounts function in the Bioconductor easyRNASeq package,
- the htseq-count script distributed with the HTSeq Python package (Note: htseq-count produces one count file for each sample. DESeq offers the function newCountDataSetFromHTSeqCount to get an analysis started from these files)
- or the bedtools multicov -count -bams sample1.bam sample2.bam sample3.bam -bed genes.bed
Note the naming issue in Refseq or UCSC that same locus has been assigned with multiple gene names (e.g. SMN1/2), and htseq-count requires unique gene_id in the annotation. You might want to make sure of this before using htseq-count. One solution to this is to use Ensembl GTF.
Also, be aware of the difference between bedtools multicov and htseq-count, esp. when you use multi-mappers. Read details here: http://seqanswers.com/forums/showthread.php?t=24903.
Also, be aware of the difference between bedtools multicov and htseq-count, esp. when you use multi-mappers. Read details here: http://seqanswers.com/forums/showthread.php?t=24903.
No comments:
Post a Comment