Tuesday, July 28, 2015

Note from GEUVADIS paper

Note from GEUVADIS papers:

Lappalainen et al. Nature 2013 : Transcriptome and genome sequencing uncovers functional variation in humans, http://dx.doi.org/10.1038/nature12531
 
‘t Hoen et al. Nature Biotechnology 2013: Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories, http://dx.doi.org/10.1038/nbt.2702

1. how to detect sample outlier?
a. before alignment: distance of k-mer profile
b. after alignment: Spearman rank correlation between samples --> D-statistics (i.e.  the median correlation of one sample against all the other samples)
c. gender mismatch: XIST vs. chrY
d. ASE bias rate among heterozygous sites



2. eQTL
a. exon/gene quantification
b. filter out lowly expressed ones (e.g. 0 in >50% samples)
c. for each group, normalize with PEER, adding mean
    c1. use subset (??e.g. chr20, or chr20- 22??) using K=????0,1,,3,,57,10,13,15,20 for each dataset
    c2. run eQTL and number of genes (eGenes) for each K.
    c3. get the optimal K = K(with most number of eQTL genes)
    c4. run PEER on 20,000 exons to get covairantes for the final normalization
    c5. final PEER normalization using all dataset, residual + mean as final quantification
d. transform the final quantification to standard normal distribution (by ?)
e. eQTL using Matrix-eQTL: linear regression of quantification ~ genotypes + genotype_covariates
3. Differential expression analysis
a. TMM normalization (from edgeR)
b. filter: genes with more than 5 counts per million in at least 1 sample were analyzed in pairwise population comparisons
c. tweeDEseq (good for large samples), significance: FDR < 0.05 and log2 fold change greater than 2

No comments:

Post a Comment