Wednesday, August 07, 2013

Methods for calling differential regions in ChIP-seq

Related papers to read:

Model-based Analysis of ChIP-Seq (MACS)
MACS can also be applied to differential binding between two conditions by treating one of the samples as the control. Since peaks from either sample are likely to be biologically meaningful in this case, we cannot use a sample swap to calculate FDR, and the data quality of each sample needs to be evaluated against a real control.
Curr Protoc Bioinformatics. Author manuscript; available in PMC 2012 June 1.
MACS empirically calculates FDR based on the number of peaks from control over ChIP that are called at the same p-value cutoff. Therefore if no control data is available, the FDR column does not exist in the output tabular file. Technically, MACS can also be applied to identify differential peaks between two conditions by treating one of the samples as the control. However, calculated FDR value should be ignored, as peaks from either sample are likely to be biologically meaningful in this case.
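
To make the empirical FDR idea above concrete, here is a minimal Python sketch. It is not MACS code: the function name `empirical_fdr` and the toy p-value lists are made up for illustration, and it assumes we already have one p-value per peak from the ChIP-over-control call and from the swapped control-over-ChIP call.

```python
def empirical_fdr(chip_pvalues, control_pvalues, cutoff):
    """Ratio of control peaks to ChIP peaks called at the same p-value cutoff.

    chip_pvalues:    p-values of peaks called as ChIP over control
    control_pvalues: p-values of peaks called as control over ChIP (sample swap)
    Returns None when no ChIP peak passes the cutoff.
    """
    n_chip = sum(1 for p in chip_pvalues if p <= cutoff)
    n_control = sum(1 for p in control_pvalues if p <= cutoff)
    if n_chip == 0:
        return None
    # Cap at 1.0 so the value can be read as a rate.
    return min(1.0, n_control / n_chip)


# Hypothetical toy data, only to exercise the function.
chip = [1e-12, 1e-10, 1e-9, 1e-8, 1e-7, 1e-6]
control = [1e-8, 1e-6, 1e-3]

fdr_loose = empirical_fdr(chip, control, 1e-6)
fdr_strict = empirical_fdr(chip, control, 1e-9)
print(fdr_loose, fdr_strict)
```

This also shows why the FDR is not meaningful when the "control" is really a second condition: the swapped peaks are then real binding sites in the other sample rather than false positives.
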
http://www.nature.com/nprot/journal/v7/n9/full/nprot.2012.101.html
The warning message 'unbalanced reads between treatment and control' means that the FDR of the resulting peaks will be overestimated when the control sample has more reads and will be underestimated when the ChIP-seq sample is sequenced more deeply.

MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets
ChIPdiff and MACS identified four to six times more target regions associated with significantly increased ChIP-Seq signals for K562 cells compared with those found for H1 ES cells, whereas MAnorm yielded a similar number of cell type-biased peaks in each cell line. To compare the enrichment of cell type-specifically expressed genes in the sets of target genes of the differential binding regions discovered by the three methods, we selected the same number of target genes associated with top differential binding regions identified by each method. The target genes of top differential binding regions identified by MAnorm contained similar numbers of H1 ES cell highly expressed genes but a greater number of K562 cell highly expressed genes compared to those identified by ChIPdiff and MACS (Supplementary Table 1 in Additional file 4), suggesting MAnorm performs better in detecting differentially binding regions than the other two methods. Importantly, the fold changes of differential binding given by ChIPdiff and MACS were based on the total number of reads, which may not be appropriate, as discussed above. Additionally, MAnorm showed even better enrichment of cell type-specifically expressed genes in differential binding region targets than the method developed by Taslim et al. [12] when applied to ChIP-Seq data presented in their study (Supplementary Table 2 in Additional file 4).
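
As I understand the MAnorm paper, the alternative to scaling by total read number is to fit an M-A model on peaks common to both samples and apply it to all peaks, where M is the log2 ratio of read counts and A is the average log2 count. The sketch below illustrates that idea only. It is not the MAnorm implementation: it swaps in an ordinary least-squares fit where MAnorm uses robust regression, and the pseudocount of 1, the function names, and the toy counts are all my own assumptions.

```python
import math


def ma_values(count1, count2, pseudocount=1.0):
    """Return (M, A) for one peak from its read counts in two samples."""
    r1 = count1 + pseudocount  # pseudocount avoids log of zero (assumption)
    r2 = count2 + pseudocount
    m = math.log2(r1 / r2)          # log2 fold change, sample 1 over sample 2
    a = 0.5 * math.log2(r1 * r2)    # average log2 intensity
    return m, a


def fit_line(ma_points):
    """Ordinary least-squares fit of M = intercept + slope * A."""
    n = len(ma_points)
    mean_m = sum(m for m, _ in ma_points) / n
    mean_a = sum(a for _, a in ma_points) / n
    sxx = sum((a - mean_a) ** 2 for _, a in ma_points)
    sxy = sum((a - mean_a) * (m - mean_m) for m, a in ma_points)
    slope = sxy / sxx
    intercept = mean_m - slope * mean_a
    return intercept, slope


def normalized_m(peaks, common_peaks):
    """Fit on common peaks, then subtract the fitted trend from every peak's M."""
    intercept, slope = fit_line([ma_values(c1, c2) for c1, c2 in common_peaks])
    result = []
    for c1, c2 in peaks:
        m, a = ma_values(c1, c2)
        result.append(m - (intercept + slope * a))
    return result


# Hypothetical counts: sample 2 has about twice the signal at common peaks.
common = [(9, 19), (49, 99), (199, 399)]
candidates = [(199, 99), (49, 99)]

m_norm = normalized_m(candidates, common)
print(m_norm)
```

In this toy case the second candidate has a raw M of -1, which looks sample-2-biased, but after subtracting the trend learned from the common peaks its normalized M is 0. Cutoffs such as M>1 or M<-1 in the table below would then be applied to the normalized values.
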

Supplementary Table 1. Enrichment of cell-type differentially expressed genes in genes near differential binding peaks defined by MAnorm, MACS and ChIPdiff. To compare the enrichment scores, we selected the same number of target genes associated with top differential binding regions identified by MAnorm with those identified by other two methods.

Supplementary Table 1A. Enrichment of genes more highly expressed in H1 ES cells (as compared to K562) in genes near H1 ES enriched peaks (as compared to K562) defined by MAnorm, MACS and ChIPdiff

H3K27ac H1 ES-enriched target genes | Number of Genes | Overlap with H1 ES up-regulated genes | Enrichment Score
MAnorm (M>1) | 2680 | 1243 | 2.49
ChIPdiff (default) | 1467 | 884 | 3.24
MAnorm (top 1467 genes; same number of genes as identified by ChIPdiff with default settings) | 1467 | 941 | 3.45
MACS (P<1e-6) | 1589 | 993 | 3.36
MAnorm (top 1589 genes; same number of genes as identified by MACS with P<1e-6) | 1589 | 987 | 3.34

Supplementary Table 1B. Enrichment of K562 higher expressed genes (as compared to H1 ES) in genes near K562 enriched peaks (as compared to H1 ES) defined by MAnorm, MACS and ChIPdiff

H3K27ac K562-enriched target genes | Number of Genes | Overlap with K562 up-regulated genes | Enrichment Score
MAnorm (M<-1) | 2694 | 895 | 2.78
ChIPdiff (default) | 6733 | 1402 | 1.74
ChIPdiff (confidence threshold=0.9999999999) | 2291 | 697 | 2.55
MAnorm (top 2291 genes; same number of genes as identified by ChIPdiff with confidence threshold=0.9999999999) | 2291 | 820 | 3.00
MACS (P<1e-6) | 9346 | 1600 | 1.43
MACS (P<1e-150) | 1556 | 567 | 3.05
MAnorm (top 1556 genes; same number of genes as identified by MACS with P<1e-6) | 1556 | 644 | 3.47

diffReps: Detecting Differential Chromatin Modification Sites from ChIP-seq Data with Biological Replicates
...shows that diffReps is the most sensitive method among all the methods compared, followed by edgeR, DESeq, ChIPDiff and, lastly, CCAT+DESeq. At each cutoff, diffReps (negative binomial test) typically detects a few thousands more differential sites than the secondly ranked method, edgeR.
The tool is written in Perl (https://code.google.com/p/diffreps/).

Related tools in R:

Detecting differential binding of transcription factors with ChIP-seq, Bioinformatics, 28, 121-122, doi: 10.1093/bioinformatics/btr605
DBChIP: http://pages.cs.wisc.edu/~kliang/DBChIP/DBChIP.pdf

DiffBind: differential binding analysis of ChIP-Seq peak data
http://bioconductor.org/packages/2.12/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf

I'd like to assess these tools later.
