# ----------- method 1
bamToBed -i accepted_hits.bam -split > accepted_hits.bed
awk '{if($6=="+") print}' accepted_hits.bed | sort -k1,1 | bedItemOverlapCount mm9 -chromSize=ChromInfo.txt stdin | sort -k1,1 -k2,2n > accepted_hits.plus.bedGraph
awk '{if($6=="-") print}' accepted_hits.bed | sort -k1,1 | bedItemOverlapCount mm9 -chromSize=ChromInfo.txt stdin | sort -k1,1 -k2,2n | awk '{OFS="\t"; print $1,$2,$3,"-"$4}' > accepted_hits.minus.bedGraph
bedGraphToBigWig accepted_hits.plus.bedGraph ChromInfo.txt accepted_hits.plus.bw
bedGraphToBigWig accepted_hits.minus.bedGraph ChromInfo.txt accepted_hits.minus.bw
# ----------- method 2
bamToBed -i accepted_hits.bam -split > accepted_hits.bed
sort -k1,1 accepted_hits.bed | awk -v '{print $0 >> "accepted_hits.bed"$6}'
bedItemOverlapCount $index -chromSize=ChromInfo.txt accepted_hits.bed+ | sort -k1,1 -k2,2n > accepted_hits.plus.bedGraph
bedItemOverlapCount $index -chromSize=ChromInfo.txt accepted_hits.bed- | sort -k1,1 -k2,2n | awk '{OFS="\t"; print $1,$2,$3,"-"$4}' > accepted_hits.minus.bedGraph
sort -k1,1 accepted_hits.bed | awk -v '{print $0 >> "accepted_hits.bed"$6}'
bedItemOverlapCount $index -chromSize=ChromInfo.txt accepted_hits.bed+ | sort -k1,1 -k2,2n > accepted_hits.plus.bedGraph
bedItemOverlapCount $index -chromSize=ChromInfo.txt accepted_hits.bed- | sort -k1,1 -k2,2n | awk '{OFS="\t"; print $1,$2,$3,"-"$4}' > accepted_hits.minus.bedGraph
bedGraphToBigWig accepted_hits.plus.bedGraph ChromInfo.txt accepted_hits.plus.bw
bedGraphToBigWig accepted_hits.minus.bedGraph ChromInfo.txt accepted_hits.minus.bw
# ----------- method 3
bedtools genomecov -ibam -bg -split -strand + -i accepted_hits.bam -g ChromInfo.txt > accepted_hits.plus.bedGraph
bedtools genomecov -ibam -bg -split -strand - -i accepted_hits.bam -g ChromInfo.txt > accepted_hits.minus.bedGraph
bedtools genomecov -ibam -bg -split -strand - -i accepted_hits.bam -g ChromInfo.txt > accepted_hits.minus.bedGraph
bedGraphToBigWig accepted_hits.plus.bedGraph ChromInfo.txt accepted_hits.plus.bw
bedGraphToBigWig accepted_hits.minus.bedGraph ChromInfo.txt accepted_hits.minus.bw
Hi!! Very useful post!!
ReplyDeleteI first tried the 3rd method because looks more simpler that the others. I put in a bash for loop like this:
for i in $(ls *.bam)
do
bedtools genomecov -ibam -bg -split -strand + -i $i -g chromInfo.txt > $i.plus.bedGraph
bedtools genomecov -ibam -bg -split -strand - -i $i -g chromInfo.txt > $i.minus.bedGraph
bedGraphToBigWig $i.plus.bedGraph chromInfo.txt accepted_hits.plus.bw
bedGraphToBigWig $i.minus.bedGraph chromInfo.txt accepted_hits.minus.bw
done
But this error appears:
Input error: Chromosome chr10 found in non-sequential lines. This suggests that the input file is not sorted correctly.
Input error: Chromosome chr18 found in non-sequential lines. This suggests that the input file is not sorted correctly.
Input error: Chromosome chr11 found in non-sequential lines. This suggests that the input file is not sorted correctly.
Input error: Chromosome chr2 found in non-sequential lines. This suggests that the input file is not sorted correctly.
Input error: Chromosome chr12 found in non-sequential lines. This suggests that the input file is not sorted correctly.
So, it's seems that a sort step is missing
But the first method works great, here is the bash for loop if some want to apply this pipeline to a bath of bams.
for i in $(ls *.bam)
do
name=${i%.bam}
echo $name
bamToBed -i $name.bam -split > $name.bed
awk '{if($6=="+") print}' $name.bed | sort -k1,1 | bedItemOverlapCount hg19 -chromSize=chromInfo.txt stdin | sort -k1,1 -k2,2n > $name.plus.bedGraph
awk '{if($6=="-") print}' $name.bed | sort -k1,1 | bedItemOverlapCount hg19 -chromSize=chromInfo.txt stdin | sort -k1,1 -k2,2n | awk '{OFS="\t"; print $1,$2,$3,"-"$4}'$
bedGraphToBigWig $name.plus.bedGraph chromInfo.txt $name.plus.bw
bedGraphToBigWig $name.minus.bedGraph chromInfo.txt $name.minus.bw
done
cheers.
bedtools require a sorted bam file, use samtools sort to sort it first
DeleteThe bedtools genomecov syntax in the 3rd method doesn't generate the right bedGraph, instead, it generates the data for coverage historgram.
ReplyDeleteThere is a little correction as bedtools expects bed/gff/vcf files. This was tested with bedtools v2.17.0.
ReplyDeleteMethod 3 should be
bedtools genomecov -bg -split -strand + -ibam accepted_hits.bam -g ChromInfo.txt > accepted_hits.plus.bedGraph
bedtools genomecov -bg -split -strand - -ibam accepted_hits.bam -g ChromInfo.txt > accepted_hits.minus.bedGraph
bedGraphToBigWig accepted_hits.plus.bedGraph ChromInfo.txt accepted_hits.plus.bw
bedGraphToBigWig accepted_hits.minus.bedGraph ChromInfo.txt accepted_hits.minus.bw
Hi!
ReplyDeleteThat is very helpful, but I have the problem that when I load my file on UCSC what it seems is that my minus file is just a shifted reflection of the plus and I do not see the strand specific information.
I have mapped with STAR and than with samtools generated a bam file and than I used the third method.
I am very puzzled.
Best
Giulia
Hi Giulia, did you managed to solve that? I have the same issue!
Deletediego (diego.sanchez@slcu.cam.ac.uk)
for f in *.bedgraph.gz; do
ReplyDeletezcat $f | grep -v track > $f.temp &&
bedGraphToBigWig $f.temp ./hg19.chrom.sizes `echo $f | sed -e s/bedgraph.gz/bw/` &&
rm $f.temp &
done;
You may want to use bam2bigwig.py (https://github.com/lpryszcz/bin).
ReplyDeleteMore info here: http://bioinfoexpert.com/?p=235