Thursday, July 12, 2012

Three ways to convert bam/bed file to bigwig, separated by strand

Here are three ways to convert bam/bed to bigwig, separated by strand:


# -----------  method 1


bamToBed -i accepted_hits.bam -split > accepted_hits.bed


awk '{if($6=="+") print}' accepted_hits.bed | sort -k1,1 | bedItemOverlapCount mm9 -chromSize=ChromInfo.txt stdin | sort -k1,1 -k2,2n > accepted_hits.plus.bedGraph
awk '{if($6=="-") print}' accepted_hits.bed | sort -k1,1 | bedItemOverlapCount mm9 -chromSize=ChromInfo.txt stdin | sort -k1,1 -k2,2n | awk '{OFS="\t"; print $1,$2,$3,"-"$4}' > accepted_hits.minus.bedGraph

bedGraphToBigWig accepted_hits.plus.bedGraph ChromInfo.txt accepted_hits.plus.bw
bedGraphToBigWig accepted_hits.minus.bedGraph ChromInfo.txt accepted_hits.minus.bw


# ----------- method 2

bamToBed -i accepted_hits.bam -split > accepted_hits.bed


sort -k1,1 accepted_hits.bed | awk -v '{print $0 >> "accepted_hits.bed"$6}'
bedItemOverlapCount $index -chromSize=ChromInfo.txt accepted_hits.bed+ | sort -k1,1 -k2,2n > accepted_hits.plus.bedGraph
bedItemOverlapCount $index -chromSize=ChromInfo.txt accepted_hits.bed- | sort -k1,1 -k2,2n | awk '{OFS="\t"; print $1,$2,$3,"-"$4}' > accepted_hits.minus.bedGraph

bedGraphToBigWig accepted_hits.plus.bedGraph ChromInfo.txt accepted_hits.plus.bw
bedGraphToBigWig accepted_hits.minus.bedGraph ChromInfo.txt accepted_hits.minus.bw


# ----------- method 3

bedtools genomecov -ibam -bg -split -strand + -i accepted_hits.bam -g ChromInfo.txt accepted_hits.plus.bedGraph
bedtools genomecov -ibam -bg -split -strand - -i accepted_hits.bam -g ChromInfo.txt accepted_hits.minus.bedGraph

bedGraphToBigWig accepted_hits.plus.bedGraph ChromInfo.txt accepted_hits.plus.bw
bedGraphToBigWig accepted_hits.minus.bedGraph ChromInfo.txt accepted_hits.minus.bw

8 comments:

  1. Hi!! Very useful post!!

    I first tried the 3rd method because looks more simpler that the others. I put in a bash for loop like this:


    for i in $(ls *.bam)

    do

    bedtools genomecov -ibam -bg -split -strand + -i $i -g chromInfo.txt > $i.plus.bedGraph
    bedtools genomecov -ibam -bg -split -strand - -i $i -g chromInfo.txt > $i.minus.bedGraph

    bedGraphToBigWig $i.plus.bedGraph chromInfo.txt accepted_hits.plus.bw
    bedGraphToBigWig $i.minus.bedGraph chromInfo.txt accepted_hits.minus.bw

    done

    But this error appears:

    Input error: Chromosome chr10 found in non-sequential lines. This suggests that the input file is not sorted correctly.
    Input error: Chromosome chr18 found in non-sequential lines. This suggests that the input file is not sorted correctly.
    Input error: Chromosome chr11 found in non-sequential lines. This suggests that the input file is not sorted correctly.
    Input error: Chromosome chr2 found in non-sequential lines. This suggests that the input file is not sorted correctly.
    Input error: Chromosome chr12 found in non-sequential lines. This suggests that the input file is not sorted correctly.

    So, it's seems that a sort step is missing

    But the first method works great, here is the bash for loop if some want to apply this pipeline to a bath of bams.

    for i in $(ls *.bam)

    do

    name=${i%.bam}

    echo $name

    bamToBed -i $name.bam -split > $name.bed


    awk '{if($6=="+") print}' $name.bed | sort -k1,1 | bedItemOverlapCount hg19 -chromSize=chromInfo.txt stdin | sort -k1,1 -k2,2n > $name.plus.bedGraph
    awk '{if($6=="-") print}' $name.bed | sort -k1,1 | bedItemOverlapCount hg19 -chromSize=chromInfo.txt stdin | sort -k1,1 -k2,2n | awk '{OFS="\t"; print $1,$2,$3,"-"$4}'$

    bedGraphToBigWig $name.plus.bedGraph chromInfo.txt $name.plus.bw
    bedGraphToBigWig $name.minus.bedGraph chromInfo.txt $name.minus.bw


    done


    cheers.

    ReplyDelete
    Replies
    1. bedtools require a sorted bam file, use samtools sort to sort it first

      Delete
  2. Anonymous2:26 PM

    The bedtools genomecov syntax in the 3rd method doesn't generate the right bedGraph, instead, it generates the data for coverage historgram.

    ReplyDelete
  3. There is a little correction as bedtools expects bed/gff/vcf files. This was tested with bedtools v2.17.0.

    Method 3 should be

    bedtools genomecov -bg -split -strand + -ibam accepted_hits.bam -g ChromInfo.txt > accepted_hits.plus.bedGraph
    bedtools genomecov -bg -split -strand - -ibam accepted_hits.bam -g ChromInfo.txt > accepted_hits.minus.bedGraph

    bedGraphToBigWig accepted_hits.plus.bedGraph ChromInfo.txt accepted_hits.plus.bw
    bedGraphToBigWig accepted_hits.minus.bedGraph ChromInfo.txt accepted_hits.minus.bw

    ReplyDelete
  4. Anonymous9:11 AM

    Hi!
    That is very helpful, but I have the problem that when I load my file on UCSC what it seems is that my minus file is just a shifted reflection of the plus and I do not see the strand specific information.
    I have mapped with STAR and than with samtools generated a bam file and than I used the third method.
    I am very puzzled.
    Best
    Giulia

    ReplyDelete
    Replies
    1. Anonymous1:22 PM

      Hi Giulia, did you managed to solve that? I have the same issue!
      diego (diego.sanchez@slcu.cam.ac.uk)

      Delete
  5. Anonymous3:30 PM

    for f in *.bedgraph.gz; do
    zcat $f | grep -v track > $f.temp &&
    bedGraphToBigWig $f.temp ./hg19.chrom.sizes `echo $f | sed -e s/bedgraph.gz/bw/` &&
    rm $f.temp &
    done;

    ReplyDelete
  6. You may want to use bam2bigwig.py (https://github.com/lpryszcz/bin).
    More info here: http://bioinfoexpert.com/?p=235

    ReplyDelete