bed --> gtf:
bedToGenePred input.bed stdout | genePredToGtf file stdin output.gtf
Some gene tracks are in a bed format in the database, perhaps with extra columns past the standard bed format. In this case, extract the standard bed columns, convert it to a genePred and then to a gtf. For example
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e "select chrom,chromStart,chromEnd,name,score,strand,thickStart,thickEnd from wgRna;" hg19 | bedToGenePred stdin stdout | genePredToGtf file stdin wgRna.gtf
gtf --> bed:
download gtf2bed.pl from Eric's site:
http://code.google.com/p/ea-utils/source/browse/trunk/clipper/gtf2bed
perl gtf2bed.pl input.gtf > output.bed
If simply converting gtf to bed6 format (no exons/intron info), the following bash line can work:
fgrep -w transcript gencode.v17.annotation.gtf | sed 's/[";]//g;' | awk '{OFS="\t"; print $1, $4-4,$5,$12,0,$7}'
or a bed9 format with additional info on gene/type/name
fgrep -w transcript gencode.v17.annotation.gtf | sed 's/[";]//g;' | awk '{OFS="\t"; print $1, $4-4,$5,$12,0,$7,$18,$14,$10}'
Also, I edited Eric's code a bit to output a more meaningful gene ID, e,g.
($transid) = $f[8]=~ /transcript_id "([^"]+)"/;
($geneid) = $f[8]=~ /gene_id "([^"]+)"/;
($gene_type) = $f[8]=~ /gene_type "([^"]+)"/;
($gene_name) = $f[8]=~ /gene_name "([^"]+)"/;
($trans_type) = $f[8]=~ /transcript_type "([^"]+)"/;
$id="${gene_name}__${geneid}__${transid}__${gene_type}.${trans_type}";
There is a bug in the gtf2bed.pl script. See https://code.google.com/p/ea-utils/issues/list
ReplyDeleteIt's already clarified as not a bug. see https://code.google.com/p/ea-utils/issues/detail?id=28
Delete