Wednesday, August 08, 2012

How to get tRNA/rRNA/mitochondrial gene GTF file

Cufflinks/TopHat can take a GTF file to mask abundant transcripts (e.g. tRNA/rRNA/chrM). Here are the steps to get such a file:

Go to the UCSC Table Browser:

For tRNA/rRNA:
    * Select "All Tables" from the group drop-down list
    * Select the "rmsk" table from the table drop-down list
    * Choose "GTF" as the output format
    * Type a filename (e.g. "rRNA.tRNA.gtf") in "output file" so your browser downloads the result
    * Click "create" next to filter
    * Next to "repClass," type rRNA
    * Next to free-form query, select "OR" and type repClass = "tRNA"
    * Click submit on that page, then get output on the main page
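
The same filter (repClass = "rRNA" OR repClass = "tRNA") can be sketched in Python for a locally downloaded copy of the rmsk table. This is only a rough illustration: the column order is assumed to follow the UCSC rmsk schema, the example rows are made up, and the `rmsk_row_to_gtf` helper and its ID scheme are hypothetical, so the output will not match the Table Browser's GTF byte for byte.

```python
# Sketch: keep only rRNA/tRNA rows of an rmsk table and emit GTF lines.
# Assumption: tab-separated columns in the UCSC rmsk schema order
# (bin, swScore, milliDiv, milliDel, milliIns, genoName, genoStart,
#  genoEnd, genoLeft, strand, repName, repClass, ...).

WANTED_CLASSES = {"rRNA", "tRNA"}

def rmsk_row_to_gtf(line, source="rmsk"):
    """Return a GTF line for an rRNA/tRNA row, or None for other classes."""
    f = line.rstrip("\n").split("\t")
    chrom, start, end = f[5], int(f[6]), int(f[7])
    strand, rep_name, rep_class = f[9], f[10], f[11]
    if rep_class not in WANTED_CLASSES:
        return None
    gtf_start = start + 1  # table starts are 0-based; GTF starts are 1-based
    # Hypothetical ID scheme: add the position so each copy gets a unique transcript_id.
    attrs = f'gene_id "{rep_name}"; transcript_id "{rep_name}_{chrom}_{gtf_start}";'
    return "\t".join(
        [chrom, source, "exon", str(gtf_start), str(end), ".", strand, ".", attrs]
    )

# Made-up example rows (not real annotations).
rows = [
    "585\t1000\t0\t0\t0\tchr1\t100\t200\t-1000\t+\t5S\trRNA\trRNA\t1\t100\t0\t1",
    "585\t900\t0\t0\t0\tchr1\t300\t400\t-800\t-\tAluY\tSINE\tAlu\t1\t100\t0\t2",
    "586\t800\t0\t0\t0\tchr2\t500\t572\t-600\t+\ttRNA-Ala-GCA\ttRNA\ttRNA\t1\t72\t0\t3",
]
gtf_lines = [g for g in map(rmsk_row_to_gtf, rows) if g is not None]
for g in gtf_lines:
    print(g)
```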

For chrM genes:

    * Select "All Tables" from the group drop-down list
    * Select the "knownGene" table from the table drop-down list
    * Choose "GTF" as the output format
    * Type a filename (e.g. "chrM.gtf") in "output file" so your browser downloads the result
    * Click "create" next to filter
    * Next to "chrom," type chrM
    * Click submit on that page, then get output on the main page
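
If a whole-genome knownGene GTF is already on disk, the chrM filter above can be approximated locally. A minimal sketch, assuming a standard tab-separated GTF whose first column is the chromosome name; `keep_chrom` and the example lines are hypothetical.

```python
# Sketch: keep only the GTF records located on one chromosome (chrM by default).

def keep_chrom(gtf_lines, chrom="chrM"):
    """Yield GTF lines whose first (seqname) column equals chrom."""
    for line in gtf_lines:
        # Skip comment/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        if line.split("\t", 1)[0] == chrom:
            yield line

# Made-up example lines (not real annotations).
example = [
    "# header comment\n",
    'chr1\tknownGene\texon\t100\t200\t.\t+\t.\tgene_id "a"; transcript_id "a";\n',
    'chrM\tknownGene\texon\t10\t90\t.\t+\t.\tgene_id "b"; transcript_id "b";\n',
    'chrMT\tknownGene\texon\t5\t50\t.\t-\t.\tgene_id "c"; transcript_id "c";\n',
]
chrm_only = list(keep_chrom(example))
print("".join(chrm_only), end="")
```

The comparison is on the full column value rather than a prefix, so a name such as "chrMT" is not picked up by accident.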

Concatenate the two files from the steps above with cat. That's it.
P.S. I just noticed that for galGal3 (chicken), UCSC does not have rRNA/tRNA in the repeat database.
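
The concatenation step is a one-liner in the shell (something like `cat rRNA.tRNA.gtf chrM.gtf > mask.gtf`, where "mask.gtf" is just a name picked for illustration). For pipelines that are already in Python, a rough equivalent might look like this; `concat_files` is a hypothetical helper, not part of any tool mentioned here.

```python
# Sketch: Python equivalent of `cat file1 file2 > output`.
import shutil

def concat_files(inputs, output):
    """Append each input file, in order, to a freshly created output file."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)

# Example call (file names follow the ones suggested in the steps above):
# concat_files(["rRNA.tRNA.gtf", "chrM.gtf"], "mask.gtf")
```

The combined file can then be given to Cufflinks through its `-M` / `--mask-file` option.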

5 comments:

  1. Anonymous, 11:16 AM

    Thanks!

  2. This comment has been removed by the author.

  3. Yue Hu, 5:22 PM

    Nice. Thanks!

  4. Yue Hu, 5:36 PM

    * Select the "UCSC Genes" table from the table drop-down list ---- But I am unable to find "UCSC Genes"; the closest one is "UCSC GenePfam", which is about protein domains. Did I do anything wrong?

    1. Sorry, the table name is now knownGene. I've fixed it.
