Monday, March 18, 2013

Question: why are RNA repeats defined by RepeatMasker not found among annotated small RNAs?

I was fetching small RNAs such as tRNA/rRNA/snRNA/snoRNA from the latest Ensembl when I noticed that RepBase has a class of repeats called 'RNA repeat', which covers various RNAs: tRNA, rRNA, snRNA, scRNA, and srpRNA. I expected these to be annotated as genes in Ensembl, but when I intersected the two sets I found that a large number of RNA repeats are not annotated. Here is the count for rRNA:


$ grep rRNA $repeats_from_RepBase | intersectBed -a stdin -b $snRNAsnoRNAtRNArRNA_from_Ensembl -s -v | sort -u | wc -l
1557
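
To make explicit what the `-s -v` combination in the command above is doing, here is a minimal sketch that mimics it with plain awk on two tiny BED6 files. The function name `unannotated` and the toy coordinates are made up for illustration, and the brute-force loop is only meant for sanity-checking small inputs, not for replacing intersectBed on real data. It assumes tab-separated BED6 with the strand in column 6.

```shell
# Hypothetical helper: print entries of BED6 file A ($1) that have no
# same-strand overlap with any entry of BED6 file B ($2), i.e. roughly what
# `intersectBed -a A -b B -s -v` reports. Brute force, small files only.
unannotated() {
    awk 'BEGIN { FS = OFS = "\t"; n = 0 }
         FILENAME == ARGV[1] {
             # First file on the awk command line is B: remember its intervals.
             n++
             bchrom[n] = $1; bstart[n] = $2 + 0; bend[n] = $3 + 0; bstrand[n] = $6
             next
         }
         {
             # Second file is A: look for any B interval on the same chromosome
             # and strand that overlaps (BED intervals are half-open).
             hit = 0
             for (i = 1; i <= n; i++)
                 if ($1 == bchrom[i] && $6 == bstrand[i] &&
                     ($2 + 0) < bend[i] && ($3 + 0) > bstart[i]) { hit = 1; break }
             if (!hit) print
         }' "$2" "$1"
}

# Toy data (made up): two rRNA repeats and two annotated genes.
tmpdir=$(mktemp -d)
printf 'chr1\t100\t200\trRNA_a\t0\t+\nchr1\t500\t600\trRNA_b\t0\t-\n' > "$tmpdir/repeats.bed"
printf 'chr1\t150\t250\tgene1\t0\t+\nchr1\t550\t650\tgene2\t0\t+\n' > "$tmpdir/genes.bed"

# rRNA_a overlaps gene1 on the same strand, so it is dropped.
# rRNA_b overlaps gene2 only on the opposite strand, so it is still reported.
unannotated "$tmpdir/repeats.bed" "$tmpdir/genes.bed"
```

The point of the toy case is that, with `-s`, a repeat sitting on the opposite strand of an annotated gene still counts as "not annotated", so some of the 1557 could in principle be antisense overlaps rather than regions with no annotation at all.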

Here is an rRNA example:


and a tRNA example:
Neither is annotated as an rRNA/tRNA gene; both appear only as RNA repeats.

So, what is RepBase's definition of an RNA repeat? Are they pseudogenes?

2 comments:

  1. Chirag, 2:02 PM

    Generally, when an RNA (mostly tRNA and snRNA) has multiple instances in the genome, those instances are classified as RNA repeats.
    Both tRNAs and snRNAs often have multiple copies, many of them pseudogenes, which are not annotated in Ensembl. For detailed tRNA annotation, you can download it from the tRNA database from Lowe's lab.

  2. Maybe no transcripts originate from those "RNA repeat" regions; they could be pseudogenes of rRNAs that aren't transcribed. If that is the case, no ESTs or mRNAs would be reported, and since Ensembl gene predictions are based on ESTs and mRNAs, there would be no Ensembl gene prediction for that "RNA repeat" region.
