PLoS One. 2012; 7(10): e46415.

Published online 2012 Oct 3. doi: 10.1371/journal.pone.0046415

PMCID: PMC3463616

PMID: 23056304

Evaluation of Different Reference Based Annotation Strategies Using RNA-Seq – A Case Study in Drosophila pseudoobscura

Nicola Palmieri,* Viola Nolte, Anton Suvorov, Carolin Kosiol, and Christian Schlötterer

Nicola Palmieri

Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria,

Viola Nolte

Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria,

Anton Suvorov

Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria,

Carolin Kosiol

Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria,

Christian Schlötterer

Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria,

Dongxiao Zhu, Editor

Wayne State University, United States of America,

* E-mail: nicola.palmieri@vetmeduni.ac.at

Competing Interests: The authors have declared that no competing interests exist.

Conceived and designed the experiments: CS CK. Performed the experiments: VN. Analyzed the data: NP AS. Wrote the paper: NP AS CS.

Received 2012 Jun 28; Accepted 2012 Aug 30.

Copyright © 2012 Palmieri et al.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.

Abstract

RNA-Seq is a powerful tool for the annotation of genomes, in particular for the identification of isoforms and UTRs. Nevertheless, several software tools exist and no standard strategy to obtain a reliable annotation is yet established. We tested different combinations of the most commonly used reference-based alignment tools (TopHat, GSNAP) in combination with two frequently used reference-based assemblers (Cufflinks, Scripture) and evaluated the potential of RNA-Seq to improve the annotation of Drosophila pseudoobscura. While GSNAP maps a higher proportion of reads, TopHat resulted in a more accurate annotation when used in combination with Cufflinks. Scripture had the lowest sensitivity. Interestingly, after subsampling to the same coverage for GSNAP and TopHat, we find that both mappers have similar performance, implying that the advantage of TopHat is mainly an artifact of the lower coverage. Overall, we observed a low concordance among the different approaches tested both at junction and isoform levels. Using data from both sexes of two adult strains of D. pseudoobscura we detected alternative splicing for about 30% of the FlyBase multiple-exon genes. Moreover, we extended the boundaries for 6523 genes (about 40%). We annotated 669 new genes, 45% of them with splicing evidence. Most of the new genes are located on unassembled contigs, reflecting their incomplete annotation. Finally, we identified 99 additional new genes that are not represented in the current genome contigs of D. pseudoobscura, probably due to location in genomic regions that are difficult to assemble (e.g. heterochromatic regions).
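As an illustration of what concordance at the junction level can mean, the following minimal Python sketch compares the sets of splice junctions reported by different mapper/assembler combinations using the Jaccard index (shared junctions divided by all junctions found by either approach). The choice of the Jaccard index, the function names, and the junction coordinates are assumptions made for this example; this section does not state which concordance measure was used.

```python
from itertools import combinations


def junction_concordance(junctions_a, junctions_b):
    """Jaccard index of two junction sets: shared junctions / all junctions."""
    a, b = set(junctions_a), set(junctions_b)
    union = a | b
    # Two empty sets are treated as fully concordant.
    return len(a & b) / len(union) if union else 1.0


def pairwise_concordance(pipelines):
    """Concordance for every unordered pair of pipelines, keyed by sorted names."""
    return {
        (x, y): junction_concordance(pipelines[x], pipelines[y])
        for x, y in combinations(sorted(pipelines), 2)
    }


# Hypothetical junctions as (contig, intron_start, intron_end, strand).
# The contig names and coordinates are invented for illustration only.
pipelines = {
    "TopHat+Cufflinks": {
        ("contig_A", 100, 200, "+"),
        ("contig_A", 300, 450, "+"),
        ("contig_B", 50, 90, "-"),
    },
    "GSNAP+Cufflinks": {
        ("contig_A", 100, 200, "+"),
        ("contig_A", 300, 460, "+"),  # differs by 10 bp at the acceptor site
        ("contig_B", 50, 90, "-"),
    },
    "TopHat+Scripture": {
        ("contig_A", 100, 200, "+"),
    },
}

for pair, score in pairwise_concordance(pipelines).items():
    print(pair, round(score, 3))
```

Exact matching of coordinates is a deliberately strict choice here: a junction that differs by a few bases between two approaches counts as discordant, so a tolerance window would give higher values.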

Introduction

RNA-Seq technology is a powerful tool for the annotation of genomes due to its potential to identify precise exon boundaries and its ability to detect lowly expressed transcripts. Annotation via RNA-Seq can be performed using a reference-based approach, a de novo approach, or a combined strategy. The choice of strategy mainly depends on the availability of a reference genome. In a reference-based approach, reads are aligned to a genomic reference using a mapper specifically designed for RNA-Seq data, followed by transcriptome reconstruction from the mapped reads. In the de novo approach, transcripts are reconstructed directly from the reads. The major challenge in both approaches is the disentangling of different isoforms. Since the introduction of RNA-Seq many mapping tools have been developed, with TopHat being among the most popular ones. For transcriptome reconstruction the most commonly used software tools are Cufflinks and Scripture, which reconstruct a set of transcripts using reads mapped with TopHat. Although other mappers, such as GSNAP, have been described to be more accurate than TopHat, to our knowledge they have never been used in combination with the transcriptome reconstruction tools mentioned above.

Here we use RNA-Seq to improve the annotation of D. pseudoobscura, a frequently studied Drosophila species that is widely used to address questions such as the evolution of inversions, speciation and sex chromosome evolution. The current annotation of D. pseudoobscura has remained almost unchanged since the first release of the genome in 2005 and suffers from some important limitations: only about 25% of the genes are supported by ESTs, alternative splicing is detected for only 2% of the genes, and only a few genes (2%) have annotated UTRs. We improved the annotation of D. pseudoobscura by exploiting the resolution power of RNA-Seq. Our study extends the gene boundaries for 40% of the genes, detects 669 new genes and reveals alternative splicing for about 30% of the multiple-exon genes. Moreover, we provide evidence for 99 additional genes located in unassembled genomic regions.

Materials and Methods

Sample Preparation and RNA-Seq

