Nascent RNA Sequencing Reveals Widespread Pausing and Diverg



Transcription of coding and noncoding RNA molecules by eukaryotic RNA polymerases requires their collaboration with hundreds of transcription factors to direct and control polymerase recruitment, initiation, elongation, and termination. Whole-genome microarrays and ultra-high-throughput sequencing technologies enable efficient mapping of the distribution of transcription factors, nucleosomes, and their modifications, as well as accumulated RNA transcripts throughout genomes, thereby providing a global correlation of factors and transcription states. Studies using the chromatin immunoprecipitation assay coupled to genomic DNA microarrays (ChIP-chip) or to high-throughput sequencing (ChIP-seq) indicate that RNA polymerase II (Pol II) is present at disproportionately higher amounts near the 5′ end of many eukaryotic genes relative to downstream regions. However, these techniques cannot determine whether Pol II is simply promoter-bound or engaged in transcription. Small-scale analyses using independent methods have shown that this distribution likely represents transcriptionally engaged Pol II that has accumulated between ∼20 and 50 bases downstream of transcription start sites (TSSs), indicating that transcription can be regulated at the stage of elongation as well as at the recruitment and initiation stages. This promoter-proximal pausing or stalling is proposed to be an important post-initiation, rate-limiting target for gene regulation.

Here, we present a global run-on-sequencing (GRO-seq) assay to map and quantify transcriptionally engaged polymerase density genome-wide. These measurements provide a snapshot of genome-wide transcription and directly evaluate promoter-proximal pausing on all genes. We used nuclear run-on assays (NRO) to extend nascent RNAs that are associated with transcriptionally engaged polymerases under conditions where new initiation is prohibited. To specifically isolate NRO-RNA, we added a ribonucleotide analog [5-bromouridine 5′-triphosphate (BrUTP)] to BrU-tag nascent RNA during the run-on step (fig. S1). The length of the polynucleotide was kept short, and the NRO-RNA was chemically hydrolyzed into short fragments (∼100 bases) to facilitate high-resolution mapping of the polymerase origin at the time of assay. BrU-containing NRO-RNA was triple-selected through immunopurification with an antibody that is specific for this nucleotide analog, resulting in a 10,000-fold enrichment of the NRO-RNA pool that was determined to be >98% pure. An NRO-cDNA library was then prepared for sequencing from what represents the 5′ end of the fragmented, BrU-incorporated RNA molecule by using the Illumina high-throughput sequencing platform. The origin and the orientation of the RNAs, and therefore of the associated transcriptionally engaged polymerases, were documented genome-wide by mapping the reads to the reference human genome.

In total, ∼2.5 × 10⁷ 33–base pair (bp) reads were obtained from two independent replicates prepared from primary human lung fibroblast (IMR90) nuclei, of which ∼1.1 × 10⁷ (44%) mapped uniquely to the human genome. Most reads (85.8%) align on the coding strand within boundaries of known RefSeq genes, human mRNAs, or expressed sequence tags (fig. S2). The number of transcriptionally active genes was determined by using an experimentally and computationally determined background of 0.04 reads per kilobase. We found 16,882 (68%) of RefSeq genes to be active (P < 0.01) compared with 8438 active genes found by a microarray experiment performed in the same cell line, reflecting, in part, the added sensitivity of sequencing platforms. Examination of several large regions shows that GRO-seq can differentiate between transcriptionally active and inactive regions in large chromosomal domains. In addition, we are able to detect a generally low, but significant (P < 0.01 relative to background) amount of antisense transcription for 14,545 genes (58.7% of genes in the genome) (fig. S3).
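The background-based call of active genes described above can be illustrated with a minimal sketch. The text gives only the background density (0.04 reads per kilobase) and the significance threshold (P < 0.01); the Poisson model used here, and the names `poisson_sf` and `is_active`, are assumptions made for illustration and are not necessarily the statistical procedure the authors applied.

```python
import math

# Background density quoted in the text: 0.04 reads per kilobase.
BACKGROUND_READS_PER_KB = 0.04


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Suitable for the modest expected counts used here; exp(-lam) would
    underflow for very large lam.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def is_active(read_count, gene_length_bp, alpha=0.01):
    """Hypothetical test: does a gene carry more reads than background predicts?

    Assumes (not stated in the text) that background reads are Poisson
    distributed with mean proportional to gene length.
    """
    expected = BACKGROUND_READS_PER_KB * gene_length_bp / 1000.0
    return poisson_sf(read_count, expected) < alpha


# Arithmetic check on the figures quoted in the text.
mapped_fraction = 1.1e7 / 2.5e7  # uniquely mapped reads / total reads
```

Under these assumptions a 10-kb gene has an expected background of 0.4 reads, so three or more reads would already clear the P < 0.01 threshold, whereas two would not. The mapped fraction evaluates to 0.44, matching the 44% stated above.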

Fig. 1.

Sample of GRO-seq data view on the University of California at Santa Cruz (UCSC) genome browser. A 2.5-Mb region on chromosome 5 showing GRO-seq reads aligned to the genome at 1-bp resolution, followed by an up-close view around the NPM1 gene. Pol II ChIP results are shown in green; mappable regions, black; GRO-seq reads on the plus strand (left to right), red; GRO-seq reads on the minus strand (right to left), light blue; RefSeq gene annotations, dark blue.

Aligning the GRO-seq data relative to RefSeq TSSs shows that the density of reads peaks near the TSS in both sense (∼50 bp) and antisense (∼–250 bp) directions (see below). Alignment of GRO-seq reads to annotated 3′ ends of genes reveals a broad peak that is maximal at about +1.5 kb and can extend more than 10 kb downstream of polyadenylation (poly-A) sites. This peak distance is consistent with previous and recent estimates. A small peak followed by a sharp drop-off is observed at the site of polyadenylation, likely representing the known 3′ cleavage before polyadenylation of the RNA.
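The TSS-centered alignment described above can be sketched as a strand-aware binning of read positions. This is a simplified illustration, not the authors' pipeline: the input layout (tuples of chromosome, position, strand), the function names, and the window and bin sizes are all hypothetical, and the nested loop is chosen for clarity over speed.

```python
from collections import Counter


def relative_position(read_pos, read_strand, tss_pos, gene_strand):
    """Return (offset, orientation) of a read relative to a TSS.

    The offset is positive downstream of the TSS in the direction of
    transcription, so it is sign-flipped for minus-strand genes.
    """
    offset = read_pos - tss_pos
    if gene_strand == "-":
        offset = -offset
    orientation = "sense" if read_strand == gene_strand else "antisense"
    return offset, orientation


def tss_profile(reads, tss_list, window=1000, bin_size=10):
    """Count reads per bin around each TSS, split into sense and antisense.

    reads:    iterable of (chrom, position, strand) tuples (hypothetical layout)
    tss_list: iterable of (chrom, tss_position, gene_strand) tuples
    Bins are labeled by their lower edge and cover [edge, edge + bin_size).
    """
    reads = list(reads)
    profile = {"sense": Counter(), "antisense": Counter()}
    for chrom, tss_pos, gene_strand in tss_list:
        for r_chrom, r_pos, r_strand in reads:
            if r_chrom != chrom:
                continue
            offset, orientation = relative_position(
                r_pos, r_strand, tss_pos, gene_strand
            )
            if -window <= offset < window:
                # Floor division keeps negative offsets in the correct bin.
                bin_start = (offset // bin_size) * bin_size
                profile[orientation][bin_start] += 1
    return profile


example_reads = [("chr1", 1050, "+"), ("chr1", 750, "-"), ("chr2", 5000, "-")]
example_tss = [("chr1", 1000, "+"), ("chr2", 5050, "-")]
example_profile = tss_profile(example_reads, example_tss)
```

Summing such per-gene counts over all annotated TSSs yields a composite profile in which sense and antisense peaks of the kind reported above would appear as enriched bins. For genome-scale data, an indexed or sorted lookup would replace the nested loop.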

Fig. 2.

