Cold Spring Harbor Laboratory (CSHL), Cold Spring Harbor US
Speech Title: 
High resolution landscape of transcription in human cells

Steady-state measurements of transcriptomes represent a snapshot of the sequence content and amounts of individual RNAs present in biological samples. As part of the ENCODE project, we have sought both to provide a comprehensive genome-wide catalogue of the human transcriptome and to identify the sub-cellular context for distinct RNAs and their classes. This goal was achieved by identifying and characterizing both previously annotated and novel RNAs that are enriched in either of the two major cellular sub-compartments (nucleus and cytosol) for all 15 cell lines studied and, for one cell line, in three additional sub-nuclear compartments. In addition, we sought to determine whether identified transcripts are modified at their 5’ and 3’ termini by the presence of a cap or polyadenylation, respectively, and to determine as many precursor-product relationships for the identified RNAs as possible. Overall, a sampling of our results indicates that 62.1% and 74.7% of the human genome are covered by processed transcripts (contigs and Gencode exons) and primary transcripts (contigs, junctions and Gencode genes), respectively, with no cell line showing more than 56.7% of the union of the expressed transcriptome across all cell lines. A consequence of these high-resolution RNA mapping observations is that the intergenic regions of the human genome are shrinking in size (most being <10,000 bp), with notable implications for the classic definition of a genic region. The current genome-wide catalogue of long polyadenylated and short RNAs annotated by the Gencode group could potentially be increased by 94,800 exons (19%), 69,052 splice sites (22%), 73,325 transcripts (45%), and 41,204 genic regions (80%). Isoform expression does not appear to follow a minimalistic strategy: genes tend to express many isoforms simultaneously.
While the number of expressed isoforms appears to increase with the number of annotated isoforms, it appears to plateau at about 10-12 expressed isoforms per gene per cell line. The range of expression for detected transcripts in each cell line was measured; it covers 6 orders of magnitude (10⁻² to 10⁴ RPKM) for protein-coding, non-coding and novel intergenic/antisense genes in the polyadenylated fraction and 5 orders of magnitude (10⁻² to 10³ RPKM) in the non-polyadenylated fraction. Finally, cell type-specific enhancers clearly contain promoters that are distinguishable from other regulatory regions by the presence of novel RNA transcripts, chromatin marks and DNase I hypersensitive sites. These and other results from these studies point not only to a complex human transcriptome but also to layers of regulation that have yet to be characterized.