Organization of the Caenorhabditis elegans small non-coding transcriptome by Rnomics, tiling array and bioinformatics. 陈润生 (Runsheng CHEN), Institute of Biophysics, CAS, 2007-8-29

How many characters are in the “Heaven Book”? 3×10^9 characters, or 10,000 books


Page 1

Organization of the Caenorhabditis elegans small non-coding transcriptome by Rnomics, tiling array and bioinformatics

陈润生 (Runsheng CHEN), Institute of Biophysics, CAS

2007-8-29

Page 2

How many characters are in the “Heaven Book”? 3×10^9 characters, or 10,000 books

1 book = 100 pages

1 page = 3,000 characters

CCGGTCTCCCCGCCCGCGCGCGAAGTAAAGGCCCAGCGCAGCCCGCGCTCCTGCCCTGGGGCCTCGTCTTTCTCCAGGAAAACGTGGACCGCTCTCCGCCGACAGTCTCTTCCACAGACCCCTGTCGCCTTCGCCCCCCGGTCTCTTCCGGTTCTGTCTTTTCGCTGGCTCGATACGAACAAGGAAGTCGCCCCCAGCGAGCCCCGGCTCCCCCAGGCAGAGGCGGCCCCGGGGGCGGAGTCAACGGCGGAGGCACGCCCTCTGTGAAAGGGCGGGGCATGCAAATTCGAAATGAAAGCCCGGGAACGCCGAAGAAGCACGGGTGTAAGATTTCCCTTTTCAAAGGCGGGAGAATAAGAAATCAGCCCGAGAGTGTAAGGGCGTCAATAGCGCTGTGGACGAGACAGAGGGAATGGGGCAAGGAGCGAGGCTGGGGCTCTCACCGCGACTTGAATGTGGATGAGAGTGGGACGGTGACGGCGGGCGCGAAGGCGAGCGCATCGCTTCTCGGCCTTTTGGCTAAGATCAAGTGTAGTATCTGTTCTTATCAGTTTAATATCTGATACGTCCTCTATCCGAGGACAATATATTAAATGGATTGATCAATCCGCTTCAGCCTCCCGAGTAGCTGGGACTACAGACGGTGCCATCACGCCCAGCTCATTGTTGATTCCCGCCCCCTTGGTAGAGACGGGATTCCGCTATATTGCCTGGGCTGGTGTCGAACTCATAGAACAAAGGATCCTCCCTCCTGGGCCTGGGCGTGGGCTCGCAAAACGCTGGGATTCCCGGATTACAGGCGGGCGCACCACACCAGGAGCAAACACTTCCGGTTTTAAAAATTCAGTTTGTGATTGGCTGTCATTCAGTATTATGCTAATTAAGCATGCCCGGTTTTAAACCTCTTAAAACAACTTTTAAAATTACCTTTCCACCTAAAACGTTAAAATTTGTCAAGTGATAATATTCGACAAGCTGTTATTGCCAAACTATTTTCCTATTTGTTTCCTAATGGCATCGGAACTAGCGAAAGTTTCTCGCCATCAGTTAAAAGTTTGCGGCAGATGTAGACCTAGCAGAGGTGTGCGAGGAGGCCGTTAAGACTATACTTTCAGGGATCATTTCTATAGTGTGTTACTAGAGAAGTTTCTCTGAACGTGTAGAGCACCGAAAACCACGAGGAAGAGAGGTAGCGTTTTCATCGGGTTACCTAAGTGCAGTGTCCCCCCTGGCGCGCAATTGGGAACCCCACACGCGGTGTAGAAATATATTTTAAGGGCGCG

(1250 characters)  
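The slide's arithmetic can be checked directly: at 100 pages per book and 3,000 characters per page, one book holds 3×10^5 characters, so 3×10^9 characters fill 10^4 books. A minimal sketch; the function name `books_needed` is illustrative and not from the talk.

```python
GENOME_CHARACTERS = 3 * 10**9   # "characters" (bases) in the Heaven Book
PAGES_PER_BOOK = 100
CHARACTERS_PER_PAGE = 3_000


def books_needed(characters: int,
                 pages_per_book: int = PAGES_PER_BOOK,
                 characters_per_page: int = CHARACTERS_PER_PAGE) -> int:
    """Books needed to print `characters`, rounding up to whole books (illustrative helper)."""
    characters_per_book = pages_per_book * characters_per_page
    return -(-characters // characters_per_book)  # integer ceiling division


if __name__ == "__main__":
    print(books_needed(GENOME_CHARACTERS))  # 10000
```

Integer ceiling division avoids floating-point rounding and still gives a whole book for any leftover characters.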

Page 3

Noncoding sequences: sequences in the genome that do not code for any protein.

How much of the human genome is noncoding sequence?

More than 97%!

Page 4

BREAKTHROUGH OF THE YEAR (2001): Science celebrates nine other areas in which important findings were reported this year, from subatomic to atmospheric and beyond.

First runner-up: RNA ascending. Short RNAs clearly play important biological roles. Dozens of the molecules are now known to exist in the nematode and fruit fly. The coding for these molecules is contained in the DNA sequence. Some 100 of these tiny RNA "genes" have been found in the gut bacterium Escherichia coli, and some 200 were uncovered in DNA from mouse brain tissue. In the nematode and fruit fly, they seem to be involved in development; in E. coli, they may facilitate rapid responses to environmental change and could serve similar functions in mammals.

Page 5

Page 6

Nature 391, 806-811 (19 February 1998). Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. ANDREW FIRE*, SIQUN XU*, MARY K. MONTGOMERY*, STEVEN A. KOSTAS*†, SAMUEL E. DRIVER‡ & CRAIG C. MELLO‡

* Carnegie Institution of Washington, Department of Embryology, 115 West University Parkway, Baltimore, Maryland 21210, USA
† Biology Graduate Program, Johns Hopkins University, 3400 North Charles Street, Baltimore, Maryland 21218, USA
‡ Program in Molecular Medicine, Department of Cell Biology, University of Massachusetts Cancer Center, Two Biotech Suite 213, 373 Plantation Street, Worcester, Massachusetts 01605, USA

Page 7

Page 8

Genome and transcription (tiling array data)

Protein-coding sequence

– Human: ~2-3% of genome

– C. elegans: ~25% of genome

Transcriptional activity

– Human: ≥60% of genome (20-30X the protein-coding fraction)

– C. elegans: ~70% of genome (2-3X the protein-coding fraction)

The majority of transcripts are non-coding RNAs.

The major differences among organisms lie in their ncRNAs.

Transcriptional output/complexity of the genome
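A figure such as "≥60% of the genome is transcribed" is, in essence, the total length of the union of all transcribed intervals divided by genome length, so overlapping transcripts must not be counted twice. The sketch below assumes half-open, 0-based intervals on a single sequence; the name `transcribed_fraction` and the toy coordinates are illustrative, not taken from the tiling-array analysis.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based


def transcribed_fraction(intervals: Iterable[Interval], genome_length: int) -> float:
    """Fraction of the genome covered by at least one transcribed interval (illustrative)."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # disjoint from the current run: bank its length and start a new run
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # overlapping or touching: extend the current run without double counting
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


# Toy example: two overlapping transcripts plus one separate transcript on a 100-bp "genome".
print(transcribed_fraction([(0, 10), (5, 20), (30, 40)], 100))  # 0.3
```

Dividing such a fraction by the protein-coding fraction gives the fold values on the slide, e.g. 60/3 = 20 and 60/2 = 30 for human, and 70/25 ≈ 2.8 for C. elegans.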

Page 9

Biological Dark Matter: Newfound RNA suggests a hidden complexity inside cells. John Travis

In the early 1990s, Victor Ambros and his colleagues were conducting a gene hunt. In particular, they were searching for the gene that was mutated in a perplexing strain of Caenorhabditis elegans, the small nematode whose development many biologists study. Unlike most genes, the one identified by Ambros' group doesn't encode a protein. It spawns a small molecule of RNA (a chemical relative of DNA) that somehow turns off other genes that play a role in worm development. Several groups, including one led by Eddy, Ambros' team, and two other research groups, reported that Escherichia coli, worms, flies, and people contain dozens of previously undetected genes that spawn RNA instead of protein.

The RNA genes found so far are "just the tip of a huge iceberg," says Ruvkun.

Page 10

Organization of the Caenorhabditis elegans small non-coding transcriptome: Genomic features, biogenesis, and expression

Page 11

1. Found 100 novel noncoding RNAs and their genes in C. elegans by Rnomics

Applying a novel cloning strategy, we cloned 100 novel and 61 known or predicted full-length Caenorhabditis elegans ncRNAs (distinct from microRNAs).

Genome Research 16: 20-29, 2006; NCBI accession numbers: AY948555-AY948719

Page 12

Studies of the genomic environment and transcriptional characteristics have shown that two-thirds of all ncRNAs, including many intronic snoRNAs, are independently transcribed under the control of ncRNA-specific upstream promoter elements. Furthermore, the transcription levels of at least 60% of the ncRNAs vary with developmental stage. We identified two new classes of ncRNAs, stem-bulge RNAs (sbRNAs) and snRNA-like RNAs (snlRNAs), both featuring distinct internal motifs, secondary structures, upstream elements, and high, developmentally variable expression. Most of the novel ncRNAs are conserved in Caenorhabditis briggsae, but only one homolog was found outside the nematodes.

Page 13

The stem-bulge RNAs of C. elegans

Classification into two new categories

Page 14

The snRNA like RNAs of C. elegans

Page 15

Three special upstream motifs of noncoding genes (UM1-UM3), located 40-80 bp upstream of the transcription start sites of the ncRNA loci, were further revealed by MEME (Bailey and Elkan, 1995).
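MEME looks for over-represented motifs in a set of input sequences, so the inputs here would be the upstream regions themselves. Below is a minimal sketch of extracting the 40-80 bp window upstream of a transcription start site, taking strand into account. The helper names, the 0-based TSS convention, and the default window are illustrative assumptions; the extracted sequences could then be written to FASTA as MEME input.

```python
# Hypothetical helpers: the 0-based TSS coordinate and the 40-80 bp default window are assumptions.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_window(genome: str, tss: int, strand: str,
                    near: int = 40, far: int = 80) -> str:
    """Bases lying near+1 .. far bp upstream of a 0-based TSS, 5'->3' on the transcript's strand."""
    if strand == "+":
        # upstream of a plus-strand TSS lies at lower coordinates
        return genome[max(0, tss - far):max(0, tss - near)]
    if strand == "-":
        # upstream of a minus-strand TSS lies at higher coordinates; reverse-complement it
        return revcomp(genome[tss + near + 1:tss + far + 1])
    raise ValueError("strand must be '+' or '-'")


genome = "G" * 60 + "T" * 140
print(len(upstream_window(genome, 100, "+")))  # 40
```

Returning minus-strand windows reverse-complemented keeps every sequence in transcript orientation, so a motif finder sees all promoters in the same direction.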

Page 16

Figure: ncRNA library clone number versus host gene EST hits (Group V and Group II loci)

Comparing the expression levels of non-motif snoRNAs with the frequencies of ESTs corresponding to exons of their host genes revealed a distinct positive correlation, which was not found for motif-containing loci.

Many of the ncRNA genes were found to lie in introns of host protein-coding genes and to be under the control of independent promoter elements.
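The positive correlation reported for non-motif snoRNAs can be quantified with a correlation coefficient between each locus's clone count in the ncRNA library and the EST hits of its host gene. The slide does not name the statistic; the sketch below assumes Pearson's r, and the numbers in it are made up purely for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples with at least two points")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration only: clone counts vs. host-gene EST hits for five loci.
clone_numbers = [1, 3, 5, 8, 12]
host_est_hits = [20, 45, 90, 150, 260]
print(round(pearson_r(clone_numbers, host_est_hits), 3))
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient; the explicit version shows the formula.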

Page 17

Built a combined microarray that measures coding and non-coding genes simultaneously

Profiling Caenorhabditis elegans non-coding RNA expression with a combined microarray

Housheng He1,5, Lun Cai2,5, Geir Skogerbø1, Wei Deng1, Tao Liu1,5, Xiaopeng Zhu1,5, Yudong Wang1, Dong Jia1, Zhihua Zhang1,5, Yong Tao5,6, Haipan Zeng7, Muhammad Nauman Aftab1,5, Yan Cui4, Guozhen Liu7 and Runsheng Chen1,2,3,*

Nucleic Acids Research, 2006, Vol. 34, No. 10, 2976–2983

Page 18

Arrangements of transcriptional elements and genomic locations of small ncRNA loci, as inferred from genomic and experimental data.

Biogenesis of C. elegans ncRNAs

Page 19

Transcription levels of 106 ncRNA families were analyzed by Northern blot. Of these, 61 showed variation exceeding two standard deviations and fell into 6 distinct expression clusters.

Developmentally regulated ncRNAs

Page 20

Public release date: 9-Jan-2006. Contact: Maria, Cold Spring Harbor Laboratory

'Pregnant' protein-coding genes carry RNA 'babies': Scientists characterize large numbers of independently expressed, non-protein-coding RNA genes in the introns of protein-coding genes

BEIJING, China: Scientists from the Chinese Academy of Sciences have performed a comprehensive analysis of small, non-protein-coding RNAs in the model nematode, C. elegans. They characterize 100 heretofore-undescribed transcripts, including two novel classes; they provide insights into the genomic structure and transcriptional regulation of non-coding RNAs; and they underscore the importance of non-coding RNAs in nematode development. Their work appears this month in the journal Genome Research.

Page 21

"The significance of non-protein-coding RNAs as central components of various cellular processes has risen sharply over the recent years," explains Prof. Runsheng Chen, principal investigator on the study. Apart from microRNAs (miRNAs), small transcripts that have recently received widespread attention and are known to play important roles in post-transcriptional gene regulation, small non-coding RNAs (ncRNAs) in C. elegans had not been extensively investigated until now.

Using a new, high-throughput procedure to clone small, full-length ncRNAs, Chen's laboratory isolated and characterized 161 unique transcripts. A major advantage of the new cloning procedure is that it achieves an extraordinarily high detection rate for ncRNAs by current standards. "Studies published over recent years have only been able to reach a detection rate of about 3%, but our method reached a detection rate of 30%, a 10-fold increase in cloning efficiency," explains Chen. "It's like going from a Model T Ford to a Ferrari in one fell swoop!"

Page 22

Of the 161 transcripts detected by Chen's group, 100 were novel and 61 were previously known or predicted. Among the 100 novel genes, 30 had no known function, whereas 70 belonged to the ubiquitous class of small nucleolar RNAs (snoRNAs). Based on sequence and structural features, Chen and his colleagues were able to classify more than half of the 30 unknown RNAs into two new categories: stem-bulge RNAs (sbRNAs) and snRNA-like RNAs (snlRNAs). Both classes of transcripts exhibited enhanced expression during the later stages of worm development, suggesting a functional role for these transcripts in developmental processes.

"The interesting thing about nematodes is that their genomic organization of both snoRNAs and other ncRNAs is quite different from other animals," says Chen. In contrast to the genomes of other metazoans, where most snoRNAs are found in introns and are not under the control of independent promoters, nematode snoRNA loci are both intergenic and intronic (with and without promoters). Interestingly, plant snoRNAs are primarily located in intergenic regions. Other ncRNA genes (i.e., non-snoRNA genes) are mainly located in intergenic regions in both plants and animals. But in nematodes, Chen's team found that many of these other ncRNA genes are located in the introns of host protein-coding genes and are under the control of independent promoter elements.

Page 23

Finally, Chen and his colleagues estimated that 2700 ncRNA genes are present in the C. elegans genome. "One particularly intriguing aspect of the non-coding transcriptome is its potential to fill the regulatory gap created by the surprisingly low number of protein-coding genes in higher organisms," says Chen. "Between one-celled yeast, thousand-celled nematodes, and trillion-celled mammals, there is a difference of a mere 6,000 to 19,000 to 25,000 in protein-coding gene numbers. We think that regulation by non-coding RNA accounts for this discrepancy and helps to explain the additional biological complexity of higher organisms."

Page 24

2. Mapping the C. elegans noncoding transcriptome with a whole-genome tiling microarray

Page 25

Tiling

Page 26

Structure of eukaryotic mRNA

5' cap – 5' UTR – coding region – 3' UTR – poly(A) tail

Initiation codon (AUG); termination codons (UAG, UGA, UAA)

Page 27

RNA was extracted from a mixed-stage population of wild-type C. elegans strain N2.

Three kinds of samples:

PA: poly(A)-tailed RNA

NPA: non-poly(A)-tailed RNA

SNPA: small non-poly(A)-tailed RNA (<500 nt)

Page 28

Building transfrags
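On a tiling array, a transfrag (transcribed fragment) is typically assembled by joining probes whose signal passes a threshold when they lie close together along the chromosome, then discarding runs that are too short. In the sketch below, the threshold, probe length, maximum gap, and minimum length are hypothetical defaults, not the parameters used in this study.

```python
from typing import List, Tuple

Probe = Tuple[int, float]   # (probe start coordinate, normalized signal)
Interval = Tuple[int, int]  # half-open [start, end)


def build_transfrags(probes: List[Probe], threshold: float, probe_len: int = 25,
                     max_gap: int = 50, min_length: int = 90) -> List[Interval]:
    """Join above-threshold probes separated by <= max_gap bp; keep runs >= min_length bp.

    All parameter defaults are hypothetical illustrations.
    """
    positive = sorted(pos for pos, signal in probes if signal >= threshold)
    transfrags: List[Interval] = []
    start = end = None
    for pos in positive:
        if start is None:
            start, end = pos, pos + probe_len
        elif pos - end <= max_gap:
            end = max(end, pos + probe_len)    # close enough: extend the current run
        else:
            if end - start >= min_length:
                transfrags.append((start, end))
            start, end = pos, pos + probe_len  # too far away: start a new run
    if start is not None and end - start >= min_length:
        transfrags.append((start, end))
    return transfrags


probes = [(0, 5.0), (30, 6.0), (60, 7.0), (200, 8.0), (500, 1.0)]
print(build_transfrags(probes, threshold=4.0, min_length=50))  # [(0, 85)]
```

In this toy run the isolated positive probe at 200 forms a 25-bp run and is dropped by the length filter, while the probe at 500 never passes the threshold.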

Page 29

Finding TUFs (transfrags of unknown function)
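A TUF is a transfrag that overlaps no annotated feature. A minimal sketch for a single chromosome, using half-open intervals and a linear scan that is adequate for illustration but not for genome-scale data; `find_tufs` and the toy coordinates are illustrative.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def find_tufs(transfrags: List[Interval], annotations: List[Interval]) -> List[Interval]:
    """Transfrags that overlap no annotated feature (linear scan, illustrative)."""
    return [t for t in transfrags if not any(overlaps(t, a) for a in annotations)]


print(find_tufs([(0, 10), (20, 30), (40, 50)], [(5, 15), (45, 60)]))  # [(20, 30)]
```

For real data, annotations would be grouped by chromosome and queried with an interval tree or a sorted sweep instead of the nested scan.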

Page 30

Transfrag distribution in the three different samples. “Other annotated” mainly includes tandem repeats and pseudogenes

Page 31

Detection rates of annotated ncRNAs in the SNPA sample
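A detection rate of this kind can be computed as the fraction of annotated ncRNAs in each class that overlap at least one SNPA transfrag. The sketch assumes a single chromosome, half-open coordinates, and simple any-overlap matching; the overlap criterion actually used for the slide is not stated, and the function name is illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def detection_rates_by_class(annotated: List[Tuple[str, int, int]],
                             transfrags: List[Tuple[int, int]]) -> Dict[str, float]:
    """Per-class fraction of annotated ncRNAs hit by at least one transfrag (illustrative)."""
    total: Dict[str, int] = defaultdict(int)
    detected: Dict[str, int] = defaultdict(int)
    for rna_class, start, end in annotated:
        total[rna_class] += 1
        if any(overlaps((start, end), t) for t in transfrags):
            detected[rna_class] += 1
    return {c: detected[c] / total[c] for c in total}


annotated = [("snoRNA", 0, 10), ("snoRNA", 40, 50), ("tRNA", 20, 30)]
print(detection_rates_by_class(annotated, [(5, 25)]))  # {'snoRNA': 0.5, 'tRNA': 1.0}
```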

Page 32

The NPA sample produced 97,548 transfrags, all of which could potentially represent non-coding transcripts. Nearly 24% are non-annotated intergenic TUFs. RT-PCR analysis confirmed 89% (25/29) of randomly sampled intronic and intergenic TUFs, effectively excluding the possibility that most NPA TUFs result from non-specific hybridization on the microarray. TUFs in the NPA sample are also fairly well conserved, with 54% showing at least some level of conservation in C. briggsae (weak WABA; Kent and Zahler, 2000). NPA TUFs are generally short (mean 88 nt, median 75 nt); however, only 557 of them overlapped with SNPA TUFs. A possible explanation is that short NPA TUFs in close proximity may represent longer transcripts.
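The length summary and the overlap count in this paragraph are both simple interval computations. The sketch below reports the mean and median TUF length and how many TUFs from one sample overlap any TUF from another; the function name and toy coordinates are illustrative, not data from the study.

```python
from statistics import mean, median
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def summarize_tufs(tufs: List[Interval], other: List[Interval]) -> Dict[str, float]:
    """Mean/median length of `tufs` and how many of them overlap any interval in `other`."""
    lengths = [end - start for start, end in tufs]
    overlapping = sum(
        1 for start, end in tufs
        if any(start < o_end and o_start < end for o_start, o_end in other)
    )
    return {"mean": mean(lengths), "median": median(lengths), "overlapping": overlapping}


npa_tufs = [(0, 80), (100, 170), (300, 400)]   # toy NPA TUFs
snpa_tufs = [(150, 200)]                        # toy SNPA TUFs
summary = summarize_tufs(npa_tufs, snpa_tufs)
print(summary["median"], summary["overlapping"])  # 80 1
```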

Page 33

Chromosomal distribution of small ncRNAs (tRNAs excluded) and SNPA TUFs

Page 34

The novel upstream motif 4 (UM4) compared with UM3 and UM1. All three motifs share the submotif TGTCNG (green rectangles), but at different relative positions.
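Locating a degenerate submotif such as TGTCNG (N = any base) is a short regular-expression scan. This sketch reports 0-based start positions on the given strand only; a zero-width lookahead is used so that overlapping occurrences are not skipped. The function name is illustrative.

```python
import re
from typing import List

# TGTCNG, where N is any of A, C, G, T; the lookahead reports overlapping hits too.
SUBMOTIF = re.compile(r"(?=TGTC[ACGT]G)")


def find_submotif(seq: str) -> List[int]:
    """0-based start positions of TGTCNG on the given strand of `seq` (illustrative)."""
    return [m.start() for m in SUBMOTIF.finditer(seq.upper())]


print(find_submotif("AATGTCAGTTGTCCGA"))  # [2, 9]
```

Applied to upstream windows aligned on the transcription start site, the returned offsets show where the shared submotif sits within each motif.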

Page 35

What is an mRNA-like ncRNA?

– transcribed by RNA polymerase II
– poly(A)-tailed
– often spliced
– no ORF, or only a very short one

Bioinformatics Research Group, Institute of Computing Technology, CAS.
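The "no ORF or only a very short ORF" criterion can be checked by finding the longest ATG-to-stop open reading frame, using the start and stop codons given for mRNA structure. The sketch scans only the three forward frames of the transcript and ignores ORFs that lack a stop codon. The 100-codon cutoff in `looks_noncoding` is a hypothetical threshold for illustration, not one stated in the talk.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> int:
    """Length in codons (stop excluded) of the longest ATG..stop ORF in the 3 forward frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                          # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None                       # look for the next ATG in this frame
    return best


def looks_noncoding(seq: str, max_codons: int = 100) -> bool:
    """Hypothetical filter: treat transcripts whose longest ORF is under max_codons as ncRNA-like."""
    return longest_orf(seq) < max_codons


print(longest_orf("CCATGTTTGGGTAACC"))  # 3
```

Real mRNA-like ncRNA screens also weigh conservation of the ORF and coding potential; ORF length alone is only a first filter.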

Page 36

September 2003 saw the birth of the ENCODE project (the Encyclopedia of DNA Elements), whose goal was to identify and document the functional elements within the genome using high-throughput methods. Thirty-five groups took part in the project, bringing expertise that ranged from genome annotation to RNA-expression analysis to comparative genomics. Their analysis of 1% of the human genome (distributed among 44 genomic regions) resulted in more than 200 experimental and computational data sets. Some of the most striking results concern transcription and its regulation.

ENCODE pilot project

[1] Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature, 2007, 447: 799.
[2] The ENCODE Project Consortium. Science, 2004, 306: 636.
[3] Note: for details of ENCODE, see http://www.genome.gov/ENCODE

Page 37

We learn that as much as 93% of the interrogated region can be transcribed, indicating that transcription is not confined to what we (for now) identify as genes. Many transcripts are non-coding, whereas others seem to form fusion transcripts between ORFs that had previously been annotated as distinct.

A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.

Page 38

2. Predicted noncoding genes on human chromosome 3

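Producing the finished sequence map of the human genome is an important step in genome research, and internationally the work is proceeding one chromosome at a time. Finished maps of nine chromosomes (6, 7, 13, 14, 19, 20, 21, 22 and Y) have now been completed, and all have been published in Nature. We took part in the finishing of human chromosome 3, led by Baylor College of Medicine (USA), and were specifically responsible for ncRNA gene annotation. For this work we built a software package for identifying ncRNAs. The paper was published in Nature 440: 1194-1198 (2006).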

Page 39

Page 40

Noncoding genes found on human chromosome 3 (RNA class: total number; number found by each method)

snRNA: 83 (H.F. 72; RFAM 80)
Y RNA: 46 (H.F. 35; RFAM 45)
snoRNA (C/D box): 21 (H.F. 6; RFAM 4; snoScan 17)
snoRNA (H/ACA box): 22 (H.F. 12; Fisher 8; C.M 8)
tRNA: 13 (H.F. 10; tRNAscan 9)
SRP RNA: 17 (H.F. 1; SRP RNA Scan 16)
miRNA: 3 (H.F. 1; RFAM 3)
mRNA-like ncRNA: 481 (H.F.-FANTOM 1; H.F.-FLJ/H-InV 452; Unigene Filter 28)
rRNA: 10 (H.F. 3; RFAM 7)
telomerase RNA: 1 (H.F. 3)
7SK RNA: 3 (H.F. 3)
snmRNA: 2 (H.F. 2)
scaRNA: 1 (H.F. 1)
Total: 713 (872 redundant)
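The gap between the redundant and non-redundant totals comes from loci predicted by more than one method. One way to count both is to take every prediction as-is for the redundant figure and to merge overlapping predictions into single loci for the non-redundant one. The sketch below assumes overlap-based merging; the function name, chromosome label, and coordinates are hypothetical, and the method labels simply reuse names from the table.

```python
from typing import Dict, List, Tuple

Prediction = Tuple[str, str, int, int]  # (method, chromosome, start, end), half-open


def count_loci(predictions: List[Prediction]) -> Tuple[int, int]:
    """Return (redundant, non_redundant) counts; overlapping predictions merge into one locus."""
    redundant = len(predictions)
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for _method, chrom, start, end in predictions:
        by_chrom.setdefault(chrom, []).append((start, end))
    non_redundant = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_end = None
        for start, end in intervals:
            if cur_end is None or start >= cur_end:
                non_redundant += 1          # a new locus begins
                cur_end = end
            else:
                cur_end = max(cur_end, end)  # same locus, found again by another method
    return redundant, non_redundant


preds = [("RFAM", "chr3", 100, 200), ("H.F.", "chr3", 150, 220),
         ("snoScan", "chr3", 500, 600), ("RFAM", "chr3", 700, 800)]
print(count_loci(preds))  # (4, 3)
```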

Page 41

3. Built the noncoding RNA database NONCODE

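We collected all experimentally verified ncRNA genes published in journals or released on websites, developed the corresponding software and search tools, and built an ncRNA database. The related paper has been submitted to Nucleic Acids Research. Korea has asked to host a mirror of the database. In just over two months online, the database has received more than 120,000 hits (about 2,000 per day on average) from about 60,000 different IP addresses.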

The paper was published in the first issue of Nucleic Acids Research in 2005.

Page 42

ABSTRACT: NONCODE is an integrated knowledge database dedicated to non-coding RNAs (ncRNAs), that is to say, RNAs that function without being translated into proteins. All ncRNAs in NONCODE were filtered automatically from the literature and GenBank, and were later manually curated. The distinctive features of NONCODE are as follows: (i) the ncRNAs in NONCODE include almost all types of ncRNAs, except transfer RNAs and ribosomal RNAs. (ii) All ncRNA sequences and their related information (e.g. function, cellular role, cellular location, chromosomal information, etc.) in NONCODE have been confirmed manually by consulting relevant literature: more than 80% of the entries are based on experimental data. (iii) Based on the cellular process and function in which a given ncRNA is involved, we introduced a novel classification system, labeled process function class, to integrate existing classification systems. (iv) In addition, some 1100 ncRNAs have been grouped into nine other classes according to whether they are specific to gender or tissue or associated with tumors and diseases, etc. (v) NONCODE provides a user-friendly interface, a visualization platform and a convenient search option, allowing efficient recovery of sequence, regulatory elements in the flanking sequences, secondary structure, related publications and other information. The first release of NONCODE (v1.0) contains 5339 non-redundant sequences from 861 organisms, including eukaryotes, eubacteria, archaebacteria, viruses and viroids. Access is free for all users through a web interface at http://noncode.bioinfo.org.cn.

Page 43

ncRNA database

Page 44

Page 45

Thank you all!

Page 46