Prof. Michal Ziv-Ukelson

Department of Computer Science, Ben-Gurion University of the Negev

michaluz@cs.bgu.ac.il

http://www.cs.bgu.ac.il/~negevcb/index.php

String Algorithms

Computational Biology

Structural RNAomics

Outline

• RNA and its structure.

• The “RNA Revolution” and principles of regulation by non-coding RNAs

• RNA structure implies function: deciphering the evidence

What is RNA?

• A biological molecule composed of a sequence over four types of building blocks, called bases or nucleotides.

• The different base types are denoted by the letters A, G, C, and U.

• RNA bases: A, C, G, U

• Canonical base pairs

– A-U (2 hydrogen bonds)

– G-C (3 hydrogen bonds, more stable)

– G-U (“wobble” pairing)

– Each base can pair with at most one other base.
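The pairing rules above can be written down directly. The following is a minimal Python sketch, not part of the original slides; the names `CANONICAL_PAIRS`, `can_pair`, and `is_valid_pairing` are illustrative assumptions. It checks that every proposed pair is canonical and that no base is used in more than one pair.

```python
# Canonical RNA base pairs from the slide: A-U, G-C, and the G-U wobble pair.
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def can_pair(x, y):
    """Return True if bases x and y form a canonical pair."""
    return (x.upper(), y.upper()) in CANONICAL_PAIRS


def is_valid_pairing(seq, pairs):
    """Check a list of 0-based (i, j) index pairs against the slide's rules.

    Every pair must be canonical, and each base may take part in at most
    one pair.
    """
    used = set()
    for i, j in pairs:
        if i == j or i in used or j in used:
            return False
        if not can_pair(seq[i], seq[j]):
            return False
        used.update((i, j))
    return True
```

For example, `is_valid_pairing("GGGAAACCC", [(0, 8), (1, 7), (2, 6)])` accepts a three-pair stem, while reusing position 0 in two pairs is rejected.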

RNA Structure, Dimensions 1-3: Folding

RNA quaternary structure:

• microRNA:mRNA interaction (Wang et al., PLoS CB 2010)

• Bicoid mRNA dimerization (Ferrandon et al., EMBO 1997)

RNA Structure, Dimension 4: Self-Dimerization and RNA-RNA Interactions

Zhang W, Chen S. PNAS 2002;99:1931-1936

©2002 by National Academy of Sciences

RNA Structure, Dimension 5: Folding Dynamics

Outline

• RNA and its structure.

• The “RNA Revolution” and principles of regulation by non-coding RNAs

• RNA Structure prediction from sequences

> DNA sequence

AATTCATGAAAATCGTATACTGGTCTGGTACCGG

CAACACTGAGAAAATGGCAGAGCTCATCGCTAAA

GGTATCATCGAATCTGGTAAAGACGTCAACACCA

TCAACGTGTCTGACGTTAACATCGATGAACTGCT

GAACGAAGATATCCTGATCCTGGGTTGCTCTGCC

ATGGGCGATGAAGTTCTCGAGGAAAGCGAATTTG

Gene Function

> Protein sequence

MKIVYWSGTGNTEKMAELIAKGIIES

GKDVNTINVSDVNIDELLNEDILILGC

SAMGDEVLEESEFEPFIEEISTKISG

KKVALFGSYGWGDGKWMRDFEER

MNGYGCVVVETPLIVQNEPDEAEQD

CIEFGKKIANI

The Central Dogma of Molecular Biology

DNA → RNA → Protein

Genome: the digital backbone of molecular biology

Transcripts: perform functions encoded in the genome

What are RNA and mRNA?

• Traditional role as messenger molecule (mRNA)

• RNA: a polymer of nucleotides A,U,C,G.

CS374 Stanford

RNA

AUUGCCGAUGACGGCAGUGAUGUAGUA

Down Regulation by mRNA Silencing

Up Regulation by mRNA stabilization

RNA

AUUGCCGAUGACGGCAGUGAUGUAGU

binding site

The Central Dogma of Molecular Biology

DNA: CCTGAGCCAACTATTGATGAA

↓ transcription

RNA: CCUGAGCCAACUAUUGAUGAA

↓ translation

Protein: PEPTIDE
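The two steps on this slide can be reproduced in a few lines. The following is a minimal Python sketch, assuming the standard genetic code and reading from the first base with no start-codon search; the function names `transcribe` and `translate` are illustrative, not from the slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def transcribe(dna):
    """DNA (coding strand) to RNA: replace every T with U."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate RNA codon by codon from position 0, stopping at a stop codon.

    A trailing incomplete codon is ignored.
    """
    dna = rna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


rna = transcribe("CCTGAGCCAACTATTGATGAA")
print(rna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(rna))  # PEPTIDE
```

The slide's example sequence is chosen so that its seven codons spell out the one-letter amino acid codes P, E, P, T, I, D, E.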

Non-Coding RNA

- An RNA molecule that is not translated into a protein

- Non-coding RNAs have been found to play roles in a great variety of processes

The Central Dogma of Biology

DNA → RNA → PROTEINS

Ron Unger – Bar-Ilan University 2009

RNA: the molecule of the year 2002.

Couzin J. (2002). Breakthrough of the Year: Small RNAs Make Big Splash. Science 298, 2296.

Andrew Z. Fire, Craig C. Mello

The Nobel Prize in Physiology or Medicine 2006

"for their discovery of RNA interference - gene silencing by double-stranded RNA"

A Recent Example

DNA → RNA → Protein

AUUGCCGAUGACGGCAGUGAUGUAGUA

CCGUCAC

Shutting down a gene via hybridization between an mRNA and a complementary small RNA, which prevents the mRNA from being translated into a protein.

Anti-Sense: RNA complementarity yields gene silencing

Interpretation: the light-blue rectangle symbolizes the ribosome, the gray cloverleaf represents the tRNA, and the green circle represents an amino acid.


AUUGCCGAUGACGGCAGUGAUGUAGUA

CCGUCAC

However, single-stranded antisense fragments are unstable and are digested by proteins, so they seem not to be very useful as a therapy.

Anti-Sense: RNA complementarity yields gene silencing
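The complementarity behind antisense silencing can be checked mechanically. The following is a minimal Python sketch under one stated assumption: the small RNA is written as drawn on the slide, directly underneath its target, so it is read in the opposite direction to the mRNA and a position-by-position complement gives the target site. The function names are illustrative, and only strict Watson-Crick pairs are considered.

```python
# Strict Watson-Crick complement for RNA (wobble pairs are ignored here).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement(rna):
    """Position-by-position complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in rna)


def find_antisense_sites(mrna, antisense_as_drawn):
    """Return all 0-based start positions on the mRNA that the small RNA covers.

    Assumption: antisense_as_drawn is written under its target, so the
    target site is simply its complement, with no reversal needed.
    """
    target = complement(antisense_as_drawn)
    sites = []
    start = mrna.find(target)
    while start != -1:
        sites.append(start)
        start = mrna.find(target, start + 1)
    return sites


mrna = "AUUGCCGAUGACGGCAGUGAUGUAGUA"
print(find_antisense_sites(mrna, "CCGUCAC"))  # [12]
```

With the slide's sequences, the small RNA CCGUCAC covers the mRNA segment GGCAGUG, which starts at 0-based position 12.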

Down Regulation by mRNA degradation

[Animation frames: RISC carries the small RNA GCUACUG to its complementary binding site (CGAUGAC) on the mRNA AUUGCCGAUGACGGCAGUGAUGUAGUA. In the final frames only the mRNA fragment AUUGCCGAUGAC remains.]

Types of RNA molecules (a partial list)

Long non-coding RNAs

http://rfam.sanger.ac.uk/

In which processes do ncRNA molecules participate?

• Translation (tRNA and rRNA)

• Ribosome maturation and RNA processing (snRNA and snoRNA)

• Splicing (U1, U2, U4, U5)

• Replication (telomerase RNA)

• Gene regulation (miRNA, siRNA)

• Editing (RNA editing, e.g. serotonin receptor)

• Protein translocation (SRP RNA)

• Fighting pathogens (vRNA, CRISPR)

• Translation quality control (tmRNA).

Regulation by noncoding RNA

• A possible explanation for the complexity puzzle: in terms of the protein repertoire, there is no great difference between worms and humans.

• 95% of the expressed sequence is not translated into proteins: introns, intergenic regions, repetitive sequences.

• In general, the length and number of introns are related to the complexity of the organism:

– In simple eukaryotes, 10-20% of transcripts have introns, about 100 bp long.

– In plants, 50% of transcripts have introns, with an average length of 500 bp.

– In nematodes and flies the average length is also 500 bp.

– In humans the average intron length is 3400 bp.

• Many ncRNAs reside within introns.

• The cell devotes many mechanisms to handling RNA.

• Between human and dog there are about 400,000 conserved sequence segments with an average length of 456 bp; more than half of them are not associated with protein-coding genes.

• ncRNAs may be linked to regulatory mechanisms that contribute to the complexity of organisms.

Advantages of regulation by ncRNA

• Regulation by ncRNA enables flexible and efficient control, just as in modern communication systems the control lines are separate from the data lines.

• Such regulation makes it possible, for example, to perform real-time updates without having to produce new molecules.

• Such regulation also makes it possible, for example, to synchronize activity: when a protein-coding gene is transcribed, ncRNA molecules located in its introns can report on the production of the protein.

Mattick J. Non-coding RNAs: the architects of eukaryotic complexity. EMBO Reports, 2(11):986-991, 2001.

Why is it hard to locate ncRNA molecules in the genome?

• We lack good experimental tools for locating such molecules, especially those that are short-lived and present in small quantities.

• From a bioinformatics standpoint, the problem is that, unlike proteins, where we can rely on signals such as start and stop codons, codon periodicity, promoters, etc., we know of no such general signals for ncRNA.

So what do we have?

• Secondary structure prediction

• Conservation (of sequence and structure) across organisms

• Signals specific to certain families
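To make the contrast concrete, here is a minimal Python sketch of the kind of signal scan that is available for protein-coding genes: it looks for open reading frames that begin with a start codon (ATG) and end with a stop codon. It is an illustration only; the function name `find_orfs`, the `min_codons` threshold, and the restriction to the forward strand are assumptions, not something taken from the slides. No comparably simple scan is known for ncRNA genes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=10):
    """Return (start, end) spans of ATG...stop ORFs on the forward strand.

    start is the 0-based index of the ATG; end is one past the stop codon.
    Only ORFs with at least min_codons codons before the stop are reported.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # resume scanning after the stop
    return orfs
```

For instance, `find_orfs("CC" + "ATG" + "GCT" * 10 + "TAA" + "GG")` reports the single span `(2, 38)`.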

Outline

• RNA and its structure.

• The “RNA Revolution” and principles of regulation by non-coding RNAs

• RNA structure implies function: deciphering the evidence

Why is RNA Structure Interesting?

Accessible RNA binding sites

Binding site accessibility

A motif has to be accessible to binding

Binding site accessibility

A target-site motif accessible to binding

[Figure: RISC with the small RNA GCUACUG at the binding site of the mRNA fragment AUUGCCGAUGAC. Caption: Down Regulation by mRNA degradation.]
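One simple way to make "accessible" concrete is to ask how much of a binding site is unpaired in a given secondary structure. The following is a minimal Python sketch under stated assumptions: the structure is supplied in dot-bracket notation (a convention not used in the slides, where `.` marks an unpaired base), the example structure is hypothetical, and the function name is illustrative. Dedicated tools estimate accessibility from the whole ensemble of structures rather than from a single one.

```python
def site_accessibility(structure, start, length):
    """Fraction of unpaired positions within a site of a dot-bracket structure.

    structure: string over '(', ')' and '.', one character per base.
    start, length: 0-based start and length of the candidate binding site.
    """
    site = structure[start:start + length]
    if length <= 0 or len(site) != length:
        raise ValueError("site must lie entirely within the structure")
    return site.count(".") / length


# Hypothetical structure: a small hairpin followed by a single-stranded tail.
structure = "(((....)))......."
print(site_accessibility(structure, 10, 7))  # 1.0
```

A site inside the single-stranded tail scores 1.0, while a site overlapping the stem scores lower, which matches the slide's point that a motif buried in a helix is not available for binding.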

RNA Structure Prediction

Challenges?

[Animation frames. Labels: Gene; An inverted repeat; Dicer; A target gene; Conserved sequence in stem of “hairpin” structure.]

Witness 3: Structural (Co-evolutionary) Conservation

• Hairpin structure adapted for recognition by Drosha/Dicer.

• Within the hairpin structure, the conserved sequence preserves complementarity with the target site.
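An inverted repeat is what allows a transcript to fold back on itself into a hairpin stem. The following is a minimal Python sketch of a brute-force scan for such candidates; the function name and the parameters `stem_len`, `min_loop`, and `max_loop` are illustrative assumptions, and a real hairpin finder would also score stability and tolerate bulges and mismatches.

```python
# Pairs allowed in a stem: Watson-Crick plus the G-U wobble pair.
PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def find_hairpins(seq, stem_len=5, min_loop=3, max_loop=10):
    """Find perfect stems of stem_len pairs enclosing a loop.

    Returns (arm5_start, arm3_start, loop_len) triples, where the 5' arm
    seq[arm5_start:arm5_start + stem_len] pairs with the 3' arm read
    backwards (the inverted repeat).
    """
    hits = []
    n = len(seq)
    for i in range(n - 2 * stem_len - min_loop + 1):
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop            # start of the 3' arm
            if j + stem_len > n:
                break
            if all((seq[i + k], seq[j + stem_len - 1 - k]) in PAIRS
                   for k in range(stem_len)):
                hits.append((i, j, loop))
    return hits


print(find_hairpins("GGCAUAAAAAUGCC"))  # [(0, 9, 4)]
```

In the example, the arm GGCAU at position 0 pairs with AUGCC at position 9 across a four-base loop.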

Problem 1a: Find these

Problem 1b: How do these fold?

Problem 2a: How to predict these?

Problem 2b: What are the targets these bind to?

How to Solve Problem 1a

Find these

“GGUAU” “CCGUA”

GGUAU

CCGUA

[Mandal et al., 2003] predicted a potential pseudoknot between the two arms of the purine riboswitch aptamer.

Structural Cis-Elements: Purine Riboswitch

Three Structural Witnesses for RNA functionality

Witness 1: Structure Stability.

Witness 2: Conserved Structure.

Witness 3: Conserved Sequence (within its structural context).

AUCCCCGUAUCGAUC

AAAAUCCAUGGGUAC

CCUAGUGAAAGUGUA

UAUACGUGCUCUGAU

UCUUUACUGAGGAGU

CAGUGAACGAACUGA

Witness 1: Stability of Structure

Measurements of stability:

• Max base pairs

• Minimum free energy

• Partition function

• Statistical scores based on SCFGs/machine learning

Witness 2: Conserved Structure

Corresponding base pairs in the two species:

Lactobacillus acidophilus    Lactobacillus delbrueckii

G-U                          U-A

G-C                          C-G

U-A                          C-G
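In each of the three examples the bases differ between the two species, yet both versions still form a canonical pair, which is the signature of a conserved structure. The following is a minimal Python sketch that classifies such changes; the function name and the labels "identical", "consistent" (one side changed), "compensatory" (both sides changed), and "disrupted" are terminology chosen for this sketch, not taken from the slides.

```python
CANONICAL = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def classify_pair_change(pair_a, pair_b):
    """Compare the base pair at the same stem position in two species.

    pair_a, pair_b: (5' base, 3' base) tuples for species A and species B.
    """
    if pair_a not in CANONICAL or pair_b not in CANONICAL:
        return "disrupted"                 # pairing lost in at least one species
    changed = sum(x != y for x, y in zip(pair_a, pair_b))
    return ("identical", "consistent", "compensatory")[changed]


# The three pairs shown for L. acidophilus versus L. delbrueckii.
for a, b in [(("G", "U"), ("U", "A")),
             (("G", "C"), ("C", "G")),
             (("U", "A"), ("C", "G"))]:
    print(a, b, classify_pair_change(a, b))  # compensatory in all three cases
```

All three slide examples come out as compensatory: both bases changed, and the pairing survived.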

Lactobacillus acidophilus Lactobacillus delbrueckii

CAUCUUUGA CAUCUCUGA

Witness 3: Conserved Sequence (within its structural context)

Sequence and structure conservation in the context of imprinted structural conservation

Lactobacillus acidophilus Lactobacillus delbrueckii

GGUAU

CCGUA

GGUAU

CCGUA

Three Structural Witnesses for RNA functionality

Witness 1: Structure Stability.

Witness 2: Conserved Structure.

Witness 3: Conserved Sequence (within its structural context).

AUCCCCGUAUCGAUC

AAAAUCCAUGGGUAC

CCUAGUGAAAGUGUA

UAUACGUGCUCUGAU

UCUUUACUGAGGAGU

CAGUGAACGAACUGA

Witness 1: Stability of Structure (2D, predicted)

RNA Secondary Structure Prediction in O(N^3) time:

[Nussinov-Jacobson 1980, Zuker-Stiegler 1981]

MFOLD: http://www.rpi.edu/~zukerm

Vienna RNA Package: http://www.tbi.univie.ac.at/~ivo/RNA

Nussinov Algorithm
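The following is a minimal Python sketch of Nussinov-style base-pair maximization, the "Max Base Pairs" measure of stability. It is an illustration under stated assumptions, not the lecture's own code: the pairing set includes the G-U wobble pair, and the `min_loop` parameter (minimum number of unpaired bases enclosed by a pair, default 3) is an added convenience that is not part of the basic formulation. The table has O(N^2) cells and each cell scans O(N) split points, which gives the O(N^3) running time.

```python
PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def nussinov(seq, min_loop=3):
    """Return (max_pairs, dot_bracket) for an RNA sequence."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = maximum number of non-crossing pairs within seq[i..j].
    dp = [[0] * n for _ in range(n)]

    def closes(i, j):
        """True if i and j may pair and enclose at least min_loop bases."""
        return j - i > min_loop and (seq[i], seq[j]) in PAIRS

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = 0
            if closes(i, j):
                best = dp[i + 1][j - 1] + 1          # i pairs with j
            for k in range(i, j):                     # split into two parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Traceback: recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if closes(i, j) and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
            continue
        for k in range(i, j):
            if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                stack.append((i, k))
                stack.append((k + 1, j))
                break
    return dp[0][n - 1], "".join(structure)


print(nussinov("GGGAAACCC"))  # (3, '(((...)))')
```

Maximizing the pair count is only the crudest stability measure; the minimum free energy approach of Zuker and Stiegler uses a dynamic program of the same shape with thermodynamic energy terms in place of the pair count.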
