Comparative genomics of RNA regulatory elements Mikhail Gelfand Research and Training Center “Bioinformatics” Institute for Information Transmission Problems Moscow, Russia September 2006


Page 1: Comparative genomics of RNA regulatory elements

Comparative genomics of RNA regulatory elements

Mikhail Gelfand

Research and Training Center “Bioinformatics”

Institute for Information Transmission Problems Moscow, Russia

September 2006

Page 2: Comparative genomics of RNA regulatory elements

Riboflavin biosynthesis pathway

[Figure: riboflavin biosynthesis pathway diagram]

Precursors: GTP (purine biosynthesis pathway); ribulose-5-phosphate (pentose-phosphate pathway)

Genes shown: ribA, ribB, ribD, ribG, ribH, ribE, ypaA

Enzymes shown: GTP cyclohydrolase II; 3,4-DHBP synthase; pyrimidine deaminase; pyrimidine reductase; riboflavin synthase (two chains)

Intermediates: 2,5-diamino-6-hydroxy-4-(5'-phosphoribosylamino)pyrimidine; 5-amino-6-(5'-phosphoribosylamino)uracil; 5-amino-6-(5'-phosphoribitylamino)uracil; 3,4-dihydroxy-2-butanone-4-phosphate; 6,7-dimethyl-8-ribityllumazine

Product: riboflavin

Page 3: Comparative genomics of RNA regulatory elements

5’ UTR regions of riboflavin genes from various bacteria 1 2 2’ 3 Add. 3’ Variable 4 4’ 5 5’ 1’ =========> ==> <== ===> -><- <=== -> <- ====> <==== ==> <== <========= BS TTGTATCTTCGGGG-CAGGGTGGAAATCCCGACCGGCGGT 21 AGCCCGTGAC-- 8 4 8 -----TGGATTCAGTTTAA-GCTGAAGCCGACAGTGAA-AGTCTGGAT-GGGAGAAGGATGAT BQ AGCATCCTTCGGGG-TCGGGTGAAATTCCCAACCGGCGGT 19 AGTCCGTGAC-- 8 5 8 -----TGGATCTAGTGAAACTCTAGGGCCGACAGT-AT-AGTCTGGAT-GGGAGAAGGATATG BE TGCATCCTTCGGGG-CAGGGTGAAATTCCCGACCGGCGGT 20 AGCCCGCGA--- 3 4 3 -----AGGATCCGGTGCGATTCCGGAGCCGACAGT-AT-AGTCTGGAT-GGGAGAAGGATGCC HD TTTATCCTTCGGGG-CTGGGTGGAAATCCCGACCGGCGGT 19 AGTCCGTGAC-- 10 4 10 ----–TGGACCTGGTGAAAATCCGGGACCGACAGTGAA-AGTCTGGAT-GGGAGAAGGAAACG Bam TGTATCCTTCGGGG-CTGGGTGAAAATCCCGACCGGCGGT 23 AGCCCGTGAC-- 8 4 8 ----–TGGATTCAGTGAAAAGCTGAAGCCGACAGTGAA-AGTCTGGAT-GGGAGAAGGATGAG CA GATGTTCTTCAGGG-ATGGGTGAAATTCCCAATCGGCGGT 2 AGCCCGCAA--- 3 4 3 ------AGATCCGGTTAAACTCCGGGGCCGACAGTTAA-AGTCTGGAT-GAAAGAAGAAATAG DF CTTAATCTTCGGGG-TAGGGTGAAATTCCCAATCGGCGGT 2 AGCCCGCG---- 7 6 7 --------ATTTGGTTAAATTCCAAAGCCGACAGT-AA-AGTCTGGAT-GGAAGAAGATATTT SA TAATTCTTTCGGGG-CAGGGTGAAATTCCCAACCGGCAGT 6 AGCCTGCGAC-- 11 3 11 ----–CTGATCTAGTGAGATTCTAGAGCCGACAGTTAA-AGTCTGGAT-GGGAGAAAGAATGT LLX ATAAATCTTCAGGG-CAGGGTGTAATTCCCTACCGGCGGT 2 AGCCCGCGA--- 4 4 4 -----ATGATTCGGTGAAACTCCGAGGCCGACAGT-AT-AGTCTGGAT-GAAAGAAGATAATA PN AACTATCTTCAGGG-CAGGGTGAAATTCCCTACCGGTGGT 2 AGCCCACGA--- 3 4 3 -----ATGATTTGGTGAAATTCCAAAGCCGACAGT-AT-AGTCTGGAT-GAAAGAAGATAAAA TM AAACGCTCTCGGGG-CAGGGTGGAATTCCCGACCGGCGGT 3 AGCCCGCGAG-- 5 4 5 ----–TTGACCCGGTGGAATTCCGGGGCCGACGGTGAA-AGTCCGGAT-GGGAGAGAGCGTGA DR GACCTCTTTCGGGG-CGGGGCGAAATTCCCCACCGGCGGT 15 AGCCCGCGAA-- 8 12 9 ----–CCGATGCCGCGCAACTCGGCAGCCGACGGTCAC-AGTCCGGAC-GAAAGAAGGAGGAG TQ CACCTCCTTCGGGG-CGGGGTGGAAGTCCCCACCGGCGGT 3 AGCCCGCGAA-- 5 4 5 -----CCGACCCGGTGGAATTCCGGGGCCGACGGTGAA-AGTCCGGAT-GGGAGAAGGAGGGC AO AATAATCTTCAGGG-CAGGGTGAAATTCCCGATCGGCGGT 2 AGTCCGCGA--- 7 7 7 -----AGGAACCGGTGAGATTCCGGTACCGACAGT-AT-AGTCTGGAT-GGAAGAAGATGAAA DU 
TTTAATCTTCAGGG-CAGGGTGAAATTCCCGATCGGTGGT 2 AGTCCGCGA--- 13 4 12 -----AGGAACTAGTGAAATTCTAGTACCGACAGT-AT-AGTCTGGAT-GGAAGAAGAGCAGA CAU GAAGACCTTCGGGG-CAAGGTGAAATTCCTGATCGGCGGT 20 AGCCCGCGA--- 3 4 3 -----AGGACCCGGTGTGATTCCGGGGCCGACGGT-AT-AGTCCGGAT-GGGAGAAGGTCGGC FN TAAAGTCTTCAGGG-CAGGGTGAAATTCCCGACCGGTGGT 2 AGTCCACG---- 5 4 5 -------GATTTGGTGAAATTCCAAAACCGACAGT-AG-AGTCTGGAT-GGGAGAAGAATTAG TFU ACGCGTGCTCCGGG-GTCGGTGAAAGTCCGAACCGGCGGT 3 AGTCCGCGAC-- 8 5 8 -----TGGAACCGGTGAAACTCCGGTACCGACGGTGAA-AGTCCGGAT-GGGAGGTAGTACGTG SX -AGCGCACTCCGGG-GTCGGTGAAAGTCCGAACCGGCGGT 3 AGTCCGCGAC-- 8 5 8 -----TTGACCAGGTGAAATTCCTGGACCGACGGTTAA-AGTCCGGAT-GGGAGGCAGTGCGCG BU GTGCGTCTTCAGGG-CGGGGTGAAATTCCCCACCGGCGGT 30 AGCCCGCGAGCG 137 GTCAGCAGATCTGGTGAGAAGCCAGAGCCGACGGTTAG-AGTCCGGAT-GGAAGAAGATGTGC BPS GTGCGTCTTCAGGG-CGGGGCGAAATTCCCCACCGGCGGT 21 AGCCCGCGAGCG 8 4 8 GTCAGCAGATCTGGTCCGATGCCAGAGCCGACGGTCAT-AGTCCGGAT-GAAAGAAGATGTGC REU TTACGTCTTCAGGG-CGGGGTGCAATTCCCCACCGGCGGT 31 AGCCCGCGAGCG 7 5 7 GTCAGCAGATCTGGTGAGAGGCCAGGGCCGACGGTTAA-AGTCCGGAT-GAAAGAAGATGGGC RSO GTACGTCTTCAGGG-CGGGGTGGAATTCCCCACCGGCGGT 21 AGCCCGCGAGCG 11 3 11 GTCAGCAGATCCGGTGAGATGCCGGGGCCGACGGTCAG-AGTCCGGAT-GGAAGAAGATGTGC EC GCTTATTCTCAGGG-CGGGGCGAAATTCCCCACCGGCGGT 17 AGCCCGCGAGCG 8 4 8 GACAGCAGATCCGGTGTAATTCCGGGGCCGACGGTTAG-AGTCCGGAT-GGGAGAGAGTAACG TY GCTTATTCTCAGGG-CGGGGCGAAATTCCCCACCGGCGGT 67 AGCCCGCGAGCG 8 3 8 GTCAGCAGATCCGGTGTAATTCCGGGGCCGACGGTTAA-AGTCCGGAT-GGGAGAGGGTAACG KP GCTTATTCTCAGGG-CGGGGCGAAATTCCCCACCGGCGGT 20 AGCCCGCGAGCG 8 4 8 GTCAGCAGATCCGGTGTAATTCCGGGGCCGACGGTTAA-AGTCCGGAT-GGGAGAGAGTAACG HI TCGCATTCTCAGGG-CAGGGTGAAATTCCCTACCGGTGGT 2 AGCCCACGAGCG 26 9 30 GTCAGCAGATTTGGTGAAATTCCAAAGCCGACAGT-AA-AGTCTGGAT-GAAAGAGAATAAAA VK GCGCATTCTCAGGG-CAGGGTGAAATTCCCTACCGGTGGT 14 AGCCCACGAGCG 11 9 11 GTCAGCAGATTTGGTGAGAATCCAAAGCCGACAGT-AT-AGTCTGGAT-GAAAGAGAATAAGC VC CAATATTCTCAGGG-CGGGGCGAAATTCCCCACCGGTGGT 13 AGCCCACGAGCG 5 4 5 GTCAGCAGATCTGGTGAGAAGCCAGGGCCGACGGTTAC-AGTCCGGAT-GAGAGAGAATGACA YP GCTTATTCTCAGGG-CGGGGTGAAAGTCCCCACCGGCGGT 
40 AGCCCGCGAGCG 16 6 16 GTCAGCAGACCCGGTGTAATTCCGGGGCCGACGGTTAT-AGTCCGGAT-GGGAGAGAGTAACG AB GCGCATTCTCAGGG-CAGGGTGAAAGTCCCTACCGGTGGT 25 AGCCCACGAGCG 16 4 27 GTCAGCAGATTTGGTGCGAATCCAAAGCCGACAGTGAC-AGTCTGGAT-GAAAGAGAATAAAA BP GTACGTCTTCAGGG-CGGGGTGCAATTCCCCACCGGCGGT 18 AGCCCGCGAGCG 10 4 10 GTCAGCAGACCTGGTGAGATGCCAGGGCCGACGGTCAT-AGTCCGGAT-GAGAGAAGATGTGC AC ACATCGCTTCAGGG-CGGGGCGTAATTCCCCACCGGCGGT 16 AGCCCGCGAGCA 10 3 11 ---CGCAGATCTGGTGTAAATCCAGAGCCGACGGT-AT-AGTCCGGAT-GAAAGAAGACGACG Spu AACAATTCTCAGGG-CGGGGTGAAACTCCCCACCGGCGGT 34 AGCCCGCGAGCG 6 6 6 GTCAGCAGATCTGGTG 52 TCCAGAGCCGACGGT 31 AGTCCGGAT-GGAAGAGAATGTAA PP GTCGGTCTTCAGGG-CGGGGTGTAAGTCCCCACCGGCGGT 13 AGCCCGCGAGCG 7 3 7 GTCAGCAGATCTGGTGCAACTCCAGAGCCGACGGTCAT-AGTCCGGAT-GAAAGAAGGCGTCA AU GGTTGTTCTCAGGG-CGGGGTGCAATTCCCCACCGGCGGT 17 AGCCCGCGAGCG 7 9 7 GTCAGCAGATCCGGTGAGAGGCCGGAGCCGACGGT-AT-AGTCCGGAT-GGAAGAGGACAAGG PU AAACGTTCTCAGGG-CGGGGTGCAATTCCCCACCGGCGGT 19 AGCCCGCGAGCG 19 4 18 GTCAGCAGACCCGGTGTGATTCCGGGGCCGACGGTCAC-AGTCCGGATGAAGAGAGAACGGGA PY TAACGTTCTCAGGG-CGGGGTGCAACTCCCCACCGGCGGT 19 AGCCCGCGAGCG 15 4 16 GTCAGCAGACCCGGTGTGATTCCGGGGCCGACGGTCAT-AGTCCGGATGAAGAGAGAGCGGGA PA TAACGTTCTCAGGG-CGGGGTGAAAGTCCCCACCGGCGGT 19 AGCCCGCGAGCG 14 4 13 GTCAGCAGACCCGGTGCGATTCCGGGGCCGACGGTCAT-AGTCCGGATAAAGAGAGAACGGGA MLO TAAAGTTCTCAGGG-CGGGGTGAAAGTCCCCACCGGCGGT 16 AGCCCGCGAGCG 8 5 8 GTCAGCAGATCCGGTGTGATTCCGGAGCCGACGGTTAG-AGTCCGGAT-GAAAGAGGACGAAA SM AAGCGTTCTCAGGG-CGGGGTGAAATTCCCCACCGGCGGT 34 AGCCCGCGAGCG 8 3 8 GTCAGCAGATCCGGTCGAATTCCGGAGCCGACGGTTAT-AGTCCGGAT-GGAAGAGAGCAAGC BME GCTTGTTCTCGGGG-CGGGGTGAAACTCCCCACCGGCGGT 17 AGCCCGCGAGCG 10 15 10 GTCAGCAGATCCGGTGAGATGCCGGAGCCGACGGTTAA-AGTCCGGAT-GGAAGAGAGCGAAT BS ATCAATCTTCGGGG-CAGGGTGAAATTCCCTACCGGCGGT 18 AGCCCGCGA--- 5 4 5 -----AGGATTCGGTGAGATTCCGGAGCCGACAGT-AC-AGTCTGGAT-GGGAGAAGATGGAG BQ GTCTATCTTCGGGG-CAGGGTGAAAATCCCGACCGGCGGT 27 AGCCCGCGA—-- 3 5 3 -----AGGATTTGGTGTGATTCCAAAGCCGACAGT-AT-AGTCTGGAT-GGGAGAAGATGGAG BE ATTCATCTTCGGGG-CAGGGTGAAATTCCCGACCGGCGGT 20 AGCCCGCGA--- 3 4 3 
-----AGGATCCGGTGCGAGTCCGGAGCCGACAGT-AT-AGTCTGGAT-GGGAGAAGATGAAG CA AATGATCTTCAGGG-CAGGGTGAAATTCCCTACCGGCGGT 2 AGCCCGCGAG-- 3 4 3 ----TATGATCCGGTTTGATTCCGGAGCCGACAGT-AA-AGTCTGGAT-GAAAGAAGATATAT DF GAAGATCTTCGGGG-CAGGGTGAAATTCCCTACCGGCGGT 2 AGCCCGCG---- 6 4 6 -------GATTTGGTGAGATTCCAAAGCCGACAGT-AA-AGTCTGGAT-GAGAGAAGATATTT EF GTTCGTCTTCAGGGGCAGGGTGTAATTCCCGACCGGTGGT 3 AGTCCACGAC-- 5 3 5 ----ATTGAATTGGTGTAATTCCAATACCGACAGT-AT-AGTCTGGAT—-AAAGAAGATAGGG LLX AAATATCTTCAGGG-CACCGTGTAATTCGGGACCGGCGGT 21 ACTCCGCGAT-- 4 4 4 ----–TTGAAGCAGTGAGAATCTGCTAGCGACAGT-AA-AGTCTGGAT-GGAAGAAGATGAAC LO GTTCATCTTCGGGG-CAGGGTGCAATTCCCGACCGGTGGT 3 AGTCCACGAT-- 3 10 3 ----TTGACTCTGGTGTAATTCCAGGACCGACAGT-AT-AGTCTGGAT-GGGAGAAGATGTTG PN AAGAGTCTTCAGGG-CAGGGTGAAATTCCCGACCGGCGGT 125 AGTCCGTG---- 3 4 3 -------GATGTGGTGAGATTCCACAACCGACAGT-AT-AGTCTGGAT-GGGAGAAGACGAAA ST AAGTGTCTTCAGGG-CAGGGTGTGATTCCCGACCGGCGGT 14 AGTCCGCG---- 3 4 3 -------GATGTGGTGTAACTCCACAACCGACAGT-AT-AGTCTGGAT-GAGAGAAGACCGGG MN AAGTGTCTTCAGGG-CAGGGTGAGATTCCCGACCGGCGGT 104 AGTCCGCG---- 3 4 3 -------GATGTGGTGAAATTCCACAACCGACAGT-AA-AGTCTGGAT-GGGAGAAGACTGAG SA ATTCATCTTCGGGG-TCGGGTGTAATTCCCAACCGGCAGT 6 AGCCTGCGAC-- 11 3 11 ----–CTGATCTAGTGAGATTCTAGAGCCGACAGT-AT-AGTCTGGAT-GGGAGAAGATGGAG AMI TCACAGTTTCAGGG-CGGGGTGCAATTCCCCACTGGCGGT 14 AGCCCGCGC--- 5 5 5 ------TGATCTGGTGCAAATCCAGAGCCAACGGT-AT-AGTCCGGAT-GGAAGAAACGGAGC DHA ACGAACCTTCGAGG-TAGGGTGAAATTCCCGACCGGCGGT 20 AGCCCGCAAC-- 11 4 11 --CGACTGACTTGGTGAGACTCCAAGGCCGACGGT-AT-AGTCCGGAT-GGGAGAAGGTACAA FN AATAATCTTCGGGG-CAGGGTGAAATTCCCGACCGGTGGT 2 AGTCCACG---- 4 6 4 -------GATTTGGTGAAATTCCAAAACCGACAGT-AG-AGTCTGGAT-GAGAGAAGAAAAGA GLU ---TGTTCTCAGGG-CGGGGCGAAATTCCCCACCGGCGGT 28 AGCCCGCGAGCG 10 4 10 GTCAGCAGATCCGGTTAAATTCCGGAGCCGACGGTCAT-AGTCCGGAT-GCAAGAGAACC---

Page 4: Comparative genomics of RNA regulatory elements

Conserved secondary structure of the RFN-element

[Figure: consensus secondary structure of the RFN-element, drawn 5' to 3'. Labelled features: helices 1 to 5, a variable stem-loop, and an additional stem-loop.]

Capitals: invariant (absolutely conserved) positions.

Lower case letters: strongly conserved positions.

Dashes and stars: obligatory and facultative base pairs

Degenerate positions: R = A or G; Y = C or U; K = G or U; B= not A; V = not U. N: any nucleotide. X: any nucleotide or deletion

Page 5: Comparative genomics of RNA regulatory elements

Attenuation of transcription

[Figure labels: the RFN element; antiterminator; terminator]

Bam GACAAAAAAATATTGATTGTATCCTTCGGGGCTGGGTG --- TCTGGATGGGAGAAGGATGA 59 ----------GTAAAGCCCCGAATGTGTAA---ACATTCGGGGCTTTTTGACGCCAAAT BS GGACAAATGAATAAAGATTGTATCTTCGGGGCAGGGTG --- TCTGGATGGGAGAAGGATGA 59 ----------CTAAAGCCCCGAATTTTTTA--TAAATTCGGGGCTTTTTTGACGGTAAA BQ CTATAATTTGAGCAAACAGCATCCTTCGGGGTCGGGTG --- TCTGGATGGGAGAAGGATAT 250 -----------CCAAACCCCAAGGATATTAAA--ATCCTTGGGGTTTTTTGTTTTTTTT BE ACATAACGATATAGTGATGCATCCTTCGGGGCAGGGTG --- TCTGGATGGGAGAAGGATGC 155 ------------TGAGCCCCCGGGGACAT--------CCCGGGGGTTTCATTTTTATTG HD AAATTGAATAATTAATTTTTATCCTTCGGGGCTGGGTG --- TCTGGATGGGAGAAGGAAAC 148 -------------ATGCCCCGTGAGAACAAAA-----TCTCTGGGGCTTTTTTGCGCGC CA TAATGGTAATTTAATAGGATGTTCTTCAGGGATGGGTG --- TCTGGATGAAAGAAGAAATA 34 -------------AATCTCCGAAGGATTACC----TTTCTTTGGAGATTTTTTTATTTG DF TAAATATAAATTTAATACTTAATCTTCGGGGTAGGGTG --- TCTGGATGGAAGAAGATATT 63 ------------TAAACCCTGAGTTAATT--------CTCAGGGTTTTTTGTTTAAAAA LLX ACTTTAGCTACAATTGAATAAATCTTCAGGGCAGGGTG --- TCTGGATGAAAGAAGATAAT 127 ----------AAAAGACCCTGAAATTTT------ATTTTAGGGTCTTATTTTTTATTAG PN* ATCATCTGTAATTGAATAACTATCTTCAGGGCAGGGTG --- TCTGGATGAAAGAAGATAAA 81 ----------TGTATGCCTTGAGTAGTCCCC---TATTCAAGGTATATTTTTTTGGAGG PN* ATCATCTGTAATTGAATAACTATCTTCAGGGCAGGGTG --- TCTGGATGAAAGAAGATAAA 19 ------------CGTGCTCTGAAATGATTACTTGTCATTTCAGAGCATTTTTGTTAATC TM AAAACTGAATACAAAAGAAACGCTCTCGGGGCAGGGTG --- TCCGGATGGGAGAGAGCGTG 13 -----------ATGGGACCCGAGA----------------GGGTCCCTTTTCTTTTACA AO ATTTGCAACAATTTTTTAATAATCTTCAGGGCAGGGTG --- TCTGGATGGAAGAAGATGAA 33 --------TTTACAAGCCTTGAGATCGAAAG----ATTTCAAGGCTTTTTTCATCATTA DU AATTTTTTTAATACTATTTTAATCTTCAGGGCAGGGTG --- TCTGGATGGAAGAAGAAGAG 47 --------TGCATAAGCCTTGAGATCTTAG----GATTTCAAGGCTTTTTCATTAGTTA FN TAATCGAATATGTAAAATAAAGTCTTCAGGGCAGGGTG --- TCTGGATGGGAGAAGAATTA 18 ----------ATATTGCTCAGACTTT------------GTTTGAGCATTTTTTTATTAA SA TATAACAATTTCATATATAATTCTTTCGGGGCAGGGTG --- TCTGGATGGGAGAAAGAATG 74 ------TTTTCTCCTTGCATCTTAATT----------GATGTGAGGATTTTTGTTTATA DHA 
ACTCTTTTTAGATGAATACGAACCTTCGAGGTAGGGTG --- TCCGGATGGGAGAAGGTACA 43 -----------GTTTATGCCTCGAGGAACACCATTTCCTCGAGGCATTTTTGTTCTTTC FN GAAAAATAAATATTAAAAATAATCTTCGGGGCAGGGTG --- TCTGGATGAGAGAAGAAAAG 40 ------------CTTACCCGAATTCTAT------------AATTCGGTTTTTTTATTTT CA AATATAAAAAAATAAAGAATGATCTTCAGGGCAGGGTG --- TCTGGATGAAAGAAGATATA 19 ----------–-TATGCCCTGACGTTTTT---------CGTTGGGGCTTTTTTAATGCT DF AAAATTAAAAAATCAAAGAAGATCTTCGGGGCAGGGTG --- TCTGGATGAGAGAAGATATT 45 ----------ATAAAAACTCGAAGATAGGG----TCTTCGAGTTTTTTGTTTTTCCTAA BS TAATTAAATTTCATATGATCAATCTTCGGGGCAGGGTG --- TCTGGATGGGAGAAGATGGA 103 --AAAGAACCTTTCCGTTTTCGAGTAAGATGTGATCGAAAAGGAGAGAATGAAGTGAAA BQ GGGAAAATAGAATATCGGTCTATCTTCGGGGCAGGGTG --- TCTGGATGGGAGAAGATGGA 54 -------ATTCTCCCTTTGTGTAAA------------ACACAAAGGGTTTTTTCGTTCTATG BE ATAAAAATGTATAAGCGATTCATCTTCGGGGCAGGGTG --- TCTGGATGGGAGAAGATGAA 114 --------GGCAGCCTTCTTCTTGTGAGGATGAATCACGAGAAGGGGAGGAGAACAAGCATG PN GTTTTTTGTTATGATAAAAGAGTCTTCAGGGCAGGGTG --- TCTGGATGGGAGAAGACGAA 137 -–AACTTCTTCTGATTTTATAG------------AAAATTGGAGGAACCTGTTATGACA ST TAAATCTGCTATGCTAGAAGTGTCTTCAGGGCAGGGTG --- TCTGGATGAGAGAAGACCGG 130 ---GGAACTTCTTTCAATTTGAAA-----------AAATTGGAGGAATTTTTTAATGTC MN ATTTTTTGATATGCTATAAGTGTCTTCAGGGCAGGGTG --- TCTGGATGGGAGAAGACTGA 138 ---–GGCCTTCTTTCGATTTGTAA-----------AAATTGGAGGAATTTTTTTATGAA SA AAATTTAATAATGTAAAATTCATCTTCGGGGTCGGGTG --- TCTGGATGGGAGAAGATGGA 17 --------TCCTCCTATTCTTACG--------AGATGAATGGAAGGAGAAAATTGAATATG EF AAAAAATATAATACAAGGTTCGTCTTCAGGGGCAGGGT --- GTCTGGATAAAGAAGATAGG 33 ---CTACTCTATTTTTCCCTGCAGA------------AAAATAGGGTTTTTTTGTATGA LLX TTTTTGTGCTATAATAAAAATATCTTCAGGGCACCGTG --- TCTGGATGGAAGAAGATGAA 66 -–TCAACTTCCTCGAAATTTGAAGAAT-TATTTTCTCATATTTGGAGGTTTTTTTATGT LO ATTGTAAGAAAATATTCGTTCATCTTCGGGGCAGGGTG --- TCTGGATGGGAGAAGATGTTG 79 ---ATGCACAAACTCTCCCTCAACTTTTTTTA--------GTTGAGGTTTTTTATTTGC

Page 6: Comparative genomics of RNA regulatory elements

Attenuation of translation

EC AATCCGCTTATTCTCAGGGCGGGGCG --- TCCGGATGGGAGAGAGTAACG 59 ----------CTGCCCTGATTCTGGTAACCATAATTTTAGTGAGGTTTTT-------TACCATGAATCAGACGCTA TY AACCCGCTTATTCTCAGGGCGGGGCG --- TCCGGATGGGAGAGGGTAACG 61 ----------CTGCCCTGATTCTGGTAACCATAATGTTAATGAGGTTTTTT------TACCATGAATCAGACGCTA KP ATCTCGCTTATTCTCAGGGCGGGGCG --- TCCGGATGGGAGAGAGTAACG 61 ----------CTGCCCTGATTCTGGTAACCATAATTTTAATGAGGTTTTTT------TACCATGAATCAGACGCTC HI TTAGCTCGCATTCTCAGGGCAGGGTG --- TCTGGATGAAAGAGAATAAAA 41 ----------CAGCCCTGATTCTGGTATTTAATTGAAATCTCAAAT-TAGGAAAT--TACTATGAATCAGTCAATT VK TATTTGCGCATTCTCAGGGCAGGGTG --- TCTGGATGAAAGAGAATAAGC 76 ----------CAGCCCTGATTCTGGTATCTAAATATCTTTATATTTCAAGGAATT--TACTATGAATCAGTCTATT AB TAGGCGCGCATTCTCAGGGCAGGGTG --- TCTGGATGAAAGAGAATAAAA 54 ----------CCGCCCTGATTCTGGTATAAATTCATCTTATTAAA—AAGGCATT---TACTATGAATCAGTCATTA YP ATGGGGCTTATTCTCAGGGCGGGGTG --- TCCGGATGGGAGAGAGTAACG 194 ----------CCGCCCTGATTCTGGTAATCCATAATTTTTTAATGAGGTTTCT---TTACCATGAATCAGACGCTT VC CACAACAATATTCTCAGGGCGGGGCG --- TCCGGATGAGAGAGAATGACA 83 ----------AAGCCCTGATTCTGGTCATTTTTT--------------GGAGTATT--ACCATGAATCAGTCCTCA Spu CTATCAACAATTCTCAGGGCGGGGTG --- TCCGGATGGAAGAGAATGTAA 145 ----------ACGCCCTGATTCTGGATATTCCCATGTCGTATTTTTGAAGGATATTAA-CCATGAATCAGTCTTTA MLO GACGTTAAAGTTCTCAGGGCGGGGTG --- TCCGGATGAAAGAGGACGAAA 44 -------CGTGCGTCCTGATTCTGGTTCGAAACGGA--------------AGGATGGACCCATGAATCAGCATTCC AC AAGCGACATCGCTTCAGGGCGGGGCG --- TCCGGATGAAAGAAGACGACG 51 ----------CAGTCCTGAAATGTTTAACCGTAATT-------------------TACGAGAGCATTTCATATGTC BP AAGCAGTACGTCTTCAGGGCGGGGTG --- TCCGGATGAGAGAAGATGTGC 62 ----------TAGCCCTGAAACGTTTTTCGCCATTTCCTTTTTT------------GCGAGAGCGTTTCAATGTCC BPS AGTCAGTGCGTCTTCAGGGCGGGGCG --- TCCGGATGAAAGAAGATGTGC 86 ----------GAGCCCTGAAACGTTTTTCGCCCATTCATGTTTC-----------GCGAGGAGCGTTTCACATCATG BU AATCAGTGCGTCTTCAGGGCGGGGTG --- GCCGGATGGAAGAAGATGTGC 99 ----------ATGCCCTGAAACGTTTTTCGCCCAACTTTT--------------GCGATGAGCGTTTCAACTATGT REU CATCGTTACGTCTTCAGGGCGGGGTG --- TCCGGATGAAAGAAGATGGGC 77 
----------ATCCCCTGAAACGCCCATCCATGGAAATCCACGCAC-------------GGAGCGTTTCAATGCTG RSO GCTTGGTACGTCTTCAGGGCGGGGTG --- TCCGGATGGAAGAAGATGTGC 80 ---------CGTGCCCTGGAACGTCTTGTCGCCCATTTCA---------------GCGAGGAGCGTTTCCATGTTG PP GGTCGGTCGGTCTTCAGGGCGGGGTG --- TCCGGATGAAAGAAGGCGTCA 50 ----------TCGCCCCGAGACGTTCATCGATCATTCA------------------CGAGGAGCGTTTCATGTTCA PY GCCGGTAACGTTCTCAGGGCGGGGTG --- CCGGATGAAGAGAGAGCGGGA 91 ----------ATGCCCTGTTTTTTCATTAAATT---------------------AAACAGGAGTCAGAACACGTGC PU CGGCGAAACGTTCTCAGGGCGGGGTG --- CCGGATGAAGAGAGAACGGGA 68 ----------ACGCCCTGTTTTTCACAC--------------------------AAACAGGAGTCAGAACATGCAA PA GGCCGTAACGTTCTCAGGGCGGGGTG --- CCGGATAAAGAGAGAACGGG 53 ---------AAAGCCCTGTTTTTCAC---------------------------GAAACAGGAGTTCGTCATATG-- BME CGCGGGCTTGTTCTCGGGGCGGGGTG --- TCCGGATGGAAGAGAGCGAAT 54 ----------GCGCCCTGATTCTAGTTTCGTG--------------------------AGGAACCTATGAACCAAA CAU AATCCGAAGACCTTCGGGGCAAGGTG --- TCCGGATGGGAGAAGGTCGGC 116 ------CGCGATGCCCCGAAGGTGTG-----------------------------TTCAGGGGTGTCGCGATGAAC TFU GTACACACGCGTGCTCCGGGGTCGGT --- GGATGGGAGGTAGTACGTGGT 58 -------GCCTTACCCCGGAGCCTGACCT-------------------------GGCTAGGGGGAAGGCTTCTCGCATG GLU TGAGTTTTGTTCTCAGGGCGGGGCG --- TCCGGATGCAAGAGAACCG 32 ---------AAGGCCCCGAGGATTACATGCTTTTAAATCCTTTGAAAAGGGGACAAGATCATGAATCCTATAACCG DR GAACCGACCTCTTTCGGGGCGGGGCG --- TCCGGACGAAAGAAGGAGGAG 1 GACGCTCAGCTTGCCCCCCA------------------------------------GCAGGCGGCGTCCGCGTATG SM GTCGCAAGCGTTCTCAGGGCGGGGTG --- TCCGGATGGAAGAGAGCAAGC 45 ATCATTGGAAAAATGCCAACCCTGAAA-------------------GGCTTGAGACCATGACCATACTT TQ TTCGGCACCTCCTTCGGGGCGGGGTG --- TCCGGATGGGAGAAGGAGGGCCACTTGCGC AMI CTTACTCACAGTTTCAGGGCGGGGTG --- TCCGGATGGAAGAAACGGAGCGCCTTATGG

[Figure labels: the RFN element; SD-sequestor; antisequestor]

Page 7: Comparative genomics of RNA regulatory elements

RFN: the mechanism of regulation

• Transcription attenuation

• Translation attenuation

Page 8: Comparative genomics of RNA regulatory elements

Distribution of RFN-elements

Genomes                    Number of          Number of genomes   Number of the
                           analyzed genomes   with RFN            RFN elements
α-proteobacteria                  8                  4                  4
β-proteobacteria                  7                  4                  4
γ-proteobacteria                 17                 15                 15
δ- and ε-proteobacteria           3                  0                  0
Bacillus/Clostridium             12                 12                 19
Actinomycetes                     9                  4                  4
Cyanobacteria                     5                  0                  0
Other eubacteria                  7                  5                  6
Total                            68                 47                 52
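The totals row of the RFN distribution table can be cross-checked by summing the per-group rows. The sketch below does only that; the names `RFN_ROWS`, `STATED_TOTAL` and `column_sums` are illustrative and not part of the original slides. With the figures as transcribed here, the first and third columns reproduce the stated totals, while the "genomes with RFN" column sums to 44 rather than the stated 47; the slide gives no indication of whether a row or the total is off.

```python
# Rows as given on the slide: (group, genomes analyzed, genomes with RFN, RFN elements).
RFN_ROWS = [
    ("alpha-proteobacteria", 8, 4, 4),
    ("beta-proteobacteria", 7, 4, 4),
    ("gamma-proteobacteria", 17, 15, 15),
    ("delta- and epsilon-proteobacteria", 3, 0, 0),
    ("Bacillus/Clostridium", 12, 12, 19),
    ("Actinomycetes", 9, 4, 4),
    ("Cyanobacteria", 5, 0, 0),
    ("Other eubacteria", 7, 5, 6),
]
STATED_TOTAL = (68, 47, 52)  # the slide's own "Total" row


def column_sums(rows):
    """Sum the three numeric columns over all taxonomic groups."""
    return tuple(sum(row[i] for row in rows) for i in (1, 2, 3))


sums = column_sums(RFN_ROWS)
labels = ("genomes analyzed", "genomes with RFN", "RFN elements")
for label, got, stated in zip(labels, sums, STATED_TOTAL):
    status = "matches" if got == stated else "differs from"
    print(f"{label}: rows sum to {got}, which {status} the stated total {stated}")
```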

Page 9: Comparative genomics of RNA regulatory elements

YpaA: riboflavin transporter in Gram-positive bacteria

• 5 predicted transmembrane segments => a transporter

• Upstream RFN element (likely co-regulation with riboflavin genes) => transport of riboflavin or a precursor

• S. pyogenes, E. faecalis, Listeria sp.: ypaA, no riboflavin pathway => transport of riboflavin

Prediction: YpaA is a riboflavin transporter (Gelfand et al., 1999)

Verification:

• YpaA transports flavins (riboflavin, FMN, FAD) (by genetic analysis: Kreneva et al., 2000; directly: Burgess et al., 2006)

• ypaA is regulated by riboflavin (by microarray expression analysis, Lee et al., 2001)

• … via attenuation of transcription (and to some extent inhibition of translation) (Winkler et al., 2003)

Page 10: Comparative genomics of RNA regulatory elements

Phylogenetic tree of RFN-elements

Page 11: Comparative genomics of RNA regulatory elements

thi-box and regulation of thiamine metabolism genes by thiamine pyrophosphate (Miranda-Rios et al., 2001)

TTCGGGATCCGCGGAACCTGA-TCAGGCTAA-TACCTGCG-AAGGGAACAAGAGTTA  THIC_EC
TTCGGGATCCGTTGAACCTGA-TCAGGTTAA-TACCTGCG-AAGGGAACAAGAGAAG  THIC_VC
GCAGTGACCCGTTGAACCTGA-TCCAGTTCA-TACTGGCG-TAGGGACGGTGCAAGC  THIC_MLO
GCAGTGACCCGTTGAACCTGA-TCCAGTTCA-CACTGGCG-TAGGGACGGTGCAGAC  THIC_SM
AGAAATACCCTTTACACCCGA-TCGGGATAA-TACCTGCG-TGGGGAGTTTTCACGG  THIC_NM
TTCTTAACCCTTTGGACCTGA-TCTGGTTCG-TACCAGCG-TGGGGAAGTAGAGGAA  thiC_BS
CCGTCGACCGTACGAACCTGA--CCGGGTAA-TGCCGGCG-TAGGGAGTTGCAAATG  THIC_MT
GGATCGACCCTTTGAACCTGA-TCCGGGTAA-TGCCGGCG-GAGGGAAATTATGTCG  THIT2_TVO
TCCTCGACCCCAAGAACCTGA-TCCGGGTAA-TGCCGGCG-GAGGGATCGGGGAAGG  thi1_TM

Notation: Red: conserved nucleotides; Green: positions where the purine or pyrimidine class is conserved; Blue: non-conserved nucleotides
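The three-way colour notation for the thi-box alignment can be approximated column by column. The sketch below is an illustration under stated assumptions, not the method used to colour the original slide: it marks a column `*` when every row has the same nucleotide, `:` when all rows share the purine (A, G) or pyrimidine (C, T) class, and `.` otherwise (including any column that contains a gap). The names `THI_BOX` and `classify_columns` are hypothetical.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")

# Aligned thi-box rows copied from the slide (label -> sequence, '-' is a gap).
THI_BOX = {
    "THIC_EC":   "TTCGGGATCCGCGGAACCTGA-TCAGGCTAA-TACCTGCG-AAGGGAACAAGAGTTA",
    "THIC_VC":   "TTCGGGATCCGTTGAACCTGA-TCAGGTTAA-TACCTGCG-AAGGGAACAAGAGAAG",
    "THIC_MLO":  "GCAGTGACCCGTTGAACCTGA-TCCAGTTCA-TACTGGCG-TAGGGACGGTGCAAGC",
    "THIC_SM":   "GCAGTGACCCGTTGAACCTGA-TCCAGTTCA-CACTGGCG-TAGGGACGGTGCAGAC",
    "THIC_NM":   "AGAAATACCCTTTACACCCGA-TCGGGATAA-TACCTGCG-TGGGGAGTTTTCACGG",
    "thiC_BS":   "TTCTTAACCCTTTGGACCTGA-TCTGGTTCG-TACCAGCG-TGGGGAAGTAGAGGAA",
    "THIC_MT":   "CCGTCGACCGTACGAACCTGA--CCGGGTAA-TGCCGGCG-TAGGGAGTTGCAAATG",
    "THIT2_TVO": "GGATCGACCCTTTGAACCTGA-TCCGGGTAA-TGCCGGCG-GAGGGAAATTATGTCG",
    "thi1_TM":   "TCCTCGACCCCAAGAACCTGA-TCCGGGTAA-TGCCGGCG-GAGGGATCGGGGAAGG",
}


def classify_columns(rows):
    """Return one mark per alignment column: '*', ':' or '.' as described above."""
    marks = []
    for column in zip(*rows):
        bases = set(column)
        if len(bases) == 1 and "-" not in bases:
            marks.append("*")  # identical nucleotide in every row
        elif bases <= PURINES or bases <= PYRIMIDINES:
            marks.append(":")  # purine/pyrimidine class conserved
        else:
            marks.append(".")  # variable column, or a column with gaps
    return "".join(marks)


marks = classify_columns(list(THI_BOX.values()))
for label, seq in THI_BOX.items():
    print(f"{label:<10}{seq}")
print(f"{'':<10}{marks}")
```

A stricter or looser threshold (for example, tolerating one deviating row) would change which columns count as conserved; the all-rows rule is simply the easiest one to state.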

Page 12: Comparative genomics of RNA regulatory elements

Alignment of THI-elements 1 2 3 3' FACULTATIVE STEM-LOOP 2' 4 5 5' 4' 1' ----====>===> -=====> <===== ========> <======= <=== ===> =====> <===== <=== <====---- BACILLUS/CLOSTRIDIUM GROUP BS_THIC TAGTTACTGGGGGTGCCCGCT----------------TTCcgGGCTGAGAGAGAAGGCA-------------AGCTTCTTAACCCTTT---GGACCTGA-TCTGGTTCG-TACCAGCG-TGGGGA-AGTAGAGGA BS_TENA TAACCACTAGGGGTGTCCTTC----------------ATAAGGGCTGAGATAAAAGTGT-------------GACTTTTAGACCCTCA---TAACTTGA-ACAGGTTCA-GACCTGCG-TAGGGA-AGTGGAGCG BS_YLMB TTCATCCTAGGGGTGCTTTG-------------------CGAAGCTGAGAGAGACTT-----------------TGTCTCAACCCTTT---TGACCTGA-TCTGGATCA-TGCCAGCG-GAGGGA-AGCGGTGAA BS_YKOF AAAGCACTAGGGGTGCTGT--------------------TTTGGCTGAGATAAAGCGCGGAA-----GAAACGCGCTTTGATCCCTTA---TGACCCGA-TCTGGATAA-TACCAGCG-TGGGGA-AGTGCAGGT SA_TENA GAACTACTAGGGGAGCCTAAT----------------GATATGGCTGAGATGAATT-------------------GTTCAGACCCTTA---TGACCTGA-TTTGGTTAG-TACCAACG-TAGGAA-AGTAGTTAT SA_YKOE CACACACTAGGGGTGTTT----------------------TATACTGAGATGAGGCTT---------------GCCCTCAAACCCTTT---GAACCTGA-TCTAGCTTG-AACTAGCG-TAGGAA-AGTGTTACT LLX_YUAJ TTTGCACAATGGGTCTATTGACAAA---------ACTGTCAGTAGCGAGA----------------------------AATACCATC----TGACCTGA-TCTGGGTAA-TGCCAGCG-TAGGAA-TGTGTTAAG CA_THIS ATAGTTAACGGGGAGCCTGTA-----------------GACAGGCTGAGAGTGGAATG--------------TGATTCCAGACCCTCA---TAACCTGA-TTTGGATAA-TGCCAACG-TAGGGA-GTTAATGCA CA_YUAJ TATGTGCTAGGGGTGCCTT---------------------TAGGCTGAGAAACAGTTT--------------GTCACGTTAACCCTT-----AACCTGA-TCTGGATAA-TACCAGCG-TAGGGA-AGCAGTTTG ST_YUAJ TTTCACAAAGGAGTGCTT-----------------------TGGCTGAGATCGCAA------------------TTGCGAAATCCTGA---GGACCTGA-TCTTGTTAG-TACAAGCG-TAGGGA-TTGTGACCA DHA_THIC TAATCACTAGGGGGGCCGAATA---------------AGGTCGGCTGAGATAAAGGACCCA---------AGAATCCTTTGACCCTT-----AACCTGA-TCTGGGTAA-TGCCAGCG-TAGGGAAGGTGGATAA LMO_TENA GAAAAACTAGGGGGGCCGAT-------------------TCTGGCTGAGATAGGAAGGTAAT-----------GCTTTCTGACCCTTT---GAACCTGT-TT--GTTAG-TGCAAGCG-TAGGGA-AGTGAATGT LMO_YUAJ 
TTACCACAGGGGGGGCTTC---------------------TTAGCTGAGATTGAGTCCACGTGT-----TTTTGGATTCTGACCCTTT---GAACCTGT-TC--GTTAA-TACGAGCG-TAGGGA-TTGTGGCGA PROTEOBACTERIA EC_THIB GTTCTCAACGGGGTGCCACGCGT------------ACGCGTGCGCTGAGAAA---------------------------ATACCCGTCGA---ACCTGA-TCCGGATAA-CGCCGGCG-AAGGGATTTGAGGC EC_THIM AAACGACTCGGGGTGCCCTTCTGC-------------GTGAAGGCTGAGAAA----------------------------TACCCGTATC---ACCTGA-TCTGGATAA-TGCCAGCG-TAGGGA-AGTCACG EC_THIC TTTCTTGTCGGAGTGCCTTA-------------------ACTGGCTGAGACCGTTT------------------ATTCGGGATCCGCGGA---ACCTGA-TCAGGCTAA-TACCTGCG-AAGGGA-ACAAGAG VC_THIC CCACTTGTCGGAGTGCCAT---------------------TGGGCTGAGACCGTTT------------------ATTCGGGATCCGTTGA---ACCTGA-TCAGGTTAA-TACCTGCG-AAGGGA-ACAAGAG VC_THID CCTGTAGTCGGGGAGCCTGAGAG-- 66 5 71 -AATTAAAGGCTGAGATCGCGT-------------------AGCGAGACCCGTTGA---ACCTGA-TTCAGTTAG-GACTGACG-TAGGGA-ACTATCC VC_THIB CCCACTCACGGGGGGCCACCCATTCAT-------CCGAATGGCGCTGAGATCAAGCAC---------------TGCTTGGGACCCGCA 21 -ACCTGA-ACCAGATAA-TGCTGGCG-TAGGAATTGAGCTA XFA_THIC TTTGAAGCGGGGGTACCATAGCCA------------AGCTGCGGTTGAGAC----------------------------ACACCCTTCGA---ACCTGA-TCCGGTTTA-CACCGGCG-TAGGAAAGCTTCGT MLO_THIC CATTCACCAGGGGAGTCCCGG----------------CAAGGGGCTGAGATACTGCTGGCTTTC------GCGGCGCAGTGACCCGTTGA---ACCTGA-TCCAGTTCA-TACTGGCG-TAGGGACGGTGCAA MLO_THIB CGCTCTAACGGGGTGCCGGA------ 5 3 5 -----GACCGGCTGAGAGGCAGT------------------CTCGCCAACCCGCTGA---ACCTGA-TCCGGTTTG-TACCGGCG-GAGGGA-TTAGACG MLO_YK GCCCATCCACAGGGGTGCTCCGTAC-------------GGTCGGGGCTGAGACGGGGGCGG-----------CAAGCCCACAGACCCTAGA----AGCTGA-TCTGGGTAA-TACCAGCG-GAGCGA-GGCGGGCG NX_CITX CTCCTTGTCGGAGTGCCGCCGC---------------CGGGCGGCTGAGATTGCGA------------------AAGCAGAATCCGTAGA---ACCTGT--CGGGGTAA-TGCCTGCG-TAGGAA-ACAAACC NX_THIC ATTGAAACAGGGGTGCTGCCTGAT----------GTTTAGGCGGCTGAGAA----------------------------ATACCCTTTAC---ACCCGA-TCGGGATAA-TACCTGCG-TGGGGA-GTTTTCA ACTINOBACTERIAE MT_THIO 
CTGTAGACACGGGAGTCCCGGG--------------AGCGGGGTCTGAGAGTGGGCGCGCCT-------------GCCCTTACCGTCAC----ACCTGA-TCCGGATCA-TGCCGGCG-AAGGGAGGTCAAGGATG MT_THIC GTACCCACGCGGGAGCGCACGC--------------CGAGTGCGCTGAGAGGACGGCTCGGG------------GCCGTCGACCGTACGA---ACCTGA--CCGGGTAA-TGCCGGCG-TAGGGAGTTGCAAATG CGL_THIC CAGTCCCCACGGGCGCCCGA-----------------GCACGGGCTGAGATCGCGCTGATT---------GCTGCGCGAGCACCGTTTGA---ACCTG--TCCGGTTAG-CACCGGCG-AAGGAAGAGAGGAATGGTGCAATG CGL_THID ACTAGGCACGGGGTGCCAACCGGATGG---AAAAATTCCGGAGGCTGAGAAA---------------------------ACACCCGTTGA---ACCTGC-TCTAGCTCG-TACTAGCG-AAGGGATGGCCTTAACGTG CGL_THIE CTTACCCCACGGGTGCCCAAT---------------GCATTGGGCTGAGATTGCGCGCTGT---------TGCTGCGCGGGACCGTTCGA---ACCTG--TCTGGTTAA-CACCAGCG-AAGGAAGCGAGGATTGATTGTCCCGTG CGL_YKOE TCATAGACACGGGTGCTCGGTGA------------AAATCCGGGCTGAGATCTGGCA----------------TAGCCACGACCGTCGA----ACCTG-ATCCGGATAA-TGCCGGCG-ATAGGGAGGAAAAATATG CGL_OARX TAGTGACACGGGGTGCAAAAGCACTTT----AAAAAAGCTTTCGCTGAGATT---------------------------ACACCCGTCGA---ACCTG-ATCCAGTTAG-TACTGGCG-AAGGGACTGTCGCAT CYANOBACTERIA NPU_THIC TCCATGCTAGGGGTGCCTACAT---------------AACCAGGCTGAGATC---------------------------ACACCCTTAAC---ACCTGAGTCTGGGTAA-TACCAGCG-GAGGGAAGCTGTTTATTG CY_THIC CCATAGCTAGGGGTGTCTAGAA---------------AGCTAGGCTGAGAA----------------------------AAACCCTTAGA---ACCTGAGACTGGGTAA-TACCAGCG-GAGGGAAGCTCACCATTC AN_THIC TCCATGCTAGGGGTGCTTGCAC---------------TAACAGGCTGAGATT---------------------------ACACCCTTAAC---ACCTGAGACTGGGTAA-TACCAGCG-AAGGGAAGCTGTTTATTG THERMUS/DEINOCOCCUS, THERMOTOGALES, Fusobacterium, CFB group DR_THIB CGCGTCACCGGGGGTGCCCTGCTT------------CGGCAGCGGCTGAGAAC---------------------------ACACCCCAGGA---ACCTGA-ACCGGGTCA-TTCCGGCG-GAGGGAGTGTGATGC DR_THIC ATCGTCAACAGGGGTGCCTCCGCATA--------TGGGCCGGAGGCTGAGAGGGCAACT---------------CGGGCCTAACCCTATGA---ACCTGA-ACTGGTTAG-CACCAGCG-GAGGGA-GTGTGACG TQ_THIBGGCCGTCACCGGGGGTGCCCCA------------------AAAGGGCTGAGAGC---------------------------ATACCCTTGGA---ACCTGA-TCCGGGTCA-TGCCGGCG-TAGGGAAGGTGACGGCC TM_THI1 
CCTTCCCCAGGGGGAGCTCCTAT---------------TCCGGGGCTGAGAGGAGGACGG-------------AAGTCCTCGACCCCAAGA---ACCTGA-TCCGGGTAA-TGCCGGCG-GAGGGATCGGGGAAGGA FN_THIC TATATGTACTGGGGAGCTT----------------------TGTGCTGAGATTAGAACCT------------TTTTTCTTAGACCCATAGT---ACCT-GA-TTTGGATAA-TGCCAACG-AAGGGA—GTACCA FN_THIX ACTAGTTACAAGGGAGTTAATA-----------------AATTGACTGAGAAAAGGATG--------------TGAGCCTTGACCTTTTG----ACCT-GA-TTTGGATAA-TGCCAACG-TAGGAA--GTAAA PG_THIS AGACCGCTACGGGGGTGCTTGCCG--- 4 3 4 -GATACGGCAGGCTGAGAT---------------------------AATACCCATAG---ACCT-GA-TCCGGATAA-TACCGGCG-GAGGGAT-GTAG PG_OMR ATTGGGAGAAGGGGTGCTTCCTGTA--- 3 7 3 --GTGGATGGCTGAGAAC---------------------------AAACCCTCATC---ACCT-GA-ACCGGATAA-TACCGGCG-TAGGAAA-CTCTC BX_THIS TAAAGACAAAGGGGTGCCACC------------------CGGTGGCTGAGATT---------------------------ATACCCTAAGA---ACCT-GA-TGCAGTTAG-TACTGCCG-AAGGGA—TTGTG ARCHAEA TAC_T1 GGTGTGGTGGGGGAGCTCCAT-----------------AAGGGGCTGAGAGGATCCGG---------------ATGGATCGATCCCTGGA---ACCTGA-TCCGGGTAA-TACCGGCG-GAGGGAAATTATG FAC_T1 AGTTATACCGGGGAGCTAA---------------------AATGCTGAGAGGATAA-------------------GGATCGACCCGTGCA---ACCTGA-TCCGGACAA-TACCGGCG-GAGGGAGATGGATA

Page 13: Comparative genomics of RNA regulatory elements

Conserved secondary structure of the THI-element

[Figure: consensus secondary structure of the THI-element. Labelled features: helices 1 to 5, the thi-box, and a facultative stem-loop.]

Capitals: strongly conserved positions. Dashes and points: obligatory and facultative base pairs

Degenerate positions: R = A or G; Y = C or U; K = G or U; M= A or C; N = any nucleotide

Page 14: Comparative genomics of RNA regulatory elements

THI: the mechanism of regulation

• Translation attenuation: Thermus/Deinococcus group, CFB group, Proteobacteria, Actinobacteria, Cyanobacteria, Archaea

• Transcription attenuation: Bacillus/Clostridium group, Thermotoga, Fusobacterium, Chloroflexus

Page 15: Comparative genomics of RNA regulatory elements

Distribution of THI-elements

Genomes                          Number of          Number of genomes   Number of the
                                 analyzed genomes   with THI            THI elements
α-proteobacteria                        7                  7                 15
β-proteobacteria                        6                  6                 12
γ-proteobacteria                       18                 17                 38
δ- and ε-proteobacteria                 3                  1                  1
The Bacillus/Clostridium group         18                 18                 51
Actinomycetes                           9                  9                 25
Cyanobacteria                           5                  5                  5
Other eubacteria                       14                 11                 11
Archaea (Thermoplasma)                 17                  3                  6
Total                                  97                 77                164

Mandal et al., 2003: THI in 3’UTR (plants). THI in untranslated intron (fungi)

Page 16: Comparative genomics of RNA regulatory elements

Metabolic reconstruction of the thiamin biosynthesis

[Figure: thiamin biosynthesis pathway. Recoverable labels: thiN; (Gram-positive bacteria); (Gram-negative bacteria); transport of HMP; transport of HET]

Page 17: Comparative genomics of RNA regulatory elements

Metabolic reconstruction of the thiamin biosynthesis

[Figure: thiamin biosynthesis pathway. Recoverable labels: thiN; (Gram-positive bacteria); (Gram-negative bacteria); transport of HMP; transport of HET; confirmed (Morett et al., 2003)]

Page 18: Comparative genomics of RNA regulatory elements

The PnuC family of transporters

RFN elements

THI elements

Page 19: Comparative genomics of RNA regulatory elements

B12-box and regulation of cobalamin metabolism genes by cobalamin (Nou & Kadner, 2000; Ravnum & Andersson, 2001; Nahvi et al., 2002)

• Long mRNA leader is essential for the regulation of btuB by vitamin B12.

• Involvement of a highly conserved B12-box rAGYCMGgAgaCCkGCcd in the regulation of the cobalamin biosynthetic genes (E. coli, S. typhimurium)

• Post-transcriptional regulation: an RBS-sequestering hairpin is essential for the regulation of btuB and cbiA

• Ado-CBL is an effector molecule involved in the regulation of the cobalamin biosynthesis genes
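A degenerate consensus such as the B12-box `rAGYCMGgAgaCCkGCcd` can be turned into a regular expression for scanning upstream regions. The sketch below is only an illustration: it uses the standard IUPAC nucleotide codes over the DNA alphabet, treats lower-case (less strictly conserved) positions exactly like upper-case ones, and allows no mismatches, which is stricter than a profile-based search would be. The function names and the test sequence are made up for the example.

```python
import re

# Standard IUPAC nucleotide codes, expanded over the DNA alphabet (U is read as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate a degenerate consensus into a regex string, ignoring letter case."""
    return "".join(IUPAC[symbol] for symbol in consensus.upper())


def find_sites(consensus, sequence):
    """Return (start, matched_text) for each non-overlapping exact match."""
    pattern = re.compile(consensus_to_regex(consensus))
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]


B12_BOX = "rAGYCMGgAgaCCkGCcd"  # consensus quoted on the slide
print(consensus_to_regex(B12_BOX))
# Made-up test sequence: one box instance flanked by two nucleotides on each side.
print(find_sites(B12_BOX, "ttAAGCCAGGAGACCGGCCAtt"))
```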

Page 20: Comparative genomics of RNA regulatory elements

Conserved RNA secondary structure of the regulatory B12-element

[Figure: consensus secondary structure of the B12-element, drawn 5' to 3'. Labelled features: base stem; helices P0 to P6; the B12 box; VS; BI and BII; additional stem-loops Add-I and Add-II; a facultative stem-loop. Taxon labels on the figure: the Bacillus/Clostridium group; -proteobacteria; other taxonomic groups.]

Page 21: Comparative genomics of RNA regulatory elements

[Figure: the B12-element drawn in its Ado-CBL-bound (+Ado-CBL) and unbound states, with helices P0 to P6 and a pseudoknot. Panels A and B show the two regulatory variants: terminator versus antiterminator, and RBS-sequestor hairpin versus antisequestor.]

The predicted mechanism of the B12-mediated regulation of cobalamin genes: formation of a pseudoknot

Page 22: Comparative genomics of RNA regulatory elements

B12-element regulates cobalamin biosynthetic genes and transporters, cobalt transporters and a number of other cobalamin-related genes.

Distribution of B12-elements in bacterial genomes

Page 23: Comparative genomics of RNA regulatory elements

Metabolic reconstruction of cobalamin biosynthesis: new enzymes and transporters

Cobalt ion transport: cbiMNQO, hoxN, hupE, cbtAB, cbtC, cbtD, cbtE, cbtG, cnoABCD

Page 24: Comparative genomics of RNA regulatory elements

Metabolic reconstruction of cobalamin biosynthesis: new enzymes and transporters

Cobalt ion transport: cbiMNQO, hoxN, hupE, cbtAB, cbtC, cbtD, cbtE, cbtG, cnoABCD

recently confirmed (Zayas et al., 2006)

confirmed (Woodson et al., 2004)

Page 25: Comparative genomics of RNA regulatory elements

If a bacterial genome contains B12-dependent and B12-independent isoenzymes, the genes encoding the B12-independent isoenzymes are regulated by B12-elements

Ribonucleotide reductases
NrdJ (B12-dependent) | NrdAB/NrdDG (B12-independent)
+ | –
– | +
+ | +

Methionine synthase
MetH (B12-dependent) | MetE (B12-independent)
+ | –
– | +
+ | +

B12-element marks in the figure: B12, B12, B12
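The rule stated on this slide can be written as a small predicate. The sketch below is only an illustration under stated assumptions: a genome is reduced to a plain set of gene names, and the names `ISOZYME_PAIRS` and `predicted_b12_regulated` are hypothetical, not part of any published tool.

```python
# Isozyme pairs from the slide: (B12-dependent genes, B12-independent genes).
# Representing operons such as nrdAB by a single name is an assumption.
ISOZYME_PAIRS = {
    "ribonucleotide reductase": ({"nrdJ"}, {"nrdAB", "nrdDG"}),
    "methionine synthase": ({"metH"}, {"metE"}),
}


def predicted_b12_regulated(genome_genes):
    """Return the B12-independent isozyme genes expected to carry a B12-element.

    A gene is predicted only when the genome also encodes the
    B12-dependent isozyme of the same enzyme.
    """
    genes = set(genome_genes)
    predicted = set()
    for dependent, independent in ISOZYME_PAIRS.values():
        if genes & dependent and genes & independent:
            predicted |= genes & independent
    return predicted
```

A genome with only one isozyme of a pair yields no prediction, which mirrors the "+ –" and "– +" rows of the tables.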

Page 26: Comparative genomics of RNA regulatory elements

If a bacterial genome contains B12-dependent and B12-independent isoenzymes, the genes encoding the B12-independent isoenzymes are regulated by B12-elements

nrdAB in Streptomyces coelicolor: experimentally confirmed (Borovok et al., 2005)


Page 27: Comparative genomics of RNA regulatory elements

LYS-element, a.k.a. L-box: lysine riboswitch

[Figure: consensus secondary structure of the LYS-element, drawn from the 5' and 3' ends of the base stem (P1) with helices P2 to P7.]

Page 28: Comparative genomics of RNA regulatory elements

Reconstruction of the lysine metabolism

[Figure: lysine biosynthesis pathway and lysine transport.
Metabolite labels: L-aspartate; β-aspartyl-phosphate; aspartate semialdehyde; homoserine; dihydrodipicolinate; tetrahydrodipicolinate; N-acetyl-2-amino-6-ketopimelate and N-succinyl-2-amino-6-ketopimelate; N-acetyl-L,L-diaminopimelate and N-succinyl-L,L-diaminopimelate; L,L-diaminopimelate; meso-diaminopimelate.
Gene labels: lysC, dapG, yclM; lysC, thrA, metL; asd; hom; thrA, metL; dapA; dapB; dapD; ykuR; dapC (argD); ddh; patA; dapE; dapF, dal; lysA.]

Predicted genes are boxed (pathway of acetylated intermediates in B. subtilis)

Page 29: Comparative genomics of RNA regulatory elements

Regulation of the lysine catabolism: the first example of an activating riboswitch

• LYS-elements upstream of the pspFkamADEatoDA operon in Thermoanaerobacter tengcongensis and the kamADElysE operon in Fusobacterium nucleatum

– lysine catabolism pathway

– LYS element overlaps candidate terminator => acts as activator

• similar architecture of activating adenine riboswitch upstream of purine efflux pump ydhL (pbuE) in B. subtilis (Mandal and Breaker, 2004)
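The inference "element overlaps the candidate terminator, so it acts as an activator" is an interval-overlap argument: if the ligand-bound element shares sequence with the terminator, the terminator cannot form while the ligand is bound. A minimal sketch of that reasoning follows; the coordinates, the function names and the three-way labelling are hypothetical and only illustrate the logic, not the software used for the analysis.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) share a position."""
    return a[0] < b[1] and b[0] < a[1]


def predicted_mode(element, terminator, antiterminator=None):
    """Guess the regulatory mode from the positions of predicted structures.

    element, terminator, antiterminator: (start, end) coordinates on the
    leader sequence (hypothetical values supplied by the caller).
    """
    # Element shares sequence with the terminator: ligand binding excludes
    # the terminator, so the ligand switches expression on.
    if overlaps(element, terminator):
        return "activator"
    # Element competes with an antiterminator that in turn competes with
    # the terminator: ligand binding lets the terminator form.
    if (antiterminator is not None
            and overlaps(element, antiterminator)
            and overlaps(antiterminator, terminator)):
        return "repressor"
    return "undetermined"
```

For example, `predicted_mode((100, 280), (260, 300))` returns `"activator"` because the two intervals share positions 260 to 279.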

Page 30: Comparative genomics of RNA regulatory elements

S-box (SAM riboswitch)

[Figure: consensus secondary structure of the S-box, drawn from the 5' and 3' ends of the base stem (P1) with helices P2 to P5.]

Grundy and Henkin, 1998

Page 31: Comparative genomics of RNA regulatory elements

Reconstruction of the methionine metabolism

[Figure: methionine metabolism pathway.
Metabolite labels: Aspartate semialdehyde; Homoserine; Threonine; O-acetylhomoserine; Sulfide; Cystathionine; Homocysteine; Methionine; methyl-THF; methylene-THF; THF; CH3; S-adenosyl-methionine (SAM); S-adenosyl-homocysteine (SAH); S-ribosyl-homocysteine (SRH); MTA; Methylthioribose (MTR).
Gene labels: metI, yrhB, metC, yrhA, metF, yxjH*, metK, mtnKSUVWXYZ, hom, cysH-...metB, metH, metX, metE, mtn, mtn, metY.]

Predicted genes are boxed and marked by * (transport, salvage cycle)

Page 32: Comparative genomics of RNA regulatory elements

A new family of amino acid transporters

Legend: S-box (rectangle frame); MetJ (circle frame); LYS-element (circles); Tyr-T-box (rectangles)

[Figure: phylogenetic tree of the transporter family with branches labelled LysT, MetT, TyrT and MleN (malate/lactate). Leaves are gene identifiers (for example BC1434, FN0978, BL1111, OB2874, OB1118, BS-yheL, BS-mleN, TTE-nhaC, EF-nhaC 1, EF-nhaC 2, VC2037, VCA0193, SA2117, CPE2317). Group labels: Archaea, clostridia, Pasteurellaceae.]

Page 33: Comparative genomics of RNA regulatory elements

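Repression of the reverse pathway Met → Cys in Clostridium acetobutylicum in the presence of Cys and absence of Met

[Figure: the ubiG, yrhB, yrhA gene cluster with an S-box (effector: S-adenosylmethionine) and a Cys-T-box (effector label: Cysteine, AA); transcripts labelled sense transcript and antisense transcript.]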

Page 34: Comparative genomics of RNA regulatory elements

Firmicutes

• Bacillales: S-box

• Clostridiales: S-box

• Lactobacillales: Met-T-box (Met-tRNA-dependent attenuator)

• Streptococcales: MtaR (transcription factor); SAM-III riboswitch (metK) (the Henkin group)

• Loss of S-boxes

proteobacteria

• E. coli: TFs

• Xanthomonas: S-box

• alphas: SAM-II

• Geobacter: S-box

Other genomes with S-boxes: the Zoo

• Petrotoga

• actinobacteria (Streptomyces, Thermobifida)

• Chlorobium, Chloroflexus, Cytophaga

• Fusobacterium

• Deinococcus

Need more genomes

Page 35: Comparative genomics of RNA regulatory elements

Riboswitches in metagenomes

[Figure: bar chart of riboswitch counts (axis 0 to 160) in three metagenomes: Sargasso Sea, Minnesota soil, Whale fall.]

New functions:

• S-box: eukaryotic-type translation initiation factor eIF-2B (COG0182)

• B12-box: fatty-acid desaturase (COG1398)

• GCVT: malate synthase glcB, phosphoserine aminotransferase serC

Page 36: Comparative genomics of RNA regulatory elements

Riboswitch composition of metagenomes

[Figure: two stacked bar charts, one with absolute counts (axis 0 to 350) and one with percentages (axis 0% to 100%). Bars, in the order shown: Whale fall, Minnesota soil, Sargasso Sea. Series: G-box, YKOK, GLMS, LYS, YKKC, S-box, RFN, YYBP, B12, THI, GCVT.]

Total per 100 000 contigs: 47, 27, 26

Page 37: Comparative genomics of RNA regulatory elements

Riboswitches in metagenomes by taxonomy

[Figure: stacked bar chart of riboswitch counts (axis 0 to 160) by taxonomic group. Series: G-box, YKOK, GLMS, LYS, YKKC, S-box, RFN, YYBP, B12, THI, GCVT.]

Total per 100 000 contigs, per bar: 62, 44, 30, 26, 19, 15, 11, 8, 3

Page 38: Comparative genomics of RNA regulatory elements

Conserved structures of riboswitches (circled: X-ray)

[Figure: consensus secondary structures of the RFN-element, THI-element, B12-element (B12 box), S-box, G-box and LYS-element. Each is drawn from the 5' and 3' ends of its base stem with numbered helices (P1, P2, ...); additional (Add, Add I, Add II, Add III) and variable (Var) regions are marked.]

Page 39: Comparative genomics of RNA regulatory elements

Mechanisms

[Figure, panels A, B and C: the RNA-element lies in the 5' leader upstream of the GENES. Two states are drawn, labelled + Effector and - Effector. Structures labelled: regulatory hairpin (terminator of transcription, followed by a UUUUUUUU run, and/or RBS-sequestor); antiterminator/antisequestor. Complementary segments are numbered 1, 2, 3. Side labels: in the case of regulation of transcription; in the case of regulation of translation.]

glmS: ribozyme, cleaves its mRNA (the Breaker group)

THI-box in plants: inhibition of splicing (the Breaker and Hanamoto groups)
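For the transcriptional case, the regulatory hairpin is a terminator: a stem-loop followed by a run of U. The sketch below shows one naive way to scan a DNA-level sequence for such candidates. It is an assumption-laden illustration, not the software used in this work: it requires a perfect stem of fixed length, ignores G-U pairs and mismatches, and all parameter names and defaults are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (other characters are left unchanged)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_terminators(seq, stem=6, loops=range(3, 9), u_run=5, max_gap=2):
    """Return (start, end) of perfect hairpins that are followed by a T/U run.

    stem    : exact stem length in nucleotides (assumed fixed)
    loops   : allowed loop lengths
    u_run   : minimal length of the T/U run
    max_gap : nucleotides allowed between the stem and the run
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    for i in range(len(seq)):
        left = seq[i:i + stem]
        if len(left) < stem:
            break
        for loop in loops:
            j = i + stem + loop            # start of the 3' arm
            right = seq[j:j + stem]
            if len(right) < stem:
                break
            if right != revcomp(left):
                continue
            tail = seq[j + stem:j + stem + max_gap + u_run]
            if "T" * u_run in tail:
                hits.append((i, j + stem))  # end is exclusive
                break
    return hits
```

On the toy sequence `"CCCC" + "GGCCGC" + "GAAA" + "GCGGCC" + "TTTTTTTT" + "AA"` the hairpin spanning positions 4 to 20 is among the hits; the same hairpin without the T run is not reported.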

Page 40: Comparative genomics of RNA regulatory elements

Characterized riboswitches (more are predicted)

Element | Regulated function | Ligand | Distribution
RFN | Riboflavin biosynthesis and transport | FMN (flavin mononucleotide) | Bacillus/Clostridium group, proteobacteria, actinobacteria, other bacteria
THI | Biosynthesis and transport of thiamin and related compounds | TPP (thiamin pyrophosphate) | Bacillus/Clostridium group, proteobacteria, actinobacteria, cyanobacteria, other bacteria, archaea (thermoplasmas), plants, fungi
B12 | Biosynthesis of cobalamin, transport of cobalt, cobalamin-dependent enzymes | Coenzyme B12 (adenosylcobalamin) | Bacillus/Clostridium group, proteobacteria, actinobacteria, cyanobacteria, spirochaetes, other bacteria
S-box, SAM-II, SAM-III | Metabolism of methionine and cysteine | SAM (S-adenosylmethionine) | S-box: Bacillus/Clostridium group and some other bacteria; SAM-II: alpha-proteobacteria; SAM-III: Streptococci
LYS | Lysine metabolism | lysine | Bacillus/Clostridium group, enterobacteria, other bacteria
G-box | Metabolism of purines | purines | Bacillus/Clostridium group and some other bacteria
glmS (ribozyme) | Synthesis of glucosamine-6-phosphate | glucosamine-6-phosphate | Bacillus/Clostridium group
gcvT (tandem) | Catabolism of glycine | glycine | Bacillus/Clostridium group

Page 41: Comparative genomics of RNA regulatory elements

Properties of riboswitches

• Direct binding of ligands

• High conservation

– Including “unpaired” regions: tertiary interactions, ligand binding

• Same structure – different mechanisms: transcription, translation, splicing, (RNA cleavage)

• Distribution in all taxonomic groups

– diverse bacteria

– archaea: thermoplasmas

– eukaryotes: plants and fungi

• Correlation of the mechanism and taxonomy:

– attenuation of transcription (anti-anti-terminator) – Bacillus/Clostridium group

– attenuation of translation (anti-anti-sequestor of translation initiation) – proteobacteria

– attenuation of translation (direct sequestor of translation initiation) – actinobacteria

• Evolution: horizontal transfer, duplications, lineage-specific loss

• Sometimes very narrow distribution: evolution from scratch?

Page 42: Comparative genomics of RNA regulatory elements

Study scenarios

• RFN, S-box

– early identification of a conserved element

– model of regulation from comparative analysis

– use for functional annotation

– experimental validation

• THI, B12, PUR, LYS

– scavenging of unexplained published experimental results

– models of regulation from comparative analysis

– experimental validation

– use for functional annotation

• GcvT, GlmS

– large-scale computational screens

– prediction of ligand from functions of regulated genes

– experimental validation

• SAM-II, SAM-III

– gaps in regulatory systems

– computational screens

– experimental validation

• Structures: PUR, THI, S-box

Page 43: Comparative genomics of RNA regulatory elements

Teaser:

Systematic analysis of T-boxes

• T-boxes: the mechanism (Grundy & Henkin)

Page 44: Comparative genomics of RNA regulatory elements

Terminator (underlined) ===========> <===========
Antiterminator ==> ===> <===<==

SA serS -> 26 CGTTA 51 AAATAGGGTGGCAACGCGTAGAC------------CACGTCCCTTGTAGGGATGTGGTCTTTTTTTA
DHA tyrZ -> 47 CGTTA 65 AGGTAAGGTGGTAACACGGGAGCA-------TACTCTCGTCCTTCTGGCAATGAAGGACGGGAGTTTTTTGTTTT
ST trpS -> 37 CCTTA 61 AATTGAGGTGGTACCGCGTATTACTT----GTAATAACGCCCTCACGTTTTAATAGCGTGGGGACTTTTTGCTAT
CA aspS -> 39 CGTTA 34 ATAAAGGATGGCACCGTGAAAA----------GCCTTCACTCCTTACTGGAGTGGAGGCTTTTTTTATTTTAAATAAA
DF valS -> 41 CGTTA 77 AATTAAGGTGGTAACGCGAGC------------TTTTCGTCCTTTTTAAAGAGGATGAAGAGCTCTTTTTTATTTCT
PN thrS -> 30 CGTTA 38 AATGAAGGTGGAACCACGTTG-------------CGACGTCCTTTCGAGGATGTCGCATTTTTTTATTAG
MN ileS -> 89 CGTTA 68 AATTAAGGTGGTACCACGAGC-------------TTTCGTCCTTTGATGAAAGTTCTTTTTTATTGAT
DF leuS -> 28 AGCTA 29 AATTAGGGTGGTACCGCGAAGATT-------TATCCTCGTCCCTAAACGTAAGTTTAGTGACGAGGATTTTTTATTTTCA
HD argS -> 41 CGTTA 27 AACGAGAGTGGTACCGCGGGTAA---------AAGCTCGCCTCTTTTTAGAAGAGGCGGGTTTTTTATTTT
DF proS -> 33 CGTTA 30 AACTAGAGTGGTACCGCGGAAAT-----TAAACCTTTCGTCTCTATACTTGTATAGAGATGAGAGGTTTTTTATATTTTCAGGA
ZC lysS -> 46 CGTTA 63 AACTGAGGTGGTACCGCGAAGCTAA-----CAACTCTCGTCCTCAAGATGAATAATCTTGGGGGTGGGAGTTTTTTTGTTGCAT
BQ metS -> 55 CGTTA 66 AAATAAGGTGGTACCGCGACTGTTTA---TACAGCCCCGCCCTTATCTTTTTTAGATAAGGGCGGGGCTTTTTATATTTAA
MN pheS -> 14 AATTA 20 AAAACGGATGGTACCGCGTGTC-------------AACGCTCCGCTTAAGGAGTTTTGGCACTTTTTTTGTTTT
MN glyQ -> 14 AGCTA 23 AATTAGGGTGGAACCGCGTTT------------CAAACGCCCCTATGTCAGTTGGCATGGGAGTGATTGAGCGTGGCTCTTTT
ST alaS -> 20 AATTA 18 AATAGAGGTGGTACCGCGGTT--------------TTCGCCCTCTGTGAGATGGACTTGTTTTGTATGGAGGACTATTTGAAA
SA trpE -> 32 AATTA 4 AACTAAGGTGGCACCACGGTA-------------ACGCGTCCTTACAGGTATATGCGTTATGTGGTGTCTTTTT
BS ilvB -> 50 CGTTA 47 AACAAGGGTGGTACCGCGGAAAGAAA---AGCCTTTTCGCCCCTTTTAGCTATCGCAGTTACTGCGCGGCTGATTGT
CA ilvC -> 40 CGTTA 14 AATTTGGGTGGTACCGCGCGACCAAA-----AATTCTCGCCCCAAGCAGGGAATTTTGGCCGTTTTTTTATATAAATAAAT
BQ asnA -> 51 CGTTA 62 AATTTGGGTGGTACCGCGGAACC-----AAAGCCTTTCGTCCCAGTTTTTTGGGAAAGAAGGGCTTTTTTTGTTGGCTT
BS proB -> 33 CGTTA 30 AATCAAGGTGGTACCACGGAAAC--------CCATTTCGTCCTTATGAATCAGGATGAAATGGGTTTTTTTATTGTAGA
SA cysE -> 33 CATTA 62 ATTCAGAGTGGAACCGTGCGG-------------AAGCGCCTCTAACAATACAATTTGTATGTTAGTGGTGCTTTTTTG
MN hisC -> 46 CGTTA 50 AATGAAGGTGGAACCACGTGTGT---------GTCAGCGTCCTTGCAAGTTTTTTGCAAGGGCGCTTTTTTGAATAGT
DHA pheA -> 41 CGTTA 50 AAAAAGGGTGGTACCGCGTGAC---------TTAACTCGTCCCTTATTTGGGGGTGAGGTAAGTCTTTTTTTATTTA
HD serA -> 42 cgtta 57 AATGAGGGTGGCACCGCGGTATG-------AACCTTCCGCCCCTCACGACAGTCGTCGTGTGGGCAGAAGGTTTTTTTACTATCA
BQ phhA -> 51 CGTTA 34 AAATAGGGTGGTACCGCGATTC------------TTTCGCCCCTATCGGATTTTCCGATAGGGGCTTTTTCTATTTC
EF yxjH -> 40 CGTTA 51 AAAAAAGGTGGTACCGCGATAA-----------TAATCGCCCTTTTACTAGTTACGGCTAGTAAAAGGGCGTTTTTTTATAAA
CA yckK -> 38 CGTTA 57 AATTAGAGTGGTACCGTGGAATT-------CAACTTCTGCCTCTAACTATGAGGATAGAAGTTTTTTGTTTTTAT
DF yqiX -> 41 CCTTA 30 AAAAAGAGTGGTAACGCGGATAT----------AATTCGTCTCTTAGCTGTAAAGCTAAGGGACTTTTTTGATTTA
HD BH0807 -> 74 TGTTA 56 AACTGGGGTGGCACCACGACAAG----------TGATCGTCCCCAAGACTTTTATCAGTCTTGGGGACGTTTTTTTGTTCAT
EF yheL -> 8 AATTA 33 AATTAAGGTGGTACCGCGGAGA-----------GATTCGTCCTTATTCTTTAAGGATGAATCTCTCTTTTTATGTAGC
BQ ykbA -> 46 CGTTA 45 AACAAGGGTGGAACCACGAATAT--------AACACTCGTCCCTTTTTTAGGGAGGAGTGTTTTTTTATT
BQ sdt2 -> 40 CGTTA 56 AATTGAGGTGGTACCACGGTATTAACATTACATATATCGTCCTCTACATGCATATTTGCGTGTAGGGGACTTTTTTATTTTC
EF yusC -> 42 CGTTA 60 AATTAAGGTGGTATCACGAAATGA-----CAAACTTTCGTCCTTTTTGCTGTAATAGCAAAAGGATGGAAGTTTTTTTGTTT
CA yhaG -> 48 CGTTA 51 AATTTAGGTGGTACCGCGGAAGT---------ATCTCCGTCCTAATTAATAAGATTAGGGCGGAGTTTTTTATTTGC
BQ brnQ -> 44 CGTTA 66 AATTAGGGTGGTATCGCGGGTAAA------TATAACTCGTCCCTTTCTTTAGGGACGAGTTTTTTGTGTTCTT
REF01723 -> 44 CGTTA 55 AATTGAGGTGGCACCACGAATGC----------GATTCGTCCTCTTGGCTCACAGCCAAGAGGCTTTTTTGTTTTTTTAATA
BS yvbW -> 56 CGTTA 32 AACAAGAGTGGTACCGCGGTCAGC--CGAAGGCTCGTCGTCTCTTTATCTATTAGATTAGGTAGGAGACGGCGGGCTTTTTT

Row groups (side labels): Aminoacyl-tRNA synthetases; Amino acid biosynthetic genes; Amino acid transporters

TGG: T-box

Partial alignment of predicted T-boxes
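The conserved core visible in the alignment (the GGTGG...CGCG stretch at the start of the antiterminator) lends itself to a quick pattern scan. The sketch below uses a degenerate regular expression that is only a loose reading of the rows shown on this slide; it is an assumption, not the profile actually used to find the ~800 T-boxes, and the names `TBOX_CORE` and `find_tbox_cores` are hypothetical.

```python
import re

# Degenerate core pattern read off the alignment rows (an assumption):
# e.g. GGTGGTACCGCG, GGTGGCAACGCG, GATGGCACCGTG, GGTGGTATCACG
TBOX_CORE = re.compile(r"[AG][AG]TGG[ACT]A[ACT]C[AG][CT]G")


def find_tbox_cores(seq):
    """Return (start, matched_text) for every core match in a DNA-level sequence."""
    seq = seq.upper().replace("U", "T")
    return [(m.start(), m.group()) for m in TBOX_CORE.finditer(seq)]
```

A pattern this short will produce false positives on whole genomes, so in practice it would only serve as a first filter before checking the specifier hairpin and the terminator/antiterminator structures.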

Page 45: Comparative genomics of RNA regulatory elements

specifier hairpin ===> ==> ===> <=== <== SC <===

SA serS Ser ---GTAGGACAAGTA 19 AGAGAGCTTGTGGTT---AGTGTGAACAAG--- 15 GAA--TCTACCTACTT ->
DHA tyrZ Tyr ----AAGAACAAGTA 18 AGAAAGTTGCCGGCT---GATGAGAGGCGCTT 18 GAA--TACCTCTTTGA ->
ST trpS Trp ---ATTAGAAGAGTA 16 AGAGAGTTAGTGGTT---GGTGCAAGCTAAC- 12 GAAA-TGGACTAATGA ->
CA aspS Asp -----GAGAAAAGTA 18 AGCGAATTGGGAAAT---GGTGTGAGCCCAA- 15 GAAA-GACATCTCGGA ->
DF valS Val -GAAGAAGAGGAGTA 16 AGAGAGGAAAATTCACTGGCTGTAAGATTTTC 17 GAAT-GTAGCTTTGGA ->
PN thrS Thr ----AGAGACAAGTC 18 AGAGAGTGCGTGGTT---GCTGGAAACGCAT- 14 GAT--ACTACTCTTGA ->
MN ileS Ile ----CAAAAACACAA 17 AGCGAATAGGTGAT----GGTGTAAGACCTATT 18 -----ATCATTTTGTT ->
DF leuS Leu ----CTAGAGCAGTA 19 AGAGGAAGTGGAA-----GGTGAGAACTAATATT 10 GAA--CTTACTAGATT ->
HD argS Arg -----TGGGAGAGTA 20 AGCGAGTCGGGAT-----GGTGGGAGCCGAT- 14 GAAA-CGCACCCATGA ->
DF proS Pro ---AAAGAAATAGTA 18 AGAGAGAAAACGGT----GGTGAGAGTTTTC-- 14 GAA--CCTGTCTTTTA ->
ZC lysS Lys ---AAGAGAAGAGTA 19 AGAGAGCTCTGGTA----GCTGAGAAAGAGC-- 15 GAAAAAAGACTTGGAG ->
BQ metS Met ---AAAGGAAAAGTA 19 AGAGAGCTTCGGTA----GCTGAGAAGAAGC-- 14 GAACAATGGCCTTTGA ->
MN pheS Phe ----TGAGATTAGTA 18 AGGGAATGCGGGGCGTG-ACTGGAAACCCGC- 16 GAA--TTCACTCAGAA ->
MN glyQ Gly ---AGAAAGAGAGTT 15 AGCGAACCTGAGAG----AGTGTAAGTCAGGT 14 GACT-GGCACTTTCTC ->
ST alaS Ala -AGTTAAGAATTGTT 17 AGAAAAGTGACGGTT---GCTGCGAGTCATT- 17 -----GCTACTTAACT ->
SA trpE Trp TCTAAAGAAATAGTA 22 AGAAAGCTAATGGGT---GATGGGAATTAGC-- 14 GAAT-TGGACTTTGGA ->
BS ilvB Leu ---TGAGGATAAGTA 20 AGAGAACCGGGTTA----GCTGAGAACCGG--- 16 GAA--CTCGCCTCAGA ->
CA ilvC Val -----AGGAAGAGTA 17 AGAGAGTGAGATACT---GGTGGGAACTCAT-- 13 GAAG-GTAGCCTTTGA ->
BQ asnA Asn --AGGACGAGTAGTA 15 AGCGAGTCAGGGGT----GGTGTGAGCCTGA-- 15 GAAG-AACCTCCTGGA ->
BS proB Pro -----AGGATTAGTA 18 AGAGAGCAAAATGAACC-GCTGAAACATTTTGC 15 GAA--CCTGCCTTGGA ->
SA cysE Cys --CGAAGGATTAGTA 18 AGAGAGTGTACGGTT---GCTGTGAGTACA--- 14 GAA--TGCACCTTCGT ->
MN hisC His -----AGAGAAAAAA 16 AGAGAGTATGGGAA----GCTGAAAACATAC-- 15 -----CACATTCTTGA ->
DHA pheA Phe -----AAAGAGAGCA 19 AGGGAACTAAAGTCGGAGACTGAAAGCTTTAGT 14 GAGA-TTCACTCTGGA ->
HD serA Ser ----GAAGATGAGGA 17 AGAGAGCTGGTGGTT---GCTGTGAACCAGCT- 18 -----AGCCCTTCTGA ->
BQ phhA Tyr AGAATCGCAGTAGTA 17 AGAGAGCTAATGGTC---GGTGGAAATTGGC-- 14 GAAT-TACAATTCTGG ->
EF yxjH Met -----TAGGAAAGTA 17 AGAGAGACTTTGGTT---GGTGAAAAAAGTT-- 13 GAAAAATGGCCTAGGA ->
CA yckK Cys ----AAGAACCAGTA 17 AGAGAAAAATCTCCAAG-GCTGAAAGGGATTTT 15 GAA--TGCATCTTTGA ->
DF yqiX Arg -----AGAGAAAGTA 16 AGCGAGTTAGGGGTT---GGTGTAAGCCTAGC- 14 GAAG-AGAGCTCTGGA ->
HD BH0807 Lys ----AGAGAAGAGTA 19 AGAAAGCCTGTAGTT---GCTGAGAACGGGT-- 14 GAAGCAAGACTCTGAG ->
EF yheL Tyr -TTATTAGCCCAGTA 19 AGAAAGTCGATGGTT---GCTGCGAATCGAT-- 13 GAAT-TACACTAATAA ->
BQ ykbA Thr --GAGGACACGATCA 16 AGAGAGGGAAGCCTTTG-GCTGTGAGCTTCCT- 14 GATT-ACCACCTCTGA ->
BQ sdt2 Trp ---GCAAGAAGAGTA 18 AGAGAGCTGGGGGAA---GGTGTGAGCCCGGT- 15 GAA--TGGGCTTGCGA ->
EF yusC Met ----AAAGAAGAGTA 18 AGAGAGCCCTGTTT----GCTGAGAATGGG--- 16 GAAG-ATGGTCTTTGA ->
CA yhaG Trp ----AAGGAAGAGTA 18 AGAGAGCTGAGGGT----GGTGTGATCTCAGT- 15 GAA--TGGACCTTTTA ->
BQ brnQ Ile ----GAGAACGAGTA 19 AGAGAGTTGGCGATTT--GCTGAAAGCCAAC-- 15 GAAA-ATCATCTCCGA ->
REF01723 His --TTAGGACATAGTA 18 AGAGACTTTTTCATTG--GCTGAAAGAAAAAG- 17 -----CACACCTAAAA ->
BS yvbW Leu -----GGGAGCAGTA 18 AGAGAGCTGCGGGGT---GGTGCGACGCAGC-- 13 GAA--CTCGCCCGGGA ->

Row groups (side labels): Aminoacyl-tRNA synthetases; Amino acid biosynthetic genes; Amino acid transporters

… continued (in the 5’ direction); SC: anti-anti (specifier) codon
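In the rows above, the three nucleotides that follow the first five alignment columns of the last block match the annotated amino acid under the standard genetic code (for example TCT for serS, TAC for tyrZ, ATG for metS). The sketch below extracts that triplet and translates it. The column offset is my reading of this particular alignment and is an assumption, as are the function names.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def specifier_codon(last_block, offset=5):
    """Return the triplet at alignment columns offset..offset+2 of the last block."""
    return last_block[offset:offset + 3].upper().replace("U", "T")


def predicted_amino_acid(last_block, offset=5):
    """Translate the specifier codon into a one-letter amino acid code."""
    return CODON_TABLE[specifier_codon(last_block, offset)]
```

Because the specifier codon pairs with the tRNA anticodon, reading it this way gives a direct prediction of which amino acid a newly found T-box responds to.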

Page 46: Comparative genomics of RNA regulatory elements

~800 T-boxes in ~90 bacteria

• Firmicutes

– aa-tRNA synthetases

– enzymes

– transporters

– all amino acids excluding glutamine, glutamate, lysine

• Actinobacteria (regulation of translation – predicted)

– branched chain (ileS)

– aromatic (Atopobium minutum)

• Delta-proteobacteria

– branched chain (leu – enzymes)

• Thermus/Deinococcus group (aa-tRNA synthetases)

– branched chain (ileS, valS)

– glycine

• Chloroflexi, Dictyoglomi

– aromatic (trp – enzymes)

– branched chain (ileS)

– threonine

Page 47: Comparative genomics of RNA regulatory elements

Same enzymes – different regulators (common part of the aromatic amino acids biosynthesis pathway)

[Figure: common part of the aromatic amino acid biosynthesis pathway.
Metabolite labels: PEP, E4P, DAHP, SHIKIMATE, CHORISMATE, ANTHRANILATE, ADC, FOLATE, PHE, TYR, TRP, kynurenine pathway.
Gene labels: aroF, aroI, aroE, aroA, aroD, aroB, aroC, aroA, pheB, aroH, trpE, trpDCFBA, trpG, tyrA, hisC, aspB, phhA, yhaG, pabA, pabB.]

Transporters: TRP: trpXYZ; TRP/PHE: yocR family; TYR: yheL

aro:

• Regulated by TYR (BC)

• Regulated by PHE (SWO, DRE, HMO, CH, MTH, CTH)

• Regulated by TRP (DE, DEH)

cf. E. coli: AroF,G,H: feedback inhibition by TRP, TYR, PHE; transcriptional regulation by TrpR, TyrR

Page 48: Comparative genomics of RNA regulatory elements

Recent duplications and bursts: ARG-T-box in Clostridium difficile

[Figure: phylogenetic tree of ARG-specific T-box regulatory sites. Leaf labels: LJ_ARGS, LME_ARGS, LR_ARGS, LP_ARGS, LSA_ARGS, PPE_ARGS, LGA_ARGS, EF_ARGS, BC_ARGS2, BH_ARGS, CBE_ARGS, CPE_ARGS, CB_ARGS, CTC_ARGS, CAC_ARGS, CDF_YQIXYZ, RDF02391, CDF_ARGC, CDF_ARGH. Groups: Bacillales, Lactobacillales, Clostridiales (argS); others.
Clostridium difficile, marked NEW: predicted amino acid transporters (yqiXYZ, RDF02391) and amino acid biosynthetic genes (argCJBDF, argG, argH), in addition to argS.
Legend: ARG-specific T-box regulatory site; aminoacyl-tRNA synthetase; biosynthetic genes; amino acid transporters.]

Page 49: Comparative genomics of RNA regulatory elements

Expansion of T-box regulon

Regulation of expression of arginine biosynthetic and transport genes by T-box antitermination

• Gram+ bacteria: AhrC regulatory protein (negative regulation of arginine metabolism, positive regulation of arginine catabolism). Binding to the 5’ UTR gene region → regulation of gene expression

• Other clostridia spp. (CA, CTC, CTH, CPE, CB, CPE)

• Clostridium difficile: AhrC is lost

[Figure: two gene diagrams with labels yqiXYZ, argC, argH and yqiXYZ, argC, argG, argH. Legend: ARG-specific T-box regulatory site; AhrC binding site.]

Page 50: Comparative genomics of RNA regulatory elements

More duplications: THR-T-box in C. difficile

[Figure: phylogenetic tree of THR-specific T-box regulatory sites. Leaves are thrS, thrZ, hom, thrC and brnQ sites named by genome (for example BS_THRS, BS_THRZ, BS_THRZ*, BC_HOM, BCE_BRNQ2, CDF_THRZ, CDF_HOM, CDF_HOM*, CDF_THRC, CTC_BRNQ1, SA_THRS, LME_THRS). Groups: Bacillales; Clostridiales; Lactobacillaceae, Leuconostocaceae; Streptococcaceae; others.
Regulated genes by group, as labelled: thrS; thrZ; hom; thrCB; brnQ, with C. difficile and B. cereus drawn separately.
Legend: THR-specific T-box regulatory site; aminoacyl-tRNA synthetase; biosynthetic genes; amino acid transporters.]

Page 51: Comparative genomics of RNA regulatory elements

ASN/ASP/HIS T-boxes:

Duplications and changes in specificity

[Figure: phylogenetic tree of ASN, ASP and HIS T-boxes. Leaves are hisS, hisXYZ, hisZ, hisC, asnA, asnS, aspS and glnQHMP sites named by genome (for example BS_HISS, EF_HISXYZ, LME_HISXYZ, CDF_HISZX, PPE_HISXYZ, PPE_ASNS, LJ_ASNA, LJ_GLNQHMP, LRE_HISS, LRE_ASPS, CPE_ASNA, CTC_ASNS2, SMU_ASPS2).
Specificity labels: HIS, ASP, ASN, ASP/ASN.
Group labels: Bacillales; Clostridiales; Lactobacillales; Other Gram+; L. johnsonii; P. pentosaceus.
Regulated genes, as labelled: hisS; aspS; his operon; hisXYZ (Lactobacillales, marked NEW); asnA; asnS; glnQHMP.]

Page 52: Comparative genomics of RNA regulatory elements

Blow-up

[Figure: enlarged part of the ASN/ASP/HIS tree. Leaves: PPE_ASNS2, LB_ASNA, LD_ASNA, LJ_GLNQHMP, LJ_ASNA, PPE_ASNA, LP_ASNA, PPE_HISS, LP_HISS, LB_HISS, LJ_HISS, LRE_HISS, LRE_ASPS, LCA_HISS, PPE_HISXYZ.
Annotated cases:
Lactobacillales: hisS and aspS (HIS, ASP); asnA (ASN).
L. reuteri: hisS and aspS, disruption of the hisS-aspS operon, mutation of the regulatory codon (HIS, ASP; CAC, GAC).
L. johnsonii: asnA and glnQHMP (ASN, ASP; AAC, GAC).
P. pentosaceus: hisXYZ and asnS (HIS, ASP, ASN; CAC, AAC, GAC).]

Page 53: Comparative genomics of RNA regulatory elements

Branched-chain amino acids: duplications and changes in specificity

[Figure: phylogenetic tree of branched-chain amino acid T-boxes. Leaves are valS, leuS, ileS, ilvB, ilvC, ilvD, leuA, brnQ, ybgE, yvbW and related sites named by genome (for example BS_VALS, BS_LEUS, BS_ILES, BS_ILVB, BS_YVBW, CA_ILVC, CDF_LEUA, CDF_ILVC, DHA_LEUA, GSU_LEUA, DRE_ILVD_ile, DRE_ILVD*_leu, LP_BRNQ1_ile, LP_BRNQ2_val, LCA_BRNQ1_val, LCA_BRNQ2_ile, LRE_PANE, LJ_OPP, OB1271, LP3666, BC_YOCR3).
Main clades in Firmicutes: valS (VAL), leuS (LEU), ileS (ILE).
Specificity labels: LEU, VAL, ILE.
Gene and operon labels: ilv operon, ilv operon 2, leu operon, ilvBN, ilvCB, ilvC, brnQ, yvbW, YOCR3, OB1271, lp3666, opp, panE, 148_0001, 029_0008.
Organism labels: Bacillales; Lactobacillales; Lactobacillaceae; Clostridiaceae; δ-proteobacteria; Clostridium difficile; Desulfitobacterium hafniense; C. thermocellum; Syntrophomonas wolfei; B. subtilis; B. licheniformis; B. cereus; Oceanobacillus iheyensis; C. acetobutylicum; Lactobacillus casei; Lactobacillus plantarum; Lactobacillus johnsonii; Lactobacillus reuteri; Desulfotomaculum reducens; Heliobacillus mobilis; Carboxydothermus hydrogenoformans.
Annotations: T-box duplication and mutation of regulatory codon; recent T-box duplication and mutation of regulatory codon. Codon labels: ATC, CTC, GTC.]

Page 54: Comparative genomics of RNA regulatory elements

Blow-up

transporter:

dual regulation of common enzymes:

ATC CTC

ATC GTC
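The specificity switches shown on these slides (CAC/GAC, AAC/GAC, ATC/CTC, ATC/GTC) each involve codons that differ at one position. The sketch below checks this for the six codons named in the figures. The amino acid assignments follow the standard genetic code; the dictionary name and helper functions are hypothetical.

```python
from itertools import combinations

# Regulatory (specifier) codons named on the slides, with the amino acid
# they encode in the standard genetic code.
SLIDE_CODONS = {
    "CAC": "HIS", "GAC": "ASP", "AAC": "ASN",
    "ATC": "ILE", "CTC": "LEU", "GTC": "VAL",
}


def hamming(a, b):
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def one_step_pairs(codons):
    """Return sorted codon pairs that differ by exactly one nucleotide."""
    return sorted(
        (a, b) for a, b in combinations(sorted(codons), 2) if hamming(a, b) == 1
    )
```

Calling `one_step_pairs(SLIDE_CODONS)` lists, among others, `("ATC", "CTC")` and `("CAC", "GAC")`, so a duplicated T-box needs only a single substitution in the specifier codon to respond to a different amino acid.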

Page 55: Comparative genomics of RNA regulatory elements

Double and one-and-a-half T-boxes

• TRP: trp operon (Bacillales, C. beijerinckii, D. hafniense)

• TYR: pah (B. cereus)

• THR: thrZ (Bacillales); hom (C. difficile)

• ILE: ilv operon (B. cereus)

• LEU: leuA (C. thermocellum)

• ILE-LEU: ilvDBNCB-leuACDBA (Desulfotomaculum reducens)

• TRP: trp operon (T. tengcongensis)

• PHE: arpLA-pheA (D. reducens, S. wolfei)

• PHE: trpXY2 (D. reducens)

• PHE: yngI (D. reducens)

• TYR: yheL (B. cereus)

• SER: serCA (D. hafniense)

• THR: thrZ (S. uberis)

• THR: brnQ-braB1 (C. thermocellum)

• HIS: hisXYZ (Lactobacillales)

• ARG: yqiXYZ (C. difficile)

Page 56: Comparative genomics of RNA regulatory elements

• Andrei Mironov

– software genome analysis, conserved RNA patterns

• Alexei Vitreschak

– analysis of RNA structures

• Dmitry Rodionov

– metabolic reconstruction

• Support:

– Howard Hughes Medical Institute

– INTAS

– Russian Fund of Basic Research

– Russian Academy of Sciences