Comparative genomics of RNA regulatory elements
Mikhail Gelfand
Research and Training Center “Bioinformatics”
Institute for Information Transmission Problems Moscow, Russia
September 2006
Riboflavin biosynthesis pathway
[Figure: riboflavin biosynthesis pathway]
Precursors: GTP (purine biosynthesis pathway); ribulose-5-phosphate (pentose-phosphate pathway)
Intermediates: 2,5-diamino-6-hydroxy-4-(5'-phosphoribosylamino)pyrimidine; 5-amino-6-(5'-phosphoribosylamino)uracil; 5-amino-6-(5'-phosphoribitylamino)uracil; 3,4-dihydroxy-2-butanone-4-phosphate; 6,7-dimethyl-8-ribityllumazine
Product: riboflavin
Enzymes: GTP cyclohydrolase II; pyrimidine deaminase; pyrimidine reductase; 3,4-DHBP synthase; riboflavin synthase, β-chain; riboflavin synthase, α-chain
Gene labels: ribA, ribB, ribD, ribG, ribH, ribE, ypaA
5’ UTR regions of riboflavin genes from various bacteria 1 2 2’ 3 Add. 3’ Variable 4 4’ 5 5’ 1’ =========> ==> <== ===> -><- <=== -> <- ====> <==== ==> <== <========= BS TTGTATCTTCGGGG-CAGGGTGGAAATCCCGACCGGCGGT 21 AGCCCGTGAC-- 8 4 8 -----TGGATTCAGTTTAA-GCTGAAGCCGACAGTGAA-AGTCTGGAT-GGGAGAAGGATGAT BQ AGCATCCTTCGGGG-TCGGGTGAAATTCCCAACCGGCGGT 19 AGTCCGTGAC-- 8 5 8 -----TGGATCTAGTGAAACTCTAGGGCCGACAGT-AT-AGTCTGGAT-GGGAGAAGGATATG BE TGCATCCTTCGGGG-CAGGGTGAAATTCCCGACCGGCGGT 20 AGCCCGCGA--- 3 4 3 -----AGGATCCGGTGCGATTCCGGAGCCGACAGT-AT-AGTCTGGAT-GGGAGAAGGATGCC HD TTTATCCTTCGGGG-CTGGGTGGAAATCCCGACCGGCGGT 19 AGTCCGTGAC-- 10 4 10 ----–TGGACCTGGTGAAAATCCGGGACCGACAGTGAA-AGTCTGGAT-GGGAGAAGGAAACG Bam TGTATCCTTCGGGG-CTGGGTGAAAATCCCGACCGGCGGT 23 AGCCCGTGAC-- 8 4 8 ----–TGGATTCAGTGAAAAGCTGAAGCCGACAGTGAA-AGTCTGGAT-GGGAGAAGGATGAG CA GATGTTCTTCAGGG-ATGGGTGAAATTCCCAATCGGCGGT 2 AGCCCGCAA--- 3 4 3 ------AGATCCGGTTAAACTCCGGGGCCGACAGTTAA-AGTCTGGAT-GAAAGAAGAAATAG DF CTTAATCTTCGGGG-TAGGGTGAAATTCCCAATCGGCGGT 2 AGCCCGCG---- 7 6 7 --------ATTTGGTTAAATTCCAAAGCCGACAGT-AA-AGTCTGGAT-GGAAGAAGATATTT SA TAATTCTTTCGGGG-CAGGGTGAAATTCCCAACCGGCAGT 6 AGCCTGCGAC-- 11 3 11 ----–CTGATCTAGTGAGATTCTAGAGCCGACAGTTAA-AGTCTGGAT-GGGAGAAAGAATGT LLX ATAAATCTTCAGGG-CAGGGTGTAATTCCCTACCGGCGGT 2 AGCCCGCGA--- 4 4 4 -----ATGATTCGGTGAAACTCCGAGGCCGACAGT-AT-AGTCTGGAT-GAAAGAAGATAATA PN AACTATCTTCAGGG-CAGGGTGAAATTCCCTACCGGTGGT 2 AGCCCACGA--- 3 4 3 -----ATGATTTGGTGAAATTCCAAAGCCGACAGT-AT-AGTCTGGAT-GAAAGAAGATAAAA TM AAACGCTCTCGGGG-CAGGGTGGAATTCCCGACCGGCGGT 3 AGCCCGCGAG-- 5 4 5 ----–TTGACCCGGTGGAATTCCGGGGCCGACGGTGAA-AGTCCGGAT-GGGAGAGAGCGTGA DR GACCTCTTTCGGGG-CGGGGCGAAATTCCCCACCGGCGGT 15 AGCCCGCGAA-- 8 12 9 ----–CCGATGCCGCGCAACTCGGCAGCCGACGGTCAC-AGTCCGGAC-GAAAGAAGGAGGAG TQ CACCTCCTTCGGGG-CGGGGTGGAAGTCCCCACCGGCGGT 3 AGCCCGCGAA-- 5 4 5 -----CCGACCCGGTGGAATTCCGGGGCCGACGGTGAA-AGTCCGGAT-GGGAGAAGGAGGGC AO AATAATCTTCAGGG-CAGGGTGAAATTCCCGATCGGCGGT 2 AGTCCGCGA--- 7 7 7 -----AGGAACCGGTGAGATTCCGGTACCGACAGT-AT-AGTCTGGAT-GGAAGAAGATGAAA DU 
TTTAATCTTCAGGG-CAGGGTGAAATTCCCGATCGGTGGT 2 AGTCCGCGA--- 13 4 12 -----AGGAACTAGTGAAATTCTAGTACCGACAGT-AT-AGTCTGGAT-GGAAGAAGAGCAGA CAU GAAGACCTTCGGGG-CAAGGTGAAATTCCTGATCGGCGGT 20 AGCCCGCGA--- 3 4 3 -----AGGACCCGGTGTGATTCCGGGGCCGACGGT-AT-AGTCCGGAT-GGGAGAAGGTCGGC FN TAAAGTCTTCAGGG-CAGGGTGAAATTCCCGACCGGTGGT 2 AGTCCACG---- 5 4 5 -------GATTTGGTGAAATTCCAAAACCGACAGT-AG-AGTCTGGAT-GGGAGAAGAATTAG TFU ACGCGTGCTCCGGG-GTCGGTGAAAGTCCGAACCGGCGGT 3 AGTCCGCGAC-- 8 5 8 -----TGGAACCGGTGAAACTCCGGTACCGACGGTGAA-AGTCCGGAT-GGGAGGTAGTACGTG SX -AGCGCACTCCGGG-GTCGGTGAAAGTCCGAACCGGCGGT 3 AGTCCGCGAC-- 8 5 8 -----TTGACCAGGTGAAATTCCTGGACCGACGGTTAA-AGTCCGGAT-GGGAGGCAGTGCGCG BU GTGCGTCTTCAGGG-CGGGGTGAAATTCCCCACCGGCGGT 30 AGCCCGCGAGCG 137 GTCAGCAGATCTGGTGAGAAGCCAGAGCCGACGGTTAG-AGTCCGGAT-GGAAGAAGATGTGC BPS GTGCGTCTTCAGGG-CGGGGCGAAATTCCCCACCGGCGGT 21 AGCCCGCGAGCG 8 4 8 GTCAGCAGATCTGGTCCGATGCCAGAGCCGACGGTCAT-AGTCCGGAT-GAAAGAAGATGTGC REU TTACGTCTTCAGGG-CGGGGTGCAATTCCCCACCGGCGGT 31 AGCCCGCGAGCG 7 5 7 GTCAGCAGATCTGGTGAGAGGCCAGGGCCGACGGTTAA-AGTCCGGAT-GAAAGAAGATGGGC RSO GTACGTCTTCAGGG-CGGGGTGGAATTCCCCACCGGCGGT 21 AGCCCGCGAGCG 11 3 11 GTCAGCAGATCCGGTGAGATGCCGGGGCCGACGGTCAG-AGTCCGGAT-GGAAGAAGATGTGC EC GCTTATTCTCAGGG-CGGGGCGAAATTCCCCACCGGCGGT 17 AGCCCGCGAGCG 8 4 8 GACAGCAGATCCGGTGTAATTCCGGGGCCGACGGTTAG-AGTCCGGAT-GGGAGAGAGTAACG TY GCTTATTCTCAGGG-CGGGGCGAAATTCCCCACCGGCGGT 67 AGCCCGCGAGCG 8 3 8 GTCAGCAGATCCGGTGTAATTCCGGGGCCGACGGTTAA-AGTCCGGAT-GGGAGAGGGTAACG KP GCTTATTCTCAGGG-CGGGGCGAAATTCCCCACCGGCGGT 20 AGCCCGCGAGCG 8 4 8 GTCAGCAGATCCGGTGTAATTCCGGGGCCGACGGTTAA-AGTCCGGAT-GGGAGAGAGTAACG HI TCGCATTCTCAGGG-CAGGGTGAAATTCCCTACCGGTGGT 2 AGCCCACGAGCG 26 9 30 GTCAGCAGATTTGGTGAAATTCCAAAGCCGACAGT-AA-AGTCTGGAT-GAAAGAGAATAAAA VK GCGCATTCTCAGGG-CAGGGTGAAATTCCCTACCGGTGGT 14 AGCCCACGAGCG 11 9 11 GTCAGCAGATTTGGTGAGAATCCAAAGCCGACAGT-AT-AGTCTGGAT-GAAAGAGAATAAGC VC CAATATTCTCAGGG-CGGGGCGAAATTCCCCACCGGTGGT 13 AGCCCACGAGCG 5 4 5 GTCAGCAGATCTGGTGAGAAGCCAGGGCCGACGGTTAC-AGTCCGGAT-GAGAGAGAATGACA YP GCTTATTCTCAGGG-CGGGGTGAAAGTCCCCACCGGCGGT 
40 AGCCCGCGAGCG 16 6 16 GTCAGCAGACCCGGTGTAATTCCGGGGCCGACGGTTAT-AGTCCGGAT-GGGAGAGAGTAACG AB GCGCATTCTCAGGG-CAGGGTGAAAGTCCCTACCGGTGGT 25 AGCCCACGAGCG 16 4 27 GTCAGCAGATTTGGTGCGAATCCAAAGCCGACAGTGAC-AGTCTGGAT-GAAAGAGAATAAAA BP GTACGTCTTCAGGG-CGGGGTGCAATTCCCCACCGGCGGT 18 AGCCCGCGAGCG 10 4 10 GTCAGCAGACCTGGTGAGATGCCAGGGCCGACGGTCAT-AGTCCGGAT-GAGAGAAGATGTGC AC ACATCGCTTCAGGG-CGGGGCGTAATTCCCCACCGGCGGT 16 AGCCCGCGAGCA 10 3 11 ---CGCAGATCTGGTGTAAATCCAGAGCCGACGGT-AT-AGTCCGGAT-GAAAGAAGACGACG Spu AACAATTCTCAGGG-CGGGGTGAAACTCCCCACCGGCGGT 34 AGCCCGCGAGCG 6 6 6 GTCAGCAGATCTGGTG 52 TCCAGAGCCGACGGT 31 AGTCCGGAT-GGAAGAGAATGTAA PP GTCGGTCTTCAGGG-CGGGGTGTAAGTCCCCACCGGCGGT 13 AGCCCGCGAGCG 7 3 7 GTCAGCAGATCTGGTGCAACTCCAGAGCCGACGGTCAT-AGTCCGGAT-GAAAGAAGGCGTCA AU GGTTGTTCTCAGGG-CGGGGTGCAATTCCCCACCGGCGGT 17 AGCCCGCGAGCG 7 9 7 GTCAGCAGATCCGGTGAGAGGCCGGAGCCGACGGT-AT-AGTCCGGAT-GGAAGAGGACAAGG PU AAACGTTCTCAGGG-CGGGGTGCAATTCCCCACCGGCGGT 19 AGCCCGCGAGCG 19 4 18 GTCAGCAGACCCGGTGTGATTCCGGGGCCGACGGTCAC-AGTCCGGATGAAGAGAGAACGGGA PY TAACGTTCTCAGGG-CGGGGTGCAACTCCCCACCGGCGGT 19 AGCCCGCGAGCG 15 4 16 GTCAGCAGACCCGGTGTGATTCCGGGGCCGACGGTCAT-AGTCCGGATGAAGAGAGAGCGGGA PA TAACGTTCTCAGGG-CGGGGTGAAAGTCCCCACCGGCGGT 19 AGCCCGCGAGCG 14 4 13 GTCAGCAGACCCGGTGCGATTCCGGGGCCGACGGTCAT-AGTCCGGATAAAGAGAGAACGGGA MLO TAAAGTTCTCAGGG-CGGGGTGAAAGTCCCCACCGGCGGT 16 AGCCCGCGAGCG 8 5 8 GTCAGCAGATCCGGTGTGATTCCGGAGCCGACGGTTAG-AGTCCGGAT-GAAAGAGGACGAAA SM AAGCGTTCTCAGGG-CGGGGTGAAATTCCCCACCGGCGGT 34 AGCCCGCGAGCG 8 3 8 GTCAGCAGATCCGGTCGAATTCCGGAGCCGACGGTTAT-AGTCCGGAT-GGAAGAGAGCAAGC BME GCTTGTTCTCGGGG-CGGGGTGAAACTCCCCACCGGCGGT 17 AGCCCGCGAGCG 10 15 10 GTCAGCAGATCCGGTGAGATGCCGGAGCCGACGGTTAA-AGTCCGGAT-GGAAGAGAGCGAAT BS ATCAATCTTCGGGG-CAGGGTGAAATTCCCTACCGGCGGT 18 AGCCCGCGA--- 5 4 5 -----AGGATTCGGTGAGATTCCGGAGCCGACAGT-AC-AGTCTGGAT-GGGAGAAGATGGAG BQ GTCTATCTTCGGGG-CAGGGTGAAAATCCCGACCGGCGGT 27 AGCCCGCGA—-- 3 5 3 -----AGGATTTGGTGTGATTCCAAAGCCGACAGT-AT-AGTCTGGAT-GGGAGAAGATGGAG BE ATTCATCTTCGGGG-CAGGGTGAAATTCCCGACCGGCGGT 20 AGCCCGCGA--- 3 4 3 
-----AGGATCCGGTGCGAGTCCGGAGCCGACAGT-AT-AGTCTGGAT-GGGAGAAGATGAAG CA AATGATCTTCAGGG-CAGGGTGAAATTCCCTACCGGCGGT 2 AGCCCGCGAG-- 3 4 3 ----TATGATCCGGTTTGATTCCGGAGCCGACAGT-AA-AGTCTGGAT-GAAAGAAGATATAT DF GAAGATCTTCGGGG-CAGGGTGAAATTCCCTACCGGCGGT 2 AGCCCGCG---- 6 4 6 -------GATTTGGTGAGATTCCAAAGCCGACAGT-AA-AGTCTGGAT-GAGAGAAGATATTT EF GTTCGTCTTCAGGGGCAGGGTGTAATTCCCGACCGGTGGT 3 AGTCCACGAC-- 5 3 5 ----ATTGAATTGGTGTAATTCCAATACCGACAGT-AT-AGTCTGGAT—-AAAGAAGATAGGG LLX AAATATCTTCAGGG-CACCGTGTAATTCGGGACCGGCGGT 21 ACTCCGCGAT-- 4 4 4 ----–TTGAAGCAGTGAGAATCTGCTAGCGACAGT-AA-AGTCTGGAT-GGAAGAAGATGAAC LO GTTCATCTTCGGGG-CAGGGTGCAATTCCCGACCGGTGGT 3 AGTCCACGAT-- 3 10 3 ----TTGACTCTGGTGTAATTCCAGGACCGACAGT-AT-AGTCTGGAT-GGGAGAAGATGTTG PN AAGAGTCTTCAGGG-CAGGGTGAAATTCCCGACCGGCGGT 125 AGTCCGTG---- 3 4 3 -------GATGTGGTGAGATTCCACAACCGACAGT-AT-AGTCTGGAT-GGGAGAAGACGAAA ST AAGTGTCTTCAGGG-CAGGGTGTGATTCCCGACCGGCGGT 14 AGTCCGCG---- 3 4 3 -------GATGTGGTGTAACTCCACAACCGACAGT-AT-AGTCTGGAT-GAGAGAAGACCGGG MN AAGTGTCTTCAGGG-CAGGGTGAGATTCCCGACCGGCGGT 104 AGTCCGCG---- 3 4 3 -------GATGTGGTGAAATTCCACAACCGACAGT-AA-AGTCTGGAT-GGGAGAAGACTGAG SA ATTCATCTTCGGGG-TCGGGTGTAATTCCCAACCGGCAGT 6 AGCCTGCGAC-- 11 3 11 ----–CTGATCTAGTGAGATTCTAGAGCCGACAGT-AT-AGTCTGGAT-GGGAGAAGATGGAG AMI TCACAGTTTCAGGG-CGGGGTGCAATTCCCCACTGGCGGT 14 AGCCCGCGC--- 5 5 5 ------TGATCTGGTGCAAATCCAGAGCCAACGGT-AT-AGTCCGGAT-GGAAGAAACGGAGC DHA ACGAACCTTCGAGG-TAGGGTGAAATTCCCGACCGGCGGT 20 AGCCCGCAAC-- 11 4 11 --CGACTGACTTGGTGAGACTCCAAGGCCGACGGT-AT-AGTCCGGAT-GGGAGAAGGTACAA FN AATAATCTTCGGGG-CAGGGTGAAATTCCCGACCGGTGGT 2 AGTCCACG---- 4 6 4 -------GATTTGGTGAAATTCCAAAACCGACAGT-AG-AGTCTGGAT-GAGAGAAGAAAAGA GLU ---TGTTCTCAGGG-CGGGGCGAAATTCCCCACCGGCGGT 28 AGCCCGCGAGCG 10 4 10 GTCAGCAGATCCGGTTAAATTCCGGAGCCGACGGTCAT-AGTCCGGAT-GCAAGAGAACC---
Conserved secondary structure of the RFN-element
[Figure: consensus secondary structure of the RFN-element, drawn 5' to 3'. Helices are numbered 1 to 5; labelled features: variable stem-loop, additional stem-loop.]
Capitals: invariant (absolutely conserved) positions.
Lower case letters: strongly conserved positions.
Dashes and stars: obligatory and facultative base pairs
Degenerate positions: R = A or G; Y = C or U; K = G or U; B = not A; V = not U; N = any nucleotide; X = any nucleotide or deletion
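The degenerate codes in this legend map directly onto character classes, so a consensus fragment can be searched for with a regular expression. The following is a minimal sketch, not the method used in the talk: the function names are hypothetical, case is ignored (so strongly conserved lower-case positions are treated like invariant ones), and X is modelled as an optional position.

```python
import re

# Degenerate codes as defined in the legend (RNA alphabet).
# X ("any nucleotide or deletion") is modelled as an optional position.
CODES = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "K": "[GU]",
    "B": "[CGU]",   # not A
    "V": "[ACG]",   # not U
    "N": "[ACGU]",
    "X": "[ACGU]?",
}


def consensus_to_regex(consensus: str) -> str:
    """Translate a degenerate consensus string into a regex source string."""
    return "".join(CODES[ch] for ch in consensus.upper())


def find_matches(consensus: str, rna: str) -> list:
    """Return 0-based start positions of all matches, overlapping ones included."""
    # A zero-width lookahead lets finditer report overlapping occurrences.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    rna = rna.upper().replace("T", "U")  # accept DNA-style input
    return [m.start() for m in pattern.finditer(rna)]


if __name__ == "__main__":
    # Illustrative fragment only; real RFN searches also need the helices to pair.
    print(find_matches("GRCCYG", "AAGGCCUGAA"))
```

A pattern match of this kind only tests primary sequence. The conserved helices of the element would need a separate base-pairing check.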
Attenuation of transcription
[Figure: alternative RNA structures. Labels: the RFN element, antiterminator, terminator.]
Bam GACAAAAAAATATTGATTGTATCCTTCGGGGCTGGGTG --- TCTGGATGGGAGAAGGATGA 59 ----------GTAAAGCCCCGAATGTGTAA---ACATTCGGGGCTTTTTGACGCCAAAT BS GGACAAATGAATAAAGATTGTATCTTCGGGGCAGGGTG --- TCTGGATGGGAGAAGGATGA 59 ----------CTAAAGCCCCGAATTTTTTA--TAAATTCGGGGCTTTTTTGACGGTAAA BQ CTATAATTTGAGCAAACAGCATCCTTCGGGGTCGGGTG --- TCTGGATGGGAGAAGGATAT 250 -----------CCAAACCCCAAGGATATTAAA--ATCCTTGGGGTTTTTTGTTTTTTTT BE ACATAACGATATAGTGATGCATCCTTCGGGGCAGGGTG --- TCTGGATGGGAGAAGGATGC 155 ------------TGAGCCCCCGGGGACAT--------CCCGGGGGTTTCATTTTTATTG HD AAATTGAATAATTAATTTTTATCCTTCGGGGCTGGGTG --- TCTGGATGGGAGAAGGAAAC 148 -------------ATGCCCCGTGAGAACAAAA-----TCTCTGGGGCTTTTTTGCGCGC CA TAATGGTAATTTAATAGGATGTTCTTCAGGGATGGGTG --- TCTGGATGAAAGAAGAAATA 34 -------------AATCTCCGAAGGATTACC----TTTCTTTGGAGATTTTTTTATTTG DF TAAATATAAATTTAATACTTAATCTTCGGGGTAGGGTG --- TCTGGATGGAAGAAGATATT 63 ------------TAAACCCTGAGTTAATT--------CTCAGGGTTTTTTGTTTAAAAA LLX ACTTTAGCTACAATTGAATAAATCTTCAGGGCAGGGTG --- TCTGGATGAAAGAAGATAAT 127 ----------AAAAGACCCTGAAATTTT------ATTTTAGGGTCTTATTTTTTATTAG PN* ATCATCTGTAATTGAATAACTATCTTCAGGGCAGGGTG --- TCTGGATGAAAGAAGATAAA 81 ----------TGTATGCCTTGAGTAGTCCCC---TATTCAAGGTATATTTTTTTGGAGG PN* ATCATCTGTAATTGAATAACTATCTTCAGGGCAGGGTG --- TCTGGATGAAAGAAGATAAA 19 ------------CGTGCTCTGAAATGATTACTTGTCATTTCAGAGCATTTTTGTTAATC TM AAAACTGAATACAAAAGAAACGCTCTCGGGGCAGGGTG --- TCCGGATGGGAGAGAGCGTG 13 -----------ATGGGACCCGAGA----------------GGGTCCCTTTTCTTTTACA AO ATTTGCAACAATTTTTTAATAATCTTCAGGGCAGGGTG --- TCTGGATGGAAGAAGATGAA 33 --------TTTACAAGCCTTGAGATCGAAAG----ATTTCAAGGCTTTTTTCATCATTA DU AATTTTTTTAATACTATTTTAATCTTCAGGGCAGGGTG --- TCTGGATGGAAGAAGAAGAG 47 --------TGCATAAGCCTTGAGATCTTAG----GATTTCAAGGCTTTTTCATTAGTTA FN TAATCGAATATGTAAAATAAAGTCTTCAGGGCAGGGTG --- TCTGGATGGGAGAAGAATTA 18 ----------ATATTGCTCAGACTTT------------GTTTGAGCATTTTTTTATTAA SA TATAACAATTTCATATATAATTCTTTCGGGGCAGGGTG --- TCTGGATGGGAGAAAGAATG 74 ------TTTTCTCCTTGCATCTTAATT----------GATGTGAGGATTTTTGTTTATA DHA 
ACTCTTTTTAGATGAATACGAACCTTCGAGGTAGGGTG --- TCCGGATGGGAGAAGGTACA 43 -----------GTTTATGCCTCGAGGAACACCATTTCCTCGAGGCATTTTTGTTCTTTC FN GAAAAATAAATATTAAAAATAATCTTCGGGGCAGGGTG --- TCTGGATGAGAGAAGAAAAG 40 ------------CTTACCCGAATTCTAT------------AATTCGGTTTTTTTATTTT CA AATATAAAAAAATAAAGAATGATCTTCAGGGCAGGGTG --- TCTGGATGAAAGAAGATATA 19 ----------–-TATGCCCTGACGTTTTT---------CGTTGGGGCTTTTTTAATGCT DF AAAATTAAAAAATCAAAGAAGATCTTCGGGGCAGGGTG --- TCTGGATGAGAGAAGATATT 45 ----------ATAAAAACTCGAAGATAGGG----TCTTCGAGTTTTTTGTTTTTCCTAA BS TAATTAAATTTCATATGATCAATCTTCGGGGCAGGGTG --- TCTGGATGGGAGAAGATGGA 103 --AAAGAACCTTTCCGTTTTCGAGTAAGATGTGATCGAAAAGGAGAGAATGAAGTGAAA BQ GGGAAAATAGAATATCGGTCTATCTTCGGGGCAGGGTG --- TCTGGATGGGAGAAGATGGA 54 -------ATTCTCCCTTTGTGTAAA------------ACACAAAGGGTTTTTTCGTTCTATG BE ATAAAAATGTATAAGCGATTCATCTTCGGGGCAGGGTG --- TCTGGATGGGAGAAGATGAA 114 --------GGCAGCCTTCTTCTTGTGAGGATGAATCACGAGAAGGGGAGGAGAACAAGCATG PN GTTTTTTGTTATGATAAAAGAGTCTTCAGGGCAGGGTG --- TCTGGATGGGAGAAGACGAA 137 -–AACTTCTTCTGATTTTATAG------------AAAATTGGAGGAACCTGTTATGACA ST TAAATCTGCTATGCTAGAAGTGTCTTCAGGGCAGGGTG --- TCTGGATGAGAGAAGACCGG 130 ---GGAACTTCTTTCAATTTGAAA-----------AAATTGGAGGAATTTTTTAATGTC MN ATTTTTTGATATGCTATAAGTGTCTTCAGGGCAGGGTG --- TCTGGATGGGAGAAGACTGA 138 ---–GGCCTTCTTTCGATTTGTAA-----------AAATTGGAGGAATTTTTTTATGAA SA AAATTTAATAATGTAAAATTCATCTTCGGGGTCGGGTG --- TCTGGATGGGAGAAGATGGA 17 --------TCCTCCTATTCTTACG--------AGATGAATGGAAGGAGAAAATTGAATATG EF AAAAAATATAATACAAGGTTCGTCTTCAGGGGCAGGGT --- GTCTGGATAAAGAAGATAGG 33 ---CTACTCTATTTTTCCCTGCAGA------------AAAATAGGGTTTTTTTGTATGA LLX TTTTTGTGCTATAATAAAAATATCTTCAGGGCACCGTG --- TCTGGATGGAAGAAGATGAA 66 -–TCAACTTCCTCGAAATTTGAAGAAT-TATTTTCTCATATTTGGAGGTTTTTTTATGT LO ATTGTAAGAAAATATTCGTTCATCTTCGGGGCAGGGTG --- TCTGGATGGGAGAAGATGTTG 79 ---ATGCACAAACTCTCCCTCAACTTTTTTTA--------GTTGAGGTTTTTTATTTGC
Attenuation of translation
EC AATCCGCTTATTCTCAGGGCGGGGCG --- TCCGGATGGGAGAGAGTAACG 59 ----------CTGCCCTGATTCTGGTAACCATAATTTTAGTGAGGTTTTT-------TACCATGAATCAGACGCTA TY AACCCGCTTATTCTCAGGGCGGGGCG --- TCCGGATGGGAGAGGGTAACG 61 ----------CTGCCCTGATTCTGGTAACCATAATGTTAATGAGGTTTTTT------TACCATGAATCAGACGCTA KP ATCTCGCTTATTCTCAGGGCGGGGCG --- TCCGGATGGGAGAGAGTAACG 61 ----------CTGCCCTGATTCTGGTAACCATAATTTTAATGAGGTTTTTT------TACCATGAATCAGACGCTC HI TTAGCTCGCATTCTCAGGGCAGGGTG --- TCTGGATGAAAGAGAATAAAA 41 ----------CAGCCCTGATTCTGGTATTTAATTGAAATCTCAAAT-TAGGAAAT--TACTATGAATCAGTCAATT VK TATTTGCGCATTCTCAGGGCAGGGTG --- TCTGGATGAAAGAGAATAAGC 76 ----------CAGCCCTGATTCTGGTATCTAAATATCTTTATATTTCAAGGAATT--TACTATGAATCAGTCTATT AB TAGGCGCGCATTCTCAGGGCAGGGTG --- TCTGGATGAAAGAGAATAAAA 54 ----------CCGCCCTGATTCTGGTATAAATTCATCTTATTAAA—AAGGCATT---TACTATGAATCAGTCATTA YP ATGGGGCTTATTCTCAGGGCGGGGTG --- TCCGGATGGGAGAGAGTAACG 194 ----------CCGCCCTGATTCTGGTAATCCATAATTTTTTAATGAGGTTTCT---TTACCATGAATCAGACGCTT VC CACAACAATATTCTCAGGGCGGGGCG --- TCCGGATGAGAGAGAATGACA 83 ----------AAGCCCTGATTCTGGTCATTTTTT--------------GGAGTATT--ACCATGAATCAGTCCTCA Spu CTATCAACAATTCTCAGGGCGGGGTG --- TCCGGATGGAAGAGAATGTAA 145 ----------ACGCCCTGATTCTGGATATTCCCATGTCGTATTTTTGAAGGATATTAA-CCATGAATCAGTCTTTA MLO GACGTTAAAGTTCTCAGGGCGGGGTG --- TCCGGATGAAAGAGGACGAAA 44 -------CGTGCGTCCTGATTCTGGTTCGAAACGGA--------------AGGATGGACCCATGAATCAGCATTCC AC AAGCGACATCGCTTCAGGGCGGGGCG --- TCCGGATGAAAGAAGACGACG 51 ----------CAGTCCTGAAATGTTTAACCGTAATT-------------------TACGAGAGCATTTCATATGTC BP AAGCAGTACGTCTTCAGGGCGGGGTG --- TCCGGATGAGAGAAGATGTGC 62 ----------TAGCCCTGAAACGTTTTTCGCCATTTCCTTTTTT------------GCGAGAGCGTTTCAATGTCC BPS AGTCAGTGCGTCTTCAGGGCGGGGCG --- TCCGGATGAAAGAAGATGTGC 86 ----------GAGCCCTGAAACGTTTTTCGCCCATTCATGTTTC-----------GCGAGGAGCGTTTCACATCATG BU AATCAGTGCGTCTTCAGGGCGGGGTG --- GCCGGATGGAAGAAGATGTGC 99 ----------ATGCCCTGAAACGTTTTTCGCCCAACTTTT--------------GCGATGAGCGTTTCAACTATGT REU CATCGTTACGTCTTCAGGGCGGGGTG --- TCCGGATGAAAGAAGATGGGC 77 
----------ATCCCCTGAAACGCCCATCCATGGAAATCCACGCAC-------------GGAGCGTTTCAATGCTG RSO GCTTGGTACGTCTTCAGGGCGGGGTG --- TCCGGATGGAAGAAGATGTGC 80 ---------CGTGCCCTGGAACGTCTTGTCGCCCATTTCA---------------GCGAGGAGCGTTTCCATGTTG PP GGTCGGTCGGTCTTCAGGGCGGGGTG --- TCCGGATGAAAGAAGGCGTCA 50 ----------TCGCCCCGAGACGTTCATCGATCATTCA------------------CGAGGAGCGTTTCATGTTCA PY GCCGGTAACGTTCTCAGGGCGGGGTG --- CCGGATGAAGAGAGAGCGGGA 91 ----------ATGCCCTGTTTTTTCATTAAATT---------------------AAACAGGAGTCAGAACACGTGC PU CGGCGAAACGTTCTCAGGGCGGGGTG --- CCGGATGAAGAGAGAACGGGA 68 ----------ACGCCCTGTTTTTCACAC--------------------------AAACAGGAGTCAGAACATGCAA PA GGCCGTAACGTTCTCAGGGCGGGGTG --- CCGGATAAAGAGAGAACGGG 53 ---------AAAGCCCTGTTTTTCAC---------------------------GAAACAGGAGTTCGTCATATG-- BME CGCGGGCTTGTTCTCGGGGCGGGGTG --- TCCGGATGGAAGAGAGCGAAT 54 ----------GCGCCCTGATTCTAGTTTCGTG--------------------------AGGAACCTATGAACCAAA CAU AATCCGAAGACCTTCGGGGCAAGGTG --- TCCGGATGGGAGAAGGTCGGC 116 ------CGCGATGCCCCGAAGGTGTG-----------------------------TTCAGGGGTGTCGCGATGAAC TFU GTACACACGCGTGCTCCGGGGTCGGT --- GGATGGGAGGTAGTACGTGGT 58 -------GCCTTACCCCGGAGCCTGACCT-------------------------GGCTAGGGGGAAGGCTTCTCGCATG GLU TGAGTTTTGTTCTCAGGGCGGGGCG --- TCCGGATGCAAGAGAACCG 32 ---------AAGGCCCCGAGGATTACATGCTTTTAAATCCTTTGAAAAGGGGACAAGATCATGAATCCTATAACCG DR GAACCGACCTCTTTCGGGGCGGGGCG --- TCCGGACGAAAGAAGGAGGAG 1 GACGCTCAGCTTGCCCCCCA------------------------------------GCAGGCGGCGTCCGCGTATG SM GTCGCAAGCGTTCTCAGGGCGGGGTG --- TCCGGATGGAAGAGAGCAAGC 45 ATCATTGGAAAAATGCCAACCCTGAAA-------------------GGCTTGAGACCATGACCATACTT TQ TTCGGCACCTCCTTCGGGGCGGGGTG --- TCCGGATGGGAGAAGGAGGGCCACTTGCGC AMI CTTACTCACAGTTTCAGGGCGGGGTG --- TCCGGATGGAAGAAACGGAGCGCCTTATGG
[Figure: alternative RNA structures. Labels: the RFN element, antisequestor, SD-sequestor.]
RFN: the mechanism of regulation
• Transcription attenuation
• Translation attenuation
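For the transcription attenuation case, a candidate terminator is a hairpin followed by a run of T's. The sketch below is a deliberately simplified search and is not the procedure used in the talk: it accepts only perfectly complementary stems (no G-T pairs, mismatches, or bulges), and the function name and the length thresholds are assumptions chosen for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_terminator(dna, stem=(6, 16), loop=(3, 10), t_run=4):
    """Return (start, stem_arm, loop_seq) for the longest perfect hairpin
    that is immediately followed by t_run T's, or None if there is none.

    Alignment gaps ('-') are removed before searching. The thresholds are
    illustrative assumptions, not values taken from the slides.
    """
    dna = dna.upper().replace("-", "")
    best = None
    for start in range(len(dna)):
        for stem_len in range(stem[0], stem[1] + 1):
            left = dna[start:start + stem_len]
            if len(left) < stem_len:
                break  # ran off the end of the sequence
            for loop_len in range(loop[0], loop[1] + 1):
                r = start + stem_len + loop_len
                right = dna[r:r + stem_len]
                tail = dna[r + stem_len:r + stem_len + t_run]
                if right == revcomp(left) and tail == "T" * t_run:
                    if best is None or stem_len > len(best[1]):
                        best = (start, left, dna[start + stem_len:r])
    return best


if __name__ == "__main__":
    # Downstream region of the Bam sequence from the transcription attenuation alignment.
    bam = "GTAAAGCCCCGAATGTGTAA---ACATTCGGGGCTTTTTGACGCCAAAT"
    print(find_terminator(bam))
```

A hit of this kind is only a candidate. Deciding between terminator and antiterminator would additionally require checking that part of the hairpin can pair with the RFN element.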
Distribution of RFN-elements
Taxonomic group            Genomes analyzed   Genomes with RFN   RFN elements
α-proteobacteria           8                  4                  4
β-proteobacteria           7                  4                  4
γ-proteobacteria           17                 15                 15
δ- and ε-proteobacteria    3                  0                  0
Bacillus/Clostridium       12                 12                 19
Actinomycetes              9                  4                  4
Cyanobacteria              5                  0                  0
Other eubacteria           7                  5                  6
Total                      68                 47                 52
YpaA: riboflavin transporter in Gram-positive bacteria
• 5 predicted transmembrane segments => a transporter
• Upstream RFN element (likely co-regulation with riboflavin genes) => transport of riboflavin or a precursor
• S. pyogenes, E. faecalis, Listeria sp.: ypaA present, no riboflavin pathway => transport of riboflavin
Prediction: YpaA is a riboflavin transporter (Gelfand et al., 1999)
Verification:
• YpaA transports flavins (riboflavin, FMN, FAD) (by genetic analysis: Kreneva et al., 2000; directly: Burgess et al., 2006)
• ypaA is regulated by riboflavin (by microarray expression analysis: Lee et al., 2001)
• … via attenuation of transcription (and to some extent inhibition of translation) (Winkler et al., 2003)
Phylogenetic tree of RFN-elements
thi-box and regulation of thiamine metabolism genes by thiamine pyrophosphate (Miranda-Rios et al., 2001)
TTCGGGATCCGCGGAACCTGA-TCAGGCTAA-TACCTGCG-AAGGGAACAAGAGTTA THIC_EC
TTCGGGATCCGTTGAACCTGA-TCAGGTTAA-TACCTGCG-AAGGGAACAAGAGAAG THIC_VC
GCAGTGACCCGTTGAACCTGA-TCCAGTTCA-TACTGGCG-TAGGGACGGTGCAAGC THIC_MLO
GCAGTGACCCGTTGAACCTGA-TCCAGTTCA-CACTGGCG-TAGGGACGGTGCAGAC THIC_SM
AGAAATACCCTTTACACCCGA-TCGGGATAA-TACCTGCG-TGGGGAGTTTTCACGG THIC_NM
TTCTTAACCCTTTGGACCTGA-TCTGGTTCG-TACCAGCG-TGGGGAAGTAGAGGAA thiC_BS
CCGTCGACCGTACGAACCTGA--CCGGGTAA-TGCCGGCG-TAGGGAGTTGCAAATG THIC_MT
GGATCGACCCTTTGAACCTGA-TCCGGGTAA-TGCCGGCG-GAGGGAAATTATGTCG THIT2_TVO
TCCTCGACCCCAAGAACCTGA-TCCGGGTAA-TGCCGGCG-GAGGGATCGGGGAAGG thi1_TM
Notation: red, conserved nucleotides; green, positions conserved as purine or pyrimidine; blue, non-conserved nucleotides
Alignment of THI-elements 1 2 3 3' FACULTATIVE STEM-LOOP 2' 4 5 5' 4' 1' ----====>===> -=====> <===== ========> <======= <=== ===> =====> <===== <=== <====---- BACILLUS/CLOSTRIDIUM GROUP BS_THIC TAGTTACTGGGGGTGCCCGCT----------------TTCcgGGCTGAGAGAGAAGGCA-------------AGCTTCTTAACCCTTT---GGACCTGA-TCTGGTTCG-TACCAGCG-TGGGGA-AGTAGAGGA BS_TENA TAACCACTAGGGGTGTCCTTC----------------ATAAGGGCTGAGATAAAAGTGT-------------GACTTTTAGACCCTCA---TAACTTGA-ACAGGTTCA-GACCTGCG-TAGGGA-AGTGGAGCG BS_YLMB TTCATCCTAGGGGTGCTTTG-------------------CGAAGCTGAGAGAGACTT-----------------TGTCTCAACCCTTT---TGACCTGA-TCTGGATCA-TGCCAGCG-GAGGGA-AGCGGTGAA BS_YKOF AAAGCACTAGGGGTGCTGT--------------------TTTGGCTGAGATAAAGCGCGGAA-----GAAACGCGCTTTGATCCCTTA---TGACCCGA-TCTGGATAA-TACCAGCG-TGGGGA-AGTGCAGGT SA_TENA GAACTACTAGGGGAGCCTAAT----------------GATATGGCTGAGATGAATT-------------------GTTCAGACCCTTA---TGACCTGA-TTTGGTTAG-TACCAACG-TAGGAA-AGTAGTTAT SA_YKOE CACACACTAGGGGTGTTT----------------------TATACTGAGATGAGGCTT---------------GCCCTCAAACCCTTT---GAACCTGA-TCTAGCTTG-AACTAGCG-TAGGAA-AGTGTTACT LLX_YUAJ TTTGCACAATGGGTCTATTGACAAA---------ACTGTCAGTAGCGAGA----------------------------AATACCATC----TGACCTGA-TCTGGGTAA-TGCCAGCG-TAGGAA-TGTGTTAAG CA_THIS ATAGTTAACGGGGAGCCTGTA-----------------GACAGGCTGAGAGTGGAATG--------------TGATTCCAGACCCTCA---TAACCTGA-TTTGGATAA-TGCCAACG-TAGGGA-GTTAATGCA CA_YUAJ TATGTGCTAGGGGTGCCTT---------------------TAGGCTGAGAAACAGTTT--------------GTCACGTTAACCCTT-----AACCTGA-TCTGGATAA-TACCAGCG-TAGGGA-AGCAGTTTG ST_YUAJ TTTCACAAAGGAGTGCTT-----------------------TGGCTGAGATCGCAA------------------TTGCGAAATCCTGA---GGACCTGA-TCTTGTTAG-TACAAGCG-TAGGGA-TTGTGACCA DHA_THIC TAATCACTAGGGGGGCCGAATA---------------AGGTCGGCTGAGATAAAGGACCCA---------AGAATCCTTTGACCCTT-----AACCTGA-TCTGGGTAA-TGCCAGCG-TAGGGAAGGTGGATAA LMO_TENA GAAAAACTAGGGGGGCCGAT-------------------TCTGGCTGAGATAGGAAGGTAAT-----------GCTTTCTGACCCTTT---GAACCTGT-TT--GTTAG-TGCAAGCG-TAGGGA-AGTGAATGT LMO_YUAJ 
TTACCACAGGGGGGGCTTC---------------------TTAGCTGAGATTGAGTCCACGTGT-----TTTTGGATTCTGACCCTTT---GAACCTGT-TC--GTTAA-TACGAGCG-TAGGGA-TTGTGGCGA PROTEOBACTERIA EC_THIB GTTCTCAACGGGGTGCCACGCGT------------ACGCGTGCGCTGAGAAA---------------------------ATACCCGTCGA---ACCTGA-TCCGGATAA-CGCCGGCG-AAGGGATTTGAGGC EC_THIM AAACGACTCGGGGTGCCCTTCTGC-------------GTGAAGGCTGAGAAA----------------------------TACCCGTATC---ACCTGA-TCTGGATAA-TGCCAGCG-TAGGGA-AGTCACG EC_THIC TTTCTTGTCGGAGTGCCTTA-------------------ACTGGCTGAGACCGTTT------------------ATTCGGGATCCGCGGA---ACCTGA-TCAGGCTAA-TACCTGCG-AAGGGA-ACAAGAG VC_THIC CCACTTGTCGGAGTGCCAT---------------------TGGGCTGAGACCGTTT------------------ATTCGGGATCCGTTGA---ACCTGA-TCAGGTTAA-TACCTGCG-AAGGGA-ACAAGAG VC_THID CCTGTAGTCGGGGAGCCTGAGAG-- 66 5 71 -AATTAAAGGCTGAGATCGCGT-------------------AGCGAGACCCGTTGA---ACCTGA-TTCAGTTAG-GACTGACG-TAGGGA-ACTATCC VC_THIB CCCACTCACGGGGGGCCACCCATTCAT-------CCGAATGGCGCTGAGATCAAGCAC---------------TGCTTGGGACCCGCA 21 -ACCTGA-ACCAGATAA-TGCTGGCG-TAGGAATTGAGCTA XFA_THIC TTTGAAGCGGGGGTACCATAGCCA------------AGCTGCGGTTGAGAC----------------------------ACACCCTTCGA---ACCTGA-TCCGGTTTA-CACCGGCG-TAGGAAAGCTTCGT MLO_THIC CATTCACCAGGGGAGTCCCGG----------------CAAGGGGCTGAGATACTGCTGGCTTTC------GCGGCGCAGTGACCCGTTGA---ACCTGA-TCCAGTTCA-TACTGGCG-TAGGGACGGTGCAA MLO_THIB CGCTCTAACGGGGTGCCGGA------ 5 3 5 -----GACCGGCTGAGAGGCAGT------------------CTCGCCAACCCGCTGA---ACCTGA-TCCGGTTTG-TACCGGCG-GAGGGA-TTAGACG MLO_YK GCCCATCCACAGGGGTGCTCCGTAC-------------GGTCGGGGCTGAGACGGGGGCGG-----------CAAGCCCACAGACCCTAGA----AGCTGA-TCTGGGTAA-TACCAGCG-GAGCGA-GGCGGGCG NX_CITX CTCCTTGTCGGAGTGCCGCCGC---------------CGGGCGGCTGAGATTGCGA------------------AAGCAGAATCCGTAGA---ACCTGT--CGGGGTAA-TGCCTGCG-TAGGAA-ACAAACC NX_THIC ATTGAAACAGGGGTGCTGCCTGAT----------GTTTAGGCGGCTGAGAA----------------------------ATACCCTTTAC---ACCCGA-TCGGGATAA-TACCTGCG-TGGGGA-GTTTTCA ACTINOBACTERIAE MT_THIO 
CTGTAGACACGGGAGTCCCGGG--------------AGCGGGGTCTGAGAGTGGGCGCGCCT-------------GCCCTTACCGTCAC----ACCTGA-TCCGGATCA-TGCCGGCG-AAGGGAGGTCAAGGATG MT_THIC GTACCCACGCGGGAGCGCACGC--------------CGAGTGCGCTGAGAGGACGGCTCGGG------------GCCGTCGACCGTACGA---ACCTGA--CCGGGTAA-TGCCGGCG-TAGGGAGTTGCAAATG CGL_THIC CAGTCCCCACGGGCGCCCGA-----------------GCACGGGCTGAGATCGCGCTGATT---------GCTGCGCGAGCACCGTTTGA---ACCTG--TCCGGTTAG-CACCGGCG-AAGGAAGAGAGGAATGGTGCAATG CGL_THID ACTAGGCACGGGGTGCCAACCGGATGG---AAAAATTCCGGAGGCTGAGAAA---------------------------ACACCCGTTGA---ACCTGC-TCTAGCTCG-TACTAGCG-AAGGGATGGCCTTAACGTG CGL_THIE CTTACCCCACGGGTGCCCAAT---------------GCATTGGGCTGAGATTGCGCGCTGT---------TGCTGCGCGGGACCGTTCGA---ACCTG--TCTGGTTAA-CACCAGCG-AAGGAAGCGAGGATTGATTGTCCCGTG CGL_YKOE TCATAGACACGGGTGCTCGGTGA------------AAATCCGGGCTGAGATCTGGCA----------------TAGCCACGACCGTCGA----ACCTG-ATCCGGATAA-TGCCGGCG-ATAGGGAGGAAAAATATG CGL_OARX TAGTGACACGGGGTGCAAAAGCACTTT----AAAAAAGCTTTCGCTGAGATT---------------------------ACACCCGTCGA---ACCTG-ATCCAGTTAG-TACTGGCG-AAGGGACTGTCGCAT CYANOBACTERIA NPU_THIC TCCATGCTAGGGGTGCCTACAT---------------AACCAGGCTGAGATC---------------------------ACACCCTTAAC---ACCTGAGTCTGGGTAA-TACCAGCG-GAGGGAAGCTGTTTATTG CY_THIC CCATAGCTAGGGGTGTCTAGAA---------------AGCTAGGCTGAGAA----------------------------AAACCCTTAGA---ACCTGAGACTGGGTAA-TACCAGCG-GAGGGAAGCTCACCATTC AN_THIC TCCATGCTAGGGGTGCTTGCAC---------------TAACAGGCTGAGATT---------------------------ACACCCTTAAC---ACCTGAGACTGGGTAA-TACCAGCG-AAGGGAAGCTGTTTATTG THERMUS/DEINOCOCCUS, THERMOTOGALES, Fusobacterium, CFB group DR_THIB CGCGTCACCGGGGGTGCCCTGCTT------------CGGCAGCGGCTGAGAAC---------------------------ACACCCCAGGA---ACCTGA-ACCGGGTCA-TTCCGGCG-GAGGGAGTGTGATGC DR_THIC ATCGTCAACAGGGGTGCCTCCGCATA--------TGGGCCGGAGGCTGAGAGGGCAACT---------------CGGGCCTAACCCTATGA---ACCTGA-ACTGGTTAG-CACCAGCG-GAGGGA-GTGTGACG TQ_THIBGGCCGTCACCGGGGGTGCCCCA------------------AAAGGGCTGAGAGC---------------------------ATACCCTTGGA---ACCTGA-TCCGGGTCA-TGCCGGCG-TAGGGAAGGTGACGGCC TM_THI1 
CCTTCCCCAGGGGGAGCTCCTAT---------------TCCGGGGCTGAGAGGAGGACGG-------------AAGTCCTCGACCCCAAGA---ACCTGA-TCCGGGTAA-TGCCGGCG-GAGGGATCGGGGAAGGA FN_THIC TATATGTACTGGGGAGCTT----------------------TGTGCTGAGATTAGAACCT------------TTTTTCTTAGACCCATAGT---ACCT-GA-TTTGGATAA-TGCCAACG-AAGGGA—GTACCA FN_THIX ACTAGTTACAAGGGAGTTAATA-----------------AATTGACTGAGAAAAGGATG--------------TGAGCCTTGACCTTTTG----ACCT-GA-TTTGGATAA-TGCCAACG-TAGGAA--GTAAA PG_THIS AGACCGCTACGGGGGTGCTTGCCG--- 4 3 4 -GATACGGCAGGCTGAGAT---------------------------AATACCCATAG---ACCT-GA-TCCGGATAA-TACCGGCG-GAGGGAT-GTAG PG_OMR ATTGGGAGAAGGGGTGCTTCCTGTA--- 3 7 3 --GTGGATGGCTGAGAAC---------------------------AAACCCTCATC---ACCT-GA-ACCGGATAA-TACCGGCG-TAGGAAA-CTCTC BX_THIS TAAAGACAAAGGGGTGCCACC------------------CGGTGGCTGAGATT---------------------------ATACCCTAAGA---ACCT-GA-TGCAGTTAG-TACTGCCG-AAGGGA—TTGTG ARCHAEA TAC_T1 GGTGTGGTGGGGGAGCTCCAT-----------------AAGGGGCTGAGAGGATCCGG---------------ATGGATCGATCCCTGGA---ACCTGA-TCCGGGTAA-TACCGGCG-GAGGGAAATTATG FAC_T1 AGTTATACCGGGGAGCTAA---------------------AATGCTGAGAGGATAA-------------------GGATCGACCCGTGCA---ACCTGA-TCCGGACAA-TACCGGCG-GAGGGAGATGGATA
Conserved secondary structure of the THI-element
[Figure: consensus secondary structure of the THI-element, including the thi-box. Helices are numbered 1 to 5; the drawing also marks a facultative stem-loop.]
Capitals: strongly conserved positions. Dashes and points: obligatory and facultative base pairs
Degenerate positions: R = A or G; Y = C or U; K = G or U; M = A or C; N = any nucleotide
THI: the mechanism of regulation
• Translation attenuation: Thermus/Deinococcus group, CFB group, Proteobacteria, Actinobacteria, Cyanobacteria, Archaea
• Transcription attenuation: Bacillus/Clostridium group, Thermotoga, Fusobacterium, Chloroflexus
Distribution of THI-elements
Taxonomic group                  Genomes analyzed   Genomes with THI   THI elements
α-proteobacteria                 7                  7                  15
β-proteobacteria                 6                  6                  12
γ-proteobacteria                 18                 17                 38
δ- and ε-proteobacteria          3                  1                  1
The Bacillus/Clostridium group   18                 18                 51
Actinomycetes                    9                  9                  25
Cyanobacteria                    5                  5                  5
Other eubacteria                 14                 11                 11
Archaea (Thermoplasma)           17                 3                  6
Total                            97                 77                 164
Mandal et al., 2003: THI in 3’UTR (plants). THI in untranslated intron (fungi)
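The counts in the THI distribution table can be turned into two derived quantities per taxonomic group: the fraction of analyzed genomes carrying a THI-element, and the mean number of elements per positive genome. The sketch below only re-tabulates the numbers given on the slide. The function names are hypothetical, and the Greek prefixes of the proteobacterial rows, which are missing from the extracted text, are assumed from the row order of the RFN table.

```python
# (taxonomic group, genomes analyzed, genomes with THI, THI elements)
# Numbers are copied from the slide; the alpha/beta/gamma/delta+epsilon
# labels are an assumption (the prefixes were lost in extraction).
THI_TABLE = [
    ("alpha-proteobacteria", 7, 7, 15),
    ("beta-proteobacteria", 6, 6, 12),
    ("gamma-proteobacteria", 18, 17, 38),
    ("delta- and epsilon-proteobacteria", 3, 1, 1),
    ("Bacillus/Clostridium group", 18, 18, 51),
    ("Actinomycetes", 9, 9, 25),
    ("Cyanobacteria", 5, 5, 5),
    ("Other eubacteria", 14, 11, 11),
    ("Archaea (Thermoplasma)", 17, 3, 6),
]


def totals(rows):
    """Column sums: (genomes analyzed, genomes with THI, THI elements)."""
    return tuple(sum(row[i] for row in rows) for i in (1, 2, 3))


def summarize(rows):
    """Per group: fraction of genomes with THI and elements per positive genome."""
    summary = []
    for group, analyzed, with_thi, elements in rows:
        fraction = with_thi / analyzed
        per_genome = elements / with_thi if with_thi else 0.0
        summary.append((group, round(fraction, 2), round(per_genome, 2)))
    return summary


if __name__ == "__main__":
    print(totals(THI_TABLE))
    for line in summarize(THI_TABLE):
        print(line)
```

The column sums reproduce the Total row of the table (97, 77, 164).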
Metabolic reconstruction of the thiamin biosynthesis
thiN =
(Gram-positive bacteria)
(Gram-negative bacteria)
Transport of HMP
Transport of HET
Confirmed (Morett et al., 2003)
The PnuC family of transporters
RFN elements
THI elements
B12-box and regulation of cobalamin metabolism genes by cobalamin (Nou & Kadner, 2000; Ravnum & Andersson, 2001; Nahvi et al., 2002)
• A long mRNA leader is essential for the regulation of btuB by vitamin B12.
• A highly conserved B12-box, rAGYCMGgAgaCCkGCcd, is involved in the regulation of the cobalamin biosynthetic genes (E. coli, S. typhimurium).
• Post-transcriptional regulation: an RBS-sequestering hairpin is essential for the regulation of btuB and cbiA.
• Ado-CBL is an effector molecule involved in the regulation of the cobalamin biosynthesis genes.
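A consensus such as the B12-box can be used to scan a leader region while tolerating a few mismatches. The sketch below is one possible way to do that, not the procedure behind the slides. It assumes the standard IUPAC meaning of the degenerate letters (including D = not C, which the slides do not define), and the penalty weights for upper-case versus lower-case consensus positions are arbitrary illustrative values.

```python
# Standard IUPAC classes (DNA alphabet); assumed, not defined on the slide.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "D": "AGT", "N": "ACGT",
}

B12_BOX = "rAGYCMGgAgaCCkGCcd"  # consensus as given in the text


def penalty(window, consensus=B12_BOX, strong=2, weak=1):
    """Sum of mismatch penalties; upper-case consensus positions cost more.

    The weights 2 and 1 are hypothetical.
    """
    total = 0
    for base, symbol in zip(window, consensus):
        if base not in IUPAC[symbol.upper()]:
            total += strong if symbol.isupper() else weak
    return total


def best_hit(dna, consensus=B12_BOX):
    """Return (penalty, position) of the best window, leftmost on ties."""
    dna = dna.upper().replace("U", "T")
    n = len(consensus)
    if len(dna) < n:
        return None
    return min((penalty(dna[i:i + n], consensus), i)
               for i in range(len(dna) - n + 1))


if __name__ == "__main__":
    # Synthetic test sequence: a perfect consensus match flanked by T's.
    print(best_hit("TTT" + "GAGTCAGGAGACCGGCCA" + "TTT"))
```

Lower penalties mean closer agreement with the consensus. A realistic scan would also set a cutoff and check the surrounding helices of the B12-element.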
Conserved RNA secondary structure of the regulatory B12-element
[Figure: consensus secondary structure of the B12-element, drawn 5' to 3'. Labels: B12 box; base stem; helices P0 to P6; additional stem-loops Add-I and Add-II; facultative stem-loop. Structural variants are marked for the Bacillus/Clostridium group, for a proteobacterial subdivision, and for other taxonomic groups.]
[Figure, panel A: B12-element (helices P0 to P6); pseudoknot; terminator; antiterminator; states labelled +Ado-CBL and Ado-CBL.]
[Figure, panel B: B12-element (helices P0 to P6); pseudoknot; RBS-sequestor hairpin; antisequestor; states labelled +Ado-CBL and Ado-CBL.]
The predicted mechanism of the B12-mediated regulation of cobalamin genes: formation of a pseudoknot
B12-element regulates cobalamin biosynthetic genes and transporters, cobalt transporters and a number of other cobalamin-related genes.
Distribution of B12-elements in bacterial genomes
Metabolic reconstruction of cobalamin biosynthesis: new enzymes and transporters
Cobalt ion transport: cbiMNQO, hoxN, hupE, cbtAB, cbtC, cbtD, cbtE, cbtG, cnoABCD
Recently confirmed (Zayas et al., 2006)
Confirmed (Woodson et al., 2004)
If a bacterial genome contains B12-dependent and B12-independent isoenzymes, the genes encoding the B12-independent isoenzymes are regulated by B12-elements.

Ribonucleotide reductases
NrdJ (B12-dependent)    NrdAB/NrdDG (B12-independent)
+                       –
–                       +
+                       +

Methionine synthase
MetH (B12-dependent)    MetE (B12-independent)
+                       –
–                       +
+                       +

[Figure labels: B12 marks in the diagram indicate B12-elements.]
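The isoenzyme rule stated above can be phrased as a check over gene content. The sketch below uses made-up toy genomes: the `Genome` class, the function name, and the example records are all hypothetical, and operons such as nrdAB are treated as single labels for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Genome:
    name: str
    genes: set
    # Genes with a predicted upstream B12-element.
    b12_regulated: set = field(default_factory=set)


# (B12-dependent isoenzyme genes, B12-independent isoenzyme genes)
ISOENZYME_PAIRS = [
    ({"nrdJ"}, {"nrdAB", "nrdDG"}),   # ribonucleotide reductases
    ({"metH"}, {"metE"}),             # methionine synthases
]


def rule_exceptions(genome):
    """List B12-independent genes that coexist with a B12-dependent
    isoenzyme but have no B12-element, i.e. exceptions to the rule."""
    exceptions = []
    for dependent, independent in ISOENZYME_PAIRS:
        if genome.genes & dependent:
            for gene in sorted(genome.genes & independent):
                if gene not in genome.b12_regulated:
                    exceptions.append(gene)
    return exceptions


if __name__ == "__main__":
    # Toy records, not real annotation data.
    toy = [
        Genome("toy1", {"metH", "metE"}, {"metE"}),
        Genome("toy2", {"nrdJ", "nrdAB"}),
        Genome("toy3", {"metE"}),
    ]
    for g in toy:
        print(g.name, rule_exceptions(g))
```

Genomes that carry only one of the two isoenzymes are not constrained by the rule and therefore never produce exceptions in this check.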
nrdAB in Streptomyces coelicolor: experimentally confirmed (Borovok et al., 2005)
LYS-element, a.k.a. L-box: lysine riboswitch
[Figure: consensus secondary structure of the LYS-element, drawn 5' to 3'. Labels: base stem; helices P1 to P7.]
Reconstruction of the lysine metabolism
[Figure: reconstruction of lysine metabolism]
Metabolites: L-aspartate; β-aspartyl-phosphate; aspartate semialdehyde; homoserine; dihydrodipicolinate; tetrahydrodipicolinate; N-acetyl-2-amino-6-ketopimelate or N-succinyl-2-amino-6-ketopimelate; N-acetyl-L,L-diaminopimelate or N-succinyl-L,L-diaminopimelate; L,L-diaminopimelate; meso-diaminopimelate
Gene labels: lysC, dapG, yclM; lysC, thrA, metL; asd; hom; thrA, metL; dapA; dapB; dapD; ykuR; dapC (argD); ddh; patA; dapE; dapF, dal; lysA
Lysine transport
Predicted genes are boxed (pathway of acetylated intermediates in B. subtilis)
Regulation of the lysine catabolism: the first example of an activating riboswitch
• LYS-elements upstream of the pspFkamADEatoDA operon in Thermoanaerobacter tengcongensis and the kamADElysE operon in Fusobacterium nucleatum
  – lysine catabolism pathway
  – the LYS-element overlaps a candidate terminator => acts as an activator
• similar architecture of activating adenine riboswitch upstream of purine efflux pump ydhL (pbuE) in B. subtilis (Mandal and Breaker, 2004)
S-box (SAM riboswitch)
[Figure: consensus secondary structure of the S-box, drawn 5' to 3'. Labels: base stem; helices P1 to P5.]
Grundy and Henkin, 1998
Reconstruction of the methionine metabolism
Cystathionine
Homocysteine
methyl-THF
Sulfide
CH3
methylene-THF
THF
O-acetylhomoserine
Homoserine
Aspartate semialdehyde
Methionine
S-ribosyl-homocysteine (SRH)
S-adenosyl-homocysteine (SAH)
S-adenosyl-methionine (SAM)
Methylthioribose (MTR)
MTA
Threonine
metI yrhB
metC yrhA
metF
yxjH*
metK
mtnKSUVWXYZ
hom
cysH-...metB
metH
metX
metE
mtn
mtn
metY
predicted genes are boxed and marked by * (transport, salvage cycle)
A new family of amino acid transporters
[Figure: tree of the transporter family; leaves are gene locus tags (e.g. BC1434, FN0978, BL1111, VC2037, BH3946). Legend: S-box (rectangle frame), MetJ (circle frame), LYS-element (circles), Tyr-T-box (rectangles). Labeled branches: LysT, MetT, TyrT, MleN, malate/lactate. Taxon labels: Archaea, clostridia, Pasteurellaceae.]
Repression of the reverse pathway Met => Cys in Clostridium acetobutylicum in the presence of Cys and absence of Met
[Figure: the ubiG, yrhA and yrhB genes with a sense transcript and an antisense transcript; regulatory elements: Cys-T-box and S-box; effectors: cysteine and S-adenosylmethionine]
Other genomes with S-boxes: the Zoo
• Petrotoga
• actinobacteria (Streptomyces, Thermobifida)
• Chlorobium, Chloroflexus, Cytophaga
• Fusobacterium
• Deinococcus
Firmicutes
Lactobacillales: Met-T-box (Met-tRNA-dependent attenuator)
Streptococcaceae: MtaR (transcription factor); SAM-III riboswitch (metK) (the Henkin group)
Bacillales: S-box
Clostridiales: S-box
Loss of S-boxes
proteobacteria
E. coli: TFs
Xanthomonas: S-box
alphas: SAM-II
Geobacter: S-box
Need more genomes
Riboswitches in metagenomes
[Bar chart, scale 0–160: riboswitches found in the Sargasso Sea, Minnesota soil and whale fall metagenomes]
new functions:
S-box: eukaryotic-type translation initiation factor eIF-2B (COG0182)
B12-box: fatty-acid desaturase (COG1398)
GCVT: malate synthase glcB, phosphoserine aminotransferase serC
[Bar chart, scale 0–350: riboswitches in the whale fall, Minnesota soil and Sargasso Sea metagenomes by class: G-box, YKOK, GLMS, LYS, YKKC, S-box, RFN, YYBP, B12, THI, GCVT]
Riboswitch composition of metagenomes
[Bar chart, scale 0–100%: share of each riboswitch class (G-box, YKOK, GLMS, LYS, YKKC, S-box, RFN, YYBP, B12, THI, GCVT) in the whale fall, Minnesota soil and Sargasso Sea metagenomes]
total per 100 000 contigs: 47 27 26
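The totals above are normalized per 100 000 contigs so that metagenomes of different size can be compared. A minimal sketch of that normalization follows. The function name and the raw counts in the example are hypothetical; the slide gives only the normalized totals.

```python
def per_100k_contigs(hits: int, contigs: int) -> float:
    """Scale a raw count of predicted riboswitches to 100 000 contigs."""
    if contigs <= 0:
        raise ValueError("contigs must be positive")
    return hits * 100_000 / contigs


# Hypothetical raw numbers: 94 hits in 200 000 contigs.
print(per_100k_contigs(94, 200_000))
```

With these made-up inputs the result equals the first normalized total on the slide. That shows the arithmetic only and says nothing about the actual dataset sizes.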
Riboswitches in metagenomes by taxonomy
[Bar chart, scale 0–160: riboswitches per taxonomic group, by class: G-box, YKOK, GLMS, LYS, YKKC, S-box, RFN, YYBP, B12, THI, GCVT]
total per 100 000 contigs: 62, 44, 30, 26, 19, 15, 11, 8, 3
Conserved structures of riboswitches (circled: X-ray)
[Figure: consensus sequences and secondary structures of the RFN-element, THI-element, B12-element, S-box, G-box and LYS-element. Each structure shows the base stem (P1, 5' and 3' ends) and its numbered helices; additional (Add, Add I–III) and variable (Var) regions are marked.]
Mechanisms
[Figure: riboswitch mechanisms, panels A–C. Labeled parts: RNA-element, antiterminator/antisequestor, regulatory hairpin (terminator of transcription followed by a U-run, and/or RBS-sequestor), downstream genes. States are shown with (+) and without (–) the effector, in the case of regulation of transcription and in the case of regulation of translation.]
glmS: ribozyme, cleaves its mRNA (the Breaker group)
THI-box in plants: inhibition of splicing (the Breaker and Hanamoto groups)
Characterized riboswitches (more are predicted)
Element | Regulated function | Ligand | Distribution
RFN | Riboflavin biosynthesis and transport | FMN (flavin mononucleotide) | Bacillus/Clostridium group, proteobacteria, actinobacteria, other bacteria
THI | Biosynthesis and transport of thiamin and related compounds | TPP (thiamin pyrophosphate) | Bacillus/Clostridium group, proteobacteria, actinobacteria, cyanobacteria, other bacteria, archaea (thermoplasmas), plants, fungi
B12 | Biosynthesis of cobalamin, transport of cobalt, cobalamin-dependent enzymes | Coenzyme B12 (adenosyl-cobalamin) | Bacillus/Clostridium group, proteobacteria, actinobacteria, cyanobacteria, spirochaetes, other bacteria
S-box, SAM-II, SAM-III | Metabolism of methionine and cysteine | SAM (S-adenosyl-methionine) | S-box: Bacillus/Clostridium group and some other bacteria; SAM-II (alpha); SAM-III (Streptococci)
LYS | Lysine metabolism | lysine | Bacillus/Clostridium group, enterobacteria, other bacteria
G-box | Metabolism of purines | purines | Bacillus/Clostridium group and some other bacteria
glmS (ribozyme) | Synthesis of glucosamine-6-phosphate | glucosamine-6-phosphate | Bacillus/Clostridium group
gcvT (tandem) | Catabolism of glycine | glycine | Bacillus/Clostridium group
Properties of riboswitches
• Direct binding of ligands
• High conservation
– Including “unpaired” regions: tertiary interactions, ligand binding
• Same structure – different mechanisms: transcription, translation, splicing, (RNA cleavage)
• Distribution in all taxonomic groups
– diverse bacteria
– archaea: thermoplasmas
– eukaryotes: plants and fungi
• Correlation of the mechanism and taxonomy:
– attenuation of transcription (anti-anti-terminator) – Bacillus/Clostridium group
– attenuation of translation (anti-anti-sequestor of translation initiation) – proteobacteria
– attenuation of translation (direct sequestor of translation initiation) – actinobacteria
• Evolution: horizontal transfer, duplications, lineage-specific loss
• Sometimes very narrow distribution: evolution from scratch?
Study scenarios
• RFN, S-box
– early identification of a conserved element
– model of regulation from comparative analysis
– use for functional annotation
– experimental validation
• THI, B12, PUR, LYS
– scavenging of unexplained published experimental results
– models of regulation from comparative analysis
– experimental validation
– use for functional annotation
• GcvT, GlmS
– large-scale computational screens
– prediction of ligand from functions of regulated genes
– experimental validation
• SAM-II, SAM-III
– gaps in regulatory systems
– computational screens
– experimental validation
• Structures: PUR, THI, S-box
Teaser:
Systematic analysis of T-boxes
• T-boxes: the mechanism (Grundy & Henkin)
Terminator (underlined) ===========> <===========
Antiterminator ==> ===> <===<==
SA serS -> 26 CGTTA 51 AAATAGGGTGGCAACGCGTAGAC------------CACGTCCCTTGTAGGGATGTGGTCTTTTTTTA
DHA tyrZ -> 47 CGTTA 65 AGGTAAGGTGGTAACACGGGAGCA-------TACTCTCGTCCTTCTGGCAATGAAGGACGGGAGTTTTTTGTTTT
ST trpS -> 37 CCTTA 61 AATTGAGGTGGTACCGCGTATTACTT----GTAATAACGCCCTCACGTTTTAATAGCGTGGGGACTTTTTGCTAT
CA aspS -> 39 CGTTA 34 ATAAAGGATGGCACCGTGAAAA----------GCCTTCACTCCTTACTGGAGTGGAGGCTTTTTTTATTTTAAATAAA
DF valS -> 41 CGTTA 77 AATTAAGGTGGTAACGCGAGC------------TTTTCGTCCTTTTTAAAGAGGATGAAGAGCTCTTTTTTATTTCT
PN thrS -> 30 CGTTA 38 AATGAAGGTGGAACCACGTTG-------------CGACGTCCTTTCGAGGATGTCGCATTTTTTTATTAG
MN ileS -> 89 CGTTA 68 AATTAAGGTGGTACCACGAGC-------------TTTCGTCCTTTGATGAAAGTTCTTTTTTATTGAT
DF leuS -> 28 AGCTA 29 AATTAGGGTGGTACCGCGAAGATT-------TATCCTCGTCCCTAAACGTAAGTTTAGTGACGAGGATTTTTTATTTTCA
HD argS -> 41 CGTTA 27 AACGAGAGTGGTACCGCGGGTAA---------AAGCTCGCCTCTTTTTAGAAGAGGCGGGTTTTTTATTTT
DF proS -> 33 CGTTA 30 AACTAGAGTGGTACCGCGGAAAT-----TAAACCTTTCGTCTCTATACTTGTATAGAGATGAGAGGTTTTTTATATTTTCAGGA
ZC lysS -> 46 CGTTA 63 AACTGAGGTGGTACCGCGAAGCTAA-----CAACTCTCGTCCTCAAGATGAATAATCTTGGGGGTGGGAGTTTTTTTGTTGCAT
BQ metS -> 55 CGTTA 66 AAATAAGGTGGTACCGCGACTGTTTA---TACAGCCCCGCCCTTATCTTTTTTAGATAAGGGCGGGGCTTTTTATATTTAA
MN pheS -> 14 AATTA 20 AAAACGGATGGTACCGCGTGTC-------------AACGCTCCGCTTAAGGAGTTTTGGCACTTTTTTTGTTTT
MN glyQ -> 14 AGCTA 23 AATTAGGGTGGAACCGCGTTT------------CAAACGCCCCTATGTCAGTTGGCATGGGAGTGATTGAGCGTGGCTCTTTT
ST alaS -> 20 AATTA 18 AATAGAGGTGGTACCGCGGTT--------------TTCGCCCTCTGTGAGATGGACTTGTTTTGTATGGAGGACTATTTGAAA
SA trpE -> 32 AATTA 4 AACTAAGGTGGCACCACGGTA-------------ACGCGTCCTTACAGGTATATGCGTTATGTGGTGTCTTTTT
BS ilvB -> 50 CGTTA 47 AACAAGGGTGGTACCGCGGAAAGAAA---AGCCTTTTCGCCCCTTTTAGCTATCGCAGTTACTGCGCGGCTGATTGT
CA ilvC -> 40 CGTTA 14 AATTTGGGTGGTACCGCGCGACCAAA-----AATTCTCGCCCCAAGCAGGGAATTTTGGCCGTTTTTTTATATAAATAAAT
BQ asnA -> 51 CGTTA 62 AATTTGGGTGGTACCGCGGAACC-----AAAGCCTTTCGTCCCAGTTTTTTGGGAAAGAAGGGCTTTTTTTGTTGGCTT
BS proB -> 33 CGTTA 30 AATCAAGGTGGTACCACGGAAAC--------CCATTTCGTCCTTATGAATCAGGATGAAATGGGTTTTTTTATTGTAGA
SA cysE -> 33 CATTA 62 ATTCAGAGTGGAACCGTGCGG-------------AAGCGCCTCTAACAATACAATTTGTATGTTAGTGGTGCTTTTTTG
MN hisC -> 46 CGTTA 50 AATGAAGGTGGAACCACGTGTGT---------GTCAGCGTCCTTGCAAGTTTTTTGCAAGGGCGCTTTTTTGAATAGT
DHA pheA -> 41 CGTTA 50 AAAAAGGGTGGTACCGCGTGAC---------TTAACTCGTCCCTTATTTGGGGGTGAGGTAAGTCTTTTTTTATTTA
HD serA -> 42 cgtta 57 AATGAGGGTGGCACCGCGGTATG-------AACCTTCCGCCCCTCACGACAGTCGTCGTGTGGGCAGAAGGTTTTTTTACTATCA
BQ phhA -> 51 CGTTA 34 AAATAGGGTGGTACCGCGATTC------------TTTCGCCCCTATCGGATTTTCCGATAGGGGCTTTTTCTATTTC
EF yxjH -> 40 CGTTA 51 AAAAAAGGTGGTACCGCGATAA-----------TAATCGCCCTTTTACTAGTTACGGCTAGTAAAAGGGCGTTTTTTTATAAA
CA yckK -> 38 CGTTA 57 AATTAGAGTGGTACCGTGGAATT-------CAACTTCTGCCTCTAACTATGAGGATAGAAGTTTTTTGTTTTTAT
DF yqiX -> 41 CCTTA 30 AAAAAGAGTGGTAACGCGGATAT----------AATTCGTCTCTTAGCTGTAAAGCTAAGGGACTTTTTTGATTTA
HD BH0807->74 TGTTA 56 AACTGGGGTGGCACCACGACAAG----------TGATCGTCCCCAAGACTTTTATCAGTCTTGGGGACGTTTTTTTGTTCAT
EF yheL -> 8 AATTA 33 AATTAAGGTGGTACCGCGGAGA-----------GATTCGTCCTTATTCTTTAAGGATGAATCTCTCTTTTTATGTAGC
BQ ykbA -> 46 CGTTA 45 AACAAGGGTGGAACCACGAATAT--------AACACTCGTCCCTTTTTTAGGGAGGAGTGTTTTTTTATT
BQ sdt2 -> 40 CGTTA 56 AATTGAGGTGGTACCACGGTATTAACATTACATATATCGTCCTCTACATGCATATTTGCGTGTAGGGGACTTTTTTATTTTC
EF yusC -> 42 CGTTA 60 AATTAAGGTGGTATCACGAAATGA-----CAAACTTTCGTCCTTTTTGCTGTAATAGCAAAAGGATGGAAGTTTTTTTGTTT
CA yhaG -> 48 CGTTA 51 AATTTAGGTGGTACCGCGGAAGT---------ATCTCCGTCCTAATTAATAAGATTAGGGCGGAGTTTTTTATTTGC
BQ brnQ -> 44 CGTTA 66 AATTAGGGTGGTATCGCGGGTAAA------TATAACTCGTCCCTTTCTTTAGGGACGAGTTTTTTGTGTTCTT
REF01723 -> 44 CGTTA 55 AATTGAGGTGGCACCACGAATGC----------GATTCGTCCTCTTGGCTCACAGCCAAGAGGCTTTTTTGTTTTTTTAATA
BS yvbW -> 56 CGTTA 32 AACAAGAGTGGTACCGCGGTCAGC--CGAAGGCTCGTCGTCTCTTTATCTATTAGATTAGGTAGGAGACGGCGGGCTTTTTT
Aminoacyl-tRNA synthetases
Amino acid biosynthetic genes
Amino acid transporters
TGG: T-box
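Two landmarks stand out in the alignment above: the conserved TGG of the T-box and the U-run that follows the terminator. The sketch below scans a sequence for that pair. It is a crude illustration under stated assumptions, not the search procedure used in the study. The regular expression, the 10 to 80 nt spacer and the minimum of five T's are all hypothetical choices.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: a TGG, then a spacer of 10-80 nt, then a run of >= 5 T's
# (the DNA-level image of the terminator U-run).
TBOX_THEN_URUN = re.compile(r"TGG[ACGT]{10,80}?T{5,}")


def find_tbox_terminator(aligned_seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first TGG...TTTTT span, or None.

    Alignment gaps ('-') are removed and U is read as T before scanning,
    so coordinates refer to the ungapped sequence.
    """
    seq = aligned_seq.upper().replace("-", "").replace("U", "T")
    match = TBOX_THEN_URUN.search(seq)
    return (match.start(), match.end()) if match else None


# SA serS row copied from the alignment on this slide.
sa_sers = "AAATAGGGTGGCAACGCGTAGAC------------CACGTCCCTTGTAGGGATGTGGTCTTTTTTTA"
print(find_tbox_terminator(sa_sers))
```

A pattern this loose would produce many false positives on a whole genome. In practice the conserved structure around the motif carries most of the signal.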
Partial alignment of predicted T-boxes
specifier hairpin ===> ==> ===> <=== <== SC<===
SA SERS SER ---GTAGGACAAGTA 19 AGAGAGCTTGTGGTT---AGTGTGAACAAG--- 15 GAA--TCTACCTACTT ->
DHA tyrZ Tyr ----AAGAACAAGTA 18 AGAAAGTTGCCGGCT---GATGAGAGGCGCTT 18 GAA--TACCTCTTTGA ->
ST trpS Trp ---ATTAGAAGAGTA 16 AGAGAGTTAGTGGTT---GGTGCAAGCTAAC- 12 GAAA-TGGACTAATGA ->
CA ASPS ASP -----GAGAAAAGTA 18 AGCGAATTGGGAAAT---GGTGTGAGCCCAA- 15 GAAA-GACATCTCGGA ->
DF VALS VAL -GAAGAAGAGGAGTA 16 AGAGAGGAAAATTCACTGGCTGTAAGATTTTC 17 GAAT-GTAGCTTTGGA ->
PN THRS THR ----AGAGACAAGTC 18 AGAGAGTGCGTGGTT---GCTGGAAACGCAT- 14 GAT--ACTACTCTTGA ->
MN ileS Ile ----CAAAAACACAA 17 AGCGAATAGGTGAT----GGTGTAAGACCTATT 18 -----ATCATTTTGTT ->
DF leuS Leu ----CTAGAGCAGTA 19 AGAGGAAGTGGAA-----GGTGAGAACTAATATT 10 GAA--CTTACTAGATT ->
HD ARGS ARG -----TGGGAGAGTA 20 AGCGAGTCGGGAT-----GGTGGGAGCCGAT- 14 GAAA-CGCACCCATGA ->
DF proS Pro ---AAAGAAATAGTA 18 AGAGAGAAAACGGT----GGTGAGAGTTTTC-- 14 GAA--CCTGTCTTTTA ->
ZC lysS Lys ---AAGAGAAGAGTA 19 AGAGAGCTCTGGTA----GCTGAGAAAGAGC-- 15 GAAAAAAGACTTGGAG ->
BQ metS Met ---AAAGGAAAAGTA 19 AGAGAGCTTCGGTA----GCTGAGAAGAAGC-- 14 GAACAATGGCCTTTGA ->
MN pheS Phe ----TGAGATTAGTA 18 AGGGAATGCGGGGCGTG-ACTGGAAACCCGC- 16 GAA--TTCACTCAGAA ->
MN glyQ Gly ---AGAAAGAGAGTT 15 AGCGAACCTGAGAG----AGTGTAAGTCAGGT 14 GACT-GGCACTTTCTC ->
ST alaS Ala -AGTTAAGAATTGTT 17 AGAAAAGTGACGGTT---GCTGCGAGTCATT- 17 -----GCTACTTAACT ->
SA trpE Trp TCTAAAGAAATAGTA 22 AGAAAGCTAATGGGT---GATGGGAATTAGC-- 14 GAAT-TGGACTTTGGA ->
BS ilvB Leu ---TGAGGATAAGTA 20 AGAGAACCGGGTTA----GCTGAGAACCGG--- 16 GAA--CTCGCCTCAGA ->
CA ilvC Val -----AGGAAGAGTA 17 AGAGAGTGAGATACT---GGTGGGAACTCAT-- 13 GAAG-GTAGCCTTTGA ->
BQ asnA Asn --AGGACGAGTAGTA 15 AGCGAGTCAGGGGT----GGTGTGAGCCTGA-- 15 GAAG-AACCTCCTGGA ->
BS proB Pro -----AGGATTAGTA 18 AGAGAGCAAAATGAACC-GCTGAAACATTTTGC 15 GAA--CCTGCCTTGGA ->
SA cysE Cys --CGAAGGATTAGTA 18 AGAGAGTGTACGGTT---GCTGTGAGTACA--- 14 GAA--TGCACCTTCGT ->
MN hisC His -----AGAGAAAAAA 16 AGAGAGTATGGGAA----GCTGAAAACATAC-- 15 -----CACATTCTTGA ->
DHA pheA Phe -----AAAGAGAGCA 19 AGGGAACTAAAGTCGGAGACTGAAAGCTTTAGT 14 GAGA-TTCACTCTGGA ->
HD serA Ser ----GAAGATGAGGA 17 AGAGAGCTGGTGGTT---GCTGTGAACCAGCT- 18 -----AGCCCTTCTGA ->
BQ phhA Tyr AGAATCGCAGTAGTA 17 AGAGAGCTAATGGTC---GGTGGAAATTGGC-- 14 GAAT-TACAATTCTGG ->
EF yxjH Met -----TAGGAAAGTA 17 AGAGAGACTTTGGTT---GGTGAAAAAAGTT-- 13 GAAAAATGGCCTAGGA ->
CA yckK Cys ----AAGAACCAGTA 17 AGAGAAAAATCTCCAAG-GCTGAAAGGGATTTT 15 GAA--TGCATCTTTGA ->
DF yqiX Arg -----AGAGAAAGTA 16 AGCGAGTTAGGGGTT---GGTGTAAGCCTAGC- 14 GAAG-AGAGCTCTGGA ->
HD BH0807 Lys ----AGAGAAGAGTA 19 AGAAAGCCTGTAGTT---GCTGAGAACGGGT-- 14 GAAGCAAGACTCTGAG ->
EF yheL Tyr -TTATTAGCCCAGTA 19 AGAAAGTCGATGGTT---GCTGCGAATCGAT-- 13 GAAT-TACACTAATAA ->
BQ ykbA Thr --GAGGACACGATCA 16 AGAGAGGGAAGCCTTTG-GCTGTGAGCTTCCT- 14 GATT-ACCACCTCTGA ->
BQ sdt2 Trp ---GCAAGAAGAGTA 18 AGAGAGCTGGGGGAA---GGTGTGAGCCCGGT- 15 GAA--TGGGCTTGCGA ->
EF yusC Met ----AAAGAAGAGTA 18 AGAGAGCCCTGTTT----GCTGAGAATGGG--- 16 GAAG-ATGGTCTTTGA ->
CA yhaG Trp ----AAGGAAGAGTA 18 AGAGAGCTGAGGGT----GGTGTGATCTCAGT- 15 GAA--TGGACCTTTTA ->
BQ brnQ Ile ----GAGAACGAGTA 19 AGAGAGTTGGCGATTT--GCTGAAAGCCAAC-- 15 GAAA-ATCATCTCCGA ->
REF01723 His --TTAGGACATAGTA 18 AGAGACTTTTTCATTG--GCTGAAAGAAAAAG- 17 -----CACACCTAAAA ->
BS yvbW Leu -----GGGAGCAGTA 18 AGAGAGCTGCGGGGT---GGTGCGACGCAGC-- 13 GAA--CTCGCCCGGGA ->
Aminoacyl-tRNA synthetases
Amino acid biosynthetic genes
Amino acid transporters
… continued (in the 5’ direction) anti-anti (specifier) codon
~800 T-boxes in ~90 bacteria
• Firmicutes
– aa-tRNA synthetases
– enzymes
– transporters
– all amino acids excluding glutamine, glutamate, lysine
• Actinobacteria (regulation of translation – predicted)
– branched chain (ileS)
– aromatic (Atopobium minutum)
• Delta-proteobacteria
– branched chain (leu – enzymes)
• Thermus/Deinococcus group (aa-tRNA synthetases)
– branched chain (ileS, valS)
– glycine
• Chloroflexi, Dictyoglomi
– aromatic (trp – enzymes)
– branched chain (ileS)
– threonine
Same enzymes – different regulators (common part of the aromatic amino acids biosynthesis pathway)
PHE TYR
trpE
PEP E4P
DAHP
SHIKIMATE
CHORISMATE
trpDCFBA
tyrA hisC aspB
phhA
aroF
aroI aroE
aroA
aroD
aroB
aroC
aroA pheB aroH
yhaG
TRP
TRP
kynurenine pathway
ANTHRANILATE
FOLATE
pabA pabB
ADC
trpG
TRP trpXYZ
TRP\PHE yocR family
TYR yheL
aro:
Regulated by TYR (BC)
Regulated by PHE (SWO, DRE, HMO, CH, MTH, CTH)
Regulated by TRP (DE, DEH)
cf. E. coli: AroF, G, H: feedback inhibition by TRP, TYR, PHE; transcriptional regulation by TrpR, TyrR
Recent duplications and bursts: ARG-T-box in Clostridium difficile
[Figure: tree of ARG-specific T-boxes. Leaves: LJ_ARGS, LME_ARGS, LR_ARGS, LP_ARGS, CBE_ARGS, CPE_ARGS, CB_ARGS, CTC_ARGS, CAC_ARGS, CDF_YQIXYZ, RDF02391, CDF_ARGC, CDF_ARGH, BC_ARGS2, EF_ARGS, BH_ARGS, LSA_ARGS, PPE_ARGS, LGA_ARGS.]
[Figure: gene diagrams with ARG-specific T-box regulatory sites. Groups: Bacillales, Lactobacillales, Clostridiales, Clostridium difficile, others. Gene labels: argS, yqiXYZ, RDF02391, argCJBDF, argG, argH. Categories: aminoacyl-tRNA synthetase, amino acid biosynthetic genes, predicted amino acid transporters. New sites are marked NEW.]
Expansion of T-box regulon
Regulation of expression of arginine biosynthetic and transport genes by T-box antitermination
Binding to the 5’ UTR region of a gene => regulation of gene expression
AhrC regulatory protein (negative regulation of arginine biosynthesis, positive regulation of arginine catabolism)
[Figure: gene diagrams for yqiXYZ, argC, argG and argH in Gram+ bacteria, including other clostridia spp. (CA, CTC, CTH, CPE, CB, CPE), and in Clostridium difficile. Marked sites: ARG-specific T-box regulatory site, AhrC binding site.]
Clostridium difficile: AhrC is lost
More duplications: THR-T-box in C. difficile
[Figure: tree of THR-specific T-boxes. Leaves: MMY_THRS, OOE_THRS, HMO_YNGI, CAC_THRZ, BC_THRZ*, BC_THRZ, BC_HOM, BH_THRS, BE_THRS, BCE_BRNQ2, BC_THRS, BL_THRZ, BCL_THRZ*, BS_THRZ*, BCL_THRZ, BS_THRZ, BL_THRS, BS_THRS, BCL_THRS, LMO_THRS, LB_THRS, PPE_THRS, LJ_THRS, LP_THRS, TR_THRZ, EX_THRS, CBE_THRZ, CTH_THRZ, CPE_THRS, TTE_THRZ, CDF_THRZ, CDF_HOM, CDF_THRC, CDF_HOM*, С _THRZBCTE_THRZ, CBE_THRS, CTC_BRNQ1, LL_THRS, SUI_THRS, STH_THRS, SG_THRS, SMI_THRS, SPN_THRS, SMU_THRS, SAG_THRS, SUB_THRS, SEQ_THRS, SPY_THRS, SA_THRS, LME_THRS, MFL_THRS.]
[Figure: gene diagrams with THR-specific T-box regulatory sites. Groups: Bacillales, B. cereus, Clostridiales, C. difficile, Lactobacillaceae, Leuconostocaceae, Streptococcaceae, others. Gene labels: thrS, thrZ, hom, thrCB, brnQ. Categories: aminoacyl-tRNA synthetase, biosynthetic genes, amino acid transporters.]
ASN/ASP/HIS T-boxes:
Duplications and changes in specificity
[Figure: tree of ASN/ASP/HIS T-boxes. Leaves: CB_ASNS2, CDF_ASNA, EF_HISS, EX_HISS, BCL_HISS, BH_HISS, OB_HISS, BC_HISS, TTE_HISS, DRE_HISS, CH_HISS, CTH_HISS, PL_HISS, BE_HISS, BL_HISS, BS_HISS, LME_HISXYZ, CDF_HISZX, LRE_HISXYZ, LSA_HISXYZ, OOE_HISXYZ, LP_HISXYZ, SGO_HISC, SMU_HISC, EF_HISXYZ, LMO_HISXYZ, EF_HISXYZ, LME_HIS(Z G\ ), LL_HISC, LP_HISZ, LCA_HISZ, CB_ASNS3, CAC_ASNS32, BC_ASNS2, PPE_HISXYZ, PPE_ASNS, LB_ASNA, LD_ASNA, LJ_GLNQHMP, LJ_ASNA, PPE_ASNA, LP_ASNA, EX_ASNA, LB_ASNS2, CTC_ASNS2, PPE_HISS, LP_HISS, LB_HISS, LJ_HISS, LRE_HISS, LRE_ASPS, LCA_HISS, CPE_ASNA, BC_ASNA, CBE_ASNS2, CTC_ASNA, CDF_ASNS2, CPE_ASNS2, SMU_ASPS2, SG_ASPS2.]
[Figure: gene diagrams with T-box specificities. Groups: Lactobacillales, L. johnsonii, P. pentosaceus, Bacillales, Clostridiales, other Gram+. Gene labels: his operon, hisXYZ, hisS, aspS, asnA, asnS, glnQHMP. Specificities: HIS, ASP, ASN, ASP\ASN. New sites are marked NEW.]
Blow-up
[Figure: blow-up of the tree. Leaves: PPE_ASNS2, LB_ASNA, LD_ASNA, LJ_GLNQHMP, LJ_ASNA, PPE_ASNA, LP_ASNA, PPE_HISS, LP_HISS, LB_HISS, LJ_HISS, LRE_HISS, LRE_ASPS, LCA_HISS, PPE_HISXYZ.]
[Figure: gene diagrams for Lactobacillales, L. reuteri, L. johnsonii and P. pentosaceus. Gene labels: hisS, aspS, asnA, asnS, hisXYZ, glnQHMP. Specificities with regulatory codons: HIS (CAC), ASP (GAC), ASN (AAC).]
disruption of the hisS-aspS operon; mutation of the regulatory codon
Branched-chain amino acids: duplications and changes in specificity
[Figure: tree of branched-chain amino acid T-boxes. Leaves: DG_VALS, EX_VALS, BCL_VALS, CTH_VALS, BC_VALS, BH_VALS, BE_VALS, CH_VALS, LMO_VALS, CA_ILVC, SA_VALS, OOE_LEUS, PPE_LEUS, LB_LEUS, EF_LEUS, LJ_LEUS, LGA_LEUS, OB_ILVB, LP_LEUS, LSA_LEUS, OB_LEUS, SWO_029_0008, SWO_LEUS, BS_YVBW, BL_YVBW, DRE_070_0004, CH_LEUS, LMO_LEUS, BL_LEUS, BS_LEUS, BE_LEUS, BH_LEUS, BC_LEUS, BCL_LEUS, DTH_ILVB, BS_ILVB, PL_ILVB, BH_ILVB, BE_ILVB, BL_ILVB, BCL_ILVB, GSU_LEUA, DHA_LEUA, TTE_LEUS, CTH_148_0001, DF_LEUS, CDF_LEUA, CPE_LEUS, CBE_LEUS, CTC_LEUS, CB_LEUS, CA_LEUS, EX_LEUS, DAC_LEUA, BC_YOCR3, OB1271, LP3666, STH_ILES, LP_BRNQ1_ile, SUB_ILES, LL_ILES, LCR_ILES, SPY_ILES, SZ_ILES, SEQ_ILES, SAG_ILES, SMU_ILES, SOB_ILES, SMI_ILES, SP_ILES, SG_ILES, EF_ILES, LME_ILES, LJ_ILES, LD_ILES, SA_ILES, LB_ILES, OOE_LP3666, LRE_PANE, LP_BRNQ2_val, LCA_BRNQ1_val, LCA_BRNQ2_ile, LRE_BRNQ_ile, LJ_BRNQ_ile, LSA_ILES, LJ_OPP, CTC_ILES, CB_ILES, CPE_ILES, TTE_ILES, PPE_ILES, LRE_3666_1, LMO_ILES, DF_ILES, EX_ILES, BC_ILES, BS_ILES, BL_ILES, BH_ILES, BCL_ILES, OOE_ILES, DRE_ILVD*_leu, DRE_ILVD_ile, CH_YBGE, BC_ILVB, BE_ILES, DRE_ILES, HMO_ILES, CH_ILES, LRE_3666_2, DHA_ILES, OB_ILES, CTC_BRNQ2, CPE_BRNQ, CDF_ILVC, CTC_BRNQ1, CAC_BRNQ, BC_ILES2, BCE_BRNQ1, LP_ILES, CTH_ILES, LR_LEUS, HMO_ILVB, BC_YBGE*, BC_YBGE, DF_VALS, CB_VALS, CBE_VALS, CAC_VALS, CTC_VALS, LL_VALS, LCR_VALS, LR_VALS, LP_VALS, LSA_VALS, LME_VALS, EF_VALS, PPE_VALS, LD_VALS, LJ_VALS, CPE_VALS, DHA_VALS, BS_VALS, BL_VALS, DRE_VALS, TTE_VALS, HMO_VALS. Clusters: LEU, VAL, ILE.]
[Figure: gene diagrams with T-box specificities]
valS: VAL; leuS: LEU; ileS: ILE (Firmicutes)
ilv operon: LEU (Bacillales)
leu operon: LEU (δ-proteobacteria, Clostridium difficile, Desulfitobacterium hafniense)
148_0001: LEU (C. thermocellum)
029_0008: LEU (Syntrophomonas wolfei)
yvbW: LEU (B. subtilis, B. licheniformis)
YOCR3: LEU (B. cereus)
OB1271: LEU (Oceanobacillus iheyensis)
ilvC: VAL (C. acetobutylicum)
brnQ: ILE (Lactobacillaceae, Clostridiaceae, Bacillus cereus)
brnQ: VAL (Lactobacillus casei, Lactobacillus plantarum)
ilv operon: LEU (Desulfotomaculum reducens)
ilvBN: ILE (Heliobacillus mobilis)
ilv operon 2: ILE
ilv operon: ILE (Carboxydothermus hydrogenoformans)
ilvCB: ILE (C. difficile)
Recent T-box duplication and mutation of regulatory codon
LEU
ATC CTC
lp3666
ILE
Lactobacillales
opp
ILE
Lactobacillus johnsonii
panE
ILE
Lactobacillus reuteri
Ilv operon2
ILE.......
B. cereus
ILE VAL
GTC CTC
T-box duplication and mutation of regulatory codon
ATC
ATC CTC
Blow-up
transporter:
dual regulation of common enzymes:
ATC CTC
ATC GTC
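The codon pairs on these slides differ by a single nucleotide, and under the standard genetic code each pair corresponds to a change of amino acid specificity. The sketch below makes that explicit for the six codons that appear in the talk. The function name and the small lookup table are illustrative helpers, not part of the original analysis.

```python
# Standard genetic code, restricted to the regulatory codons shown on the slides.
CODON_TO_AA = {
    "ATC": "Ile",
    "CTC": "Leu",
    "GTC": "Val",
    "AAC": "Asn",
    "GAC": "Asp",
    "CAC": "His",
}


def specificity_shift(old_codon: str, new_codon: str):
    """Return (old amino acid, new amino acid, number of differing positions)."""
    differences = sum(a != b for a, b in zip(old_codon, new_codon))
    return CODON_TO_AA[old_codon], CODON_TO_AA[new_codon], differences


for old, new in [("ATC", "CTC"), ("ATC", "GTC"), ("AAC", "GAC"), ("CAC", "GAC")]:
    print(old, "->", new, specificity_shift(old, new))
```

Every pair listed differs at the first codon position only, which is consistent with the slide's point that one mutation in the regulatory codon is enough to retarget a duplicated T-box.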
Double and one-and-a-half T-boxes
• TRP: trp operon (Bacillales, C. beijerinckii, D. hafniense)
• TYR: pah (B. cereus)
• THR: thrZ (Bacillales); hom (C. difficile)
• ILE: ilv operon (B. cereus)
• LEU: leuA (C. thermocellum)
• ILE-LEU: ilvDBNCB-leuACDBA (Desulfotomaculum reducens)

• TRP: trp operon (T. tengcongensis)
• PHE: arpLA-pheA (D. reducens, S. wolfei)
• PHE: trpXY2 (D. reducens)
• PHE: yngI (D. reducens)
• TYR: yheL (B. cereus)
• SER: serCA (D. hafniense)
• THR: thrZ (S. uberis)
• THR: brnQ-braB1 (C. thermocellum)
• HIS: hisXYZ (Lactobacillales)
• ARG: yqiXYZ (C. difficile)
• Andrei Mironov
– software genome analysis, conserved RNA patterns
• Alexei Vitreschak
– analysis of RNA structures
• Dmitry Rodionov
– metabolic reconstruction
• Support:
– Howard Hughes Medical Institute
– INTAS
– Russian Fund of Basic Research
– Russian Academy of Sciences