
DNA+Pro: Alignment of Combined DNA-Protein Sequences for evolutionary analysis of virus genes and genomes

• Xiaolong Wang
• Ocean University of China
• Email: [email protected]
• Website: www.DNAPlusPro.com

Qingdao Eye Hospital, December 2, 2010

• On February 12, 2001, scientists from six countries (the United States, the United Kingdom, Japan, France, Germany, and China), together with Celera, jointly published the human genome map and the results of its initial analysis.

• NATURE | VOL 409 | 15 FEBRUARY 2001
• SCIENCE | VOL 291 | 16 FEBRUARY 2001

Covers of the human genome special issues of Nature and Science, February 15 and 16, 2001. The five adults on the Science cover are the donors of the genetic material for Celera's human genome sequencing project.

In 2005, Margulies et al. published an article in Nature describing a fast and simple sequencing method: pyrosequencing, which combines an emulsion system for DNA amplification with picoliter-scale, pyrophosphate-based sequencing. At the end of the same year, researchers turned this new sequencing technology into a commercial instrument, the 454 Genome Sequencer system, which opened the era of rapid genome sequencing.

Species with completed genome sequences (partial list)

• Arabidopsis
• Escherichia coli
• Buchnera sp. APS
• Rickettsia prowazekii
• Ureaplasma urealyticum
• Bacillus subtilis
• Drosophila melanogaster
• Thermoplasma acidophilum
• Plasmodium falciparum
• Mouse
• Rat
• Caenorhabditis elegans
• Borrelia burgdorferi
• Aquifex aeolicus
• Neisseria meningitidis Z2491
• Mycobacterium tuberculosis
• Thermotoga maritima
• Helicobacter pylori
• Human

The need for bioinformatics

• First, genome research has been accompanied by explosive growth in related information, and there is an urgent need to process massive amounts of biological data.
  – Growth of the literature
  – Growth of biological data
• Genome research depends on bioinformatics.

Bioinformaticians face mountains of DNA fragments.

About 6 million years ago, humans and chimpanzees, descended from a common ancestor, set out on different evolutionary paths. Today, 6 million years later, scientists are taking a different route: by analyzing the genome sequence of the chimpanzee, a close relative of humans, and comparing it with the human genome sequence, they hope to answer questions about human origins and evolution.


From the Cell to Protein Machines

• Protein inhibitors (virus as an example)
  – attachment, entry and fusion inhibitors
  – DNA polymerase inhibitors
  – integrase inhibitors
  – interferons
  – maturation inhibitors
  – monoclonal antibodies
  – neuraminidase inhibitors
  – NS3 protease inhibitors
  – nucleoside reverse transcriptase inhibitors
  – protease inhibitors
  – reverse transcriptase inhibitors
  – RNA polymerase inhibitors

• Nucleic acid inhibitors (antisense oligonucleotides or RNAi)
  – Targeting mRNA
  – Targeting microRNA
  – Targeting genomic DNA
  – Interfering with mRNA processing
  – Aptamers: oligonucleotide or peptide molecules that bind to a specific target molecule


From Finding Homologs to drug design


Use ClustalW to do a progressive MSA

http://www2.ebi.ac.uk/clustalw/


Use ClustalW or ClustalX to do a progressive MSA

http://www.clustal.org/

ClustalW

Praline

MUSCLE

ProbCons

T-Coffee

• MAFFT
• ProbCons
• Praline
• MUSCLE
• CLUSTAL
• T-Coffee

DNA+Pro: Alignment of Combined DNA-Protein Sequences for evolutionary analysis of virus genes and genomes

Figure 1A. Phylogenetic tree and alignments (tree image not preserved; topology not recoverable from the extracted text).

Taxa, in leaf order: HV1J3, HV1OY, HV1B1, HV1C4, HV1A2, HV1RH, HV1EL, HV1ND, HV1Z84, HV1MA, HV1ZH, SIVCZ, HV2BE, HV2D1, HV2G1, HV2NZ, HV2CA, HV2D2
Bootstrap values, as extracted: 54, 99, 54, 99, 100, 84, 97, 100, 84, 83, 55, 99, 27, 17, 39

Protein alignment

         218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238
<HV1J3>  I N N S T K D N I K N - - - - D N S T R Y
<HV1B1>  I D N - - - - - - - - - - - - - D T T S Y
<HV1C4>  I D D N K N T - - - - - - - - T N N T K Y
<HV1A2>  I D N A S T T - - - - - - - - T N Y T N Y
<HV1OY>  I D - - - - - - - - - - - - - K N D T K F
<HV1RH>  I E K G N I S P K N N T S N N T S Y G N Y
<HV1ND>  I D N N N - - - - - - - - - R T N S T N Y
<HV1EL>  I D N D S - - - - - - - - - S T N S T N Y
<HV1Z84> I D D D N S A N T S - - - - N T N Y T N Y
<HV1MA>  I D D S D - - - - - - - - - - - - N S S Y
<HV1ZH>  I G G N S S N - - - - - - - - G D S S K Y
<SIVCZ>  L G N E N - - - - - - - - - - - - - N T Y

Codon alignment

<HV1J3>  ata aat aat agt acc aag gat aat ata aaa aat --- --- --- --- gat aat agt acc aga tat
<HV1B1>  ata gat aat --- --- --- --- --- --- --- --- --- --- --- --- --- gat act acc agc tat
<HV1C4>  ata gat gat aat aaa aat act --- --- --- --- --- --- --- --- acc aac aac acc aaa tat
<HV1A2>  ata gat aat gct agt act act --- --- --- --- --- --- --- --- acc aac tat acc aac tat
<HV1OY>  ata gat --- --- --- --- --- --- --- --- --- --- --- --- --- aag aat gat act aaa ttt
<HV1RH>  ata gag aag ggt aat att agc cct aag aat aat act agc aat aat act agc tat ggt aac tat
<HV1ND>  ata gac aat aat aat --- --- --- --- --- --- --- --- --- agg acc aat agt act aat tat
<HV1EL>  ata gac aat gat agt --- --- --- --- --- --- --- --- --- agt acc aat agt acc aat tat
<HV1Z84> ata gat gat gat aat agt gct aat acc agt --- --- --- --- aat acc aat tat acc aat tat
<HV1MA>  ata gat gat agt gat --- --- --- --- --- --- --- --- --- --- --- --- aat agt agt tat
<HV1ZH>  att ggg gga aat agt agt aat --- --- --- --- --- --- --- --- ggt gat agt agt aaa tat
<SIVCZ>  cta ggg aat gag aac --- --- --- --- --- --- --- --- --- --- --- --- --- aac aca tat

Combined DNA-protein alignment

<HV1J3>  ataI aatN aatN agtS accT aagK gatD aatN ataI aaaK aatN ---- ---- ---- ---- gatD aatN agtS accT agaR tatY
<HV1B1>  ataI gatD aatN ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- gatD actT accT agcS tatY
<HV1C4>  ataI gatD gatD aatN aaaK aatN actT ---- ---- ---- ---- ---- ---- ---- ---- accT aacN aacN accT aaaK tatY
<HV1A2>  ataI gatD aatN gctA agtS actT actT ---- ---- ---- ---- ---- ---- ---- ---- accT aacN tatY accT aacN tatY
<HV1OY>  ataI gatD ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- aagK aatN gatD actT aaaK tttF
<HV1RH>  ataI gagE aagK ggtG aatN attI agcS cctP aagK aatN aatN actT agcS aatN aatN actT agcS tatY ggtG aacN tatY
<HV1ND>  ataI gacD aatN aatN aatN ---- ---- ---- ---- ---- ---- ---- ---- ---- aggR accT aatN agtS actT aatN tatY
<HV1EL>  ataI gacD aatN gatD agtS ---- ---- ---- ---- ---- ---- ---- ---- ---- agtS accT aatN agtS accT aatN tatY
<HV1Z84> ataI gatD gatD gatD aatN agtS gctA aatN accT agtS ---- ---- ---- ---- aatN accT aatN tatY accT aatN tatY
<HV1MA>  ataI gatD gatD agtS gatD ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- aatN agtS agtS tatY
<HV1ZH>  attI gggG ggaG aatN agtS agtS aatN ---- ---- ---- ---- ---- ---- ---- ---- ggtG gatD agtS agtS aaaK tatY
<SIVCZ>  ctaL gggG aatN gagE aacN ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- aacN acaT tatY
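The combined tokens in the alignment above pair each codon with the amino acid it encodes (for example, ataI or aatN). The following is a minimal sketch of how such tokens could be produced from an in-frame coding sequence. It assumes the standard genetic code; the function name combine and the use of X for unrecognised codons are illustrative choices, not taken from the DNA+Pro software.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order
# (third base varies fastest).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def combine(cds):
    """Turn an in-frame coding sequence into codon+amino-acid tokens."""
    cds = cds.lower().replace("u", "t")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    tokens = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        # Hypothetical convention: codons with ambiguous bases map to X.
        tokens.append(codon + CODON_TABLE.get(codon, "X"))
    return tokens


# First five codons of the HV1J3 row shown above.
print(" ".join(combine("ataaataatagtacc")))
```

Each token keeps the nucleotide information (so synonymous changes remain visible) while also carrying the amino acid, which is the idea the combined rows illustrate.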

Figure 1B. Phylogenetic tree and alignments (tree image not preserved; topology not recoverable from the extracted text).

Taxa, in leaf order: HV1J3, HV1B1, HV1C4, HV1A2, HV1OY, HV1RH, HV1EL, HV1ND, HV1Z84, HV1MA, HV1ZH, SIVCZ, HV2CA, HV2NZ, HV2D1, HV2G1, HV2BE, HV2D2
Bootstrap values, as extracted: 95, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 97, 86, 93, 63

Combined DNA-protein alignment

       256  257  258  259  260  261  262  263  264  265  266  267  268  269  270  271  272  273  274  275  276  277  278  279  280
HV1J3  ataI aatN aatN agtS accT aagK gatD aatN ataI aaaK aatN gatD aatN agtS accT ---- ---- ---- ---- ---- agaR ---- ---- ---- tatY
HV1B1  ataI ---- ---- ---- ---- ---- gatD aatN ---- ---- gatD ---- ---- actT accT agcS tatY ---- ---- acgT ---- ---- ---- ---- ----
HV1C4  ataI ---- ---- ---- ---- ---- gatD gatD ---- ---- aatN aaaK aatN actT accT aacN aacN ---- ---- accT aaaK ---- ---- ---- tatY
HV1A2  ataI ---- ---- ---- ---- ---- gatD aatN ---- ---- gctA agtS actT actT accT aacN tatY ---- ---- accT aacN ---- ---- ---- tatY
HV1OY  ataI ---- ---- ---- ---- ---- gatD aagK ---- ---- aatN gatD ---- actT ---- ---- ---- ---- ---- ---- aaaK ---- ---- ---- tttF
HV1RH  ataI gagE ---- ---- aagK ggtG aatN attI agcS cctP aagK aatN aatN actT agcS aatN aatN ---- ---- actT agcS tatY ggtG aacN tatY
HV1ND  ataI ---- ---- ---- ---- ---- gacD aatN ---- ---- ---- aatN aatN aggR ---- ---- ---- ---- ---- accT aatN agtS actT aatN tatY
HV1EL  ataI ---- ---- ---- ---- ---- gacD aatN ---- ---- ---- gatD agtS agtS ---- ---- ---- ---- ---- accT aatN agtS accT aatN tatY
HV1Z84 ataI ---- ---- ---- ---- ---- gatD gatD ---- ---- ---- gatD aatN agtS gctA aatN accT agtS aatN accT aatN tatY accT aatN tatY
HV1MA  ataI ---- ---- ---- ---- ---- gatD gatD agtS ---- ---- gatD aatN agtS ---- ---- ---- ---- ---- ---- ---- ---- ---- agtS tatY
HV1ZH  attI gggG ggaG aatN ---- ---- ---- agtS agtS aatN ggtG gatD agtS agtS aaaK ---- ---- ---- ---- ---- ---- ---- ---- ---- tatY
SIVCZ  ctaL gggG aatN ---- ---- ---- ---- gagE aacN ---- ---- ---- aacN acaT ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- tatY

Protein alignment

HV1J3  I N N S T K D N I K N D N S T - - - - - R - - - Y
HV1B1  I - - - - - D N - - D - - T T S Y - - T - - - - -
HV1C4  I - - - - - D D - - N K N T T N N - - T K - - - Y
HV1A2  I - - - - - D N - - A S T T T N Y - - T N - - - Y
HV1OY  I - - - - - D K - - N D - T - - - - - - K - - - F
HV1RH  I E - - K G N I S P K N N T S N N - - T S Y G N Y
HV1ND  I - - - - - D N - - - N N R - - - - - T N S T N Y
HV1EL  I - - - - - D N - - - D S S - - - - - T N S T N Y
HV1Z84 I - - - - - D D - - - D N S A N T S N T N Y T N Y
HV1MA  I - - - - - D D S - - D N S - - - - - - - - - S Y
HV1ZH  I G G N - - - S S N G D S S K - - - - - - - - - Y
SIVCZ  L G N - - - - E N - - - N T - - - - - - - - - - Y

Codon alignment

HV1J3  ata aat aat agt acc aag gat aat ata aaa aat gat aat agt acc --- --- --- --- --- aga --- --- --- tat
HV1B1  ata --- --- --- --- --- gat aat --- --- gat --- --- act acc agc tat --- --- acg --- --- --- --- ---
HV1C4  ata --- --- --- --- --- gat gat --- --- aat aaa aat act acc aac aac --- --- acc aaa --- --- --- tat
HV1A2  ata --- --- --- --- --- gat aat --- --- gct agt act act acc aac tat --- --- acc aac --- --- --- tat
HV1OY  ata --- --- --- --- --- gat aag --- --- aat gat --- act --- --- --- --- --- --- aaa --- --- --- ttt
HV1RH  ata gag --- --- aag ggt aat att agc cct aag aat aat act agc aat aat --- --- act agc tat ggt aac tat
HV1ND  ata --- --- --- --- --- gac aat --- --- --- aat aat agg --- --- --- --- --- acc aat agt act aat tat
HV1EL  ata --- --- --- --- --- gac aat --- --- --- gat agt agt --- --- --- --- --- acc aat agt acc aat tat
HV1Z84 ata --- --- --- --- --- gat gat --- --- --- gat aat agt gct aat acc agt aat acc aat tat acc aat tat
HV1MA  ata --- --- --- --- --- gat gat agt --- --- gat aat agt --- --- --- --- --- --- --- --- --- agt tat
HV1ZH  att ggg gga aat --- --- --- agt agt aat ggt gat agt agt aaa --- --- --- --- --- --- --- --- --- tat
SIVCZ  cta ggg aat --- --- --- --- gag aac --- --- --- aac aca --- --- --- --- --- --- --- --- --- --- tat
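The protein and codon alignments above have the same gap pattern as the combined alignment, which suggests they can be read directly off the aligned combined tokens. A minimal sketch of that split follows, assuming each aligned token is either a four-character codon-plus-residue token or a four-dash gap; the function name split_combined is hypothetical and this is not the DNA+Pro implementation.

```python
def split_combined(row):
    """Split a row of aligned combined tokens into (protein, codons)."""
    protein, codons = [], []
    for tok in row.split():
        if len(tok) != 4:
            raise ValueError(f"unexpected token: {tok!r}")
        if tok == "----":
            # A gap in the combined alignment is a gap in both views.
            protein.append("-")
            codons.append("---")
        else:
            codons.append(tok[:3])   # first three characters: the codon
            protein.append(tok[3])   # fourth character: the amino acid
    return " ".join(protein), " ".join(codons)


# Start of the HV1RH row from the combined alignment above.
prot, dna = split_combined("ataI gagE ---- ---- aagK ggtG")
print(prot)
print(dna)
```

Because the split is column by column, the derived protein and codon alignments stay in register with each other.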

Where did HIV come from?

Freeman & Herron, 2001. Evolutionary Analysis. Prentice Hall


2003/6/13 Science

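• Two viruses from different species of monkeys recombined in African chimpanzees to form the SIV strain that gave rise to human AIDS.
• SIVcpz, which originated in chimpanzees, arose through successive transmission and recombination of SIVs from red-capped mangabeys and greater spot-nosed monkeys. Chimpanzees prey on both of these monkeys, and the ranges of the monkeys and chimpanzees overlap in west-central Africa.
• Humans are not the only species to have acquired two different SIV strains through natural cross-species transmission, which was most likely the result of predation.
• Has chimpanzee predation on small monkeys led them to acquire other SIV infections? How likely are coinfection of these SIVs with SIVcpz, or recombination with SIVcpz? Are these chimpanzee-adapted SIVs ultimately more likely to infect humans?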

Figure 2. Phylogenetic trees for gag and env (tree images not preserved; topologies and bootstrap values not recoverable from the extracted text).

• 2A: Protein only
• 2B: DNA+PRO

Taxa: HV1J3, HV1B1, HV1C4, HV1A2, HV1OY, HV1RH, HV1EL, HV1ND, HV1MA, SIVCZ, HV2CA, HV2NZ, HV2BE, HV2D1, HV2G1, SIVM1, SIVV1, SIVG1, SIVGB

Figure 3. Phylogenetic trees of BamHI homologs (tree images not preserved; topologies and bootstrap values not recoverable from the extracted text).

• 3A: Protein only
• 3B: DNA+PRO

Taxa: OkrAI, HchORF2488P, Bsp98I, Csp7822ORF584P, HauORF2756P, BsuBSP1ORFAP, DdsI, BamHI

Figure 4. Phylogenetic trees of SAUSA300_2431 homologs (tree images not preserved; topologies and bootstrap values not recoverable from the extracted text).

• 4A: Protein only
• 4B: DNA+PRO
• 4C: A robust multi-gene phylogenetic tree

Taxa: JH9, JH1, MW2, MSSA476, USA300 TCH1516, USA300 FPR3757, str NewMan, NCTC, COL, MRSA252, RP62A, TM300

Figure S1A. ClustalW (tree image not preserved; topology not recoverable from the extracted text).

Taxa, in leaf order: HV1J3, HV1OY, HV1B1, HV1C4, HV1A2, HV1RH, HV1EL, HV1ND, HV1Z84, HV1MA, HV1ZH, SIVCZ, HV2CA, HV2NZ, HV2D1, HV2BE, HV2G1, HV2D2
Bootstrap values, as extracted: 54, 99, 54, 99, 100, 84, 97, 100, 84, 83, 55, 99, 27, 17, 39

Protein alignment, columns 208-248

<HV1J3>  A L F Y K H D V V P I N N S T K D N I K N - - - - D N S T R Y R L I S C N T S V I
<HV1OY>  A L F Y K L D V L P I D - - - - - - - - - - - - - K N D T K F R L I H C N T S T I
<HV1B1>  A F F Y K L D I I P I D N - - - - - - - - - - - - - D T T S Y T L T S C N T S V I
<HV1C4>  A L F Y K L D V E P I D D N K N T - - - - - - - - T N N T K Y R L I N C N T S V I
<HV1A2>  A L F R N L D V V P I D N A S T T - - - - - - - - T N Y T N Y R L I H C N R S V I
<HV1RH>  A L F Y K L D V V P I E K G N I S P K N N T S N N T S Y G N Y T L I H C N S S V I
<HV1ND>  A L F Y K L D I V P I D N N N - - - - - - - - - R T N S T N Y R L I N C D T S T I
<HV1EL>  A L F Y R L D I V P I D N D S - - - - - - - - - S T N S T N Y R L I N C N T S A I
<HV1Z84> A L F Y R L D V V P I D D D N S A N T S - - - - N T N Y T N Y R L I N C N T S A I
<HV1MA>  A T F Y N L D L V Q I D D S D - - - - - - - - - - - - N S S Y R L I N C N T S V I
<HV1ZH>  S L F Y R L D I V P I G G N S S N - - - - - - - - G D S S K Y R L I N C N T S A I
<SIVCZ>  S L F Y V E D V V N L G N E N - - - - - - - - - - - - - N T Y R I I N C N T T A I
<HV2NZ>  E A W Y S K D V V C D N - - - N T S S - - - - - - - - Q S K C Y M N H C N T S V I
<HV2CA>  E T W Y S S D V V C D N S T D Q T T N - - - - - - - - E T T C Y M N H C N T S V I
<HV2G1>  E T W Y S K D V V C E S N N T K D G - - - - - - - - - K N R C Y M N H C N T S V I
<HV2D1>  D A W Y S R D V V C D K T N G - - - - - - - - - - - - T G T C Y M R H C N T S V I
<HV2BE>  D T W Y L E D V V C D N T T - - - - - - - - - - - - - A G T C Y M R H C N T S I I
<HV2D2>  D T W Y S E D L E C N N T R - - - K Y - - - - - - - - T S R C Y I R T C N T T I I

Figure S1B. MAFFT (tree image not preserved; topology not recoverable from the extracted text).

Taxa, in leaf order: HV1J3, HV1B1, HV1A2, HV1RH, HV1C4, HV1OY, HV1ND, HV1EL, HV1Z84, HV1MA, HV1ZH, SIVCZ, HV2NZ, HV2CA, HV2G1, HV2D1, HV2BE, HV2D2
Bootstrap values, as extracted: 54, 99, 63, 97, 100, 97, 93, 75, 100, 63, 42, 23, 47, 21, 100

Protein alignment, columns 204-244

<HV1J3>  A L F Y K H D V V P I N N S T K D N I K N D N S T - - - - R Y R L I S C N T S V I
<HV1B1>  A F F Y K L D I I P I D N - - - - - - - - - D T T - - - - S Y T L T S C N T S V I
<HV1A2>  A L F R N L D V V P I D N A S T T - - - - T N Y T - - - - N Y R L I H C N R S V I
<HV1RH>  A L F Y K L D V V P I E K G N I S P K N N T S N N T S Y G N Y T L I H C N S S V I
<HV1C4>  A L F Y K L D V E P I D D N K N T - - - - T N N T - - - - K Y R L I N C N T S V I
<HV1OY>  A L F Y K L D V L P I D - - - - - - - - - K N D T - - - - K F R L I H C N T S T I
<HV1ND>  A L F Y K L D I V P I D N N N R - - - - - T N S T - - - - N Y R L I N C D T S T I
<HV1EL>  A L F Y R L D I V P I D N D S S - - - - - T N S T - - - - N Y R L I N C N T S A I
<HV1Z84> A L F Y R L D V V P I D D D N S A N T S N T N Y T - - - - N Y R L I N C N T S A I
<HV1MA>  A T F Y N L D L V Q I D D - - - - - - - - S D N S - - - - S Y R L I N C N T S V I
<HV1ZH>  S L F Y R L D I V P I G G N S S N - - - - G D S S - - - - K Y R L I N C N T S A I
<SIVCZ>  S L F Y V E D V V N L G N E N N - - - - - - - - - - - - - T Y R I I N C N T T A I
<HV2NZ>  E A W Y S K D V V C D N N T - - - - - - - S S Q S - - - - K C Y M N H C N T S V I
<HV2CA>  E T W Y S S D V V C D N S T D Q T - - - - T N E T - - - - T C Y M N H C N T S V I
<HV2G1>  E T W Y S K D V V C E S N N T K - - - - - D G K N - - - - R C Y M N H C N T S V I
<HV2D1>  D A W Y S R D V V C D K T N - - - - - - - - G T G - - - - T C Y M R H C N T S V I
<HV2BE>  D T W Y L E D V V C D N T T - - - - - - - - - A G - - - - T C Y M R H C N T S I I
<HV2D2>  D T W Y S E D L E C N N T R - - - - - - - K Y T S - - - - R C Y I R T C N T T I I

Figure S1C. MUSCLE (tree image not preserved; topology not recoverable from the extracted text).

Taxa, in leaf order: HV1J3, HV1B1, HV1C4, HV1A2, HV1OY, HV1RH, HV1EL, HV1ND, HV1Z84, HV1MA, HV1ZH, SIVCZ, HV2CA, HV2NZ, HV2D1, HV2G1, HV2BE, HV2D2
Bootstrap values, as extracted: 95, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 97, 86, 93, 63

Protein alignment, columns 234-274

<HV1J3>  A L F Y K H D V V P I N N S T - - - - K D N I K N D N S T R Y R L I S C N T S V I
<HV1B1>  A F F Y K L D I I P I D - - - - - - - - - - - - - N D T T S Y T L T S C N T S V I
<HV1C4>  A L F Y K L D V E P I D - - - - - - - - D N K N T T N N T K Y R L I N C N T S V I
<HV1A2>  A L F R N L D V V P I D - - - - - - - - N A S T T T N Y T N Y R L I H C N R S V I
<HV1OY>  A L F Y K L D V L P I D - - - - - - - - - - - - - K N D T K F R L I H C N T S T I
<HV1RH>  A L F Y K L D V V P I E K G N I S P K N N T S N N T S Y G N Y T L I H C N S S V I
<HV1ND>  A L F Y K L D I V P I D - - - - - - - - - N N N R T N S T N Y R L I N C D T S T I
<HV1EL>  A L F Y R L D I V P I D - - - - - - - - - N D S S T N S T N Y R L I N C N T S A I
<HV1Z84> A L F Y R L D V V P I D D D N - - - - S A N T S N T N Y T N Y R L I N C N T S A I
<HV1MA>  A T F Y N L D L V Q I D - - - - - - - - - - - - D S D N S S Y R L I N C N T S V I
<HV1ZH>  S L F Y R L D I V P I G - - - - - - - - G N S S N G D S S K Y R L I N C N T S A I
<SIVCZ>  S L F Y V E D V V N L G - - - - - - - - - - - - - N E N N T Y R I I N C N T T A I
<HV2NZ>  E A W Y S K D V V C D N - - - - - - - - - - - N T S S Q S K C Y M N H C N T S V I
<HV2CA>  E T W Y S S D V V C D N - - - - - - - - S T D Q T T N E T T C Y M N H C N T S V I
<HV2G1>  E T W Y S K D V V C E S - - - - - - - - - N N T K D G K N R C Y M N H C N T S V I
<HV2D1>  D A W Y S R D V V C D K - - - - - - - - - - - - T N G T G T C Y M R H C N T S V I
<HV2BE>  D T W Y L E D V V C D N - - - - - - - - - - - - - T T A G T C Y M R H C N T S I I
<HV2D2>  D T W Y S E D L E C N N - - - - - - - - - - - T R K Y T S R C Y I R T C N T T I I

Figure S1D. T-Coffee (tree image not preserved; topology not recoverable from the extracted text).

Taxa, in leaf order: HV1J3, HV1B1, HV1C4, HV1A2, HV1OY, HV1RH, HV1EL, HV1ND, HV1Z84, HV1MA, HV1ZH, SIVCZ, HV2CA, HV2NZ, HV2D1, HV2G1, HV2BE, HV2D2
Bootstrap values, as extracted: 95, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 97, 86, 93, 63

Protein alignment, columns 226-266

<HV1J3>  A L F Y K H D V V P I N N S T K D N I K N D N S T - - - - R Y R L I S C N T S V I
<HV1B1>  A F F Y K L D I I P I D N - - - - - - - - - D T T - - - - S Y T L T S C N T S V I
<HV1C4>  A L F Y K L D V E P I D D N K N T - - - - T N N T - - - - K Y R L I N C N T S V I
<HV1A2>  A L F R N L D V V P I D N A S T T - - - - T N Y T - - - - N Y R L I H C N R S V I
<HV1OY>  A L F Y K L D V L P I D - - - - - - - - - K N D T - - - - K F R L I H C N T S T I
<HV1RH>  A L F Y K L D V V P I E K G N I S P K N N T S N N T S Y G N Y T L I H C N S S V I
<HV1ND>  A L F Y K L D I V P I D N N N R - - - - - T N S T - - - - N Y R L I N C D T S T I
<HV1EL>  A L F Y R L D I V P I D N D S S - - - - - T N S T - - - - N Y R L I N C N T S A I
<HV1Z84> A L F Y R L D V V P I D D D N S A N T S N T N Y T - - - - N Y R L I N C N T S A I
<HV1MA>  A T F Y N L D L V Q I D D - - - - - - - - S D N S - - - - S Y R L I N C N T S V I
<HV1ZH>  S L F Y R L D I V P I G G N S S N - - - - G D S S - - - - K Y R L I N C N T S A I
<SIVCZ>  S L F Y V E D V V N L G N E N N - - - - - - - - - - - - - T Y R I I N C N T T A I
<HV2NZ>  E A W Y S K D V V C D N N T - - - - - - - S S Q S - - - - K C Y M N H C N T S V I
<HV2CA>  E T W Y S S D V V C D N S T D Q T - - - - T N E T - - - - T C Y M N H C N T S V I
<HV2G1>  E T W Y S K D V V C E S N N T K - - - - - D G K N - - - - R C Y M N H C N T S V I
<HV2D1>  D A W Y S R D V V C D K T N - - - - - - - - G T G - - - - T C Y M R H C N T S V I
<HV2BE>  D T W Y L E D V V C D N T T - - - - - - - - - A G - - - - T C Y M R H C N T S I I
<HV2D2>  D T W Y S E D L E C N N T R - - - - - - - K Y T S - - - - R C Y I R T C N T T I I

Figure S1E. PRANK (tree image not preserved; topology not recoverable from the extracted text).

Taxa, in leaf order: HV1J3, HV1B1, HV1C4, HV1A2, HV1OY, HV1RH, HV1EL, HV1ND, HV1Z84, HV1MA, HV1ZH, SIVCZ, HV2CA, HV2NZ, HV2D1, HV2G1, HV2BE, HV2D2
Bootstrap values, as extracted: 95, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 97, 86, 93, 63

Protein alignment, columns 287-350

<HV1J3>  A L F Y K H D V V P I N N S T K D N I K N D - - - - - - - - - - N S T - - - - - - - - - - - - - - - - - R Y R L I S C N T S V I
<HV1B1>  A F F Y K L D I I P I - - - - - - - - - D N - - - - - - - - - - D T T - - - - - - - - - - - - - - - - - S Y T L T S C N T S V I
<HV1C4>  A L F Y K L D V E P I - - - - - - - - - D D - - - - - N K N T T N N T - - - - - - - - - - - - - - - - - K Y R L I N C N T S V I
<HV1A2>  A L F R N L D V V P I - - - - - - - - - D N A S T T T - - - - - N Y T - - - - - - - - - - - - - - - - - N Y R L I H C N R S V I
<HV1OY>  A L F Y K L D V L P I - - - - - - - - - D K - - - - - - - - - - N D T - - - - - - - - - - - - - - - - - K F R L I H C N T S T I
<HV1RH>  A L F Y K L D V V P I - - - - - - - - - E K G N I S P K N N T S N N T - - - - - - - - - - - - - - S Y G N Y T L I H C N S S V I
<HV1ND>  A L F Y K L D I V P I - - - - - - - - - D N - N N R - - - - - T N S T - - - - - - - - - - - - - - - - - N Y R L I N C D T S T I
<HV1EL>  A L F Y R L D I V P I - - - - - - - - - D N - D S S - - - - - T N S T - - - - - - - - - - - - - - - - - N Y R L I N C N T S A I
<HV1Z84> A L F Y R L D V V P I - - - - - - - - - D D - D N S A N T S N T N Y T - - - - - - - - - - - - - - - - - N Y R L I N C N T S A I
<HV1MA>  A T F Y N L D L V Q I - - - - - - - - - D D S D N S - - - - - S - - - - - - - - - - - - - - - - - - - - - Y R L I N C N T S V I
<HV1ZH>  S L F Y R L D I V P I - - - - - - - - - - - - - - G G N S S N G D S S - - - - - - - - - - - - - - - - - K Y R L I N C N T S A I
<SIVCZ>  S L F Y V E D V V - - - - - - - - - - - - - - - - - - N L G N E N N T - - - - - - - - - - - - - - - - - - Y R I I N C N T T A I
<HV2NZ>  E A W Y S K D V V - - - - - - - - - - - - - - - - - - - C D - - N N T - - - - - S - - - - S Q S K - - - - C Y M N H C N T S V I
<HV2CA>  E T W Y S S D V V - - - - - - - - - - - - - - - - - - - C D - - N S T - D Q T - T - - - - N E T T - - - - C Y M N H C N T S V I
<HV2G1>  E T W Y S K D V V - - - - - - - - - - - - - - - - - - - C E - - S N N - - - - - T K D G K N - - R - - - - C Y M N H C N T S V I
<HV2D1>  D A W Y S R D V V - - - - - - - - - - - - - - - - - - - C D - - K T N G - - - - T - - - - G - - T - - - - C Y M R H C N T S V I
<HV2BE>  D T W Y L E D V V - - - - - - - - - - - - - - - - - - - C D - - N T T - - - - - A - - - - G - - T - - - - C Y M R H C N T S I I
<HV2D2>  D T W Y S E D L E - - - - - - - - - - - - - - - - - - - C N - - N T R - - - - - K - - - - Y T S R - - - - C Y I R T C N T T I I

Figure S1F. DNA+PRO (tree image not preserved; topology not recoverable from the extracted text).

Taxa, in leaf order: HV1J3, HV1B1, HV1C4, HV1A2, HV1OY, HV1RH, HV1EL, HV1ND, HV1Z84, HV1MA, HV1ZH, SIVCZ, HV2CA, HV2NZ, HV2D1, HV2G1, HV2BE, HV2D2
Bootstrap values, as extracted: 95, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 97, 86, 93, 63

Protein alignment, columns 245-290

HV1J3  A L F Y K H D - V V P I N N S T K D N I K N D N S T - - - - - R - - - Y R L I S C N T S V I
HV1B1  A F F Y K L D - I I P I - - - - - D N - - D - - T T S Y - - T - - - - - - L T S C N T S V I
HV1C4  A L F Y K L D - V E P I - - - - - D D - - N K N T T N N - - T K - - - Y R L I N C N T S V I
HV1A2  A L F R N L D - V V P I - - - - - D N - - A S T T T N Y - - T N - - - Y R L I H C N R S V I
HV1OY  A L F Y K L D - V L P I - - - - - D K - - N D - T - - - - - - K - - - F R L I H C N T S T I
HV1RH  A L F Y K L D - V V P I E - - K G N I S P K N N T S N N - - T S Y G N Y T L I H C N S S V I
HV1ND  A L F Y K L D - I V P I - - - - - D N - - - N N R - - - - - T N S T N Y R L I N C D T S T I
HV1EL  A L F Y R L D - I V P I - - - - - D N - - - D S S - - - - - T N S T N Y R L I N C N T S A I
HV1Z84 A L F Y R L D - V V P I - - - - - D D - - - D N S A N T S N T N Y T N Y R L I N C N T S A I
HV1MA  A T F Y N L D - L V Q I - - - - - D D S - - D N S - - - - - - - - - S Y R L I N C N T S V I
HV1ZH  S L F Y R L D - I V P I G G N - - - S S N G D S S K - - - - - - - - - Y R L I N C N T S A I
SIVCZ  S L F Y V - E D V V N L G N - - - - E N - - - N T - - - - - - - - - - Y R I I N C N T T A I
HV2NZ  E A W Y S K D - V V - - - - - - C D - - - - N N T S S - - Q S K - - C Y M - N H C N T S V I
HV2CA  E T W Y S S D - V V - - - - - - C D N S T - D Q T T N - - E T T - - C Y M - N H C N T S V I
HV2G1  E T W Y S K D - V V - - - - - - C E S - - - N N T K - - - D G K N R C Y M - N H C N T S V I
HV2D1  D A W Y S R D - V V - - - - - - C D - - - - - K T N G - T G - T - - C Y M - R H C N T S V I
HV2BE  D T W Y L E D - V V - - - - - - C D - - - - - N T T - - - A G T - - C Y M - R H C N T S I I
HV2D2  D T W Y S E D - L E - - - - - - C - - - - - N N T R K - Y T S R - - C Y I - R T C N T T I I

Figure S2. ENV phylogenetic trees (tree images not preserved; topologies and bootstrap values not recoverable from the extracted text).

• S2A: ClustalW
• S2B: MAFFT
• S2C: MUSCLE
• S2D: T-Coffee
• S2E: PRANK
• S2F: DNA+PRO

Taxa: HV1W1, HV1J3, HV1B1, HV1BN, HV1C4, HV1RH, HV1A2, HV1OY, HV1ND, HV1EL, HV1Z84, HV1MA, HV1ZH, SIVCZ, HV2D1, HV2G1, HV2BE, HV2NZ, HV2CA, HV2D2, SIVM1, SIVG1, SIVV1, SIVGB

Figure S3. Phylogenetic trees for gag and env (tree images not preserved; topologies and bootstrap values not recoverable from the extracted text).

• S3A: Protein only
• S3B: DNA+PRO

Taxa: MN, BRU, JRFL, AD8, WEAU, RF, ETH2220, 92BR025, IN21068, 96BW05, 92UG037, Q23, IBNG, DJ264, DJ263, PVMY


DNA+Pro: www.DNAPlusPro.com
