
Introduction of Genome Research


Page 1: Introduction of  Genome Research

Bioinformatics Research Center (BRC), Academia Sinica (中研院生物資訊中心), April 9, 2001 (ROC 90/4/9), p.m.

Introduction of Genome Research

Bioinformatics Research Center, Institute of Biomedical Sciences

ACADEMIA SINICA

莊樹諄
www.sinica.edu.tw/~trees/bioinformatics
E-mail: [email protected]

Page 2: Introduction of  Genome Research

Outline

• Introduction
• Some Research Topics
• Related Links and Resources
• Bioinformatics Research Center (BRC)

Introduction

Page 3: Introduction of  Genome Research


Chromosome

Page 4: Introduction of  Genome Research

Introduction

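[Figure: gene structure. Gene (DNA sequence, 5' to 3'): exons (coding regions) separated by introns. mRNA: 5'UTR, ORF (open reading frame), 3'UTR; cDNA: complementary DNA copied from the mRNA. Flow of information: DNA → RNA → Protein → Function.]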
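The ORF in the figure is the stretch that runs from a start codon to the next in-frame stop codon. A minimal sketch, assuming the standard start codon ATG and stop codons TAA, TAG and TGA, and scanning only the three forward frames of a DNA string; the function name `find_orfs` and the `min_codons` threshold are illustrative choices, not a tool from these slides.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return (start, end, orf) for ATG...stop ORFs in the three forward frames.

    `end` is exclusive and includes the stop codon; `min_codons` counts the
    start and stop codons. An ATG inside an already-open ORF is not reported
    as a separate ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))
```

For example, `find_orfs("CCATGAAATTTTAGCC")` finds the single ORF `ATGAAATTTTAG` starting at position 2 in the third forward frame. A full ORF finder would also scan the reverse complement.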

Page 5: Introduction of  Genome Research


Introduction

DNA sequence: A, C, G, T --- 4 letters
RNA sequence: A, C, G, U (U = uracil, 尿嘧啶) --- 4 letters

DNA nucleotide (核苷酸): phosphoric acid (磷酸) + deoxyribose (去氧核糖) + nitrogenous base (含氮鹽基)

Nitrogenous bases (含氮鹽基)
Purines: Adenine (A, 腺嘌呤), Guanine (G, 鳥糞嘌呤)
Pyrimidines: Cytosine (C, 胞嘧啶), Thymine (T, 胸腺嘧啶)
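The alphabets and base classes above fit in a few lines of code. A minimal sketch, assuming the input is the DNA coding (sense) strand written 5'→3', so transcription is modeled simply as replacing T with U; the names `base_class` and `transcribe` are illustrative.

```python
PURINES = {"A", "G"}            # adenine, guanine
PYRIMIDINES = {"C", "T", "U"}   # cytosine, thymine (DNA), uracil (RNA)


def base_class(base: str) -> str:
    """Classify a nucleotide letter as 'purine' or 'pyrimidine'."""
    b = base.upper()
    if b in PURINES:
        return "purine"
    if b in PYRIMIDINES:
        return "pyrimidine"
    raise ValueError(f"not a nucleotide: {base!r}")


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (5'->3'): T becomes U."""
    dna = coding_strand.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("DNA may only contain A, C, G, T")
    return dna.replace("T", "U")


print(transcribe("ACCGTG"))  # ACCGUG
```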

Page 6: Introduction of  Genome Research

5' ACCGTGTGGCAGTGCACAGGTATTTGGCCATAGACA 3'
3' TGGCACACCGTCACGTGTCCATAAACCGGTATCTGT 5'

Codon (3 nucleotides) → Amino acid

4^3 = 64 codons → 20 amino acids
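Because the two strands pair A-T and C-G and run antiparallel, the lower strand can be computed from the upper one, and reading three bases at a time gives the codons. A short sketch using the sequence from this slide; it assumes reading frame 1 (starting at the first base), and the helper names are illustrative.

```python
PAIR = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-pair each position; the result reads 3'->5' under the input."""
    return seq.upper().translate(PAIR)


def reverse_complement(seq: str) -> str:
    """The other strand written 5'->3', the usual way to report it."""
    return complement(seq)[::-1]


def codons(seq: str, frame: int = 0) -> list[str]:
    """Split into consecutive triplets from `frame`, dropping a partial tail."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


top = "ACCGTGTGGCAGTGCACAGGTATTTGGCCATAGACA"
print(complement(top))   # TGGCACACCGTCACGTGTCCATAAACCGGTATCTGT
print(codons(top)[:4])   # ['ACC', 'GTG', 'TGG', 'CAG']
print(4 ** 3)            # 64 possible codons
```

The complement matches the lower strand on the slide; its reversal is what sequence databases would list as the opposite strand.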

Page 7: Introduction of  Genome Research

Introduction
DNA sequence: A, C, G, T --- 4 letters
RNA sequence: A, C, G, U --- 4 letters
Amino acid sequence: 20 letters

The genetic code (rows: first position, 5'; columns: second position; right: third position, 3')

First   Second position                          Third
        U         C         A         G
U       Phe (F)   Ser (S)   Tyr (Y)   Cys (C)    U
        Phe (F)   Ser (S)   Tyr (Y)   Cys (C)    C
        Leu (L)   Ser (S)   Stop      Stop       A
        Leu (L)   Ser (S)   Stop      Trp (W)    G
C       Leu (L)   Pro (P)   His (H)   Arg (R)    U
        Leu (L)   Pro (P)   His (H)   Arg (R)    C
        Leu (L)   Pro (P)   Gln (Q)   Arg (R)    A
        Leu (L)   Pro (P)   Gln (Q)   Arg (R)    G
A       Ile (I)   Thr (T)   Asn (N)   Ser (S)    U
        Ile (I)   Thr (T)   Asn (N)   Ser (S)    C
        Ile (I)   Thr (T)   Lys (K)   Arg (R)    A
        Met (M)   Thr (T)   Lys (K)   Arg (R)    G
G       Val (V)   Ala (A)   Asp (D)   Gly (G)    U
        Val (V)   Ala (A)   Asp (D)   Gly (G)    C
        Val (V)   Ala (A)   Glu (E)   Gly (G)    A
        Val (V)   Ala (A)   Glu (E)   Gly (G)    G

Start codon: AUG (Met). Stop codons: UAA, UAG, UGA.
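The table above can be stored compactly: listing the amino acids with the bases in U, C, A, G order for the first, second and third positions gives a 64-character string, with `*` for stop (NCBI's translation-table listings use the same layout). A minimal sketch that rebuilds the table and checks the 64-codon to 20-amino-acid mapping; the variable names are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Amino acids for codons UUU, UUC, UUA, UUG, UCU, ... GGG (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stops = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
amino_acids = set(CODON_TABLE.values()) - {"*"}
print(len(CODON_TABLE), len(amino_acids))  # 64 20
print(stops)                               # ['UAA', 'UAG', 'UGA']
print(CODON_TABLE["AUG"])                  # M (Met, the start codon)
```

Most amino acids appear several times in the string, which is the redundancy (degeneracy) of the code visible in the table.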

Page 8: Introduction of  Genome Research

Introduction

6-frame translations ("-" marks a stop codon)
Forward strand (5'→3'): aagctgatcgatcgattttagatagagaaaaaact
5'3' Frame 1: K L I D R F - I E K K
5'3' Frame 2: S - S I D F R - R K N
5'3' Frame 3: A D R S I L D R E K T
Reverse complement (5'→3'): agttttttctctatctaaaatcgatcgatcagctt
3'5' Frame 1: S F F S I - N R S I S
3'5' Frame 2: V F S L S K I D R S A
3'5' Frame 3: F F L Y L K S I D Q L
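The six frames are the three offsets on the given strand plus the three offsets on its reverse complement. A sketch that reproduces the slide's example, assuming the standard genetic code (written with DNA letters, T in place of U) and `-` for stop as on the slide; incomplete trailing codons are skipped, and `translate`/`six_frames` are illustrative names.

```python
from itertools import product

AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def translate(seq: str, frame: int) -> str:
    """Translate one frame (0, 1 or 2); stop codons are written as '-'."""
    seq = seq.upper()
    return "".join(
        CODE[seq[i:i + 3]].replace("*", "-")
        for i in range(frame, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> dict[str, str]:
    """Translate the three forward frames and the three reverse-strand frames."""
    rc = seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]
    frames = {}
    for f in range(3):
        frames[f"5'3' Frame {f + 1}"] = translate(seq, f)
        frames[f"3'5' Frame {f + 1}"] = translate(rc, f)
    return frames


for name, protein in six_frames("aagctgatcgatcgattttagatagagaaaaaact").items():
    print(name, protein)
```

Characters other than A, C, G, T (for example N) raise a `KeyError` in this sketch; a production translator would map such codons to `X`.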

Page 9: Introduction of  Genome Research


Introduction

EST (Expressed Sequence Tags) DB

HGI (Human Gene Index) DB

Gene: Exon & Intron
cDNA Database

UniGene DB

Page 10: Introduction of  Genome Research

Introduction

Human Genome Sequencing (2/11/2001)

Draft: 61.0%
Finished: 32.5%
Total: 93.5%


Page 11: Introduction of  Genome Research

[Figure: chromosome, with a sequence gap marked]

Page 12: Introduction of  Genome Research


Introduction

Genome Database: 3×10^9 bp

HTGS (High Throughput Genomic Sequences)
Phase 0: Single- or few-pass reads of a single clone (not contigs).
Phase 1: Unfinished; contigs may be unordered and unoriented, with gaps.
Phase 2: Unfinished; ordered, oriented contigs, with or without gaps.
Phase 3: Finished, no gaps (with or without annotations).
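The phase definitions read as a small decision rule. A minimal sketch, assuming a clone's status can be summarized by three yes/no properties; the function `htgs_phase` and its parameters are hypothetical illustrations, not part of any GenBank tool.

```python
def htgs_phase(finished: bool, assembled_into_contigs: bool,
               ordered_and_oriented: bool) -> int:
    """Map a clone's sequencing status to an HTGS phase (0-3).

    Follows the phase definitions above: finished sequence with no gaps is
    phase 3; single/few-pass reads not yet assembled into contigs are phase 0;
    unfinished contigs are phase 2 if ordered and oriented, otherwise phase 1.
    """
    if finished:
        return 3
    if not assembled_into_contigs:
        return 0
    return 2 if ordered_and_oriented else 1


print(htgs_phase(False, True, False))  # 1
```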

Page 13: Introduction of  Genome Research


Size range (kb)   Contigs   Aggregate size (kb)   Percent of total
<30               44        666                   0.1%
30-100            479       32172                 4.9%
100-250           1628      260933                39.9%
250-500           421       144518                22.1%
500-1000          145       98623                 15.1%
>1000             43        116557                17.8%
Total             2760      653471                100.0%

Introduction
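The percentage column follows from the aggregate sizes: each row's share is its aggregate size divided by the total. A quick check using the figures from the table; note that the per-row aggregates add up to 653,469 kb, 2 kb less than the stated total, and the shares below are computed against the stated total.

```python
# (size range, contigs, aggregate size in kb) as listed in the table
ROWS = [
    ("<30", 44, 666),
    ("30-100", 479, 32172),
    ("100-250", 1628, 260933),
    ("250-500", 421, 144518),
    ("500-1000", 145, 98623),
    (">1000", 43, 116557),
]
TOTAL_KB = 653471  # total as stated in the table

shares = {label: round(100 * kb / TOTAL_KB, 1) for label, _, kb in ROWS}
total_contigs = sum(n for _, n, _ in ROWS)
summed_kb = sum(kb for _, _, kb in ROWS)

print(shares)
print(total_contigs)  # 2760
print(summed_kb)      # 653469
```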

Page 14: Introduction of  Genome Research

Outline

• Introduction
• Some Research Topics
• Related Links and Resources
• Bioinformatics Research Center (BRC)

Some Research Topics

Page 15: Introduction of  Genome Research

Number of human genes

Early estimate: 60,000~100,000
By Ch22: ~45,000
By EST: ~140,000
By Ch22 & HGI-5.0: ~120,000 (adjusted for Ch22 being 1.38-fold gene-rich, after an extensive cleaning and assembly process)
By Science, 2/16/2001: ~30,000

There are many more genes awaiting discovery within the sequence.

Page 16: Introduction of  Genome Research


Some Research Topics

• Alternative Splicing
• Human Diversity
• Gene Signature
• Genome Annotation

Page 17: Introduction of  Genome Research

Human Genome: 3×10^9 bp

[Figure: genomic sequence → coding region / non-coding region (gene / inter-genic region). Variations: Single Nucleotide Polymorphism (SNP), 10^6~10^7; gSNP: cSNP, rSNP, iSNP, nSNP. Functional variants (5%).]

Page 18: Introduction of  Genome Research

Gene-based SNPs

[Figure: Gene 1 and Gene 2 with promoters P1 and P2, exons and introns; SNP positions labeled nSNP, rSNP, cSNP, iSNP.]
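Classifying a gene-based SNP amounts to asking which feature of a gene model its position falls in. A minimal sketch under stated assumptions: the gene model below is hypothetical, the promoter interval stands in for P1 in the figure, promoter positions are labeled rSNP (read here as regulatory SNP), exons are treated as coding (cSNP), introns give iSNP, and anything else is "other"; the nSNP class is not modeled because the slide does not define it.

```python
from bisect import bisect_right

# Hypothetical gene model: non-overlapping half-open [start, end) intervals.
FEATURES = sorted([
    (100, 200, "promoter"),   # stands in for P1 in the figure
    (200, 350, "exon"),
    (350, 900, "intron"),
    (900, 1000, "exon"),
])
STARTS = [start for start, _, _ in FEATURES]
LABEL = {"promoter": "rSNP", "exon": "cSNP", "intron": "iSNP"}


def classify_snp(pos: int) -> str:
    """Return the SNP class for a position, or 'other' outside the model."""
    i = bisect_right(STARTS, pos) - 1  # last feature starting at or before pos
    if i >= 0:
        start, end, kind = FEATURES[i]
        if start <= pos < end:
            return LABEL[kind]
    return "other"


print([classify_snp(p) for p in (50, 150, 250, 500, 950)])
```

Binary search over sorted interval starts keeps each lookup logarithmic, which matters once millions of SNPs are checked against a genome-wide annotation.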

Page 19: Introduction of  Genome Research


Human Diversity

SNP (Single Nucleotide Polymorphism): cSNP (Coding SNP)

acccgctcgtcgct tgtgtt cggctaattgcgcgaat (codon tgt encodes C, Cys)

• Synonymous (silent): tgt → tgc, still C
• Non-synonymous, non-conservative: tgt → tgg, C → W (C is polar, W is nonpolar)
• Non-synonymous, conservative: tgt → tat, C → Y (Y is also polar)
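Whether a coding SNP is synonymous depends only on whether the changed codon still encodes the same amino acid. A sketch that reproduces the three cases on the slide using the standard genetic code (DNA letters); the function name `snp_effect` is illustrative, and the conservative/non-conservative call is left out because it needs an amino-acid property table the slide does not give.

```python
from itertools import product

AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def snp_effect(codon: str, pos: int, alt: str) -> tuple[str, str, str]:
    """Apply a single-base change at `pos` (0-2) of `codon`.

    Returns (reference amino acid, alternate amino acid, effect).
    """
    codon = codon.upper()
    mutant = codon[:pos] + alt.upper() + codon[pos + 1:]
    ref_aa, alt_aa = CODE[codon], CODE[mutant]
    effect = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, effect


print(snp_effect("tgt", 2, "c"))  # ('C', 'C', 'synonymous')
print(snp_effect("tgt", 2, "g"))  # ('C', 'W', 'non-synonymous')
print(snp_effect("tgt", 1, "a"))  # ('C', 'Y', 'non-synonymous')
```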

Page 20: Introduction of  Genome Research


Human Diversity

SNP (Single Nucleotide Polymorphism): cSNP (Coding SNP)

Purines (A/G) & Pyrimidines (C/T)
Transition: A ↔ G, C ↔ T (purine to purine, or pyrimidine to pyrimidine)
Transversion: A/G ↔ C/T (purine to pyrimidine, or the reverse)

CD-CV: the common disease, common variant hypothesis.
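The transition/transversion distinction depends only on whether both bases belong to the same class. A minimal sketch, with `substitution_type` as an illustrative name:

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError("need two different bases from A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(substitution_type("A", "G"))  # transition
print(substitution_type("A", "C"))  # transversion
```

There are 4 possible transitions but 8 possible transversions, so if substitutions were random, transversions would be about twice as common.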

Page 21: Introduction of  Genome Research


Pseudogene

Ch22: 134 pseudogenes (134/679, about 19.7%)

• Processed pseudogenes (cDNA → genebank; 82% of the 134 pseudogenes):
  a) a single block
  b) lack the characteristic intron-exon structure
• Spliced pseudogenes: segments of duplicated gene families

Page 22: Introduction of  Genome Research


Repetitive Sequence

Interspersed Repeats
• SINEs (Short Interspersed Elements): Alu, MIR, MER, LTR, PTR, ...
• LINEs (Long Interspersed Elements): LINE1, LINE2, ...

Tandem Repeats
• Minisatellites (Variable Number Tandem Repeats, VNTR): 15~100 bp
• Microsatellites (Short Tandem Repeats, STR): 2~5 bp
• α-Satellite: at the centromere
• Telomere repeats: at the telomeres
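Microsatellites can be spotted by looking for a short unit repeated back to back. A minimal sketch using a regular expression, assuming a unit length of 2 to 5 bp (the STR range above) and, as an illustrative threshold, at least three consecutive exact copies; `find_strs` and `min_copies` are hypothetical names, and unlike dedicated repeat finders this version does not tolerate mismatches or indels.

```python
import re


def find_strs(seq: str, min_copies: int = 3) -> list[tuple[int, str, int]]:
    """Return (start, unit, copies) for exact tandem repeats of 2-5 bp units.

    The lazy `{2,5}?` makes the regex try the shortest unit first, so
    'CACACACA' is reported as 4 copies of 'CA' rather than 2 of 'CACA'.
    """
    pattern = re.compile(r"([ACGT]{2,5}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


print(find_strs("GGCACACACATT"))  # [(2, 'CA', 4)]
```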

Page 23: Introduction of  Genome Research

Outline

• Introduction
• Some Research Topics
• Related Links and Resources
• Bioinformatics Research Center (BRC)

Related Links and Resources

Page 24: Introduction of  Genome Research


Related Links and Resources

• TIGR (The Institute for Genomic Research): http://www.tigr.org/
• Japan Science and Technology Corporation, Advanced Lifescience Information System (JST-ALIS): http://www-alis.tokyo.jst.go.jp/HGS/top.pl
• NCBI (National Center for Biotechnology Information): http://www.ncbi.nlm.nih.gov/
• Sanger: http://www.ensembl.org/

Page 25: Introduction of  Genome Research


Related Links and Resources

• Gene Prediction Programs: http://www.bork.embl-heidelberg.de/genepredict.html
  http://linkage.rockefeller.edu/wli/gene/programs.html
• ExPASy Translate Tool: http://expasy.nhri.org.tw/tools/dna.html
• Bioinformatics Research Center, Academia Sinica: http://www.sinica.edu.tw/~trees/bioinformatics/bioinformatics.html

Page 26: Introduction of  Genome Research

Outline

• Introduction
• Some Research Topics
• Related Links and Resources
• Bioinformatics Research Center (BRC)

Bioinformatics Research Center (BRC)

Page 27: Introduction of  Genome Research

[Figure: BRC network. Lab. 1, Lab. 2 and Lab. 3 connect to a local server behind a firewall.]

Page 28: Introduction of  Genome Research


CRASA: Complexity Reduction Algorithm for Sequence Analysis

Genome Annotation / Alternative Splicing / SNP (Single Nucleotide Polymorphism)

cDNA database

Genome Sequences: Chromosomes 1~22, X, Y

Page 29: Introduction of  Genome Research


CRASA: Complexity Reduction Algorithm for Sequence Analysis

Environment: PC clustering: 10 PCs (PIII-667), 1 server (Win2000/NT); HD: IDE with RAID support; DB2

Algorithm: Progressive processing (pyramid structure); pattern match; direct search; parallel processing

Page 30: Introduction of  Genome Research

Parallel Processing

[Figure: the server dispatches a query to processors p1, p2 and p3.]

• HD I/O bound
• Network I/O bound
• Sorting & assembling: CPU bound
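The diagram splits a query across processors, then sorts and assembles the partial results on the server. A generic scatter/gather sketch of that pattern, not the CRASA implementation (the slides do not describe its internals): the database is a hypothetical dict of named sequences, each worker does an exact-match scan of its share, and the caller merges and sorts the hits. It uses threads for simplicity; for CPU-bound scanning in CPython, a `ProcessPoolExecutor` would be the usual substitute.

```python
from concurrent.futures import ThreadPoolExecutor


def scan(chunk: list[tuple[str, str]], query: str) -> list[tuple[str, int]]:
    """Worker: report every (sequence id, position) where `query` occurs."""
    hits = []
    for seq_id, seq in chunk:
        pos = seq.find(query)
        while pos != -1:
            hits.append((seq_id, pos))
            pos = seq.find(query, pos + 1)  # +1 keeps overlapping matches
    return hits


def parallel_search(db: dict[str, str], query: str,
                    workers: int = 3) -> list[tuple[str, int]]:
    """Scatter the database over `workers`, then gather, merge and sort hits."""
    items = sorted(db.items())
    chunks = [items[i::workers] for i in range(workers)]  # round-robin split
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(scan, chunks, [query] * workers)
        return sorted(hit for part in partial for hit in part)


db = {"chr_a": "ACGTACGT", "chr_b": "TTACGTT", "chr_c": "GGGG"}
print(parallel_search(db, "ACGT"))
# [('chr_a', 0), ('chr_a', 4), ('chr_b', 2)]
```

The three stages map onto the bottlenecks listed on the slide: reading each chunk from disk, shipping chunks and results over the network, and the final sort-and-merge on the server.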

Page 31: Introduction of  Genome Research

Bioinformatics

Computer Science / Biology

??