suk-namgoong
Hannam University Bioinformatics, Lecture 9: Personal Genome Sequencing #1
Bioinformatics
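2014, 2nd Semester
Department of Life Systems Science
Hannam University
Lecture 9, 2014.10.28
Syllabus
Week     Topic
Week 1   Overview and basic theory of bioinformatics
Week 2   Chuseok (no class)
Week 3   Principles of sequence analysis I
Week 4   Principles of sequence analysis II
Week 5   Prediction of protein structure and function
Week 6   Genome sequencing and sequence assembly
Week 7   Midterm exam
Week 8   Next Generation Sequencing
Week 9   Personal genomics I
Week 10  Personal genomics II
Week 11  Transcriptomics
Week 12  Metagenomics
Week 13  Recent research trends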
Personal Genome
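- Genetic variation between individual humans
- Phenotypes that differ because of genetic variation
• Skin color • Hair color • Eye color • Appearance • Height ...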
Snyder M et al. Genes Dev. 2010;24:423-431
What kinds of genetic variation exist between individual humans?
SNP/Indel
Phased SNP
Deletion
Insertion
Inversion
ACGTTTGGATAC
TGCAAACCTATG
ACGTTTGTATAC
TGCAAACATATG
SNP (Single Nucleotide Polymorphisms)
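• A change of a single base in the DNA sequence
• Compared with the standard reference sequence, each individual carries about 3-4 million SNPs
• Some variants are very rare, while others are frequent
- Common Variant (20-40% frequency)
- Rare Variant (frequency of 1% or less)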
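As a minimal sketch of what "a change of a single base" looks like in practice, the snippet below compares two equal-length sequences position by position and reports the mismatches. The function name `find_snps` is hypothetical, and the two strings are the top strands from the SNP illustration in this lecture. Real SNP discovery works on many aligned reads rather than on two finished sequences.

```python
def find_snps(reference, sample):
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length (no indels)")
    return [
        (i + 1, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


reference = "ACGTTTGGATAC"
sample = "ACGTTTGTATAC"
print(find_snps(reference, sample))  # prints [(8, 'G', 'T')]
```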
SNPs vs. SNVs
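Both are variants found at a single base, but..
• SNP
– A variant already found in a given species (well characterized)
– Known to already exist at a certain proportion in the population
– Validated in the population
– Recorded in dbSNP (http://www.ncbi.nlm.nih.gov/snp)
• SNV
– A variant found in 'just one person' (not well characterized)
– Occurs only at very low frequency
– Not validated as present in other people in the population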
Really a matter of frequency of occurrence
http://ccsb.stanford.edu/education/Nair_NGS.pptx
Indel (Insertion/Deletion)
• Small (1 kb or less) addition or deletion of bases
• An estimated 300,000-600,000 insertions/deletions per individual
• Large Scale Structural Variation (addition or deletion of 2 kb or more)
- About 1,000 or more sites per individual
TGCAAACCTATG
TGCAAAC-TATG
TGCAAACC-TATG
TGCAAACCCTATG
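The gapped sequence pairs above can be read mechanically: a gap in the sample row is a deletion, and a gap in the reference row is an insertion. The sketch below does exactly that for two alignment rows of equal length. The function name `find_indels` and the output layout are assumptions made for illustration, not part of any tool mentioned in the lecture.

```python
def find_indels(ref_aln, sample_aln):
    """Scan two gapped alignment rows and report indel events.

    Returns a list of (kind, reference_position, bases). The position is
    1-based: the first deleted base for a deletion, or the reference base
    just before the inserted bases for an insertion.
    """
    if len(ref_aln) != len(sample_aln):
        raise ValueError("alignment rows must have the same length")
    events = []
    current = None  # [kind, position, bases] of the event being extended
    ref_pos = 0
    for ref_base, sample_base in zip(ref_aln, sample_aln):
        if ref_base != "-":
            ref_pos += 1
        if sample_base == "-" and ref_base != "-":
            kind, base = "deletion", ref_base
        elif ref_base == "-" and sample_base != "-":
            kind, base = "insertion", sample_base
        else:
            kind, base = None, None
        if kind is not None and current is not None and current[0] == kind:
            current[2] += base  # extend a multi-base event
            continue
        if current is not None:
            events.append(tuple(current))
            current = None
        if kind is not None:
            current = [kind, ref_pos, base]
    if current is not None:
        events.append(tuple(current))
    return events


print(find_indels("TGCAAACCTATG", "TGCAAAC-TATG"))    # one deleted C
print(find_indels("TGCAAACC-TATG", "TGCAAACCCTATG"))  # one inserted C
```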
Structural Variation
• Large Scale Change of DNA (1kb ~ 3Mb)
• Categories
- Microscopic structural variation
- Copy Number variation
- Chromosomal Inversion
Microscopic structural variation
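• Genetic variation large enough to be observed under a microscope
• Aneuploidy: an abnormal chromosome number, for example an extra chromosome in addition to the normal 23 pairs
• Chromosome Translocation
Down syndrome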
Copy Number Variation
Amplification or reduction of a specific region or gene within the genome
Copy Number Variations in the human genome
Surveys of human genetic variation
• The 1000 Genomes Project
– http://www.1000genomes.org/
– SNPs and structural variants
– genomes of about 2500 unidentified people from about 25 populations around the
world will be sequenced using NGS technologies
• HapMap
– http://hapmap.ncbi.nlm.nih.gov/
– identify and catalog genetic similarities and differences
• dbSNP
– http://www.ncbi.nlm.nih.gov/snp/
– Database of SNPs and multiple small-scale variations that include indels,
microsatellites, and non-polymorphic variants
• COSMIC
– http://www.sanger.ac.uk/genetics/CGP/cosmic/
– Catalog of Somatic Mutations in Cancer
• TCGA
– http://cancergenome.nih.gov/
– The Cancer Genome Atlas researchers are mapping the genetic changes in 20
selected cancers
Detecting variation between individuals
• SNP, Indel, Insertion/Deletion, Inversion...
• How can these changes be detected?
Microarray : High throughput
PCR-Sanger Sequencing : Low throughput
Next Generation Sequencing : Method of Choice nowadays
Cost of DNA sequencing and cumulative number of genomes sequenced as a function of
time.
Snyder M et al. Genes Dev. 2010;24:423-431
Determining personal genome variation by NGS
Snyder M et al. Genes Dev. 2010;24:423-431
Methods for detecting variation in a human genome sequence using DNA sequencing
technologies.
Snyder M et al. Genes Dev. 2010;24:423-431
NGS Read Mapping
Map the sequencing data (reads) obtained from NGS to the reference genome sequence
ATGAGATAGAGATAGAAAGGGAGAGAGAATAGA
Genome Sequence
Sequence Reads
We have already learned that BLAST or BLAT can do this.
However, sequencing data sets are enormous, so a method much faster than BLAST or BLAT is needed.
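One way to see why dedicated read mappers are faster is that they index the reference once and then look each read up, instead of scanning the reference for every read. The sketch below is a deliberately simplified stand-in: a hash index of k-mers with exact-match verification. It is not how Bowtie or BWA work internally (both use Burrows-Wheeler transform based indexes). The function names, the value of `k`, and the three reads are assumptions for illustration; the reference string is the one shown on this slide.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, reference, index, k):
    """Return 0-based positions where the whole read matches the reference exactly."""
    if len(read) < k:
        return []
    hits = []
    # Use the first k bases as a seed, then verify the full read at each seed hit.
    for start in index.get(read[:k], []):
        if reference[start:start + len(read)] == read:
            hits.append(start)
    return hits


reference = "ATGAGATAGAGATAGAAAGGGAGAGAGAATAGA"
k = 6
index = build_kmer_index(reference, k)
for read in ["GATAGAAAGG", "GGGAGAGAGA", "TTTTTTTTTT"]:
    print(read, map_read(read, reference, index, k))
```

The seed `GATAGA` of the first read occurs twice in the reference, but only one occurrence survives full verification, which is the seed-and-verify idea in miniature. Real mappers also tolerate mismatches and gaps, which this sketch does not.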
NGS Read Mapping Software
Earlier
Eland
SOAP
MAQ
Newer
Bowtie
BWA
SOAP2
Faster
Uses Less Memory
NGS Read Mapping
What you need
- Reference Genome Sequences (Fasta Format)
- Sequence Data (Fastq format)
Software
- BWA
• Most of the software is Unix-based
• Genome data are very large, so running the analysis on an ordinary PC is a strain
Flow
Sequence Data (FASTQ) + Genome Sequence → Alignment File (SAM format)
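On a Unix machine with BWA installed, the flow above could be sketched roughly as follows. `bwa index` and `bwa mem` are real BWA subcommands, but the file names (`reference.fa`, `reads.fastq`, `aligned.sam`) and the helper names are hypothetical, and this is only one of several ways to run BWA. The lecture itself uses the Galaxy web interface instead.

```python
import shutil
import subprocess


def bwa_commands(reference_fasta, fastq_files):
    """Build the two command lines: index the reference, then align the reads."""
    index_cmd = ["bwa", "index", reference_fasta]
    # Pass one FASTQ file for single-end data or two for paired-end data.
    align_cmd = ["bwa", "mem", reference_fasta, *fastq_files]
    return index_cmd, align_cmd


def run_mapping(reference_fasta, fastq_files, sam_path):
    """Run both steps; bwa mem writes SAM to stdout, which is redirected to sam_path."""
    if shutil.which("bwa") is None:
        raise RuntimeError("bwa was not found on PATH")
    index_cmd, align_cmd = bwa_commands(reference_fasta, fastq_files)
    subprocess.run(index_cmd, check=True)
    with open(sam_path, "w") as sam_file:
        subprocess.run(align_cmd, stdout=sam_file, check=True)


# Only print the commands here; call run_mapping(...) on a machine that has bwa.
index_cmd, align_cmd = bwa_commands("reference.fa", ["reads.fastq"])
print(" ".join(index_cmd))
print(" ".join(align_cmd) + " > aligned.sam")
```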
Galaxy
Not the mobile phone -.-
http://usegalaxy.org
Most NGS-related analyses can be run through its web interface
First Thing to do..
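Obtain the data (sequencing data, reference genome sequence) and upload them
Specify a file or an Internet location
Uploaded data and analysis results are stored in the History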
ftp://ftp.gmod.org/pub/gmod/Courses/2012/SummerSchool/Galaxy/phiX174_genome.fa
ftp://ftp.gmod.org/pub/gmod/Courses/2012/SummerSchool/Galaxy/phiX174_reads.fastqsanger
http://gmod.org/wiki/Galaxy_Tutorial_2012_Extras
https://usegalaxy.org/u/luce/h/workshopdatasets
Use example data that someone else has uploaded
FASTQ Format
Data QC
Data Trimming
Cut away the poor-quality data
Before After
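A minimal sketch of quality trimming, assuming Sanger-style FASTQ in which each quality character encodes a Phred score as its ASCII code minus 33: bases are removed from the 3' end for as long as their quality is below a cutoff. The function names, the cutoff of 20, and the example record are assumptions for illustration. Trimming tools generally use more careful rules, such as sliding windows.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence, quality_string, min_quality=20):
    """Drop trailing bases whose Phred score is below min_quality."""
    scores = phred_scores(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


# A made-up four-line FASTQ record: header, bases, separator, qualities.
record = ["@read1", "ACGTTTGGATAC", "+", "IIIIIIIIII#!"]
trimmed_seq, trimmed_qual = trim_3prime(record[1], record[3])
print(trimmed_seq)   # prints ACGTTTGGAT
print(trimmed_qual)  # prints IIIIIIIIII
```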
https://usegalaxy.org/u/galaxyproject/p/galaxy-101-ngs-variant
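Sample data: mitochondrial DNA sequencing data from a mother and her child
• Mitochondria are maternally inherited.
• Map the reads to the mitochondrial reference DNA and find the variants
The sequencing data are paired-end
Uploaded
Quality check (FastQC) on the 4 data sets
The quality is not bad, so proceed to mapping without trimming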
Map with BWA
The reference is human mtDNA
These data are paired-end
Child, first read file
Child, second read file
Start mapping
Now the mother's data
Mapping complete
Mapped position
Paired End Reads
Genome
Select only the reads that are properly mapped in this way
Choose the SAM file to filter
Yes
Keep only the reads that are mapped and properly paired
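In SAM, this filter comes down to two bits of the FLAG field (column 2): bit 0x2 means "each segment properly aligned" and bit 0x4 means "segment unmapped". A minimal sketch under that assumption is shown below. The helper names and the example records are made up and simplified. With samtools, `samtools view -f 2 -F 4` applies a comparable filter.

```python
PROPER_PAIR = 0x2  # SAM FLAG bit: read mapped in a proper pair
UNMAPPED = 0x4     # SAM FLAG bit: read itself is unmapped


def is_mapped_proper_pair(sam_line):
    """True if the record is mapped and flagged as properly paired."""
    flag = int(sam_line.split("\t")[1])
    return bool(flag & PROPER_PAIR) and not (flag & UNMAPPED)


def filter_sam(lines):
    """Yield header lines unchanged and only the records that pass the filter."""
    for line in lines:
        if line.startswith("@") or is_mapped_proper_pair(line):
            yield line


def sam_record(name, flag):
    """Build a made-up, simplified SAM record; only the FLAG differs between examples."""
    fields = [name, str(flag), "chrM", "100", "60", "5M", "=", "200", "105", "ACGTA", "IIIII"]
    return "\t".join(fields)


example = [
    "@HD\tVN:1.0\tSO:unsorted",
    sam_record("pair1", 99),   # paired, proper pair, first in pair
    sam_record("pair2", 77),   # paired, read and mate unmapped
    sam_record("pair3", 65),   # paired, but not a proper pair
    sam_record("pair1", 147),  # paired, proper pair, second in pair
]
for line in filter_sam(example):
    print(line)
```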
Convert the SAM file to BAM format (supported by many programs)
Merge the mother and child data
Variant Calling
• mPileup (SamTools)
• Genome Analysis Toolkit (GATK)
The task of finding SNPs and indels from a BAM/SAM file (alignment file)
SAM/BAM (Sequence Alignments) → Variant Caller → VCF (Variant Information)
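To make the idea of a variant caller concrete, here is a toy pileup: stack the aligned reads over the reference, count the bases seen at each position, and report positions where a non-reference base dominates. This is a simplified illustration and not the algorithm used by SAMtools mpileup or GATK, which also model base qualities, mapping qualities, and genotype likelihoods. The function name, the thresholds, and the reads are assumptions; the reads are ungapped, so this sketch cannot find indels.

```python
from collections import Counter


def call_variants(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Call single-base variants from ungapped reads given as (0-based start, sequence).

    Returns (1-based position, reference base, called base, depth) tuples.
    """
    pileup = [Counter() for _ in reference]
    for start, sequence in aligned_reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if 0 <= position < len(reference):
                pileup[position][base] += 1
    variants = []
    for position, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        base, count = counts.most_common(1)[0]
        if base != reference[position] and count / depth >= min_fraction:
            variants.append((position + 1, reference[position], base, depth))
    return variants


reference = "ACGTTTGGATAC"
aligned_reads = [
    (0, "ACGTTTGT"),
    (2, "GTTTGTAT"),
    (3, "TTTGTATAC"),
    (5, "TGTATAC"),
]
print(call_variants(reference, aligned_reads))  # prints [(8, 'G', 'T', 4)]
```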
Visualize mapping
Convert the mapped data to BAM format
Downloads
Several software packages are available for viewing alignments in SAM/BAM format
SNV Filtering
Pre-processing in the mapping phase and SNV filtering help minimize false positives
• Absent in dbSNP
• Exclude LOH events
• Retain non-synonymous
• Sufficient depth of read coverage
• SNV present in given number of reads
• High mapping and SNV quality
• SNV density in a given bp window
• SNV greater than a given bp from a predicted indel
• Strand balance/bias
• Concordance across various SNV callers
http://ccsb.stanford.edu/education/Nair_NGS.pptx
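A few of the filters listed above (absent from dbSNP, sufficient depth, SNV present in a given number of reads, high mapping quality, strand balance) can be expressed as a simple predicate. The sketch below assumes each candidate SNV is a dictionary with the hypothetical keys shown. Every threshold is an arbitrary placeholder rather than a recommended value, and the candidates are made up.

```python
def passes_filters(snv, known_dbsnp_ids, min_depth=10, min_alt_reads=3,
                   min_mapq=30, min_strand_fraction=0.1):
    """Return True if a candidate SNV survives this simplified set of filters."""
    if snv["rs_id"] in known_dbsnp_ids:
        return False  # already in dbSNP
    if snv["depth"] < min_depth:
        return False  # insufficient read coverage
    alt_reads = snv["alt_forward"] + snv["alt_reverse"]
    if alt_reads < min_alt_reads:
        return False  # variant seen in too few reads
    if snv["mapq"] < min_mapq:
        return False  # low mapping quality
    minor_strand = min(snv["alt_forward"], snv["alt_reverse"]) / alt_reads
    if minor_strand < min_strand_fraction:
        return False  # strand bias: support comes almost entirely from one strand
    return True


known_dbsnp_ids = {"rs_example_1"}  # placeholder identifier, not a real rs ID
candidates = [
    {"name": "novel_good", "rs_id": None, "depth": 40, "alt_forward": 9, "alt_reverse": 8, "mapq": 60},
    {"name": "already_known", "rs_id": "rs_example_1", "depth": 40, "alt_forward": 9, "alt_reverse": 8, "mapq": 60},
    {"name": "low_depth", "rs_id": None, "depth": 4, "alt_forward": 2, "alt_reverse": 2, "mapq": 60},
    {"name": "strand_biased", "rs_id": None, "depth": 40, "alt_forward": 15, "alt_reverse": 0, "mapq": 60},
]
kept = [snv["name"] for snv in candidates if passes_filters(snv, known_dbsnp_ids)]
print(kept)  # prints ['novel_good']
```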
Variant Annotation
• Interpretation of the variants that were actually found
• SeattleSeq
– annotation of known and novel SNPs
– includes dbSNP rs ID, gene names and accession numbers, SNP functions (e.g., missense), protein positions and amino-acid changes, conservation scores, HapMap frequencies, PolyPhen predictions, and clinical association
• Annovar
– Gene-based annotation
– Region-based annotations
– Filter-based annotation
http://snp.gs.washington.edu/SeattleSeqAnnotation/
http://www.openbioinformatics.org/annovar/
http://ccsb.stanford.edu/education/Nair_NGS.pptx
Galaxy Demo
Discovery of CNV
More difficult than interpreting SNPs/SNVs
Methods for discovering CNVs: Comparative Genomic Hybridization, WGS
Comparative Genomic Hybridization
Discovery of CNV by WGS
1. Generation of ‘Mate-Pair’ Library
Long Insert :
2. Mapping of Reads To Genome
3. Detection of Deletion
4. Detection of Insertion
5. Detection of Duplication
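The usual reasoning behind steps 3 and 4 is that the two reads of a mate pair come from a fragment of roughly known size. If they map much farther apart on the reference than that size, the sample is probably missing reference sequence between them (a deletion). If they map much closer together, the sample probably carries extra sequence (an insertion). The sketch below encodes only that rule. The function name, the insert size of 3000 bp, and the tolerance are assumptions, and duplications are not handled because they typically need read orientation or read depth as well.

```python
def classify_pair(left_pos, right_pos, expected_insert, tolerance):
    """Classify one mate pair by the distance between its mapped positions."""
    span = right_pos - left_pos
    if span > expected_insert + tolerance:
        return "deletion"    # mates map farther apart than the library insert size
    if span < expected_insert - tolerance:
        return "insertion"   # mates map closer together than the library insert size
    return "concordant"


# Hypothetical library: about 3000 bp inserts, accepted within +/- 500 bp.
pairs = [(1000, 4000), (1000, 9000), (1000, 2000)]
for left, right in pairs:
    print(left, right, classify_pair(left, right, expected_insert=3000, tolerance=500))
```

In practice a single discordant pair is weak evidence; callers look for clusters of pairs that support the same event.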
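Interpreting variation
- Interpret the biological meaning of the SNPs/SNVs, indels, insertions, deletions, inversions, etc. found in a personal genome
- Compare them with previously known variation
- For a novel variation, infer its meaning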
dbSNP
Database for short DNA variations
Example of SNP
R577X in the ACTN3 gene: the gene loses its function
http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=1815739
Detailed information about this SNP
SNPedia
Wiki for SNP information
http://www.snpedia.com/index.php/SNPedia
Search here for data on Rs1815739
http://www.snpedia.com/index.php/Rs1815739
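ACTN3: the variant introduces a stop codon in alpha-actinin-3, a muscle protein
C:C: the protein is made properly (RR)
T:T: the protein is not made properly (XX)
The T:T genotype is very rare among athletes
It makes no difference in everyday life, but athletes need this gene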
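The genotype notation above can be turned into a small lookup. The slide gives C:C as RR and T:T as XX; the heterozygous case C:T as RX is an inference from those two and is not stated on the slide. The function name is hypothetical, and this is a notation helper only, not a predictor of athletic ability.

```python
# rs1815739 alleles as used above: C encodes arginine (R), T encodes a stop codon (X).
ALLELE_TO_RESIDUE = {"C": "R", "T": "X"}


def actn3_protein_genotype(genotype):
    """Translate an rs1815739 genotype such as 'C:T' into R/X notation."""
    alleles = genotype.upper().split(":")
    if len(alleles) != 2 or any(a not in ALLELE_TO_RESIDUE for a in alleles):
        raise ValueError("expected a genotype like 'C:C', 'C:T' or 'T:T'")
    # Sort so that C:T and T:C both give RX.
    return "".join(sorted(ALLELE_TO_RESIDUE[a] for a in alleles))


for genotype in ["C:C", "C:T", "T:T"]:
    print(genotype, actn3_protein_genotype(genotype))
```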
Spit into a tube and send it to the company..
DNA is extracted, followed by SNP genotyping
The results can be checked on the website..