Session 12. GENCODE: The reference human genome annotation for The ENCODE Project (Harrow et al.). Presenter: 野健太 (Preferred Infrastructure), Twitter: @delta2323_. 2012/09/29, ENCODE study group @ Kashiwa Campus

ENCODE study group: GENCODE: The reference human genome annotation for The ENCODE Project (Harrow et al.)


  • Session 12 GENCODE: The reference human genome annotation for The ENCODE Project (Harrow et al.)

    Preferred Infrastructure

    Twitter:@delta2323_

    2012/09/29

    ENCODE study group @ Kashiwa Campus

  • GENCODE

  • GENCODE (1/1)

    Encyclopedia of genes and gene variants

    Identify all gene features in the human genome

    Annotation of protein-coding gene, lncRNA loci and pseudogenes

    Combination of computational analysis, manual annotation, and experimental validation.

    713

  • (1/3 : GENCODE 13)

  • (2/3 : Transcripts, CDS (Fig. 7))

  • (3/3 : Transcripts, CDS (Supplemental Figure 5))

  • GENCODE(1/3)

    http://www.gencodegenes.org/releases/13.html

    UI / API (Supplemental Table 11)

    UCSC Genome Browser

    Ensembl genome browser

    Ensembl BioMart (Perl/Ruby API, DB)

    1000 Genomes Project Consortium

    International Cancer Genome Consortium

  • UCSC Genome Browser (2/3) http://genome-mirror.moma.ki.au.dk/cgi-bin/hgTracks

  • UCSC Genome Browser (3/3:Result)

    http://genome-mirror.moma.ki.au.dk/cgi-bin/hgc?hgsid=192670&c=chr21&o=33032021&t=33041244&g=wgEncodeGencodeBasicV12&i=ENST00000470944.1
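The result URL above carries the looked-up position in its query string. As a minimal sketch, the parameters can be pulled apart with the Python standard library. Reading `c`, `o`, `t`, `g`, and `i` as chromosome, start, end, track, and item is an assumption based on how the values look in this one URL, not something stated on the slide; `describe_hgc_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from the slide.
url = ("http://genome-mirror.moma.ki.au.dk/cgi-bin/hgc?hgsid=192670&c=chr21"
       "&o=33032021&t=33041244&g=wgEncodeGencodeBasicV12&i=ENST00000470944.1")


def describe_hgc_url(url):
    """Split the query string and give the short keys readable names.

    The mapping of short keys to meanings is an assumption for illustration.
    """
    query = parse_qs(urlsplit(url).query)
    params = {key: values[0] for key, values in query.items()}
    return {
        "chrom": params["c"],       # assumed: chromosome
        "start": int(params["o"]),  # assumed: start coordinate
        "end": int(params["t"]),    # assumed: end coordinate
        "track": params["g"],       # assumed: track name
        "item": params["i"],        # assumed: selected transcript ID
    }


info = describe_hgc_url(url)
print(info)
print("span (bp):", info["end"] - info["start"])
```

Under that reading, the item is a GENCODE Basic V12 transcript on chr21 spanning 9223 bp.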

  • (1/2 : ) (Fig. 1)

  • (2/2 : ) (Supplementary Table 1)

    Locus: Biotype, Status, Level

    Biotype : protein-coding / lncRNA / (polymorphic) pseudogene

    Status : Known / Novel / Putative

    Level :

    Level1: manually annotated and experimentally validated

    Level2: manually annotated

    Level3: automatically annotated

    transcripts e.g. protein-coding locus

    protein coding, NMD, NSD etc.
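To make the Biotype / Status / Level attributes concrete, here is a minimal sketch that reads them from one GTF-style annotation line. The attribute keys (`gene_type`, `gene_status`, `level`) follow the GENCODE GTF layout as I recall it and should be checked against the release file actually used. The sample line, including its gene ID and coordinates, is made up for illustration, and `parse_attributes` and `summarize` are hypothetical helper names.

```python
# Level definitions as listed on the slide.
LEVELS = {
    1: "manually annotated and experimentally validated",
    2: "manually annotated",
    3: "automatically annotated",
}


def parse_attributes(field):
    """Parse the 9th GTF column ('key "value"; key value; ...') into a dict.

    Simplification: repeated keys overwrite each other and values are
    assumed not to contain ';'.
    """
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip('"')
    return attrs


def summarize(gtf_line):
    """Return feature type, biotype, status, and level for one GTF line."""
    cols = gtf_line.rstrip("\n").split("\t")
    attrs = parse_attributes(cols[8])
    level = int(attrs["level"])
    return {
        "feature": cols[2],
        "biotype": attrs["gene_type"],
        "status": attrs["gene_status"],
        "level": level,
        "level_meaning": LEVELS[level],
    }


# Made-up example record; not taken from a real GENCODE release.
sample = "\t".join([
    "chr21", "HAVANA", "gene", "100", "200", ".", "+", ".",
    'gene_id "ENSG_EXAMPLE.1"; gene_type "protein_coding"; '
    'gene_status "KNOWN"; level 2;',
])

record = summarize(sample)
print(record)
```

Filtering a file for, say, only Level1 loci then reduces to keeping the lines where `summarize(line)["level"] == 1`.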

  • (1/1)

    long non-coding RNA (lncRNA)

    5058 lincRNA, 3214 antisense loci, 378 sense intronic loci, and 930 processed transcripts loci.

    pseudogene PTENP1 [Poliseno et al. 2010], DHFRL1 [McEntee et al. 2011]

    11224 pseudogene loci (= Level2)

    7183 loci Level1

    934 loci(480(EST or cDNA)+454(HBM)) transcribed
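The counts on this slide can be checked with a few lines of arithmetic. This is only a tally of the numbers as they appear above. The sum of the four lncRNA categories is computed here and is not a total quoted from the slide, and the dictionary labels are shorthand chosen for this sketch.

```python
# lncRNA locus counts as listed on the slide.
lncrna_loci = {
    "lincRNA": 5058,
    "antisense": 3214,
    "sense intronic": 378,
    "processed transcript": 930,
}

# Transcribed pseudogene loci by evidence source, as listed on the slide.
transcribed_pseudogenes = {
    "EST or cDNA": 480,
    "HBM": 454,
}

lncrna_total = sum(lncrna_loci.values())
transcribed_total = sum(transcribed_pseudogenes.values())

print("lncRNA loci over the four listed categories:", lncrna_total)
for name, count in lncrna_loci.items():
    print(f"  {name}: {count} ({count / lncrna_total:.1%})")

# The slide states 934 transcribed loci = 480 + 454.
print("transcribed pseudogene loci:", transcribed_total)
```

The four categories add up to 9580 loci, and the two evidence sources add up to the 934 transcribed pseudogene loci given on the slide.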

  • (1/1)

    100 protein-coding loci

    lincRNA(GENCODE42798lincRNA) NMDtranscript4

    NMD microRNA [Bruno et al. 2011]

    pseudogenepseudogene10000

    de novo assembly transcript

    (Pacific Biosciences) Transcriptome annotation

  • (1/1)

    GENCODE Ver.3~Ver.7

    Quality Control System (Annotrack : Error Tracking System)

    RT-PCR-seq PhyloCSF transcript

    putative transcript

  • (Supplemental Table 5)

  • (Supplemental Figure 6)

  • Biomart Ruby Client (github)

  • pseudogene biotype (Supplemental Table 1)