31931-31941

  • 8/13/2019 31931-31941

    1/64

    SAGE TECHNOLOGY AND ITS APPLICATIONS

    PRESENTED BY
    Dr. R.A. Siddique & Dr. Anand Kumar
    Animal Biochemistry Division
    N.D.R.I., Karnal (Haryana), India, 132001
    E-mail: [email protected]

    WHAT IS SAGE?

    Serial analysis of gene expression (SAGE) is a powerful tool that allows digital analysis of overall gene expression patterns.

    Produces a snapshot of the mRNA population in the sample of interest.

    SAGE provides quantitative and comprehensive expression profiling in a given cell population.

    SAGE was invented at the Johns Hopkins University Oncology Center (USA) by Dr. Victor Velculescu in 1995.

    Gives an overview of a cell's complete gene activity.

    Addresses specific issues such as determination of normal gene structure and identification of abnormal genome changes.

    Enables precise annotation of existing genes and discovery of new genes.

    NEED FOR SAGE

    Gene expression studies examine how specific genes are transcribed at a given point in time in a given cell.

    This means examining which transcripts are present in a cell.

    SAGE enables large-scale studies of gene expression; these can be used to create 'expression profiles'.

    Allows rapid, detailed analysis of thousands of transcripts in a cell.

    Comparing different types of cells generates profiles that help to understand healthy cells and what goes wrong during disease.

    THREE PRINCIPLES UNDERLIE THE SAGE METHODOLOGY:

    A short sequence tag (10-14 bp) contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript.

    Sequence tags can be linked together to form long serial molecules that can be cloned and sequenced.

    Quantitation of the number of times a particular tag is observed provides the expression level of the corresponding transcript.
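
The first principle can be checked with simple arithmetic. The sketch below assumes only the four-letter DNA alphabet and counts how many distinct tags a given length can encode. The function name `tag_capacity` is illustrative and does not come from any SAGE software.

```python
def tag_capacity(tag_length):
    """Return how many different tags of `tag_length` bases are possible.

    Each position can hold one of four bases (A, C, G, T).
    """
    return 4 ** tag_length


# Tag lengths quoted on the slide: 10 bp and 14 bp.
for length in (10, 14):
    print(f"{length} bp: {tag_capacity(length):,} possible tags")
```

A 10 bp tag already allows 1,048,576 distinct sequences, which is why a short tag can in principle distinguish transcripts when it is always taken from a defined position.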

    PREREQUISITES:

    Extensive sequencing techniques

    Deep bioinformatic knowledge

    Powerful computer software (to assemble and analyze results from SAGE experiments)

    Limited use of this sensitive technique in academic research laboratories

    STEPS IN BRIEF

    1. Isolate the mRNA of an input sample (e.g. a tumour).

    2. Extract a small chunk of sequence from a defined position of each mRNA molecule.

    3. Link these small pieces of sequence together to form a long chain (or concatemer).

    4. Clone these chains into a vector which can be taken up by bacteria.

    5. Sequence these chains using modern high-throughput DNA sequencers.

    6. Process this data with a computer to count the small sequence tags.

    SAGE FLOWCHART

    SAGE TECHNIQUE (in detail)

    Trap RNAs with beads

    Messenger RNAs end with a long string of "As" (adenine)

    Adenine pairs specifically with another nucleotide, thymine (T), through hydrogen bonds

    A molecule that consists of 20 or so Ts acts like a chemical bait to capture RNAs

    Researchers coat microscopic magnetic beads with chemical baits, i.e. with "TTTTT" tails hanging out

    When the contents of cells are washed past the beads, the mRNA molecules are trapped

    A magnet is used to withdraw the beads and the RNAs out of the "soup"

    cDNA SYNTHESIS

    Double-stranded cDNA is synthesized from the extracted mRNA using a biotinylated oligo(dT) primer.

    The synthesized cDNA is immobilised on streptavidin beads.

    ENZYMATIC CLEAVAGE OF cDNA

    The cDNA molecule is cleaved with a restriction enzyme.

    A type II restriction enzyme is used, known as the anchoring enzyme (AE), e.g. NlaIII.

    Any enzyme recognising a 4-base site can be used.

    A 4-base site occurs on average once every 4^4 = 256 bp, so the cDNA fragments average 256 bp and carry sticky ends.

    The biotinylated 3' cDNA fragments are affinity-purified using streptavidin-coated magnetic beads.

    LIGATION OF LINKERS TO BOUND cDNA

    The captured cDNAs are divided into two halves, then ligated at their ends to linkers A and B, respectively.

    Linkers, also known as docking modules, are oligonucleotide duplexes.

    Linkers contain:
      an NlaIII 4-nucleotide cohesive overhang
      a type IIS recognition sequence
      a PCR primer sequence (primer A or B)

    The type IIS restriction enzyme acts as the tagging enzyme (TE).

    Linker/docking module layout:

    PRIMER - TE site - AE site - TAG

    CLEAVAGE WITH TAGGING ENZYME

    The tagging enzyme, usually BsmFI, cleaves the DNA 14-15 nucleotides away from its recognition site, releasing the linker-adapted SAGE tag from each cDNA.

    The ends are repaired to make blunt-ended tags using DNA polymerase (Klenow) and dNTPs.

    FORMATION OF DITAGS

    What is left is a collection of short tags, one taken from each cDNA molecule.

    The two pools of linker-bearing tags are ligated to each other to create ditags with linkers on either end.

    Ligation using T4 DNA ligase.

    PCR AMPLIFICATION OF DITAGS

    The linker-ditag-linker constructs are amplified by PCR using primers specific to the linkers.

    ISOLATION OF DITAGS

    The amplified cDNA is digested again with the AE, breaking the linkers off right where they were added in the beginning.

    This leaves a sticky end with the sequence GTAC (or CATG on the other strand) at each end of the ditag.

    CONCATAMERIZATION OF DITAGS

    Tags are combined into much longer molecules, called concatemers.

    Between each ditag is the AE site, allowing the scientist and the computer to recognize where one ends and the next begins.
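
Because the AE site punctuates the concatemer, splitting a sequenced concatemer back into tags is mostly string handling. The sketch below is a minimal illustration, not the logic of any published SAGE package. It assumes NlaIII (site CATG) as the anchoring enzyme, a fixed 10-base tag, and that the second tag of each ditag is read on the opposite strand. The function names and the example concatemer are made up for illustration, built from tag sequences that appear in the example data later in these slides.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # NlaIII recognition site (assumed anchoring enzyme)
TAG_LENGTH = 10       # tag length assumed for this sketch

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tags_from_concatemer(concatemer):
    """Split a concatemer at the anchoring-enzyme site and return its tags."""
    tags = []
    for ditag in concatemer.upper().split(ANCHOR_SITE):
        if len(ditag) < 2 * TAG_LENGTH:
            continue  # too short to hold two tags (e.g. empty edge pieces)
        # The left tag is read directly next to the upstream site.
        tags.append(ditag[:TAG_LENGTH])
        # The right tag sits next to the downstream site on the other strand.
        tags.append(reverse_complement(ditag[-TAG_LENGTH:]))
    return tags


def count_tags(concatemers):
    """Count every tag found in an iterable of concatemer sequences."""
    counts = Counter()
    for concatemer in concatemers:
        counts.update(tags_from_concatemer(concatemer))
    return counts


# Illustrative concatemer holding two ditags (not real sequencing data).
example = "CATG" + "CCCATCGTCCTAGCTGGAGG" + "CATG" + "CCCATCGTCCGAAGTCTTAG" + "CATG"
print(count_tags([example]))
```

Real pipelines also have to deal with sequencing errors, duplicate ditags and ditags of variable length, none of which this sketch attempts.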

    CLONING CONCATAMERS AND SEQUENCING

    Lots of copies are required, so the concatemers are put into bacteria, which act like living "copy machines" to create millions of copies from the original.

    These copies are then sequenced, using machines that can read the nucleotides in DNA. The result is a long list of nucleotides that has to be analyzed by computer.

    Analysis will do several things: count the tags, determine which ones come from the same RNA molecule, and figure out which ones come from known, well-studied genes and which ones are new.

    Quantitation of gene expression and data presentation

    How does SAGE work?

    1. Isolate mRNA.
    2.(a) Add biotin-labeled dT primer.
    2.(b) Synthesize ds cDNA.
    3.(a) Bind to streptavidin-coated beads.
    3.(b) Cleave with anchoring enzyme.
    3.(c) Discard loose fragments.
    4.(a) Divide into two pools and add linker sequences.
    4.(b) Ligate.
    5. Cleave with tagging enzyme.
    6. Combine pools and ligate.
    7. Amplify ditags, then cleave with anchoring enzyme.
    8. Ligate ditags.
    9. Sequence and record the tags and frequencies.

    Vast amounts of data are produced, which must be sifted and ordered for useful information to become apparent.

    SAGE reference databases:

    SAGEmap
    SAGE Genie

    http://www.ncbi.nlm.nih.gov/cgap
    What does the data look like?

    TAG         COUNT   TAG         COUNT   TAG         COUNT
    CCCATCGTCC  1286    CACTACTCAC  245     TTCACTGTGA  150
    CCTCCAGCTA  715     ACTAACACCC  229     ACGCAGGGAG  142
    CTAAGACTTC  559     AGCCCTACAA  222     TGCTCCTACC  140
    GCCCAGGTCA  519     ACTTTTTCAA  217     CAAACCATCC  140
    CACCTAATTG  469     GCCGGGTGGG  207     CCCCCTGGAT  136
    CCTGTAATCC  448     GACATCAAGT  198     ATTGGAGTGC  136
    TTCATACACC  400     ATCGTGGCGG  193     GCAGGGCCTC  128
    ACATTGGGTG  377     GACCCAAGAT  190     CCGCTGCACT  127
    GTGAAACCCC  359     GTGAAACCCT  188     GGAAAACAGA  119
    CCACTGCACT  359     CTGGCCCTCG  186     TCACCGGTCA  118
    TGATTTCACT  358     GCTTTATTTG  185     GTGCACTGAG  118
    ACCCTTGGCC  344     CTAGCCTCAC  172     CCTCAGGATA  114
    ATTTGAGAAG  320     GCGAAACCCT  167     CTCATAAGGA  113
    GTGACCACGG  294     AAAACATTCT  161     ATCATGGGGA  110
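
Raw counts like these are only comparable between libraries once they are scaled by the total number of tags sequenced. The sketch below shows one common normalisation, tags per million, as a minimal illustration. The three counts are taken from the table above, but the library size is a hypothetical value chosen for the example, since the slides do not give the real total.

```python
def tags_per_million(counts, library_size):
    """Scale raw tag counts to counts per million sequenced tags."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return {tag: count * 1_000_000 / library_size for tag, count in counts.items()}


# Counts copied from the table; the library size is HYPOTHETICAL.
raw_counts = {"CCCATCGTCC": 1286, "CCTCCAGCTA": 715, "CTAAGACTTC": 559}
HYPOTHETICAL_LIBRARY_SIZE = 50_000

for tag, tpm in tags_per_million(raw_counts, HYPOTHETICAL_LIBRARY_SIZE).items():
    print(f"{tag}\t{tpm:.0f} tags per million")
```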

    FROM TAGS TO GENES

    Collect sequence records from GenBank

    Assign sequence orientation (by finding poly-A tail or poly-A signal, or from annotations)

    Extract the 10 bases adjacent to the 3'-most CATG

    Assign a UniGene identifier to each sequence with a SAGE tag

    Record (for each tag-gene pair):
      # sequences with this tag
      # sequences in gene cluster with this tag

    Maps available at http://www.ncbi.nlm.nih.gov/SAGE
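
The tag-extraction step can be sketched in a few lines. This is an illustrative sketch only: it assumes the transcript is already oriented 5' to 3', that the anchoring site is CATG and that the tag is 10 bases long. The function name `extract_sage_tag` and the example transcript are hypothetical.

```python
def extract_sage_tag(transcript, anchor="CATG", tag_length=10):
    """Return the tag adjacent to the 3'-most anchor site, or None.

    `transcript` is assumed to be given 5' to 3' (sense orientation).
    """
    seq = transcript.upper()
    site = seq.rfind(anchor)  # position of the 3'-most CATG
    if site == -1:
        return None  # no anchoring site, so no tag can be produced
    start = site + len(anchor)
    tag = seq[start:start + tag_length]
    # Discard tags truncated by the end of the sequence.
    return tag if len(tag) == tag_length else None


# Made-up transcript with two CATG sites and a poly-A tail.
example = "GGGCATGAAACCCTTTGGGCATGCCCATCGTCCAAAAAAAA"
print(extract_sage_tag(example))
```

Transcripts without a CATG site return `None`, which reflects the point made under the limitations that some mRNA species are lost depending on the anchoring enzyme.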

    DIFFERENTIAL GENE EXPRESSION BY SAGE

    Identification of differentially expressed genes in samples from different physiological or pathological conditions.

    Application of many statistical methods:

    Poisson approximation

    Bayesian method

    Chi-square test
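
As an illustration of the last method, the sketch below runs a Pearson chi-square test (1 degree of freedom, no continuity correction) on one tag counted in two libraries. It is a minimal sketch using only the Python standard library, not the procedure of any specific SAGE package. The function name and the counts in the example are hypothetical.

```python
import math


def chi_square_2x2(count_a, total_a, count_b, total_b):
    """Pearson chi-square test for one tag observed in two SAGE libraries.

    Returns (statistic, p_value) for the 2x2 table
    [[count_a, total_a - count_a], [count_b, total_b - count_b]].
    """
    if count_a + count_b == 0:
        raise ValueError("the tag must be observed at least once")
    observed = [
        [count_a, total_a - count_a],
        [count_b, total_b - count_b],
    ]
    grand_total = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [count_a + count_b, grand_total - count_a - count_b]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# HYPOTHETICAL counts: 50 of 10,000 tags in one library, 10 of 10,000 in the other.
stat, p = chi_square_2x2(50, 10_000, 10, 10_000)
print(f"chi-square = {stat:.2f}, p = {p:.2g}")
```

For tags with very low counts the chi-square approximation becomes unreliable, which is one reason alternative tests are listed alongside it.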

    SAGE software searches GenBank for matches to each tag.

    This allows assignment to 3 categories of tags:

    mRNAs derived from known genes

    anonymous mRNAs, also known as expressed sequence tags (ESTs)

    mRNAs derived from currently unidentified genes

    SAGE VS MICROARRAY

    SAGE: an open system which detects both known and unknown transcripts and genes.

    COMPARISON

    SAGE
      Detects the 3' region of the transcript. The restriction site is the determining factor.
      Collects sequence information and copy number.
      Subject to sequencing error and quantitation bias.

    MICROARRAY
      Targets various regions of the transcript. Base composition determines the specificity of hybridization.
      Collects fluorescent signals and signal intensity.
      Subject to labeling bias and noise signals.

    Contd.

    Features                      SAGE                                    Microarray
    Detects unknown transcripts   Yes                                     No
    Quantification                Absolute measure                        Relative measure
    Sensitivity                   High                                    Moderate
    Specificity                   Moderate                                High
    Reproducibility               Good for higher-abundance transcripts   Good for data from intra-platform comparison
    Direct cost                   5-10x higher than arrays                5-10x lower than SAGE

    RECENT SAGE APPLICATIONS

    Analysis of yeast transcriptome
    Gene expression profiles in normal and cancer cells
    Insights into p53-mediated apoptosis
    Identification and classification of p53-regulated genes
    Analysis of human transcriptomes
    Serial microanalysis of renal transcriptomes
    Genes expressed in human tumor endothelium
    Analysis of colorectal metastases (PRL-3)
    Characterization of gene expression in colorectal adenomas and cancer
    Using the transcriptome to analyze the genome (Long SAGE)

    LIMITATIONS

    Does not measure the actual expression level of a gene.

    The average size of a tag produced during SAGE analysis is ten bases, and this makes it difficult to assign a tag to a specific transcript with accuracy.

    Two different genes could have the same tag, and the same gene that is alternatively spliced could have different tags at the 3' ends.

    Assigning each tag to an mRNA transcript could be made even more difficult and ambiguous if sequencing errors are also introduced in the process.
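
The ambiguity described here is easy to detect once tags have been mapped to genes. The sketch below is a minimal illustration that groups tag-gene pairs and reports every tag shared by more than one gene. The function name, the gene names and the pairs themselves are hypothetical placeholders, not entries from SAGEmap or any other database.

```python
from collections import defaultdict


def find_ambiguous_tags(tag_gene_pairs):
    """Return {tag: sorted gene list} for tags that map to more than one gene."""
    genes_by_tag = defaultdict(set)
    for tag, gene in tag_gene_pairs:
        genes_by_tag[tag].add(gene)
    return {
        tag: sorted(genes)
        for tag, genes in genes_by_tag.items()
        if len(genes) > 1
    }


# HYPOTHETICAL tag-gene assignments for illustration only.
pairs = [
    ("CCCATCGTCC", "geneA"),
    ("CCTCCAGCTA", "geneB"),
    ("CCCATCGTCC", "geneC"),
]
print(find_ambiguous_tags(pairs))
```

Longer tags reduce the number of such collisions, which is the motivation for the later variants of the method.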

    Quantitation bias:
      contamination by large quantities of linker-dimer molecules
      low efficiency of blunt-end ligation
      amplification bias

    Depending upon the anchoring enzyme and tagging enzyme used, some fraction of mRNA species will be lost.

    Advances over SAGE

    Generation of longer 3' cDNA from SAGE tags for gene identification (GLGI)

    Long SAGE

    Cap Analysis of Gene Expression (CAGE)

    Gene Identification Signature (GIS)

    SuperSAGE

    Digital karyotyping

    Paired-end ditag

    Long SAGE

    Increased specificity of SAGE tags for transcript identification and SAGE tag mapping.

    Collects tags of 21 bp.

    Uses a different type IIS restriction enzyme, MmeI.

    Adapts the SAGE principle to genomic DNA.

    Allows localisation of transcription initiation sites (TIS) and polyadenylation sites (PAS).

    CAGE (Cap Analysis of Gene Expression)

    Aims to identify TIS and promoters.

    Collects 21 bp from the 5' ends of cap-purified cDNA.

    Used in mouse and human transcriptome studies.

    The method essentially uses full-length cDNAs, to the 5' ends of which linkers are attached.

    This is followed by the cleavage of the first 20 base pairs by class IIS restriction enzymes, PCR, concatamerization, and cloning of the CAGE tags.

    [Figure: CAGE flowchart. Recoverable step labels: reverse transcription; full strand DNA synthesis; ssDNA release; ssDNA capture; second strand synthesis; MmeI digestion of dsDNA; ligation to second linker; PCR amplification; concatenation; cloning; sequencing. Other labels: biotin, MmeI, XmaJI, tag1, tag2.]

    Micro SAGE

    Requires 500-5000 fold less starting input RNA.

    Simplifies the procedure by performing all steps in one tube.

    Allows characterization of expression profiles in tissue biopsies, tumor metastases or other cases where tissue is scarce.

    Allows generation of region-specific expression profiles of complex heterogeneous tissues.

    A limited number of additional PCR cycles are performed to generate sufficient ditag.

    An expression profile can be obtained from as little as 1-5 ng of mRNA.

    Comparison between the two: SAGE vs. MicroSAGE

    Amount of input material
      SAGE: 2.5-5 µg RNA
      MicroSAGE: 1-5 ng of mRNA

    Capture of cDNA
      SAGE: streptavidin-coated magnetic beads
      MicroSAGE: streptavidin-coated PCR tube

    Multiple tube vs. single tube reaction
      SAGE: subsequent reactions in multiple tubes; multiple PCI extraction and ethanol precipitation steps
      MicroSAGE: single tube reaction; easy change of buffers; no PCI extraction or ethanol precipitation step; fewer manipulations

    PCR
      SAGE: 25-28 cycles
      MicroSAGE: 28 cycles followed by re-PCR on excised ditag (8-15 cycles)

    SuperSAGE

    Increases the specificity of SAGE tags and allows the use of tags as microarray probes.

    Uses the type III restriction enzyme EcoP15I to release tags.

    Collects 26 bp tags.

    Has been used in plant SAGE studies.

    Suited to studying gene expression where sequence information is not available.

    Flowchart of superSAGE

    Gene Identification Signature (GIS)

    Identifies gene boundaries.

    Collects 20 bp LongSAGE tags from the 3' and 5' ends of the transcript.

    Applied to human and mouse transcription studies.

    DIGITAL KARYOTYPING

    Analyses gene structure.

    Identifies amplifications and deletions in several cancers.

    PAIRED END DITAG

    Identifies protein binding sites in the genome.

    Applied to identify p53 binding sites in the human genome.

    1. SAGE: A LOOKING GLASS FOR CANCER

    Deciphering pathways involved in tumorigenesis and identifying novel diagnostic tools, prognostic markers, and potential therapeutic targets.

    SAGE is one of the techniques used in the National Cancer Institute-funded Cancer Genome Anatomy Project (CGAP).

    A database with archived SAGE tag counts and on-line query tools was created; it is the largest source of public SAGE data.

    More than 3 million tags from 88 different libraries have been deposited on the National Center for Biotechnology Information/CGAP SAGEmap web site (http://www.ncbi.nlm.nih.gov/SAGE/).

    Several interesting patterns have emerged:

    Cancerous and normal cells derived from the same tissue type are very similar.

    Tumors of the same tissue of origin but of different histological type or grade have distinct gene expression patterns.

    Cancer cells usually increase the expression of genes associated with proliferation and survival and decrease the expression of genes involved in differentiation.

    SAGE studies have been performed in patients with colon, pancreatic, lung, bladder, ovarian, and breast cancers.

    SAGE experiments have been validated in multiple tumor and normal tissue pairs using a variety of approaches, including Northern blot analysis, real-time PCR, mRNA in situ hybridization, and immunohistochemistry.

    Identification of an ideal tumor marker, e.g. matrix metalloprotease 1 is overexpressed in ovarian cancer.

    p53: TUMOR SUPPRESSOR GENE

    p53 is thought to play a role in the regulation of cell cycle checkpoints, apoptosis, genomic stability, and angiogenesis.

    Sequence-specific transactivation is essential for p53-mediated tumor suppression.

    The analysis of transcriptomes after p53 expression has determined that p53 exerts its diverse cellular functions by influencing the expression of a large group of genes.

    Previously unidentified p53-regulated genes have been identified by SAGE analysis.

    Variability exists with regard to the extent, timing, and p53 dependence of the expression of these genes.

    2. IMMUNOLOGICAL STUDIES

    Only a few SAGE analyses have been applied to the study of immunological phenomena.

    SAGE analyses were conducted for human monocytes and their differentiated descendants, macrophages and dendritic cells (DC).

    The DC cDNA library represented more than 17,000 different genes. The differentially expressed genes were those encoding proteins related to cell motility and structure.

    SAGE has been applied to B cell lymphomas to analyze genes involved in BCR-mediated apoptosis: polyamine regulation is involved in apoptosis during B cell clonal deletion.

    Contd.

    LongSAGE has been used to identify genes of T cells in systemic lupus erythematosus (SLE) that determine commitment to the disease.

    Findings indicate that immature CD4+ T lymphocytes may be responsible for the pathogenesis of SLE.

    SAGE has been used to analyze the expression profiles of Th1 and Th2 cells, and has identified numerous new genes whose expression is selective for either population.

    This contributes to understanding the molecular basis of Th1/Th2-dominated diseases and to the diagnosis of these diseases.

    3. YEAST TRANSCRIPTOME

    Yeast is widely used to clarify the biochemical and physiological parameters underlying eukaryotic cellular functions.

    Yeast was chosen as a model organism to evaluate the power of SAGE technology.

    The most extensive SAGE profile was made for yeast.

    Analysis of the yeast transcriptome affords a unique view of the RNA components defining cellular life.

    4. ANALYSIS OF TISSUE TRANSCRIPTOMES

    Used to analyze the transcriptomes of renal, cervical tissues etc.

    Establishing a baseline of gene expression in normal tissue is key for identifying changes in cancer.

    Specific gene expression profiles were obtained, and known markers (e.g., uromodulin in the thick ascending limb of Henle's loop and aquaporin-2 in the collecting duct) were found.

    REFERENCES

    Maillard, Jean-Charles, et al. Efficiency and limits of the Serial Analysis of Gene Expression. Veterinary Immunol. and Immunopathol. 2005, 108: 59-69.

    Man, M.Z., et al. POWER-SAGE: comparing statistical tests for SAGE experiments. Bioinformatics 2000, 16: 953-959.

    Polyak, K. and Riggins, G.J. Gene discovery using the serial analysis of gene expression technique: Implications for cancer research. J. of Clin. Oncol. 2001, 19(11): 2948-2958.

    Tuteja and Tuteja. Serial Analysis of Gene Expression: Applications in Human Studies. J. of Biomed. and Biotechnol. 2004, 2: 113-120.

    Tuteja and Tuteja. Serial analysis of gene expression: application in cancer research. Med. Sci. Monit. 2004, 10(6): 132-140.

    Velculescu, V.E., et al. Serial analysis of gene expression. Science 1995, 270: 484-487.

    Wing, San Ming. Understanding SAGE data. Trends in Genetics 2006, 23: 1-12.

    Yamamoto, M., et al. Use of serial analysis of gene expression (SAGE) technology. J. of Immunol. Meth. 2001, 250: 45-66.
