2
Short History Of Compugen
• 1993: Founded
• 1994: First Bioccelerator sold (Merck)
• 1997: LEADS project initiated
• 1998: Pfizer collaboration
• 1999: USPTO agreement; LabOnWeb launched
• 2000: Launch of Z3; IPO
• 2001: Gencarta and OligoLibraries launched; Novartis collaboration
3
Unique R&D Team
• Substantial
  – 120 professionals
  – 32 PhD/MD, 37 M.Sc.
• Multidisciplinary
  – Algorithm development, molecular biology, software engineering, statistics, physics, chemistry
• Integrated
  – Synergy between disciplines and feedback
4
Gene analysis using mathematics
• Drug discovery and Bioinformatics
• Principles of sequence alignment
• The EST opportunity and the Transcriptome
• Applications (Gencarta and DNA chips)
7
Some definitions
• ‘Drug’ – A protein, lipid, antibody, or small organic molecule with a proven effect and an approved safety level.
• ‘Lead’ – A molecule in development which may one day become a drug.
• ‘Target’ – A protein (in most cases) whose activity a drug lead would affect in order to produce a desirable effect on the body.
• ‘Validated target’ – A target which has a proven, demonstrated effect on a disease or condition.
8
30,000 GENES?
• Fewer genes than initially thought?
• Some complexity due to alternative splicing
• Gene prediction is problematic
• Complex genes (interleaved, nested,...) are especially difficult to identify
• Both HGP and Celera tried to minimize false positives
• Conclusion: more genes may be found
Wright et al., Genome Biology 2001 2(7):
There are 65,000 – 75,000 genes
9
ONE GENE ONE PROTEIN???
Old dogma: Gene → mRNA → Protein
Current understanding: Gene → several mRNAs (including edited mRNA) → several proteins (including modified protein)
10
Gene identification using sequence comparison
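Alignment score matrix for ACGCTG (columns) against CATGT (rows):

          A   C   G   C   T   G
      0   1   2   3   4   5   6
  0   0  -1  -2  -3  -4  -5  -6
C 1  -1  -1   1   0  -1  -2  -3
A 2  -2   1   0   0  -1  -2  -3
T 3  -3   0   0  -1  -1   1   0
G 4  -4  -1  -1   2   1   0   3
T 5  -5  -2  -2   1   1   3   2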
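The numbers in this matrix are consistent with a standard global alignment (Needleman-Wunsch) fill. The slide does not state the scoring scheme; the values match +2 for a match, -1 for a mismatch and -1 per gap position, so those parameters are an inference, and the function name is illustrative. A minimal sketch under those assumptions:

```python
def global_alignment_matrix(s, t, match=2, mismatch=-1, gap=-1):
    """Fill a Needleman-Wunsch score matrix; columns follow s, rows follow t.

    The default scores are inferred from the slide's numbers, not stated on it.
    """
    rows, cols = len(t) + 1, len(s) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if t[i - 1] == s[j - 1] else mismatch)
            from_above = score[i - 1][j] + gap   # gap in s
            from_left = score[i][j - 1] + gap    # gap in t
            score[i][j] = max(diagonal, from_above, from_left)
    return score


matrix = global_alignment_matrix("ACGCTG", "CATGT")
for label, row in zip(" CATGT", matrix):
    print(label, " ".join(f"{value:3d}" for value in row))
```

The bottom-right cell is the score of the best global alignment of the two sequences. Tracing back from that cell through the choices that produced each maximum recovers the alignment itself.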
11
Similar sequences, common ancestor...
... common ancestor, similar function
Understand genes = know your targets
13
Proteins ‘see’ deeper
Unrelated DNA sequences?
Highly related proteins!
TTACTCCGTCATGATGGGGTG
CTGATAAGGAAAGAAGGCTAT
LeuLeuArgHisAspGlyVal
LeuIleArgLysGluGlyTyr
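The point of this slide can be checked by translating both DNA strings with the standard genetic code and comparing identity at each level. The sketch below is illustrative only: the helper names are made up, and it counts exact matches, so conservative substitutions such as Leu/Ile or Asp/Glu, which make the proteins look related, are not credited.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence in frame 0 into one-letter amino acid codes."""
    dna = dna.upper().replace("U", "T")  # tolerate RNA-style input
    usable = len(dna) - len(dna) % 3     # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identical_positions(a, b):
    """Count positions at which two equal-length sequences carry the same symbol."""
    return sum(x == y for x, y in zip(a, b))


dna_1 = "TTACTCCGTCATGATGGGGTG"
dna_2 = "CTGATAAGGAAAGAAGGCTAT"
protein_1, protein_2 = translate(dna_1), translate(dna_2)

print(protein_1, protein_2)
print("DNA identity:", identical_positions(dna_1, dna_2), "of", len(dna_1))
print("Protein identity:", identical_positions(protein_1, protein_2), "of", len(protein_1))
```

A substitution matrix such as BLOSUM62 would be the usual way to score the conservative replacements that plain identity misses.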
14
How to align proteins?
MARQGEFPSILK
M-RHGEFP-LLKWC
‘Good’
‘Bad’
Running a good alignment algorithm against databases of 2001 size requires supercomputers
15
Another direction: find genes by sequence
ACGATCGAGCATGCATCATCAGCATCTAGCGATCAGCAGGCATCGAGCAGCTAGCATGCATG
TGCTAGCACGTACGTAGTAGTCGTAGATCGCTAGTCGTCCGTAGCTCGTCGATCGTACGTCAC
- Gene regions have a different nucleotide composition than non-coding regions.
- Introns and exons differ in sequence.
- Splice junctions are clearly detectable.
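Two of the signals listed here are easy to compute directly: local nucleotide composition and the positions of the canonical intron-boundary dinucleotides (GT at the start of an intron, AG at its end). The sketch below is a toy illustration with made-up function names and arbitrary window sizes. Practical gene finders combine such signals in statistical models, and a bare GT or AG occurs far too often by chance to be evidence on its own.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def sliding_gc(seq, window=20, step=10):
    """GC fraction of each window, returned as (start, fraction) pairs."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def dinucleotide_positions(seq, motif):
    """0-based start positions of a two-base motif such as 'GT' or 'AG'."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == motif]


# First sequence shown on the slide.
sequence = "ACGATCGAGCATGCATCATCAGCATCTAGCGATCAGCAGGCATCGAGCAGCTAGCATGCATG"
for start, fraction in sliding_gc(sequence):
    print(f"window at {start:2d}: GC = {fraction:.2f}")
print("candidate donor sites (GT):", dinucleotide_positions(sequence, "GT"))
print("candidate acceptor sites (AG):", dinucleotide_positions(sequence, "AG"))
```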
16
One step ahead: the story of the ESTs
[Figure: genomic DNA (exon 1, exon 2, exon 3) → mRNA → cDNA → cDNA clone, from which two ESTs are sequenced]
Public domain ESTs (Expressed Sequence Tags): > 5,000,000
Craig Venter
17
The ESTs: Rough Diamonds?
• Short, inaccurate, badly annotated
• Abundant with repeats, alternative splicing
• Too many…
• The shredder effect
18
USING ESTS TO GET THE TRANSCRIPTOME
Input: GenBank - a pool of ESTs and mRNAs
Process 1 - Clustering
Process 2 - Assembly
Output: The transcriptome
[Figure: the pooled sequences grouped into Cluster 1, Cluster 2, Cluster 3, Cluster 4]
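The clustering step can be pictured with a deliberately simple rule: two reads belong to the same cluster if they share at least one exact k-mer, and clusters are merged transitively. This is a generic sketch, not the method LEADS uses (the slides do not describe it). The function name, the value of k and the toy reads are all made up, and a real pipeline must also cope with sequencing errors, repeats and chimeric clones.

```python
from collections import defaultdict


def cluster_by_shared_kmers(reads, k=8):
    """Group read indices into clusters linked by at least one shared exact k-mer."""
    parent = list(range(len(reads)))  # union-find forest over read indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_read_with = {}  # k-mer -> index of the first read that contained it
    for index, read in enumerate(reads):
        for pos in range(len(read) - k + 1):
            kmer = read[pos:pos + k]
            if kmer in first_read_with:
                parent[find(index)] = find(first_read_with[kmer])
            else:
                first_read_with[kmer] = index

    clusters = defaultdict(list)
    for index in range(len(reads)):
        clusters[find(index)].append(index)
    return sorted(clusters.values())


# Two made-up "transcripts", each sampled by two overlapping reads.
transcript_a = "ACCACAACCCAACACCAAACCA"
transcript_b = "GTTGTGGTTTGGTGTTGGGTTG"
reads = [transcript_a[:14], transcript_a[6:], transcript_b[:14], transcript_b[6:]]
clusters = cluster_by_shared_kmers(reads)
print(clusters)
```

Assembly would then merge the reads inside each cluster into one or more consensus transcripts, which is where splice variants of the same gene become visible.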
19
The Transcriptome - Definition
“The mRNA collection content, present at any given moment in a cell or a tissue,
and its behavior over time and cell states”
20
Introducing the Transcriptome
• The Genome:
  – Index to the range of possible proteins
  – Useful as map and for inter-organisms analysis
• The Proteome:
  – Describes what actually happens in the cell
  – Complex tools, partial results
• The Transcriptome:
  – “Golden path”: Proteome information in DNA technology.
21
Transcriptome applications
• Discovery of new proteins
  – Which are present in specific tissues
  – Which have specific cell locations
  – Which respond to specific cell states
• Discovery of new variants
  – Of important genes
  – Which work to increase/decrease the activity of the ‘native’ protein.
22
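Example: Alternative Splicing
One Gene - Multiple mRNAs
[Figure: a pre-mRNA with exons 1 to 6 undergoes alternative splicing of exons 3 and 4, giving various mature mRNA transcripts: exons 1, 2, 4, 5, 6; exons 1, 2, 3, 5, 6; and exons 1, 2, 3, 4, 5, 6. Transcript labels: (tissue A), (tissue B), (other tissues)]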
24
Extreme example of alternative splicing
[Figure: Genomic → PSA RNA → PSA precursor (signal peptide, stop codon) → Mature PSA; by alternative splicing, Genomic → Modified mRNA → LM precursor (signal peptide, stop codon) → Mature LM protein]
Though coded by the same gene, mature proteins PSA and LM have not one residue in common!
25
Is This The Only Example?
[Figure: PSA genomic (exon 1, exon 2, exon 3, exon 4, exon 5) with the LM variant; KLK-2 genomic (exon 1, exon 2, exon 3, exon 4, exon 5) with the KLM variant; * = stop codon]
26
Validation: Northern Blot
• Like PSA, LM expression is restricted to prostate tissue
• Multiple bands may reflect conserved regions or alternative splicing
29
LEADS Antisense Prediction
• When analyzing EST data for Antisense:
– Use original EST orientation annotation
– Check splicing signals on both strands
– Examine library description for enzymes used
– Mark PolyA signals and PolyA tails (compare to genomic PolyA)
– Take into account NotI sites
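The "check splicing signals on both strands" item can be illustrated with the canonical GT...AG intron rule: an intron transcribed from the forward strand starts with GT and ends with AG, while one transcribed from the reverse strand shows that pattern only after reverse-complementing. The sketch below covers that single check. The function names and the decision rule are hypothetical, and the other items on the list (library enzymes, polyA signals and tails, NotI sites) are not modelled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def intron_strand(genome, start, end):
    """Strand implied by canonical GT...AG signals for the gap genome[start:end].

    Returns '+', '-', or None when neither strand shows the signals.
    """
    intron = genome[start:end].upper()
    if intron.startswith("GT") and intron.endswith("AG"):
        return "+"
    flipped = reverse_complement(intron)
    if flipped.startswith("GT") and flipped.endswith("AG"):
        return "-"
    return None


def is_antisense_candidate(annotated_strand, splice_strand):
    """Flag an EST whose annotated orientation contradicts its splice signals."""
    return splice_strand is not None and annotated_strand != splice_strand


# Toy genomic fragments with the alignment gap at positions 2..10.
print(intron_strand("AAGTCCCCAGTT", 2, 10))   # gap reads GT...AG on the forward strand
print(intron_strand("AACTGGGGACTT", 2, 10))   # gap reads CT...AC, i.e. GT...AG on the reverse strand
print(is_antisense_candidate("+", intron_strand("AACTGGGGACTT", 2, 10)))
```

An orientation conflict like this is only a hint. Mis-annotated libraries produce the same pattern, which is why the slide lists several independent checks.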
32
Using Compugen’s Transcriptome Technology
• Large-scale collaborations: Pfizer, Novartis
• Co-development of molecules: TNF, chemokine receptors, kinases, GPCRs
• Academic research: UCSF, NYU, TAU
• Database products
• DNA chip design
• Mass-spec analysis
• Gene Ontology
34
How many ‘genes’ are there really?
• Raw data:
  – 3,770,969 human sequences
  – 2,061,357 mouse sequences
  – 297,568 rat sequences
• Non-singleton ‘clusters’: 120,372 (H), 63,043 (M), 33,396 (R)
• % with splice variants: 26% (H), 32% (M), 23% (R)
• Homology (to SwissProt+Trembl, InterPro, other GC proteins): 20% (H+M), 27% (R)
• Total unique proteins: 236,797 (H), 106,119 (M), 32,352 (R)
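A few derived figures help put these counts in proportion. The arithmetic below uses only the numbers on the slide. Reading "proteins per cluster" as "proteins per gene" is an assumption, since the slide does not say whether the protein totals come from the non-singleton clusters alone.

```python
stats = {
    # species: (input sequences, non-singleton clusters, % with splice variants, unique proteins)
    "human": (3_770_969, 120_372, 26, 236_797),
    "mouse": (2_061_357, 63_043, 32, 106_119),
    "rat": (297_568, 33_396, 23, 32_352),
}

summary = {}
for species, (sequences, clusters, splice_pct, proteins) in stats.items():
    summary[species] = {
        "sequences_per_cluster": sequences / clusters,
        "clusters_with_variants": round(clusters * splice_pct / 100),
        "proteins_per_cluster": proteins / clusters,
    }

for species, values in summary.items():
    print(
        f"{species:5s}  "
        f"{values['sequences_per_cluster']:6.1f} sequences/cluster  "
        f"{values['clusters_with_variants']:6d} clusters with variants  "
        f"{values['proteins_per_cluster']:.2f} proteins/cluster"
    )
```

On these figures the human data give roughly two predicted proteins per cluster, while the rat data, with far fewer input sequences, give slightly under one.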
35
The Novartis Agreement
• Signed August 2001
• Novartis non-exclusively licensed the LEADS platform and related software, and plans to use it for:
  – In-silico drug target identification and prioritization
  – Genome wide chip design
• Agreement was signed after a detailed pilot study run in November 2000
  – Discovered novel genes and splice variants using Incyte and Celera data
• Genes were subsequently verified in Novartis laboratory.
36
GENCARTA
• Result of LEADS applied to:
  – Public genome information
  – Published mRNA
  – ESTs
• In-house designed interface, Oracle-based infrastructure.
• Installed: Kyowa-Hakko, Avalon Pharma, Weizmann Institute, YU
• Version 2.2 out in October 2001.
37
Let’s go for the real thing…
• Gencarta Demonstration
• OligoLibrary Demonstration
38
Conclusion: Advantages of the Transcriptome
• Identify new drug targets
• Understand splice variant behavior
• Isolate “natural” drugs
• Annotate Proteomics experiments
• Design better DNA chips
Solve the real bottlenecks in drug discovery and development