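Research area: "Creation and advancement of next-generation application technologies for promoting the use of big data in various fields toward scientific discovery and the solution of social issues"
Establishment of a knowledge-creation platform based on big data from drug discovery through manufacturing
An enormous amount of accumulated measurement data exists across the process from drug discovery through manufacturing. We will build a mechanism for sharing the knowledge and data of the drug discovery site and the manufacturing site, which until now have been treated separately as different fields, and pursue research aimed at making the whole system of pharmaceutical development more efficient and better optimized from a bird's-eye view of discovery and manufacturing. Specifically, by using these data we aim to extract drug discovery guidelines from large volumes of protein-versus-compound information, to create a large-scale virtual library and use it to discover new drug targets and obtain their synthesis and manufacturing methods, and to extract knowledge for stable operation, advance risk management, and quality stabilization of manufacturing plants. The goal is to establish a knowledge-creation platform that spans the stages from drug discovery through manufacturing.
Principal Investigator: Prof. Kimito Funatsu, Graduate School of Engineering, The University of Tokyo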
Problems of big data analysis in chemistry and systematic application of knowledge
derived from big data to the real world
“Development of a knowledge-generating platform driven by big data in drug
discovery through production processes”
Project Leader: Prof. Kimito Funatsu (The University of Tokyo)
While massive amounts of quantitative data have accumulated across
the pipeline from a drug candidate's initial discovery through its
production process, knowledge and data analysis for the discovery
process and the production process have remained isolated from each other.
In this project, we aim to establish a platform which allows us to unify
relevant knowledge about the different processes and their associated
data, and to advance research into improved and optimized systems that
view pharmaceutical development from a comprehensive, correlated, and
high-level perspective.
Background
Drug development is time-consuming and extremely costly.
Period: 15-20 years
Success probability: 1/50000
Key point: screening and optimization of lead compounds
Figure: General flow of drug development. Basic research → Screening and optimization of lead compounds → Non-clinical test → Clinical test → Approval → Commercial production (pilot-scale production is also marked on the flow).
Drug Discovery: An Exemplar of Big Data
Stages shown: Determination of target proteins responsible for a disease; Screening of seeds; Lead optimization; Pre-clinical trials; Clinical trials and drug application
Data volumes shown:
Proteins in the body: 100,000+
Chemical library: 1000s of seed chemicals
Binding data: 10,000,000 pairs
Patient data: 22,000 entries
Side effect data: 5 million entries
Bioactivity data: 1.2 billion entries
Gene expression data: 1 million entries
Chemical space: ≥ 10^60 molecules
Human genome: 3 billion base pairs
Background
Real compound libraries and virtual libraries are limited in both quantity and quality:
The number of structures is low.
Synthetic routes are unknown for structures in virtual libraries.
No activity data exist for a new target protein.
It is therefore impossible to construct an activity prediction model.
This makes it difficult to search for lead compounds.
Product quality is checked strictly.
Irregular products are scrapped as waste → serious damage.
Using temperature, pressure, and near-infrared (NIR) spectra (X), the quality
of the product (y) can be predicted and monitored on-line by a soft sensor
(a statistical model: y = f(X)).
However, soft sensors have not yet been applied on-line in real plants
because of their low predictive accuracy and complex maintenance.
Efficient and stable production that maintains high chemical quality is required.
In operating chemical plants, operators have to monitor the
operating conditions of the plant and control process variables
such as temperature, pressure, and product concentration.
These process variables need to be measured online.
However, not all of them are easy to measure online.
Figure: Soft sensor. Input X (easy to measure, measured online): temperature, pressure, ... Output y (difficult to measure because of technical difficulties and large measurement delays, so estimated online): concentration, ... The model y = f(X) is built from a database (big data).
Soft Sensor
Figure: Soft sensor model. Temperature 1 (T1), Temperature 2 (T2), Pressure (P), and Concentration plotted against time / min. The soft sensor model calculates the values of ○ from T1, T2, and P.
Notes from the figure: observed values are reported with a time delay; reduction of cost for chemical analyses.
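The relation y = f(X) described above can be illustrated with a minimal sketch. It assumes the simplest possible form of f, a linear model fitted by ordinary least squares with NumPy; the slides do not state which statistical model the project uses. The variable ranges, coefficients, and the function names `fit_soft_sensor`, `predict_soft_sensor`, and `demo_rmse` are hypothetical and serve only as an illustration on synthetic data.

```python
import numpy as np


def fit_soft_sensor(X, y):
    """Fit a linear soft sensor y ~ X @ w + b by ordinary least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([X, np.ones(len(X))])  # extra column for the intercept b
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


def predict_soft_sensor(X, w, b):
    """Estimate the hard-to-measure variable y from easy-to-measure inputs X."""
    return np.asarray(X, dtype=float) @ w + b


def demo_rmse(seed=0):
    """Synthetic demo: T1, T2, P are logged every step, concentration rarely."""
    rng = np.random.default_rng(seed)
    n = 200
    T1 = rng.uniform(300.0, 350.0, n)   # made-up temperature 1
    T2 = rng.uniform(320.0, 380.0, n)   # made-up temperature 2
    P = rng.uniform(1.0, 5.0, n)        # made-up pressure
    X = np.column_stack([T1, T2, P])
    # Made-up "true" relationship plus measurement noise (an assumption).
    y = X @ np.array([0.02, -0.01, 0.5]) + 1.0 + rng.normal(0.0, 0.01, n)

    lab = np.arange(0, n, 10)           # lab analysis only every 10th sample
    w, b = fit_soft_sensor(X[lab], y[lab])
    y_hat = predict_soft_sensor(X, w, b)  # online estimate for every sample
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))


if __name__ == "__main__":
    print("RMSE over all samples:", demo_rmse())
```

In this sketch the model is trained only on the sparse samples that have a laboratory value and is then used to estimate the concentration at every time step, which mirrors the role of the ○ values in the figure.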
Objective
Big data
① Complex interaction data between many proteins and many drug candidates,
together with other biological information.
② A large virtual library containing chemical structures that are potential drug candidates.
③ Plant operating data and product quality data from pharmaceutical and chemical
processes.
By utilizing the above “Big Data”, the project aims to realize the following.
Ⅰ. Construction of mathematical models from data on many proteins vs. many compounds
together with other biological information, and extraction of guidelines for drug discovery.
Ⅱ. Automated generation of a large virtual library (several billion chemical structures),
discovery of new drugs, and acquisition of their synthetic routes from the library.
Ⅲ. Knowledge extraction for process monitoring and control that contributes to stable
operation and stable product quality. Development of automated soft sensor model
construction and a model maintenance system for process monitoring.
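Item Ⅲ mentions a model maintenance system. One common way to keep a soft sensor up to date is to refit it on a moving window of the most recent analysed samples. The sketch below assumes that approach; the slides do not say which maintenance method the project adopts. The class `MovingWindowSoftSensor`, the function `simulate`, and all numbers are hypothetical.

```python
from collections import deque


class MovingWindowSoftSensor:
    """Single-input linear soft sensor y ~ a*x + b refit on the latest samples."""

    def __init__(self, window=30):
        # deque with maxlen drops the oldest sample automatically
        self.samples = deque(maxlen=window)
        self.a, self.b = 0.0, 0.0

    def update(self, x, y):
        """Add a newly analysed (x, y) pair and refit by least squares."""
        self.samples.append((x, y))
        n = len(self.samples)
        mx = sum(s[0] for s in self.samples) / n
        my = sum(s[1] for s in self.samples) / n
        sxx = sum((s[0] - mx) ** 2 for s in self.samples)
        sxy = sum((s[0] - mx) * (s[1] - my) for s in self.samples)
        self.a = sxy / sxx if sxx > 0 else 0.0  # flat model until x varies
        self.b = my - self.a * mx

    def predict(self, x):
        return self.a * x + self.b


def simulate(window=30, steps=200):
    """Made-up process whose offset shifts at step 100 (a change in plant state)."""
    sensor = MovingWindowSoftSensor(window)
    errors = []
    for t in range(steps):
        x = 300.0 + (t % 20)              # made-up temperature cycle
        offset = 1.0 if t < 100 else 3.0  # made-up shift in the process
        y = 0.05 * x + offset             # made-up true relationship
        if t > 0:
            errors.append(abs(sensor.predict(x) - y))  # predict before updating
        sensor.update(x, y)
    return errors


if __name__ == "__main__":
    errs = simulate()
    print("error right after the shift:", errs[99])
    print("error at the end:", errs[-1])
```

In this noise-free toy process the prediction error jumps when the offset shifts and returns to nearly zero once the window contains only samples from the new state. A shorter window adapts faster but is more sensitive to noise in real data.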
Overview of this project
Flow of development of pharmaceuticals: Basic research → Screening and optimization of lead compounds → Non-clinical test → Clinical test → Approval → Commercial production (pilot-scale production is also marked on the flow).
Integration of chemical and biological information and fast processing. Information integrated: chemical structures and properties, binding proteins, gene expression, pathway changes, cellular activities, clinical information.
Analysis and prediction of interactions between chemical and biological information.
Extraction of patterns that give direction to lead molecule development, based on big data on chemical-biological interactions.
Process monitoring and quality control with a soft sensor: a predictive model y = f(X), built from a database, predicts on-line the output variables y that are difficult to measure (concentration, density) from the input variables X that are easy to measure on-line (temperature, pressure).
Groups and collaborators shown: Okuno-G, Taiji-G & Hori-G, Funatsu-G, Prof. Gisbert Schneider.
Framework of research project
Groups: TAIJI group, OKUNO group, FUNATSU group, HORI group

● Development of an operational database of chemical plants for soft sensors
● Development of a soft sensor construction system for plant monitoring and a soft sensor maintenance method
● Knowledge extraction for process and quality control from big data from chemical plants

● Expansion of a huge virtual library
● Retrieval of useful information
● Addition of synthetic routes and physical properties
● Visualization of contents

● Integration of chemical and biological information and canonicalization of data structure
● Development of a mathematical model for the interaction between ligand and protein
● Extraction of direction in lead molecule development

Shared between groups: synthetic routes and physical properties of structures; interaction information between target protein and candidates; drug candidates.

Restriction of chemical process: evaluation of the feasibility of candidate reactions is needed.
Only 3 % of reactions developed in the laboratory are implemented in real chemical plants, due to reactor size, mixing rate, dynamics, controllability, byproducts, cost, …
Rapid Evaluation of Feasibility of Synthetic Routes (HORI group)
This research will lead to a new, collective platform
for the systematic and efficient development of
pharmaceuticals from discovery through production.