Supporting Scientific Sensemaking. Anita de Waard, VP Research Data Collaborations, Elsevier ([email protected]). Visit Microsoft Research, January 23, 2013

Scientific Sensemaking


DESCRIPTION

Talk at Microsoft Research, Bellevue, WA, January 24th 2013: an overview of the past five years of my research.


Page 1: Scientific Sensemaking

Supporting Scientific Sensemaking

Anita de Waard, VP Research Data Collaborations, Elsevier

[email protected]

Visit Microsoft Research, January 23, 2013

Page 2: Scientific Sensemaking

Outline

• A model of scientific sensemaking:
  – Stories that persuade with data
  – Discourse segments and verb tense

• Towards extracting claim-evidence networks:
  – Hedging in science
  – Creating claim-evidence networks

• Data:
  – Why life is so complicated
  – Connecting biological experiments into collaboratories

Page 3: Scientific Sensemaking

Story Grammar: The Story of Goldilocks and the Three Bears

Setting
Time: Once upon a time
Character: a little girl named Goldilocks
Location: She went for a walk in the forest. Pretty soon, she came upon a house.

Theme
Goal: She knocked and, when no one answered,
Attempt: she walked right in.

Episode
Name: At the table in the kitchen, there were three bowls of porridge.
Subgoal: Goldilocks was hungry.
Attempt: She tasted the porridge from the first bowl.
Outcome: "This porridge is too hot!" she exclaimed.
Attempt: So, she tasted the porridge from the second bowl.
Outcome: "This porridge is too cold," she said.
Attempt: So, she tasted the last bowl of porridge.
Outcome: "Ahhh, this porridge is just right," she said happily and
Outcome: she ate it all up.

Paper Grammar

The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins

Background: The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.

Objects of study: the Drosophila Atx-1 homolog (dAtx-1), which lacks a polyQ tract,

Experimental setup: studied and compared in vivo effects and interactions to those of the human protein

Research goal: Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.

Hypothesis: Atx-1 may play a role in the regulation of gene expression

Name: dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies

Subgoal: test the function of the AXH domain

Method: overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.

Results: Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells

Data: (data not shown),

Results: both genotypes show many large holes and loss of cell integrity at 28 days

Data: (Figures 1B-1D).

Results: Overexpression of dAtx-1 using the GMR-GAL4 driver also induces eye abnormalities. The external structures of the eyes that overexpress dAtx-1 show disorganized ommatidia and loss of interommatidial bristles

Data: (Figure 1F),

A paper is a story…

Page 4: Scientific Sensemaking

Aristotle | Quintilian | Scientific paper

prooimion | Introduction (exordium): The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience. | Introduction: positioning

prothesis | Statement of Facts (narratio): The speaker here provides a narrative account of what has happened and generally explains the nature of the case. | Introduction: research question

– | Summary (propositio): The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. | Summary of contents

pistis | Proof (confirmatio): The main body of the speech, where one offers logical arguments as proof. The appeal to logos is emphasized here. | Results

– | Refutation (refutatio): As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. | Related Work

epilogos | peroratio: Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up. | Discussion: summary, implications

…that persuades…

The goal of the paper is to be published; it uses the author/journal as a host. The format has co-evolved with reviewers in a predator-prey relationship.

Page 5: Scientific Sensemaking


…with data.

Page 6: Scientific Sensemaking

In defense of the clause as the unit of thought:

1. Importantly, our results so far indicate that the expression of miR-372&3 did not reduce the activity of RASV12, as these cells were still growing faster than normal cells and were tumorigenic, for which RAS activity is indispensable (Hahn et al, 1999 and Kolfschoten et al, 2005).

2. To shed more light on this aspect, we examined the effect of miR-372&3 expression on p53 activation in response to oncogenic stimulation.

3. We used for this experiment BJ/ET cells containing p14ARFkd because, following RASV12 treatment, in those cells p53 is still activated but more clearly stabilized than in parental BJ/ET cells (Voorhoeve and Agami, 2003), resulting in a sensitized system for slight alterations in p53 in response to RASV12.

4. Figure 4A shows that following RASV12 stimulation, p53 was stabilized and activated, and its target gene, p21cip1, was induced in all cases, indicating an intact p53 pathway in these cells.

• More than one 'thought unit' per sentence.
• Verb tense changes within a sentence (several times).
• Attribution, actions/states, and prepositions are all contained within a sentence.

Page 7: Scientific Sensemaking


Head: premise, motivation, attribution (matrix clause)

Middle: main biological statement

End: interpretation, elaboration, attribution (reference)

In defense of the clause as the unit of thought:

Page 8: Scientific Sensemaking


Regulatory clause

Fact, Goal, Method, Result, Implication

In defense of the clause as the unit of thought:

Page 9: Scientific Sensemaking


Both seminomas and the EC component of nonseminomas share features with ES cells. To exclude that the detection of miR-371-3 merely reflects its expression pattern in ES cells, we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004). In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8), suggesting that miR-371-3 expression is a selective event during tumorigenesis.

Fact, Hypothesis, Method, Result, Implication, Goal, Reg-Implication

Conceptual knowledge, Experimental Evidence

Clause, realm and tense:

Page 10: Scientific Sensemaking

Concepts, models, 'facts': present tense

Experiment: past tense

Transitions: present tense

(1) Both seminomas and the EC component of nonseminomas share features with ES cells. [Fact]

(2) a. To exclude that [Goal]
    b. the detection of miR-371-3 merely reflects its expression pattern in ES cells, [Problem]
    c. we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004). [Method]

(3) a. In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8), [Result]
    b. suggesting that [Regulatory-Implication]
    c. miR-371-3 expression is a selective event during tumorigenesis. [Implication]

Clause, realm and tense:
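The tense-to-realm mapping above can be approximated with a deliberately crude heuristic. This is a minimal sketch, not the method used in this work: it assumes that a clause whose first recognised verb is past tense belongs to the experimental realm and one whose first recognised verb is present tense belongs to the conceptual realm. The two small verb sets are hypothetical stand-ins for a real part-of-speech tagger.

```python
# Hypothetical mini-lexicons standing in for a real part-of-speech tagger.
PAST_VERBS = {"tested", "was", "were", "used", "examined", "generated", "spent"}
PRESENT_VERBS = {"share", "shares", "reflects", "is", "are", "regulate",
                 "indicate", "suggest", "remains"}


def realm(clause: str) -> str:
    """Guess a clause's realm from the tense of the first listed verb it contains."""
    for word in clause.split():
        token = word.strip(".,;:()").lower()
        if token in PAST_VERBS:
            return "experimental"  # past tense: a reported experiment
        if token in PRESENT_VERBS:
            return "conceptual"    # present tense: concepts, models, 'facts'
    return "unknown"               # no listed verb, e.g. a bare transition like "suggesting that"


for clause in [
    "Both seminomas and the EC component of nonseminomas share features with ES cells.",
    "In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable",
    "miR-371-3 expression is a selective event during tumorigenesis.",
]:
    print(realm(clause), "|", clause)
```

A lexicon lookup like this breaks down on embedded facts ("which is only active when…" inside a past-tense event), which is exactly why the slides argue for segmenting at the clause level first.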

Page 11: Scientific Sensemaking

Facts in the eternal present

Endogenous small RNAs (miRNAs) regulate gene expression by mechanisms conserved across metazoans.

Events in the simple past

Vehicle-treated animals spent equivalent time investigating a juvenile in the first and second sessions in experiments conducted in the NAC and the striatum: T1 values were 122 ± 6 s and 114 ± 5 s.

Events with embedded facts

We also generated BJ/ET cells expressing the RASV12-ERTAM chimera gene, which is only active when tamoxifen is added (De Vita et al, 2005).

Attribution in the present perfect

miRNAs have emerged as important regulators of development and control processes such as cell fate determination and cell death (Abrahante et al., 2003, Brennecke et al., 2003, Chang et al., 2004, Chen et al., 2004, Johnston and Hobert, 2003, Lee et al., 1993)

Implications are hedged, and in the present tense

These results indicate that although miR-372&3 confer complete protection to oncogene-induced senescence in a manner similar to p53 inactivation, the cellular response to DNA damage remains intact

Tense use in science and mythology:

I sing of golden-throned Hera whom Rhea bare. Queen of the immortals is she, surpassing all in beauty: she is the sister and the wife of loud-thundering Zeus, the glorious one whom all the blessed throughout high Olympus reverence and honor.

Now the wooers turned to the dance and to gladsome song, and made them merry, and waited till evening should come; and as they made merry dark evening came upon them.

And she took her mighty spear, tipped with sharp bronze, heavy and huge and strong, wherewith she vanquishes the ranks of men, of warriors, with whom she is wroth, she, the daughter of the mighty sire.

In this book I have had old stories written down, as I have heard them told by intelligent people, concerning chiefs who have held dominion in the northern countries, and who spoke the Danish tongue; and also concerning some of their family branches, according to what has been told me.

Now it is said that ever since then whenever the camel sees a place where ashes have been scattered, he wants to get revenge with his enemy the rat and stomps and rolls in the ashes hoping to get the rat

Page 12: Scientific Sensemaking

From fiction to fact: Hedging

• Voorhoeve et al., 2006: "These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2."

• Kloosterman and Plasterk, 2006: "In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006)."

• Yabuta et al., 2007: "[On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006)."

• Okada et al., 2011: "Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006)."

"[Y]ou can transform … fiction into fact just by adding or subtracting references", Bruno Latour [1]
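The drift from hedged claim to bare fact across these four citations can be made countable. The sketch below is illustrative only: it assumes a tiny, hypothetical list of hedging cues (real cue lexicons such as those in the work of Light et al. or Thompson et al. are far richer) and simply counts whole-word matches in each quoted sentence.

```python
import re

# A small, hypothetical list of hedging cues; real cue lexicons are much larger.
HEDGE_CUES = ["possibly", "potential", "suggests", "may", "might", "probably"]


def hedge_count(sentence: str) -> int:
    """Count whole-word, case-insensitive occurrences of the hedging cues."""
    text = sentence.lower()
    return sum(len(re.findall(r"\b" + re.escape(cue) + r"\b", text))
               for cue in HEDGE_CUES)


CITATIONS = [
    ("Voorhoeve et al., 2006",
     "These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct "
     "inhibition of the expression of the tumor suppressor LATS2."),
    ("Kloosterman and Plasterk, 2006",
     "In a genetic screen, miR-372 and miR-373 were found to allow proliferation of "
     "primary human cells that express oncogenic RAS and active p53, possibly by "
     "inhibiting the tumor suppressor LATS2."),
    ("Yabuta et al., 2007",
     "Two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in "
     "testicular germ cell tumors by inhibition of LATS2 expression, which suggests "
     "that Lats2 is an important tumor suppressor."),
    ("Okada et al., 2011",
     "Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of "
     "Lats2, thereby allowing tumorigenic growth in the presence of p53."),
]

for source, sentence in CITATIONS:
    print(f"{source}: {hedge_count(sentence)} hedge cue(s)")
```

With this cue list the counts are 1, 1, 2 and 0: the 2011 citation states the LATS2 mechanism with no hedge at all. Note that a raw count does not tell you which proposition a cue scopes over; in the 2007 sentence, "suggests" hedges a different claim, which is why the next slides model hedging per clause.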

Page 13: Scientific Sensemaking

Hedging in science:

• Why do authors hedge?
  – Make a claim 'pending […] acceptance in the community' [2]
  – 'Create A Research Space': hedging allows authors to insert themselves into the discourse in a community [3]
  – 'the strongest claim a careful researcher can make' [4]

• Hedging cues, speculative language, modality/negation:
  – Light et al [5]: finding speculative language
  – Wilbur et al [6]: focus, polarity, certainty, evidence, and directionality
  – Thompson et al [7]: level of speculation, type/source of the evidence and level of certainty

• Sentiment detection (e.g. Kim and Hovy [8], among others):
  – Holder of the opinion, strength, polarity as a 'mathematical function' acting on the main propositional content
  – Wide applications in product reviews; but not (yet) in science!

Page 14: Scientific Sensemaking

A model for epistemic evaluations:

For a proposition $P$, an epistemically marked clause $E$ is an evaluation of $P$, where $E = E_{V,B,S}(P)$, with:

– V = Value: 3 = Assumed true, 2 = Probable, 1 = Possible, 0 = Unknown (−1 = possibly untrue, −2 = probably untrue, −3 = assumed untrue)

– B = Basis: Reasoning, Data

– S = Source: A = speaker is author, explicit; IA = speaker is author, implicit; N = other author, explicit; NN = other author, implicit

Model suggested by Eduard Hovy, Information Sciences Institute, University of Southern California
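The evaluation $E_{V,B,S}(P)$ maps naturally onto a small record type. The following is a minimal sketch under the definitions on this slide; the class and enum names are hypothetical and are not taken from the ORCA ontology itself. It validates that V lies on the −3…3 scale and renders an evaluation in readable form.

```python
from dataclasses import dataclass
from enum import Enum


class Basis(Enum):
    REASONING = "Reasoning"
    DATA = "Data"


class Source(Enum):
    A = "speaker is author, explicit"
    IA = "speaker is author, implicit"
    N = "other author, explicit"
    NN = "other author, implicit"


VALUE_LABELS = {3: "assumed true", 2: "probable", 1: "possible", 0: "unknown",
                -1: "possibly untrue", -2: "probably untrue", -3: "assumed untrue"}


@dataclass(frozen=True)
class EpistemicEvaluation:
    """E_{V,B,S}(P): an evaluation of proposition P with value V, basis B and source S."""
    proposition: str
    value: int
    basis: Basis
    source: Source

    def __post_init__(self) -> None:
        # Reject values outside the seven-point scale defined above.
        if self.value not in VALUE_LABELS:
            raise ValueError(f"value must be an integer from -3 to 3, got {self.value}")

    def describe(self) -> str:
        return (f"{VALUE_LABELS[self.value]} ({self.basis.value}; "
                f"{self.source.name}): {self.proposition}")
```

For example, `EpistemicEvaluation("P", 2, Basis.DATA, Source.IA).describe()` yields `"probable (Data; IA): P"`. Keeping the three dimensions as separate fields, rather than collapsing them into one "certainty" score, lets a query distinguish a probable claim based on data from a probable claim based on reasoning.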

Page 15: Scientific Sensemaking

Reporting verbs vs. epistemic value:

Value = 0 (unknown): establish, (remain to be) elucidated, be (clear/useful), (remain to be) examined/determined, describe, make difficult to infer, report

Value = 1 (hypothetical): be important, consider, expect, hypothesize (5x), give insight, raise possibility that, suspect, think

Value = 2 (probable): appear, believe, implicate (2x), imply, indicate (12x), play a role, represent, suggest (18x), validate (2x)

Value = 3 (presumed true): be able/apparent/important/positive/visible, compare (2x), confirm (2x), define, demonstrate (15x), detect (5x), discover, display (3x), eliminate, find (3x), identify (4x), know, need, note (2x), observe (2x), obtain (success/results, 3x), prove to be, refer, report (2x), reveal (3x), see (2x), show (24x), study, view
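The verb lists above can serve directly as a lookup table from a reporting verb to an epistemic value. This sketch assumes the verb has already been lemmatised (e.g. "suggests" reduced to "suggest") and uses only a hypothetical subset of the lists; "report" is deliberately left out because the slide places it under both Value = 0 and Value = 3, so its value depends on context.

```python
from typing import Optional

# Hypothetical subset of the slide's verb lists, keyed by verb lemma.
# "report" is omitted: it appears under both Value = 0 and Value = 3.
VERB_VALUE = {
    "describe": 0, "establish": 0,
    "consider": 1, "expect": 1, "hypothesize": 1, "suspect": 1, "think": 1,
    "appear": 2, "believe": 2, "imply": 2, "indicate": 2, "suggest": 2,
    "confirm": 3, "demonstrate": 3, "find": 3, "reveal": 3, "show": 3,
}


def epistemic_value(verb_lemma: str) -> Optional[int]:
    """Return the epistemic value of a reporting verb lemma, or None if it is not listed."""
    return VERB_VALUE.get(verb_lemma.lower())


print(epistemic_value("suggest"), epistemic_value("show"), epistemic_value("report"))
```

Returning `None` for unlisted or ambiguous verbs, rather than defaulting to 0, keeps "unknown epistemic value" (a value on the scale) separate from "not covered by the lexicon".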

Page 16: Scientific Sensemaking

Most prevalent clause type: "These results suggest that..."

Adverb/Connective: thus, therefore, together, recently, in summary
Determiner/Pronoun: it, this, these, we/our
Adjective: previous, future, better
Noun phrase: data, report, study, result(s); method or reference
Modal: form of 'to be', may, remain
Adjective: often, recently, generally
Verb: show, obtain, consider, view, reveal, suggest, hypothesize, indicate, believe
Preposition: that, to
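The slot structure in this table (determiner + noun phrase + optional modal + reporting verb + "that") can be turned into a pattern for spotting such matrix clauses. This is a minimal sketch that assumes only a few of the slot fillers listed above; it is not a full grammar and will miss variants such as "Together, these data suggest that".

```python
import re

# Hypothetical pattern built from a few slot fillers on this slide:
# determiner + noun phrase + optional modal "may" + reporting verb + "that".
MATRIX_CLAUSE = re.compile(
    r"\b(?:these|this|our)\s+(?:data|results?|study|report)\s+"
    r"(?:may\s+)?(?:show|reveal|suggest|indicate)s?\s+that\b",
    re.IGNORECASE,
)


def find_matrix_clauses(text: str) -> list[str]:
    """Return every 'These results suggest that'-style matrix clause found in text."""
    return [m.group(0) for m in MATRIX_CLAUSE.finditer(text)]


print(find_matrix_clauses(
    "These results indicate that although miR-372&3 confer complete protection"))
```

Matching the matrix clause separately from the embedded proposition mirrors the head/middle segmentation on the earlier slides: the match is the evaluation E, and the text after "that" is the proposition P.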

Page 17: Scientific Sensemaking

Ontology for Reasoning, Certainty and Attribution [11]: vocab.deri.ie/orca

Page 18: Scientific Sensemaking

Adding metadiscourse to triples:

Biological statement with BEL/epistemic markup

Statement: These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor-suppressor LATS2.

BEL representation: r(MIR:miR-372) -| (tscript(p(HUGO:Trp53)) -| kin(p(PFH:"CDK Family"))); "Increased abundance of miR-372 decreases abundance of LATS2": r(MIR:miR-372) -| r(HUGO:LATS2)

Epistemic evaluation: Value = Possible, Source = Unknown, Basis = Unknown

Biological statement with MedScan/epistemic markup

Statement: Furthermore, we present evidence that the secretion of nesfatin-1 into the culture media was dramatically increased during the differentiation of 3T3-L1 preadipocytes into adipocytes (P < 0.001) and after treatments with TNF-alpha, IL-6, insulin, and dexamethasone (P < 0.01).

MedScan analysis: IL-6 → NUCB2 (nesfatin-1); Relation: MolTransport; Effect: Positive; CellType: Adipocytes; Cell Line: 3T3-L1

Epistemic evaluation: Value = Probable, Source = Author, Basis = Data
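One way to carry the epistemic evaluation alongside an extracted triple is to store both in one record. The sketch below is a hypothetical structure for illustration: its field names are assumptions and are not part of BEL's or MedScan's own syntax, and the two instances simply restate the two examples on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedTriple:
    """A biological statement (subject, relation, object) plus its epistemic evaluation.

    Field names are illustrative assumptions, not part of BEL or MedScan.
    """
    subject: str
    relation: str
    obj: str
    value: str   # e.g. "Possible", "Probable"
    source: str  # e.g. "Author", "Unknown"
    basis: str   # e.g. "Data", "Unknown"

    def as_text(self) -> str:
        return (f"{self.subject} {self.relation} {self.obj} "
                f"[Value={self.value}; Source={self.source}; Basis={self.basis}]")


bel = AnnotatedTriple("r(MIR:miR-372)", "-|", "r(HUGO:LATS2)",
                      "Possible", "Unknown", "Unknown")
medscan = AnnotatedTriple("IL-6", "MolTransport (positive)", "NUCB2 (nesfatin-1)",
                          "Probable", "Author", "Data")
print(bel.as_text())
print(medscan.as_text())
```

Keeping the evaluation on the triple, instead of discarding it during extraction, is what allows a downstream network to distinguish "possible" links such as the LATS2 mechanism from data-backed ones.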

Page 19: Scientific Sensemaking

Claim-Evidence example: Data2Semantics. Goal: improve the speed of integration of research into practice

A. Philips' Electronic Patient Records
B. Elsevier-published Clinical Guideline
C. Elsevier (or other publisher's) Research Report or Data

Step 1: Patient data + diagnosis link to Guideline recommendation

Step 2: Guideline recommendation links to evidence in report or data
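The two linking steps form a chain that can be walked from a patient record to the underlying evidence. This is a minimal sketch that assumes the links are already available as a simple adjacency map; every identifier in it is hypothetical, and the recursion assumes the link graph has no cycles.

```python
# Hypothetical identifiers; links mirror Step 1 (patient record -> guideline
# recommendation) and Step 2 (recommendation -> evidence in a report or dataset).
LINKS = {
    "patient:123/diagnosis": ["guideline:rec-4"],
    "guideline:rec-4": ["report:study-A", "dataset:trial-B"],
}


def evidence_chain(start: str) -> list[list[str]]:
    """Return every path from start to a node with no outgoing links.

    Depth-first; assumes the link graph is acyclic.
    """
    targets = LINKS.get(start, [])
    if not targets:
        return [[start]]
    paths = []
    for target in targets:
        for path in evidence_chain(target):
            paths.append([start] + path)
    return paths


for path in evidence_chain("patient:123/diagnosis"):
    print(" -> ".join(path))
```

Returning whole paths, not just the final evidence nodes, keeps the intermediate guideline recommendation visible, which is the point of a claim-evidence chain.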

Page 20: Scientific Sensemaking

Claim-Evidence Chains in Drug-drug interactions

Step 1: Manually identify DDIs and drug names in a wide collection of content sources

Step 2: Develop a model of Drug-Drug Interaction and define candidates

Step 3: Automate this process and store as Linked Data
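Step 3 stores each interaction and its evidence as Linked Data. The sketch below writes one claim as N-Triples using only the standard library; the namespace `http://example.org/ddi/`, the property names `involvesDrug` and `hasEvidence`, and the drug names are all hypothetical placeholders, not an existing DDI vocabulary.

```python
import json

BASE = "http://example.org/ddi/"  # hypothetical namespace, not an existing vocabulary


def ddi_to_ntriples(ddi_id: str, drug_a: str, drug_b: str, evidence_url: str) -> str:
    """Serialize one drug-drug interaction claim and its evidence link as N-Triples lines."""
    subject = f"<{BASE}interaction/{ddi_id}>"
    # json.dumps adds the surrounding double quotes and escapes quotes and
    # backslashes, which is compatible with N-Triples string literals here.
    triples = [
        (subject, f"<{BASE}involvesDrug>", json.dumps(drug_a, ensure_ascii=False)),
        (subject, f"<{BASE}involvesDrug>", json.dumps(drug_b, ensure_ascii=False)),
        (subject, f"<{BASE}hasEvidence>", f"<{evidence_url}>"),
    ]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(ddi_to_ntriples("1", "drugA", "drugB", "http://example.org/evidence/42"))
```

In practice a dedicated RDF library and an established vocabulary would replace the hand-built strings; the sketch only shows that the claim and its evidence pointer become separate, queryable statements about one interaction node.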

Page 21: Scientific Sensemaking

Claimed Knowledge Updates

Definition:
1) A CKU expresses a proposition about biological entities
2) A CKU is a new proposition
3) The authors present the CKU as factual: ⇒ Strength = Certainty
4) A CKU is derived from experimental work described in the article: ⇒ Basis = Data
5) The ownership is attributed to the author(s) of the article. ⇒ Source = Author, Explicit

Sándor/de Waard [13]
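Criteria 2 to 5 translate into a simple filter over statements that already carry the epistemic fields from the earlier model. This is a minimal sketch that assumes "Strength = Certainty" corresponds to Value = 3 on the −3…3 scale, that novelty (`is_new`) has been determined elsewhere, and that criterion 1 (the statement is about biological entities) is checked upstream; the `Statement` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """A proposition with the epistemic fields used above (hypothetical record type)."""
    text: str
    value: int    # 3 = assumed true ... -3 = assumed untrue
    basis: str    # "Data" or "Reasoning"
    source: str   # "A", "IA", "N" or "NN"
    is_new: bool  # criterion 2, assumed to be determined elsewhere


def is_claimed_knowledge_update(s: Statement) -> bool:
    """Criteria 2-5: new, presented as factual, based on data, explicitly owned by the author."""
    return s.is_new and s.value == 3 and s.basis == "Data" and s.source == "A"


cku = Statement("we identified miR-372 and miR-373, each permitting proliferation "
                "and tumorigenesis of primary human cells", 3, "Data", "A", True)
hedged = Statement("These miRNAs neutralize p53-mediated CDK inhibition, possibly "
                   "through direct inhibition of LATS2", 1, "Reasoning", "A", True)
print(is_claimed_knowledge_update(cku), is_claimed_knowledge_update(hedged))
```

The hedged interpretation fails on both value and basis, which matches the intuition that a "possibly" clause is not yet presented as a knowledge update.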

Page 22: Scientific Sensemaking

A corpus for citation analysis:

Work done with Lucy Vanderwende

Type | Voorhoeve text | Citing text

Method | "We subsequently created a human miRNA expression library (miR-Lib) by cloning almost all annotated human miRNAs into our vector (Rfam release 6) (Figure S3)" | "Voorhoeve et al. (116) employed a novel strategy by combining an miRNA vector library and corresponding bar code array"; "Using a novel retroviral miRNA expression library, Agami and co-workers performed a cell-based screen"

Result | "we identified miR-372 and miR-373, each permitting proliferation and tumorigenesis of primary human cells that harbor both oncogenic RAS and active wild-type p53." | "miR-372 and miR-373 were consequently found to permit proliferation and tumorigenesis of these primary cells carrying both oncogenic RAS and wild-type p53,"; "Voorhoeve et al. (2006) identified miR-372 and miR-373 that collaborate with oncogenic RAS in cellular transformation"; "miR-372 has been recently described as potential oncogene"

Interpretation | "These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2." | "probably through direct inhibition of the expression of the tumor-suppressor LATS2 and subsequent neutralization of the p53 pathway."; "Compromised Lats2 functionality might reduce the selective pressure for p53 inactivation during tumor progression."

Page 23: Scientific Sensemaking

Data sharing in biology

http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg

• Interspecies variability > A specimen is not a species!
• Gene expression variability > Knowing genes is not knowing how they are expressed!
• Microbiome > An animal is an ecosystem!
• Systems biology > Whole is more than the sum of its parts!
• Models vs. experiment > Are we talking about the same things? In a way we can all use?
• Dynamics > Life is not in equilibrium!
=> Life is complicated!

Reductionism doesn't work for living systems.

Page 24: Scientific Sensemaking

Statistics to the rescue! With enough observations, trends and anomalies can be detected:

• "Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far."

The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234

•  “The  large  sample  size  —  4,298  North  Americans  of  European  descent  and  2,217  African  Americans  —  has  enabled  the  researchers  to  mine  down  into  the  human  genome.”    

Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing study emphasizes importance of rare variants in disease.

 

Page 25: Scientific Sensemaking

• Collect: store data at the level of the experiment:
  – Accessible through a single interface
  – Add enough metadata to know what was done/seen

• Connect: allow analyses over:
  – Similar experiment types
  – Experiments done with/on similar biological 'things' (species, strains, systems, cells etc.)
  – In a way that can be used by modelers!

• Keep:
  – Long-term preservation of data and software
  – Fulfill Data Management Plan requirements
  – Allow 'gated' access when and to whom researcher wants

Enable 'incidental collaboratories':
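The Collect and Connect steps can be pictured as an experiment-level record plus an index over shared attributes. This is a minimal sketch with a hypothetical `Experiment` record and made-up example data; a real repository would need much richer metadata (protocols, reagents, provenance) than these four fields.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    """Hypothetical experiment-level record: enough metadata to know what was done/seen."""
    lab: str
    experiment_type: str  # e.g. "western blot"
    subject: str          # the biological 'thing': species, strain, system, cell line, ...
    data_url: str


def group_by(experiments, key):
    """'Connect' step: index experiments by a shared attribute so they can be analysed together."""
    groups = defaultdict(list)
    for experiment in experiments:
        groups[key(experiment)].append(experiment)
    return dict(groups)


experiments = [
    Experiment("lab1", "western blot", "mouse", "http://example.org/data/1"),
    Experiment("lab2", "western blot", "zebrafish", "http://example.org/data/2"),
    Experiment("lab2", "calcium imaging", "mouse", "http://example.org/data/3"),
]
by_subject = group_by(experiments, lambda e: e.subject)
by_type = group_by(experiments, lambda e: e.experiment_type)
print({k: [e.lab for e in v] for k, v in by_subject.items()})
```

Grouping across labs by subject or experiment type is the minimal mechanism behind an "incidental collaboratory": two labs that never coordinated still end up in the same bucket.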

Page 26: Scientific Sensemaking

Let's look at a typical lab:

• How to get the right antibody IDs
• And messy bits
• From the lab notebook
• Into the PI's command center?

Page 27: Scientific Sensemaking

Objections and rebuttals re. data sharing

Objection: "But our lab notebooks are all on paper"
Rebuttal: Develop smart phone/tablet apps for data input

Objection: "I need to see a direct benefit from something I spend my time on"
Rebuttal: Develop a 'data manipulation dashboard' for the PI to allow better access to the full experimental output of his/her lab

Objection: "I want things to be peer reviewed before I expose them"
Rebuttal: Allow reviewers access to the experimental database before publication (of data or paper)

Objection: "I don't really trust anyone else's data – well, except for the guys I went to Grad School with…"
Rebuttal: Add a social networking component to this data repository so you know who (to the individual) created that data point.

Objection: "I am afraid other people might scoop my discoveries"
Rebuttal: => Reward system moves from a competition to a 'shared mission'

Page 28: Scientific Sensemaking

Problem: biological research is quite insular

• Biology is small: size 10⁻⁵ to 10² m; a scientist can work alone ('King' and 'subjects').
• Biology is messy: it doesn't happen behind a terminal.
• Biology is competitive: many people with similar skill sets, vying for the same grants.
• In summary: the structure of biological research does not inherently promote collaboration (vs., for instance, big physics or astronomy).

[Diagram: Prepare, Observe, Analyze, Ponder, Communicate]

Page 29: Scientific Sensemaking

So we can do joint experiments:

[Diagram: Prepare / Analyze / Communicate (×2); Observations (×3)]

Across labs, experiments: track reagents and how they are used

Page 30: Scientific Sensemaking

So we can do joint experiments:

[Diagram: Prepare / Analyze / Communicate (×2); Observations (×3)]

Compare outcome of interactions with these entities

Page 31: Scientific Sensemaking

So we can do joint experiments:

[Diagram: Prepare / Analyze / Communicate (×2); Observations (×3)]

Build a 'virtual reagent spectrogram' by comparing how different entities interacted in different experiments
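One possible reading of a "virtual reagent spectrogram" is a per-reagent profile of outcomes, broken down by experiment type and pooled across labs. The sketch below is only that interpretation, not a specification of the idea: the observation tuples, reagent names and outcome labels are all hypothetical.

```python
from collections import Counter, defaultdict

# Each observation: (lab, reagent, experiment type, outcome). All values are hypothetical.
OBSERVATIONS = [
    ("lab1", "antibody-X", "western blot", "specific band"),
    ("lab2", "antibody-X", "western blot", "no signal"),
    ("lab2", "antibody-X", "immunostaining", "specific staining"),
    ("lab1", "antibody-Y", "western blot", "specific band"),
]


def reagent_profile(observations):
    """Map reagent -> experiment type -> Counter of outcomes reported across labs."""
    profile = defaultdict(lambda: defaultdict(Counter))
    for _lab, reagent, exp_type, outcome in observations:
        profile[reagent][exp_type][outcome] += 1
    return profile


profile = reagent_profile(OBSERVATIONS)
for reagent, by_type in profile.items():
    for exp_type, outcomes in by_type.items():
        print(reagent, exp_type, dict(outcomes))
```

Disagreements show up directly in the profile: antibody-X gives a band in one lab's western blot and no signal in another's, which is exactly the kind of cross-lab signal a single lab never sees.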

Page 32: Scientific Sensemaking

Elsevier Research Data Services:

1. Help increase the amount of data shared from the lab, enabling incidental collaboratories

2. Help increase the value of the data shared by increasing annotation, normalization and provenance, enabling enhanced interoperability

3. Help measure and deliver credit for shared data to the researchers, the institute, and the funding body, enabling more sustainable platforms

Page 33: Scientific Sensemaking

Summary – Possible Collaborations?

• A model of scientific sensemaking:
  – Stories that persuade with data
  – Discourse segments and verb tense

• Towards claim-evidence networks:
  – Hedging in science
  – Creating claim-evidence networks

• Data:
  – Why life is so complicated
  – Connecting experiments into collaboratories

Thesis: joint research?

Labs: research collaborations?

RDS: joint development?

Page 34: Scientific Sensemaking

References:

[1] J Am Med Inform Assoc. 2010 September; 17(5): 514–518. http://dx.doi.org/10.1136/jamia.2010.003947
[2] Quanzhi Li, Yi-Fang Brook Wu (2006). Identifying important concepts from medical documents. Journal of Biomedical Informatics 39 (2006) 668–679.
[3] Useful list of resources in bioinformatics: http://www.bioinformatics.ca/
[4] Biological Expression Language: http://www.openbel.org
[5] Latour, B. and Woolgar, S. (1979). Laboratory Life: the Social Construction of Scientific Facts. Sage Publications.
[6] Light M, Qiu XY, Srinivasan P. (2004). The language of bioscience: facts, speculations, and statements in between. BioLINK 2004: Linking Biological Literature, Ontologies and Databases 2004:17-24.
[7] Wilbur WJ, Rzhetsky A, Shatkay H (2006). New directions in biomedical text annotations: definitions, guidelines and corpus construction. BMC Bioinformatics 2006, 7:356.
[8] Thompson P., Venturi G., McNaught J, Montemagni S, Ananiadou S. (2008). Categorising modality in biomedical texts. Proc. LREC 2008 Wkshp Building and Evaluating Resources for Biomedical Text Mining 2008.
[9] Kim, S-M., Hovy, E.H. (2004). Determining the Sentiment of Opinions. Proceedings of the COLING conference, Geneva, 2004.
[10] de Waard, A. and Schneider, J. (2012). Formalising Uncertainty: An Ontology of Reasoning, Certainty and Attribution (ORCA). Semantic Technologies Applied to Biomedical Informatics and Individualized Medicine workshop at ISWC 2012 (submitted).
[11] Data2Semantics project: http://www.data2semantics.org/
[12] Boyce R, Collins C, Horn J, Kalet I. (2009). Computing with evidence Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment. J Biomed Inform. 2009 Dec;42(6):979-89. Epub 2009 May 10. See also http://dbmi-icode-01.dbmi.pitt.edu/dikb-evidence/front-page.html
[13] Sándor, Àgnes and de Waard, Anita (2012). Identifying Claimed Knowledge Updates in Biomedical Research Articles. Workshop on Detecting Structure in Scholarly Discourse, ACL 2012.