10% Human and Machine Learning

Ulcerative colitis, irritable bowel syndrome and microbiota

Can machine learning help us forecast from poop?

Bojic Svetlana, R user, MD, University of Belgrade

Human microbiota

1 human cell for every 10 microbial cells

• Are we only human (Homo sapiens)?

1 human gene for every 100-1000 microbial genes

…if there were democracy in our bodies…
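The ratios above are where the "10% Human" of the title comes from. A quick check of the arithmetic, taking the slide's 1:10 cell ratio and 1:100-1000 gene ratio at face value:

```python
def human_fraction(human: int, microbial: int) -> float:
    """Fraction of the total that is human, given a human:microbial ratio."""
    return human / (human + microbial)

cells = human_fraction(1, 10)          # 1 human cell per 10 microbial cells
genes_high = human_fraction(1, 100)    # 1 human gene per 100 microbial genes
genes_low = human_fraction(1, 1000)    # 1 human gene per 1000 microbial genes

print(f"cells: {cells:.1%}")                          # cells: 9.1%
print(f"genes: {genes_low:.1%} to {genes_high:.1%}")  # genes: 0.1% to 1.0%
```

By cell count we come out at roughly 9% human; by gene count, only about 0.1-1%.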

Gut microbiota: the “forgotten organ”

• Synthesizes and excretes vitamins, transforms non-digestible carbohydrates and produces short-chain fatty acids (SCFA) that feed colonocytes

• Prevents colonization by pathogens (“defending the territory”)

• May antagonize other bacteria

• Educates the immune system

How can we tell who is in there?

• Each bacterium needs to produce proteins, hence it has ribosomes (and ribosomal DNA encoded in its genome)

• While large parts of the 16S ribosomal DNA are “conserved”, the V1 and V6 regions are highly specific to particular bacterial strains

• The V1 and V6 regions (24-72 bp) of all known intestinal bacteria were selected as the basis for probe design

• 3699 unique probes were printed on the microarray slide

• We thus have a culture-independent tool to detect and quantify ~1000 bacterial strains simultaneously
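The logic behind probe selection can be sketched as a specificity check: a probe is useful for identification only if it matches the variable region of exactly one target. This minimal sketch assumes exact substring matching on hypothetical toy sequences; real probe design also accounts for hybridization to the complementary strand and partial mismatches, and real probes are far longer than these.

```python
def specific_probes(probes: list[str], regions: dict[str, str]) -> dict[str, str]:
    """Keep probes that occur in exactly one reference region (exact match only)."""
    specific = {}
    for probe in probes:
        hits = [name for name, seq in regions.items() if probe in seq]
        if len(hits) == 1:  # a unique hit means the probe identifies one strain
            specific[probe] = hits[0]
    return specific

# Hypothetical toy variable-region sequences for two strains.
regions = {
    "strain_A": "ACGTTGCAAGGT",
    "strain_B": "ACGTTGCCTTAG",
}
candidates = ["ACGTTGC", "CAAGG", "CCTTA", "GGGG"]
print(specific_probes(candidates, regions))
# {'CAAGG': 'strain_A', 'CCTTA': 'strain_B'}
```

The shared prefix "ACGTTGC" plays the role of a conserved stretch: it hits both strains and is discarded, while "GGGG" hits nothing.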

HITChip-based experiment

[Figure: workflow from microbial community DNA/RNA, through nucleic acid extraction & labelling, to data analysis (profiling, identification, quantification)]

• …You end up with a data frame: the signal intensity of each probe in each sample

• These can be combined with a phylogenetic map to obtain the abundance of particular bacteria at genus or phylum level
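Collapsing probe-level signals to genus (or phylum) level amounts to grouping probes by taxon and summarizing each group. A minimal sketch, assuming each probe maps to a single taxon and using a plain sum as the summary; the probe IDs, mapping and signals are hypothetical, and real pipelines may use more robust summaries and must handle probes that target several taxa.

```python
from collections import defaultdict

def aggregate_by_taxon(intensities, probe_to_taxon):
    """Sum probe signals per sample into taxon-level abundances.

    intensities: {sample: {probe: signal}}
    probe_to_taxon: {probe: taxon}; probes missing from the map are skipped.
    """
    abundances = {}
    for sample, signals in intensities.items():
        totals = defaultdict(float)
        for probe, signal in signals.items():
            taxon = probe_to_taxon.get(probe)
            if taxon is not None:
                totals[taxon] += signal
        abundances[sample] = dict(totals)
    return abundances

# Hypothetical probe IDs, probe-to-genus mapping and signals for two samples.
probe_to_taxon = {"p1": "Bacteroides", "p2": "Bacteroides", "p3": "Clostridium"}
intensities = {
    "sample1": {"p1": 2.0, "p2": 3.0, "p3": 1.0},
    "sample2": {"p1": 0.5, "p3": 4.0, "p9": 7.0},  # p9 has no mapping
}
print(aggregate_by_taxon(intensities, probe_to_taxon))
# {'sample1': {'Bacteroides': 5.0, 'Clostridium': 1.0}, 'sample2': {'Bacteroides': 0.5, 'Clostridium': 4.0}}
```

Swapping the genus map for a phylum map gives phylum-level abundances with the same function.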

Our problem

• Ulcerative colitis (UC) is a chronic (long-lasting) disease that causes inflammation (irritation or swelling) and ulcers on the inner lining of the large intestine

• Irritable bowel syndrome (IBS) is a group of symptoms, including pain or discomfort in the abdomen and changes in bowel movement patterns, that occur together.

• IBS is a functional gastrointestinal disorder.

Endoscopy of the large intestine is the most accurate method for diagnosing ulcerative colitis

We need a less invasive alternative!

Calprotectin

• A marker of neutrophilic intestinal inflammation

• One large (n=2499) meta-analysis gave pooled estimates of sensitivity (0.88) and specificity (0.73) of fecal calprotectin for the assessment of endoscopically defined disease activity in UC (Mosli et al., 2015)

• At the same time, Clostridium sphenoides and Haemophilus strongly correlate with calprotectin levels! (Kolho et al., 2015)
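Sensitivity and specificity, as quoted for calprotectin here and for the UC class of the supervised model, are simple ratios over a confusion table. The sketch below computes them one-vs-rest for a chosen class from hypothetical toy labels, and derives likelihood ratios from the pooled calprotectin estimate; the likelihood ratios are a standard derived quantity, not a figure reported in the meta-analysis summary above.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

def likelihood_ratios(sens, spec):
    """Positive and negative likelihood ratios of a diagnostic test."""
    return sens / (1 - spec), (1 - sens) / spec

# Hypothetical toy labels: HC = healthy control, IBS-D = diarrhea-predominant IBS.
y_true = ["UC", "UC", "IBS-D", "IBS-D", "HC", "HC"]
y_pred = ["UC", "IBS-D", "UC", "IBS-D", "HC", "HC"]
print(sensitivity_specificity(y_true, y_pred, "UC"))  # (0.5, 0.75)

# Pooled calprotectin estimate: sensitivity 0.88, specificity 0.73.
lr_pos, lr_neg = likelihood_ratios(0.88, 0.73)
print(round(lr_pos, 2), round(lr_neg, 2))  # 3.26 0.16
```

In the toy labels, the single UC/IBS-D swap in each direction is enough to halve sensitivity and cost a quarter of the specificity for the UC class.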

Our data: 150 patients × 3699 features

• A preliminary constrained redundancy analysis (RDA) at phylum level confirmed a significant effect of health status on microbial composition (p < 0.01)

• Health status alone explained as much as 10.2% of the variability, even after the effects of ProjectID and gender were partialled out.
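The "variance explained after partialling out" figure comes from a partial RDA: the community matrix and the explanatory variables are both residualized on the conditioning variables, and the constrained fraction is the share of total variance captured by regressing one set of residuals on the other. In R this is typically done with the vegan package (rda() with a Condition() term, and anova() for the permutation p-value). The NumPy sketch below computes only the explained fraction, without the permutation test, on simulated data; the variable layout is an assumption.

```python
import numpy as np

def residualize(A, B):
    """Residuals of A after least-squares regression on B plus an intercept."""
    B1 = np.column_stack([np.ones(len(A)), B])
    coef, *_ = np.linalg.lstsq(B1, A, rcond=None)
    return A - B1 @ coef

def partial_rda_fraction(Y, X, Z):
    """Share of total variance in Y explained by X after partialling out Z."""
    Yc = Y - Y.mean(axis=0)
    total = np.sum(Yc ** 2)
    Yr = residualize(Yc, Z)  # remove conditioning effects from the response
    Xr = residualize(X, Z)   # and from the constraints
    coef, *_ = np.linalg.lstsq(Xr, Yr, rcond=None)
    fitted = Xr @ coef
    return np.sum(fitted ** 2) / total

# Simulated example: 60 samples, 5 "phyla", a 3-level health status
# (dummy-coded against a reference level) and 2 conditioning covariates.
rng = np.random.default_rng(0)
n = 60
status = rng.integers(0, 3, size=n)
X = np.column_stack([status == 1, status == 2]).astype(float)
Z = rng.normal(size=(n, 2))  # continuous stand-ins for ProjectID and gender codes
Y = X @ rng.normal(size=(2, 5)) + Z @ rng.normal(size=(2, 5)) + rng.normal(size=(n, 5))
print(round(partial_rda_fraction(Y, X, Z), 3))
```

Because the denominator is the total variance of Y, the conditioning variables still count toward the total, which is how a figure like "10.2% of variability" is usually reported.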

Supervised machine learning procedures

• The most successful model, from the elastic net family (sensitivity = 0.5, specificity = 0.98 for the UC class), used only 89 unique probes, and all of them made biological sense!

• The most common error was mistaking the diarrhea-predominant IBS subtype for UC and vice versa

• Why we love R: the ‘caret’ package implements a large number of models under one interface, and we preferred algorithms with built-in feature selection
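Elastic net combines an L1 (lasso) penalty, which drives many coefficients exactly to zero and so performs feature selection, with an L2 (ridge) penalty that stabilizes correlated features; this is how such a model can end up using only a small subset of the 3699 probes. In caret this family is usually fitted through glmnet. The sketch below is not that code: it is a from-scratch proximal-gradient fit of an elastic-net penalized logistic regression on simulated data, borrowing glmnet's naming (lam for penalty strength, alpha for the L1/L2 mix). Step size, iteration count and data are assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the L1 norm: shrink towards zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def elastic_net_logistic(X, y, lam=0.1, alpha=0.5, lr=0.2, n_iter=3000):
    """Elastic-net penalized logistic regression via proximal gradient descent.

    Minimizes mean log-loss + lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2);
    the intercept b is not penalized.
    """
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        resid = prob - y
        # Gradient of the smooth part (log-loss plus ridge term).
        grad_w = X.T @ resid / n + lam * (1 - alpha) * w
        grad_b = resid.mean()
        # Gradient step, then the L1 proximal step that zeroes small weights.
        w = soft_threshold(w - lr * grad_w, lr * lam * alpha)
        b -= lr * grad_b
    return w, b

# Simulated data: 20 features, but only features 0 and 1 drive the label.
rng = np.random.default_rng(1)
n, p = 200, 20
X = rng.normal(size=(n, p))
y = (X[:, 0] - X[:, 1] > 0).astype(float)
w, b = elastic_net_logistic(X, y, lam=0.1, alpha=0.9)
print("selected features:", np.flatnonzero(w))
```

In practice features are standardized before fitting (glmnet does this by default), and lam and alpha are tuned by cross-validation, which is what caret's resampling machinery is for.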

Yet, the situation gets even more complicated…

…which means that one in 10 labels in our training set could be wrong!

And that calls for semi-supervised methods…

• These methods use both labeled and unlabeled data to train the model

• We are currently experimenting with the upclass package and are in correspondence with the MDs

• …and implementing learning algorithms of our own
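As an illustration of the semi-supervised idea, the sketch below runs EM for a Gaussian mixture with diagonal covariances: labeled samples keep their class fixed, while unlabeled samples get soft class memberships that are re-estimated each iteration. It is a generic sketch of the approach, not upclass's API or exact algorithm; the diagonal-covariance model, iteration count and simulated data are assumptions.

```python
import numpy as np

def semi_supervised_em(X_lab, y_lab, X_unl, n_classes, n_iter=50, eps=1e-6):
    """EM for a diagonal-covariance Gaussian mixture with partly labeled data.

    Labeled rows keep one-hot responsibilities; unlabeled rows are updated in
    every E-step. Returns priors, means, variances and the posterior class
    probabilities of the unlabeled rows.
    """
    X = np.vstack([X_lab, X_unl])
    n_lab = len(X_lab)
    R = np.full((len(X), n_classes), 1.0 / n_classes)
    R[:n_lab] = 0.0
    R[np.arange(n_lab), y_lab] = 1.0
    for _ in range(n_iter):
        # M-step: weighted priors, means and per-feature variances.
        Nk = R.sum(axis=0)
        pi = Nk / Nk.sum()
        mu = (R.T @ X) / Nk[:, None]
        var = np.stack([
            (R[:, [k]] * (X - mu[k]) ** 2).sum(axis=0) / Nk[k]
            for k in range(n_classes)
        ]) + eps
        # E-step: log-posterior of each class for every row (shape n x K).
        diff2 = (X[:, None, :] - mu[None, :, :]) ** 2
        log_p = np.log(pi) - 0.5 * np.sum(
            np.log(2 * np.pi * var)[None] + diff2 / var[None], axis=2)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_p)
        post /= post.sum(axis=1, keepdims=True)
        R[n_lab:] = post[n_lab:]  # only unlabeled rows are updated
    return pi, mu, var, R[n_lab:]

# Simulated data: 5 labeled and 100 unlabeled samples per class.
rng = np.random.default_rng(2)
means = np.array([[0.0, 0.0], [5.0, 5.0]])
X_lab = np.vstack([rng.normal(m, 1.0, size=(5, 2)) for m in means])
y_lab = np.repeat([0, 1], 5)
X_unl = np.vstack([rng.normal(m, 1.0, size=(100, 2)) for m in means])
y_unl = np.repeat([0, 1], 100)  # true classes, used only for checking
pi, mu, var, post = semi_supervised_em(X_lab, y_lab, X_unl, n_classes=2)
accuracy = np.mean(post.argmax(axis=1) == y_unl)
print(f"accuracy on unlabeled rows: {accuracy:.2f}")
```

Only ten labels are needed here because the unlabeled rows do most of the work of locating the class densities.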