Ulcerative colitis, irritable bowel syndrome and microbiota
Can machine learning help with forecasting from poop?
Bojic Svetlana, R user, MD, University of Belgrade
Human microbiota
1 human cell for every 10 microbial cells
• Are we only humans (Homo sapiens)?
1 human gene for every 100-1000 microbial genes
…if there was democracy in our bodies…
Gut microbiota: the "forgotten organ"
• Synthesize and excrete vitamins, transform non-digestible carbohydrates, produce SCFA that feed colonocytes
• Prevent colonization by pathogens ("defending the territory")
• May antagonize other bacteria
• Educate the immune system
How can we tell who is in there?
• Each bacterium needs to produce proteins, hence it has ribosomes (and ribosomal DNA encoded in its genome)
• While large parts of 16S ribosomal DNA are "conserved", the V1 and V6 regions are highly specific to a particular bacterial strain
• V1 and V6 regions (24-72 bp) of all known intestinal bacteria were selected as the basis for the probe design
• 3699 unique probes were printed on the microarray slide
• We have a culture-independent tool to assess and quantify the presence of ~1000 bacterial strains simultaneously
HITChip-based experiment
[Figure: workflow from the microbial community, through nucleic acid (DNA/RNA) extraction and labelling, to data analysis (profiling, identification, quantification)]
• …You end up with a data frame: the intensity of the signal from a given probe, per sample
• One could combine those with a phylogenetic map to get the abundance of particular bacteria at genus or phylum level
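The probe-to-taxon roll-up described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the probe IDs, genus assignments and intensities are invented toy values, and simple summation of probe signals per genus is only one plausible aggregation rule, not necessarily the one used in the HITChip pipeline.

```python
from collections import defaultdict

# Hypothetical phylogenetic map: probe ID -> genus (toy values, not real HITChip probes).
probe_to_genus = {
    "probe_1": "Bacteroides",
    "probe_2": "Bacteroides",
    "probe_3": "Faecalibacterium",
}

# Hypothetical data frame, one entry per sample: signal intensity per probe.
samples = {
    "sample_A": {"probe_1": 2.0, "probe_2": 3.0, "probe_3": 5.0},
    "sample_B": {"probe_1": 1.0, "probe_2": 1.0, "probe_3": 8.0},
}


def aggregate_by_taxon(intensities, probe_to_taxon):
    """Sum probe intensities over all probes mapped to the same taxon."""
    totals = defaultdict(float)
    for probe, value in intensities.items():
        taxon = probe_to_taxon.get(probe)
        if taxon is not None:  # skip probes that are missing from the map
            totals[taxon] += value
    return dict(totals)


def relative_abundance(totals):
    """Scale taxon totals so that they sum to 1 within one sample."""
    grand_total = sum(totals.values())
    return {taxon: value / grand_total for taxon, value in totals.items()}


# Genus-level table: sample -> {genus: summed intensity}.
genus_level = {
    name: aggregate_by_taxon(row, probe_to_genus) for name, row in samples.items()
}
```

Swapping the genus map for a probe-to-phylum map would give the phylum-level table in the same way.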
Our problem
• Ulcerative colitis (UC) is a chronic, or long-lasting, disease that causes inflammation (irritation or swelling) and ulcers on the inner lining of the large intestine
• Irritable bowel syndrome (IBS) is a group of symptoms, including pain or discomfort in your abdomen and changes in your bowel movement patterns, that occur together.
• It's a functional gastrointestinal disorder.
Endoscopy of the large intestine is the most accurate method for diagnosing ulcerative colitis
We need a less invasive alternative!
Calprotectin
• Marker of neutrophilic intestinal inflammation
• One large (n=2499) meta-analysis gave pooled estimates of sensitivity and specificity for calprotectin (0.88, 0.73) for assessment of endoscopically defined disease activity in UC (Mosli et al., 2015)
• On the other hand, Clostridium sphenoides and Haemophilus strongly correlate with calprotectin levels! (Kolho et al., 2015)
Our data: 150 patients × 3699 features
• Preliminary constrained RDA at phylum level confirmed that health status has a significant effect on microbial composition (p<0.01)
• Health status alone could explain as much as 10.2% of the variability, even when the influence of ProjectID and gender was partialed out.
Supervised machine learning procedures
• The most successful model, from the elastic net family (sensitivity=0.5, specificity=0.98 for the UC class), used only 89 unique probes ...and all made biological sense!
• The most common error was mistaking the diarrhea-predominant IBS subtype for UC and vice versa
• Why we love R: the 'caret' package has a number of models implemented; we preferred algorithms with built-in feature selection
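For readers less used to per-class metrics, the sketch below shows how sensitivity and specificity for one class (here UC against everything else) are computed from predictions. It is a minimal illustration in plain Python: the labels and predictions are invented toy values, the "control" class is a hypothetical placeholder, and the function name is not taken from caret or any other package.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for the class `positive`."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive case correctly detected
            else:
                fn += 1  # positive case missed
        else:
            if pred == positive:
                fp += 1  # other class wrongly called positive
            else:
                tn += 1  # other class correctly not called positive
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels (hypothetical): 4 UC, 3 IBS, 3 control.
y_true = ["UC", "UC", "UC", "UC", "IBS", "IBS", "IBS", "control", "control", "control"]
y_pred = ["UC", "UC", "IBS", "IBS", "IBS", "IBS", "IBS", "control", "control", "UC"]

sens, spec = sensitivity_specificity(y_true, y_pred, positive="UC")
```

In this toy case half of the UC patients are detected (two are mistaken for IBS), while one non-UC patient is wrongly flagged as UC. A model with low sensitivity and high specificity, like the one reported above, rarely raises false alarms but misses many true cases.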
Yet the situation gets more complicated…
…which means that one out of 10 labels in our training set could be wrong!
And that calls for semi-supervised methods…
• These are the methods that make use of both labeled and unlabeled data to train the model
• We are currently experimenting with the upclass package, and are in correspondence with MDs
• …plus implementing learning algorithms of our own
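To make the semi-supervised idea concrete, here is a minimal self-training sketch in plain Python. It is not the algorithm implemented in the upclass package and not the authors' own method: it swaps in a nearest-centroid classifier that adopts an unlabeled sample only when the nearest class is clearly closer than the runner-up. The feature vectors, labels and the `min_margin` threshold are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def fit_centroids(X, y):
    """One centroid per class label."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: centroid(pts) for label, pts in groups.items()}


def predict_with_margin(centroids, x):
    """Nearest label, plus the gap between runner-up and nearest distance."""
    dists = sorted((math.dist(x, c), label) for label, c in centroids.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(X_lab, y_lab, X_unlab, min_margin=1.0, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled samples and refit."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        centroids = fit_centroids(X_lab, y_lab)
        confident, rest = [], []
        for x in pool:
            label, margin = predict_with_margin(centroids, x)
            (confident if margin >= min_margin else rest).append((x, label))
        if not confident:
            break  # nothing new to learn from
        for x, label in confident:
            X_lab.append(x)
            y_lab.append(label)
        pool = [x for x, _ in rest]
    return fit_centroids(X_lab, y_lab), pool


# Toy data (hypothetical): two labeled samples, three unlabeled ones.
X_lab = [[0.0, 0.0], [4.0, 4.0]]
y_lab = ["IBS", "UC"]
X_unlab = [[0.5, 0.0], [4.0, 3.5], [2.0, 2.0]]

centroids, still_unlabeled = self_train(X_lab, y_lab, X_unlab)
```

The ambiguous point halfway between the two classes is left unlabeled rather than forced into one of them, which is the conservative choice when some training labels may already be wrong.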