Probabilistic Programming: A Brief Introduction to Probabilistic Programming and Python. EuroSciPy, University of Cambridge, August 2015. [email protected] All opinions my own.

Probabilistic Programming in Python


Page 1: Probabilistic Programming in Python

Probabilistic Programming

A Brief introduction to Probabilistic Programming and Python

EuroSciPy - University of Cambridge August 2015

[email protected]

All opinions my own

Page 2: Probabilistic Programming in Python

Who am I?

I work as a Data Scientist for a large Telecommunications Company

Masters in Mathematics

Interned at Amazon

Was a consultant for a while

Occasional contributor to Pandas and other projects

Co-organizer of the Data Science Meetup in Luxembourg

Member of Royal Statistical Society and NumFOCUS

@springcoil

Page 3: Probabilistic Programming in Python

What is Probabilistic Programming?

Basically using random variables instead of variables

Allows you to create a generative story rather than a black box

A different tool to Machine Learning

A different paradigm to frequentist statistics

Forces you to be explicit about your 'subjective' assumptions
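As a minimal sketch of "random variables instead of variables", the snippet below contrasts an ordinary variable with a random variable built from a two-step generative story. It uses only the Python standard library rather than PyMC3, and the distributions, the numbers, and the name `simulate_points` are illustrative assumptions, not taken from the talk.

```python
import random
import statistics

# An ordinary variable holds one fixed value.
points_fixed = 20.0


def simulate_points(rng):
    """One draw from a two-step generative story (illustrative numbers)."""
    # Step 1: the underlying scoring rate is uncertain (Gamma with mean 20).
    rate = rng.gammavariate(20.0, 1.0)  # arguments are shape and scale
    # Step 2: the observed value is a noisy measurement of that rate.
    return rng.gauss(rate, 3.0)


rng = random.Random(42)
draws = [simulate_points(rng) for _ in range(10_000)]

# A random variable is summarised by a distribution, not a single number.
print("fixed value:", points_fixed)
print("simulated mean:", round(statistics.mean(draws), 2))
print("simulated std dev:", round(statistics.stdev(draws), 2))
```

Each assumption is written down as a line of code, which is what makes the story explicit rather than a black box.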

Page 6: Probabilistic Programming in Python

Bayesian Statistics

I studied Mathematics, and encountered Bayesians in textbooks

This is a hard area to do by pen and paper, and most integrals can't be solved in exact form

Thankfully Monte Carlo simulations were invented

These simulations are used to approximate the posterior distribution
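To illustrate the idea of replacing a hard integral with simulation, here is a small sketch of plain Monte Carlo integration using only the standard library. The integrand is an assumption chosen because it has no elementary antiderivative; this is simple Monte Carlo, not the MCMC machinery used by PyMC3.

```python
import math
import random


def mc_integral(f, n_samples=100_000, seed=0):
    """Estimate the integral of f over [0, 1] by averaging f at uniform draws."""
    rng = random.Random(seed)
    total = sum(f(rng.random()) for _ in range(n_samples))
    return total / n_samples


def integrand(x):
    """exp(-x^2), which cannot be integrated in closed elementary form."""
    return math.exp(-x * x)


estimate = mc_integral(integrand)
# For this particular integrand a reference value is available via erf.
reference = math.sqrt(math.pi) / 2.0 * math.erf(1.0)

print("Monte Carlo estimate:", round(estimate, 4))
print("reference value:", round(reference, 4))
```

The estimate improves as `n_samples` grows, with an error that shrinks roughly like one over the square root of the number of samples.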

Page 7: Probabilistic Programming in Python
Page 8: Probabilistic Programming in Python

Some terminology

Page 10: Probabilistic Programming in Python

How do you pick your prior?

This is a bit of an art

You generally base the prior on experience

As you add more data this matters less and less
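A small sketch of why the prior matters less with more data, assuming a Beta prior on a success rate with binomial data (a standard conjugate pair). The two priors and the 30% observed rate are made-up numbers for illustration.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Posterior mean of a rate under a Beta(prior_a, prior_b) prior."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


# Two analysts with different prior beliefs (hypothetical numbers).
flat_prior = (1.0, 1.0)         # no strong opinion
optimistic_prior = (10.0, 2.0)  # expects a high rate

gaps = {}
for trials in (10, 100, 1000):
    successes = 3 * trials // 10  # observed rate of 30% (made-up data)
    mean_flat = posterior_mean(*flat_prior, successes, trials)
    mean_optimistic = posterior_mean(*optimistic_prior, successes, trials)
    gaps[trials] = abs(mean_flat - mean_optimistic)
    print(trials, round(mean_flat, 4), round(mean_optimistic, 4))
```

The gap between the two posterior means shrinks as the number of trials grows, so the analysts end up agreeing despite starting from different priors.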

Page 11: Probabilistic Programming in Python
Page 12: Probabilistic Programming in Python

Huh, but isn't Probabilistic Programming just Stan and BUGS?

Page 13: Probabilistic Programming in Python

No, in Python you have PyMC3

A complete rewrite of PyMC2, now in 'Beta' status

Based upon Theano

Computational techniques for handling gradients

Automatic Differentiation and GPU speedup

Theano is also used in deep learning!

Currently there is a project to port 'BMH' from PyMC2 to PyMC3

I gave a thorough tutorial on this: my github

Key authors: John Salvatier, Thomas Wiecki, Chris Fonnesbeck

Page 14: Probabilistic Programming in Python

Case study: Rugby Analytics

I wanted to do a model of the Six Nations last year.

I wanted to build an understandable model to predict the winner

Key Info: Inferring the 'strength' of each team.

We only have scoring data, which is noisy, hence Bayesian statistics

Page 15: Probabilistic Programming in Python

What did I do?

1. I picked Gamma as a prior for all teams

2. I used a Hierarchical Model because I wanted home advantage to be stronger for stronger teams

3. From this I was able to create a novel model based only on historical results and scoring intensity

4. I sampled from the posterior distribution using MCMC
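The steps above can be sketched in miniature. This is not the talk's hierarchical model: it is a deliberately simplified, single-team version that assumes a Gamma prior on scoring intensity, a Poisson likelihood for points, and a hand-written random-walk Metropolis sampler in place of PyMC3. The scores, prior parameters, and function names are all hypothetical.

```python
import math
import random


def log_posterior(rate, scores, shape=2.0, prior_rate=0.1):
    """Unnormalised log posterior: Gamma(shape, prior_rate) prior, Poisson data."""
    if rate <= 0:
        return float("-inf")
    log_prior = (shape - 1.0) * math.log(rate) - prior_rate * rate
    log_likelihood = sum(y * math.log(rate) - rate for y in scores)
    return log_prior + log_likelihood


def metropolis(scores, n_samples=20_000, step=2.0, start=10.0, seed=0):
    """Random-walk Metropolis sampler for the scoring intensity."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current, scores)
    draws = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, scores)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        draws.append(current)
    return draws


scores = [20, 26, 16, 23, 19]  # made-up points for one team
draws = metropolis(scores)
kept = draws[2000:]  # discard burn-in

posterior_mean_estimate = sum(kept) / len(kept)
# This toy model is conjugate, so the exact posterior mean is known.
exact_mean = (2.0 + sum(scores)) / (0.1 + len(scores))

print("MCMC posterior mean:", round(posterior_mean_estimate, 2))
print("exact posterior mean:", round(exact_mean, 2))
```

Because the toy model has a closed-form answer, the sampler can be checked against it; in a hierarchical model with many teams no such formula exists, which is where MCMC earns its keep.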

Page 16: Probabilistic Programming in Python
Page 17: Probabilistic Programming in Python
Page 18: Probabilistic Programming in Python
Page 19: Probabilistic Programming in Python

Run the model

Page 20: Probabilistic Programming in Python
Page 21: Probabilistic Programming in Python

What actually happened

The model incorrectly predicted that England would come out on top.

Ireland actually won by a points difference of 6 points. It really came down to the wire!

"Prediction is difficult, especially about the future"

One of the problems is what we call 'over-shrinkage'. You can delve into the results to see what the errors are; my model was within the errors.

Hat tip: Thanks to Abraham Flaxman and the PyMC3 community for helping me port this from PyMC2 to PyMC3
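To make 'shrinkage' concrete, here is a sketch under a simplifying assumption: a normal-normal model with known variances, where each team's estimate is a weighted average of its own mean and the mean across all teams. This is not the model from the talk, and the function name and numbers are hypothetical.

```python
def shrink(team_mean, grand_mean, n_matches, noise_var, between_var):
    """Partial-pooling estimate of a team's strength (normal-normal model)."""
    # Weight on the team's own data: high when teams genuinely differ
    # (large between_var) or when there are many matches.
    weight = between_var / (between_var + noise_var / n_matches)
    return weight * team_mean + (1.0 - weight) * grand_mean


# A strong team averaging 30 points when the league average is 20.
mild = shrink(30.0, 20.0, n_matches=5, noise_var=25.0, between_var=20.0)
heavy = shrink(30.0, 20.0, n_matches=5, noise_var=25.0, between_var=1.0)

print("mild shrinkage:", round(mild, 2))
print("heavy shrinkage:", round(heavy, 2))
```

When the model believes teams barely differ (small `between_var`), the strong team is pulled almost all the way back to the league average, which is one way to picture over-shrinkage flattening real differences between teams.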

Page 22: Probabilistic Programming in Python

Lessons learned

I can build an explainable model using PyMC2 and PyMC3

Generative stories help you build up interest with your colleagues

Communication is the 'last mile' problem of Data Science

PyMC3 is cool: please use it and please contribute

Page 24: Probabilistic Programming in Python