Probabilistic Programming in Python

Probabilistic Programming

A Brief Introduction to Probabilistic Programming and Python

EuroSciPy, University of Cambridge, August 2015

peadarcoyle@googlemail.com

All opinions my own

Who am I?

I work as a Data Scientist for a large Telecommunications Company

- Master's in Mathematics
- Interned at Amazon
- Was a consultant for a while
- Occasional contributor to Pandas and other projects
- Co-organizer of the Data Science Meetup in Luxembourg
- Member of the Royal Statistical Society and NumFOCUS
- @springcoil

What is Probabilistic Programming?

- Basically, using random variables instead of fixed variables
- Allows you to create a generative story rather than a black box
- A different tool from Machine Learning
- A different paradigm from frequentist statistics
- Forces you to be explicit about your 'subjective' assumptions
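A minimal sketch of a "generative story" in plain Python (a hypothetical coin example, not PyMC3): the unknown quantity is itself a random variable drawn from a prior, and the data are then drawn given it. The function name `generate_flips` and the Beta(1, 1) prior are illustrative assumptions.

```python
import random

def generate_flips(n_flips, seed=None):
    """Generative story: draw the unknown coin bias from a prior,
    then draw every flip given that bias."""
    rng = random.Random(seed)
    # The bias is a random variable: Beta(1, 1), i.e. Uniform(0, 1).
    bias = rng.betavariate(1, 1)
    # Each flip is a Bernoulli(bias) random variable (1 = heads, 0 = tails).
    flips = [1 if rng.random() < bias else 0 for _ in range(n_flips)]
    return bias, flips

bias, flips = generate_flips(20, seed=42)
print(f"hidden bias = {bias:.3f}, heads = {sum(flips)} / {len(flips)}")
```

Reading the function top to bottom tells the story of how the data came to be; inference runs that story backwards, from the observed flips to a distribution over `bias`.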

Bayesian Statistics

- I studied Mathematics and encountered Bayesian statistics in textbooks
- This is a hard area to do with pen and paper: most of the integrals involved can't be solved in closed form
- Thankfully, Monte Carlo simulation methods were invented
- These simulations are used to approximate the posterior distribution (and the integrals over it)
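Monte Carlo methods replace an intractable integral with an average over random draws. Below is a minimal sketch of one such method, random-walk Metropolis (a Markov chain Monte Carlo algorithm), written in plain Python rather than using PyMC3's samplers. It assumes a hypothetical coin that landed heads 7 times in 10 flips under a uniform prior; that posterior happens to be Beta(8, 4), so its exact mean of 2/3 is available as a check. The function names, step size and sample counts are illustrative choices.

```python
import math
import random

def log_posterior(p, heads, n):
    """Unnormalised log-posterior of a coin bias p: Uniform(0, 1) prior times Binomial likelihood."""
    if not 0.0 < p < 1.0:
        return -math.inf  # the prior gives zero density outside (0, 1)
    return heads * math.log(p) + (n - heads) * math.log(1.0 - p)

def metropolis(heads, n, n_samples=20_000, step=0.1, burn_in=2_000, seed=0):
    """Random-walk Metropolis: correlated draws from the posterior, no normalising integral needed."""
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p, heads, n)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = p + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, n)
        # Accept with probability min(1, posterior ratio); the normalising constant cancels.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        if i >= burn_in:
            samples.append(p)
    return samples

samples = metropolis(heads=7, n=10)
estimate = sum(samples) / len(samples)
exact = (7 + 1) / (10 + 2)  # mean of the exact Beta(8, 4) posterior
print(f"MCMC posterior mean ~ {estimate:.3f}, exact = {exact:.3f}")
```

The key design point is that only ratios of posterior densities are ever needed, so the hard integral in the denominator of Bayes' theorem never has to be computed.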

Some terminology
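For reference, the standard terms used throughout (prior, likelihood, posterior) fit together through Bayes' theorem. The notation here is an assumption: θ stands for the model parameters and y for the observed data. The evidence integral in the denominator is usually the part that cannot be solved in closed form.

```latex
\underbrace{p(\theta \mid y)}_{\text{posterior}}
  = \frac{\overbrace{p(y \mid \theta)}^{\text{likelihood}} \,
          \overbrace{p(\theta)}^{\text{prior}}}
         {\underbrace{p(y)}_{\text{evidence}}},
\qquad
p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta
```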

How do you pick your prior?

- This is a bit of an art
- You generally base the prior on experience
- As you add more data, this matters less and less
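One quick way to see the prior "matter less and less" is the conjugate Beta-Binomial model, where the posterior mean has a closed form. This sketch assumes hypothetical data with a 30% success rate and compares a flat Beta(1, 1) prior against a confident Beta(20, 2) prior; the names are illustrative.

```python
def posterior_mean(a, b, successes, trials):
    """Posterior mean of a success rate under a conjugate Beta(a, b) prior."""
    return (a + successes) / (a + b + trials)

FLAT_PRIOR = (1, 1)        # no preference for any rate
CONFIDENT_PRIOR = (20, 2)  # hypothetical strong belief that the rate is about 0.91

gaps = {}
for trials, successes in ((10, 3), (1000, 300)):  # hypothetical data: 30% successes
    flat = posterior_mean(*FLAT_PRIOR, successes, trials)
    confident = posterior_mean(*CONFIDENT_PRIOR, successes, trials)
    gaps[trials] = abs(flat - confident)
    print(f"n={trials}: flat prior -> {flat:.3f}, confident prior -> {confident:.3f}")
```

With 10 observations the two posterior means differ by about 0.39; with 1,000 observations the gap shrinks to about 0.013, because the data terms dominate the prior pseudo-counts.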

Huh, but isn't Probabilistic Programming just Stan and BUGS?

No, in Python you have PyMC3

- A complete rewrite of PyMC2, now in beta
- Based on Theano
- Computational techniques for handling gradients
- Automatic differentiation and GPU speedup
- Theano is also used in deep learning!
- Currently there is a project to port BMH (Bayesian Methods for Hackers) from PyMC2 to PyMC3
- I gave a thorough tutorial on this (see my GitHub)
- Key authors: John Salvatier, Thomas Wiecki, Chris Fonnesbeck
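Gradients matter because samplers such as NUTS in PyMC3 use the gradient of the log-density to move efficiently. Theano builds a symbolic graph and derives gradient expressions from it; the sketch below instead swaps in a different, simpler technique, forward-mode automatic differentiation with dual numbers, purely to illustrate how exact derivatives fall out once every operation knows its own derivative rule. The `Dual` class and `log_density_core` function are hypothetical helpers, not part of Theano or PyMC3.

```python
class Dual:
    """A dual number value + deriv*eps with eps**2 == 0 (forward-mode autodiff)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(x):
        # Treat plain numbers as constants (derivative 0).
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual._lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        return Dual._lift(other) - self

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def log_density_core(mu, y):
    """mu-dependent part of a unit-variance Normal log-density: -(y - mu)**2 / 2."""
    return -0.5 * (y - mu) * (y - mu)


# Seed mu's derivative with 1.0 to get d/dmu alongside the value.
out = log_density_core(Dual(1.0, 1.0), 3.0)
print(out.value, out.deriv)  # -2.0 2.0; the analytic derivative y - mu is also 2.0
```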

Case study: Rugby Analytics

I wanted to do a model of the Six Nations last year.

I wanted to build an understandable model to predict the winner

Key Info: Inferring the 'strength' of each team.

We only have scoring data, which is noisy; hence Bayesian statistics

What did I do?

1. I picked Gamma as a prior for all teams

2. I used a hierarchical model because I wanted home advantage to be stronger for stronger teams

3. From this I was able to create a novel model based only on historical results and scoring intensity

4. I sampled from the posterior distribution using MCMC
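To make "scoring intensity" concrete, here is a hedged generative sketch of the kind of structure often used for this problem: each side's points are a Poisson draw whose log-rate is a baseline plus attack minus the opponent's defence, plus a home boost. This is not the author's model: the team names and every parameter value are hypothetical, the home advantage is a single shared constant, the hierarchical structure and Gamma priors above are not reproduced, and Poisson point counts are only a rough simplification for rugby.

```python
import math
import random

# Hypothetical parameters for illustration only (not fitted to any real data).
INTERCEPT = math.log(20.0)   # baseline of roughly 20 points per team per match
HOME_ADVANTAGE = 0.1         # shared home boost on the log scale
ATTACK = {"Team A": 0.2, "Team B": 0.0, "Team C": -0.3}
DEFENCE = {"Team A": 0.1, "Team B": 0.0, "Team C": -0.25}

def poisson(rng, lam):
    """Draw from Poisson(lam) with Knuth's multiplication method (adequate for rates in the tens)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def scoring_rates(home, away):
    """Log-linear intensities: log(rate) = intercept (+ home boost) + attack - opponent defence."""
    home_rate = math.exp(INTERCEPT + HOME_ADVANTAGE + ATTACK[home] - DEFENCE[away])
    away_rate = math.exp(INTERCEPT + ATTACK[away] - DEFENCE[home])
    return home_rate, away_rate

def simulate_match(home, away, rng):
    """One simulated result: each side's points are an independent Poisson draw."""
    home_rate, away_rate = scoring_rates(home, away)
    return poisson(rng, home_rate), poisson(rng, away_rate)

rng = random.Random(2015)
print(simulate_match("Team A", "Team C", rng))
```

Fitting reverses this: given observed scores, MCMC infers a posterior over `ATTACK`, `DEFENCE` and the home term instead of fixing them by hand.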

Run the model

What actually happened

The model incorrectly predicted that England would come out on top. Ireland actually won on points difference, by a margin of 6 points. It really came down to the wire!

"Prediction is difficult, especially about the future."

One of the problems is what we call 'over-shrinkage'. You can delve into the results to see what the errors are; the actual result was within my model's margin of error.

Hat tip: thanks to Abraham Flaxman and the PyMC3 developers for helping me port this from PyMC2 to PyMC3.
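Over-shrinkage is easiest to see in the simplest hierarchical model: a normal-normal model with known variances (a textbook simplification, not the rugby model). Each team's estimate is a precision-weighted average of its own observed mean and the league-wide mean. If the between-team spread `tau` is too small, every team is pulled towards the middle. The function name and all numbers below are hypothetical.

```python
def partial_pool(group_mean, n_obs, sigma, grand_mean, tau):
    """Posterior mean of one team's effect in a normal-normal model with known variances:
    theta_j ~ Normal(grand_mean, tau**2), each observation ~ Normal(theta_j, sigma**2)."""
    data_precision = n_obs / sigma**2
    prior_precision = 1.0 / tau**2
    weight = data_precision / (data_precision + prior_precision)  # trust in the team's own data
    return weight * group_mean + (1.0 - weight) * grand_mean

# Hypothetical average points differences over 5 matches for three teams.
observed = {"Team A": 12.0, "Team B": 4.0, "Team C": -16.0}
estimates = {
    tau: {team: partial_pool(m, n_obs=5, sigma=15.0, grand_mean=0.0, tau=tau)
          for team, m in observed.items()}
    for tau in (10.0, 1.0)
}
for tau, pooled in estimates.items():
    print(tau, {team: round(v, 2) for team, v in pooled.items()})
```

With `tau = 10` each team keeps roughly 69% of its observed margin; with `tau = 1` only about 2% survives, so the three teams become almost indistinguishable.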

Lessons learned

I can build an explainable model using PyMC2 and PyMC3

Generative stories help you build up interest with your colleagues

Communication is the 'last mile' problem of Data Science

PyMC3 is cool: please use it, and please contribute!

Wanna learn more?

BMH

Jake VanderPlas

PyMC3

peadarcoyle@googlemail.com