Probabilistic Programming, MCMC, and Theano with PyMC
2014/7/12, BUGS/Stan study group #3, @xiangze750

PyMC mcmc


Description: an introduction to PyMC (original slides in Japanese)


Page 1: PyMC mcmc

Probabilistic Programming, MCMC, and Theano with PyMC

2014/7/12, BUGS/Stan study group #3, @xiangze750

Page 2: PyMC mcmc

Agenda

● Bayesian modeling in Python
● How to use PyMC
● "Probabilistic Programming and Bayesian Methods for Hackers"
● A PyMC blog worth consulting: "While My MCMC Gently Samples"
● Integration with Theano and GPUs
● Appendix: Theano, HMC

Page 3: PyMC mcmc

Bayesian modeling in Python

● PyStan
● PyMC

Page 4: PyMC mcmc

Advantages of PyMC

● Easy to install
● Modeling, execution, and visualization can all be done in Python.
● Speedup through C++ (Theano)
  – HMC, NUTS
  – GPU use?

Page 5: PyMC mcmc

Install

● PyMC 2.3

    pip install pymc

● PyMC 3 (under development)

    pip install git+https://github.com/pymc-devs/pymc

● If that does not work

    git clone https://github.com/pymc-devs/pymc
    cd pymc
    python setup.py install

Page 6: PyMC mcmc

Documents

● User's guide
  – http://pymc-devs.github.io/pymc/
● Tutorial
  – https://github.com/fonnesbeck/pymc_tutorial
● Probabilistic Programming and Bayesian Methods for Hackers
  – http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Prologue/Prologue.ipynb

Page 7: PyMC mcmc

How to use PyMC

● Basic syntax
● Model building
  – Stochastic variables (distributions)
  – Deterministic variables: @pm.deterministic
  – Observed variables
● Sampling
● Traceplot, histogram

Page 8: PyMC mcmc

Probabilistic Programming and Bayesian Methods for Hackers

A Bayesian textbook written as IPython notebooks using PyMC.

● 1. Introduction
● 2. More PyMC
● 3. MCMC
● 4. The Greatest Theorem Never Told (The Law of Large Numbers)
● 5. Loss Functions
● 6. Priorities
● 7. Bayesian Machine Learning

http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/tree/master/

Page 9: PyMC mcmc

How to use PyMC: variables

● Stochastic variables

    import pymc as pm  # PyMC 2.x

    # Prior distributions
    lambda_1 = pm.Exponential("lambda_1", 1)
    lambda_2 = pm.Exponential("lambda_2", 1)
    tau = pm.DiscreteUniform("tau", lower=0, upper=10)

Corresponds to Stan's parameters block.

http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/MorePyMC.ipynb

Page 10: PyMC mcmc

How to use PyMC: variables

● Stochastic variables

    import numpy as np

    # A custom distribution can also be defined with @pm.stochastic
    @pm.stochastic(dtype=int)
    def switchpoint(value=1900, t_l=1851, t_h=1962):
        if value > t_h or value < t_l:
            return -np.inf  # Invalid values
        else:
            return -np.log(t_h - t_l + 1)  # Uniform log-likelihood

http://pymc-devs.github.io/pymc/modelbuilding.html#the-stochastic-class
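The value returned by switchpoint is a log-probability, so it can be sanity-checked without PyMC. The sketch below is a plain-Python stand-in (the name switchpoint_logp is hypothetical, not a PyMC API) that reproduces the same function and checks that the probabilities over the years 1851 to 1962 add up to one.

```python
import math

def switchpoint_logp(value, t_l=1851, t_h=1962):
    """Log-probability of a discrete uniform on the integers t_l..t_h."""
    if value > t_h or value < t_l:
        return -math.inf  # outside the support
    return -math.log(t_h - t_l + 1)

# Summing exp(logp) over every year in the support should give 1.
total = sum(math.exp(switchpoint_logp(year)) for year in range(1851, 1963))
print(total)
print(switchpoint_logp(1800))  # a year outside the support
```

Returning -inf outside the range is what rules those values out: a proposal with zero probability can never be accepted by the sampler.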

Page 11: PyMC mcmc

How to use PyMC: variables

● List of probability distributions: http://pymc-devs.github.io/pymc/distributions.html#chap-distributions

Page 12: PyMC mcmc

How to use PyMC: variables

● Deterministic variables

    # Put @pm.deterministic before the function definition
    n_data_points = 5  # in CH1 we had ~70 data points

    @pm.deterministic
    def lambda_(tau=tau, lambda_1=lambda_1, lambda_2=lambda_2):
        out = np.zeros(n_data_points)
        out[:tau] = lambda_1  # lambda before tau is lambda1
        out[tau:] = lambda_2  # lambda after tau is lambda2
        return out

The value of lambda switches at tau. The description is procedural.

Corresponds to Stan's transformed parameters block.

http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/MorePyMC.ipynb
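To see what the deterministic function computes for concrete parent values, here is a minimal plain-Python sketch of the same switching rule. The name switch_rate is hypothetical and the inputs are made-up numbers, not draws from the model.

```python
def switch_rate(tau, lambda_1, lambda_2, n_data_points=5):
    """Rate for each data point: lambda_1 before index tau, lambda_2 from tau on."""
    return [lambda_1 if i < tau else lambda_2 for i in range(n_data_points)]

print(switch_rate(2, 1.5, 3.0))   # → [1.5, 1.5, 3.0, 3.0, 3.0]
print(switch_rate(0, 1.5, 3.0))   # → [3.0, 3.0, 3.0, 3.0, 3.0]
```

Because the output is fully determined by tau, lambda_1, and lambda_2, nothing is sampled here; the randomness comes only from the parents.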

Page 13: PyMC mcmc

How to use PyMC: variables

● Observed variables

    # Set observed=True
    data = np.array([10, 25, 15, 20, 35])
    obs = pm.Poisson("obs", lambda_, value=data, observed=True)

Corresponds to Stan's data block.

http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/MorePyMC.ipynb
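An observed variable keeps its value fixed and contributes a likelihood term. As a rough illustration, assuming the usual Poisson log-probability, the sketch below evaluates that term in plain Python for the five data points above. The function name poisson_logp and the two candidate rate vectors are made up for the example.

```python
import math

def poisson_logp(data, rates):
    """Sum of log Poisson(k | rate) over paired observations and rates."""
    return sum(k * math.log(r) - r - math.lgamma(k + 1)
               for k, r in zip(data, rates))

data = [10, 25, 15, 20, 35]

flat = [21.0] * 5                  # a single rate for every point
switched = [17.5] * 4 + [35.0]     # rate changes before the last point

print(poisson_logp(data, flat))
print(poisson_logp(data, switched))
```

The switched rates give the higher log-likelihood for this data, which is the kind of evidence the sampler uses when it explores values of tau.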

Page 14: PyMC mcmc

How to use PyMC: model building

● Model

    # Pass a list of the variables defined so far
    model = pm.Model([obs, lambda_, lambda_1, lambda_2, tau])

Corresponds to Stan's model block.

http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/MorePyMC.ipynb

Page 15: PyMC mcmc

How to use PyMC: sampling

    # Estimate initial values for MCMC
    # (p, assignment, taus, centers are defined in the notebook linked below)
    model = pm.Model([p, assignment, taus, centers])
    map_ = pm.MAP(model)
    map_.fit()  # stores the fitted variables' values in foo.value

    # MCMC: 100000 iterations, the first 50000 discarded as burn-in
    mcmc = pm.MCMC(model)
    mcmc.sample(100000, 50000)

http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter3_MCMC/IntroMCMC.ipynb
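The two arguments of mcmc.sample(100000, 50000) are the number of iterations and the number of initial iterations discarded as burn-in. The following is a minimal random-walk Metropolis sketch in plain Python showing what those two numbers mean. It is not PyMC's implementation; the function name metropolis, the standard-normal target, and the proposal width are all assumptions made for illustration.

```python
import math
import random

def metropolis(logp, start, n_iter, burn, proposal_sd=1.0, seed=0):
    """Random-walk Metropolis; returns only the samples kept after burn-in."""
    rng = random.Random(seed)
    x, lp = start, logp(start)
    kept = []
    for i in range(n_iter):
        cand = x + rng.gauss(0.0, proposal_sd)
        lp_cand = logp(cand)
        # Accept with probability min(1, p(cand) / p(x)).
        if rng.random() < math.exp(min(0.0, lp_cand - lp)):
            x, lp = cand, lp_cand
        if i >= burn:
            kept.append(x)
    return kept

def std_normal_logp(x):
    return -0.5 * x * x  # unnormalised log-density of N(0, 1)

# Start far from the mode on purpose; burn-in discards the approach phase.
samples = metropolis(std_normal_logp, start=5.0, n_iter=20000, burn=10000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(len(samples), mean, var)
```

Starting from a reasonable point, such as the MAP estimate used on the slide, shortens the phase that burn-in has to throw away.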

Page 16: PyMC mcmc

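How to use PyMC: histogram, random

    import matplotlib.pyplot as plt

    # Draw from the prior with .random() and plot a normalized histogram
    samples = [lambda_1.random() for i in range(20000)]
    # newer matplotlib releases use density=True instead of normed=True
    plt.hist(samples, bins=70, normed=True, histtype="stepfilled")

http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/MorePyMC.ipynb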
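For readers without PyMC 2 installed, the same idea can be tried with the standard library alone. In this sketch random.expovariate stands in for lambda_1.random() (an Exponential with rate 1), and density_histogram is a hypothetical helper that mimics what a normalized histogram computes: bar heights scaled so that the total area is 1.

```python
import random

rng = random.Random(0)
# Stand-in for lambda_1.random(): draws from an Exponential with rate 1.
samples = [rng.expovariate(1.0) for _ in range(20000)]

def density_histogram(values, bins, lo, hi):
    """Bin heights scaled so that the bars integrate to 1 over [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v < hi:
            index = min(int((v - lo) / width), bins - 1)
            counts[index] += 1
    n_in_range = sum(counts)
    return [c / (n_in_range * width) for c in counts], width

heights, width = density_histogram(samples, bins=70, lo=0.0,
                                   hi=max(samples) + 1e-9)
sample_mean = sum(samples) / len(samples)
area = sum(h * width for h in heights)
print(sample_mean, area)
```

The sample mean should land near 1, the mean of an Exponential with rate 1, and the bars fall off roughly exponentially from the first bin.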

Page 17: PyMC mcmc

How to use PyMC: traceplot

    # PyMC 3 syntax
    with pm.Model() as model:
        x = pm.Normal('x', mu=0., sd=1)
        # here, shape is telling us it's a vector rather than a scalar.
        y = pm.Normal('y', mu=pm.exp(x), sd=2., shape=(ndims, 1))
        # shape is inferred from zdata
        z = pm.Normal('z', mu=x + y, sd=.75, observed=zdata)

    with model:
        start = pm.find_MAP()
        step = pm.NUTS()
        trace = pm.sample(3000, step, start)

    pm.traceplot(trace)

http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/MorePyMC.ipynb

Page 19: PyMC mcmc

Other examples

● Bayesian A/B testing
● Cheating among students
● Challenger Space Shuttle Disaster

http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/MorePyMC.ipynb

Page 20: PyMC mcmc

A PyMC blog worth following

While My MCMC Gently Samples: http://twiecki.github.io/

Page 21: PyMC mcmc

A PyMC blog worth following

Repos: https://github.com/twiecki

● mpi4py_map
  – provides a simple map() interface to mpi4py that allows easy parallelization of function evaluations over sequential input.
● CythonGSL
  – Cython interface for the GNU Scientific Library (GSL).

Page 22: PyMC mcmc

Generalized linear models (GLM)

● Code (excerpt)

http://twiecki.github.io/blog/2013/08/27/bayesian-glms-2/

    with pm.Model() as model_robust:
        family = pm.glm.families.T()
        pm.glm.glm('y ~ x', data, family=family)
        trace_robust = pm.sample(2000, pm.NUTS(), progressbar=False)

    plt.figure(figsize=(5, 5))
    plt.plot(x_out, y_out, 'x')
    pm.glm.plot_posterior_predictive(trace_robust, label='posterior predictive regression lines')
    plt.plot(x, true_regression_line, label='true regression line', lw=3., c='y')
    plt.legend();

Page 23: PyMC mcmc

Example: hierarchical linear model

Radon concentration in homes

● 85 counties
● 2 to 116 measurements

http://twiecki.github.io/blog/2014/03/17/bayesian-glms-3/

Page 24: PyMC mcmc

Example: hierarchical linear model

● Pooling of measurements
  – A single parameter θ shared by all locations

http://twiecki.github.io/blog/2014/03/17/bayesian-glms-3/

Page 25: PyMC mcmc

Example: hierarchical linear model

● Unpooled measurements
  – The parameter θ differs for each location c.

http://twiecki.github.io/blog/2014/03/17/bayesian-glms-3/

Page 26: PyMC mcmc

Example: hierarchical linear model

● Partial pooling: Hierarchical Regression
  – Hyperparameters μ, σ

http://twiecki.github.io/blog/2014/03/17/bayesian-glms-3/
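The three pooling variants can be written side by side. This is a sketch under one assumption: θ is read as the intercept/slope pair (α, β) that appears in the code excerpt later in the deck, with x_i the floor indicator, y_i the log radon level of measurement i, and c(i) the county of that measurement. The slides themselves only name θ, μ, and σ.

```latex
% Pooled: one intercept and slope for all counties
y_i \sim \mathcal{N}\left(\alpha + \beta x_i,\ \epsilon^2\right)

% Unpooled: a separate, unrelated pair for every county c
y_i \sim \mathcal{N}\left(\alpha_{c(i)} + \beta_{c(i)} x_i,\ \epsilon^2\right)

% Partial pooling: county-level pairs drawn from shared distributions
\alpha_c \sim \mathcal{N}\left(\mu_\alpha,\ \sigma_\alpha^2\right), \qquad
\beta_c \sim \mathcal{N}\left(\mu_\beta,\ \sigma_\beta^2\right), \qquad
y_i \sim \mathcal{N}\left(\alpha_{c(i)} + \beta_{c(i)} x_i,\ \epsilon^2\right)
```

The hyperparameters μ and σ are themselves estimated, so counties with few measurements borrow strength from the rest.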

Page 27: PyMC mcmc

Example: hierarchical linear model, code

● Code (excerpt)

http://twiecki.github.io/blog/2014/03/17/bayesian-glms-3/

    with pm.Model() as hierarchical_model:
        # Hyperparameters (mean and spread)
        mu_a = pm.Normal('mu_alpha', mu=0., sd=100**2)
        sigma_a = pm.Uniform('sigma_alpha', lower=0, upper=100)
        mu_b = pm.Normal('mu_beta', mu=0., sd=100**2)
        sigma_b = pm.Uniform('sigma_beta', lower=0, upper=100)

        # Intercept and slope for each county, normally distributed
        a = pm.Normal('alpha', mu=mu_a, sd=sigma_a, shape=n_counties)
        b = pm.Normal('beta', mu=mu_b, sd=sigma_b, shape=n_counties)

        # Model error
        eps = pm.Uniform('eps', lower=0, upper=100)

        radon_est = a[county_idx] + b[county_idx] * data.floor.values

        # Likelihood
        radon_like = pm.Normal('radon_like', mu=radon_est, sd=eps, observed=data.log_radon)

Page 28: PyMC mcmc

Example: hierarchical linear model, code

● Code (excerpt)

http://twiecki.github.io/blog/2014/03/17/bayesian-glms-3/

    # Run the model
    with hierarchical_model:
        start = pm.find_MAP()
        step = pm.NUTS(scaling=start)
        hierarchical_trace = pm.sample(2000, step, start=start, progressbar=False)

Page 29: PyMC mcmc

Example: hierarchical linear model

● In CASS, measurements were taken at only one location.

http://twiecki.github.io/blog/2014/03/17/bayesian-glms-3/

Page 30: PyMC mcmc

Example: hierarchical linear model

● Root Mean Square Deviation
  – individual/non-hierarchical model: 0.13
  – hierarchical model: 0.08
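For reference, root mean square deviation is the square root of the mean squared difference between paired values; lower is better. The slide does not say which estimates and targets were compared, so the helper below (the name rmsd is hypothetical) is demonstrated on made-up numbers only.

```python
import math

def rmsd(estimates, targets):
    """Root mean square deviation between two equal-length sequences."""
    if len(estimates) != len(targets):
        raise ValueError("sequences must have the same length")
    squared = [(e - t) ** 2 for e, t in zip(estimates, targets)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up values, not the radon results.
print(rmsd([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```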

Page 31: PyMC mcmc

Integration with Theano and GPUs

PyMC3 https://github.com/pymc-devs/pymc

> Under development!! <

Page 32: PyMC mcmc

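Appendix: About Theano

● A Python library for implementing deep-learning algorithms (http://deeplearning.net/software/theano/)
  – Implementations of (stacked) denoising autoencoders, RBMs, and others are published.
  – Expressions you define are transformed by symbolic differentiation and then compiled.
  – Computation can be parallelized on GPUs (Nvidia CUDA).
  – Internally it calls gcc and nvcc (the CUDA compiler).
  – Sample HMC implementation (http://deeplearning.net/tutorial/hmc.html)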

Page 33: PyMC mcmc

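Appendix: HMC (Hamiltonian Monte Carlo)

● Hamiltonian (Hybrid) Monte Carlo
  – Defines a "momentum" and a Hamiltonian H so that moves become larger where the distribution function is small, which makes sampling efficient.
  – Because each transition requires numerical integration, the computational cost per transition is larger than in ordinary MCMC.

http://xiangze.hatenablog.com/entry/2014/06/21/234930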
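The two bullets above can be made concrete with a small sketch. It assumes a one-dimensional standard-normal target, so the potential is U(q) = q^2 / 2, and uses the common leapfrog integrator with a unit-mass momentum. The function names, step size, and number of steps are illustrative choices, not taken from the slides or from PyMC.

```python
import math
import random

def leapfrog(q, p, grad_u, step, n_steps):
    """Integrate Hamiltonian dynamics with the leapfrog scheme."""
    p = p - 0.5 * step * grad_u(q)        # half step for the momentum
    for i in range(n_steps):
        q = q + step * p                  # full step for the position
        if i < n_steps - 1:
            p = p - step * grad_u(q)      # full step for the momentum
    p = p - 0.5 * step * grad_u(q)        # final half step
    return q, p

def hmc(u, grad_u, start, n_samples, step=0.2, n_steps=10, seed=0):
    """Sample with HMC, where H(q, p) = U(q) + p^2 / 2."""
    rng = random.Random(seed)
    q = start
    samples = []
    for _ in range(n_samples):
        p0 = rng.gauss(0.0, 1.0)          # draw a fresh momentum
        q_new, p_new = leapfrog(q, p0, grad_u, step, n_steps)
        h_old = u(q) + 0.5 * p0 * p0
        h_new = u(q_new) + 0.5 * p_new * p_new
        # Accept with probability min(1, exp(H_old - H_new)).
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples

def potential(q):
    return 0.5 * q * q                    # U(q) for a standard normal

def grad_potential(q):
    return q

samples = hmc(potential, grad_potential, start=3.0, n_samples=5000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)
```

Each transition calls grad_u once per leapfrog step plus one extra time, which is the additional per-transition cost mentioned above; in exchange, a single transition can travel a long way across the distribution.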