23
Fast evaluation of the inverse Poisson CDF Mike Giles University of Oxford Mathematical Institute Oxford e-Research Centre GPU Technology Conference 2014 March 26, 2014 Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 1 / 23

Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

  • Upload
    doandan

  • View
    225

  • Download
    2

Embed Size (px)

Citation preview

Page 1: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

Fast evaluation of the inverse Poisson CDF

Mike Giles

University of Oxford

Mathematical Institute

Oxford e-Research Centre

GPU Technology Conference 2014

March 26, 2014

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 1 / 23

Page 2: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

Outline

problem specification

incomplete Gamma function

CPUs versus GPUs

inverse approximation based on Temme expansion

Temme asymptotic evaluation

complete algorithm

results

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 2 / 23

Page 3: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

Poisson CDF and inverse

A discrete Poisson random variable N with rate λ takes integer value n

with probability

e−λλn

n!

Hence, the cumulative distribution function is

C (n) ≡ P(N ≤ n) = e−λ

n∑

m=0

λm

m!.

To generate N, can take a uniform (0, 1) random variable U and then

compute N = C−1

(U), where N is the smallest integer such that

U ≤ C (N)

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 3 / 23

Page 4: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

Poisson CDF and inverse

Illustration of the inversion process

4 6 8 10 12 14 16 180

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

x

u

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 4 / 23

Page 5: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

Poisson CDF and inverse

When λ is fixed and not too large (λ<104 ?) can pre-compute C (n)and perform a table lookup.

When λ is variable but small (λ<10 ?) can use bottom-up/top-downsummation.

When λ is variable and large, then rejection methods can be used togenerate Poisson r.v.’s, but the inverse CDF is sometimes helpful:

stratified sampling

Latin hypercube

QMC

This is the problem I am concerned with — approximating C−1

(u) ata cost similar to the inverse Normal CDF, or inverse error function.

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 5 / 23

Page 6: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

Poisson CDF and inverse

Illustration of the inversion process through rounding down of some

Q(u) ≡ C−1(u) to give C−1

(u)

4 6 8 10 12 14 16 180

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

x

u

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 6 / 23

Page 7: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

Poisson CDF and inverse

Errors in approximating Q(u) can only lead to errors in rounding downif near an integer

4 6 8 10 12 14 16 180

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

x

u

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 7 / 23

Page 8: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

Incomplete Gamma function

If X is a positive random variable with CDF

C (x) ≡ P(X < x) =1

Γ(x)

∫∞

λ

e−t tx−1 dt.

then integration by parts gives

P(⌊X ⌋ ≤ n) =1

n!

∫∞

λ

e−t tn dt = e−λ

n∑

m=0

λm

m!

=⇒ C−1

(u) = ⌊C−1(u)⌋

We will approximate Q(u) ≡ C−1(u) so that |Q̃(u)−Q(u)| < δ ≪ 1

This will round down correctly except when Q(u) is within δ of an integer– then we need to check some C (m)

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 8 / 23

Page 9: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

CPUs and GPUs

On a CPU, if the costs of Q̃(u) and C (m) are CQ and CC , the averagecost is approximately

CQ + 2 δ CC .

However, on a GPU with a warp length of 32, the CC penalty is incurredif any thread in the warp needs it, so the average cost is

CQ +(1− (1−2 δ)32

)CC ≈ CQ + 64 δ CC if δ ≪ 1.

This pushes us to more accurate approximations for GPUs.

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 9 / 23

Page 10: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

Temme expansion

Temme (1979) derived a uniformly convergent asymptotic expansionfor C (x) of the form

C (x) = Φ(λ

12 f (r)

)+ λ−

12 φ

12 f (r)

) ∞∑

n=0

λ−n an(r)

where r = x/λ and

f (r) ≡√

2 (1− r + r log r),

with the sign of the square root matching the sign of r−1.

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 10 / 23

Page 11: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

Temme expansion

Based on this, can prove that the quantile function is

Q(u) ≈ λ r + c0(r)

wherer = f −1(w/

√λ), w = Φ−1(u)

and

c0(r) =log

(f (r)

√r/(r−1)

)

log r

Both f −1(s) and c0(r) can be approximated very accurately (over acentral range) by polynomials, and an additional ad hoc correction gives

Q̃T (u) = λ r + p2(r) + p3(r)/λ

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 11 / 23

Page 12: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

Temme approximation

The function f (r)

0 0.5 1 1.5 2 2.5 3 3.5 4−1.5

−1

−0.5

0

0.5

1

1.5

2

2.5

r

f(r)

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 12 / 23

Page 13: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

Temme approximation

Errors in f −1(s) and c0(r) approximations:

−1 0 1 2 3−8

−6

−4

−2

0

2

4

6

8

10x 10

−10

s

erro

r f−1 approximation

0 1 2 3 4−5

−4

−3

−2

−1

0

1

2

3

4

5x 10

−6

r

erro

r

c0 approximation

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 13 / 23

Page 14: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

Temme approximation

Maximum error in Q̃T approximation:

101

102

103

104

105

0

0.2

0.4

0.6

0.8

1

1.2x 10

−5

x

max

imum

err

or

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 14 / 23

Page 15: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

C (m) evaluation

In double precision, when Q̃(u) is too close to an integer m+1,we need to evaluate C (m) to choose between m and m+1.

When 12λ≤m≤2λ, this can be done very accurately using another

approximation due to Temme (1987).

Outside this range, a modified version of bottom-up / top-downsummation can be used, because successive terms decrease by factor2 or more.

In single precision this “correction” procedure does not improve theaccuracy.

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 15 / 23

Page 16: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

Temme approximation

Maximum relative error in Temme approximation for C (m)

101

102

103

10−14

10−13

10−12

λ

max

imum

rel

ativ

e er

ror

1

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 16 / 23

Page 17: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

The GPU algorithm (single precision)

given inputs: λ, u

if λ > 4w := Φ−1(u)s := w/

√λ

if smin<s<smax main branchr := p1(s)x := λ r + p2(r) + p3(r)/λ

elser := f −1(w/

√λ) Newton iteration

x := λ r + c0(r)x := x − 0.0218/(x+0.065λ)

end

n := ⌊x⌋

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 17 / 23

Page 18: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

The GPU algorithm (single precision)

if x > 10return n

endend

use bottom-up summation to determine n

if u>0.5 and not accurate enoughuse top-down summation to determine n

end

Top-down summation finds smallest n such that

1−u ≥ e−λ

∞∑

m=n+1

λm

m!

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 18 / 23

Page 19: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

The GPU algorithm (double precision)given inputs: λ, u

if λ > 4w := Φ−1(u)s := w/

√λ

if smin<s<smax

r := p1(s)x := λ r + p2(r) + p3(r)/λδ := 2×10−5

elser := f −1(w/

√λ)

x := λ r + c0(r)x := x − 0.0218/(x+0.065λ)δ := 0.01/λ

end

n := ⌊x+δ⌋

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 19 / 23

Page 20: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

The GPU algorithm (double precision)

if x > 10if x−n > δ

return n

else if C (n) < u “correction” testreturn n

elsereturn n−1

endend

end

use bottom-up summation to determine n

if u>0.5 and not accurate enoughuse top-down summation to determine n

end

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 20 / 23

Page 21: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

Accuracy

100

102

104

106

10−15

10−10

10−5

λ

L1error

poissinvfpoissinv

L1 errors of poissinvf and poissinv functions written in CUDA.

(It measures the fraction of the (0, 1) interval for which the error is ±1.)

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 21 / 23

Page 22: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

Performance

Samples/sec for poissinvf and poissinv using CUDA 5.0

GTX670 K20λ poissinvf poissinv poissinvf poissinv

2 1.25e10 1.03e09 1.94e10 5.29e098 5.66e09 3.70e08 8.77e09 2.07e0932 8.07e09 6.98e08 1.25e10 4.20e09128 8.38e09 6.98e08 1.25e10 4.20e09mixed 4.91e09 3.00e08 6.83e09 1.64e09

normcdfinvf 1.96e10 2.70e10normcdfinv 9.61e08 7.15e09

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 22 / 23

Page 23: Fast Evaluation of the Inverse Poisson Cumulative ...on-demand.gputechconf.com/gtc/2014/presentations/S4173-fast...with probability e −λ λ n n! Hence, the ... Computational cost

Conclusions

By approximating the inverse incomplete Gamma function, havedeveloped an approach for inverting the Poisson CDF for λ>4

Computational cost is roughly cost of inverse Normal CDF functionplus three polynomials of degree 8–12

Report and open source CUDA implementation available now:http://people.maths.ox.ac.uk/gilesm/poissinv

Report also describes a second approximation which is faster forCPUs, but has more branching so is worse for GPUs

Mike Giles (Oxford) Poisson inverse CDF March 26, 2014 23 / 23