This article was downloaded by: [University of Chicago Library] On: 13 November 2014, At: 23:14 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Communications in Statistics - Simulation and Computation Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lssp20 Bayesian Dynamic Dirichlet Models Cibele Queiroz Da-Silva a & Guilherme Souza Rodrigues b a Departamento de Estatística, Universidade de Brasília, Brasília, DF, Brazil b Helio dos Santos Migon, Departamento de Estatística, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil Accepted author version posted online: 19 Mar 2014.Published online: 10 Sep 2014. To cite this article: Cibele Queiroz Da-Silva & Guilherme Souza Rodrigues (2015) Bayesian Dynamic Dirichlet Models, Communications in Statistics - Simulation and Computation, 44:3, 787-818, DOI: 10.1080/03610918.2013.795592 To link to this article: http://dx.doi.org/10.1080/03610918.2013.795592 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. 
Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions


Communications in Statistics - Simulation and Computation®, 44: 787–818, 2015
Copyright © Taylor & Francis Group, LLC
ISSN: 0361-0918 print / 1532-4141 online
DOI: 10.1080/03610918.2013.795592

Bayesian Dynamic Dirichlet Models

CIBELE QUEIROZ DA-SILVA¹ AND GUILHERME SOUZA RODRIGUES²

¹Departamento de Estatística, Universidade de Brasília, Brasília, DF, Brazil
²Helio dos Santos Migon, Departamento de Estatística, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil

The main purpose of this article is the study of statistical models for compositional data, which are characterized by random vectors $y_t$ defined on the open standard $(k-1)$-simplex. Each coordinate of $y_t$ represents the share, in percentage, of each one of the $k$ categories that represent a given phenomenon. We propose a new dynamic model, the Dynamic Dirichlet Model (DDM), for describing time series of compositional data. DDM includes, as submodels, the Beta Dynamic model, static Dirichlet regression, and a competitor of the static Beta regression. We design both on-line and off-line approaches for the estimation of the parameters in the model. The on-line version is adequate for recursive estimation while the off-line one, which is based on stochastic simulation via Markov Chain Monte Carlo (MCMC), can be used when there are some specific unknown parameters in the model. We discuss the practical use of the proposed model in describing the past behavior of the series, as well as in the prediction process. We also discuss the application of DDM in a static context. This particular case is important because when the latent states in the model do not vary over time, DDM takes the form of a Dirichlet regression model. Some DDM submodels, such as the dynamic Beta model and the static Beta regression, are also discussed.

Keywords: Bayesian approach; Beta distribution; Compositional data; Dirichlet distribution; Dynamic models; Logistic-Normal distribution; Time series.

Mathematics Subject Classification 91B82; 62F15

1. Introduction

The class of Dynamic models (West and Harrison, 1997), which is included among the state space models, has been a very fruitful research area over the years. The interest in such models is justified by their versatility and tractability in the description of time series.

The Dynamic Linear Models (DLMs) are appropriate for Gaussian time series. The model features include the following. (I) Flexibility in the model specification: the choice of some appropriate design matrices allows for model specification in a multiplicity of ways and they also allow for the inclusion of covariates in the model. (II) Missing data: the structure of state space models for analyzing time series is such that missing observations can be easily accommodated. (III) Intervention: the time series may present some abrupt shifts. When there are known historical events, it is possible to incorporate those shifts in the model such that the estimated level will recognize the new regime, leading, in turn, to more accurate forecasts.

Received November 14, 2012; Accepted April 8, 2013.
Address correspondence to Dr. Cibele Queiroz da-Silva, Universidade de Brasilia, Departamento de Estatistica, Campus Universitario Darcy Ribeiro, Brasilia, 70910-900, Brazil; E-mail: [email protected]
Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/lssp.

In order to allow for non-normal problems, West et al. (1985) created the class of Dynamic Generalized Linear Models (DGLMs) so that the observations follow the exponential family of distributions. Other extensions and generalizations were also proposed by Lindsey and Lambert (1995) and Godolphin and Triantafyllopoulos (2006). More recently, da-Silva et al. (2011) proposed a dynamic Beta model (DBM) for modeling and forecasting single time series of rates or proportions.

In this article, we extend the DBM proposed by da-Silva et al. (2011) to the multivariate case. We propose a new dynamic model, the dynamic Dirichlet model (DDM), that is useful for analyzing time series of compositional data expressed in terms of proportions. The data are characterized by random time-varying vectors $y_t$, defined on the open standard $(k-1)$-simplex. Each coordinate of $y_t$ represents the share, in percentage, of each one of the $k$ categories that describe a given phenomenon.

We would like to stress the fact that DDM is a model in which the observed time series follows a Dirichlet distribution. This problem differs substantially from dynamic topics data (Billheimer et al., 1997; Blei and Lafferty, 2006; Pruteanu-Malinici, 2010; p. 560 of West and Harrison, 1997). According to Pruteanu-Malinici (2010), in dynamic topics data, the topics are drawn from a multinomial distribution over a dictionary such that the allocation probabilities follow a Dirichlet distribution (the natural conjugate prior to the multinomial). Wang et al. (2008), Section 2, make that point very clear when describing their algorithm. In contrast, in the DDM setting, the Dirichlet distribution involved in the analysis does not play the role of a prior, as it does in Multinomial-Dirichlet problems, in which the vector of parameters of the Multinomial distribution is to be estimated. We have, instead, an observed compositional time series, expressed in terms of proportions, that we model using the Dirichlet distribution, which, in our formulation, represents the observation equation of our proposed dynamic model.

In addition, in dynamic topics data, one is generally interested in clustering problems, such as obtaining clusters of paragraphs using what appears to be an accurate topic representation (Pruteanu-Malinici, 2010). However, among other features, with DDM, we are interested in forecasting future events for the observed compositional time series, which is modeled by the Dirichlet distribution.

DDM is not only useful in the study of compositional time series of proportions but also in analyzing the static case of Dirichlet regression models. Other submodels include the beta dynamic model (da-Silva et al., 2011) and a competitor model of the static beta regression (Ferrari and Cribari-Neto, 2004).

Other works in the literature dealing with compositional series of proportions include Quintana and West (1988) and Grunwald et al. (1993). Quintana and West (1988) modeled the observational distribution of compositional series of proportions using the multivariate logistic-normal distribution, as defined in Aitchison and Shen (1980). In order to satisfy the assumptions of the Matrix Normal DLMs (see West and Harrison, 1997, Chapter 16), the authors applied a logistic/log ratio transformation to map the original vector of proportions, p, into a vector of real-valued quantities. However, the resulting variable, say Y, poses a second problem, a zero-sum restriction causing the singularity of the covariance matrix associated with the latent states. Trying to solve these singularities, the authors applied another transformation, now to Y, obtaining H, and then used a Matrix Normal DLM in the analysis (see expressions (6) to (9) by Quintana and West, 1988). The major problem with that approach is the impossibility of expressing the estimated series and predictions that were obtained with H, back to the scale of probabilities p. In DDM, no transformations are applied to the data, providing easier interpretation of the results. In this sense, we believe that the composite model by Quintana and West (1988) is not comparable with ours. We discuss this issue a little further in Section 5.4.

Grunwald et al. (1993) dealt with the non-Gaussian nature of the problem by describing a power-steady like model (Smith, 1979). This formulation is useful because it provides a simple updating procedure for the state-space approach, but it is also too specific, which prevents more general model specifications. They also used a logistic/log ratio transformation in their approach, with the specification of some important effects on the latent process, such as growth and seasonality, being done via the inclusion of a series of dummy variables. The coefficients associated with each of the dummy variables have to be estimated using an approximated maximum likelihood procedure based on Laplace approximations (Tierney and Kadane, 1986). This approach is computationally expensive and not immediately transparent. In DDM, in contrast, the specification of the mentioned effects as well as the inclusion of covariates in the model are very straightforward, since these are done using some design matrices that are naturally incorporated in the model. DDM was formulated according to the class of DGLM, which bestows all the gains in flexibility mentioned in the beginning of this section.

We designed both on-line and off-line approaches for the estimation of the parameters in the model. The on-line version is adequate for recursive estimation while the off-line one, which is based on stochastic simulation via MCMC, can be used when there are some specific unknown parameters in the model.

We discuss the practical use of the proposed model in describing the past behavior of the series, as well as in the prediction process. We also discuss the application of DDM (see Camargo et al., 2012 and Hijazi, 2009) in a static context, an important particular case in which DDM takes the form of a Dirichlet regression model. The state space nature of our model, although described by partially specified distributions, allows us to take advantage of all the dynamic model features, like sequential monitoring, intervention, missing data, etc.

One of the applications in this work illustrates the utility of DDM with regard to the statistical description of compositional data of mortality and morbidity in Brazil. We analyze a time series of the relative contribution of two major mortality categories to total mortality: traffic-related accidents, and aggressions and intentionally self-inflicted injuries. We also explore a series of submodels that can be described from DDM, including the Beta dynamic model.

The article is organized as follows. In Section 2, we introduce background information about DLMs. In Section 3, we describe DDM and the on-line approach in the estimation of the parameters in the model. In Section 4, we present the off-line inferential approach. In Section 5, we discuss a variety of examples, including important details of the uses of DDM.

2. Dynamic Models

DLMs are parametric models for describing time series where the parameter variation and the available data are described probabilistically. They are characterized by a pair of equations, named the observational equation, and the parameters evolution equation or the system equation. The observational and system equations are, respectively, given by

$$y_t = F_t'\theta_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, V_t), \tag{1}$$

$$\theta_t = G_t\theta_{t-1} + \omega_t, \quad \omega_t \sim N(0, W_t), \tag{2}$$

$$\theta_0 \sim N(m_0, C_0), \tag{3}$$

where $y_t$ is a time sequence of observations, conditionally independent given the sequence of parameters $\theta_t$; $m_0$ and $C_0$ are prior moments for the initial state vector $\theta_0$; $F_t$ is a vector of explanatory variables; $\theta_t$ is a $k \times 1$ vector of parameters; $G_t$ is a $k \times k$ matrix describing the parameter evolution; and finally, $V_t$ and $W_t$ are the variances of the errors associated, respectively, with a unidimensional observation and with a $k$-dimensional vector of parameters. A DLM is characterized by the quadruple $(F_t, G_t, V_t, W_t)$.
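As a hedged illustration of Eqs. (1) to (3), the following minimal sketch simulates a scalar DLM in which $F_t = G_t = 1$ and the variances are constant. The function name `simulate_local_level` and all default values are assumptions made for this example only; they do not come from the article.

```python
import random


def simulate_local_level(T, m0=0.0, C0=1.0, V=0.5, W=0.1, seed=0):
    """Simulate T steps of a scalar DLM with F_t = G_t = 1 (illustrative values).

    theta_0 ~ N(m0, C0)
    theta_t = theta_{t-1} + omega_t,  omega_t ~ N(0, W)
    y_t     = theta_t + eps_t,        eps_t   ~ N(0, V)
    """
    rng = random.Random(seed)
    theta = rng.gauss(m0, C0 ** 0.5)              # initial state, Eq. (3)
    states, observations = [], []
    for _ in range(T):
        theta = theta + rng.gauss(0.0, W ** 0.5)  # system equation, Eq. (2)
        y = theta + rng.gauss(0.0, V ** 0.5)      # observational equation, Eq. (1)
        states.append(theta)
        observations.append(y)
    return states, observations


if __name__ == "__main__":
    states, observations = simulate_local_level(5)
    for s, y in zip(states, observations):
        print(f"state={s:+.3f}  observation={y:+.3f}")
```

Setting all variances to zero returns a constant series equal to `m0`, which is a quick sanity check on the recursion.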

The DLMs satisfy two fundamental properties.

(A.1). $(\theta_t)$ is a Markov chain.
(A.2). Conditionally on $(\theta_t)$, the $y_t$'s are independent and depend on $\theta_t$ only.

The consequence of (A.1) and (A.2) is that we can write the joint distribution of the states $\theta_t$ and observations $y_t$ as a product of conditional distributions: for $t \ge 1$,

$$p(\theta_0, \theta_1, \ldots, \theta_T, y_1, \ldots, y_T) = p(\theta_0) \prod_{t=1}^{T} p(\theta_t \mid \theta_{t-1})\, p(y_t \mid \theta_t). \tag{4}$$

This result is very important in the analysis of dynamic models in general and it will be extensively used in Section 4.

Let $D_t = D_{t-1} \cup \{y_t\}$ be the available information up to time $t$. In order to estimate the state vector $\Theta = (\theta_1, \ldots, \theta_T)'$, it is necessary to compute the conditional densities $p(\theta_s \mid D_t)$.

The inferential process of the dynamic models, in general, includes three different problems: filtering, when $s = t$; state prediction, when $s > t$; and smoothing, when $s < t$. In the filtering problem, the data are supposed to arrive sequentially in time, making it theoretically possible to compute $p(\theta_t \mid D_t)$. In the smoothing problem, or retrospective analysis, we are interested in estimating the state sequence at times $1, \ldots, t$, given the data $y_1, \ldots, y_t$; that is, $p(\theta_{t-h} \mid D_t)$, with $1 \le h \le t$. The filtering distributions allow us to obtain on-line inferences, that is, up-to-date estimates of the current state of the system.

When DLM contains unknown parameters in its specification (e.g., variances and some kinds of hyperparameters), one has to resort not only to numerical techniques, but also to off-line inferences, like the MCMC method. However, this increases computational effort for obtaining the desirable on-line inferences, since each time a new observation arrives, a totally new MCMC run has to be performed.

Based on the generalized linear models of Nelder and Wedderburn (1972), West et al. (1985) extended DLMs to allow for observations in the exponential family. In such a setting, the observation equation (1) is described as

$$p(y_t \mid \eta_t) = \exp\{\phi_t [y_t \eta_t - b(\eta_t)] + c(y_t, \phi_t)\}, \tag{5}$$

where $\eta_t$ and $\phi_t = V_t^{-1}$ are, respectively, the natural parameter and the precision parameter of the distribution. The components $b(\eta_t)$ and $c(y_t, \phi_t)$ are known functions, and $b(\eta_t)$ is such that $\mu_t = E(y_t \mid \eta_t, V_t) = \dfrac{d b(\eta_t)}{d \eta_t}$ and $V(y_t \mid \eta_t, V_t) = V_t\, \dfrac{d^2 b(\eta_t)}{d \eta_t^2}$. A suitable link function $g$, applied to $\eta$, is related to the linear predictor $F_t'\theta_t$: $\lambda_t = g(\eta_t) = F_t'\theta_t$.


The system equation is the same as Eq. (2), except that the distribution associated with the errors $\omega_t$ is only specified by its first-order and second-order moments, that is, $\theta_t = G_t\theta_{t-1} + \omega_t$; $\omega_t \sim (0, W_t)$, with initial prior information given by $(\theta_0 \mid D_0) \sim (m_0, C_0)$, $(\theta_{t-1} \mid D_{t-1}) \sim (m_{t-1}, C_{t-1})$, $(\theta_t \mid D_{t-1}) \sim (a_t, R_t)$, $a_t = G_t m_{t-1}$, and $R_t = G_t C_{t-1} G_t' + W_t$. The so-called conjugate-updating method (West, Harrison, and Migon, 1985) is used for the on-line inferential procedure enabling estimation of the state sequence $\theta_t$, $t = 1, \ldots, T$.

3. DDM

In this section, we introduce a new model, DDM, which represents a generalization of DBM (da-Silva et al., 2011). Let $y_t$, $t = 1, 2, \ldots$, be a $k$-component time series of compositional data, and let us use $LN_{k-1}(\eta_t, \Sigma_t)$ to denote the Logistic-normal distribution (Aitchison, 1982) parameterized by $\eta_t$ and $\Sigma_t$. The dynamic Dirichlet is defined by the following components:

• Observation equation:

$(y_t \mid \mu_t, \phi) \sim \text{Dirichlet}(\mu_{1t}\phi, \mu_{2t}\phi, \ldots, \mu_{kt}\phi)$, that is,

$$p(y_t \mid \mu_t, \phi) = \frac{\Gamma(\phi)}{\prod_{i=1}^{k} \Gamma(\phi\mu_{it})} \prod_{i=1}^{k} y_{it}^{\phi\mu_{it} - 1},$$

where $y_{kt} = 1 - \sum_{i=1}^{k-1} y_{it}$; $0 < y_{it}, \mu_{it} < 1\ \forall i$; $\phi > 0$; $\mu_{kt} = 1 - \sum_{i=1}^{k-1} \mu_{it}$; and $\mu_t = (\mu_{1t}, \ldots, \mu_{k-1,t})$.

• Prior:

$$(\mu_t \mid D_{t-1}) \sim LN_{k-1}(\eta_t, \Sigma_t).$$

• Link function: additive logit, given as

$$\lambda_t = F_t'\theta_t = \left[\log\left(\frac{\mu_{1t}}{\mu_{kt}}\right), \ldots, \log\left(\frac{\mu_{k-1,t}}{\mu_{kt}}\right)\right]'.$$

• System equation:

$$\theta_t = G_t\theta_{t-1} + \omega_t; \quad \omega_t \sim (0, W_t).$$

• Initial information:

$$(\theta_0 \mid D_0) \sim (m_0, C_0).$$

In this work, the precision parameter $\phi$ is considered static, that is, with no dynamic behavior. In this section, $\phi$ is considered known along with the covariance matrix $W_t$. Following the same principles as in da-Silva et al. (2011), most of the distributions are only partially specified in terms of their moments.
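The observation equation and the additive logit link listed above can be sketched in a few lines. This is a minimal illustration, assuming that compositions are passed as their first $k-1$ coordinates, as in the definition of $\mu_t$; the helper names `complete`, `additive_logit`, and `dirichlet_logpdf` are hypothetical and not taken from the article.

```python
import math


def complete(v):
    """Append the k-th share so that the composition sums to one."""
    return list(v) + [1.0 - sum(v)]


def additive_logit(mu):
    """Additive logit link: log(mu_i / mu_k) for i = 1, ..., k-1."""
    full = complete(mu)
    return [math.log(m / full[-1]) for m in full[:-1]]


def dirichlet_logpdf(y, mu, phi):
    """Log density of Dirichlet(mu_1*phi, ..., mu_k*phi) evaluated at y.

    y and mu hold the first k-1 coordinates; the k-th one is implied.
    """
    y_full, mu_full = complete(y), complete(mu)
    if min(y_full) <= 0.0 or min(mu_full) <= 0.0 or phi <= 0.0:
        raise ValueError("y and mu must lie in the open simplex and phi must be positive")
    log_norm = math.lgamma(phi) - sum(math.lgamma(phi * m) for m in mu_full)
    kernel = sum((phi * m - 1.0) * math.log(v) for m, v in zip(mu_full, y_full))
    return log_norm + kernel


if __name__ == "__main__":
    print(additive_logit([0.25, 0.25]))
    print(dirichlet_logpdf([0.2, 0.3], [0.3, 0.3], phi=20.0))
```

With $\mu_t = (1/3, 1/3)$ and $\phi = 3$ the density is the uniform Dirichlet(1, 1, 1), whose value on the 2-simplex is $\Gamma(3) = 2$, which gives a simple check of the normalizing constant.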


It is worth mentioning some important properties of the distributions in discussion. The Dirichlet distribution is conjugate to the Multinomial distribution and its support is defined on the open standard $(k-1)$-simplex, $S_{k-1}$, where

$$S_{k-1} = \Big\{(y_1, \ldots, y_{k-1}) \in \mathbb{R}^{k-1} \;\Big|\; \sum_{i=1}^{k-1} y_i < 1 \text{ and } y_i > 0,\ \forall i\Big\}.$$

The moments of the Dirichlet distribution are

$$E(y_t \mid \mu_t, \phi) = \mu_t \quad \text{and} \quad V(y_t \mid \mu_t, \phi) = \frac{1}{1 + \phi}\,[\mathrm{diag}(\mu_t) - \mu_t\mu_t']. \tag{6}$$
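The moment expressions in Eq. (6) translate directly into code. The sketch below is illustrative only: the function name `dirichlet_moments` is an assumption, and `mu` holds the $k-1$ free coordinates of the mean vector.

```python
def dirichlet_moments(mu, phi):
    """Mean vector and covariance matrix of Eq. (6) for the k-1 free coordinates."""
    n = len(mu)
    cov = [
        [((mu[i] if i == j else 0.0) - mu[i] * mu[j]) / (1.0 + phi) for j in range(n)]
        for i in range(n)
    ]
    return list(mu), cov


if __name__ == "__main__":
    mean, cov = dirichlet_moments([0.2, 0.3], phi=9.0)
    print(mean)
    for row in cov:
        print(row)
```

Once $\mu_t$ is fixed, the whole covariance matrix scales with the single factor $1/(1+\phi)$, so larger $\phi$ means observations concentrate more tightly around $\mu_t$.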

Modeling dynamic compositional data using the Dirichlet distribution, instead of another kind of distribution, can be justified by the simplicity of the relationships among the parameters in the model. Suppose that $\mu_t$ is known. Then, the covariance matrix $V(y_t \mid \mu_t, \phi)$ is totally determined, except for the knowledge of the precision parameter $\phi$, which can be easily estimated via MCMC methods.

Besides the multivariate nature of DDM, an important difference between DDM and DBM is in the choice of the prior distribution used for the parameter vector $\mu_t$. In DBM, a Beta prior distribution was used in the analysis while in this work, we use a Logistic-normal distribution.

Since the Dirichlet distribution is a generalization of the Beta distribution, the reader might wonder why we have not followed a similar strategy as in DBM, which would imply the use of a Dirichlet prior for $\mu_t$. The main reason for this is in the elicitation process of the prior distribution of $\mu_t$: if we choose to work with a Dirichlet prior, the elicitation of its parameters is clumsy, since we have to solve an over-defined system of equations. As we shall see in step (2) of Section 3.1, the use of a Logistic-normal prior makes the elicitation step very straightforward. This finding is one of the major contributions of our proposed method. Another important aspect to consider is the gain in flexibility provided by a Logistic-normal prior when compared to a Dirichlet prior.

We have chosen to work with a Logistic-normal prior, $(\mu_t \mid D_{t-1}) \sim LN_{k-1}(\eta_t, \Sigma_t)$, since its support is also defined on the simplex and because such a distribution is manageable enough to allow all the important steps needed in the inferential process of DDM. Thus, the density of $(\mu_t \mid D_{t-1})$ is given by

$$p(\mu_t \mid D_{t-1}) = \left(\frac{1}{2\pi}\right)^{(k-1)/2} |\Sigma_t|^{-1/2} \left(\frac{1}{\prod_{i=1}^{k} \mu_{it}}\right) \exp\left[-\frac{1}{2}(z_t - \eta_t)'\Sigma_t^{-1}(z_t - \eta_t)\right],$$

where $\mu_t = (\mu_{1t}, \ldots, \mu_{k-1,t}) \in \mathbb{R}^{k-1}$ with $\sum_{i=1}^{k-1} \mu_{it} < 1$ and $\mu_{it} > 0,\ \forall i$, and

$$z_t = \mathrm{alz}(\mu_t) = \left[\log\left(\frac{\mu_{1t}}{\mu_{kt}}\right), \ldots, \log\left(\frac{\mu_{k-1,t}}{\mu_{kt}}\right)\right]' = (z_{1t}, \ldots, z_{k-1,t})'.$$

The alz transformation from $S_{k-1}$ into $\mathbb{R}^{k-1}$ also represents the multivariate link function used in our model. It is easy to show that the random vector $z_t$ follows a multivariate normal distribution: $z_t \sim N_{k-1}(\eta_t, \Sigma_t)$.
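Because $z_t = \mathrm{alz}(\mu_t)$ is multivariate normal, one way to picture the Logistic-normal prior is to draw $z_t$ and map it back to the simplex. The following sketch assumes that the covariance matrix is supplied through a lower-triangular Cholesky factor; the function names and the numerical values are illustrative assumptions, not part of the article.

```python
import math
import random


def alz(mu):
    """Additive logit of the k-1 free coordinates of a composition."""
    mu_k = 1.0 - sum(mu)
    return [math.log(m / mu_k) for m in mu]


def alz_inv(z):
    """Inverse additive logit: maps R^(k-1) back to the open simplex."""
    denom = 1.0 + sum(math.exp(v) for v in z)
    return [math.exp(v) / denom for v in z]


def sample_logistic_normal(eta, chol, rng):
    """Draw mu ~ LN(eta, Sigma), where Sigma = chol * chol' (chol lower triangular)."""
    n = len(eta)
    e = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [eta[i] + sum(chol[i][j] * e[j] for j in range(i + 1)) for i in range(n)]
    return alz_inv(z)


if __name__ == "__main__":
    rng = random.Random(42)
    chol = [[1.0, 0.0], [0.3, 0.8]]   # hypothetical Cholesky factor of Sigma_t
    for _ in range(3):
        print(sample_logistic_normal([0.0, 0.5], chol, rng))
```

Every draw lies strictly inside the simplex by construction, which is the property that makes this prior convenient for compositional means.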


3.1. DDM When Some Parameters are Known: The On-line DDM Procedure

The steps necessary for the on-line estimation of the parameters of DDM follow the same general principles as in Section 3 of da-Silva et al. (2011). Steps (1) to (5) below summarize the on-line DDM procedure.

1. Prior moments of $\theta_t$ and $\lambda_t$.
The prior distributions of $(\theta_t \mid D_{t-1}, \phi)$ and $(\lambda_t \mid D_{t-1}, \phi)$ are only partially specified, such that $(\theta_t \mid D_{t-1}, \phi) \sim (a_t, R_t)$ and $(\lambda_t \mid D_{t-1}, \phi) \sim (f_t, Q_t)$, where

$$a_t = G_t m_{t-1}, \quad R_t = G_t C_{t-1} G_t' + W_t, \quad f_t = F_t' a_t \quad \text{and} \quad Q_t = F_t' R_t F_t.$$

2. Elicitation of the prior distribution of $\mu_t$.
Our objective here is to describe functional forms for the parameters of $(\mu_t \mid D_{t-1})$; that is, we need to find expressions for $\eta_t$ and $\Sigma_t$ based on the first-order and second-order moments of $(\lambda_t \mid D_{t-1})$. We note that, according to the properties of the Logistic-normal distribution, it is easy to establish that

$$f_t = E(\lambda_t \mid D_{t-1}, \phi) = E(\mathrm{alz}(\mu_t) \mid D_{t-1}, \phi) = E(z_t \mid D_{t-1}, \phi) = \eta_t,$$

$$Q_t = V(\lambda_t \mid D_{t-1}, \phi) = V(\mathrm{alz}(\mu_t) \mid D_{t-1}, \phi) = V(z_t \mid D_{t-1}, \phi) = \Sigma_t.$$

The equalities above reveal that the prior moments of $\lambda_t$ and $\mathrm{alz}(\mu_t)$ are exactly the same. Therefore, the use of a Logistic-normal prior for $\mu_t$ implies that no additional computations and approximations are needed for eliciting the parameters of the prior distribution of $\mu_t$. In fact, $\eta_t$ and $\Sigma_t$ are, respectively, elicited as $f_t$ and $Q_t$. This result, surprising at first for its simplicity, makes the elicitation step straightforward. It is an important methodological contribution, since it makes the elicitation step of DDM viable. The strength of such a result can be better appreciated when we compare it with the computational demands of the elicitation of the prior of $(\mu_t \mid D_{t-1})$ in the Beta dynamic model (see Section 3.1.2 of da-Silva et al., 2011). As we mentioned before, had we opted to work with a Dirichlet prior instead, we would have faced an over-identified system of equations that proved to be impossible to solve.

3. Updating step for $\mu_t$.
The posterior distribution of $\mu_t$ is obtained using Bayes' Theorem:

$$p(\mu_t \mid D_t, \phi) \propto p(y_t \mid \mu_t, D_{t-1}, \phi)\, p(\mu_t \mid D_{t-1}). \tag{7}$$

Now, we need to obtain the posterior moments of $(\mu_t \mid D_t, \phi)$ in order to calculate the posterior moments of $(\lambda_t \mid D_t, \phi)$. Since the posterior density $p(\mu_t \mid D_t, \phi)$ does not have a closed form, its first-order and second-order moments, $\mu_t = E(\mu_t \mid D_t, \phi)$ and $V_t = V(\mu_t \mid D_t, \phi)$, have to be calculated numerically. Such calculation is performed in a two-step procedure that is described below.


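Step 3.1 for calculating $\mu_t$ and $V_t$: The integration region is defined by the open $(k-1)$-simplex, that is, the set

$$S_{k-1} = \Big\{(\mu_1, \ldots, \mu_{k-1}) \in \mathbb{R}^{k-1} \;\Big|\; \sum_{i=1}^{k-1} \mu_i < 1;\ \mu_i > 0,\ \forall i\Big\}.$$

It is useful to transform the integration region into a unitary hypercube, which is described as

$$(0, 1)^{k-1} = \{x = (x_1, \ldots, x_{k-1}) \in \mathbb{R}^{k-1} \mid 0 < x_1, \ldots, x_{k-1} < 1\}.$$

Such transformation is described as

$$x_i = \begin{cases} \displaystyle\sum_{j=1}^{k-1} \mu_j, & \text{for } i = 1, \\[2ex] \mu_{i-1} \Big/ \displaystyle\sum_{j=i-1}^{k-1} \mu_j, & \text{for } 2 \le i \le k - 1, \end{cases}$$

and the absolute value of the Jacobian of the inverse map $x \mapsto \mu$, which is the factor needed in the change of variables, is $J = x_1^{k-2} \prod_{i=2}^{k-2} (1 - x_i)^{k-1-i}$.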

The R library hyperdirichlet (Hankin, 2010) provides tools for performing not only the referred transformation to the unitary hypercube, using the function p_to_e, but also the calculation of the Jacobian, using the function Jacobian.

Step 3.2 for calculating $\mu_t$ and $V_t$: Once the integration region is transformed, it is possible to use an adaptive multidimensional integration technique known as cubature, which is a generalization of the quadrature method. The R library cubature (Johnson and Narasimhan, 2009) enables the use of the function adaptIntegrate, which is very useful since it allows us to compute integrals when the integrand is a vector.

In order to estimate $\mu_t$ and $V_t$, it is necessary to calculate the following integrals: one for the normalization constant of the density given in expression (7), $k - 1$ for calculating each of the coordinates of $\mu_t$, and $k(k - 1)/2$ in order to calculate the moments $E(\mu_i\mu_j \mid D_t)$, $i, j = 1, \ldots, k - 1$. The function adaptIntegrate works very well for such a task.

4. Updating for $\lambda_t$.
Using the pair of values $\mu_t$ and $V_t$, we evaluate the first moments $f_t^*$ and $Q_t^*$ of the posterior distribution $(\lambda_t \mid D_t, \phi)$. This is accomplished by solving the nonlinear system of equations below. Such a system implicitly incorporates $\mu_t$ and $V_t$, as we are about to show.

$$E(\lambda_t \mid D_t, \phi) = E(\mathrm{alz}(\mu_t) \mid D_t, \phi) = f_t^*,$$

$$V(\lambda_t \mid D_t, \phi) = V(\mathrm{alz}(\mu_t) \mid D_t, \phi) = Q_t^*.$$

According to the link function we are working with,

$$\mu_t = \mathrm{alz}^{-1}(\lambda_t) = \frac{1}{1 + \sum_{i=1}^{k-1} e^{\lambda_{it}}}\,(e^{\lambda_{1t}}, \ldots, e^{\lambda_{k-1,t}})', \tag{8}$$


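and it is possible to estimate $f_t^*$ and $Q_t^*$ by taking a first-order multivariate Taylor expansion of $\mathrm{alz}^{-1}(\lambda_t)$ around $f_t^*$. Therefore, we have

$$\mathrm{alz}^{-1}(\lambda_t) \approx \mathrm{alz}^{-1}(f_t^*) + \nabla[\mathrm{alz}^{-1}(f_t^*)](\lambda_t - f_t^*), \tag{9}$$

where $\nabla$ is the gradient; that is,

$$\nabla = \nabla[\mathrm{alz}^{-1}(f_t^*)] = \frac{1}{(1 + \beta' 1_{k-1})^2}\,[\mathrm{diag}(\beta) + \mathrm{diag}(\beta 1_{k-1}'\beta) - \beta\beta']. \tag{10}$$

The vector $\beta = (\beta_1, \ldots, \beta_{k-1})' = (e^{f_{1t}^*}, \ldots, e^{f_{k-1,t}^*})'$ is used to simplify the notation, and the term $1_{k-1}$ represents a $k - 1$ column vector with all elements equal to 1. Thus, the following equations relate the posterior moments of $\mu_t$ and $\lambda_t$:

$$\begin{aligned}
\mu_t = E(\mu_t \mid D_t) &= E(\mathrm{alz}^{-1}(\lambda_t) \mid D_t) \\
&\approx E(\mathrm{alz}^{-1}(f_t^*) + \nabla(\lambda_t - f_t^*) \mid D_t) \\
&= E(\mathrm{alz}^{-1}(f_t^*) \mid D_t) + E(\nabla(\lambda_t - f_t^*) \mid D_t) \\
&= \mathrm{alz}^{-1}(f_t^*),
\end{aligned} \tag{11}$$

$$\begin{aligned}
V_t = V(\mu_t \mid D_t) &= V(\mathrm{alz}^{-1}(\lambda_t) \mid D_t) \\
&\approx V(\mathrm{alz}^{-1}(f_t^*) + \nabla(\lambda_t - f_t^*) \mid D_t) \\
&= \nabla' V(\lambda_t \mid D_t) \nabla = \nabla' Q_t^* \nabla.
\end{aligned} \tag{12}$$

Eqs. (11) and (12) are then solved as a function of $f_t^*$ and $Q_t^*$. This results in $f_t^* = \mathrm{alz}(\mu_t)$ and $Q_t^* = (\nabla')^{-1} V_t (\nabla)^{-1}$. Therefore, the posterior distribution of $\lambda_t$ is approximated by

$$(\lambda_t \mid D_t) \sim (\mathrm{alz}(\mu_t),\ (\nabla')^{-1} V_t (\nabla)^{-1}).$$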

5. Updating for $\theta_t$.
In order to update the moments of $\theta_t$, we resort to the Bayesian linear estimation method. Using similar procedures as in da-Silva et al. (2011) (see Section 3.1.4 (c)), we have that the joint posterior distribution of $(\lambda_t, \theta_t)$ is only partially specified. Thus, the posterior moments of $\theta_t$ are given by

$$\begin{aligned}
m_t = E(\theta_t \mid D_t) &= E[E(\theta_t \mid \lambda_t, D_{t-1}) \mid D_t] \\
&\approx E[a_t + R_t F_t Q_t^{-1}(\lambda_t - f_t) \mid D_t] \\
&= a_t + R_t F_t Q_t^{-1}(f_t^* - f_t),
\end{aligned}$$

$$\begin{aligned}
C_t = V(\theta_t \mid D_t) &= V[E(\theta_t \mid \lambda_t, D_{t-1}) \mid D_t] + E[V(\theta_t \mid \lambda_t, D_{t-1}) \mid D_t] \\
&\approx V[a_t + R_t F_t Q_t^{-1}(\lambda_t - f_t) \mid D_t] + E[R_t - R_t F_t Q_t^{-1} F_t' R_t \mid D_t] \\
&= V[R_t F_t Q_t^{-1}\lambda_t \mid D_t] + R_t - R_t F_t Q_t^{-1} F_t' R_t \\
&= R_t - R_t F_t Q_t^{-1}[I - Q_t^* Q_t^{-1}] F_t' R_t.
\end{aligned}$$
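Steps (1) to (5) can be put together in a compact sketch for the two-category case $k = 2$, where the Dirichlet observation reduces to a Beta density and $\lambda_t$, $f_t$, $Q_t$ are scalars. Several simplifications are assumptions of this example and not the article's implementation: a plain midpoint rule on $(0, 1)$ stands in for adaptive cubature, the function name `online_step_k2` and the grid size are hypothetical, and $F_t$ is passed as a list of length $p$.

```python
import math


def matvec(A, v):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def online_step_k2(y, m_prev, C_prev, F, G, W, phi, n_grid=4000):
    """One on-line update for k = 2 (illustrative sketch only)."""
    p = len(m_prev)
    # Step 1: prior moments a_t, R_t of theta_t and f_t, Q_t of lambda_t.
    a = matvec(G, m_prev)
    GC = [[sum(G[i][l] * C_prev[l][j] for l in range(p)) for j in range(p)] for i in range(p)]
    R = [[sum(GC[i][l] * G[j][l] for l in range(p)) + W[i][j] for j in range(p)] for i in range(p)]
    f = sum(Fi * ai for Fi, ai in zip(F, a))
    RF = matvec(R, F)
    Q = sum(Fi * ri for Fi, ri in zip(F, RF))
    # Step 2: the logistic-normal prior of mu_t is elicited as eta_t = f, Sigma_t = Q.
    # Step 3: posterior mean and variance of mu_t by a midpoint rule on (0, 1).
    s0 = s1 = s2 = 0.0
    for i in range(n_grid):
        mu = (i + 0.5) / n_grid
        z = math.log(mu / (1.0 - mu))
        log_lik = (math.lgamma(phi) - math.lgamma(phi * mu) - math.lgamma(phi * (1.0 - mu))
                   + (phi * mu - 1.0) * math.log(y)
                   + (phi * (1.0 - mu) - 1.0) * math.log(1.0 - y))
        log_prior = (-0.5 * math.log(2.0 * math.pi * Q) - math.log(mu * (1.0 - mu))
                     - 0.5 * (z - f) ** 2 / Q)
        w = math.exp(log_lik + log_prior)   # unnormalized posterior density, Eq. (7)
        s0 += w
        s1 += w * mu
        s2 += w * mu * mu
    mu_post = s1 / s0
    V_post = s2 / s0 - mu_post ** 2
    # Step 4: first-order update of lambda_t; for k = 2 the gradient is mu (1 - mu).
    f_star = math.log(mu_post / (1.0 - mu_post))
    grad = mu_post * (1.0 - mu_post)
    Q_star = V_post / grad ** 2
    # Step 5: linear Bayes update of theta_t.
    m_new = [ai + ri * (f_star - f) / Q for ai, ri in zip(a, RF)]
    shrink = (1.0 - Q_star / Q) / Q
    C_new = [[R[i][j] - RF[i] * RF[j] * shrink for j in range(p)] for i in range(p)]
    return m_new, C_new, mu_post, V_post


if __name__ == "__main__":
    out = online_step_k2(0.7, [0.0], [[1.0]], [1.0], [[1.0]], [[0.1]], phi=50.0)
    print(out)
```

Calling the function once per observation, feeding `m_new` and `C_new` back in as `m_prev` and `C_prev`, gives the recursive filter for a series of proportions. For $k > 2$ the scalar integral becomes a $(k-1)$-dimensional one and the divisions by `Q` and `grad` become matrix inversions.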


The one-step-ahead forecasting distribution is given by

$$p(y_t \mid D_{t-1}, \phi) = \int_{S_{k-1}} p(y_t \mid \mu_t, D_{t-1}, \phi)\, p(\mu_t \mid D_{t-1})\, d\mu_t.$$

This density cannot be calculated analytically. However, steps 3.1 and 3.2 can be used for that purpose.
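To illustrate how an integral over the simplex can be evaluated on the unit hypercube, the sketch below implements the change of variables of step 3.1 and integrates a function over the open 2-simplex (the case $k = 3$). It is a simplified stand-in under stated assumptions: a fixed midpoint grid replaces adaptive cubature, all function names are hypothetical, and the test integrand is a Dirichlet density, whose integral should be close to one. For $k = 3$ the inverse map is $\mu_1 = x_1 x_2$, $\mu_2 = x_1(1 - x_2)$, whose absolute Jacobian determinant is $x_1$.

```python
import math


def mu_to_x(mu):
    """Map the k-1 free simplex coordinates to the unit hypercube."""
    x = [sum(mu)]
    for i in range(1, len(mu)):
        x.append(mu[i - 1] / sum(mu[i - 1:]))
    return x


def x_to_mu(x):
    """Inverse map from the unit hypercube back to the simplex."""
    mu, rest = [], x[0]          # rest: total mass still to be allocated
    for xi in x[1:]:
        mu.append(rest * xi)
        rest *= 1.0 - xi
    mu.append(rest)
    return mu


def dirichlet_pdf(mu, alpha):
    """Dirichlet(alpha) density at the composition with free coordinates mu."""
    full = list(mu) + [1.0 - sum(mu)]
    log_p = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    log_p += sum((a - 1.0) * math.log(m) for a, m in zip(alpha, full))
    return math.exp(log_p)


def integrate_over_simplex_k3(g, n_grid=200):
    """Midpoint rule on (0,1)^2 for the integral of g over the open 2-simplex."""
    h = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        x1 = (i + 0.5) * h
        for j in range(n_grid):
            x2 = (j + 0.5) * h
            mu = x_to_mu([x1, x2])
            total += g(mu) * x1 * h * h   # x1 is the Jacobian factor when k = 3
    return total


if __name__ == "__main__":
    value = integrate_over_simplex_k3(lambda mu: dirichlet_pdf(mu, [2.0, 3.0, 4.0]))
    print(value)
```

The same routine gives a one-step-ahead forecast density at a point $y_t$ if `g` is replaced by the product of the Dirichlet likelihood at $y_t$ and the Logistic-normal prior density of $\mu_t$.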

Concerning step (4), we only used first-order approximations for the moments of $(\lambda_t \mid D_t)$. In the multivariate context, the inclusion of second-order approximations would add tremendous complications in the updating of $\lambda_t$. Our simulations indicate that our methods can satisfactorily recover the parameters and features used for simulating the data (see Section 5.2). In the Beta dynamic model, da-Silva et al. (2011, Section 4.2.4) evaluated the differences between the first-order and second-order approximations for the moments of $(\lambda_t \mid D_t)$. The authors concluded that the first-order approximations are precise enough for a broad range of applications.
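The first-order approximation in step (4) rests on the derivative matrix of the inverse additive logit. As a hedged check, the sketch below builds that matrix entry by entry, using the fact that the derivative of $\beta_i / (1 + \sum_j \beta_j)$ with respect to $\lambda_j$ is $[\delta_{ij}\beta_i(1 + \sum_l \beta_l) - \beta_i\beta_j] / (1 + \sum_l \beta_l)^2$, and compares it with central finite differences. The function names and the evaluation point are illustrative assumptions.

```python
import math


def alz_inv(lam):
    """Inverse additive logit."""
    denom = 1.0 + sum(math.exp(v) for v in lam)
    return [math.exp(v) / denom for v in lam]


def alz_inv_gradient(f_star):
    """Analytic derivative matrix: entry (i, j) is d mu_i / d lambda_j at f_star."""
    beta = [math.exp(v) for v in f_star]
    s = sum(beta)
    n = len(beta)
    return [
        [((beta[i] * (1.0 + s) if i == j else 0.0) - beta[i] * beta[j]) / (1.0 + s) ** 2
         for j in range(n)]
        for i in range(n)
    ]


def finite_difference_gradient(f_star, h=1e-6):
    """Central finite-difference approximation of the same matrix."""
    n = len(f_star)
    grad = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up = list(f_star)
        dn = list(f_star)
        up[j] += h
        dn[j] -= h
        mu_up, mu_dn = alz_inv(up), alz_inv(dn)
        for i in range(n):
            grad[i][j] = (mu_up[i] - mu_dn[i]) / (2.0 * h)
    return grad


if __name__ == "__main__":
    point = [0.4, -0.7, 1.1]
    analytic = alz_inv_gradient(point)
    numeric = finite_difference_gradient(point)
    worst = max(abs(analytic[i][j] - numeric[i][j]) for i in range(3) for j in range(3))
    print("largest discrepancy:", worst)
```

The matrix is symmetric, so the order of the transposes in the variance relation does not change the result.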

When W and φ are unknown, as it is in virtually all real applications, we can proceed asfollows. First, specify W using the so-called discount factors (see West and Harrison, 1997)and their variations. This technique elegantly deals with the problem by enforcing thatinformation decays at the same rate for each of the elements of the state vector. Accordingto these authors, “Block discounting is our recommended approach to structuring theevolution variance sequence in almost all applications. The approach is parsimonious,naturally interpretable, and robust.” Second, in order to estimate the precision parameterφ, we set φ as the value that maximizes the log of the likelihood function, L, expressedby

L = p(Y|φ) =T∏t=1

p(Yt |Dt−1, φ) =T∏t=1

∫p(Yt |θt , φ)p(θt |Dt−1)dθt .

This last step can be replaced by any other convenient criterion, such as the value thatminimizes the squared prediction error. Observe that, in either case, φ is then treated asknown in the filtering process.
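To make the discounting idea concrete, the following sketch builds the prior covariance Rt from the posterior covariance Ct−1 using one discount factor per block. It follows the usual component-discounting construction, in which the block of Wt for component b equals the corresponding block of Pt = Gt Ct−1 G′t multiplied by (1 − δb)/δb; the function name, the block layout, and the numerical values are assumptions made for illustration, not the settings used in the article.

```python
import numpy as np

def discounted_prior_cov(G, C_prev, blocks, deltas):
    """Return (R_t, W_t) with W_t implied blockwise by discount factors.

    P_t = G C_{t-1} G'. For block b with state indices idx and discount
    delta_b in (0, 1], W_t[idx, idx] = P_t[idx, idx] * (1 - delta_b) / delta_b.
    W_t is zero across different blocks.
    """
    P = G @ C_prev @ G.T
    W = np.zeros_like(P)
    for idx, delta in zip(blocks, deltas):
        sel = np.ix_(idx, idx)
        W[sel] = P[sel] * (1.0 - delta) / delta
    return P + W, W

# Hypothetical example: a 2-dimensional trend block and a 1-dimensional regression block.
G = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
C_prev = np.diag([0.2, 0.05, 0.1])
R, W = discounted_prior_cov(G, C_prev, blocks=[[0, 1], [2]], deltas=[0.95, 0.99])
print(R)
print(W)
```

A discount factor equal to 1 leaves the corresponding block static (zero evolution variance), while smaller values let information decay faster.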

4. DDM When All Parameters are Unknown: Off-line DDM Procedure

In this section, we deal with the off-line estimation of DDM, which allows us to estimate the parameters via MCMC. We consider the case of Wt = W. For the MCMC formulation of DDM (see Section 3), we now consider Gaussian errors in the evolution equation:

\[
\theta_t = G_t \theta_{t-1} + \omega_t ; \qquad \omega_t \sim N(0, W_t).
\]

Thus, for t = 1, . . . , T , we have that (θt |θt−1, W) ∼ N (Gt θt−1, W). In addition, let Θ = (θ1, . . . , θT ) and ψ = (W, φ, θ0) represent the parameters in the model, and let p(Θ, ψ) denote the prior distribution associated with (Θ, ψ). Thus,

\[
p(\Theta, \psi) = [p(\Theta \mid \theta_0, W)\, p(\theta_0)]\, p(W, \phi), \tag{13}
\]

with θ0 ∼ N (m0, C0).


In order to present the off-line procedure in a broader perspective, we also consider the possibility of missing data. Thus, the vector ψ is now described as $\psi = (\theta_0, \phi, W, Y_{A^C})$, where $A^C$ represents the set of indexes t such that yt is a missing observation.

Using properties (A.1) and (A.2) described in Section 2 and assuming that W, φ, and θ0 are independent a priori, the joint posterior distribution is given by

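\[
\begin{aligned}
p(\psi, \Theta \mid D_T) &\propto p(D_T \mid \Theta, \psi)\, p(\Theta, \psi) \\
&= \prod_{t \in A} p(Y_t \mid \theta_t, \phi) \prod_{t \in A^C} p(Y_t \mid \theta_t, \phi) \prod_{t=1}^{T} p(\theta_t \mid \theta_{t-1}, W)\, p(W)\, p(\phi)\, p(\theta_0).
\end{aligned}
\tag{14}
\]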

The first term of the second line of expression (14) is the conditional likelihood, while the second one represents the prior distribution of the missing observations.

Sampling from the joint posterior distribution of the parameters and latent states is performed with the help of the Metropolis-Hastings within Gibbs algorithm, which is summarized below:

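1. Initialization: Define initial values $\psi = \psi^{(0)}$ and $\Theta = \Theta^{(0)}$.
2. For i = 1, . . . , N :

(a) Sample $Y^{(i)}_{A^C}$ from $p(Y_{A^C} \mid \Theta = \Theta^{(i-1)}, \phi = \phi^{(i-1)})$;

(b) Sample the latent states $\Theta^{(i)}$ from $p(\Theta \mid D_t, Y_{A^C} = Y^{(i)}_{A^C}, \theta_0 = \theta_0^{(i-1)}, \phi = \phi^{(i-1)}, W = W^{(i-1)})$;

(c) Sample the covariance matrix $W^{(i)}$ from $p(W \mid \theta_0 = \theta_0^{(i-1)}, \Theta = \Theta^{(i)})$;

(d) Sample the initial state $\theta_0^{(i)}$ from $p(\theta_0 \mid \theta_1 = \theta_1^{(i)}, W = W^{(i)})$;

(e) Sample the precision parameter $\phi^{(i)}$ from $p(\phi \mid D_t, Y_{A^C} = Y^{(i)}_{A^C}, \theta_0 = \theta_0^{(i)}, \Theta = \Theta^{(i)})$.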

In the next section, we describe in detail each of the items in the algorithm above.

4.1. Sampling the Missing Observations

When an observation yt is missing, it contributes no additional information. This can be expressed as p(θt |Dt ) = p(θt |Dt−1); that is, the filtered distribution at time t is simply the prior at time t.

The MCMC procedure for dealing with missing observations consists of treating them as unknown parameters. According to expressions (13) and (14), the joint prior p(Θ, ψ) is conveniently defined as

\[
p(\Theta, \psi) = p(Y_{A^C} \mid \Theta, \phi)\, p(\Theta \mid \theta_0, W)\, p(W)\, p(\phi)\, p(\theta_0). \tag{15}
\]

The first term on the right-hand side of expression (15) is specified by the Dirichlet distribution (see the observation equation in Section 3). Thus, the prior distribution of $Y_{A^C}$ is given by $p(Y_{A^C} \mid \Theta, \phi) = \prod_{t \in A^C} p(Y_t \mid \theta_t, \phi)$, and according to expression (14), the full conditional distribution of $Y_{A^C}$ is such that

\[
p(Y_{A^C} \mid \ldots) = \prod_{t \in A^C} p(Y_t \mid \theta_t, \phi), \tag{16}
\]


where the notation “. . .” represents all available data DT and all the unknown parameters in the model except for the missing observations themselves. Expression (16) may look awkward at first, since it does not depend on the observed data, but that is a consequence of property (A.2) in Section 2.
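A minimal sketch of this Gibbs step is given below. It assumes, since Section 3 is not reproduced here, that the observation equation is Yt | θt, φ ∼ Dirichlet(φ μt), that λt = F′t θt, and that μt is obtained from λt through the inverse additive logistic transformation with the last category as reference. The function names and the toy inputs are hypothetical.

```python
import numpy as np

def additive_logistic_inverse(lam):
    """Map a linear predictor of length k-1 to a mean vector of length k on the simplex."""
    z = np.concatenate([np.asarray(lam, dtype=float), [0.0]])  # reference category gets 0
    w = np.exp(z - z.max())                                    # numerically stable
    return w / w.sum()

def sample_missing(theta, F_list, phi, missing_idx, rng):
    """Draw each missing Y_t from Dirichlet(phi * mu_t), independently over t."""
    draws = {}
    for t in missing_idx:
        lam = F_list[t].T @ theta[t]
        mu = additive_logistic_inverse(lam)
        draws[t] = rng.dirichlet(phi * mu)
    return draws

# Toy setting (hypothetical): T = 4 times, k = 3 categories, 2 states.
rng = np.random.default_rng(0)
theta = rng.normal(scale=0.5, size=(4, 2))
F_list = [np.eye(2)] * 4
draws = sample_missing(theta, F_list, phi=300.0, missing_idx=[1, 3], rng=rng)
for t, y in draws.items():
    print(t, y)
```

Because the draws depend only on the current states and on φ, the step is cheap and needs no accept/reject decision.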

4.2. Sampling from the Latent States Θ

From Eq. (14), the full conditional distribution of Θ is given by

\[
p(\Theta \mid \ldots) \propto \prod_{t=1}^{T} p(y_t \mid \theta_t, \phi)\, p(\theta_t \mid \theta_{t-1}, W), \tag{17}
\]

which does not have a closed form. The problem of sampling from the posterior distribution of the latent states of non-Gaussian nonlinear dynamic models has received much attention; see, for example, Shephard and Pitt (1997), Gamerman (1998), Geweke and Tanizaki (2001), and Ravines et al. (2007). Ravines et al. developed a methodology called conjugate updating backward sampling (CUBS) for dealing with the sampling procedure of the latent states in the class of DGLMs.

The CUBS technique is very similar to the Forward Filtering Backward Sampling (FFBS) method developed by Frühwirth-Schnatter (1994) and Carter and Kohn (1994). CUBS replaces the Kalman filter with the Conjugate Updating procedure (see Section 3.1 for our version of the conjugate updating step). In this article, we made extensive use of CUBS, which is briefly described next.

With the help of the CUBS algorithm, it is possible to describe a proposal distribution for Θ and to sample the whole set of parameters in Θ. Considering the sequential nature of the dynamic models, the mentioned proposal distribution is based on the decomposition of the full conditional posterior of Θ, given by

\[
p(\Theta \mid D_T, \psi) = p(\theta_T \mid D_T, \psi) \prod_{t=1}^{T-1} p(\theta_t \mid \theta_{t+1}, D_t, \psi).
\]

In the case of DDM, the distributions on the right-hand side of the expression above are unknown. However, the moments associated with them can be estimated using the filtering method described in Section 3 and the smoothing or retrospective analysis (see Section 14.3.4 of West and Harrison, 1997). One can show that the first- and second-order moments of (θt |θt+1, Dt , ψ), given by $(m^s_t, C^s_t)$, are described as

\[
\begin{aligned}
m^s_t &= m_t + C_t G'_{t+1} R^{-1}_{t+1} (\theta_{t+1} - a_{t+1}); \\
C^s_t &= C_t - C_t G'_{t+1} R^{-1}_{t+1} G_{t+1} C_t ,
\end{aligned}
\tag{18}
\]

where mt and Ct are, respectively, the on-line mean and variance approximated by the filtering methodology. The CUBS algorithm provides tools for sequentially drawing candidate values $\theta^*_t$, for t = T , . . . , 1. The block Θ∗ is accepted with probability defined by the Metropolis-Hastings algorithm. The CUBS algorithm is summarized below.

CUBS algorithm

1. Calculate the moments $m^{(i)}$ and $C^{(i)}$ of $p(\theta_t \mid D_t, \psi^{(i-1)})$ using the filtering procedure described in Section 3;
2. Sample Θ∗ via Backward Sampling:

(a) Sample the candidate $\theta^*_T$ from the distribution $N(m^{(i)}_T, C^{(i)}_T)$.


(b) Sample $\theta^*_t$, t = T − 1, . . . , 1, from $p(\theta_t \mid \theta^*_{t+1}, \psi^{(i-1)})$.

3. Set $\Theta^{(i)} = \Theta^*$ with probability πt and $\Theta^{(i)} = \Theta^{(i-1)}$ with probability 1 − πt , where πt = min{1, A} and A is the acceptance ratio of the Metropolis-Hastings algorithm:

\[
A = \min\left[1, \frac{\omega(\theta^*)}{\omega(\theta)}\right], \qquad \omega(\theta^*) = \frac{p(\theta^*)}{q(\theta^*)},
\]

where q(·) is a proposal distribution obtained through a combination of the previous steps.
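The backward sampling pass (step 2 of the algorithm) can be sketched as follows, using the moments in expression (18) with Gaussian draws. The function name, the 0-based indexing, and the toy filtered moments are assumptions of this sketch; in an actual run, the lists `m`, `C`, `a`, and `R` would come from the filtering procedure, and the Metropolis-Hastings acceptance step, which needs the model densities, is not shown.

```python
import numpy as np

def backward_sample(m, C, a, R, G, rng):
    """Draw a candidate path theta*_1, ..., theta*_T (stored 0-based).

    m[t], C[t]: filtered moments of (theta_t | D_t);
    a[t], R[t]: prior moments of (theta_t | D_{t-1});
    G[t]: evolution matrix taking theta_{t-1} to theta_t.
    Entries a[0], R[0], and G[0] are not used.
    """
    T = len(m)
    theta = [None] * T
    theta[-1] = rng.multivariate_normal(m[-1], C[-1])
    for t in range(T - 2, -1, -1):
        B = C[t] @ G[t + 1].T @ np.linalg.inv(R[t + 1])
        ms = m[t] + B @ (theta[t + 1] - a[t + 1])
        Cs = C[t] - B @ G[t + 1] @ C[t]
        Cs = 0.5 * (Cs + Cs.T)  # symmetrize against round-off
        theta[t] = rng.multivariate_normal(ms, Cs)
    return np.array(theta)

# Toy filtered moments (hypothetical): T = 5 times, 2 states, G = I.
rng = np.random.default_rng(1)
T, n = 5, 2
G = [np.eye(n)] * T
m = [np.full(n, 0.1 * t) for t in range(T)]
C = [np.eye(n)] * T
a = [np.zeros(n)] + [G[t] @ m[t - 1] for t in range(1, T)]
R = [np.eye(n)] + [G[t] @ C[t - 1] @ G[t].T + 0.5 * np.eye(n) for t in range(1, T)]
path = backward_sample(m, C, a, R, G, rng)
print(path)
```

The whole path is then treated as a single block proposal, which is accepted or rejected as described in step 3.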

4.3. Sampling the Covariance Matrix W

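In the beginning of Section 3, we described the system equation of DDM. In our inferential developments with respect to the covariance matrix of the evolution errors, W, we assume that W = block-diag(W1, . . . , Wh), that is, W is a block-diagonal matrix, with Wi being a pi × pi matrix and h representing the number of blocks into which the vector θt is partitioned. We use $\Phi = W^{-1}$ to denote the block-diagonal precision matrix whose blocks are denoted by $\Phi_i = W_i^{-1}$, i = 1, . . . , h. In this work, we adopted independent Wishart priors for each $\Phi_i$. Thus, $\Phi_i \sim \text{Wishart}(\nu_i, S_i)$. For clarity, in the system equation (see the beginning of Section 3), θt and Gt are written as

\[
\theta_t = \begin{bmatrix} \theta_{1,t} \\ \vdots \\ \theta_{h,t} \end{bmatrix}
\quad \text{and} \quad
G_t = \begin{bmatrix} G_{1,t} & \cdots & 0 \\ 0 & \ddots & 0 \\ 0 & \cdots & G_{h,t} \end{bmatrix}.
\]

The full conditional posterior of $\Phi_i$ is given by

\[
\begin{aligned}
p(\Phi_i \mid \ldots) &\propto \prod_{t=1}^{T} p(\theta_t \mid \theta_{t-1}, \Phi)\, p(\Phi) = \prod_{t=1}^{T} N(\theta_t ; G_t \theta_{t-1}, \Phi^{-1})\, \mathcal{W}(\Phi_i ; \nu_i, S_i) \\
&\propto |\Phi_i|^{T/2 + \nu_i - (p_i + 1)/2} \exp\left\{ -\mathrm{tr}\left( \left( \tfrac{1}{2} SS_{i\cdot} + S_i \right) \Phi_i \right) \right\},
\end{aligned}
\tag{19}
\]

where $SS_{i\cdot} = \sum_{t=1}^{T} SS_{ii,t}$, with $SS_{ii,t} = (\theta_{i,t} - G_{i,t}\theta_{i,t-1})(\theta_{i,t} - G_{i,t}\theta_{i,t-1})'$. Therefore,

\[
(\Phi_i \mid \ldots) \sim \text{Wishart}(T/2 + \nu_i ,\; S_i + SS_{i\cdot}/2).
\]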
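The draw from this full conditional can be sketched as below. Matching the kernel in expression (19) with the common (degrees of freedom, scale matrix) form of the Wishart density suggests T + 2νi degrees of freedom and scale matrix (2Si + SSi·)−1; this mapping is our reading of the kernel and should be checked against the parameterization actually used. The sketch further assumes a time-invariant Gi and an integer number of degrees of freedom, so that the draw can be built as a sum of outer products of Gaussian vectors. All names and numbers are hypothetical.

```python
import numpy as np

def sample_precision_block(theta_i, theta0_i, G_i, nu, S, rng):
    """Draw the precision block from its full conditional (kernel of expression (19)).

    theta_i: (T, p) array with the states of block i; theta0_i: (p,) initial state.
    """
    prev = np.vstack([theta0_i, theta_i[:-1]])
    resid = theta_i - prev @ G_i.T           # rows: theta_{i,t} - G_i theta_{i,t-1}
    SS = resid.T @ resid                     # sum of outer products SS_{i.}
    T, p = theta_i.shape
    df = T + 2 * nu
    if abs(df - round(df)) > 1e-9 or df < p:
        raise ValueError("this sketch needs an integer number of degrees of freedom >= p")
    scale = np.linalg.inv(2.0 * S + SS)
    scale = 0.5 * (scale + scale.T)
    Z = rng.multivariate_normal(np.zeros(p), scale, size=int(round(df)))
    return Z.T @ Z                           # Wishart(df, scale) draw

# Toy check (hypothetical): a random walk with evolution covariance W = 0.04 I.
rng = np.random.default_rng(3)
T, p = 2000, 2
steps = rng.normal(scale=0.2, size=(T, p))
theta0 = np.zeros(p)
theta = theta0 + np.cumsum(steps, axis=0)
Phi = sample_precision_block(theta, theta0, np.eye(p), nu=1.0, S=1e-3 * np.eye(p), rng=rng)
W_draw = np.linalg.inv(Phi)
print(W_draw)
```

With a long series, a single draw of the implied covariance matrix should already be close to the covariance used to simulate the states.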

4.4. Sampling from the Initial State θ0

From Eq. (14), the full conditional distribution of θ0 is such that

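\[
\begin{aligned}
p(\theta_0 \mid \ldots) &\propto p(\theta_1 \mid \theta_0, W)\, p(\theta_0) \\
&\propto \exp\{-(\theta_1 - G_1\theta_0)' W^{-1} (\theta_1 - G_1\theta_0)/2\}\, \exp\{-(\theta_0 - m_0)' C_0^{-1} (\theta_0 - m_0)/2\}.
\end{aligned}
\]

Thus, we can show that $(\theta_0 \mid \ldots) \sim N(\mu, \Sigma)$, where $\mu = \Sigma\,(G_1' W^{-1} \theta_1 + C_0^{-1} m_0)$ and $\Sigma^{-1} = G_1' W^{-1} G_1 + C_0^{-1}$.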
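For illustration, the Gaussian full conditional of θ0 can be sampled as in the sketch below, which combines the evolution density p(θ1 | θ0, W) with the prior N(m0, C0). The function name and the toy values are hypothetical, and the sketch assumes that W and C0 are invertible, which fails if some block of W is null.

```python
import numpy as np

def sample_theta0(theta1, G1, W, m0, C0, rng):
    """Draw theta_0 from N(mu, Sigma) and also return (mu, Sigma)."""
    W_inv = np.linalg.inv(W)
    C0_inv = np.linalg.inv(C0)
    Sigma = np.linalg.inv(G1.T @ W_inv @ G1 + C0_inv)
    Sigma = 0.5 * (Sigma + Sigma.T)          # symmetrize against round-off
    mu = Sigma @ (G1.T @ W_inv @ theta1 + C0_inv @ m0)
    return rng.multivariate_normal(mu, Sigma), mu, Sigma

# Toy values (hypothetical): a 2-dimensional linear growth block.
rng = np.random.default_rng(4)
G1 = np.array([[1.0, 1.0], [0.0, 1.0]])
W = np.diag([1e-2, 1e-4])
m0, C0 = np.zeros(2), np.eye(2)
theta1 = np.array([0.3, 0.05])
draw, mu, Sigma = sample_theta0(theta1, G1, W, m0, C0, rng)
print(draw, mu)
```

In the scalar case with G1 = W = C0 = 1 and m0 = 0, the conditional mean is θ1/2 and the variance is 1/2, which is a convenient sanity check.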


4.5. Sampling the Precision Parameter φ

The full conditional distribution of φ is given by

\[
p(\phi \mid \ldots) \propto \prod_{t=1}^{T} p(y_t \mid \theta_t, \phi)\, p(\phi). \tag{20}
\]

We adopted a non-conjugate gamma(α, β) prior. Since the density (20) does not have a closed form, we used the Metropolis-Hastings algorithm in order to sample from p(φ| . . .). The proposal distribution was chosen to be log-normal with mean equal to the current value of φ.
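One possible implementation of this Metropolis-Hastings step is sketched below. It assumes the Dirichlet(φ μt) observation model, a gamma prior written with shape α and rate β, and a log-normal proposal whose log-scale standard deviation `s` is a tuning constant chosen here arbitrarily. Since the log-normal proposal is not symmetric, the acceptance ratio includes the proposal densities. All function names are hypothetical.

```python
import math
import numpy as np

_lgamma = np.vectorize(math.lgamma)

def dirichlet_loglik(phi, Y, Mu):
    """Sum over t of log Dirichlet(y_t | phi * mu_t); Y and Mu are (T, k) arrays."""
    alpha = phi * Mu
    T = Y.shape[0]
    return float(T * math.lgamma(phi) - _lgamma(alpha).sum()
                 + ((alpha - 1.0) * np.log(Y)).sum())

def log_lognormal_pdf(x, mean, s):
    """Log density at x of a log-normal with E[X] = mean and log-scale sd s."""
    loc = math.log(mean) - 0.5 * s ** 2
    return (-math.log(x * s * math.sqrt(2.0 * math.pi))
            - (math.log(x) - loc) ** 2 / (2.0 * s ** 2))

def mh_step_phi(phi, Y, Mu, alpha0, beta0, s, rng):
    """One Metropolis-Hastings update of phi; returns (new value, accepted flag)."""
    prop = float(rng.lognormal(mean=math.log(phi) - 0.5 * s ** 2, sigma=s))

    def log_post(p):
        return dirichlet_loglik(p, Y, Mu) + (alpha0 - 1.0) * math.log(p) - beta0 * p

    log_ratio = (log_post(prop) - log_post(phi)
                 + log_lognormal_pdf(phi, prop, s) - log_lognormal_pdf(prop, phi, s))
    if math.log1p(-rng.random()) < log_ratio:
        return prop, True
    return phi, False

# Toy data (hypothetical): T = 100 compositions with true phi = 300.
rng = np.random.default_rng(5)
Mu = np.tile(np.array([0.2, 0.3, 0.5]), (100, 1))
Y = np.vstack([rng.dirichlet(300.0 * mu) for mu in Mu])
phi, chain = 100.0, []
for _ in range(1500):
    phi, _ = mh_step_phi(phi, Y, Mu, alpha0=0.001, beta0=0.001, s=0.15, rng=rng)
    chain.append(phi)
print(np.mean(chain[500:]))
```

In the full sampler, `Mu` would be recomputed from the current latent states before each update of φ.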

4.6. MCMC for the Static Case of DDM

The static case of DDM requires a particular treatment for the MCMC approach, since the matrices Wt are null and the vectors θt , for t = 0, . . . , T , are deterministically related. In order to avoid confusion with conventional DDM, let us denote θ as a vector of dimension q. We have that θt = Gt θt−1 = Gt Gt−1 . . . G1 θ, for t = 1, . . . , T , and θ0 = θ. Thus, for any index t, θt is determined by the vector θ. Therefore, instead of having to estimate q × (T + 1) parameters associated with the latent states, in the static case we only need to estimate q parameters. In this setting, it is not unusual that Gt = I, for t = 1, . . . , T . Thus, for a given prior distribution p(θ), the full conditional posterior of θ is given by

\[
p(\theta \mid \ldots) \propto \left[ \prod_{t=1}^{T} p(y_t \mid \theta, \phi) \right] p(\theta). \tag{21}
\]

The first term on the right-hand side of expression (21) is the likelihood described by the Dirichlet model. Note that (21) does not have a closed form and the Metropolis-Hastings algorithm can be used for drawing samples from p(θ | . . .). In the present case, instead of using the CUBS algorithm (see Section 4.2) for describing the proposal distribution for the latent states, we use a normal distribution centered at the current value of θ: $q(\theta) \sim N(\theta^{(i-1)}, \Sigma)$.

The acceptance probability of the Metropolis-Hastings algorithm is given by

\[
\pi(\theta^{(i-1)}, \theta^*) = \min\{1,\; p(\theta^*)/p(\theta^{(i-1)})\}.
\]

The choice of the matrix Σ is important for the efficiency of the algorithm. An interesting way of specifying Σ is through the use of the on-line estimation and the so-called discount factors (see West and Harrison, 1997). In such a case, it is necessary to use the filtering procedure only once and then calibrate Σ so that the acceptance rate lies in the range of 25% to 50%. The full conditional distribution of φ is given by expression (20).
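A generic random-walk Metropolis sampler of this kind is sketched below. The function name and the toy target (a standard bivariate normal standing in for the full conditional in expression (21)) are assumptions made for illustration; in practice `log_post` would evaluate the Dirichlet log-likelihood plus the log-prior, and the proposal covariance would be taken from the on-line estimates.

```python
import numpy as np

def rw_metropolis(log_post, theta_init, Sigma, n_iter, rng):
    """Random-walk Metropolis with N(current, Sigma) proposals.

    Returns the chain, of shape (n_iter, q), and the observed acceptance rate.
    """
    theta = np.asarray(theta_init, dtype=float)
    lp = log_post(theta)
    L = np.linalg.cholesky(Sigma)
    chain = np.empty((n_iter, theta.size))
    accepted = 0
    for i in range(n_iter):
        prop = theta + L @ rng.standard_normal(theta.size)
        lp_prop = log_post(prop)
        # Symmetric proposal: the acceptance ratio is the ratio of target densities.
        if np.log1p(-rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
            accepted += 1
        chain[i] = theta
    return chain, accepted / n_iter

# Toy target (hypothetical): standard bivariate normal.
rng = np.random.default_rng(6)
chain, rate = rw_metropolis(lambda th: -0.5 * float(th @ th),
                            theta_init=[2.0, -2.0],
                            Sigma=1.5 * np.eye(2),
                            n_iter=5000, rng=rng)
print(rate, chain[1000:].mean(axis=0))
```

Rescaling the proposal covariance by a constant and rerunning a short pilot chain is a simple way of steering the acceptance rate into the desired range.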

5. Applications

In univariate DLMs, each model component is described as a building block in the representation of yt , that is, yt = y1t + . . . + yht , where y1t might represent a trend component, y2t might represent a seasonal component, and so on. The components in the model are then combined in order to represent the dynamic model for yt .


Matrix specification

To represent each of those components in the DDM context, one needs to define the design matrices F and G and the covariance matrix W in a convenient way. The chosen matrices determine the prediction function E(yt+k |Dt ). A full and detailed discussion about those issues can be found in Chapters 7 to 9 of West and Harrison (1997). Note that although such methods refer to Gaussian dynamic models, the procedures and techniques therein also apply to DDM.

For illustration purposes, consider two distinct models. The first one is defined by the set of matrices M1 = {F1, G1, W1}, representing only locally linear growth (in the λ scale, that is, in the scale of the linear predictor). The second model, related to purely seasonal evolution, is specified by the matrices M2 = {F2, G2, W2}. We can define a new model by combining the information in matrices M1 and M2:

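\[
\theta_t = \begin{pmatrix} \theta_{1t} \\ \theta_{2t} \end{pmatrix}, \quad
F = \begin{pmatrix} F_{1t} \\ F_{2t} \end{pmatrix}, \quad
G = \begin{pmatrix} G_1 & 0 \\ 0 & G_2 \end{pmatrix}, \quad \text{and} \quad
W = \begin{pmatrix} W_1 & 0 \\ 0 & W_2 \end{pmatrix}.
\]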

This new model describes a time series in which fλ(k) = E(λt+k |Dt ) can be expressed as a sum of a two-degree polynomial and a periodic function. Of course, this procedure is also valid for more than two blocks. In other words, one can easily accommodate distinct features in these models by simply adding or removing terms of the latent vector θt and making the respective modifications to the matrices F and G. We can also describe matrix W to allow each of the system parameters to evolve or not, depending on its nature.
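The superposition of blocks can be checked numerically: for a scalar linear predictor, the forecast function F′Gk a of the combined model equals the sum of the forecast functions of its components. In the sketch below, the helper names, the one-harmonic seasonal block of period 4, and the state values are hypothetical choices made only for the illustration.

```python
import numpy as np

def block_diag(*mats):
    """Place square matrices along the diagonal of a zero matrix."""
    n = sum(M.shape[0] for M in mats)
    out = np.zeros((n, n))
    i = 0
    for M in mats:
        k = M.shape[0]
        out[i:i + k, i:i + k] = M
        i += k
    return out

def forecast(F, G, a, k):
    """Forecast function f(k) = F' G^k a for a scalar linear predictor."""
    return float(F @ np.linalg.matrix_power(G, k) @ a)

# M1: linear growth block; M2: one harmonic with period 4 (hypothetical values).
F1, G1, a1 = np.array([1.0, 0.0]), np.array([[1.0, 1.0], [0.0, 1.0]]), np.array([1.0, 0.1])
F2, G2, a2 = np.array([1.0, 0.0]), np.array([[0.0, 1.0], [-1.0, 0.0]]), np.array([0.5, -0.25])

F = np.concatenate([F1, F2])
G = block_diag(G1, G2)
a = np.concatenate([a1, a2])

for k in range(9):
    total = forecast(F, G, a, k)
    parts = forecast(F1, G1, a1, k) + forecast(F2, G2, a2, k)
    print(k, round(total, 4), round(parts, 4))
```

Adding or removing a component therefore amounts to adding or removing the corresponding entries of F and the corresponding diagonal block of G.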

In order to clarify some of the ideas and the specification of a DDM that includes seasonality effects, consider the Fourier Representation Theorem (see p. 49 of Pole et al., 1994). This theorem allows us to express any cyclical function of period p, defined by a set of p effects ψ1, . . . , ψp, as a linear combination of sine and cosine terms.

Consider a time series of compositional data that includes (say) k = 3 categories. Suppose that the latent states incorporate the effects of level, growth, and seasonality with seasonal cycles of size p = 4 (quarters). For DDM, we can incorporate a seasonal effect for each of the first two categories (say, A and B) in the model. The matrices F and G are then described as

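\[
F = \begin{pmatrix} E_2 & E_2 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & E_2 & E_2 & 1 \end{pmatrix}
\quad \text{and} \quad
G = \begin{pmatrix} G^* & 0 \\ 0 & G^* \end{pmatrix},
\quad \text{with} \quad
G^* = \begin{pmatrix} J_2(1) & 0 \\ 0 & G_{even} \end{pmatrix},
\]

where

\[
J_2(1) = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad
J_2(1, \omega) = \begin{pmatrix} \cos(\omega) & \sin(\omega) \\ -\sin(\omega) & \cos(\omega) \end{pmatrix}, \quad
E_2 = \begin{pmatrix} 1 \\ 0 \end{pmatrix},
\]

\[
G_{even} = \begin{pmatrix}
J_2(1, \omega) & 0 & \cdots & 0 & 0 \\
0 & J_2(1, 2\omega) & \cdots & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & J_2(1, (p/2 - 1)\omega) & 0 \\
0 & 0 & \cdots & 0 & -1
\end{pmatrix},
\]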

with ω = 2π/p. Therefore, for each of the categories being modeled, we include seasonality parameters. The matrices F and G presented above imply that λt = (λ1t , λ2t ), with λ1t = θ1t + θ3t + θ5t and λ2t = θ6t + θ8t + θ10t . The parameters θ1t and θ6t represent, respectively, the level in categories A and B. θ3t and θ5t represent the harmonics related to category A.


Similarly, θ8t and θ10t represent the harmonics related to category B. The extension to cases where the seasonal cycle differs from p = 4 is straightforward and depends on the correct specification of F and G.
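The seasonal construction for an even period p can be sketched in code as follows. The helper names are hypothetical, and F is stored here with one column per modeled category, so that λt = F′θt; the layout of F is an assumption of the sketch.

```python
import numpy as np

def block_diag(*mats):
    """Place square matrices along the diagonal of a zero matrix."""
    n = sum(M.shape[0] for M in mats)
    out = np.zeros((n, n))
    i = 0
    for M in mats:
        k = M.shape[0]
        out[i:i + k, i:i + k] = M
        i += k
    return out

def J2_rot(omega):
    """Harmonic block J2(1, omega)."""
    return np.array([[np.cos(omega), np.sin(omega)],
                     [-np.sin(omega), np.cos(omega)]])

def G_even(p):
    """Seasonal evolution matrix with all harmonics for an even period p."""
    omega = 2.0 * np.pi / p
    blocks = [J2_rot(j * omega) for j in range(1, p // 2)]
    blocks.append(np.array([[-1.0]]))        # Nyquist term
    return block_diag(*blocks)

J2_trend = np.array([[1.0, 1.0], [0.0, 1.0]])   # J2(1): level and growth
G_star = block_diag(J2_trend, G_even(4))         # one category: 5 states
G = block_diag(G_star, G_star)                   # categories A and B: 10 states

# Column c of F picks level, first harmonic, and Nyquist term of category c.
F = np.zeros((10, 2))
F[[0, 2, 4], 0] = 1.0
F[[5, 7, 9], 1] = 1.0

theta = np.arange(1.0, 11.0)                     # placeholder state (theta_1, ..., theta_10)
print(F.T @ theta)                               # (theta_1 + theta_3 + theta_5, theta_6 + theta_8 + theta_10)
```

For a different even period, only the call `G_even(p)` and the positions of the ones in F need to change.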

The examples shown in the following sections will help clarify further how the components in the model can be combined in practice.

5.1. DDM Sub-models

In order to show the flexibility of DDM, in this section, we deal with two sub-models: DBM and static Dirichlet regression. In the case of DBM, we used simulated data from a univariate time series of rates or proportions. The knowledge of the true values of the parameters that were used to generate the data helps us to evaluate the quality of the inferences. In the case of static Dirichlet regression, we illustrate the usefulness of DDM as an alternative to the classic Dirichlet regression model implemented in R by the package DirichletReg (see http://cran.r-project.org).

5.1.1 DBM as a submodel of DDM. In order to illustrate how DDM works in a univariate setting, we built a scenario in which the simulated observations are affected by a locally linear trend and a cyclic behavior of period p = 4 (e.g., quarters). In addition, the simulated data series is affected by a covariate, and we allowed the data to present missing observations. For describing the cyclic behavior, we used just one harmonic. DDM is specified by the following quantities:

\[
F_t = \begin{pmatrix} E_2 \\ E_2 \\ 1 \\ x_t \end{pmatrix}, \quad
G = \begin{pmatrix} J_2(1) & 0 & 0 \\ 0 & G_{even} & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
W = \begin{pmatrix} 10^{-2} & 0 \\ 0 & 10^{-4} I_{5 \times 5} \end{pmatrix},
\]

\[
G_{even} = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \quad \phi = 300.
\]

We randomly removed 20% of the 50 observations of the generated time series in order to show the inferential capabilities of the MCMC methods described in Section 4 regarding missing values in series. The known effects are then compared to the estimated values.

We generated 300,000 MCMC samples of the unknown parameters from their full conditional distributions. The submatrices in W were sampled separately in order to ensure that the components in the model (polynomial, seasonal, and regressive) were treated independently from each other. Following the notation in Section 4, the parameters we used for the Wishart prior of each precision block were (νi , Si ) = (k/2, 10−3 I), where k is the number of rows (and columns) of the block. For the parameter φ, we used a gamma prior with parameters α = β = 0.001.

The convergence of the MCMC procedure, not shown here for brevity, was verified by the Gelman and Rubin (1992) and Heidelberger and Welch (1983) convergence diagnosis techniques available in the software CODA (http://www.mrc-bsu.cam.ac.uk/bugs/classic/coda04). The chain reached convergence around the 150,000th iteration. Thus, we discarded these first observations as the burn-in period, and in order to reduce autocorrelation, we stored one observation for every 100 new iterations of the chain.
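The bookkeeping implied by these settings, together with a basic version of the Gelman and Rubin statistic, can be sketched as follows. This is not the CODA implementation: the potential scale reduction factor below omits CODA's sampling-variability correction, and it needs several parallel chains, whose number is an assumption here since the text does not state how many chains were run. Function names and the synthetic chains are hypothetical.

```python
import numpy as np

def burn_and_thin(chain, burn_in, thin):
    """Drop the first burn_in draws and keep one draw out of every `thin`."""
    return np.asarray(chain)[burn_in::thin]

def potential_scale_reduction(chains):
    """Basic Gelman-Rubin statistic for an (m, n) array of m chains of length n."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()        # within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    var_plus = (n - 1) / n * W + B / n
    return float(np.sqrt(var_plus / W))

# Settings quoted in the text: 300,000 draws, burn-in of 150,000, thinning of 100.
kept = burn_and_thin(np.arange(300_000), burn_in=150_000, thin=100)
print(len(kept))

# Synthetic chains (hypothetical), only to exercise the diagnostic.
rng = np.random.default_rng(7)
good = rng.normal(size=(4, 1000))
bad = good + np.array([[0.0], [0.0], [0.0], [5.0]])
print(potential_scale_reduction(good), potential_scale_reduction(bad))
```

Values of the statistic close to 1 are compatible with convergence, while values well above 1 indicate that the chains have not yet mixed.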

The five distinct graphs in Fig. 1 show different aspects of the elements involved in the analyses. The first graph shows the time series yt ; note the missing values. The second


[Figure 1: five panels, (a) to (e), plotted against time (0 to 50), with vertical axes labeled Time series, Mean, Level, Seasonality, and Regression.]

Figure 1. (a) Time series yt ; (b) Mean μt ; (c) Level θ1t ; (d) Seasonality effects θ3t + θ5t ; (e) Regression effects θ6t . The red line gives the real values; the black line, the estimated values; and the shaded regions, the 95% credibility intervals.


graph overlays the real means μt (red line), the respective estimated values (black line), and the 95% credibility intervals. The last three graphs are related to, respectively, the effects of level, seasonality, and regression, and overlay the real values (red line), the respective estimated values (black line), and the 95% credibility intervals.

The global information we can extract from Fig. 1 is that the inferential procedure was very successful at capturing the information about the different components of yt . The presence of missing observations caused no deterioration in the quality of the estimated values of the parameters in the model. With respect to the last panel of Fig. 1 (regression), one can say that θ6t < 0, for t = 1, . . . , T ; that is, for a given time t, an increase of m units in the value of Xt implies a reduction of magnitude m|θ6t | in the value of λt . Thus, from the link function, it implies a reduction in the mean μt . In summary, the parameter θ6t determines the extent to which the variation in the covariate Xt has an impact on the mean μt . Finally, the posterior mode and median of φ were, respectively, 260.99 and 339.30 (the true value is φ = 300), showing that DDM is good at recovering the true values in the model.

5.1.2 Dirichlet regression model. In this section, we illustrate the usefulness of DDM for specifying a regression model, as an alternative to the classic Dirichlet regression model implemented in R by the package DirichletReg (see http://cran.r-project.org). Considering the DDM specified in Section 3, we can describe a Dirichlet regression model by setting Gt = I and W = 0. In such a case, the index t no longer stands for time but for individual observations.

In order to examine the practical connection between DDM and the Dirichlet regression model, we applied DDM to the so-called Arctic data (Arctic lake sediments at different depths). This data set was studied by Aitchison (2003), and can be found in the library DirichletReg.

In sedimentology, specimens of sediments are traditionally separated into three mutually exclusive and exhaustive constituents: sand, silt, and clay, and the proportions of these parts by weight are quoted as (sand, silt, clay) compositions. The Arctic data include a 3-dimensional response variable, yt , which is a quantitative description of the relative frequencies of sand, silt, and clay, and the water depth measured in meters.

To ensure comparability, we set the categories silt and clay as free, and the percentages associated with them are modeled using a quadratic model. Therefore, the latent vector θ has six coordinates and it is related to the linear predictor λt = (λsilt,t , λclay,t )′ by the following equations:

\[
\begin{aligned}
\lambda_{silt,t} &= \theta_1 + depth_t\, \theta_2 + depth_t^2\, \theta_3, \\
\lambda_{clay,t} &= \theta_4 + depth_t\, \theta_5 + depth_t^2\, \theta_6,
\end{aligned}
\]

where deptht is the water depth for individual t. Note that the restrictions imposed by the phenomenon are respected in the estimation of μt : the mean value of a given category can increase if, and only if, the sum of the mean values of the other categories decreases accordingly. This property does not hold when one models each of the rates separately, as in the case of a univariate analysis.

The structure described can naturally be expressed by the following components of DDM:

\[
F_t = \begin{pmatrix} 1 & depth_t & depth_t^2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & depth_t & depth_t^2 \end{pmatrix}, \quad G = I, \quad W = 0.
\]
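As a small illustration, the design matrix and the implied mean composition can be computed as below. The sketch stores Ft with one column per free category (the transpose of the display above), treats sand as the reference category of the additive logistic link, and plugs in the rounded off-line DDM estimates from Table 1; the depth value and the function names are hypothetical, and the rounding of the quadratic coefficients makes the resulting numbers only indicative.

```python
import numpy as np

def design_matrix(depth):
    """6 x 2 design matrix: column 0 for silt, column 1 for clay."""
    x = np.array([1.0, depth, depth ** 2])
    F = np.zeros((6, 2))
    F[:3, 0] = x
    F[3:, 1] = x
    return F

def composition_mean(theta, depth):
    """Mean proportions implied by theta at a given depth (sand as reference)."""
    lam = design_matrix(depth).T @ np.asarray(theta, dtype=float)
    z = np.concatenate([lam, [0.0]])
    w = np.exp(z - z.max())
    mu = w / w.sum()
    return {"silt": float(mu[0]), "clay": float(mu[1]), "sand": float(mu[2])}

# Rounded off-line DDM estimates (theta_1, ..., theta_6) taken from Table 1.
theta_hat = [-1.688, 0.092, -0.001, -4.067, 0.151, -0.001]
print(composition_mean(theta_hat, depth=50.0))   # depth of 50 m chosen arbitrarily
```

By construction the three mean proportions add up to one, which is the constraint discussed in the text.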


Table 1
Real and estimated values for the Beta regression model using both the on-line and off-line approaches.

                           θ1       θ2      θ3       θ4       θ5      θ6       φ
DDM: off-line estimate     −1.688   0.092   −0.001   −4.067   0.151   −0.001   17.826
DirichletReg estimate      −1.747   0.095   −0.001   −4.156   0.155   −0.001   19.038
DDM: on-line estimate      −1.815   0.095   −0.001   −3.973   0.148   −0.001   50.000
Std Error: DDM off-line     0.328   0.016    0.000    0.441   0.019    0.000    2.881
Std Error: DirichletReg     0.308   0.015    0.000    0.428   0.019    0.000    3.021
Std Error: DDM on-line      0.256   0.013    0.000    0.335   0.015    0.000   NA

In order to estimate the parameters in the model, we first used the on-line approach with an arbitrary φ = 50, treated as a known quantity. We then fitted the model based on the MCMC procedure for the static case of DDM (see Section 4.6). In the latter case, a vague gamma prior with hyperparameters α = β = 0.001 was used for the precision parameter φ. From the generated chain of size 20,000, we discarded the first 2,000 observations as burn-in and then saved one out of every ten observations to reduce autocorrelation. In addition, we fitted the classic regression model with the same structure for the covariates.

It is important to observe that the choice of φ in the on-line procedure defines the covariance matrix of the proposal distribution for θ in the off-line estimation. Thus, if we set a value for φ that is too far away from the true one, it is possible that the acceptance rate would be either too low or too high and that the simulated values would be too correlated. Instead of just fixing an arbitrary value, one can always explore the parameter space of φ to find the point that minimizes, for instance, the sum of forecasting errors.

From Table 1, we observe that the classic Dirichlet regression model and the off-line DDM produced very similar estimates. Even though the point estimates obtained with the on-line DDM were reasonable, the uncertainty associated with parameter θ was underestimated. This is expected since we assumed the parameter φ as known (φ = 50), while the true value is probably between 12 and 25. Unfortunately, the classic Dirichlet model, available in the library DirichletReg, is only implemented with a log-link for φ. Again, to allow comparability, we used an approximation based on a first-order Taylor expansion of the function log(φ) for calculating E(φ) and V (φ) as functions of E(log(φ)) and V (log(φ)).
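The first-order approximation mentioned above can be written explicitly. Expanding log(φ) around E(φ) gives E(log φ) ≈ log E(φ) and V (log φ) ≈ V (φ)/E(φ)², so that E(φ) ≈ exp(E(log φ)) and V (φ) ≈ exp(2E(log φ)) V (log φ). The sketch below implements this conversion; the function name and the input values are hypothetical and are not taken from the DirichletReg output.

```python
import math

def moments_from_log_scale(mean_log, var_log):
    """First-order (delta method) moments of phi from the moments of log(phi)."""
    mean = math.exp(mean_log)
    var = mean ** 2 * var_log
    return mean, var

# Hypothetical log-scale estimate and standard error.
mean_phi, var_phi = moments_from_log_scale(mean_log=3.0, var_log=0.15 ** 2)
print(mean_phi, math.sqrt(var_phi))
```

The approximation is adequate when the standard error on the log scale is small; otherwise the skewness of the implied distribution of φ becomes noticeable.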

We would like to stress that we are not comparing the estimates of DDM with the true theoretical values. Instead, the comparison is between the estimates obtained using two completely different methodologies.

Since the Beta regression model (Ferrari and Cribari-Neto, 2004) can be seen as a particular case of the classic Dirichlet regression model, DDM can also be used in the context of a univariate model as an alternative to the Beta regression model. Routines for the beta regression model are available in the R package betareg.

5.2. DDM: An Application Using Simulated Data

In this section, we fit simulated data from DDM considering k = 3 categories. This exercise allows us not only to evaluate the overall quality of the fit based on the proposed MCMC


methods but also to illustrate certain practical aspects involved in the description of the design matrices in the model.

For the simulated data, we used a DDM that incorporates the following features: the first category is represented by a linear growth model, that is, a local level model that includes a time-varying slope in the dynamics of the level parameter. In addition, the specification includes a seasonal effect with p = 4 periods. The second category is affected by a linear growth model and two covariates. This model is represented by the following matrices:

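\[
F_t = \begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & x_{1t} & x_{2t} \end{pmatrix}, \quad
G = \begin{pmatrix} G_{trend} & 0 & 0 \\ 0 & G_{season} & 0 \\ 0 & 0 & G_{reg} \end{pmatrix}, \quad \text{and} \quad
W = \begin{pmatrix} W_1 & 0 \\ 0 & 0 \end{pmatrix},
\]

where

\[
G_{trend} = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad
G_{season} = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \quad
G_{reg} = I_2, \quad \text{and} \quad
W_1 = 10^{-4}\, \mathrm{diag}(10, 1, 10, 1).
\]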

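Each of the components in the system equation (see Section 3) is then described as

\[
\begin{aligned}
\theta_{1t} &= \theta_{1,t-1} + \theta_{3,t-1} + \omega_{1t}, \\
\theta_{2t} &= \theta_{2,t-1} + \theta_{4,t-1} + \omega_{2t}, \\
\theta_{3t} &= \theta_{3,t-1} + \omega_{3t}, \\
\theta_{4t} &= \theta_{4,t-1} + \omega_{4t}, \\
\theta_{5t} &= \theta_{6,t-1}, \\
\theta_{6t} &= -\theta_{5,t-1}, \\
\theta_{7t} &= -\theta_{7,t-1}, \\
\theta_{8t} &= \theta_{8,t-1}, \\
\theta_{9t} &= \theta_{9,t-1}.
\end{aligned}
\]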

Parameters θ1t and θ2t represent the mean level for categories 1 and 2, respectively. Parameters θ3t and θ4t are related to the time-varying slopes in the dynamics for θ1t and θ2t , respectively. θ5t to θ7t represent joint seasonal effects in the linear predictor λ1t , related to the first category. It is easy to verify that θ5t = θ5,t+4 and θ6t = θ6,t+4, ∀ t ∈ N. Parameter θ7t changes sign over time to ensure identifiability. The regression parameters θ8t and θ9t describe, respectively, the effect of variations in the covariates x1 and x2 in the linear predictor λ2t . Depending on the scenario, the same covariates can be used to explain more than one category.
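The component-wise evolution and the periodicity of the seasonal states can be verified numerically from the evolution matrix alone. In the sketch below the evolution errors are set to zero and the state vector holds arbitrary placeholder values; the variable and function names are hypothetical.

```python
import numpy as np

G_trend = np.array([[1.0, 0.0, 1.0, 0.0],
                    [0.0, 1.0, 0.0, 1.0],
                    [0.0, 0.0, 1.0, 0.0],
                    [0.0, 0.0, 0.0, 1.0]])
G_season = np.array([[0.0, 1.0, 0.0],
                     [-1.0, 0.0, 0.0],
                     [0.0, 0.0, -1.0]])
G_reg = np.eye(2)

# Assemble the 9 x 9 block-diagonal evolution matrix.
G = np.zeros((9, 9))
G[:4, :4] = G_trend
G[4:7, 4:7] = G_season
G[7:, 7:] = G_reg

def evolve(theta, steps=1):
    """Noise-free evolution theta_t = G^steps theta_{t-steps}."""
    return np.linalg.matrix_power(G, steps) @ theta

theta_prev = np.arange(1.0, 10.0)   # placeholder values for (theta_1, ..., theta_9)
print(evolve(theta_prev))           # one step ahead
print(evolve(theta_prev, 4)[4:7])   # seasonal states after four steps
```

After four steps the seasonal states return to their starting values, in agreement with the period p = 4.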


Note that the seasonality pattern θ5t to θ7t and the covariate effects θ8t and θ9t do not vary over time. This does not mean that the parameters θ5t to θ7t are static. This means that the seasonal effects are the same for each group of four consecutive observations. Needless to say, we could have added dynamics to the parameters θ5t to θ9t , and we would then only have to change the specification of matrix W. However, instead of having to estimate five parameters, it would be necessary to estimate 5 × T parameters. In practical applications, the analyst can decide the best option with the help of model comparison measures.

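Considering the additive logit link, the link function is given by (see the beginning of Section 3):

\[
\lambda_t = \begin{pmatrix} \lambda_{1t} \\ \lambda_{2t} \end{pmatrix}
= \begin{pmatrix} \theta_{1t} + \theta_{5t} + \theta_{7t} \\ \theta_{2t} + x_{1t}\theta_{8t} + x_{2t}\theta_{9t} \end{pmatrix}
\quad \text{and} \quad
\mu_t = \begin{pmatrix} \mu_{1t} \\ \mu_{2t} \end{pmatrix}
= \frac{1}{1 + e^{\lambda_{1t}} + e^{\lambda_{2t}}} \begin{pmatrix} e^{\lambda_{1t}} \\ e^{\lambda_{2t}} \end{pmatrix}.
\tag{23}
\]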

In the multivariate context, one can describe the matrices F, G, and W in multiple ways. As stated earlier, it is possible to use a different design matrix for each category of DDM, and according to these specifications, one can define each one of the submodels discussed in the previous section.

Considering the F, G, and W matrices of our example, we simulated a time series of size T = 50 and then randomly removed 10% of the observations. Using the off-line approach, we generated 500,000 MCMC samples. The first 75,000 were discarded and a systematic sample was taken using gaps of size 100.

The filtering step (via CUBS) used for updating the proposal distribution is responsible for nearly all of the computing time. Thus, in order to speed up the MCMC simulation process, we only used it once every 200 iterations. This strategy does not result in serious loss of quality in the proposal distribution, since the parameters of such a distribution evolve very slowly over the chain iterations.

Figure 2 displays a set of plots that are useful for analyzing the goodness of fit. Figure 2(a) presents the temporal evolution of the first two categories of the simulated series yt . Figure 2(b) displays the true values of μ1t and μ2t (solid curves) that were described in expression (23), and the 95% highest posterior density (HPD) credibility intervals for those parameters. Figure 2(c) shows the true values of the level parameters θ1t and θ2t (solid curves), and the corresponding 95% HPD credibility intervals. Figure 2(d) displays the seasonality effects θ5t + θ7t (solid curves) and the corresponding 95% HPD credibility intervals. Figure 2(e) displays the regression effects θ8t and θ9t used in the simulation. As we can observe from Fig. 2(a) to (e), the proposed inferential procedure provided good estimates for the parameters in the model.

The estimated posterior mode, median, and HPD credibility interval for φ were, respectively, 673.1, 678.6, and (384.0; 1041.5); the true value is 500. The posterior mode was estimated by the maximization of the smoothed posterior empirical density of φ based on the MCMC samples. We used the R function density for that purpose.
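A posterior mode can be extracted from MCMC draws with a Gaussian kernel density estimate, as in the sketch below. It is written in Python rather than R and is only intended to mimic, as far as we are aware, the default behavior of R's density (rule-of-thumb bandwidth 0.9 min(sd, IQR/1.34) n^(−1/5), a grid of 512 points extended by three bandwidths on each side); it is not the code used for the results above, and the draws are synthetic.

```python
import numpy as np

def kde_mode(samples, n_grid=512):
    """Mode of a Gaussian kernel density estimate with a rule-of-thumb bandwidth."""
    x = np.asarray(samples, dtype=float)
    n = x.size
    sd = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    bw = 0.9 * min(sd, (q75 - q25) / 1.34) * n ** (-0.2)
    grid = np.linspace(x.min() - 3.0 * bw, x.max() + 3.0 * bw, n_grid)
    z = (grid[:, None] - x[None, :]) / bw
    dens = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bw * np.sqrt(2.0 * np.pi))
    return float(grid[np.argmax(dens)])

# Synthetic draws (hypothetical), standing in for the stored values of phi.
rng = np.random.default_rng(8)
draws = rng.normal(loc=5.0, scale=1.0, size=5000)
print(kde_mode(draws), np.median(draws))
```

Because the location of the maximum depends on the bandwidth, the mode obtained this way is less stable than the median, which may explain small discrepancies between the two summaries.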

In summary, in this work, we propose an extremely versatile model, not only in the conceptual formulation but also in terms of computational tractability and implementation, making it of practical interest.

In order to maintain the focus of the discussion, we opted for not providing major details about the MCMC convergence issues. To the best of our knowledge, all the discussed examples provided no cause for concern in that respect. Nevertheless, for illustration, we present Fig. 3. In Fig. 3 (a) to (c), we present the MCMC traces related to the parameters μ1t at times 1, 25, and 50. Analogously, in Fig. 3 (d) to (f), we


[Figure 2: five panels, (a) to (e), plotted against time (0 to 50), with vertical axes labeled Time series, Mean, Level, Seasonality, and Regression. Panels (a) to (c) show Category 1 and Category 2; panel (e) shows Regressor 1 and Regressor 2.]

Figure 2. (a) Time series of categories 1 and 2; (b) Means μ1t and μ2t ; (c) Levels θ1t and θ2t ; (d) Seasonality effects θ5t + θ7t ; (e) Regression effects θ8t and θ9t . The red line gives the real values; the black line, estimated values; and the shaded regions, the 95% credibility intervals.

Dow

nloa

ded

by [

Uni

vers

ity o

f C

hica

go L

ibra

ry]

at 2

3:14

13

Nov

embe

r 20

14

Page 24: Bayesian Dynamic Dirichlet Models

Bayesian Dynamic Dirichlet Models 809

)b()a(

0 2000 40000.45

00.

580

0 2000 4000

0.49

0.63

)d()c(

0 2000 40000.03

00.

100

0 2000 4000

0.16

0.26

)f()e(

0 2000 4000

0.18

0.30

0 2000 4000

0.34

0.46

(g)

0 2000 4000

050

015

00

Figure 3. (a) to (c) MCMC chains for μ1,1, μ1,25, and μ1,50, respectively; (d) to (f) μ2,1, μ2,25, andμ2,50, respectively; (g) MCMC chain for φ. The red line denotes the real values.

present the corresponding traces to μ2t . The traces for other times display similar behavior.In Fig. 3 (g), we present the MCMC trace for φ.

5.3. An Application to Mortality Data in Brazil

Violence in Brazil is one of the leading public health problems. According to Reichenheim et al. (2011), even though there are signs of a decline, homicides and traffic-related injuries and deaths in Brazil account for almost two thirds of all deaths from external causes.

The authors identify the chosen model for the transport system, which is primarily composed of roads and private cars but lacks adequate infrastructure and legislation, as an important cause of traffic-related injuries. Further, according to the authors, such accidents have a high personal and social cost.

At the individual level, there is not only high mortality but also major physical and psychological sequelae in injured survivors, such as spinal-cord injuries related to traffic accidents, especially in young victims.

According to Nachif (2006), homicides reflect a high level of social tension, and were responsible for the increase in the contribution of violence to the overall mortality rate from 2% in 1930, to 10.5% in 1980 and 15.3% in 1990. Reichenheim et al. (2011) present some analyses that show that in Brazil, men are ten times more likely to die from homicide than women.

The main factors implicated in the increase of homicides in Brazil include high consumption of alcohol and illicit drugs (the latter due to the intensification of the trade in illicit drugs), firearms-related deaths due to organized crime, and domestic violence, in particular, violence against women. Reichenheim et al. (2011) conclude that the high homicide rate in Brazil has major emotional and social costs, often resulting in family breakdowns and other psychological consequences for the people involved in such tragedies.

Due to the high relevance of the violence issue in Brazil, in this section, we fit DDM to a data set related to this topic. The data can be found at DATASUS (http://www2.datasus.gov.br).

Chapter X-Ref of a document entitled CID-10 (International Statistical Classification of Diseases and Health Related Problems) is organized in nine sections, and it is possible to establish the following mortality categories:

A: Traffic-related accidents;
B: Aggressions/lesions intentionally auto-inflicted (including homicides);
C: Other external causes of mortality.

In terms of public policy, reducing A and B should be a priority for any government.

The historical series available consists of the monthly counts in each category (A, B, and C), from 1998 to 2010. The monthly counts were then transformed into a time series of quarterly compositional data.

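We used a three-category DDM with a second-order polynomial trend effect, described by the following matrices, with φ and Wi, i = 1, 2, 3, 4, to be estimated:

\[
\left\{
F = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix},\quad
G = \begin{pmatrix} J_2(1) & 0 \\ 0 & J_2(1) \end{pmatrix},\quad
W = \mathrm{diag}(W_1, \ldots, W_4),\quad
J_2(1) = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},\quad
\phi
\right\}.
\]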

For t = 1, . . . , N , the evolution equation for the model is alternatively expressed as

θ1t = θ1,t−1 + θ2,t−1 + ω1t ,

θ2t = θ2,t−1 + ω2t ,

θ3t = θ3,t−1 + θ4,t−1 + ω3t ,

θ4t = θ4,t−1 + ω4t .
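The four component equations are just the matrix recursion θt = G θt−1 + ωt written out row by row. A minimal Python sketch of that recursion follows; the starting state and the noise standard deviations are hypothetical values chosen only for illustration, and Gaussian noise is assumed for the simulation.

```python
import random

# G = blockdiag(J2(1), J2(1)) with J2(1) = [[1, 1], [0, 1]], as in the text.
G = [
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]


def evolve(theta_prev, omega):
    """One step of the evolution equation theta_t = G theta_{t-1} + omega_t."""
    return [
        sum(G[i][j] * theta_prev[j] for j in range(4)) + omega[i]
        for i in range(4)
    ]


def simulate(theta0, w_sd, n_steps, seed=0):
    """Simulate a state path; w_sd holds one noise standard deviation per state."""
    rng = random.Random(seed)
    path = [list(theta0)]
    for _ in range(n_steps):
        omega = [rng.gauss(0.0, s) for s in w_sd]
        path.append(evolve(path[-1], omega))
    return path


# Hypothetical starting state (level 1, growth 1, level 2, growth 2).
# With zero noise, each level moves along a straight line with slope
# equal to its growth parameter.
noise_free_path = simulate([0.5, -0.01, -0.2, 0.02], [0.0] * 4, n_steps=3)
```

Setting the second and fourth noise standard deviations to zero while keeping the first and third positive gives levels that wander around a fixed linear trend, which is the static-slope structure used later in Section 5.4.2.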

Thus, the growth parameters θ2t and θ4t follow a random walk, while the level parameters θ1t and θ3t describe a locally linear evolution. From the linear predictor λt = F′t θt, we obtain λt = (λ1t, λ2t)′ = (θ1t, θ3t)′, and from the inverse of the additive logit link function, we have

\[
\mu_{1t} = \frac{e^{\theta_{1t}}}{1 + e^{\theta_{1t}} + e^{\theta_{3t}}}
\quad\text{and}\quad
\mu_{2t} = \frac{e^{\theta_{3t}}}{1 + e^{\theta_{1t}} + e^{\theta_{3t}}}.
\]

Thus, the parameters θ1t and θ3t pertain to the definition of the mean of the observation equation.

Figure 4. Observed mortality proportions in each category: A, Traffic-related accidents; B, Aggressions/lesions intentionally auto-inflicted (including homicides); and C, Other external causes of mortality. The corresponding estimated proportions are described by the black line. The shaded regions represent the 95% credibility intervals.

For the mortality data, category C was modeled indirectly: pt(C) = 1 − pt(A) − pt(B), where pt(A) and pt(B) are the proportions in categories A and B at time t. Note that from a practical point of view, category C is not as important as the others.
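The mapping from the linear predictor (θ1t, θ3t) to the three mean proportions, with the last category obtained as the remainder, can be sketched as follows. This is an illustrative Python function, not the authors' implementation; the subtraction of a common constant before exponentiating is a standard numerical safeguard and does not change the result.

```python
import math


def inverse_additive_logit(lam):
    """Map a linear predictor of length k - 1 to k proportions that sum to one.

    The first k - 1 outputs are exp(lam_j) / (1 + sum_i exp(lam_i));
    the last output is the reference category, 1 / (1 + sum_i exp(lam_i)).
    """
    shift = max(0.0, max(lam))          # guards against overflow in exp
    expo = [math.exp(v - shift) for v in lam]
    reference = math.exp(-shift)        # the "1" in the denominator, rescaled
    denom = reference + sum(expo)
    return [e / denom for e in expo] + [reference / denom]


# lam = (theta_1t, theta_3t); the values below are hypothetical.
mu_a, mu_b, mu_c = inverse_additive_logit([0.2, 0.6])
```

Because the third proportion is defined as the remainder, the three outputs always sum to one, which mirrors the indirect treatment of category C.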

In this section, we used the MCMC-based approach described in Section 4. According to the notation of Section 4, we fixed (νi, Si) = (1/2, 10⁻³) for i = 1 and i = 3, and (νi, Si) = (1/2, 10⁻⁴) for i = 2 and i = 4, as prior values in order to describe the prior distribution of �i. The prior distribution of parameter φ was described as a gamma distribution with parameters α = β = 0.001. The filtering process (see the end of Section 4.2, step (1) of the CUBS algorithm), necessary for drawing samples from the full posterior distribution of the latent states, was executed only once every 100 full MCMC cycles.

Figure 4 displays the observed mortality proportions in each category, along with the estimated proportions (black line). The dashed line represents the 95% credibility intervals. From Fig. 4, we observe that from 1998 to 2009, there was a noticeable decline in the mortality rate concerning traffic-related accidents (gray line). However, from 2009, this rate increased. The mortality rate from aggressions/lesions intentionally auto-inflicted (blue line) has remained largely stable over time.

Figure 5. Observed mortality proportions in each category: A, Traffic-related accidents; B, Aggressions/lesions intentionally auto-inflicted (including homicides); and C, Other external causes of mortality. The eight-step-ahead forecasts for 2011 and 2012 are given by the black line while the shaded regions represent the 90% credibility intervals.

With respect to the estimation of the precision parameter φ, the mode and median were, respectively, 2057.22 and 2043.87. The HPD confidence interval for φ was (1337.86, 2819.19). These large values for the estimated φ are not surprising since the series yt does not present too much noise: note from Eq. (6) that the variance of yt is inversely proportional to the values of φ. From the practical point of view, the data correspond to a situation where the sampling error is very small. In other words, yt is a very precise measure of the nonobservable mean μt.
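To get a feel for how small the sampling error is at this magnitude of φ, one can evaluate the component standard deviations of a Dirichlet vector. The sketch below assumes the mean-precision parameterization y ~ Dirichlet(φμ), for which Var(y_j) = μ_j(1 − μ_j)/(1 + φ); this is a standard property of that parameterization and is assumed here to match Eq. (6). The mean vector is hypothetical; φ is the posterior median reported above.

```python
import math


def dirichlet_component_sd(mu, phi):
    """Standard deviation of each component when y ~ Dirichlet(phi * mu)."""
    return [math.sqrt(m * (1.0 - m) / (1.0 + phi)) for m in mu]


mu = [0.3, 0.4, 0.3]          # hypothetical mean proportions
phi_median = 2043.87          # posterior median of phi reported in the text
sd_at_median = dirichlet_component_sd(mu, phi_median)
sd_at_500 = dirichlet_component_sd(mu, 500.0)   # for comparison only
```

Under these assumptions each proportion has a standard deviation of roughly 0.01 around its mean, about half of what it would be with φ = 500.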

Figure 5 shows the observed proportions from 2006, along with the eight-step-ahead predictions (black line) and the 90% credibility intervals (shaded regions) for the quarters in years 2011 and 2012. We observe that the predicted proportions of mortality for all accident categories seem to be stable. The DDM described here can be useful for monitoring important public health issues and for designing governmental actions aimed at providing better security and quality of life.

5.4. Comparison of DDM with Other Models

5.4.1 Quintana and West (1988) model. As mentioned in Section 1, Quintana and West (1988) modeled the observational distribution of compositional series of proportions using the multivariate logistic-normal distribution. The authors applied a logistic/log ratio transformation (slr) to map the original vector of proportions into a vector of real-valued quantities:

\[
y_{tj} = \log(p_{tj}/\tilde{p}_t) = \log p_{tj} - \log \tilde{p}_t, \quad j = 1, \ldots, q.
\]

The slr inverse transformation (slr⁻¹) is given by

\[
p_{ti} = \exp(y_{ti}) \Big/ \sum_{j=1}^{q} \exp(y_{tj}),
\]

where \(\tilde{p}_t\) is the geometric mean of the ptj.
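A short Python sketch of the transformation and its inverse may help make the zero-sum property concrete. It is an illustration written for this passage, not code from Quintana and West (1988); the example composition is hypothetical.

```python
import math


def slr(p):
    """Symmetric log-ratio: log of each share minus the log geometric mean."""
    logs = [math.log(v) for v in p]
    log_geometric_mean = sum(logs) / len(logs)
    return [lg - log_geometric_mean for lg in logs]


def slr_inverse(y):
    """Map slr coordinates back to proportions: exp(y_i) / sum_j exp(y_j)."""
    shift = max(y)                      # numerical safeguard; cancels out
    expo = [math.exp(v - shift) for v in y]
    total = sum(expo)
    return [e / total for e in expo]


p = [0.2, 0.3, 0.5]                     # hypothetical composition
y = slr(p)
p_back = slr_inverse(y)
```

The transformed coordinates always sum to zero, which is the restriction responsible for the singular covariance matrix discussed next.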

The transformation to the ytj scale was performed in order to satisfy the assumptions of the Matrix Normal DLMs (see chapter 6, West and Harrison, 1997). However, the resulting variable ytj poses a second problem: a zero-sum restriction causing the singularity of the covariance matrix associated with the latent states. The authors solved the singularities with the following model (see notation details in Quintana and West, 1988):

\[
y_t' K = x_t' \Psi_t + (K e_t)', \quad K e_t \sim N(0, \nu_t \Xi), \tag{22}
\]

\[
\Psi_t = G_t \Psi_{t-1} + F_t K, \quad F_t K \sim N(0, W_t, \Xi), \tag{23}
\]

\[
\Psi_{t-1} \sim N(M_{t-1} K, C_{t-1}, \Xi), \quad \Xi \sim W^{-1}(K' S_{t-1} K, d_{t-1}), \tag{24}
\]

where Ψt = Θt K and Ξ = K′ΣK, K = I − 11′. However, matrix K is not invertible, and thus it is not possible to express the estimated series and predictions, which were obtained with the series ht = yt′K, back on the scale of the probabilities. Therefore, in this sense, we believe that the composite model by Quintana and West (1988) is not comparable with DDM.

5.4.2 Grunwald, Raftery and Guttorp (1993) model. We now compare DDM with the model in Grunwald et al. (1993). We analyze the World Motor Vehicle data used by these authors. We tried to fit a DDM that incorporates the same effects as the trend model fitted by them (see Table 2 of Grunwald et al., 1993). Using the on-line approach, we fitted a simple constant trend DDM defined by the following matrices:

\[
\left\{
F = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix},\quad
G = \begin{pmatrix} J_2(1) & 0 \\ 0 & J_2(1) \end{pmatrix},\quad
W_t = \mathrm{diag}(W_{1t}, 0, W_{3t}, 0),\quad
J_2(1) = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},\quad
\phi
\right\}.
\]

The form of the variance (Wt) of the system error is such that the slope parameters are static. This was imposed to imply a fairly similar structure between the two models.

As we are adopting the on-line estimation technique, we use the block-discount procedure (West and Harrison, 1997) to specify the matrices Wt. This is done by setting

\[
W_{1t} = \frac{1-\delta}{\delta}\, C_{[1,1],t-1}
\quad\text{and}\quad
W_{3t} = \frac{1-\delta}{\delta}\, C_{[3,3],t-1},
\]

for t = 1, . . . , T. That is, the uncertainty about each level parameter due to the passage of time from t − 1 to t is amplified by a factor of 1/δ.
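The "amplified by a factor of 1/δ" statement follows from simple arithmetic: adding W = (1 − δ)/δ · C to C gives C/δ. A tiny Python check, with a hypothetical posterior variance, makes this explicit.

```python
def discount_evolution_variance(c_prev, delta):
    """Evolution variance implied by a discount factor: W = (1 - delta) / delta * C."""
    if not 0.0 < delta <= 1.0:
        raise ValueError("the discount factor must lie in (0, 1]")
    return (1.0 - delta) / delta * c_prev


c_prev = 0.04     # hypothetical posterior variance of a level at time t - 1
delta = 0.7       # discount factor value reported later in this section
w_t = discount_evolution_variance(c_prev, delta)
inflated = c_prev + w_t   # variance of the level after the time step
```

With δ = 0.7 the variance grows by a factor of about 1.43 per step, while δ = 1 gives W = 0 and hence a static parameter.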

In order to estimate the precision parameter φ, we set φ as the value that maximizes the log of the likelihood function L expressed by

\[
L = p(Y \mid \phi) = \prod_{t=1}^{T} p(Y_t \mid D_{t-1}, \phi)
= \prod_{t=1}^{T} \int p(Y_t \mid \theta_t, \phi)\, p(\theta_t \mid D_{t-1})\, d\theta_t .
\]

The expression above is computed in the filtering process. It was found that log(L) takes its highest value when δ = 0.7 and φ = 540.
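The general recipe (accumulate the one-step-ahead predictive log density during filtering, then pick the hyperparameters that maximize it over a grid) can be illustrated on a much simpler model. The sketch below swaps in a univariate Gaussian local-level model, for which the predictive density is available in closed form via the Kalman filter; it is not the DDM filter, and the observation variance plays the role that φ plays in the DDM. All data and grid values are hypothetical.

```python
import math
import random


def log_predictive_likelihood(y, delta, obs_var, m0=0.0, c0=1.0):
    """Sum over t of log p(y_t | D_{t-1}) for a discounted local-level model."""
    m, c = m0, c0
    total = 0.0
    for obs in y:
        r = c / delta                 # prior variance after discounting
        q = r + obs_var               # one-step-ahead forecast variance
        err = obs - m                 # forecast error
        total += -0.5 * (math.log(2.0 * math.pi * q) + err * err / q)
        gain = r / q
        m = m + gain * err            # posterior mean
        c = r * obs_var / q           # posterior variance
    return total


def grid_search(y, deltas, obs_vars):
    """Return (best log-likelihood, delta, obs_var) over the grid."""
    best = None
    for d in deltas:
        for v in obs_vars:
            ll = log_predictive_likelihood(y, d, v)
            if best is None or ll > best[0]:
                best = (ll, d, v)
    return best


# Hypothetical data: a slowly drifting level observed with noise.
rng = random.Random(42)
level, y = 0.0, []
for _ in range(200):
    level += rng.gauss(0.0, 0.1)
    y.append(level + rng.gauss(0.0, 0.5))

deltas = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]
obs_vars = [0.05, 0.1, 0.25, 0.5, 1.0]
best_loglik, best_delta, best_obs_var = grid_search(y, deltas, obs_vars)
```

In the DDM the integral inside the product has no closed form, so the predictive density has to be approximated within the filtering step; the outer grid search is unchanged.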

We observed that the filtered estimates of the states obtained with each model were very close. However, with DDM, it is possible to assess the uncertainty of the estimates since we can provide the corresponding credibility intervals.

With respect to the predicted series obtained for each model, the estimates were again very close. However, with DDM, we are able to provide credibility intervals for representing the uncertainty, while with the Grunwald et al. (1993) model, it is only possible to present the estimated prediction plus or minus one standard deviation, that is, p ± s.d.(p), with s.d.(p) calculated using the Tierney and Kadane (1986) approximations (see p. 112 in line 2 of Grunwald et al., 1993).

Regarding the incorporation of covariates x in the Grunwald et al. (1993) model, according to their Section 2.4, it is done through the following steps:

[a] θ = slr⁻¹(slr(θ)), with slr being the symmetric log-ratio transformation;
[b] Using their new model requirements, described in their expression (2.19),

\[
\theta_t^{*} = \mathrm{slr}^{-1}\big(\mathrm{slr}(\theta_t) + B x_{t+1}\big),
\]

where B is a matrix of regression parameters with \(B \in \{z \in \mathbb{R}^{d+1} : z'u = 0\}\). The resulting θ*t values have to be incorporated into expression (25), obtained with the help of the Tierney and Kadane (1986) approximation:

\[
\ell(\gamma, \tau_t, B) = \sum_{t=2}^{T} \log P(Z_{t-1} \mid D_{t-1}, \tau_t, \gamma, B) \tag{25}
\]

(see p. 111 of Grunwald et al., 1993), with

\[
P(Z_{t-1} \mid D_{t-1}, \tau_t, \gamma, B) = \int P(Z_{t-1} \mid D_{t-1}, \theta_t, \tau_t, \gamma, B)\, P(\theta_t \mid D_{t-1}, \tau_t, \gamma, B)\, d\theta_t .
\]
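Step [b] amounts to shifting the composition in slr coordinates and mapping back to the simplex. The Python sketch below illustrates only that step, under the assumption that each column of B sums to zero (the z′u = 0 restriction, with u a vector of ones); the values of θ, B, and x are hypothetical and the code is not taken from Grunwald et al. (1993).

```python
import math


def slr(p):
    """Symmetric log-ratio transformation."""
    logs = [math.log(v) for v in p]
    mean_log = sum(logs) / len(logs)
    return [lg - mean_log for lg in logs]


def slr_inverse(y):
    """Inverse symmetric log-ratio transformation."""
    shift = max(y)
    expo = [math.exp(v - shift) for v in y]
    total = sum(expo)
    return [e / total for e in expo]


def shift_by_covariates(theta, B, x):
    """Compute slr^{-1}(slr(theta) + B x); rows of B index the categories."""
    y = slr(theta)
    effect = [sum(b * xv for b, xv in zip(row, x)) for row in B]
    return slr_inverse([a + e for a, e in zip(y, effect)])


theta = [0.2, 0.3, 0.5]               # hypothetical composition
B = [[0.4], [-0.1], [-0.3]]           # hypothetical; the single column sums to zero
x = [1.0]                             # hypothetical covariate value
theta_star = shift_by_covariates(theta, B, x)
```

A positive coefficient raises the share of its category relative to the others, and a zero covariate leaves the composition unchanged.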

We make two remarks on this procedure. (1) The method of Grunwald, Raftery, and Guttorp (1993) for incorporation of covariates in the model only includes the possibility described in [a], with B being a static parameter. (2) The amount of extra calculation needed in order to correctly describe and calculate expression (25) is considerable. Besides, the likelihood-based estimation of so many parameters is not an easy or trivial task.

In our model formulation, we show very clear ways of incorporating both effects and covariates in the model, with their coefficients being either static or dynamic. We believe that this flexibility and the well-defined ways of describing a whole family of submodels and effects are the major advantages of DDM when compared to the Grunwald et al. (1993) model.

6. Discussion

We developed Bayesian DDM for modeling and forecasting multivariate time series of compositional data, which are characterized by random vectors yt defined on the open standard (k − 1)-simplex. Each coordinate of yt represents the share, in percentage, of each one of the k categories that describe a given phenomenon.

DDM is a very flexible model, representing an extension of DBM (da-Silva et al., 2011). DDM has the following submodels: (1) the static Bayesian Dirichlet regression model, which is useful for analyzing random samples of compositional data; (2) DBM, which deals with univariate time series of rates or proportions; and (3) the static Beta regression model as an alternative to the one developed by Ferrari and Cribari-Neto (2004), which is useful for modeling random samples of rates or proportions. In this work, we applied DDM to a time series of real compositional data. In addition, we analyzed simulated data for most of the mentioned submodels.

Another formulation for dealing with time series of compositional data was proposed by Grunwald et al. (1993). In order to take advantage of some properties of the exponential family of distributions, the authors developed a model where the data are expressed in terms of the so-called symmetric log-ratio transformation, but unlike the Quintana and West (1988) model, it is possible to transform the predicted series back to the probability scale.

For dealing with the non-Gaussian nature of the problem, the power-steady model (Smith, 1979) was used by Grunwald et al. (1993) in order to describe the system equation. However, that approach has the disadvantage of making the model too specific. Besides, in Grunwald et al. (1993), the specification of some important effects on the latent process, like growth and seasonality, is done via the inclusion of a series of dummy variables. The proposed procedure is cumbersome and does not allow as much freedom and flexibility in model specification as we obtain, via the matrices F and G, when using DDM. The development of a series of important computations needed in the estimation of the uncertainty of the predicted values was not shown by the authors. In Grunwald et al. (1993), the authors do not present any methodology for computing the uncertainty of their filtered estimates for predicting future composite data. All of these aspects were developed in DDM.

We designed both on-line and off-line approaches for the estimation of the parametersin the model.

As illustrated throughout this article, there are substantial differences between the on-line and off-line estimation techniques. The first one is especially recommended when the main purpose of the model is to obtain on-line inferences, that is, up-to-date estimates of the current state of the system. Besides, based on the filtering distributions, it is possible to obtain the predictive densities p(yt+k|Dt−1), for k = 0, 1, . . .. Considering the setting of the presented applications, all computations regarding prediction, filtering, and smoothing are performed in a matter of seconds. In addition, when a new observation arrives, updating the parameters of the distribution of (θt|Dt) is immediate. In practice, the on-line method offers a simple and computationally cheap mechanism that nevertheless preserves all the flexibility provided by DDM.

The off-line approach, in contrast, is considerably less efficient from the computational point of view. Each time a new observation arrives, a totally new MCMC run has to be performed. This prevents us from obtaining the desirable on-line inferences. We need to bear in mind that the on-line filtering process is repeated thousands of times during the MCMC procedure. This is necessary because the output of the filtering process is used in the Metropolis-Hastings step inside CUBS to build up the proposal distribution for the vector Θ. In addition, the on-line technique is also useful for defining the initial state of the chain.

The CUBS algorithm is not the only possible way to construct the proposal distribution. As mentioned before, there are many other competitors in the statistical literature, and we believe some of them may outperform CUBS. Regardless of the chosen algorithm, it is known that MCMC converges to the target distribution and that all results of DDM will also be valid. Moreover, DDM may be estimated using the particle filter, in place of the on-line and off-line methods. All these possibilities can be explored in future research by interested researchers.

If the on-line approach is so much faster, why would one use the off-line one? First, if we use discount factors to specify W and the ad hoc estimation for φ, as illustrated in the last few paragraphs of Section 3, the uncertainty about these quantities is disregarded, as they are then treated as known, although they are not. A simple analogy for this is the case of the interval estimation of the population mean μ from a sample drawn from a univariate Gaussian distribution. The on-line procedure would be equivalent to a plug-in estimate in which the population variance σ² would be replaced by the sample variance s², that is, s² would be treated as if it were the known σ². In this case, instead of using the Student's t distribution quantile, one would use the standard Gaussian one. Of course, if the sample size is big, the estimate is sufficiently precise, as can be verified by the convergence of the Student's t to the standard Gaussian distribution.
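The analogy can be checked by simulation. The Python sketch below estimates how often a nominal 95% interval built with the Gaussian quantile and the sample standard deviation actually covers the true mean; the sample sizes, number of replications, and seed are arbitrary choices made for illustration.

```python
import math
import random
import statistics

# 97.5% quantile of the standard Gaussian distribution (about 1.96).
Z_975 = statistics.NormalDist().inv_cdf(0.975)


def plug_in_coverage(n, reps=4000, mu=0.0, sigma=1.0, seed=1):
    """Monte Carlo coverage of xbar +/- z * s / sqrt(n) for Gaussian samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s2 = sum((v - xbar) ** 2 for v in sample) / (n - 1)
        half_width = Z_975 * math.sqrt(s2 / n)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


coverage_small = plug_in_coverage(n=5)     # noticeably below the nominal 0.95
coverage_large = plug_in_coverage(n=200)   # close to the nominal 0.95
```

The undercoverage for small samples is the price of treating an estimated variance as known; it fades as the sample grows, which parallels the trade-off between the on-line and off-line approaches.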

In the simulation method, in contrast, all unknown quantities receive adequate treatment, and we obtain, as soon as convergence has been properly reached, a (nonindependent) sample from the posterior distribution of (Θ, ψ) with ψ = (W, φ, θ0). From a methodological perspective, this is a clear advantage of this approach. Another benefit is that while in the on-line version we are only able to estimate the first two moments of the parameters in the model, with the off-line procedure we can estimate the distribution of each parameter, which allows us to compute HPD intervals and study the shape of each distribution.
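One common way to obtain an HPD interval from posterior draws, assuming a unimodal posterior, is to take the shortest interval that contains the required fraction of the sorted sample. The Python sketch below illustrates that approach; it is not necessarily the procedure used by the authors, and the gamma draws are a hypothetical stand-in for MCMC output.

```python
import math
import random


def hpd_interval(samples, prob=0.95):
    """Shortest interval containing at least a fraction `prob` of the draws."""
    x = sorted(samples)
    n = len(x)
    k = min(n, max(1, math.ceil(prob * n)))   # number of draws inside the interval
    # Slide a window of k consecutive order statistics and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: x[i + k - 1] - x[i])
    return x[start], x[start + k - 1]


# Hypothetical right-skewed posterior sample.
rng = random.Random(7)
draws = [rng.gammavariate(20.0, 25.0) for _ in range(5000)]
hpd_low, hpd_high = hpd_interval(draws, prob=0.95)
```

For a skewed posterior the resulting interval is shifted toward the mode and is never wider than the equal-tailed interval with the same content.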

With regard to the computing time, the slowest applications took a little more than a day to complete, mainly due to the low acceptance rate of Θ, sometimes below 1%. The computing time is greatly reduced when W is a null matrix. In this situation, the system does not evolve and we only need to execute the filtering process once. In contrast, the computational time grows exponentially as a function of the number of categories, being prohibitive for problems involving more than seven or eight categories.

In summary, both on-line and off-line methods have their own advantages and disadvantages. Based on the discussion above, it is up to the modeler to decide which is the best one for each application.

As for future developments, we envision making the DDM even more flexible by allowing dynamics for the precision parameter φ. In addition, we need to improve the quality of our computational tools. Our codes were all written in R. This brings the advantage of making the codes more useful to the scientific community in general, but the computing time to run the models can become too long. We are working on some strategies to speed up all our codes, like developing routines that are able to establish an interface with the C language, thereby reducing the computing time. We are also working on other strategies for describing proposal distributions that may be faster than the CUBS algorithm, as proposed by Gamerman (1998).

The data and codes are available upon request.

Acknowledgments

The authors would like to thank the anonymous referees for their careful reading of the manuscript and for their constructive suggestions that considerably improved the article.

Funding

The authors also thank the Office to Improve University Research (CAPES-Brazil), for financial support via Project PROCAD-NF 2008. Cibele Q. da-Silva received support from CNPq-Brazil, and BPPesq. Helio S. Migon received support from the National Research Council (CNPq-Brazil, BPPesq) and the Rio de Janeiro State Research Foundation (Faperj) together with CAPES via the Pronex project.

References

Aitchison, J. (1982). The statistical analysis of compositional data. Journal of the Royal Statistical Society: Series B 44:139–177.

Aitchison, J. (2003). The Statistical Analysis of Compositional Data. 2nd ed. London: Chapman and Hall.

Aitchison, J., Shen, S. M. (1980). Logistic-normal distributions: Some properties and uses. Biometrika 67:261–272.

Billheimer, D., Cardoso, T., Freeman, E., Guttorp, P., Ko, H., Silkey, M. (1997). Natural variability of Benthic species composition in the Delaware Bay. Journal of Environmental and Ecological Statistics 4:95–115.

Blei, D. M., Lafferty, J. D. (2006). Dynamic topic models. In: Proceedings of the 23rd International Conference on Machine Learning. Pittsburgh, PA, 113–120.

Camargo, A. P., Stern, J. M., Lauretto, M. S. (2012). Estimation and model selection in Dirichlet regression. AIP Conf. Proc. 1443:206.

Carter, C. K., Kohn, R. (1994). On Gibbs sampling for state-space models. Biometrika 81:541–553.

Cribari-Neto, F., Zeileis, A. (2010). Beta regression in R. Journal of Statistical Software 34:1–24.

da-Silva, C. Q., Migon, H. S., Correia, L. T. (2011). Dynamic Bayesian beta models. Computational Statistics and Data Analysis 55:2074–2089.

Ferrari, S. L. P., Cribari-Neto, F. (2004). Beta regression for modeling rates and proportions. Journal of Applied Statistics 31:799–815.

Fruhwirth-Schnatter, S. (1994). Data augmentation and dynamic linear models. Journal of Time Series Analysis 15:183–202.

Gamerman, D. (1998). Markov chain Monte Carlo for dynamic generalized linear models. Biometrika 85:215–227.

Gelman, A., Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science 7:457–511.

Geweke, J., Tanizaki, H. (2001). Bayesian estimation of state space models using Metropolis-Hastings algorithm within Gibbs sampling. Computational Statistics and Data Analysis 37:151–170.

Godolphin, E. J., Triantafyllopoulos, K. (2006). Decomposition of time series models in state-space form. Computational Statistics and Data Analysis 50:2232–2246.

Grunwald, G. K., Raftery, A. E., Guttorp, P. (1993). Time series of continuous proportions. Journal of the Royal Statistical Society. Series B 55:103–116.

Hankin, R. K. S. (2010). A generalization of the Dirichlet distribution. Journal of Statistical Software 33:1–18.

Heidelberger, P., Welch, P. (1983). Simulation run length control in the presence of initial transient. Operations Research 31:1109–1144.

Hijazi, R. H. (2009). Modeling compositional data using Dirichlet regression models. Journal of Applied Probability & Statistics 4:77–91.

Johnson, S. G., Narasimhan, B. (2009). Cubature: Adaptive multivariate integration over hypercubes. R package version 1.0.

Lange, K. (1999). Numerical Analysis for Statisticians. New York: Springer.

Lindsey, J. K., Lambert, P. (1995). Dynamic generalized linear models and repeated measurements. Journal of Statistical Planning and Inference 47:129–139.

Nachif, M. C. A. (2006). Homicide as a public health problem in the city of Campo Grande, Mato Grosso do Sul, Brazil. Psicologia & Sociedade 18:99–104.

Nelder, J. A., Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society. Series A 135:370–384.

Pole, A., West, M., Harrison, J. (1994). Applied Bayesian Forecasting and Time Series Analysis. Boca Raton: Chapman and Hall/CRC.

Pruteanu-Malinici, L. R., Paisley, J., Wang, E., Carin, L. (2010). Hierarchical Bayesian modeling of topics in time-stamped documents. IEEE Transactions on Pattern Analysis and Machine Intelligence 32:996–1011.

Quintana, J. M., West, M. (1988). Time series analysis of compositional data. In: Bernardo, J. M., DeGroot, M. H., Lindley, D. V., Smith, A. F. M., eds. Bayesian Statistics 3. pp. 747–756, New York: Oxford University Press.

Ravines, R., Migon, H., Schmidt, A. (2007). An efficient sampling scheme for dynamic generalized models. Technical Report 201/2007, Departamento de Métodos Estatísticos - IM-UFRJ.

Reichenheim, M. E., Souza, E. R., Moraes, C. L., Jorge, M. H. P. M., Silva, C. M. F. P., Minayo, M. C. S. (2011). Violence and injuries in Brazil: the effect, progress made, and challenges ahead. The Lancet 377:1962–1975.

Shephard, N., Pitt, M. (1997). Likelihood analysis of non-Gaussian measurement time series. Biometrika 84:653–667.

Smith, J. Q. (1979). A generalization of the Bayesian steady forecasting model. Journal of the Royal Statistical Society. Series B 41:375–387.

Tierney, L., Kadane, J. B. (1986). Accurate approximations for the posterior moments and marginal densities. Journal of the American Statistical Association 81:82–86.

Wang, C., Blei, D. M., Heckerman, D. (2008). Continuous time dynamic topic models. In: Proceedings of UAI. 579–586.

West, M., Harrison, P. J. (1997). Bayesian Forecasting and Dynamic Models. 2nd ed. New York: Springer.

West, M., Harrison, P. J., Migon, H. S. (1985). Dynamic generalized linear models and Bayesian forecasting. Journal of the American Statistical Association 80:73–97.
