
Journal of Econometrics 110 (2002) 293–318

www.elsevier.com/locate/econbase

Testing for two-regime threshold cointegration in vector error-correction models

Bruce E. Hansen a,∗, Byeongseon Seo b

a Department of Economics, University of Wisconsin, Madison, WI 53706, USA
b Department of Economics, Soongsil University, Seoul 156-743, South Korea

Abstract

This paper examines a two-regime vector error-correction model with a single cointegrating vector and a threshold effect in the error-correction term. We propose a relatively simple algorithm to obtain maximum likelihood estimation of the complete threshold cointegration model for the bivariate case. We propose a SupLM test for the presence of a threshold. We derive the null asymptotic distribution, show how to simulate asymptotic critical values, and present a bootstrap approximation. We investigate the performance of the test using Monte Carlo simulation, and find that the test works quite well. Applying our methods to the term structure model of interest rates, we find strong evidence for a threshold effect. © 2002 Published by Elsevier Science B.V.

JEL classification: C32

Keywords: Term structure; Bootstrap; Identification; Non-linear; Non-stationary

1. Introduction

Threshold cointegration was introduced by Balke and Fomby (1997) as a feasible

means to combine non-linearity and cointegration. In particular, the model allows for

non-linear adjustment to long-run equilibrium. The model has generated significant

applied interest, including the following applications: Balke and Wohar (1998), Baum

et al. (2001), Baum and Karasulu (1998), Enders and Falk (1998), Lo and Zivot

(2001), Martens et al. (1998), Michael et al. (1997), O’Connell (1998), O’Connell

and Wei (1997), Obstfeld and Taylor (1997), and Taylor (2001). Lo and Zivot (2001)

provide an extensive review of this growing literature.

∗ Corresponding author. Tel.: +1-608-263-2989; fax: +1-608-262-2033.

E-mail address: [email protected] (B.E. Hansen).

0304-4076/02/$ - see front matter © 2002 Published by Elsevier Science B.V.

PII: S0304-4076(02)00097-0


One of the most important statistical issues for this class of models is testing for the presence of a threshold effect (the null of linearity). Balke and Fomby (1997) proposed applying the univariate tests of Hansen (1996) and Tsay (1989) to the error-correction term (the cointegrating residual). This is known to be valid when the cointegrating vector is known, but Balke–Fomby did not provide a theory for the case of an estimated cointegrating vector. Lo and Zivot (2001) extended the Balke–Fomby approach to a multivariate threshold cointegration model with a known cointegrating vector, using the tests of Tsay (1998) and multivariate extensions of Hansen (1996).

In this paper, we extend this literature by examining the case of an unknown cointegrating vector. As in Balke–Fomby, our model is a vector error-correction model (VECM) with one cointegrating vector and a threshold effect based on the error-correction term. However, unlike Balke–Fomby, who focus on univariate estimation and testing methods, our estimates and tests are for the complete multivariate threshold model. The fact that we use the error-correction term as the threshold variable is not essential to our analysis, and the methods we discuss here could easily be adapted to other models where the threshold variable is a stationary transformation of the predetermined variables.

This paper makes two contributions. First, we propose a method to implement maximum likelihood estimation (MLE) of the threshold model. This algorithm involves a joint grid search over the threshold and the cointegrating vector. The algorithm is simple to implement in the bivariate case, but would be difficult to implement in higher dimensional cases. Furthermore, at this point we do not provide a proof of consistency, nor a distribution theory, for the MLE.

Second, we develop a test for the presence of a threshold effect. Under the null hypothesis, there is no threshold, so the model reduces to a conventional linear VECM. Thus estimation under the null hypothesis is particularly easy, reducing to conventional reduced rank regression. This suggests that a test can be based on the Lagrange multiplier (LM) principle, which only requires estimation under the null. Since the threshold parameter is not identified under the null hypothesis, we base inference on a SupLM test. (See Davies (1987), Andrews (1993), and Andrews and Ploberger (1994) for motivation and justification of this testing strategy.) Our test takes a similar algebraic form to those derived by Seo (1998) for structural change in error-correction models.

We derive the asymptotic null distribution of the SupLM test, and find that it is identical to the form found in Hansen (1996) for threshold tests applied to stationary data. In general, the asymptotic distribution depends on the covariance structure of the data, precluding tabulation. We suggest using either the fixed regressor bootstrap of Hansen (1996, 2000b), or alternatively a parametric residual bootstrap algorithm, to approximate the sampling distribution.

Section 2 introduces the threshold models and derives the Gaussian quasi-MLE for the models. Section 3 presents our LM test for threshold cointegration, its asymptotic distribution, and two methods to calculate p-values. Section 4 presents simulation evidence concerning the size and power of the tests. Section 5 presents an application to the term structure of interest rates. Proofs of the asymptotic distribution theory are presented in the appendix. Gauss programs which compute the estimates and tests, and replicate the empirical work reported in this paper, are available at www.ssc.wisc.edu/∼bhansen.

2. Estimation

2.1. Linear cointegration

Let $x_t$ be a $p$-dimensional I(1) time series which is cointegrated with one $p \times 1$ cointegrating vector $\beta$. Let $w_t(\beta) = \beta' x_t$ denote the I(0) error-correction term. A linear VECM of order $l + 1$ can be compactly written as

$$\Delta x_t = A' X_{t-1}(\beta) + u_t, \tag{1}$$

where

$$X_{t-1}(\beta) = \begin{pmatrix} 1 \\ w_{t-1}(\beta) \\ \Delta x_{t-1} \\ \Delta x_{t-2} \\ \vdots \\ \Delta x_{t-l} \end{pmatrix}.$$

The regressor $X_{t-1}(\beta)$ is $k \times 1$ and $A$ is $k \times p$, where $k = pl + 2$. The error $u_t$ is assumed to be a vector martingale difference sequence (MDS) with finite covariance matrix $\Sigma = \mathrm{E}(u_t u_t')$.

The notation $w_{t-1}(\beta)$ and $X_{t-1}(\beta)$ indicates that the variables are evaluated at generic values of $\beta$. When evaluated at the true value of the cointegrating vector, we denote these variables as $w_{t-1}$ and $X_{t-1}$, respectively.

We need to impose some normalization on $\beta$ to achieve identification. Since there is just one cointegrating vector, a convenient choice is to set one element of $\beta$ equal to unity. This has no cost when the system is bivariate ($p = 2$), and for $p > 2$ it only imposes the restriction that the corresponding element of $x_t$ enters the cointegrating relationship.

The parameters $(\beta, A, \Sigma)$ are estimated by maximum likelihood under the assumption that the errors $u_t$ are iid Gaussian (using the above normalization on $\beta$). Let these estimates be denoted $(\tilde\beta, \tilde A, \tilde\Sigma)$. Let $\tilde u_t = \Delta x_t - \tilde A' X_{t-1}(\tilde\beta)$ be the residual vectors.
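For illustration, at a fixed $\beta$ model (1) is linear, so $A$ and the residuals can be computed by least squares once $w_t(\beta)$ is formed. The following sketch (a bivariate NumPy implementation of our own, not the paper's Gauss code; the function name and interface are assumptions) runs this regression:

```python
import numpy as np

def linear_vecm_ols(x, beta, l=1):
    """OLS fit of the linear VECM  dx_t = A' X_{t-1}(beta) + u_t  at fixed beta.

    x : (n, p) array of I(1) levels; beta : (p,) cointegrating vector.
    Returns A with shape (k, p), k = p*l + 2, and the residual matrix.
    """
    dx = np.diff(x, axis=0)              # dx[i] = x_{i+1} - x_i
    w = x @ beta                         # error-correction term w_t = beta' x_t
    Y = dx[l:]                           # dependent variable dx_t
    X = np.column_stack(
        [np.ones(len(Y)), w[l:-1]] +                 # constant and w_{t-1}
        [dx[l - j:-j] for j in range(1, l + 1)])     # dx_{t-1}, ..., dx_{t-l}
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ A
    return A, resid
```

In practice $\beta$ would be estimated jointly (e.g. by the Johansen MLE); this helper only computes the regression step given $\beta$.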

2.2. Threshold cointegration

As an extension of model (1), a two-regime threshold cointegration model takes the form

$$\Delta x_t = \begin{cases} A_1' X_{t-1}(\beta) + u_t & \text{if } w_{t-1}(\beta) \le \gamma, \\ A_2' X_{t-1}(\beta) + u_t & \text{if } w_{t-1}(\beta) > \gamma, \end{cases}$$

where $\gamma$ is the threshold parameter. This may alternatively be written as

$$\Delta x_t = A_1' X_{t-1}(\beta)\, d_{1t}(\beta, \gamma) + A_2' X_{t-1}(\beta)\, d_{2t}(\beta, \gamma) + u_t, \tag{2}$$

where

$$d_{1t}(\beta, \gamma) = 1(w_{t-1}(\beta) \le \gamma), \qquad d_{2t}(\beta, \gamma) = 1(w_{t-1}(\beta) > \gamma)$$

and $1(\cdot)$ denotes the indicator function.

Threshold model (2) has two regimes, defined by the value of the error-correction term. The coefficient matrices $A_1$ and $A_2$ govern the dynamics in these regimes. Model (2) allows all coefficients (except the cointegrating vector $\beta$) to switch between these two regimes. In many cases, it may make sense to impose greater parsimony on the model, by only allowing some coefficients to switch between regimes. This is a special case of (2) where constraints are placed on $(A_1, A_2)$. For example, a model of particular interest only lets the coefficients on the constant and the error correction $w_{t-1}$ switch, constraining the coefficients on the lagged $\Delta x_{t-j}$ to be constant across regimes.

The threshold effect only has content if $0 < \mathrm{P}(w_{t-1} \le \gamma) < 1$; otherwise the model simplifies to linear cointegration. We impose this constraint by assuming that

$$\pi_0 \le \mathrm{P}(w_{t-1} \le \gamma) \le 1 - \pi_0, \tag{3}$$

where $\pi_0 > 0$ is a trimming parameter. For the empirical application, we set $\pi_0 = 0.05$.

We propose estimation of model (2) by maximum likelihood, under the assumption that the errors $u_t$ are iid Gaussian. The Gaussian likelihood is

$$\mathcal{L}_n(A_1, A_2, \Sigma, \beta, \gamma) = -\frac{n}{2} \log|\Sigma| - \frac{1}{2} \sum_{t=1}^{n} u_t(A_1, A_2, \beta, \gamma)' \Sigma^{-1} u_t(A_1, A_2, \beta, \gamma),$$

where

$$u_t(A_1, A_2, \beta, \gamma) = \Delta x_t - A_1' X_{t-1}(\beta)\, d_{1t}(\beta, \gamma) - A_2' X_{t-1}(\beta)\, d_{2t}(\beta, \gamma).$$

The MLE $(\hat A_1, \hat A_2, \hat\Sigma, \hat\beta, \hat\gamma)$ are the values which maximize $\mathcal{L}_n(A_1, A_2, \Sigma, \beta, \gamma)$.

It is computationally convenient to first concentrate out $(A_1, A_2, \Sigma)$. That is, hold $(\beta, \gamma)$ fixed and compute the constrained MLE for $(A_1, A_2, \Sigma)$. This is just OLS regression,¹ specifically

$$\hat A_1(\beta, \gamma) = \left( \sum_{t=1}^{n} X_{t-1}(\beta) X_{t-1}(\beta)'\, d_{1t}(\beta, \gamma) \right)^{-1} \left( \sum_{t=1}^{n} X_{t-1}(\beta)\, \Delta x_t'\, d_{1t}(\beta, \gamma) \right), \tag{4}$$

¹ These formulas are for the unconstrained model (2). If a constrained threshold model is used, then the appropriate constrained OLS estimates should be used.


$$\hat A_2(\beta, \gamma) = \left( \sum_{t=1}^{n} X_{t-1}(\beta) X_{t-1}(\beta)'\, d_{2t}(\beta, \gamma) \right)^{-1} \left( \sum_{t=1}^{n} X_{t-1}(\beta)\, \Delta x_t'\, d_{2t}(\beta, \gamma) \right), \tag{5}$$

$$\hat u_t(\beta, \gamma) = u_t(\hat A_1(\beta, \gamma), \hat A_2(\beta, \gamma), \beta, \gamma)$$

and

$$\hat\Sigma(\beta, \gamma) = \frac{1}{n} \sum_{t=1}^{n} \hat u_t(\beta, \gamma)\, \hat u_t(\beta, \gamma)'. \tag{6}$$

It may be helpful to note that (4) and (5) are the OLS regressions of $\Delta x_t$ on $X_{t-1}(\beta)$ for the subsamples for which $w_{t-1}(\beta) \le \gamma$ and $w_{t-1}(\beta) > \gamma$, respectively.

This yields the concentrated likelihood function

$$\mathcal{L}_n(\beta, \gamma) = \mathcal{L}_n(\hat A_1(\beta, \gamma), \hat A_2(\beta, \gamma), \hat\Sigma(\beta, \gamma), \beta, \gamma) = -\frac{n}{2} \log|\hat\Sigma(\beta, \gamma)| - \frac{np}{2}. \tag{7}$$

The MLE $(\hat\beta, \hat\gamma)$ are thus found as the minimizers of $\log|\hat\Sigma(\beta, \gamma)|$, subject to the normalization imposed on $\beta$ as discussed in the previous section and the constraint

$$\pi_0 \le n^{-1} \sum_{t=1}^{n} 1(x_t' \beta \le \gamma) \le 1 - \pi_0$$

(which imposes (3)). The MLE for $A_1$ and $A_2$ are $\hat A_1 = \hat A_1(\hat\beta, \hat\gamma)$ and $\hat A_2 = \hat A_2(\hat\beta, \hat\gamma)$.

The criterion function (7) is not smooth, so conventional gradient hill-climbing algorithms are not suitable for its maximization. In the leading case $p = 2$, we suggest using a grid search over the two-dimensional space $(\beta, \gamma)$. In higher dimensional cases, grid search becomes less attractive, and alternative search methods (such as a genetic algorithm; see Dorsey and Mayer, 1995) might be more appropriate. Note that in the event that $\beta$ is known a priori, this grid search is greatly simplified.

To execute a grid search, one needs to pick a region over which to search. We suggest calibrating this region based on the consistent estimate $\tilde\beta$ obtained from the linear model (the MLE² discussed in Section 2.1). Set $\tilde w_{t-1} = w_{t-1}(\tilde\beta)$, let $[\gamma_L, \gamma_U]$ denote the empirical support of $\tilde w_{t-1}$, and construct an evenly spaced grid on $[\gamma_L, \gamma_U]$. Let $[\beta_L, \beta_U]$ denote a (large) confidence interval for $\beta$ constructed from the linear estimate $\tilde\beta$ (based, for example, on the asymptotic normal approximation), and construct an evenly spaced grid on $[\beta_L, \beta_U]$. The grid search over $(\beta, \gamma)$ then examines all pairs $(\gamma, \beta)$ on the grids on $[\gamma_L, \gamma_U]$ and $[\beta_L, \beta_U]$, conditional on $\pi_0 \le n^{-1} \sum_{t=1}^{n} 1(x_t' \beta \le \gamma) \le 1 - \pi_0$ (the latter to impose constraint (3)).

In Figs. 1 and 2, we illustrate the non-differentiability of the criterion function for an empirical example from Section 5. (The application is to the 12- and 120-month T-Bill rates, setting $l = 1$.) Fig. 1 plots criterion (7) as a function of $\gamma$ with $\beta$ concentrated out, and Fig. 2 plots the criterion as a function of $\beta$ with $\gamma$ concentrated out.

² Any consistent estimator could in principle be used here. In all our simulations and applications, we use the Johansen MLE.

Fig. 1. Concentrated negative log likelihood.

Fig. 2. Concentrated negative log likelihood.

In summary, our algorithm for the $p = 2$ case is:

1. Form a grid on $[\gamma_L, \gamma_U]$ and $[\beta_L, \beta_U]$ based on the linear estimate $\tilde\beta$ as described above.
2. For each value of $(\beta, \gamma)$ on this grid, calculate $\hat A_1(\beta, \gamma)$, $\hat A_2(\beta, \gamma)$, and $\hat\Sigma(\beta, \gamma)$ as defined in (4), (5), and (6), respectively.
3. Find $(\hat\beta, \hat\gamma)$ as the values of $(\beta, \gamma)$ on this grid which yield the lowest value of $\log|\hat\Sigma(\beta, \gamma)|$.
4. Set $\hat\Sigma = \hat\Sigma(\hat\beta, \hat\gamma)$, $\hat A_1 = \hat A_1(\hat\beta, \hat\gamma)$, $\hat A_2 = \hat A_2(\hat\beta, \hat\gamma)$, and $\hat u_t = \hat u_t(\hat\beta, \hat\gamma)$.

It is useful to note that in step 3, there is no guarantee that the minimizers $(\hat\beta, \hat\gamma)$ will be unique, as the log-likelihood function is not concave.

We have described an algorithm to implement the MLE, but it should be emphasized that this is not a theory of inference. We have not provided a proof of consistency of the estimator, nor a distribution theory. In linear models, $\tilde\beta$ converges to $\beta$ at rate $n$, and in stationary threshold models, $\hat\gamma$ converges to $\gamma$ at rate $n$. It therefore seems reasonable to guess that in the threshold cointegration model, $(\hat\beta, \hat\gamma)$ will converge to $(\beta, \gamma)$ at rate $n$. In this case, the slope estimates $\hat A_1$ and $\hat A_2$ should have conventional normal asymptotic distributions as if $\beta$ and $\gamma$ were known. Hence conventional standard errors can be reported for these parameter estimates.

3. Testing for a threshold

3.1. Test statistics

Let H0 denote the class of linear VECM models (1) and H1 denote the class of two-regime threshold models (2). These models are nested, and the restriction H0 comprises the models in H1 which satisfy $A_1 = A_2$.

We want to test H0 (linear cointegration) versus H1 (threshold cointegration). We focus on formal model-based statistical tests, as these allow for direct model comparisons and yield the greatest power for discrimination between models. Alternatively, one might consider the non-parametric non-linearity tests of Tsay (1989) and Tsay (1998) for the univariate and multivariate cases, respectively. As shown in the simulation studies by Balke and Fomby (1997) and Lo and Zivot (2001), these non-parametric tests generally have lower power than comparable model-based tests.

In this paper we consider LM statistics. We do this for two reasons. First, the LM statistic is computationally quick, enabling feasible implementation of the bootstrap. Second, a likelihood ratio or Wald-type test would require a distribution theory for the parameter estimates of the unrestricted model, which we do not yet have. We conjecture, but have no proof, that these tests are asymptotically equivalent to the LM test. We now derive the LM test statistic.


Assume for the moment that $(\beta, \gamma)$ are known and fixed. The model under H0 is

$$\Delta x_t = A' X_{t-1}(\beta) + u_t \tag{8}$$

and the model under H1 is

$$\Delta x_t = A_1' X_{t-1}(\beta)\, d_{1t}(\beta, \gamma) + A_2' X_{t-1}(\beta)\, d_{2t}(\beta, \gamma) + u_t. \tag{9}$$

Given $(\beta, \gamma)$, the models are linear, so the MLE is least squares. As (8) is nested in (9) and the models are linear, an LM-like statistic which is robust to heteroskedasticity can be calculated from a linear regression on model (9). Specifically, let $X_1(\beta, \gamma)$ and $X_2(\beta, \gamma)$ be the matrices of the stacked rows $X_{t-1}(\beta)'\, d_{1t}(\beta, \gamma)$ and $X_{t-1}(\beta)'\, d_{2t}(\beta, \gamma)$, respectively; let $\xi_1(\beta, \gamma)$ and $\xi_2(\beta, \gamma)$ be the matrices of the stacked rows $\tilde u_t' \otimes X_{t-1}(\beta)'\, d_{1t}(\beta, \gamma)$ and $\tilde u_t' \otimes X_{t-1}(\beta)'\, d_{2t}(\beta, \gamma)$, respectively, with $\tilde u_t$ the residual vector from the linear model as defined in Section 2.1; and define the outer product matrices

$$M_1(\beta, \gamma) = I_p \otimes X_1(\beta, \gamma)' X_1(\beta, \gamma), \qquad M_2(\beta, \gamma) = I_p \otimes X_2(\beta, \gamma)' X_2(\beta, \gamma)$$

and

$$\Omega_1(\beta, \gamma) = \xi_1(\beta, \gamma)' \xi_1(\beta, \gamma), \qquad \Omega_2(\beta, \gamma) = \xi_2(\beta, \gamma)' \xi_2(\beta, \gamma).$$

Then we can define $V_1(\beta, \gamma)$ and $V_2(\beta, \gamma)$, the Eicker–White covariance matrix estimators for $\mathrm{vec}\, \hat A_1(\beta, \gamma)$ and $\mathrm{vec}\, \hat A_2(\beta, \gamma)$, as

$$V_1(\beta, \gamma) = M_1(\beta, \gamma)^{-1} \Omega_1(\beta, \gamma) M_1(\beta, \gamma)^{-1}, \tag{10}$$

$$V_2(\beta, \gamma) = M_2(\beta, \gamma)^{-1} \Omega_2(\beta, \gamma) M_2(\beta, \gamma)^{-1}, \tag{11}$$

yielding the standard expression for the heteroskedasticity-robust LM-like statistic

$$\mathrm{LM}(\beta, \gamma) = \mathrm{vec}\big(\hat A_1(\beta, \gamma) - \hat A_2(\beta, \gamma)\big)' \big( V_1(\beta, \gamma) + V_2(\beta, \gamma) \big)^{-1} \mathrm{vec}\big(\hat A_1(\beta, \gamma) - \hat A_2(\beta, \gamma)\big). \tag{12}$$

If $\beta$ and $\gamma$ were known, (12) would be the test statistic. When they are unknown, the LM statistic is (12) evaluated at point estimates obtained under H0. The null estimate of $\beta$ is $\tilde\beta$ (Section 2.1), but there is no estimate of $\gamma$ under H0, so there is no conventionally defined LM statistic. Arguing from the union–intersection principle, Davies (1987) proposed the statistic

$$\mathrm{SupLM} = \sup_{\gamma_L \le \gamma \le \gamma_U} \mathrm{LM}(\tilde\beta, \gamma). \tag{13}$$

For this test, the search region $[\gamma_L, \gamma_U]$ is set so that $\gamma_L$ is the $\pi_0$ percentile of $\tilde w_{t-1}$, and $\gamma_U$ is the $(1 - \pi_0)$ percentile. This imposes constraint (3). For testing, the parameter


$\pi_0$ should not be too close to zero, as Andrews (1993) shows that doing so reduces power. Andrews (1993) argues that values of $\pi_0$ between 0.05 and 0.15 are typically good choices.

Further justification for statistic (13) is given in Andrews (1993) and Andrews and Ploberger (1994). Andrews and Ploberger (1994) argue that better power may be achieved by using exponentially weighted averages of $\mathrm{LM}(\tilde\beta, \gamma)$ rather than the supremum. There is an inherent arbitrariness in this choice of statistic, however, due to the choice of weighting function, so our analysis will remain confined to (13).

As the function $\mathrm{LM}(\tilde\beta, \gamma)$ is non-differentiable in $\gamma$, to implement the maximization defined in (13) it is necessary to perform a grid evaluation over $[\gamma_L, \gamma_U]$.
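The grid evaluation of (12)-(13) can be sketched as follows. This is our own NumPy illustration (function and variable names are assumptions, not the paper's code); it treats $\beta$ as given, computes the null-model residuals $\tilde u_t$, and evaluates the robust statistic with the Eicker-White covariances (10)-(11) at each gridpoint:

```python
import numpy as np

def suplm(x, beta, l=1, pi0=0.1, n_gamma=50):
    """SupLM over a quantile grid of gamma, at a fixed cointegrating vector."""
    dx = np.diff(x, axis=0)
    w = x @ beta
    Y = dx[l:]
    X = np.column_stack([np.ones(len(Y)), w[l:-1]] +
                        [dx[l - j:-j] for j in range(1, l + 1)])
    A0, *_ = np.linalg.lstsq(X, Y, rcond=None)
    U = Y - X @ A0                                   # null residuals u~_t
    wlag = w[l:-1]                                   # threshold variable w_{t-1}
    p = Y.shape[1]
    stats = []
    for gamma in np.quantile(wlag, np.linspace(pi0, 1 - pi0, n_gamma)):
        d1 = wlag <= gamma
        vecs, Vs = [], []
        for d in (d1, ~d1):
            Xd = X[d]
            XtX = Xd.T @ Xd
            Ad = np.linalg.solve(XtX, Xd.T @ Y[d])           # regime OLS
            xi = np.einsum('ti,tj->tij', U[d], Xd).reshape(d.sum(), -1)
            Minv = np.kron(np.eye(p), np.linalg.inv(XtX))    # M_i^{-1}
            Vs.append(Minv @ (xi.T @ xi) @ Minv)             # (10)-(11)
            vecs.append(Ad.ravel(order='F'))                 # vec A_i
        delta = vecs[0] - vecs[1]
        stats.append(delta @ np.linalg.solve(Vs[0] + Vs[1], delta))
    return max(stats)
```

Called with the known vector $\beta_0$ this evaluates statistic (14) below; called with the estimate $\tilde\beta$ it evaluates (13).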

In the event that the true cointegrating vector $\beta_0$ is known a priori, the test takes form (13), except that $\beta$ is fixed at the known value $\beta_0$. We denote this test statistic as

$$\mathrm{SupLM}_0 = \sup_{\gamma_L \le \gamma \le \gamma_U} \mathrm{LM}(\beta_0, \gamma). \tag{14}$$

It is important to note that the values of $\gamma$ which maximize the expressions in (13) and (14) will differ from the MLE $\hat\gamma$ presented in Section 2. This is true for two separate reasons. First, (13) and (14) are LM tests, and are based on parameter estimates obtained under the null rather than the alternative. Second, these LM statistics are computed with heteroskedasticity-consistent covariance matrix estimates, and in this case even the maximizers of SupWald statistics differ from the MLE (the two are equal only when homoskedastic covariance matrix estimates are used). This difference is generic in threshold testing and estimation for regression models, and is not special to threshold cointegration.

3.2. Asymptotic distribution

First consider the case that the true cointegrating vector $\beta_0$ is known. The regressors are stationary, and the testing problem is a multivariate generalization of Hansen (1996). It follows that the asymptotic distribution of the tests will take the form given in that paper. We require the following standard weak dependence conditions.

Assumption. $\{\beta_0' x_t, \Delta x_t\}$ is $L^{4r}$-bounded, strictly stationary and absolutely regular, with mixing coefficients $\eta_m = O(m^{-A})$, where $A > \nu/(\nu - 1)$ and $r > \nu > 1$. Furthermore, the error $u_t$ is an MDS, and the error-correction term $\beta_0' x_t$ has a bounded density function.

Under null hypothesis (8), these conditions are known to hold when $u_t$ is iid with a bounded density and $L^{4r}$-bounded. Under alternative hypothesis (9), these conditions are known to hold under further restrictions on the parameters.

Let $F(\cdot)$ denote the marginal distribution of $w_{t-1}$, and let "$\Rightarrow$" denote weak convergence with respect to the uniform metric on $[\pi_0, 1 - \pi_0]$. Define $\rho_{t-1} = F(w_{t-1})$,

$$M(r) = I_p \otimes \mathrm{E}\big( X_{t-1} X_{t-1}'\, 1(\rho_{t-1} \le r) \big)$$

and

$$\Omega(r) = \mathrm{E}\big[ 1(\rho_{t-1} \le r)\, (u_t u_t' \otimes X_{t-1} X_{t-1}') \big].$$


Theorem 1. Under H0,

$$\mathrm{SupLM}_0 \Rightarrow T = \sup_{\pi_0 \le r \le 1 - \pi_0} T(r),$$

where

$$T(r) = S^*(r)'\, \Omega^*(r)^{-1} S^*(r),$$

$$\Omega^*(r) = \Omega(r) - M(r) M(1)^{-1} \Omega(r) - \Omega(r) M(1)^{-1} M(r) + M(r) M(1)^{-1} \Omega(1) M(1)^{-1} M(r)$$

and

$$S^*(r) = S(r) - M(r) M(1)^{-1} S(1),$$

where $S(r)$ is a mean-zero matrix Gaussian process with covariance kernel $\mathrm{E}(S(r_1) S(r_2)') = \Omega(r_1 \wedge r_2)$.

The asymptotic distribution in Theorem 1 is the same as that presented in Hansen (1996). In general, the asymptotic distribution does not simplify further. However, we discuss one special simplification at the end of this subsection.

Now we consider the case of estimated $\beta$. Since $n(\tilde\beta - \beta_0) = O_p(1)$, it is sufficient to examine the behavior of $\mathrm{LM}(\beta, \gamma)$ in an $n^{-1}$ neighborhood of $\beta_0$.

Theorem 2. Under H0, $\mathrm{LM}(\delta, \gamma) = \mathrm{LM}(\beta_0 + \delta/n, \gamma)$ has the same asymptotic finite-dimensional distributions (fidis) as $\mathrm{LM}(\beta_0, \gamma)$.

If, in addition, we could show that the process $\mathrm{LM}(\delta, \gamma)$ is tight on compact sets, it would follow that SupLM and SupLM0 have the same asymptotic distribution, namely $T$. This would imply that the use of the estimate $\tilde\beta$, rather than the true value $\beta_0$, does not alter the asymptotic null distribution of the LM test. Unfortunately, we have been unable to establish a proof of this proposition. The difficulty is two-fold. The process $\mathrm{LM}(\delta, \gamma)$ is discontinuous in $\beta$ (due to the indicator functions) and is a function of the non-stationary variable $x_{t-1}$. There is only a small literature on empirical process results for time-series processes, and virtually none for non-stationary data. Furthermore, the non-stationary variable $x_{t-1}$ appears in the indicator function, so Taylor series methods cannot be used to simplify the problem.

It is our view that despite the lack of a complete proof, the fidi result of Theorem 2 is sufficient to justify using the asymptotic distribution $T$ for the statistic SupLM.

Theorem 1 gives an expression for the asymptotic distribution $T$. It is expressed as the supremum of a stochastic process $T(r)$, the latter sometimes called a "chi-square process" since for each $r$ the marginal distribution of $T(r)$ is chi-square. As $T$ is the supremum of this stochastic process, its distribution is determined by the joint distribution of this chi-square process, and hence depends on the unknown functions


$M(r)$ and $\Omega(r)$. As these functionals may take a broad range of shapes, critical values for $T$ cannot in general be tabulated.

In one special case, we can achieve an important simplification. Take model (2) under $\mathrm{E}(u_t u_t' \,|\, \mathcal{F}_{t-1}) = \Sigma$ with no intercept and no lags of $\Delta x_t$, so that the only regressor is the error-correction term $w_{t-1}$. Then, since $M(r)$ is scalar and monotonically increasing, there exists a function $\lambda(s)$ such that $M(\lambda(s)) = s M(1)$. We can without loss of generality normalize $M(1) = 1$ and $\Sigma = I$. Then $S(\lambda(s)) = W(s)$ is a standard Brownian motion, $S^*(\lambda(s)) = W(s) - s W(1)$ is a Brownian bridge, and

$$T = \sup_{\pi_0 \le r \le 1 - \pi_0} T(r) = \sup_{\pi_0 \le \lambda(s) \le 1 - \pi_0} T(\lambda(s)) = \sup_{s_1 \le s \le s_2} \frac{(W(s) - s W(1))^2}{s(1 - s)},$$

where $s_1 = \lambda^{-1}(\pi_0)$ and $s_2 = \lambda^{-1}(1 - \pi_0)$. This is the distribution given in Andrews (1993) for tests for structural change of unknown timing, and is a function of only

$$s_0 = \frac{s_2(1 - s_1)}{s_1(1 - s_2)}.$$
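In this special case the limit distribution can be simulated directly. The following Monte Carlo sketch is our own illustration (assuming NumPy; the random-walk approximation of $W$ on a fixed grid is crude but adequate):

```python
import numpy as np

def sup_bridge_dist(s1, s2, n_steps=1000, n_rep=2000, seed=0):
    """Draws of  sup_{s1<=s<=s2} (W(s) - s W(1))^2 / (s (1 - s)),

    the Andrews (1993) limit arising in the special case above.
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, n_steps + 1)
    mask = (grid >= s1) & (grid <= s2)
    draws = np.empty(n_rep)
    for r in range(n_rep):
        # Approximate W on the grid by a scaled random walk
        W = np.concatenate([[0.0],
                            np.cumsum(rng.standard_normal(n_steps))
                            / np.sqrt(n_steps)])
        B = W - grid * W[-1]                       # Brownian bridge
        draws[r] = np.max(B[mask] ** 2 / (grid[mask] * (1 - grid[mask])))
    return draws

# e.g. an approximate 95% critical value for [s1, s2] = [0.15, 0.85]:
# crit = np.quantile(sup_bridge_dist(0.15, 0.85), 0.95)
```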

3.3. Asymptotic p-values: the fixed regressor bootstrap

With the exception discussed at the end of Section 3.2, the asymptotic distribution in Theorems 1 and 2 appears to depend upon the moment functionals $M(r)$ and $\Omega(r)$, so tabulated critical values are unavailable. We discuss in this section how the fixed regressor bootstrap of Hansen (1996, 2000b) can be used to calculate asymptotic critical values and p-values, and hence achieve first-order asymptotically correct inference.

We interpret Theorem 2 to imply that the first-step estimation of the cointegrating vector $\beta$ does not affect the asymptotic distribution of the SupLM test. We therefore do not need to take the estimation of $\beta$ into account in conducting inference on the threshold. However, since Theorem 2 is not a complete proof of the asymptotic distribution of SupLM when $\beta$ is estimated, we should emphasize that this is partially a conjecture.

We now describe the fixed regressor bootstrap. Let $\tilde w_{t-1} = w_{t-1}(\tilde\beta)$, $\tilde X_{t-1} = X_{t-1}(\tilde\beta)$, and let $\tilde u_t$ be the residuals from the reduced rank regression as described in Section 2.1. For the remainder of our discussion, $\tilde u_t$, $\tilde w_{t-1}$, $\tilde X_{t-1}$ and $\tilde\beta$ are held fixed at their sample values.

Let $e_{bt}$ be iid N(0, 1) and set $y_{bt} = \tilde u_t e_{bt}$. Regress $y_{bt}$ on $\tilde X_{t-1}$, yielding residuals $\tilde u_{bt}$. Regress $y_{bt}$ on $\tilde X_{t-1} d_{1t}(\tilde\beta, \gamma)$ and $\tilde X_{t-1} d_{2t}(\tilde\beta, \gamma)$, yielding estimates $\hat A_1(\gamma)_b$ and $\hat A_2(\gamma)_b$ and residuals $\hat u_{bt}(\gamma)$. Define $\hat V_1(\gamma)_b$ and $\hat V_2(\gamma)_b$ as in (10) and (11), setting $\beta = \tilde\beta$ and


replacing $\tilde u_t$ with $\tilde u_{bt}$ in the definition of $\xi_1(\beta, \gamma)$ and $\xi_2(\beta, \gamma)$. Then set

$$\mathrm{SupLM}^* = \sup_{\gamma_L \le \gamma \le \gamma_U} \mathrm{vec}\big(\hat A_1(\gamma)_b - \hat A_2(\gamma)_b\big)' \big( \hat V_1(\gamma)_b + \hat V_2(\gamma)_b \big)^{-1} \mathrm{vec}\big(\hat A_1(\gamma)_b - \hat A_2(\gamma)_b\big).$$

The analysis in Hansen (1996) shows that under local alternatives to H0, $\mathrm{SupLM}^* \Rightarrow_p T$, so the distribution of SupLM* yields a valid first-order approximation to the asymptotic null distribution of SupLM. The symbol "$\Rightarrow_p$" denotes weak convergence in probability as defined in Gine and Zinn (1990).

The distribution of SupLM* is unknown, but can be calculated using simulation methods. The description given above shows how to create one draw from the distribution. With independent draws of the errors $e_{bt}$, a new draw can be made. If this is repeated a large number of times (e.g. 1000), a p-value can be calculated by counting the percentage of simulated SupLM* which exceed the actual SupLM.
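The simulation loop can be sketched as follows (our own NumPy code, not the paper's Gauss implementation). Here `X`, `U` and `wlag` are the fixed sample quantities $\tilde X_{t-1}$, $\tilde u_t$ and $\tilde w_{t-1}$, `gammas` is the grid on $[\gamma_L, \gamma_U]$, and `stat` is the actual SupLM value computed from the data:

```python
import numpy as np

def fixed_regressor_pvalue(stat, X, U, wlag, gammas, n_boot=500, seed=0):
    """Fixed-regressor bootstrap p-value for a SupLM statistic `stat`."""
    n, p = U.shape

    def suplm_star(Y, Unull):
        # Robust SupLM for artificial data Y, with null residuals Unull
        stats = []
        for gamma in gammas:
            d1 = wlag <= gamma
            vecs, Vs = [], []
            for d in (d1, ~d1):
                Xd = X[d]
                XtX = Xd.T @ Xd
                Ad = np.linalg.solve(XtX, Xd.T @ Y[d])        # regime OLS
                xi = np.einsum('ti,tj->tij', Unull[d], Xd).reshape(d.sum(), -1)
                Minv = np.kron(np.eye(p), np.linalg.inv(XtX))
                Vs.append(Minv @ (xi.T @ xi) @ Minv)          # Eicker-White
                vecs.append(Ad.ravel(order='F'))
            delta = vecs[0] - vecs[1]
            stats.append(delta @ np.linalg.solve(Vs[0] + Vs[1], delta))
        return max(stats)

    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_boot):
        e = rng.standard_normal((n, 1))
        Yb = U * e                                             # y*_t = u~_t e_t
        Ub = Yb - X @ np.linalg.solve(X.T @ X, X.T @ Yb)       # null residuals
        if suplm_star(Yb, Ub) >= stat:
            exceed += 1
    return exceed / n_boot
```

Note that only the dependent variable is regenerated; the regressors and regime indicators stay at their sample values, which is exactly what the "fixed regressor" label conveys.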

The label "fixed regressor bootstrap" is intended to convey the feature that the regressors $\tilde X_{t-1} d_{1t}(\tilde\beta, \gamma)$ and $\tilde X_{t-1} d_{2t}(\tilde\beta, \gamma)$ are held fixed at their sample values. As such, this is not really a bootstrap technique, and is not expected to provide a better approximation to the finite sample distribution than conventional asymptotic approximations. The advantage of the method is that it allows for heteroskedasticity of unknown form, while conventional model-based bootstrap methods effectively impose independence on the errors $u_t$ and therefore do not achieve correct first-order asymptotic inference. It allows for general heteroskedasticity in much the same way as White's (1980) heteroskedasticity-consistent standard errors.

3.4. Residual bootstrap

The fixed regressor bootstrap of the previous section has much of the computational burden of a bootstrap, but only approximates the asymptotic distribution. While we have no formal theory, it stands to reason that a bootstrap method might achieve better finite sample performance than asymptotic methods. This conjecture is not obvious, as the asymptotic distribution of Section 3.2 is non-pivotal, and it is known that the bootstrap in general does not achieve an asymptotic refinement (an improved rate of convergence relative to asymptotic inference) when asymptotic distributions are non-pivotal.

One cost of using the bootstrap is the need to be fully parametric concerning the data-generating mechanism. In particular, it is difficult to incorporate conditional heteroskedasticity, and in its presence a conventional bootstrap (using iid innovations) will fail to achieve the first-order asymptotic distribution (unlike the fixed regressor bootstrap, which does).

The parametric residual bootstrap method requires a complete specification of the model under the null. This is Eq. (1) plus auxiliary assumptions on the errors $u_t$ and the initial conditions. In our applications, we assume $u_t$ is iid from an unknown distribution $G$, and the initial conditions are fixed (other choices are possible). The bootstrap calculates the sampling distribution of the test SupLM using this model and


the parameter estimates obtained under the null. The latter are $\tilde\beta$, $\tilde A$, and the empirical distribution of the bivariate residuals $\tilde u_t$.

The bootstrap distribution may be calculated by simulation. Given the fixed initial conditions, random draws are made from the residual vectors $\tilde u_t$, and then the vector series $x_{bt}$ are created by recursion from model (1). The statistic SupLM* is calculated on each simulated sample and stored. The bootstrap p-value is the percentage of simulated statistics which exceed the actual statistic.
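The recursion step can be sketched as follows (our own NumPy illustration; names and interface are assumptions). Computing SupLM* on each generated series then proceeds exactly as for the actual data:

```python
import numpy as np

def residual_bootstrap_sample(x_init, A, beta, resid_pool, n, l=1, rng=None):
    """One bootstrap series from null model (1) by recursion.

    x_init : (l + 1, p) fixed initial levels; A : (k, p) null coefficient
    estimates; beta : (p,) estimated cointegrating vector; resid_pool :
    null residuals u~_t, drawn iid with replacement.
    """
    rng = rng or np.random.default_rng()
    p = x_init.shape[1]
    x = np.zeros((n, p))
    dx = np.zeros((n, p))
    x[:l + 1] = x_init
    dx[1:l + 1] = np.diff(x_init, axis=0)
    for t in range(l + 1, n):
        # X_{t-1} = (1, w_{t-1}, dx_{t-1}', ..., dx_{t-l}')'
        Xlag = np.concatenate([[1.0, x[t - 1] @ beta],
                               dx[t - l:t][::-1].ravel()])
        u = resid_pool[rng.integers(len(resid_pool))]   # resample a residual
        dx[t] = Xlag @ A + u
        x[t] = x[t - 1] + dx[t]                         # rebuild the levels
    return x
```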

4. Simulation evidence

4.1. Threshold test

Monte Carlo experiments are performed to examine the small-sample performance of the test. The experiments are based on a bivariate error-correction model with two lags. Letting x_t = (x_{1t}, x_{2t})′, the single-regime model H0 is

Δx_t = (μ1, μ2)′ + (α1, α2)′ (x_{1,t−1} − β x_{2,t−1}) + Γ Δx_{t−1} + (u_{1t}, u_{2t})′.   (15)

The two-regime model H1 is the generalization of (15) as in (2), allowing all coefficients to differ depending on whether x_{1,t−1} − x_{2,t−1} ≤ γ or x_{1,t−1} − x_{2,t−1} > γ.

Our tests are based on model (15), allowing all coefficients to switch between regimes under the alternative. The tests are calculated setting π0 = 0.10, using 50 gridpoints on [γ_L, γ_U] for the calculation of (13), and using 200 bootstrap replications for each replication. Our results are calculated from 1000 simulation replications.

We fix μ1 = μ2 = 0, β = 1, and α1 = −1. We vary α2 among (0, −0.5, 0.5), and Γ among

Γ0 = [0 0; 0 0],   Γ1 = [−0.2 0; −0.1 −0.2],   Γ2 = [−0.2 −0.1; −0.1 −0.2],

and consider two sample sizes, n = 100 and 250. We generated the errors u_{1t} and u_{2t} under homoskedastic and heteroskedastic specifications. For homoskedastic errors, we generate u_{1t} and u_{2t} as independent N(0, 1) variates. For heteroskedastic errors, we generate u_{1t} and u_{2t} as independent GARCH(1,1) processes, with u_{1t} ~ N(0, σ²_{1t}) and σ²_{1t} = 1 + 0.2 u²_{1,t−1} + λ σ²_{1,t−1}, and similarly for u_{2t}.
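For concreteness, a GARCH(1,1) error series of this form can be simulated as below (an illustrative sketch; the intercept 1 and ARCH coefficient 0.2 follow the specification in the text, and `lam` denotes the remaining GARCH coefficient, which the experiments vary over 0.25, 0.50, and 0.75):

```python
import numpy as np

def garch_innovations(n, lam, omega=1.0, arch=0.2, seed=0):
    """Simulate u_t ~ N(0, sigma2_t) with
    sigma2_t = omega + arch * u_{t-1}^2 + lam * sigma2_{t-1}."""
    rng = np.random.default_rng(seed)
    u = np.empty(n)
    sigma2 = omega / (1.0 - arch - lam)  # start at the unconditional variance
    for t in range(n):
        u[t] = rng.standard_normal() * np.sqrt(sigma2)
        sigma2 = omega + arch * u[t] ** 2 + lam * sigma2
    return u
```

Two independent draws of such a series give (u_{1t}, u_{2t}).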

We first explored the size of the SupLM and SupLM0 statistics under the null hypothesis H0 of a single regime. This involved generating data from the linear model (15). For each simulated sample, the statistics and p-values were calculated using both the fixed-regressor bootstrap and the residual bootstrap. In Table 1, we report rejection frequencies from nominal 5% and 10% tests for the SupLM statistic (β unknown). The results for the SupLM0 statistic (β known) were very similar and so are omitted. For the first five parameterizations, we generate u_{1t} and u_{2t} as independent N(0, 1) variates, and vary the parameters α2 and Γ. The rejection frequencies of the tests


Table 1
Size of SupLM tests

Parameters                    Homoskedastic errors              Heteroskedastic errors
α2                         0      −0.5   0.5    0      0        0      0      0
Γ                          Γ0     Γ0     Γ0     Γ1     Γ2       Γ0     Γ0     Γ0
λ                          0      0      0      0      0        0.25   0.50   0.75

5% nominal size, n = 100
Fixed-regressor bootstrap  0.083  0.072  0.108  0.071  0.074    0.075  0.080  0.085
Residual bootstrap         0.058  0.052  0.084  0.049  0.048    0.054  0.065  0.065

5% nominal size, n = 250
Fixed-regressor bootstrap  0.075  0.080  0.070  0.067  0.071    0.064  0.067  0.076
Residual bootstrap         0.052  0.058  0.055  0.052  0.053    0.053  0.051  0.059

10% nominal size, n = 100
Fixed-regressor bootstrap  0.156  0.133  0.186  0.134  0.144    0.162  0.146  0.156
Residual bootstrap         0.125  0.111  0.139  0.100  0.111    0.125  0.117  0.126

10% nominal size, n = 250
Fixed-regressor bootstrap  0.138  0.135  0.147  0.122  0.127    0.117  0.133  0.134
Residual bootstrap         0.106  0.113  0.109  0.093  0.095    0.089  0.099  0.103

are reported in the first five columns of Table 1. (These are the percentages of the simulated p-values which are smaller than the nominal size.) The rejection frequencies are similar across the various parameterizations. Using the fixed-regressor bootstrap, the test somewhat over-rejects, with the rejection rate at the nominal 5% level and n = 100 ranging from 0.071 to 0.108. If the residual bootstrap is used, the test has much better size, with rejection rates ranging from 0.048 to 0.084. If the sample size is increased to n = 250, the size improves considerably, with the 5% rejection rates for the fixed-regressor bootstrap ranging from 0.067 to 0.080 and those for the residual bootstrap ranging from 0.052 to 0.058.

For the remaining three parameterizations, we generate u_{1t} and u_{2t} as independent GARCH(1,1) processes. The other parameters are set as in the first column of Table 1, and the results for the SupLM tests are reported in the final three columns of Table 1. The rejection rates do not appear to be greatly affected by the heteroskedasticity. The rejection rates for both SupLM tests increase modestly, and the best results are again obtained by the residual bootstrap. (This might appear surprising, as the residual bootstrap does not replicate the GARCH dependence structure, but the LM statistics are constructed with heteroskedasticity-robust covariance matrices, so they are first-order robust to GARCH.)

We next explored the power of the tests against the two-regime alternative H1. To keep the calculations manageable, we generate the data from the simple process

Δx_t = (−1, 0)′ (x_{1,t−1} − x_{2,t−1}) + (δ, 0)′ (x_{1,t−1} − x_{2,t−1}) 1(x_{1,t−1} − x_{2,t−1} ≤ γ) + u_t,


Table 2
Power of SupLM and SupLM0 tests, 5% nominal size against two-regime alternative

P(w_{t−1} ≤ γ)                              0.5                          0.25
δ                                   0.2    0.4    0.6    0.8      0.2    0.4    0.6    0.8

n = 100
SupLM, fixed-regressor bootstrap    0.107  0.207  0.395  0.624    0.119  0.231  0.364  0.468
SupLM, residual bootstrap           0.081  0.175  0.324  0.587    0.091  0.183  0.322  0.425
SupLM0, fixed-regressor bootstrap   0.104  0.212  0.411  0.713    0.117  0.224  0.392  0.517
SupLM0, residual bootstrap          0.087  0.166  0.357  0.647    0.088  0.174  0.332  0.453

n = 250
SupLM, fixed-regressor bootstrap    0.156  0.450  0.878  0.997    0.187  0.527  0.844  0.933
SupLM, residual bootstrap           0.117  0.399  0.845  0.995    0.152  0.481  0.816  0.914
SupLM0, fixed-regressor bootstrap   0.154  0.460  0.896  0.999    0.184  0.546  0.856  0.953
SupLM0, residual bootstrap          0.121  0.403  0.852  0.998    0.144  0.492  0.823  0.929

with u_t iid N(0, I_2). Setting δ = 0, the null hypothesis holds, and this process corresponds to the data generated from the first column of Table 1. For δ ≠ 0, the alternative of a two-regime model holds. The threshold parameter γ is set so that P(w_{t−1} ≤ γ) equals either 0.5 or 0.25. While the data are generated from a first-order VAR, we used the same tests as described above, which are based on an estimated second-order VAR.
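As an illustration, the power DGP and the quantile-based choice of γ can be sketched as follows (hypothetical helper names; here γ is located from a long simulated null run rather than analytically):

```python
import numpy as np

def simulate_power_dgp(n, delta, gamma, seed=0):
    """Two-regime DGP: Delta x_t = (-1,0)' w_{t-1}
    + (delta,0)' w_{t-1} 1(w_{t-1} <= gamma) + u_t,
    with w_{t-1} = x_{1,t-1} - x_{2,t-1} and u_t iid N(0, I_2)."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n, 2))
    for t in range(1, n):
        w = x[t - 1, 0] - x[t - 1, 1]
        dx1 = -w + delta * w * (w <= gamma)  # first equation error-corrects
        x[t] = x[t - 1] + np.array([dx1, 0.0]) + rng.standard_normal(2)
    return x

def gamma_for_prob(prob, n=20000, seed=1):
    """Choose gamma as the `prob` empirical quantile of w_t under the null
    (delta = 0), so that P(w_{t-1} <= gamma) is approximately `prob`."""
    x = simulate_power_dgp(n, delta=0.0, gamma=0.0, seed=seed)
    w = x[:, 0] - x[:, 1]
    return float(np.quantile(w, prob))
```

With prob = 0.5 the threshold sits near the center of the w distribution; with prob = 0.25 it sits in the left tail.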

Table 2 reports the rejection frequency of the SupLM and SupLM0 tests at the 5% size for several values of δ. As expected, the power increases in the threshold effect δ and the sample size n. The fixed-regressor bootstrap has a higher rejection rate than the parametric bootstrap, but this is likely an artifact of the size distortions shown in Table 1. The SupLM0 test (known β) has slightly higher power than the SupLM test (unknown β), but the difference is surprisingly small. At least in these settings, there is little power loss due to estimation of β.

4.2. Parameter estimates

We next explore the finite-sample distributions of the estimators of the cointegrating vector β and the threshold parameter γ. The simulation is based on the following process:

Δx_t = (−1, 0)′ (x_{1,t−1} − β0 x_{2,t−1}) + (−2, 0)′ 1(x_{1,t−1} − β0 x_{2,t−1} ≤ γ0)
     + (0.5, 0)′ (x_{1,t−1} − β0 x_{2,t−1}) 1(x_{1,t−1} − β0 x_{2,t−1} > γ0) + u_t

with u_t ~ iid N(0, I_2). We set the cointegrating coefficient β0 at 1 and the threshold coefficient γ0 at 0. This model has threshold effects in both the intercept and in the error correction. We consider two sample sizes, n = 100 and 250. We varied some of


Table 3
Distribution of estimators

              Mean      RMSE     MAE      Percentiles
                                          5%        25%       50%       75%       95%
n = 100
β̂ − β0      −0.0002   0.0729   0.0154   −0.0382   −0.0104   −0.0004    0.0102    0.0322
β̂(γ0) − β0  −0.0000   0.0524   0.0100   −0.0266   −0.0056    0.0001    0.0056    0.0234
β̃ − β0       0.0000   0.0982   0.0234   −0.0493   −0.0196   −0.0004    0.0187    0.0475
γ̂ − γ0      −0.0621   0.9778   0.1460   −0.3462   −0.0940   −0.0145    0.0462    0.2207
γ̂(β0) − γ0  −0.0918   0.9967   0.1221   −0.3351   −0.0773   −0.0320   −0.0050    0.0983

n = 250
β̂ − β0       0.0000   0.0130   0.0048   −0.0107   −0.0035   −0.0000    0.0035    0.0107
β̂(γ0) − β0  −0.0001   0.0091   0.0029   −0.0080   −0.0017   −0.0000    0.0017    0.0064
β̃ − β0       0.0003   0.0236   0.0088   −0.0187   −0.0060   −0.0001    0.0066    0.0194
γ̂ − γ0      −0.0051   0.1419   0.0441   −0.1109   −0.0323   −0.0020    0.0238    0.0919
γ̂(β0) − γ0  −0.0150   0.0815   0.0259   −0.0752   −0.0272   −0.0113   −0.0014    0.0376

the coefficients, but omit the results since the essential features were unchanged. While the data are generated from a VAR(1), our estimates are based on a VAR(2).

We consider three estimators of β, and two of γ. The pair (β̂, γ̂) are the unrestricted estimators of (β, γ), using the algorithm³ of Section 2. β̂(γ0) is the restricted estimator obtained when the true value γ0 is known. γ̂(β0) is the restricted estimator of γ when the true value β0 is known. We should expect β̂(γ0) and γ̂(β0) to be more accurate than β̂ and γ̂, respectively, and this comparison allows us to assess the cost of estimating the threshold and the cointegrating vector, respectively. We also consider the Johansen MLE, β̃, which would be efficient if there were no threshold effect.
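The grid-search logic behind these estimators can be sketched as follows. The criterion below is a simplified, hypothetical stand-in (least squares on regime-split intercepts and error-correction terms rather than the paper's concentrated VECM likelihood), but the concentration structure is the same idea: an outer grid over β and an inner grid over the empirical quantiles of w_{t−1}(β), trimmed by π0.

```python
import numpy as np

def grid_search_threshold(x, betas, n_gamma=100, pi0=0.10):
    """Joint grid search over (beta, gamma) minimizing a least-squares
    criterion (illustrative stand-in for the concentrated likelihood)."""
    dx = np.diff(x, axis=0)
    best_ssr, best_beta, best_gamma = np.inf, None, None
    for beta in betas:
        w = x[:-1, 0] - beta * x[:-1, 1]  # w_{t-1}(beta)
        grid = np.quantile(w, np.linspace(pi0, 1 - pi0, n_gamma))
        for gamma in grid:
            d = (w <= gamma).astype(float)
            # regime-split intercepts and error-correction terms
            Z = np.column_stack([d, 1 - d, w * d, w * (1 - d)])
            coef, *_ = np.linalg.lstsq(Z, dx, rcond=None)
            ssr = float(((dx - Z @ coef) ** 2).sum())
            if ssr < best_ssr:
                best_ssr, best_beta, best_gamma = ssr, beta, float(gamma)
    return best_beta, best_gamma, best_ssr
```

Restricting the search to a single β (or a single γ) gives the corresponding restricted estimator.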

In Table 3 we report the mean, root mean squared error (RMSE), mean absolute error (MAE), and selected percentiles of each estimator over 1000 simulation replications.

The results contain no surprises. The three estimators of β all have approximately symmetric, unbiased distributions. The restricted estimator β̂(γ0) (which exploits knowledge of γ) is the most accurate, followed by the unrestricted estimator β̂. Both are more accurate than the Johansen estimator β̃. It may be interesting to note that the linear Johansen estimator β̃ (which is much easier to compute) does reasonably well, even in the presence of the threshold effect, although there is a substantial efficiency loss for n = 250.

Both estimators of γ have asymmetric distributions. For n = 100, the distributions are similar; both are meaningfully biased, and the estimators are quite inaccurate. For n = 250, the performance of both estimators is much improved, and the restricted estimator γ̂(β0) has considerably less dispersion but slightly higher bias than the unrestricted estimator γ̂.

³ The grid sizes for γ and β are 500 and 100, respectively.


5. Term structure

Let r_t be the interest rate on a one-period bond, and R_t the interest rate on a multi-period bond. As first suggested by Campbell and Shiller (1987), the theory of the term structure of interest rates suggests that r_t and R_t should be cointegrated with a unit cointegrating vector. This has led to a large empirical literature estimating linear cointegrating VAR models such as

(ΔR_t, Δr_t)′ = μ + α w_{t−1} + Γ (ΔR_{t−1}, Δr_{t−1})′ + u_t   (16)

with w_{t−1} = R_{t−1} − β r_{t−1}. Setting β = 1, the error-correction term is the interest rate spread.
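The linear model (16) can be estimated equation-by-equation with OLS. A minimal sketch (assuming equal-length arrays `R` and `r` of the two rates) is:

```python
import numpy as np

def estimate_linear_vecm(R, r, beta=1.0):
    """OLS estimates of the linear VECM (16) with one lagged difference.

    Returns a (4, 2) coefficient matrix: rows are the intercept, the
    loading on the lagged spread w_{t-1} = R_{t-1} - beta * r_{t-1}, and
    the loadings on (dR_{t-1}, dr_{t-1}); columns are the two equations.
    """
    dR, dr = np.diff(R), np.diff(r)
    w = R[:-1] - beta * r[:-1]
    # explain (dR_t, dr_t) with w_{t-1} and the lagged differences
    Y = np.column_stack([dR[1:], dr[1:]])
    Z = np.column_stack([np.ones(len(Y)), w[1:], dR[:-1], dr[:-1]])
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return coef
```

The second row of the returned matrix is the estimated error-correction loading α for each equation.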

Linearity, however, is not implied by the theory of the term structure. In this section, we explore the possibility that a threshold cointegration model provides a better empirical description.

To address this question, we estimate and test models of threshold cointegration using the monthly interest rate series of McCulloch and Kwon (1993). Following Campbell (1995), we use the period 1952–1991. The interest rates are estimated from the prices of U.S. Treasury securities, and correspond to zero-coupon bonds. We use a selection of bond rates with maturities ranging from 1 to 120 months. To select the VAR lag length, we found that both the AIC and BIC, applied either to the linear VECM or the threshold VECM, consistently picked l = 1 across specifications. We report our results for both l = 1 and l = 2 for robustness. We considered both fixing the cointegrating vector at β = 1 and letting β be estimated.

First, we tested for the presence of (bivariate) cointegration, using the ADF test applied to the error-correction term (this is the Engle–Granger test when the cointegrating vector is estimated). For all bivariate pairs and lag lengths considered, the tests⁴ easily rejected the null hypothesis of no cointegration, indicating the presence of bivariate cointegration between each pair.

To assess the evidence for threshold cointegration, we applied several sets of tests. For the complete bivariate specification, we use the SupLM test (estimated β) and the SupLM0 test (β = 1) with 300 gridpoints, with the p-values calculated by the parametric bootstrap. For comparison, we also applied the univariate Hansen (1996) threshold autoregressive test to the error-correction term, as in Balke and Fomby (1997). All p-values were computed with 5000 simulation replications. The results are presented in Table 4.

The multivariate tests point to the presence of threshold cointegration in some of the bivariate relationships. In six of the nine models, the SupLM0 statistic is significant at the 10% level when l = 1 and β is fixed at unity. If we set l = 2, the evidence appears to strengthen, with seven of the nine significant at the 5% level. If instead of fixing β we estimate it freely, the evidence for threshold cointegration is diminished, with only four of nine significant at the 5% level (in either lag specification).

⁴ Not reported here to conserve space.


Table 4
Treasury bond rates: tests for threshold cointegration (p-values)

Short rate  Long rate            Bivariate                       Univariate
                         β = 1          β estimated      β = 1          β estimated
                         l = 1  l = 2   l = 1  l = 2     l = 1  l = 2   l = 1  l = 2
1-month     2-month      0.083  0.003   0.014  0.007     0.453  0.002   0.370  0.188
1-month     3-month      0.030  0.009   0.117  0.188     0.283  0.085   0.245  0.044
1-month     6-month      0.085  0.029   0.634  0.288     0.017  0.040   0.122  0.133
3-month     6-month      0.036  0.021   0.038  0.031     0.658  0.311   0.322  0.133
3-month     12-month     0.047  0.032   0.161  0.198     0.121  0.091   0.122  0.083
3-month     120-month    0.193  0.102   0.095  0.146     0.227  0.485   0.171  0.357
12-month    24-month     0.267  0.516   0.245  0.623     0.489  0.583   0.314  0.618
12-month    120-month    0.018  0.022   0.023  0.016     0.109  0.119   0.251  0.228
24-month    120-month    0.173  0.005   0.139  0.008     0.024  0.011   0.278  0.051

The Balke–Fomby univariate tests are somewhat more ambiguous. The threshold effect is significant at the 10% level for two of the nine models when l = 1 and β is fixed at unity, and for five when l = 2. When β is estimated rather than fixed, none of the models are significant for l = 1, and only three for l = 2. The univariate specification is quite restrictive, and this undoubtedly reduces the power of the test in some settings.

Next, we report the parameter estimates for one of the relatively successful models, the bivariate relationship between the 120-month (10-year) and 12-month (one-year) bond rates (normalized to be percentages). The parameter estimates were calculated by minimization of (7) over a 300 × 300 grid on the parameters (γ, β). The estimated cointegrating relationship is w_t = R_t − 0.984 r_t, quite close to a unit coefficient. The results we report are for the case of an estimated cointegrating vector, but the results are very similar if the unit coefficient is imposed.

The estimated threshold is γ̂ = −0.63. Thus the first regime occurs when R_t ≤ 0.984 r_t − 0.63, i.e. when the 10-year rate is more than 0.6 percentage points below the short rate. This is relatively unusual, with only 8% of the observations in this regime, and we label this the "extreme" regime. The second regime (with 92% of the observations) is when R_t > 0.984 r_t − 0.63, which we label the "typical" regime.
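In code, this regime split is immediate; the sketch below uses the point estimates quoted above (β̂ = 0.984 and γ̂ = −0.63) as defaults:

```python
import numpy as np

def classify_regimes(R, r, beta=0.984, gamma=-0.63):
    """Return the 'extreme'-regime indicator w_t <= gamma for the spread
    w_t = R_t - beta * r_t, together with the fraction of observations in it."""
    w = np.asarray(R) - beta * np.asarray(r)
    extreme = w <= gamma
    return extreme, float(extreme.mean())
```

In the estimated model, about 8% of the observations fall in the extreme regime.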

The estimated threshold VAR is given below:

ΔR_t = 0.54 + 0.34 w_{t−1} + 0.35 ΔR_{t−1} − 0.17 Δr_{t−1} + u_{1t},   w_{t−1} ≤ −0.63,
      (0.17)  (0.18)        (0.26)         (0.12)

ΔR_t = 0.01 − 0.02 w_{t−1} − 0.08 ΔR_{t−1} + 0.09 Δr_{t−1} + u_{1t},   w_{t−1} > −0.63,
      (0.02)  (0.03)        (0.06)         (0.05)


Fig. 3. Interest rate response to error correction.

Δr_t = 1.45 + 1.41 w_{t−1} + 0.92 ΔR_{t−1} − 0.04 Δr_{t−1} + u_{2t},   w_{t−1} ≤ −0.63,
      (0.35)  (0.34)        (0.62)         (0.26)

Δr_t = −0.04 + 0.04 w_{t−1} − 0.07 ΔR_{t−1} + 0.23 Δr_{t−1} + u_{2t},   w_{t−1} > −0.63.
      (0.04)  (0.04)        (0.13)         (0.13)

Eicker–White standard errors are in parentheses. However, as we have no formal distribution theory for the parameter estimates and standard errors, these should be interpreted somewhat cautiously.

In the typical regime, ΔR_t and Δr_t have minimal error-correction effects and minimal dynamics. They are close to white noise, indicating that in this regime, R_t and r_t are close to driftless random walks.

Error correction appears to occur only in the unusual regime (when R_t is far below r_t). There is a strong error-correction effect in the short-rate equation. In the long-rate equation, the point estimate for the error-correction term is moderately large, and on the borderline of statistical significance. The remaining dynamic coefficients are imprecisely estimated due to the small sample in this regime.

In Fig. 3 we plot the error-correction effect: the estimated regression functions of ΔR_t and Δr_t as functions of w_{t−1}, holding the other variables constant. In the figure, one can see the flat, near-zero error-correction effect on the right side of the threshold,


and on the left of the threshold, the sharp positive relationships, especially for the

short-rate equation.

One finding of great interest is that the estimated error-correction effects are positive.⁵ As articulated by Campbell and Shiller (1991) and Campbell (1995), the regression lines in Fig. 3 should be positive; equivalently, the coefficients on w_{t−1} in the threshold VECM should be positive. This is because a large positive spread R_t − r_t means that the long bond is earning a higher interest rate, so long bonds must be expected to depreciate in value. This implies that the long interest rate is expected to rise. (The short rate is also expected to rise, as R_t is a smoothed forecast of future short rates.)

Using linear correlation methods, Campbell and Shiller (1991) and Campbell (1995) found considerable evidence contradicting this prediction of term structure theory. They found that changes in the short rate are positively correlated with the spread, but changes in the long rate are negatively correlated with the spread, especially at longer horizons. These authors viewed this finding as a puzzle.

In contrast, our results are roughly consistent with this term structure prediction. In all nine estimated⁶ bivariate relationships, the four error-correction coefficients (for the long and short rate in the two regimes) are either positive or, if negative, insignificantly different from zero. As expected, the short-rate coefficients are typically positive (in six of the nine models the coefficients are positive in both regimes), and the long-rate coefficients are much smaller in magnitude and often negative in sign. There appears to be no puzzle.

6. Conclusion

We have presented a quasi-MLE algorithm for constructing estimates of a two-regime threshold cointegration model and a SupLM statistic for the null hypothesis of no threshold. We derived an asymptotic null distribution for this statistic, developed methods to calculate the asymptotic distribution by simulation, and showed how to calculate a bootstrap approximation. These methods may find constructive use in applications.

Still, there are many unanswered questions for future research:

• A test for the null of no cointegration in the context of the threshold cointegration model.⁷ This testing problem is quite complicated, as the null hypothesis implies that the threshold variable (the cointegrating error) is non-stationary, rendering current distribution theory inapplicable.

⁵ In the typical regime, the long rate has a negative point estimate (−0.02), but it is statistically insignificant and numerically very close to zero.
⁶ β fixed at unity, l = 1.
⁷ Pippenger and Goering (2000) present simulation evidence that linear cointegration tests can have low power to detect threshold cointegration.


• A distribution theory for the parameter estimates of the threshold cointegration model. As shown in Chan (1993) and Hansen (2000a), threshold estimates have non-standard distributions, and working out such a distribution theory is challenging.

• Allowing for VECMs with multiple cointegrating vectors.
• Developing estimation and testing methods which impose restrictions on the intercepts to exclude the possibility of a time trend. This would improve the fit of the model to the data and improve estimation efficiency. However, the constraint is quite complicated, and it is not immediately apparent how to impose it. Careful handling of the intercepts and trends is likely to be a fruitful area of research.

• Extending the theory to allow a fully rigorous treatment of estimated cointegrating vectors.
• Extending the analysis to allow for three regimes. To assess the statistical relevance of such models, we would need a test of the null of a two-regime model against the alternative of a three-regime model.
• An extension to the Balke–Fomby three-regime symmetric threshold model. While our methods should directly apply if the threshold variable is defined as the absolute value of the error-correction term, a realistic treatment will require restrictions on the intercepts.

Acknowledgements

We thank two referees, Mehmet Caner, Robert Rossana, Mark Watson, and Ken West for useful comments and suggestions. Hansen thanks the National Science Foundation, and Seo thanks the Korea Research Foundation, for financial support.

Appendix

Proof of Theorem 1. An alternative algebraic representation of the pointwise LM statistic is

LM(β, γ) = N*_n(β, γ)′ Ω*_n(β, γ)⁻¹ N*_n(β, γ),   (17)

where

Ω*_n(β, γ) = Ω_n(β, γ) − M_n(β, γ) M_n(β)⁻¹ Ω_n(β, γ) − Ω_n(β, γ) M_n(β)⁻¹ M_n(β, γ)
           + M_n(β, γ) M_n(β)⁻¹ Ω_n(β) M_n(β)⁻¹ M_n(β, γ),

N*_n(β, γ) = N_n(β, γ) − M_n(β, γ) M_n(β)⁻¹ N_n(β),

M_n(β, γ) = I_p ⊗ (1/n) Σ_{t=1}^n d_{1t}(β, γ) X_{t−1}(β) X_{t−1}(β)′,

M_n(β) = I_p ⊗ (1/n) Σ_{t=1}^n X_{t−1}(β) X_{t−1}(β)′,

Ω_n(β, γ) = (1/n) Σ_{t=1}^n d_{1t}(β, γ) (u_t u_t′ ⊗ X_{t−1}(β) X_{t−1}(β)′),

Ω_n(β) = (1/n) Σ_{t=1}^n (u_t u_t′ ⊗ X_{t−1}(β) X_{t−1}(β)′),

N_n(β, γ) = (1/√n) Σ_{t=1}^n d_{1t}(β, γ) (Δx_t ⊗ X_{t−1}(β)),

N_n(β) = (1/√n) Σ_{t=1}^n (Δx_t ⊗ X_{t−1}(β)).

Then observe that

SupLM0 = sup_{γ_L ≤ γ ≤ γ_U} LM(β0, γ) = sup_{π0 ≤ r ≤ 1−π0} LM(β0, F⁻¹(r)),

where r = F(γ). Since LM(β0, γ) is a function of γ only through the indicator function

1(w_{t−1} ≤ γ) = 1(F(w_{t−1}) ≤ r),

it follows that

SupLM0 = sup_{π0 ≤ r ≤ 1−π0} LM0(β0, r),

where LM0(β0, r) = LM(β0, F⁻¹(r)) is defined as in (17) except that all instances of 1(w_{t−1} ≤ γ) are replaced by 1(F(w_{t−1}) ≤ r).

Under H0 and making these changes, this simplifies to

LM0(r) = S*_n(r)′ Ω*_n(r)⁻¹ S*_n(r),

where

Ω*_n(r) = Ω_n(r) − M_n(r) M_n⁻¹ Ω_n(r) − Ω_n(r) M_n⁻¹ M_n(r) + M_n(r) M_n⁻¹ Ω_n M_n⁻¹ M_n(r),

S*_n(r) = S_n(r) − M_n(r) M_n⁻¹ S_n,

M_n(r) = I_p ⊗ (1/n) Σ_{t=1}^n 1(F(w_{t−1}) ≤ r) X_{t−1} X_{t−1}′,

M_n = I_p ⊗ (1/n) Σ_{t=1}^n X_{t−1} X_{t−1}′,

Ω_n(r) = (1/n) Σ_{t=1}^n 1(F(w_{t−1}) ≤ r) (u_t u_t′ ⊗ X_{t−1} X_{t−1}′),

Ω_n = (1/n) Σ_{t=1}^n (u_t u_t′ ⊗ X_{t−1} X_{t−1}′),

S_n(r) = (1/√n) Σ_{t=1}^n 1(F(w_{t−1}) ≤ r) (u_t ⊗ X_{t−1}),

S_n = (1/√n) Σ_{t=1}^n (u_t ⊗ X_{t−1}).

The stated result then follows from the joint convergence

M_n(r) ⇒ M(r),   Ω_n(r) ⇒ Ω(r),   S_n(r) ⇒ S(r),

which follows from Theorem 3 of Hansen (1996), which holds under our stated assumptions.
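As a numerical companion to the displays above, the pointwise statistic LM0(r) can be evaluated directly from the moment matrices. The sketch below is illustrative and not the authors' code: it assumes a matrix `X` whose rows are the stationary regressors X_{t−1}, null residuals `u`, and a regime-indicator vector `d`, and it sizes the Kronecker identity by the number of equations.

```python
import numpy as np

def pointwise_lm(X, u, d):
    """Pointwise LM statistic in the null form LM0(r) = S*' (Omega*)^{-1} S*.

    X : (n, p) rows X_{t-1};  u : (n, k) null residuals;
    d : (n,) regime indicator.  Illustrative sketch only."""
    n, p = X.shape
    k = u.shape[1]
    M1 = np.kron(np.eye(k), (X * d[:, None]).T @ X / n)  # M_n(r)
    M = np.kron(np.eye(k), X.T @ X / n)                  # M_n
    s = np.einsum('ti,tj->tij', u, X).reshape(n, k * p)  # rows u_t (x) X_{t-1}
    S1 = (s * d[:, None]).sum(axis=0) / np.sqrt(n)       # S_n(r)
    S = s.sum(axis=0) / np.sqrt(n)                       # S_n
    O1 = (s * d[:, None]).T @ s / n                      # Omega_n(r), robust form
    O = s.T @ s / n                                      # Omega_n
    P = M1 @ np.linalg.inv(M)
    Ostar = O1 - P @ O1 - O1 @ P.T + P @ O @ P.T
    Sstar = S1 - P @ S
    return float(Sstar @ np.linalg.solve(Ostar, Sstar))
```

Maximizing over the grid of r values then gives SupLM0.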

Proof of Theorem 2. First, let w_{t−1}(δ) = w_{t−1}(β0 + δ/n) = w_{t−1} + n⁻¹ δ′x_{t−1}, and X_{t−1}(δ) = X_{t−1}(β0 + δ/n). Hence

Δx_t = A′X_{t−1} + u_t = A′X_{t−1}(δ) − α (n⁻¹ δ′x_{t−1}) + u_t,

where α is the second row of A (the coefficient vector on w_{t−1}). Hence

LM(δ, γ) = N*_n(β0 + δ/n, γ)′ Ω*_n(β0 + δ/n, γ)⁻¹ N*_n(β0 + δ/n, γ)

and

N*_n(β0 + δ/n, γ) = S*_n(δ, γ) − C*_n(δ, γ),

where

S*_n(δ, γ) = S_n(δ, γ) − M_n(β0 + δ/n, γ) M_n(β0 + δ/n)⁻¹ S_n(δ),

S_n(δ, γ) = (1/√n) Σ_{t=1}^n 1(w_{t−1}(δ) ≤ γ) (u_t ⊗ X_{t−1}(δ)),

S_n(δ) = (1/√n) Σ_{t=1}^n (u_t ⊗ X_{t−1}(δ))

and

C*_n(δ, γ) = C_n(δ, γ) − M_n(β0 + δ/n, γ) M_n(β0 + δ/n)⁻¹ C_n(δ),

C_n(δ, γ) = (1/n^{3/2}) Σ_{t=1}^n 1(w_{t−1}(δ) ≤ γ) (x_{t−1} ⊗ X_{t−1}(δ)),

C_n(δ) = (1/n^{3/2}) Σ_{t=1}^n (x_{t−1} ⊗ X_{t−1}(δ)).


To complete the proof, we need to show that |Ω_n(β0 + δ/n, γ) − Ω_n(β0, γ)| = op(1), |M_n(β0 + δ/n, γ) − M_n(β0, γ)| = op(1), |S_n(β0 + δ/n, γ) − S_n(β0, γ)| = op(1), and C*_n(δ, γ) = op(1). First, observe that since |X_{t−1} − X_{t−1}(δ)| = |n⁻¹ δ′x_{t−1}| = Op(n^{−1/2}), it is fairly straightforward to see that we can replace X_{t−1}(δ) by X_{t−1} with only op(1) error in the above expressions, and we make this substitution for the remainder of the proof.

Let E_Q be the event {n^{−1/2} sup_{t≤n} |x_{t−1}| > Q}. For any ε > 0, there is some Q < ∞ such that P(E_Q) ≤ ε. The remainder of the analysis conditions on the complementary set {n^{−1/2} sup_{t≤n} |x_{t−1}| ≤ Q}.

We next show that |M_n(β0 + δ/n, γ) − M_n(β0, γ)| = op(1). Indeed, on this set,

|M_n(β0 + δ/n, γ) − M_n(β0, γ)|²
= |(1/n) Σ_{t=1}^n (d_{1t}(β0 + δ/n, γ) − d_{1t}(β0, γ)) X_{t−1} X_{t−1}′|²
≤ ((1/n) Σ_{t=1}^n |X_{t−1}|⁴) ((1/n) Σ_{t=1}^n |1(w_{t−1} + n⁻¹ δ′x_{t−1} ≤ γ) − 1(w_{t−1} ≤ γ)|)
≤ ((1/n) Σ_{t=1}^n |X_{t−1}|⁴) ((1/n) Σ_{t=1}^n 1(w_{t−1} − n^{−1/2}Q ≤ γ ≤ w_{t−1} + n^{−1/2}Q))
= op(1).

The proof that |Ω_n(β0 + δ/n, γ) − Ω_n(β0, γ)| = op(1) follows similarly.

Next, since u_t is an MDS,

E|S_n(β0 + δ/n, γ) − S_n(β0, γ)|²
= E|(1/√n) Σ_{t=1}^n (1(w_{t−1}(δ) ≤ γ) − 1(w_{t−1} ≤ γ)) (u_t ⊗ X_{t−1})|²
= E|(1(w_{t−1}(δ) ≤ γ) − 1(w_{t−1} ≤ γ)) (u_t ⊗ X_{t−1})|²
≤ E|1(w_{t−1} − n^{−1/2}Q ≤ γ ≤ w_{t−1} + n^{−1/2}Q) (u_t ⊗ X_{t−1})|² + ε
= o(1) + ε,

and ε can be made arbitrarily small.

Finally, using similar analysis, C*_n(δ, γ) = C*_n(0, γ) + op(1). Since x_t is I(1), n^{−1/2} x_{[nr]} ⇒ B(r), a vector Brownian motion. We can appeal to Theorem 3 of Caner and Hansen (2001), as our assumptions imply theirs (absolute regularity is stronger than strong mixing). Hence

C_n(0, γ) = (1/n^{3/2}) Σ_{t=1}^n 1(w_{t−1} ≤ γ) X_{t−1} x_{t−1}′ ⇒ E(1(w_{t−1} ≤ γ) X_{t−1}) (∫₀¹ B)′ = M(γ) e1 (∫₀¹ B)′,

where e1 is a p-dimensional vector with the first element 1 and the remainder 0. Similarly,

C_n(0) ⇒ M e1 (∫₀¹ B)′,

and hence

C*_n(δ, γ) = C*_n(0, γ) + op(1) = C_n(0, γ) − M_n(β0, γ) M_n(β0)⁻¹ C_n(0) + op(1)
⇒ M(γ) e1 (∫₀¹ B)′ − M(γ) M⁻¹ M e1 (∫₀¹ B)′ = 0.

This completes the proof.

References

Andrews, D.W.K., 1993. Tests for parameter instability and structural change with unknown change point.

Econometrica 61, 821–856.

Andrews, D.W.K., Ploberger, W., 1994. Optimal tests when a nuisance parameter is present only under the

alternative. Econometrica 62, 1383–1414.

Balke, N.S., Fomby, T.B., 1997. Threshold cointegration. International Economic Review 38, 627–645.

Balke, N.S., Wohar, M.E., 1998. Nonlinear dynamics and covered interest rate parity. Empirical Economics

23, 535–559.

Baum, C.F., Karasulu, M., 1998. Modelling federal reserve discount policy. Computational Economics 11, 53–70.

Baum, C.F., Barkoulas, J.T., Caglayan, M., 2001. Nonlinear adjustment to purchasing power parity in the

post-Bretton Woods era. Journal of International Money and Finance 20, 379–399.

Campbell, J.Y., 1995. Some lessons from the yield curve. Journal of Economic Perspectives 9, 129–152.

Campbell, J.Y., Shiller, R.J., 1987. Cointegration and tests of present value models. Journal of Political

Economy 95, 1062–1088.

Campbell, J.Y., Shiller, R.J., 1991. Yield spreads and interest rate movements: a bird's eye view. Review of Economic Studies 58, 495–514.

Caner, M., Hansen, B.E., 2001. Threshold autoregression with a unit root. Econometrica 69, 1555–1596.

Chan, K.S., 1993. Consistency and limiting distribution of the least squares estimator of a threshold

autoregressive model. The Annals of Statistics 21, 520–533.

Davies, R.B., 1987. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 74, 33–43.
Dorsey, R.E., Mayer, W.J., 1995. Genetic algorithms for estimation problems with multiple optima, no differentiability, and other irregular features. Journal of Business and Economic Statistics 13, 53–66.

Page 26: hansen & seo

8/7/2019 hansen & seo

http://slidepdf.com/reader/full/hansen-seo 26/26

318 B.E. Hansen, B. Seo/ Journal of Econometrics 110 (2002) 293 – 318

Enders, W., Falk, B., 1998. Threshold-autoregressive, median-unbiased, and cointegration tests of purchasing

power parity. International Journal of Forecasting 14, 171–186.

Gine, E., Zinn, J., 1990. Bootstrapping general empirical measures. The Annals of Probability 18, 851–869.

Hansen, B.E., 1996. Inference when a nuisance parameter is not identified under the null hypothesis. Econometrica 64, 413–430.

Hansen, B.E., 2000a. Sample splitting and threshold estimation. Econometrica 68, 575–603.

Hansen, B.E., 2000b. Testing for structural change in conditional models. Journal of Econometrics 97,

93–115.

Lo, M., Zivot, E., 2001. Threshold cointegration and nonlinear adjustment to the law of one price.

Macroeconomic Dynamics 5, 533–576.

Martens, M., Kofman, P., Vorst, T.C.F., 1998. A threshold error-correction model for intraday futures and

index returns. Journal of Applied Econometrics 13, 245–263.

McCulloch, J.H., Kwon, H.-C., 1993. US term structure data, 1947–1991. Ohio State University Working

Paper No. 93-6.

Michael, M., Nobay, R., Peel, D.A., 1997. Transactions costs and nonlinear adjustment in real exchange rates: an empirical investigation. Journal of Political Economy 105, 862–879.
Obstfeld, M., Taylor, A.M., 1997. Nonlinear aspects of goods market arbitrage and adjustment: Heckscher's

commodity points revisited. Journal of the Japanese and International Economies 11, 441–479.

O’Connell, P.G.J., 1998. Market frictions and real exchange rates. Journal of International Money and Finance

17, 71–95.

O’Connell, P.G.J., Wei, S.-J., 1997. The bigger they are, the harder they fall: how price differences across U.S. cities are arbitraged. NBER Working Paper No. 6089.

Pippenger, M.K., Goering, G.E., 2000. Additional results on the power of unit root and cointegration tests

under threshold processes. Applied Economics Letters 7, 641–644.

Seo, B., 1998. Tests for structural change in cointegrated systems. Econometric Theory 14, 221–258.

Taylor, A.M., 2001. Potential pitfalls for the purchasing-power-parity puzzle? Sampling and specification biases in mean-reversion tests of the law of one price. Econometrica 69, 473–498.

Tsay, R.S., 1989. Testing and modeling threshold autoregressive processes. Journal of the American Statistical

Association 84, 231–240.

Tsay, R.S., 1998. Testing and modeling multivariate threshold models. Journal of the American Statistical Association 93, 1188–1202.

White, H., 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for

heteroskedasticity. Econometrica 48, 817–838.