Journal of Econometrics 110 (2002) 293–318
www.elsevier.com/locate/econbase
Testing for two-regime threshold cointegration in vector error-correction models
Bruce E. Hansen a,∗, Byeongseon Seo b
a Department of Economics, University of Wisconsin, Madison, WI 53706, USA
b Department of Economics, Soongsil University, Seoul 156-743, South Korea
Abstract
This paper examines a two-regime vector error-correction model with a single cointegrating vector and a threshold effect in the error-correction term. We propose a relatively simple algorithm to obtain maximum likelihood estimation of the complete threshold cointegration model for the bivariate case. We propose a SupLM test for the presence of a threshold. We derive the null asymptotic distribution, show how to simulate asymptotic critical values, and present a bootstrap approximation. We investigate the performance of the test using Monte Carlo simulation, and find that the test works quite well. Applying our methods to the term structure model of interest rates, we find strong evidence for a threshold effect.
© 2002 Published by Elsevier Science B.V.

JEL classification: C32

Keywords: Term structure; Bootstrap; Identification; Non-linear; Non-stationary
1. Introduction
Threshold cointegration was introduced by Balke and Fomby (1997) as a feasible
means to combine non-linearity and cointegration. In particular, the model allows for
non-linear adjustment to long-run equilibrium. The model has generated significant
applied interest, including the following applications: Balke and Wohar (1998), Baum
et al. (2001), Baum and Karasulu (1998), Enders and Falk (1998), Lo and Zivot
(2001), Martens et al. (1998), Michael et al. (1997), O’Connell (1998), O’Connell
and Wei (1997), Obstfeld and Taylor (1997), and Taylor (2001). Lo and Zivot (2001)
provide an extensive review of this growing literature.
∗ Corresponding author. Tel.: +1-608-263-2989; fax: +1-608-262-2033.
E-mail address: [email protected] (B.E. Hansen).
0304-4076/02/$ - see front matter © 2002 Published by Elsevier Science B.V.
PII: S0304-4076(02)00097-0
One of the most important statistical issues for this class of models is testing for the presence of a threshold effect (the null of linearity). Balke and Fomby (1997) proposed applying the univariate tests of Hansen (1996) and Tsay (1989) to the error-correction term (the cointegrating residual). This is known to be valid when the cointegrating vector is known, but Balke–Fomby did not provide a theory for the case of an estimated cointegrating vector. Lo and Zivot (2001) extended the Balke–Fomby approach to a multivariate threshold cointegration model with a known cointegrating vector, using the tests of Tsay (1998) and multivariate extensions of Hansen (1996).
In this paper, we extend this literature by examining the case of an unknown cointegrating vector. As in Balke–Fomby, our model is a vector error-correction model (VECM) with one cointegrating vector and a threshold effect based on the error-correction term. However, unlike Balke–Fomby, who focus on univariate estimation and testing methods, our estimates and tests are for the complete multivariate threshold model. The fact that we use the error-correction term as the threshold variable is not essential to our analysis, and the methods we discuss here could easily be adapted to other models where the threshold variable is a stationary transformation of the predetermined variables.
This paper makes two contributions. First, we propose a method to implement maximum likelihood estimation (MLE) of the threshold model. This algorithm involves a joint grid search over the threshold and the cointegrating vector. The algorithm is simple to implement in the bivariate case, but would be difficult to implement in higher-dimensional cases. Furthermore, at this point we provide neither a proof of consistency nor a distribution theory for the MLE.

Second, we develop a test for the presence of a threshold effect. Under the null hypothesis, there is no threshold, so the model reduces to a conventional linear VECM. Thus estimation under the null hypothesis is particularly easy, reducing to conventional reduced-rank regression. This suggests that a test can be based on the Lagrange multiplier (LM) principle, which only requires estimation under the null. Since the threshold parameter is not identified under the null hypothesis, we base inference on a SupLM test. (See Davies (1987), Andrews (1993), and Andrews and Ploberger (1994) for motivation and justification of this testing strategy.) Our test takes a similar algebraic form to those derived by Seo (1998) for structural change in error-correction models.

We derive the asymptotic null distribution of the SupLM test, and find that it is identical to the form found in Hansen (1996) for threshold tests applied to stationary data. In general, the asymptotic distribution depends on the covariance structure of the data, precluding tabulation. We suggest using either the fixed regressor bootstrap of Hansen (1996, 2000b), or alternatively a parametric residual bootstrap algorithm, to approximate the sampling distribution.
Section 2 introduces the threshold models and derives the Gaussian quasi-MLE for the models. Section 3 presents our LM test for threshold cointegration, its asymptotic distribution, and two methods to calculate p-values. Section 4 presents simulation evidence concerning the size and power of the tests. Section 5 presents an application to the term structure of interest rates. Proofs of the asymptotic distribution theory are presented in the appendix. Gauss programs which compute the
estimates and tests, and replicate the empirical work reported in this paper, are available at www.ssc.wisc.edu/~bhansen.
2. Estimation
2.1. Linear cointegration
Let x_t be a p-dimensional I(1) time series which is cointegrated with one p × 1 cointegrating vector β. Let w_t(β) = β′x_t denote the I(0) error-correction term. A linear VECM of order l + 1 can be compactly written as

    Δx_t = A′X_{t−1}(β) + u_t,                                            (1)

where

    X_{t−1}(β) = (1, w_{t−1}(β), Δx′_{t−1}, Δx′_{t−2}, ..., Δx′_{t−l})′.

The regressor X_{t−1}(β) is k × 1 and A is k × p, where k = pl + 2. The error u_t is assumed to be a vector martingale difference sequence (MDS) with finite covariance matrix Σ = E(u_t u′_t).

The notation w_{t−1}(β) and X_{t−1}(β) indicates that the variables are evaluated at generic values of β. When evaluated at the true value of the cointegrating vector, we will denote these variables as w_{t−1} and X_{t−1}, respectively.
We need to impose some normalization on β to achieve identification. Since there is just one cointegrating vector, a convenient choice is to set one element of β equal to unity. This has no cost when the system is bivariate (p = 2), and for p > 2 it only imposes the restriction that the corresponding element of x_t enters the cointegrating relationship.

The parameters (β, A, Σ) are estimated by maximum likelihood under the assumption that the errors u_t are iid Gaussian (using the above normalization on β). Let these estimates be denoted (β̃, Ã, Σ̃). Let ũ_t = Δx_t − Ã′X_{t−1}(β̃) be the residual vectors.
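To make the setup concrete, here is a minimal numpy sketch of this linear estimation step for the bivariate case, taking the cointegrating vector as given (the paper obtains β̃ by reduced-rank regression; the function and variable names here are ours, not the authors'):

```python
import numpy as np

def build_regressors(x, beta2, l=1):
    """Stack targets dx_t and regressors X_{t-1}(beta) = (1, w_{t-1}, dx_{t-1}', ..., dx_{t-l}')'.

    x     : (n, 2) array of I(1) levels
    beta2 : scalar; beta is normalized as (1, -beta2)', so w_t = x_{1t} - beta2*x_{2t}
    """
    dx = np.diff(x, axis=0)          # dx[i] is the difference x_{i+1} - x_i
    w = x[:, 0] - beta2 * x[:, 1]    # error-correction term w_t = beta'x_t
    Y, X = [], []
    for i in range(l, len(dx)):      # target dx[i]; regressors dated one period earlier
        Y.append(dx[i])
        X.append(np.concatenate([[1.0, w[i]], *[dx[i - j] for j in range(1, l + 1)]]))
    return np.asarray(Y), np.asarray(X)

def linear_vecm(x, beta2, l=1):
    """OLS fit of the linear VECM (1); returns A (k x 2, k = 2l + 2) and residuals."""
    Y, X = build_regressors(x, beta2, l)
    A = np.linalg.lstsq(X, Y, rcond=None)[0]
    return A, Y - X @ A
```

The row order of `A` matches X_{t−1}(β): intercept, error-correction coefficient, then the lagged-difference coefficients.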
2.2. Threshold cointegration
As an extension of model (1), a two-regime threshold cointegration model takes the form

    Δx_t = A′_1 X_{t−1}(β) + u_t   if w_{t−1}(β) ≤ γ,
    Δx_t = A′_2 X_{t−1}(β) + u_t   if w_{t−1}(β) > γ,

where γ is the threshold parameter. This may alternatively be written as

    Δx_t = A′_1 X_{t−1}(β) d_{1t}(β, γ) + A′_2 X_{t−1}(β) d_{2t}(β, γ) + u_t,     (2)

where

    d_{1t}(β, γ) = 1(w_{t−1}(β) ≤ γ),
    d_{2t}(β, γ) = 1(w_{t−1}(β) > γ)

and 1(·) denotes the indicator function.
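Because d_{1t} + d_{2t} = 1, model (2) is an ordinary linear regression once the regressor vector is interacted with the regime indicators. A small sketch of that split (names are ours, not from the paper):

```python
import numpy as np

def split_regimes(X, w_lag, gamma):
    """Interact the stacked regressors with the regime indicators of model (2).

    X     : (n, k) matrix whose rows are X_{t-1}(beta)'
    w_lag : (n,) vector of w_{t-1}(beta)
    Returns (X1, X2): X1 has rows zeroed where w_{t-1} > gamma, X2 the complement,
    so regressing dx_t on [X1, X2] estimates (A1, A2) jointly.
    """
    d1 = (w_lag <= gamma).astype(float)   # d_{1t} = 1(w_{t-1} <= gamma)
    X1 = X * d1[:, None]
    X2 = X * (1.0 - d1)[:, None]          # d_{2t} = 1 - d_{1t}
    return X1, X2
```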
Threshold model (2) has two regimes, defined by the value of the error-correction term. The coefficient matrices A_1 and A_2 govern the dynamics in these regimes. Model (2) allows all coefficients (except the cointegrating vector β) to switch between these two regimes. In many cases, it may make sense to impose greater parsimony on the model, by only allowing some coefficients to switch between regimes. This is a special case of (2) where constraints are placed on (A_1, A_2). For example, a model of particular interest lets only the coefficients on the constant and the error-correction term w_{t−1} switch, constraining the coefficients on the lagged Δx_{t−j} to be constant across regimes.

The threshold effect only has content if 0 < P(w_{t−1} ≤ γ) < 1; otherwise the model simplifies to linear cointegration. We impose this constraint by assuming that

    π_0 ≤ P(w_{t−1} ≤ γ) ≤ 1 − π_0,                                      (3)

where π_0 > 0 is a trimming parameter. For the empirical application, we set π_0 = 0.05.

We propose estimation of model (2) by maximum likelihood, under the assumption
that the errors u_t are iid Gaussian. The Gaussian likelihood is

    L_n(A_1, A_2, Σ, β, γ) = −(n/2) log|Σ| − (1/2) ∑_{t=1}^{n} u_t(A_1, A_2, β, γ)′ Σ^{−1} u_t(A_1, A_2, β, γ),

where

    u_t(A_1, A_2, β, γ) = Δx_t − A′_1 X_{t−1}(β) d_{1t}(β, γ) − A′_2 X_{t−1}(β) d_{2t}(β, γ).
The MLE (Â_1, Â_2, Σ̂, β̂, γ̂) are the values which maximize L_n(A_1, A_2, Σ, β, γ).

It is computationally convenient first to concentrate out (A_1, A_2, Σ). That is, hold (β, γ) fixed and compute the constrained MLE for (A_1, A_2, Σ). This is just OLS regression, specifically,¹

    Â_1(β, γ) = ( ∑_{t=1}^{n} X_{t−1}(β) X_{t−1}(β)′ d_{1t}(β, γ) )^{−1} ( ∑_{t=1}^{n} X_{t−1}(β) Δx′_t d_{1t}(β, γ) ),    (4)

¹ These formulas are for the unconstrained model (2). If a constrained threshold model is used, then the appropriate constrained OLS estimates should be used.
    Â_2(β, γ) = ( ∑_{t=1}^{n} X_{t−1}(β) X_{t−1}(β)′ d_{2t}(β, γ) )^{−1} ( ∑_{t=1}^{n} X_{t−1}(β) Δx′_t d_{2t}(β, γ) ),    (5)

    û_t(β, γ) = u_t(Â_1(β, γ), Â_2(β, γ), β, γ)

and

    Σ̂(β, γ) = (1/n) ∑_{t=1}^{n} û_t(β, γ) û_t(β, γ)′.                    (6)
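For a fixed pair (β, γ), eqs. (4)–(6) amount to two subsample OLS fits and a pooled residual covariance. A sketch (assuming the targets, regressors, and lagged error-correction term have already been built; names are ours):

```python
import numpy as np

def concentrated_estimates(Y, X, w_lag, gamma):
    """A1, A2 from the per-regime OLS formulas (4)-(5) and Sigma from (6).

    Y (n, p): stacked dx_t'; X (n, k): stacked X_{t-1}(beta)'; w_lag (n,): w_{t-1}(beta).
    """
    d1 = w_lag <= gamma
    A1 = np.linalg.lstsq(X[d1], Y[d1], rcond=None)[0]     # regime-1 OLS, eq. (4)
    A2 = np.linalg.lstsq(X[~d1], Y[~d1], rcond=None)[0]   # regime-2 OLS, eq. (5)
    u = np.where(d1[:, None], Y - X @ A1, Y - X @ A2)     # u_t(beta, gamma)
    Sigma = u.T @ u / len(Y)                              # eq. (6)
    return A1, A2, Sigma
```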
It may be helpful to note that (4) and (5) are the OLS regressions of Δx_t on X_{t−1}(β) for the subsamples for which w_{t−1}(β) ≤ γ and w_{t−1}(β) > γ, respectively.

This yields the concentrated likelihood function

    L_n(β, γ) = L_n(Â_1(β, γ), Â_2(β, γ), Σ̂(β, γ), β, γ)
              = −(n/2) log|Σ̂(β, γ)| − (np/2).                            (7)

The MLE (β̂, γ̂) are thus found as the minimizers of log|Σ̂(β, γ)|, subject to the normalization imposed on β as discussed in the previous section and the constraint

    π_0 ≤ n^{−1} ∑_{t=1}^{n} 1(x′_t β ≤ γ) ≤ 1 − π_0

(which imposes (3)). The MLE for A_1 and A_2 are Â_1 = Â_1(β̂, γ̂) and Â_2 = Â_2(β̂, γ̂).
Criterion function (7) is not smooth, so conventional gradient hill-climbing algorithms are not suitable for its maximization. In the leading case p = 2, we suggest using a grid search over the two-dimensional space (β, γ). In higher-dimensional cases, grid search becomes less attractive, and alternative search methods (such as a genetic algorithm; see Dorsey and Mayer, 1995) might be more appropriate. Note that in the event that β is known a priori, this grid search is greatly simplified.

To execute a grid search, one needs to pick a region over which to search. We suggest calibrating this region based on the consistent estimate β̃ obtained from the linear model (the MLE² discussed in Section 2.1). Set w̃_{t−1} = w_{t−1}(β̃), let [γ_L, γ_U] denote the empirical support of w̃_{t−1}, and construct an evenly spaced grid on [γ_L, γ_U]. Let [β_L, β_U] denote a (large) confidence interval for β constructed from the linear estimate β̃ (based, for example, on the asymptotic normal approximation) and construct an evenly spaced grid on [β_L, β_U]. The grid search over (β, γ) then examines all pairs (γ, β) on the grids on [γ_L, γ_U] and [β_L, β_U], conditional on π_0 ≤ n^{−1} ∑_{t=1}^{n} 1(x′_t β ≤ γ) ≤ 1 − π_0 (the latter to impose constraint (3)).
In Figs. 1 and 2, we illustrate the non-differentiability of the criterion function for an empirical example from Section 5. (The application is to the 12- and 120-month T-Bill

² Any consistent estimator could in principle be used here. In all our simulations and applications, we use the Johansen MLE.
Fig. 1. Concentrated negative log likelihood.
Fig. 2. Concentrated negative log likelihood.
rates, setting l = 1.) Fig. 1 plots criterion (7) as a function of γ with β concentrated out, and Fig. 2 plots the criterion as a function of β with γ concentrated out.
In summary, our algorithm for the p = 2 case is:

1. Form grids on [γ_L, γ_U] and [β_L, β_U] based on the linear estimate β̃ as described above.
2. For each value of (β, γ) on this grid, calculate Â_1(β, γ), Â_2(β, γ), and Σ̂(β, γ) as defined in (4), (5), and (6), respectively.
3. Find (β̂, γ̂) as the values of (β, γ) on this grid which yield the lowest value of log|Σ̂(β, γ)|.
4. Set Σ̂ = Σ̂(β̂, γ̂), Â_1 = Â_1(β̂, γ̂), Â_2 = Â_2(β̂, γ̂), and û_t = û_t(β̂, γ̂).
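The four steps can be sketched end-to-end for p = 2 and l = 1. This is a stripped-down illustration, not the authors' Gauss code; the helper names and the simple quantile-based γ grid are our choices:

```python
import numpy as np

def threshold_grid_search(x, beta_grid, n_gamma=50, pi0=0.05):
    """Joint grid search (steps 1-4) minimizing log|Sigma(beta, gamma)|, p=2, l=1."""
    best = (np.inf, None, None)
    for b in beta_grid:                                   # step 1: grid over beta
        dx = np.diff(x, axis=0)
        w = x[:, 0] - b * x[:, 1]                         # w_t(beta), beta = (1, -b)'
        Y = dx[1:]                                        # targets dx_t
        X = np.column_stack([np.ones(len(Y)), w[1:-1], dx[:-1]])  # X_{t-1}(beta)
        w_lag = w[1:-1]
        lo, hi = np.quantile(w_lag, [pi0, 1 - pi0])       # trimmed support of w_{t-1}
        for g in np.linspace(lo, hi, n_gamma):            # step 1: grid over gamma
            d1 = w_lag <= g
            if not (pi0 <= d1.mean() <= 1 - pi0):         # constraint (3)
                continue
            # step 2: per-regime OLS and residual covariance, eqs. (4)-(6)
            A1 = np.linalg.lstsq(X[d1], Y[d1], rcond=None)[0]
            A2 = np.linalg.lstsq(X[~d1], Y[~d1], rcond=None)[0]
            u = np.where(d1[:, None], Y - X @ A1, Y - X @ A2)
            _, logdet = np.linalg.slogdet(u.T @ u / len(Y))
            if logdet < best[0]:                          # step 3: keep the minimizer
                best = (logdet, b, g)
    return best                                           # step 4: (log|Sigma|, beta, gamma)
```

In practice `beta_grid` would cover a confidence interval around the linear estimate β̃, as described above.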
It is useful to note that in step 3, there is no guarantee that the minimizers (β̂, γ̂) will be unique, as the log-likelihood function is not concave.

We have described an algorithm to implement the MLE, but it should be emphasized that this is not a theory of inference. We have not provided a proof of consistency of the estimator, nor a distribution theory. In linear models, β̃ converges to β at rate n, and in stationary models, γ̂ converges to γ at rate n. It therefore seems reasonable to guess that in the threshold cointegration model, (β̂, γ̂) will converge to (β, γ) at rate n. In this case, the slope estimates Â_1 and Â_2 should have conventional normal asymptotic distributions, as if β and γ were known, and hence conventional standard errors can be reported for these parameter estimates.
3. Testing for a threshold
3.1. Test statistics
Let H_0 denote the class of linear VECM models (1) and H_1 denote the class of two-regime threshold models (2). These models are nested, and H_0 comprises the models in H_1 which satisfy A_1 = A_2.

We want to test H_0 (linear cointegration) versus H_1 (threshold cointegration). We focus on formal model-based statistical tests, as these allow for direct model comparisons and yield the greatest power for discrimination between models. Alternatively, one might consider the non-parametric non-linearity tests of Tsay (1989) and Tsay (1998) for the univariate and multivariate cases, respectively. As shown in the simulation studies of Balke and Fomby (1997) and Lo and Zivot (2001), these non-parametric tests generally have lower power than comparable model-based tests.
In this paper we consider LM statistics, for two reasons. First, the LM statistic is computationally quick, enabling feasible implementation of the bootstrap. Second, a likelihood ratio or Wald-type test would require a distribution theory for the parameter estimates of the unrestricted model, which we do not yet have. We conjecture, but have no proof, that these tests are asymptotically equivalent to the LM test. We now derive the LM test statistic.
Assume for the moment that (β, γ) are known and fixed. The model under H_0 is

    Δx_t = A′X_{t−1}(β) + u_t                                             (8)

and under H_1 is

    Δx_t = A′_1 X_{t−1}(β) d_{1t}(β, γ) + A′_2 X_{t−1}(β) d_{2t}(β, γ) + u_t.    (9)
Given (β, γ), the models are linear, so the MLE is least squares. As (8) is nested in (9) and the models are linear, an LM-like statistic which is robust to heteroskedasticity can be calculated from a linear regression on model (9). Specifically, let X_1(β, γ) and X_2(β, γ) be the matrices of the stacked rows X_{t−1}(β)′ d_{1t}(β, γ) and X_{t−1}(β)′ d_{2t}(β, γ), respectively; let ξ̂_1(β, γ) and ξ̂_2(β, γ) be the matrices of the stacked rows ũ′_t ⊗ X_{t−1}(β)′ d_{1t}(β, γ) and ũ′_t ⊗ X_{t−1}(β)′ d_{2t}(β, γ), respectively, with ũ_t the residual vector from the linear model as defined in Section 2.1; and define the outer product matrices

    M_1(β, γ) = I_p ⊗ X_1(β, γ)′ X_1(β, γ),
    M_2(β, γ) = I_p ⊗ X_2(β, γ)′ X_2(β, γ)

and

    Ω̂_1(β, γ) = ξ̂_1(β, γ)′ ξ̂_1(β, γ),
    Ω̂_2(β, γ) = ξ̂_2(β, γ)′ ξ̂_2(β, γ).

Then we can define V̂_1(β, γ) and V̂_2(β, γ), the Eicker–White covariance matrix estimators for vec Â_1(β, γ) and vec Â_2(β, γ), as

    V̂_1(β, γ) = M_1(β, γ)^{−1} Ω̂_1(β, γ) M_1(β, γ)^{−1},                  (10)
    V̂_2(β, γ) = M_2(β, γ)^{−1} Ω̂_2(β, γ) M_2(β, γ)^{−1},                  (11)

yielding the standard expression for the heteroskedasticity-robust LM-like statistic

    LM(β, γ) = vec(Â_1(β, γ) − Â_2(β, γ))′ (V̂_1(β, γ) + V̂_2(β, γ))^{−1} vec(Â_1(β, γ) − Â_2(β, γ)).    (12)
If β and γ were known, (12) would be the test statistic. When they are unknown, the LM statistic is (12) evaluated at point estimates obtained under H_0. The null estimate of β is β̃ (Section 2.1), but there is no estimate of γ under H_0, so there is no conventionally defined LM statistic. Arguing from the union–intersection principle, Davies (1987) proposed the statistic

    SupLM = sup_{γ_L ≤ γ ≤ γ_U} LM(β̃, γ).                                 (13)

For this test, the search region [γ_L, γ_U] is set so that γ_L is the π_0 percentile of w̃_{t−1} and γ_U is the (1 − π_0) percentile. This imposes constraint (3). For testing, the parameter π_0 should not be too close to zero, as Andrews (1993) shows that doing so reduces power; he argues that setting π_0 between 0.05 and 0.15 is typically a good choice.

Further justification for statistic (13) is given in Andrews (1993) and Andrews and Ploberger (1994). Andrews and Ploberger (1994) argue that better power may be achieved by using exponentially weighted averages of LM(β̃, γ) rather than the supremum. There is an inherent arbitrariness in this choice of statistic, however, due to the choice of weighting function, so our analysis will remain confined to (13).

As the function LM(β̃, γ) is non-differentiable in γ, to implement the maximization defined in (13) it is necessary to perform a grid evaluation over [γ_L, γ_U].

In the event that the true cointegrating vector β_0 is known a priori, the test takes form (13), except that β is fixed at the known value β_0. We denote this test statistic as

    SupLM_0 = sup_{γ_L ≤ γ ≤ γ_U} LM(β_0, γ).                              (14)
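A compact sketch of (12)–(13): for each γ on a grid, fit the split regression, form the Eicker–White covariances (10)–(11) with the null residuals, and take the supremum. Function names and the quantile-based grid are ours; vec(·) stacks columns:

```python
import numpy as np

def robust_lm(Y, X, w_lag, u0, gamma):
    """Heteroskedasticity-robust LM(beta, gamma) of eq. (12); u0 are null residuals."""
    n, k = X.shape
    p = Y.shape[1]
    d1 = (w_lag <= gamma).astype(float)[:, None]
    X1, X2 = X * d1, X * (1.0 - d1)                      # regime-interacted regressors
    A1 = np.linalg.lstsq(X1, Y, rcond=None)[0]           # zero rows drop out, so this
    A2 = np.linalg.lstsq(X2, Y, rcond=None)[0]           # is per-regime OLS
    xi1 = np.einsum('tp,tk->tpk', u0, X1).reshape(n, p * k)  # rows u_t' (x) X_{t-1}' d1t
    xi2 = np.einsum('tp,tk->tpk', u0, X2).reshape(n, p * k)
    M1i = np.linalg.inv(np.kron(np.eye(p), X1.T @ X1))
    M2i = np.linalg.inv(np.kron(np.eye(p), X2.T @ X2))
    V1 = M1i @ (xi1.T @ xi1) @ M1i                       # eq. (10)
    V2 = M2i @ (xi2.T @ xi2) @ M2i                       # eq. (11)
    d = (A1 - A2).ravel(order='F')                       # vec(A1 - A2)
    return float(d @ np.linalg.solve(V1 + V2, d))

def sup_lm(Y, X, w_lag, u0, pi0=0.10, n_gamma=50):
    """SupLM of eq. (13): supremum over the trimmed empirical support of w_{t-1}."""
    lo, hi = np.quantile(w_lag, [pi0, 1 - pi0])
    return max(robust_lm(Y, X, w_lag, u0, g) for g in np.linspace(lo, hi, n_gamma))
```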
It is important to note that the values of γ which maximize the expressions in (13) and (14) will differ from the MLE γ̂ presented in Section 2. This is true for two separate reasons. First, (13) and (14) are LM tests, based on parameter estimates obtained under the null rather than the alternative. Second, these LM statistics are computed with heteroskedasticity-consistent covariance matrix estimates, and in this case even the maximizers of SupWald statistics differ from the MLE (the two are equal only when homoskedastic covariance matrix estimates are used). This difference is generic in threshold testing and estimation for regression models, and is not special to threshold cointegration.
3.2. Asymptotic distribution
First consider the case that the true cointegrating vector β_0 is known. The regressors are stationary, and the testing problem is a multivariate generalization of Hansen (1996). It follows that the asymptotic distribution of the tests will take the form given in that paper. We require the following standard weak dependence conditions.

Assumption. {β′x_t, Δx_t} is L^{4r}-bounded, strictly stationary and absolutely regular, with mixing coefficients η_m = O(m^{−A}), where A > ν/(ν − 1) and r > ν > 1. Furthermore, the error u_t is an MDS, and the error-correction term β′x_t has a bounded density function.

Under null hypothesis (8), these conditions are known to hold when u_t is iid with a bounded density and L^{4r}-bounded. Under alternative hypothesis (9), these conditions are known to hold under further restrictions on the parameters.

Let F(·) denote the marginal distribution of w_{t−1}, and let "⇒" denote weak convergence with respect to the uniform metric on [0, 1 − π_0]. Define U_{t−1} = F(w_{t−1}) and

    M(r) = I_p ⊗ E(X_{t−1} X′_{t−1} 1(U_{t−1} ≤ r))

and

    Ω(r) = E[1(U_{t−1} ≤ r)(u_t u′_t ⊗ X_{t−1} X′_{t−1})].
Theorem 1. Under H_0,

    SupLM_0 ⇒ T = sup_{π_0 ≤ r ≤ 1 − π_0} T(r),

where

    T(r) = S*(r)′ Ω*(r)^{−1} S*(r),

    Ω*(r) = Ω(r) − M(r) M(1)^{−1} Ω(r) − Ω(r) M(1)^{−1} M(r) + M(r) M(1)^{−1} Ω(1) M(1)^{−1} M(r)

and

    S*(r) = S(r) − M(r) M(1)^{−1} S(1),

where S(r) is a mean-zero matrix Gaussian process with covariance kernel E(S(r_1) S(r_2)′) = Ω(r_1 ∧ r_2).
The asymptotic distribution in Theorem 1 is the same as that presented in Hansen (1996). In general, the asymptotic distribution does not simplify further. However, we discuss one special simplification at the end of this subsection.
Now we consider the case of estimated β. Since n(β̃ − β_0) = O_p(1), it is sufficient to examine the behavior of LM(β, γ) in an n^{−1} neighborhood of β_0.

Theorem 2. Under H_0, LM(b, γ) = LM(β_0 + b/n, γ) has the same asymptotic finite-dimensional distributions (fidi's) as LM(β_0, γ).

If, in addition, we could show that the process LM(b, γ) is tight on compact sets, it would follow that SupLM and SupLM_0 have the same asymptotic distribution, namely T. This would imply that the use of the estimate β̃, rather than the true value β_0, does not alter the asymptotic null distribution of the LM test. Unfortunately, we have been unable to establish a proof of this proposition. The difficulty is two-fold. The process LM(b, γ) is discontinuous in β (due to the indicator functions) and is a function of the non-stationary variable x_{t−1}. There is only a small literature on empirical process results for time-series processes, and virtually none for non-stationary data. Furthermore, the non-stationary variable x_{t−1} appears in the indicator function, so Taylor series methods cannot be used to simplify the problem.
Theorem 1 gives an expression for the asymptotic distribution T . It has the expression
as the supremum of a stochastic process T (r ), the latter sometimes called a “chi-square
process” since for each r the marginal distribution of T (r ) is chi-square. As T isthe supremum of this stochastic process, its distribution is determined by the joint
distribution of this chi-square process, and hence depends on the unknown functions
M(r) and Ω(r). As these functionals may take a broad range of shapes, critical values for T cannot in general be tabulated.
In one special case, we can achieve an important simplification. Take model (2) under E(u_t u′_t | F_{t−1}) = Σ, with no intercept and no lags of Δx_t, so that the only regressor is the error-correction term w_{t−1}. Then, since M(r) is scalar and monotonically increasing, there exists a function λ(s) such that M(λ(s)) = sM(1). We can without loss of generality normalize M(1) = 1 and Σ = I. Then S(λ(s)) = W(s) is a standard Brownian motion, S*(λ(s)) = W(s) − sW(1) is a Brownian bridge, and

    T = sup_{π_0 ≤ r ≤ 1 − π_0} T(r)
      = sup_{π_0 ≤ λ(s) ≤ 1 − π_0} T(λ(s))
      = sup_{s_1 ≤ s ≤ s_2} (W(s) − sW(1))² / (s(1 − s)),

where s_1 = λ^{−1}(π_0) and s_2 = λ^{−1}(1 − π_0). This is the distribution given in Andrews (1993) for tests for structural change of unknown timing, and it is a function only of

    s_0 = s_2(1 − s_1) / (s_1(1 − s_2)).
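Critical values for this special case can be simulated directly from the Brownian-bridge representation; a sketch using a Gaussian random-walk approximation of W on a fine grid (the step count and names are our choices):

```python
import numpy as np

def sup_bridge_draw(s1, s2, n_steps=2000, rng=None):
    """One draw of sup_{s1 <= s <= s2} (W(s) - s W(1))^2 / (s(1 - s))."""
    rng = np.random.default_rng() if rng is None else rng
    W = np.cumsum(rng.normal(scale=n_steps ** -0.5, size=n_steps))  # W on a grid
    s = np.arange(1, n_steps + 1) / n_steps
    bridge = W - s * W[-1]                                          # W(s) - s W(1)
    mask = (s >= s1) & (s <= s2)
    return float(np.max(bridge[mask] ** 2 / (s[mask] * (1.0 - s[mask]))))
```

Repeating this many times and taking empirical quantiles gives critical values that depend on [s_1, s_2] alone, consistent with the Andrews (1993) tabulation in terms of s_0.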
3.3. Asymptotic p-values: the fixed regressor bootstrap

With the exception discussed at the end of Section 3.2, the asymptotic distribution in Theorems 1 and 2 appears to depend upon the moment functionals M(r) and Ω(r), so tabulated critical values are unavailable. We discuss in this section how the fixed regressor bootstrap of Hansen (1996, 2000b) can be used to calculate asymptotic critical values and p-values, and hence achieve first-order asymptotically correct inference.

We interpret Theorem 2 to imply that the first-step estimation of the cointegrating vector β does not affect the asymptotic distribution of the SupLM test. We therefore do not need to take the estimation of β into account in conducting inference on the threshold. However, since Theorem 2 is not a complete proof of the asymptotic distribution of SupLM when β is estimated, we should emphasize that this is partially a conjecture.
We now describe the fixed regressor bootstrap. Let w̃_{t−1} = w_{t−1}(β̃), X̃_{t−1} = X_{t−1}(β̃), and let ũ_t be the residuals from the reduced-rank regression as described in Section 2. For the remainder of our discussion, ũ_t, w̃_{t−1}, X̃_{t−1} and β̃ are held fixed at their sample values.

Let e_{bt} be iid N(0, 1) and set y_{bt} = ũ_t e_{bt}. Regress y_{bt} on X̃_{t−1}, yielding residuals ũ_{bt}. Regress y_{bt} on X̃_{t−1} d_{1t}(β̃, γ) and X̃_{t−1} d_{2t}(β̃, γ), yielding estimates Â_1(γ)_b and Â_2(γ)_b and residuals û_{bt}(γ). Define V̂_1(γ)_b and V̂_2(γ)_b as in (10) and (11), setting β = β̃ and replacing ũ_t with ũ_{bt} in the definitions of ξ̂_1(β, γ) and ξ̂_2(β, γ). Then set

    SupLM* = sup_{γ_L ≤ γ ≤ γ_U} vec(Â_1(γ)_b − Â_2(γ)_b)′ (V̂_1(γ)_b + V̂_2(γ)_b)^{−1} vec(Â_1(γ)_b − Â_2(γ)_b).
The analysis in Hansen (1996) shows that under local alternatives to H_0, SupLM* ⇒_p T, so the distribution of SupLM* yields a valid first-order approximation to the asymptotic null distribution of SupLM. The symbol "⇒_p" denotes weak convergence in probability as defined in Gine and Zinn (1990).

The distribution of SupLM* is unknown, but can be calculated using simulation methods. The description given above shows how to create one draw from the distribution. With independent draws of the errors e_{bt}, a new draw can be made. If this is repeated a large number of times (e.g. 1000), a p-value can be calculated as the percentage of simulated SupLM* which exceed the actual SupLM.

The label "fixed regressor bootstrap" is intended to convey the feature that the regressors X̃_{t−1} d_{1t}(β̃, γ) and X̃_{t−1} d_{2t}(β̃, γ) are held fixed at their sample values. As such, this is not really a bootstrap technique, and it is not expected to provide a better approximation to the finite sample distribution than conventional asymptotic approximations. The advantage of the method is that it allows for heteroskedasticity of unknown form, whereas conventional model-based bootstrap methods effectively impose independence on the errors u_t and therefore do not achieve correct first-order asymptotic inference. It allows for general heteroskedasticity in much the same way as White's (1980) heteroskedasticity-consistent standard errors.
3.4. Residual bootstrap
The fixed regressor bootstrap of the previous section has much of the computational burden of a bootstrap, but only approximates the asymptotic distribution. While we have no formal theory, it stands to reason that a bootstrap method might achieve better finite sample performance than asymptotic methods. This conjecture is not obvious, as the asymptotic distribution of Section 3.2 is non-pivotal, and it is known that the bootstrap in general does not achieve an asymptotic refinement (an improved rate of convergence relative to asymptotic inference) when asymptotic distributions are non-pivotal.

One cost of using the bootstrap is the need to be fully parametric concerning the data-generating mechanism. In particular, it is difficult to incorporate conditional heteroskedasticity, and in its presence a conventional bootstrap (using iid innovations) will fail to achieve the first-order asymptotic distribution (unlike the fixed regressor bootstrap, which does).

The parametric residual bootstrap method requires a complete specification of the model under the null. This is Eq. (1) plus auxiliary assumptions on the errors u_t and the initial conditions. In our applications, we assume u_t is iid from an unknown distribution G, and the initial conditions are fixed (other choices are possible). The bootstrap calculates the sampling distribution of the test SupLM using this model and
the parameter estimates obtained under the null. The latter are β̃, Ã, and the empirical distribution of the bivariate residuals ũ_t.

The bootstrap distribution may be calculated by simulation. Given the fixed initial conditions, random draws are made from the residual vectors ũ_t, and the vector series x_{bt} is then created by recursion from model (1). The statistic SupLM* is calculated on each simulated sample and stored. The bootstrap p-value is the percentage of simulated statistics which exceed the actual statistic.
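A sketch of one such bootstrap replication for p = 2 and one lagged difference (the recursion and names are ours; each replication would then be fed through the SupLM computation):

```python
import numpy as np

def residual_bootstrap_series(x_init, A, beta2, u_pool, n, rng):
    """Simulate x_t from the null VECM (1) with fixed initial conditions.

    x_init : (2, 2) initial levels (held fixed across replications)
    A      : (4, 2) null coefficient matrix, rows ordered (1, w_{t-1}, dx_{t-1}')
    beta2  : cointegrating vector normalized as (1, -beta2)'
    u_pool : (m, 2) null residuals, resampled iid with replacement
    """
    x = np.zeros((n, 2))
    x[:2] = x_init
    for t in range(2, n):
        w_lag = x[t - 1, 0] - beta2 * x[t - 1, 1]       # w_{t-1}
        dx_lag = x[t - 1] - x[t - 2]                    # dx_{t-1}
        Xrow = np.array([1.0, w_lag, dx_lag[0], dx_lag[1]])
        u = u_pool[rng.integers(len(u_pool))]           # draw from the residual pool
        x[t] = x[t - 1] + Xrow @ A + u
    return x
```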
4. Simulation evidence
4.1. Threshold test
Monte Carlo experiments are performed to assess the small sample performance of the test. The experiments are based on a bivariate error-correction model with two lags. Letting x_t = (x_{1t}, x_{2t})′, the single-regime model H_0 is

    Δx_t = μ + α(x_{1t−1} − β x_{2t−1}) + Γ Δx_{t−1} + u_t,               (15)

where μ = (μ_1, μ_2)′, α = (α_1, α_2)′, and u_t = (u_{1t}, u_{2t})′.
The two-regime model H_1 is the generalization of (15), as in (2), allowing all coefficients to differ depending on whether x_{1t−1} − x_{2t−1} ≤ γ or x_{1t−1} − x_{2t−1} > γ.

Our tests are based on model (15), allowing all coefficients to switch between regimes under the alternative. The tests are calculated setting π_0 = 0.10, using 50 gridpoints on [γ_L, γ_U] for calculation of (13), and using 200 bootstrap replications for each replication. Our results are calculated from 1000 simulation replications.
We fix μ_1 = μ_2 = 0, β = 1, and α_1 = −1. We vary α_2 among (0, −0.5, 0.5), and Γ among

    Γ_0 = [0, 0; 0, 0],   Γ_1 = [−0.2, 0; −0.1, −0.2],   Γ_2 = [−0.2, −0.1; −0.1, −0.2],
and consider two sample sizes, n = 100 and 250. We generated the errors u_{1t} and u_{2t} under homoskedastic and heteroskedastic specifications. For a homoskedastic error, we generate u_{1t} and u_{2t} as independent N(0, 1) variates. For a heteroskedastic error, we generate u_{1t} and u_{2t} as independent GARCH(1,1) processes, with u_{1t} ~ N(0, σ²_{1t}) and

    σ²_{1t} = 1 + 0.2 u²_{1t−1} + κ σ²_{1t−1},

and similarly for u_{2t}.
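A sketch of this heteroskedastic error generator (we write κ for the coefficient on the lagged variance, the value varied across the last three columns of Table 1):

```python
import numpy as np

def garch_errors(n, kappa, rng):
    """u_t ~ N(0, s2_t) with s2_t = 1 + 0.2*u_{t-1}^2 + kappa*s2_{t-1}."""
    u = np.empty(n)
    s2 = 1.0 / (1.0 - 0.2 - kappa)      # start at the unconditional variance
    for t in range(n):
        u[t] = rng.normal() * np.sqrt(s2)
        s2 = 1.0 + 0.2 * u[t] ** 2 + kappa * s2
    return u
```

Two independent such series give (u_{1t}, u_{2t}).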
We first explored the size of the SupLM and SupLM_0 statistics under the null hypothesis H_0 of a single regime. This involved generating data from linear model (15). For each simulated sample, the statistics and p-values were calculated using both the fixed-regressor bootstrap and the residual bootstrap. In Table 1, we report rejection frequencies of nominal 5% and 10% tests for the SupLM statistic (β unknown). The results for the SupLM_0 statistic (β known) were very similar and so are omitted.

For the first five parameterizations, we generate u_{1t} and u_{2t} as independent N(0, 1) variates, and vary the parameters α_2 and Γ. The rejection frequencies of the tests
Table 1
Size of SupLM tests

                             Homoskedastic errors                Heteroskedastic errors
alpha_2                      0      -0.5   0.5    0      0       0      0      0
Gamma                        G_0    G_0    G_0    G_1    G_2     G_0    G_0    G_0
kappa                        0      0      0      0      0       0.25   0.50   0.75

5% nominal size, n = 100
Fixed-regressor bootstrap    0.083  0.072  0.108  0.071  0.074   0.075  0.080  0.085
Residual bootstrap           0.058  0.052  0.084  0.049  0.048   0.054  0.065  0.065

5% nominal size, n = 250
Fixed-regressor bootstrap    0.075  0.080  0.070  0.067  0.071   0.064  0.067  0.076
Residual bootstrap           0.052  0.058  0.055  0.052  0.053   0.053  0.051  0.059

10% nominal size, n = 100
Fixed-regressor bootstrap    0.156  0.133  0.186  0.134  0.144   0.162  0.146  0.156
Residual bootstrap           0.125  0.111  0.139  0.100  0.111   0.125  0.117  0.126

10% nominal size, n = 250
Fixed-regressor bootstrap    0.138  0.135  0.147  0.122  0.127   0.117  0.133  0.134
Residual bootstrap           0.106  0.113  0.109  0.093  0.095   0.089  0.099  0.103
are reported in the first five columns of Table 1. (These are the percentages of the simulated p-values which are smaller than the nominal size.) The rejection frequencies are similar across the various parameterizations. Using the fixed-regressor bootstrap, the test somewhat over-rejects, with the rejection rate at the nominal 5% level and n = 100 ranging from 0.071 to 0.108. If the residual bootstrap is used, the test has much better size, with rejection rates ranging from 0.048 to 0.084. If the sample size is increased to n = 250, then the size improves considerably, with the 5% rejection rates for the fixed-regressor bootstrap ranging from 0.067 to 0.080 and those for the residual bootstrap ranging from 0.052 to 0.058.
For the remaining three parameterizations, we generate u1t and u2t as independent GARCH(1, 1) processes. The other parameters are set as in the first column of Table 1, and the results for the SupLM tests are reported in the final three columns of Table 1. The rejection rates do not appear to be greatly affected by the heteroskedasticity. The rejection rates for both SupLM tests increase modestly, and the best results are again obtained by the residual bootstrap. (This might appear surprising, as the residual bootstrap does not replicate the GARCH dependence structure, but the LM statistics are constructed with heteroskedasticity-robust covariance matrices, so are first-order robust to GARCH.)
We next explored the power of the tests against the two-regime alternative H1. To keep the calculations manageable, we generate the data from the simple process

Δx_t = (−1, 0)′ (x_{1,t−1} − x_{2,t−1}) + (δ, 0)′ (x_{1,t−1} − x_{2,t−1}) 1(x_{1,t−1} − x_{2,t−1} ≤ γ) + u_t,
Table 2
Power of SupLM and SupLM0 tests, 5% nominal size against two-regime alternative

                                    P(w_{t−1} ≤ γ) = 0.5        P(w_{t−1} ≤ γ) = 0.25
δ                                   0.2   0.4   0.6   0.8       0.2   0.4   0.6   0.8

n = 100
SupLM, fixed-regressor bootstrap    0.107 0.207 0.395 0.624     0.119 0.231 0.364 0.468
SupLM, residual bootstrap           0.081 0.175 0.324 0.587     0.091 0.183 0.322 0.425
SupLM0, fixed-regressor bootstrap   0.104 0.212 0.411 0.713     0.117 0.224 0.392 0.517
SupLM0, residual bootstrap          0.087 0.166 0.357 0.647     0.088 0.174 0.332 0.453

n = 250
SupLM, fixed-regressor bootstrap    0.156 0.450 0.878 0.997     0.187 0.527 0.844 0.933
SupLM, residual bootstrap           0.117 0.399 0.845 0.995     0.152 0.481 0.816 0.914
SupLM0, fixed-regressor bootstrap   0.154 0.460 0.896 0.999     0.184 0.546 0.856 0.953
SupLM0, residual bootstrap          0.121 0.403 0.852 0.998     0.144 0.492 0.823 0.929
with u_t ∼ iid N(0, I₂). Setting δ = 0, the null hypothesis holds and this process corresponds to the data generated from the first column of Table 1. For δ ≠ 0, the alternative of a two-regime model holds. The threshold parameter γ is set so that P(w_{t−1} ≤ γ) equals either 0.5 or 0.25. While the data are generated from the first-order VAR, we used the same tests as described above, which are based on an estimated second-order VAR.
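This process is easy to simulate directly. A minimal sketch (our own naming, not the authors' code; γ is set to 0 here, which places roughly half of the observations in the lower regime):

```python
import numpy as np

def simulate_two_regime_vecm(n, delta, gamma=0.0, seed=None):
    """Simulate x_t (n x 2): only the first equation error-corrects, with
    coefficient -1 + delta when w_{t-1} <= gamma and -1 otherwise; u_t ~ N(0, I_2)."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n, 2))
    for t in range(1, n):
        w = x[t - 1, 0] - x[t - 1, 1]        # error-correction term w_{t-1}
        coef = -1.0 + delta * (w <= gamma)   # regime-dependent adjustment
        x[t] = x[t - 1] + np.array([coef * w, 0.0]) + rng.standard_normal(2)
    return x

x_sim = simulate_two_regime_vecm(250, delta=0.4, seed=0)
w_sim = x_sim[:, 0] - x_sim[:, 1]
```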
Table 2 reports the rejection frequencies of the SupLM and SupLM0 tests at the 5% size for several values of δ. As expected, the power increases in the threshold effect δ and the sample size n. The fixed-regressor bootstrap has a higher rejection rate than the parametric bootstrap, but this is likely an artifact of the size distortions shown in Table 1. The SupLM0 test (known β) has slightly higher power than the SupLM test (unknown β), but the difference is surprisingly small. At least in these settings, there is little power loss due to estimation of β.
4.2. Parameter estimates
We next explore the finite-sample distributions of the estimators of the cointegrating vector β and the threshold parameter γ. The simulation is based on the following process:

Δx_t = (−1, 0)′ (x_{1,t−1} − β₀x_{2,t−1}) + (−2, 0)′ 1(x_{1,t−1} − β₀x_{2,t−1} ≤ γ₀)
       + (0.5, 0)′ (x_{1,t−1} − β₀x_{2,t−1}) 1(x_{1,t−1} − β₀x_{2,t−1} > γ₀) + u_t

with u_t ∼ iid N(0, I₂). We set the cointegrating coefficient β₀ at 1 and the threshold coefficient γ₀ at 0. This model has threshold effects in both the intercept and the error correction. We consider two sample sizes, n = 100 and 250. We varied some of
Table 3
Distribution of estimators

              Mean      RMSE     MAE      Percentiles (%)
                                          5        25       50       75       95
n = 100
β̂ − β₀       −0.0002   0.0729   0.0154   −0.0382  −0.0104  −0.0004  0.0102   0.0322
β̂₀ − β₀      −0.0000   0.0524   0.0100   −0.0266  −0.0056  0.0001   0.0056   0.0234
β̃ − β₀        0.0000   0.0982   0.0234   −0.0493  −0.0196  −0.0004  0.0187   0.0475
γ̂ − γ₀       −0.0621   0.9778   0.1460   −0.3462  −0.0940  −0.0145  0.0462   0.2207
γ̂₀ − γ₀      −0.0918   0.9967   0.1221   −0.3351  −0.0773  −0.0320  −0.0050  0.0983

n = 250
β̂ − β₀        0.0000   0.0130   0.0048   −0.0107  −0.0035  −0.0000  0.0035   0.0107
β̂₀ − β₀      −0.0001   0.0091   0.0029   −0.0080  −0.0017  −0.0000  0.0017   0.0064
β̃ − β₀        0.0003   0.0236   0.0088   −0.0187  −0.0060  −0.0001  0.0066   0.0194
γ̂ − γ₀       −0.0051   0.1419   0.0441   −0.1109  −0.0323  −0.0020  0.0238   0.0919
γ̂₀ − γ₀      −0.0150   0.0815   0.0259   −0.0752  −0.0272  −0.0113  −0.0014  0.0376
the coefficients, but omit the results since the essential features were unchanged. While the data are generated from a VAR(1), our estimates are based on a VAR(2).
We consider three estimators of β, and two of γ. The pair (β̂, γ̂) are the unrestricted estimators of (β, γ), using the algorithm³ of Section 2. β̂₀ is the restricted estimator obtained when the true value γ₀ is known. γ̂₀ is the restricted estimator of γ when the true value β₀ is known. We should expect β̂₀ and γ̂₀ to be more accurate than β̂ and γ̂, respectively, and this comparison allows us to assess the cost of estimating the threshold and the cointegrating vector, respectively. We also consider the Johansen MLE, β̃, which would be efficient if there were no threshold effect.
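The grid-search idea behind the unrestricted estimators can be illustrated schematically. This is not the paper's algorithm (which maximizes the Gaussian likelihood of the full threshold VECM in (7), with trimming); it is a simplified least-squares analogue, with all function names our own:

```python
import numpy as np

def regime_resid(Y, X):
    """Multivariate OLS residuals of Y on X."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ coef

def grid_search_beta_gamma(x, betas, n_gamma=50, min_obs=10):
    """Schematic (beta, gamma) grid search for a bivariate two-regime model:
    split on w_{t-1} <= gamma and minimize the log determinant of the pooled
    residual covariance matrix."""
    dx = np.diff(x, axis=0)                              # Delta x_t
    best_crit, best_beta, best_gamma = np.inf, None, None
    for beta in betas:
        w = x[:-1, 0] - beta * x[:-1, 1]                 # w_{t-1} = x1 - beta*x2
        X = np.column_stack([np.ones(len(w)), w])        # intercept + ECM term
        for gamma in np.quantile(w, np.linspace(0.05, 0.95, n_gamma)):
            low = w <= gamma
            if low.sum() < min_obs or (~low).sum() < min_obs:
                continue                                 # keep both regimes populated
            e = np.vstack([regime_resid(dx[low], X[low]),
                           regime_resid(dx[~low], X[~low])])
            crit = np.log(np.linalg.det(e.T @ e / len(e)))
            if crit < best_crit:
                best_crit, best_beta, best_gamma = crit, beta, gamma
    return best_beta, best_gamma

# toy cointegrated pair: x1 - x2 stationary, so the true beta is 1
rng = np.random.default_rng(0)
x2 = np.cumsum(rng.standard_normal(300))
x1 = x2 + rng.standard_normal(300)
beta_hat, gamma_hat = grid_search_beta_gamma(np.column_stack([x1, x2]),
                                             betas=np.linspace(0.8, 1.2, 21))
```

The double loop makes the computational appeal of a coarse grid clear: the cost is linear in the product of the two grid sizes.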
In Table 3 we report the mean, root mean squared error (RMSE), mean absolute error (MAE), and selected percentiles of each estimator in 1000 simulation replications.

The results contain no surprises. The three estimators of β all have approximately symmetric, unbiased distributions. The restricted estimator β̂₀ (which exploits knowledge of γ₀) is the most accurate, followed by the unrestricted estimator β̂. Both are more accurate than the Johansen β̃. It may be interesting to note that the linear Johansen estimator β̃ (which is much easier to compute) does reasonably well, even in the presence of the threshold effect, although there is a substantial efficiency loss for n = 250.
Both estimators of γ have asymmetric distributions. For n = 100, the distributions are similar; both are meaningfully biased and the estimators are quite inaccurate. For n = 250, the performance of both estimators is much improved, and the restricted estimator γ̂₀ has considerably less dispersion but slightly higher bias than the unrestricted estimator γ̂.
³ The grid sizes for γ and β are 500 and 100, respectively.
5. Term structure
Let r_t be the interest rate on a one-period bond, and R_t be the interest rate on a multi-period bond. As first suggested by Campbell and Shiller (1987), the theory of the term structure of interest rates suggests that r_t and R_t should be cointegrated with a unit cointegrating vector. This has led to a large empirical literature estimating linear cointegrating VAR models such as

(ΔR_t, Δr_t)′ = μ + α w_{t−1} + Γ (ΔR_{t−1}, Δr_{t−1})′ + u_t    (16)

with w_{t−1} = R_{t−1} − β r_{t−1}. Setting β = 1, the error-correction term is the interest-rate spread.
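As a reference point, the linear model (16) can be fit by equation-by-equation OLS. A minimal sketch with β = 1 imposed (our own naming and toy data, not the authors' code):

```python
import numpy as np

def estimate_linear_vecm(R, r, beta=1.0):
    """Equation-by-equation OLS for the bivariate VECM (16): regress
    (Delta R_t, Delta r_t) on an intercept, the lagged error-correction term
    w_{t-1} = R_{t-1} - beta*r_{t-1}, and one lag of the differences.
    Returns the 4 x 2 coefficient matrix (rows: intercept, w, dR lag, dr lag)."""
    R, r = np.asarray(R, dtype=float), np.asarray(r, dtype=float)
    dR, dr = np.diff(R), np.diff(r)
    w = R[:-1] - beta * r[:-1]                       # w_{t-1}
    Y = np.column_stack([dR[1:], dr[1:]])            # Delta x_t
    X = np.column_stack([np.ones(len(Y)), w[1:], dR[:-1], dr[:-1]])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef

# toy cointegrated pair: r is a random walk, R = r + stationary noise
rng = np.random.default_rng(0)
r_toy = np.cumsum(rng.standard_normal(400))
R_toy = r_toy + rng.standard_normal(400)
coef = estimate_linear_vecm(R_toy, r_toy)
```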
Linearity, however, is not implied by the theory of the term structure. In this section, we explore the possibility that a threshold cointegration model provides a better empirical description.
To address this question, we estimate and test models of threshold cointegration using the monthly interest rate series of McCulloch and Kwon (1993). Following Campbell (1995), we use the period 1952–1991. The interest rates are estimated from the prices of U.S. Treasury securities, and correspond to zero-coupon bonds. We use a selection of bond rates with maturities ranging from 1 to 120 months. To select the VAR lag length, we found that both the AIC and BIC, applied either to the linear VECM or the threshold VECM, consistently picked l = 1 across specifications. We report our results for both l = 1 and 2 for robustness. We considered both fixing the cointegrating vector β = 1 and letting β be estimated.
First, we tested for the presence of (bivariate) cointegration, using the ADF test applied to the error-correction term (this is the Engle–Granger test when the cointegrating vector is estimated). For all bivariate pairs and lag lengths considered, the tests⁴ easily rejected the null hypothesis of no cointegration, indicating the presence of bivariate cointegration between each pair.
To assess the evidence for threshold cointegration, we applied several sets of tests. For the complete bivariate specification, we use the SupLM test (estimated β) and the SupLM0 test (β = 1) with 300 gridpoints, and the p-values calculated by the parametric bootstrap. For comparison, we also applied the univariate Hansen (1996) threshold autoregressive test to the error-correction term, as in Balke and Fomby (1997). All p-values were computed with 5000 simulation replications. The results are presented in Table 4.
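The bootstrap p-value calculation itself is generic: simulate the statistic under the null many times and report the fraction of draws at least as large as the observed value. A sketch (our own naming), using an illustrative chi-square(1) statistic in place of the SupLM statistic, whose simulation would require refitting the model on each replication:

```python
import numpy as np

def bootstrap_pvalue(stat_obs, simulate_null_stat, reps=5000, seed=None):
    """p-value = share of statistics simulated under H0 that are >= the observed one."""
    rng = np.random.default_rng(seed)
    draws = np.array([simulate_null_stat(rng) for _ in range(reps)])
    return float((draws >= stat_obs).mean())

# illustration: chi-square(1) null distribution; 3.84 is its 5% critical value
p_val = bootstrap_pvalue(3.84, lambda rng: rng.standard_normal() ** 2,
                         reps=5000, seed=0)
```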
The multivariate tests point to the presence of threshold cointegration in some of the bivariate relationships. In six of the nine models, the SupLM0 statistic is significant at the 10% level when l = 1 and β is fixed at unity. If we set l = 2, the evidence appears to strengthen, with seven of the nine significant at the 5% level. If instead of fixing β we estimate it freely, the evidence for threshold cointegration is diminished, with only four of nine significant at the 5% level (in either lag specification).
⁴ Not reported here to conserve space.
Table 4
Treasury bond rates: tests for threshold cointegration (p-values)

                          Bivariate                          Univariate
                          β = 1          β estimated        β = 1          β estimated
Short rate   Long rate    l = 1  l = 2   l = 1  l = 2       l = 1  l = 2   l = 1  l = 2
1-month      2-month      0.083  0.003   0.014  0.007       0.453  0.002   0.370  0.188
1-month      3-month      0.030  0.009   0.117  0.188       0.283  0.085   0.245  0.044
1-month      6-month      0.085  0.029   0.634  0.288       0.017  0.040   0.122  0.133
3-month      6-month      0.036  0.021   0.038  0.031       0.658  0.311   0.322  0.133
3-month      12-month     0.047  0.032   0.161  0.198       0.121  0.091   0.122  0.083
3-month      120-month    0.193  0.102   0.095  0.146       0.227  0.485   0.171  0.357
12-month     24-month     0.267  0.516   0.245  0.623       0.489  0.583   0.314  0.618
12-month     120-month    0.018  0.022   0.023  0.016       0.109  0.119   0.251  0.228
24-month     120-month    0.173  0.005   0.139  0.008       0.024  0.011   0.278  0.051
The Balke–Fomby univariate tests are somewhat more ambiguous. The threshold effect is significant at the 10% level for two of the nine models when l = 1 and β is fixed at unity, and for five when l = 2. When β is estimated rather than fixed, none of the models are significant for l = 1, and only three for l = 2. The univariate specification is quite restrictive, and this undoubtedly reduces the power of the test in some settings.
Next, we report the parameter estimates for one of the relatively successful models, the bivariate relationship between the 120-month (10-year) and 12-month (one-year) bond rates (normalized to be percentages). The parameter estimates were calculated by minimization of (7) over a 300 × 300 grid on the parameters (γ, β). The estimated cointegrating relationship is w_t = R_t − 0.984 r_t, quite close to a unit coefficient. The results we report are for the case of an estimated cointegrating vector, but the results are very similar if the unit coefficient is imposed.
The estimated threshold is γ̂ = −0.63. Thus the first regime occurs when R_t ≤ 0.984 r_t − 0.63, i.e. when the 10-year rate is more than 0.6 percentage points below the short rate. This is relatively unusual, with only 8% of the observations in this regime, and we label this the "extreme" regime. The second regime (with 92% of the observations) occurs when R_t > 0.984 r_t − 0.63, which we label the "typical" regime.
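Given the point estimates β̂ = 0.984 and γ̂ = −0.63, regime membership is a one-line computation (the helper name and toy data are ours):

```python
import numpy as np

def classify_regimes(R, r, beta=0.984, gamma=-0.63):
    """Boolean indicator of the 'extreme' regime w_t = R_t - beta*r_t <= gamma,
    plus the fraction of observations falling in it."""
    w = np.asarray(R, dtype=float) - beta * np.asarray(r, dtype=float)
    extreme = w <= gamma
    return extreme, float(extreme.mean())

# toy check: the first pair lies deep in the extreme regime, the second does not
extreme, frac = classify_regimes([1.0, 5.0], [3.0, 5.0])
```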
The estimated threshold VECM is given below:

ΔR_t =
   0.54 + 0.34 w_{t−1} + 0.35 ΔR_{t−1} − 0.17 Δr_{t−1} + u_{1t},   w_{t−1} ≤ −0.63,
  (0.17)  (0.18)         (0.26)          (0.12)

   0.01 − 0.02 w_{t−1} − 0.08 ΔR_{t−1} + 0.09 Δr_{t−1} + u_{1t},   w_{t−1} > −0.63,
  (0.02)  (0.03)         (0.06)          (0.05)
Fig. 3. Interest rate response to error correction.
Δr_t =
   1.45 + 1.41 w_{t−1} + 0.92 ΔR_{t−1} − 0.04 Δr_{t−1} + u_{2t},   w_{t−1} ≤ −0.63,
  (0.35)  (0.34)         (0.62)          (0.26)

  −0.04 + 0.04 w_{t−1} − 0.07 ΔR_{t−1} + 0.23 Δr_{t−1} + u_{2t},   w_{t−1} > −0.63.
  (0.04)  (0.04)         (0.13)          (0.13)
Eicker–White standard errors are in parentheses. However, as we have no formal distribution theory for the parameter estimates and standard errors, these should be interpreted somewhat cautiously.
In the typical regime, ΔR_t and Δr_t have minimal error-correction effects and minimal dynamics. They are close to white noise, indicating that in this regime, R_t and r_t are close to driftless random walks.
Error correction appears to occur only in the unusual regime (when R_t is far below r_t). There is a strong error-correction effect in the short-rate equation. In the long-rate equation, the point estimate on the error-correction term is moderately large, and on the borderline of statistical significance. The remaining dynamic coefficients are imprecisely estimated due to the small sample in this regime.
In Fig. 3 we plot the error-correction effect: the estimated regression functions of ΔR_t and Δr_t as functions of w_{t−1}, holding the other variables constant. In the figure, you can see the flat, near-zero error-correction effect on the right side of the threshold,
and on the left of the threshold, the sharp positive relationships, especially for the
short-rate equation.
One finding of great interest is that the estimated error-correction effects are positive.⁵ As articulated by Campbell and Shiller (1991) and Campbell (1995), the regression lines in Fig. 3 should be positive; equivalently, the coefficients on w_{t−1} in the threshold VECM should be positive. This is because a large positive spread R_t − r_t means that the long bond is earning a higher interest rate, so long bonds must be expected to depreciate in value. This implies that the long interest rate is expected to rise. (The short rate is also expected to rise, as R_t is a smoothed forecast of future short rates.)
Using linear correlation methods, Campbell and Shiller (1991) and Campbell (1995) found considerable evidence contradicting this prediction of the term structure theory. They found that changes in the short rate are positively correlated with the spread, but changes in the long rate are negatively correlated with the spread, especially at longer horizons. These authors viewed this finding as a puzzle.
In contrast, our results are roughly consistent with this term structure prediction. In all nine estimated⁶ bivariate relationships, the four error-correction coefficients (for the long and short rate in the two regimes) are either positive or insignificantly different from zero if negative. As expected, the short-rate coefficients are typically positive (in six of the nine models the coefficients are positive in both regimes), and the long-rate coefficients are much smaller in magnitude and often negative in sign. There appears to be no puzzle.
6. Conclusion
We have presented a quasi-MLE algorithm for constructing estimates of a two-regime threshold cointegration model and a SupLM statistic for the null hypothesis of no threshold. We derived the asymptotic null distribution of this statistic, developed methods to calculate the asymptotic distribution by simulation, and showed how to calculate a bootstrap approximation. These methods may find constructive use in applications.
Still, there are many unanswered questions for future research:
• A test for the null of no cointegration in the context of the threshold cointegration model.⁷ This testing problem is quite complicated, as the null hypothesis implies that the threshold variable (the cointegrating error) is non-stationary, rendering current distribution theory inapplicable.
⁵ In the typical regime, the long rate has a negative point estimate (−0.02), but it is statistically insignificant and numerically very close to zero.
⁶ β fixed at unity, l = 1.
⁷ Pippenger and Goering (2000) present simulation evidence that linear cointegration tests can have low power to detect threshold cointegration.
• A distribution theory for the parameter estimates of the threshold cointegration model. As shown in Chan (1993) and Hansen (2000a), threshold estimates have non-standard distributions, and working out such a distribution theory is challenging.
• Allowing for VECMs with multiple cointegrating vectors.
• Developing estimation and testing methods which impose restrictions on the intercepts to exclude the possibility of a time trend. This would improve the fit of the model to the data, and improve estimation efficiency. However, the constraint is quite complicated, and it is not immediately apparent how to impose it. Careful handling of the intercept and trends is likely to be a fruitful area of research.
• Extending the theory to allow a fully rigorous treatment of estimated cointegrating
vectors.
• How to extend the analysis to allow for three regimes. To assess the statistical
relevance of such models, we would need a test of the null of a two-regime model
against the alternative of a three-regime model.
• An extension to the Balke–Fomby three-regime symmetric threshold model. While our methods should directly apply if the threshold variable is defined as the absolute value of the error-correction term, a realistic treatment will require restrictions on the intercepts.
Acknowledgements
We thank two referees, Mehmet Caner, Robert Rossana, Mark Watson, and Ken West for useful comments and suggestions. Hansen thanks the National Science Foundation, and Seo thanks the Korea Research Foundation, for financial support.
Appendix
Proof of Theorem 1. An alternative algebraic representation of the pointwise LM statistic is

LM(β, γ) = N*_n(β, γ)′ Ω*_n(β, γ)⁻¹ N*_n(β, γ),    (17)

where

Ω*_n(β, γ) = Ω_n(β, γ) − M_n(β, γ) M_n(β)⁻¹ Ω_n(β, γ) − Ω_n(β, γ) M_n(β)⁻¹ M_n(β, γ)
             + M_n(β, γ) M_n(β)⁻¹ Ω_n(β) M_n(β)⁻¹ M_n(β, γ),

N*_n(β, γ) = N_n(β, γ) − M_n(β, γ) M_n(β)⁻¹ N_n(β),

M_n(β, γ) = I_p ⊗ (1/n) Σ_{t=1}^n d_{1t}(β, γ) X_{t−1}(β) X_{t−1}(β)′,

M_n(β) = I_p ⊗ (1/n) Σ_{t=1}^n X_{t−1}(β) X_{t−1}(β)′,

Ω_n(β, γ) = (1/n) Σ_{t=1}^n d_{1t}(β, γ) (u_t u_t′ ⊗ X_{t−1}(β) X_{t−1}(β)′),

Ω_n(β) = (1/n) Σ_{t=1}^n (u_t u_t′ ⊗ X_{t−1}(β) X_{t−1}(β)′),

N_n(β, γ) = (1/√n) Σ_{t=1}^n d_{1t}(β, γ) (Δx_t ⊗ X_{t−1}(β)),

N_n(β) = (1/√n) Σ_{t=1}^n (Δx_t ⊗ X_{t−1}(β)).
Then observe that

SupLM0 = sup_{γ_L ≤ γ ≤ γ_U} LM(β₀, γ) = sup_{π₀ ≤ r ≤ 1−π₀} LM(β₀, F⁻¹(r)),

where r = F(γ). Since LM(β₀, γ) is a function of γ only through the indicator function

1(w_{t−1} ≤ γ) = 1(U_{t−1} ≤ r)

(as U_{t−1} = F(w_{t−1})), it follows that

SupLM0 = sup_{π₀ ≤ r ≤ 1−π₀} LM0(β₀, r),

where LM0(β₀, r) = LM(β₀, F⁻¹(r)) is defined as in (17) except that all instances of 1(w_{t−1} ≤ γ) are replaced by 1(U_{t−1} ≤ r).
Under H0 and making these changes, this simplifies to

LM0(r) = S*_n(r)′ Ω*_n(r)⁻¹ S*_n(r),

where

Ω*_n(r) = Ω_n(r) − M_n(r) M_n⁻¹ Ω_n(r) − Ω_n(r) M_n⁻¹ M_n(r) + M_n(r) M_n⁻¹ Ω_n M_n⁻¹ M_n(r),

S*_n(r) = S_n(r) − M_n(r) M_n⁻¹ S_n,

M_n(r) = I_p ⊗ (1/n) Σ_{t=1}^n 1(U_{t−1} ≤ r) X_{t−1} X_{t−1}′,

M_n = I_p ⊗ (1/n) Σ_{t=1}^n X_{t−1} X_{t−1}′,

Ω_n(r) = (1/n) Σ_{t=1}^n 1(U_{t−1} ≤ r) (u_t u_t′ ⊗ X_{t−1} X_{t−1}′),

Ω_n = (1/n) Σ_{t=1}^n (u_t u_t′ ⊗ X_{t−1} X_{t−1}′),
S_n(r) = (1/√n) Σ_{t=1}^n 1(U_{t−1} ≤ r) (u_t ⊗ X_{t−1}),

S_n = (1/√n) Σ_{t=1}^n (u_t ⊗ X_{t−1}).
The stated result then follows from the joint convergence

M_n(r) ⇒ M(r),    Ω_n(r) ⇒ Ω(r),    S_n(r) ⇒ S(r),

which follows from Theorem 3 of Hansen (1996), which holds under our stated assumptions.
Proof of Theorem 2. First, let w_{t−1}(ν) = w_{t−1}(β₀ + ν/n) = w_{t−1} + n⁻¹ ν′x_{t−1}, and X_{t−1}(ν) = X_{t−1}(β₀ + ν/n). Hence

Δx_t = A′X_{t−1} + u_t = A′X_{t−1}(ν) − α (n⁻¹ ν′x_{t−1}) + u_t,

where α is the second row of A (the coefficient vector on w_{t−1}). Hence

LM(ν, γ) = N*_n(β₀ + ν/n, γ)′ Ω*_n(β₀ + ν/n, γ)⁻¹ N*_n(β₀ + ν/n, γ)

and

N*_n(β₀ + ν/n, γ) = S*_n(ν, γ) − α ⊗ (C*_n(ν, γ) ν),

where

S*_n(ν, γ) = S_n(ν, γ) − M_n(β₀ + ν/n, γ) M_n(β₀ + ν/n)⁻¹ S_n(ν),

S_n(ν, γ) = (1/√n) Σ_{t=1}^n 1(w_{t−1}(ν) ≤ γ) (u_t ⊗ X_{t−1}(ν)),

S_n(ν) = (1/√n) Σ_{t=1}^n (u_t ⊗ X_{t−1}(ν))

and

C*_n(ν, γ) = C_n(ν, γ) − M_n(β₀ + ν/n, γ) M_n(β₀ + ν/n)⁻¹ C_n(ν),

C_n(ν, γ) = (1/n^{3/2}) Σ_{t=1}^n 1(w_{t−1}(ν) ≤ γ) X_{t−1}(ν) x_{t−1}′,

C_n(ν) = (1/n^{3/2}) Σ_{t=1}^n X_{t−1}(ν) x_{t−1}′.
To complete the proof, we need to show that |Ω_n(β₀ + ν/n, γ) − Ω_n(β₀, γ)| = o_p(1), |M_n(β₀ + ν/n, γ) − M_n(β₀, γ)| = o_p(1), |S_n(β₀ + ν/n, γ) − S_n(β₀, γ)| = o_p(1), and C*_n(ν, γ) = o_p(1). First, observe that since |X_{t−1} − X_{t−1}(ν)| = |n⁻¹ ν′x_{t−1}| = O_p(n^{−1/2}), it is fairly straightforward to see that we can replace X_{t−1}(ν) by X_{t−1} with only o_p(1) error in the above expressions, and we make this substitution for the remainder of the proof.

Let E_Q be the event {n^{−1/2} sup_{t≤n} |x_{t−1}| > Q}. For any ε > 0, there is some Q < ∞ such that P(E_Q) ≤ ε. The remainder of the analysis conditions on the complementary event {n^{−1/2} sup_{t≤n} |x_{t−1}| ≤ Q}.

We next show that |M_n(β₀ + ν/n, γ) − M_n(β₀, γ)| = o_p(1). Indeed, on this event,

|M_n(β₀ + ν/n, γ) − M_n(β₀, γ)|² = |(1/n) Σ_{t=1}^n (d_{1t}(β₀ + ν/n, γ) − d_{1t}(β₀, γ)) X_{t−1} X_{t−1}′|²

≤ ((1/n) Σ_{t=1}^n |X_{t−1}|⁴) ((1/n) Σ_{t=1}^n |1(w_{t−1} + n⁻¹ ν′x_{t−1} ≤ γ) − 1(w_{t−1} ≤ γ)|)

≤ ((1/n) Σ_{t=1}^n |X_{t−1}|⁴) ((1/n) Σ_{t=1}^n 1(w_{t−1} − n^{−1/2} Q ≤ γ ≤ w_{t−1} + n^{−1/2} Q))

= o_p(1).
The proof that |Ω_n(β₀ + ν/n, γ) − Ω_n(β₀, γ)| = o_p(1) follows similarly.

Next, since u_t is a martingale difference sequence,

E|S_n(β₀ + ν/n, γ) − S_n(β₀, γ)|² = E |(1/√n) Σ_{t=1}^n (1(w_{t−1}(ν) ≤ γ) − 1(w_{t−1} ≤ γ)) (u_t ⊗ X_{t−1})|²

= E |(1(w_{t−1}(ν) ≤ γ) − 1(w_{t−1} ≤ γ)) (u_t ⊗ X_{t−1})|²

≤ E |1(w_{t−1} − n^{−1/2} Q ≤ γ ≤ w_{t−1} + n^{−1/2} Q) (u_t ⊗ X_{t−1})|² + ε

= o(1) + ε,

and ε can be made arbitrarily small.
Finally, using similar analysis, C*_n(ν, γ) = C*_n(0, γ) + o_p(1). Since x_t is I(1), n^{−1/2} x_{[nr]} ⇒ B(r), a vector Brownian motion. We can appeal to Theorem 3 of Caner and Hansen (2001), as our assumptions imply theirs (absolute regularity is stronger than strong
mixing). Hence

C_n(0, γ) = (1/n^{3/2}) Σ_{t=1}^n 1(w_{t−1} ≤ γ) X_{t−1} x_{t−1}′ ⇒ E(1(w_{t−1} ≤ γ) X_{t−1}) ∫₀¹ B′ = M(γ) e₁ ∫₀¹ B′,

where e₁ is a unit vector with first element 1 and the remainder 0. Similarly,

C_n(0) ⇒ M e₁ ∫₀¹ B′,

and hence

C*_n(ν, γ) = C*_n(0, γ) + o_p(1) = C_n(0, γ) − M_n(β₀, γ) M_n(β₀)⁻¹ C_n(0) + o_p(1)
           ⇒ M(γ) e₁ ∫₀¹ B′ − M(γ) M⁻¹ M e₁ ∫₀¹ B′ = 0.

This completes the proof.
References
Andrews, D.W.K., 1993. Tests for parameter instability and structural change with unknown change point.
Econometrica 61, 821–856.
Andrews, D.W.K., Ploberger, W., 1994. Optimal tests when a nuisance parameter is present only under the
alternative. Econometrica 62, 1383–1414.
Balke, N.S., Fomby, T.B., 1997. Threshold cointegration. International Economic Review 38, 627–645.
Balke, N.S., Wohar, M.E., 1998. Nonlinear dynamics and covered interest rate parity. Empirical Economics 23, 535–559.
Baum, C.F., Karasulu, M., 1998. Modelling federal reserve discount policy. Computational Economics 11, 53–70.
Baum, C.F., Barkoulas, J.T., Caglayan, M., 2001. Nonlinear adjustment to purchasing power parity in the
post-Bretton Woods era. Journal of International Money and Finance 20, 379–399.
Campbell, J.Y., 1995. Some lessons from the yield curve. Journal of Economic Perspectives 9, 129–152.
Campbell, J.Y., Shiller, R.J., 1987. Cointegration and tests of present value models. Journal of Political
Economy 95, 1062–1088.
Campbell, J.Y., Shiller, R.J., 1991. Yield spreads and interest rate movements: a bird's eye view. Review of Economic Studies 58, 495–514.
Caner, M., Hansen, B.E., 2001. Threshold autoregression with a unit root. Econometrica 69, 1555–1596.
Chan, K.S., 1993. Consistency and limiting distribution of the least squares estimator of a threshold
autoregressive model. The Annals of Statistics 21, 520–533.
Davies, R.B., 1987. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 74, 33–43.
Dorsey, R.E., Mayer, W.J., 1995. Genetic algorithms for estimation problems with multiple optima, no differentiability, and other irregular features. Journal of Business and Economic Statistics 13, 53–66.
8/7/2019 hansen & seo
http://slidepdf.com/reader/full/hansen-seo 26/26
318 B.E. Hansen, B. Seo/ Journal of Econometrics 110 (2002) 293 – 318
Enders, W., Falk, B., 1998. Threshold-autoregressive, median-unbiased, and cointegration tests of purchasing
power parity. International Journal of Forecasting 14, 171–186.
Gine, E., Zinn, J., 1990. Bootstrapping general empirical measures. The Annals of Probability 18, 851–869.
Hansen, B.E., 1996. Inference when a nuisance parameter is not identified under the null hypothesis. Econometrica 64, 413–430.
Hansen, B.E., 2000a. Sample splitting and threshold estimation. Econometrica 68, 575–603.
Hansen, B.E., 2000b. Testing for structural change in conditional models. Journal of Econometrics 97,
93–115.
Lo, M., Zivot, E., 2001. Threshold cointegration and nonlinear adjustment to the law of one price.
Macroeconomic Dynamics 5, 533–576.
Martens, M., Kofman, P., Vorst, T.C.F., 1998. A threshold error-correction model for intraday futures and
index returns. Journal of Applied Econometrics 13, 245–263.
McCulloch, J.H., Kwon, H.-C., 1993. US term structure data, 1947–1991. Ohio State University Working
Paper No. 93-6.
Michael, M., Nobay, R., Peel, D.A., 1997. Transactions costs and nonlinear adjustment in real exchange rates: an empirical investigation. Journal of Political Economy 105, 862–879.
Obstfeld, M., Taylor, A.M., 1997. Nonlinear aspects of goods market arbitrage and adjustment: Heckscher's commodity points revisited. Journal of the Japanese and International Economies 11, 441–479.
O’Connell, P.G.J., 1998. Market frictions and real exchange rates. Journal of International Money and Finance
17, 71–95.
O'Connell, P.G.J., Wei, S.-J., 1997. The bigger they are, the harder they fall: how price differences across U.S. cities are arbitraged. NBER Working Paper No. 6089.
Pippenger, M.K., Goering, G.E., 2000. Additional results on the power of unit root and cointegration tests
under threshold processes. Applied Economics Letters 7, 641–644.
Seo, B., 1998. Tests for structural change in cointegrated systems. Econometric Theory 14, 221–258.
Taylor, A.M., 2001. Potential pitfalls for the purchasing-power-parity puzzle? Sampling and specification biases in mean-reversion tests of the law of one price. Econometrica 69, 473–498.
Tsay, R.S., 1989. Testing and modeling threshold autoregressive processes. Journal of the American Statistical
Association 84, 231–240.
Tsay, R.S., 1998. Testing and modeling multivariate threshold models. Journal of the American Statistical
Association 93, 1188–1998.
White, H., 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for
heteroskedasticity. Econometrica 48, 817–838.