Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
A Spatial Dynamic Panel Data Model with Both Time and
Individual Fixed E¤ects
Lung-fei Lee�
Department of EconomicsThe Ohio State University
Jihai YuDepartment of EconomicsThe Ohio State University
July 18, 2007
Abstract
This paper establishes asymptotic properties of quasi-maximum likelihood estimators for spatial dy-
namic panel data with both time and individual �xed e¤ects when both the number of individuals n
and the number of time periods T can be large. Instead of using the direct approach where we estimate
both individual e¤ects and time e¤ects directly, we propose a data transformation approach to eliminate
the time e¤ects so that the bias of the order O(n�1) is avoided. When T is relatively larger than n,
the estimators arepnT consistent and asymptotically centered normal; when n is asymptotically pro-
portional to T , the estimators arepnT consistent and asymptotically normal, but the limit distribution
is not centered around 0; when T is relatively smaller than n, the estimators are consistent with rate
T and have a degenerate limit distribution. We also propose a bias correction for our estimators. We
show that when T grows faster than n1=3, the correction will asymptotically eliminate the bias and yield
a centered con�dence interval. This transformation approach has advantage over the direct approach
especially when n is relatively small as the direct approach has the bias of order O(n�1) remained.
JEL classi�cation: C13; C23
Keywords: Spatial autoregression, Dynamic panels, Fixed e¤ects, Time e¤ects, Quasi-maximum like-
lihood estimation, Bias correction
�We acknowledge �nancial support for the research from NSF under Grant No. SES-0519204.
1
1 Introduction
This paper investigates the properties of maximum likelihood (ML) estimator and quasi-maximum like-
lihood (QML) estimator for spatial dynamic panel data models with both individual e¤ects and time e¤ects
when both the number of individuals n and the number of time periods T can be large.
Recently, there is a growing literature on the estimation of dynamic panel data models when both n and
T are large (see Phillips and Moon (1999), Hahn and Kuersteiner (2002), Hahn and Newey (2004), etc).
For the panel data with spatial interaction, we have Baltagi, Song and Koh (2003), Kapoor, Kelejian and
Prucha (2004) and Yu, de Jong and Lee (2006), etc. Those papers focus on models with only individual
e¤ects. While in the panel data literature, we also have the two way error component regression model where
we have not only unobservable individual e¤ects but also unobserved time e¤ects (See Wallace and Hussain
(1969), Nerlove (1971) and Amemiya (1971), etc). Hence, it is natural to study the spatial dynamic panel
data models when both n and T are large with both individual e¤ects and time e¤ects.
In Yu, de Jong and Lee (2006), the consistency and asymptotic distribution of the QML estimator are
established when individual e¤ects are included. Also, a bias correction procedure for the estimator is
proposed. It is shown that as long as T grows faster than n1=3, the correction will asymptotically eliminate
the bias of the order O(T�1) and yield a centered con�dence interval. This approach can be directly
extended to the estimation of models with both individual and time �xed e¤ects, where �direct�means we
estimate also the �xed time e¤ects jointly with other parameters. When there are also time e¤ects included
in the model, we might have additional bias of the order O(n�1) in the estimation due to the presence
of time e¤ects if we estimate the time e¤ects directly (see Theorem 4.2). In this paper, we propose an
transformation procedure to eliminate the time e¤ects that can avoid the additional O(n�1) order bias with
the same asymptotic e¢ ciency as the direct QML estimates when n is not relatively smaller than T . Our
transformation procedure is particularly useful when n is relatively smaller than T . For the latter, the
estimates of the transformed approach has faster rates of convergence than that of the direct estimates. The
direct estimates have a degenerate limit distribution but the transformed estimates are properly centered
and are asymptotically normal.
This paper is organized as follows. In Section 2, the model is introduced and the data transformation
procedure is proposed. We then explain our method of estimation, which is a concentrated quasi-maximum
likelihood estimation. In Section 3, we establish the consistency and asymptotic distribution of the QML
estimator of the transformation approach. A bias correction procedure is proposed and the simulation result
is reported. Section 4 presents the asymptotic properties of the direct approach estimator and compares the
two approaches. Section 5 concludes the paper. Some useful lemmas and proofs are collected in Appendix.
2 The Model
2.1 Data Generating Process and Data Transformation
The model considered in this paper is
2
Ynt = �0WnYnt + 0Yn;t�1 + �0WnYn;t�1 +Xnt�0 + cn0 + �t0ln + Vnt; t = 1; 2; :::; T , (2.1)
where Ynt = (y1t; y2t; :::; ynt)0 and Vnt = (v1t; v2t; :::; vnt)0 are n � 1 column vectors and vit is i:i:d: across iand t with zero mean and variance �20. Also,Wn is an n�n spatial weights matrix which is nonstochastic andgenerates the spatial dependence between cross sectional units yit, Xnt is an n� kx matrix of nonstochasticregressors, cn0 is n� 1 column vector of individual �xed e¤ects, �t0 is a scalar of time e¤ect and ln is n� 1column vector of ones1 . Therefore, the total number of parameters in this model is equal to the sum of the
number of individuals n and the number of time periods T , plus the dimension of the common parameters
( ; �; �0; �; �2)0, which is kx + 4.
Wn is usually row normalized from a symmetric matrix such that its ith row is
[cn;i1; cn;i2; � � � ; cn;in]=Xn
j=1cn;ij , (2.2)
where cn;ij represents a function of the spatial distance of di¤erent units in some space. As a normalization,
cn;ii = 0. It is a common practice in empirical work that Wn is row normalized, which ensures that all the
weights are between 0 and 1 and weighting operations can be interpreted as an average of the neighboring
values. Also, a spatial weights matrix row normalized has the property Wnln = ln.
De�ne Sn(�) = In��Wn and Sn � Sn(�0) = In��0Wn. Then, presuming Sn is invertible and denoting
An = S�1n ( 0In + �0Wn), (2.1) can be rewritten as
Ynt = AnYn;t�1 + S�1n Xnt�0 + S
�1n cn0 + �t0S
�1n ln + S
�1n Vnt. (2.3)
Assuming that the in�nite sums are well-de�ned, by continuous substitution of (2.3),
Ynt =1Xh=0
AhnS�1n (cn0 +Xn;t�h�0 + �t0ln + Vn;t�h) = �n + Xnt�0 + �t0ln
1Xh=0
� 0 + �01� �0
�h+ Unt, (2.4)
where �n �1Ph=0
AhnS�1n cn0, Xnt �
1Ph=0
AhnS�1n Xn;t�h and Unt �
1Ph=0
AhnS�1n Vn;t�h.
One way to estimate (2.1) is to estimate all the parameters including the time e¤ects and individual
e¤ects in the model, which will yield a bias of the order O(max(n�1; T�1)) for the common parameters (see
Theorem 4.2). In this paper, we will introduce a data transformation approach (to eliminate the time e¤ects)
such that we can avoid the bias of the order O(n�1), where the estimator has the same asymptotic e¢ ciency
as the direct QML estimator when n is not relatively smaller than T . This transformation procedure is
particularly useful when n is relatively smaller than T where the estimates of the transformed approach has
faster rates of convergence than that of the direct estimates. Also, when n is relatively smaller than T , the
direct estimates have a degenerate limit distribution but the transformed estimates are properly centered
and are asymptotically normal.
Let Jn = In � 1n lnl
0n be the deviation from the group mean transformation. As In = Jn +
1n lnl
0n and
Wnln = ln, we have1Due to the presence of �xed individual and time e¤ects, the Xnt will not include time invariant or individual invariant
regressors.
3
JnWn = JnWn(Jn +1
nlnl
0n) = JnWnJn,
because JnWnln = Jnln = 0. Hence, for t = 1; 2; :::; T , we have
(JnYnt) = �0(JnWn)(JnYnt)+ 0(JnYn;t�1)+ �0(JnWn)(JnYn;t�1)+ (JnXnt)�0+(Jncn0)+ (JnVnt); (2.5)
which does not involve the time e¤ects and Jncn0 can be regarded as the transformed individual e¤ects.
Thus, we can estimate �0 and Jncn0 basing on the transformed equation (2.5) where relevant variables are
premultiplied by Jn.
A special feature of the transformed equation (2.5) is that the variance matrix of JnVnt is equal to �20Jnso that the elements of JnVnt are correlated. Also, Jn is singular with rank (n � 1) as Jn is an orthogonalprojector with trace (n� 1). Hence, there is a linear dependence among the elements of JnVnt. An e¤ectiveestimation method shall eliminate the linear dependence in sample observations. This can be done with
the eigenvalues and eigenvectors decomposition as in the theory of generalized inverses for the estimation of
linear regression models (see, e.g., Theil (1971), Ch. 6).
The eigenvalues of Jn are a single zero and (n � 1) ones. An eigenvector corresponding to the zeroeigenvalue is proportional to ln. Let (Fn;n�1, ln=
pn) be the orthonormal matrix of eigenvectors of Jn where
Fn;n�1 corresponds to the eigenvalues of ones and ln=pn corresponds to the eigenvalue zero. As is derived
in Appendix A.1, the transformation of JnYnt to Y �nt, where Y�nt = F
0n;n�1JnYnt is of dimension (n�1), gives
Y �nt = �0W�nY
�nt + 0Y
�n;t�1 + �0W
�nY
�n;t�1 +X
�nt�0 + c
�n0 + V
�nt, (2.6)
where W �n = F
0n;n�1WnFn;n�1, Y �nt = F
0n;n�1JnYnt = F
0n;n�1Ynt, X
�nt = F
0n;n�1JnXnt = F
0n;n�1Xnt, c
�n0 =
F 0n;n�1Jncn0 = F 0n;n�1cn0, V�nt = F 0n;n�1JnVnt = F 0n;n�1Vnt and V
�nt is an n � 1 dimensional disturbance
vector with zero mean and variance matrix �20In�1.
2.2 The Method of Maximum Likelihood Estimation
Suppose that Vnt is normally distributed N(0; �20In), the transformed V�nt will be N(0; �
20In�1) and (2.6)
can be estimated by the ML. The log likelihood function of (2.6) for Y �nt is
lnLn;T (�; c�n) = �
(n� 1)T2
ln 2� � (n� 1)T2
ln�2 + T ln jIn�1 � �W �n j �
1
2�2
TXt=1
V �0nt(�)V�nt(�), (2.7)
where V �nt(�) = (In�1 � �W �n)Y
�nt � Z�nt� � c�n, Znt = (Yn;t�1;WnYn;t�1; Xnt) and � = ( ; �; �0)0. In order
to use Equation (2.6) for e¤ective estimation, the determinant and inverse of (In�1 � �W �n) are needed. We
shall show that their computations are not more complicated than those for the original matrix In � �Wn.
As is derived in Appendix A.2, (In�1 � �W �n) = F
0n;n�1(In � �Wn)Fn;n�1 and we have
jIn�1 � �W �n j =
1
1� � jIn � �Wnj , (2.8)
4
(In�1 � �W �n)�1 = F 0n;n�1(In � �Wn)
�1Fn;n�1. (2.9)
Also, as we have
(In�1 � �W �n)Y
�nt � Z�nt� � c�n
= F 0n;n�1(In � �Wn)Fn;n�1 � F 0n;n�1Ynt � F 0n;n�1Znt� � F 0n;n�1cn
= F 0n;n�1(In � �Wn)(In �1
nlnl
0n)Ynt � F 0n;n�1Znt� � F 0n;n�1cn
= F 0n;n�1[(In � �Wn)Ynt � Znt� � cn],
because F 0n;n�1Wnln = F0n;n�1ln = 0, it follows that
V �0nt(�)V�nt(�)
= [(In�1 � �W �n)Y
�nt � Z�nt� � c�n]0[(In�1 � �W �
n)Y�nt � Z�nt� � c�n]
= [(In � �Wn)Ynt � Znt� � cn]0Fn;n�1F 0n;n�1[(In � �Wn)Ynt � Znt� � cn]= [(In � �Wn)Ynt � Znt� � cn]0Jn[(In � �Wn)Ynt � Znt� � cn],
by the property Fn;n�1F 0n;n�1 = Jn. Hence, the log likelihood function (2.7) for Y �nt can be expressed in
terms of Ynt as
lnLn;T (�; cn) = � (n� 1)T2
ln 2� � (n� 1)T2
ln�2 � T ln(1� �) + T ln jIn � �Wnj (2.10)
� 1
2�2
TXt=1
V 0nt(�)JnVnt(�),
where Vnt(�) = (In��Wn)Ynt�Znt��cn and Jn shall be read as the generalized inverse of ��20 V ar(JnVnt).
2.3 FOC and SOC of MLE
Therefore, we will �rst transform the data from Ynt to Y �nt = F0n;n�1Ynt, then we will maximize Equation
(2.7) by searching over the parameter space. This is equivalent to the estimation of the spatial dynamic
panel model with only individual e¤ects with (n� 1) cross-section units and T time periods. Alternatively,one may maximize Equation (2.10) instead. However, although the components of Vnt are i.i.d. in the model,
the elements of V �nt might not be independent in general, even though they are uncorrelated. The original
asymptotic analysis in Yu, de Jong and Lee (2006) does not directly carry over to the transformed model
with the disturbances V �nt.2 As Equation (2.7) is equivalent to Equation (2.10), it turns out that we can
analyze the asymptotic distribution of the estimator just from Equation (2.10).
2One could not treat the components of V �nt as if they were independent when the disturbances are not normally distributed.Furthermore, it is not clear whether W �
n and A�n = (In�1 � �0W �
n)�1( 0In�1 + �0W
�n) would be uniformly bounded in both
row and column sums even Wn and An are.
5
Using �rst order conditions, we concentrate out cn in (2.10) and get3
lnLn;T (�) = � (n� 1)T2
ln 2� � (n� 1)T2
ln�2 � T ln(1� �) + T ln jIn � �Wnj (2.11)
� 1
2�2
TXt=1
~V 0nt(�)Jn ~Vnt(�),
where ~Vnt(�) = (In��Wn) ~Ynt� ~Znt� and Jn ~Vnt(�) = Jn[(In��Wn) ~Ynt� ~Znt��~�tln] because Jnln = 0. Theconcentrated likelihood function is di¤erent from (4.2) of the direct approach4 in that we have an attachment
of T2 ln 2��2 � T ln(1 � �). For the concentrated likelihood function (2.11), the �rst order derivatives and
the second order derivatives are Equation (C.3) and (C.4) in Appendix C.2. Denote
Jn ~Vnt = Jn[Sn ~Ynt � ~Znt�0 � ~�t0ln].
Hence, for the �rst order derivative evaluated at �0, denote Gn =WnS�1n , we have
1p(n� 1)T
@ lnLn;T (�0)
@�=
0BBBBBB@1�20
1p(n�1)T
TPt=1
~Z 0ntJn~Vnt
1�20
1p(n�1)T
TPt=1(Gn ~Znt�0)
0Jn ~Vnt +1�20
1p(n�1)T
TPt=1( ~V 0ntG
0nJn
~Vnt � �20trJnGn)
12�40
1p(n�1)T
TPt=1( ~V 0ntJn
~Vnt � (n� 1)�20)
1CCCCCCA ,(2.12)
which is a linear and quadratic form of ~Vnt. For the information matrix, denote ��0;nT = �E�
1(n�1)T
@2 lnLn;T (�0)@�@�0
�and HnT =
1(n�1)T
TPt=1( ~Znt; Gn ~Znt�0)
0Jn( ~Znt; Gn ~Znt�0), we have
��0;nT =1
�20
EHnT 0
0 0
!+
0B@ 0 0 0
0 1n�1
�tr(G0nJnGn) + tr((JnGn)
2)�
1�20(n�1)
tr(JnGn)
0 1�20(n�1)
tr(JnGn)12�40
1CA
�
0B@ 0 � �1
�20(n�1)E(Gn �VnT )
0Jn �ZnT2
�20(n�1)E�(Gn �ZnT �0)
0JnGn �VnT�+ 1
(n�1)T tr(G0nJnGn) �
1�40(n�1)
E��Z 0nTJn
�VnT�0 1
�40(n�1)E�(Gn �ZnT �0)
0Jn �VnT�0+ 1
�20(n�1)Ttr(JnGn)
1T
1�40
1CA .3 QML Estimators
For our analysis of the asymptotic properties of estimator, we need the following assumptions:
3For notational purpose, we de�ne for any n � 1 vector at period t, �nt, we have ~�nt = �nt � ��nT and��n;t�1 =
�n;t�1 � ��nT;�1 for t = 1; 2; � � � ; T where ��nT = 1T
TPt=1
�nt and ��nT;�1 =1T
TPt=1
�n;t�1.
4The di¤erence term can be regarded as an adjustment by the transformation approach to eliminate the bias of the orderO(n�1) which will appear in the direct approach.
6
Assumption 1. Wn is a row normalized nonstochastic spatial weights matrix.
Assumption 2. The disturbances fvitg, i = 1; 2; :::; n and t = 1; 2; :::; T; are i:i:d across i and t with
zero mean, variance �20 and E jvitj4+�
<1 for some � > 0.
Assumption 3. Sn(�) is invertible for all � 2 �. Furthermore, � is compact and the true parameter �0is in the interior of �.
Assumption 4. The elements of Xnt are nonstochastic and bounded, uniformly in n and t, and
limT!11nT
PTt=1
~X 0ntJn ~Xnt exists and is nonsingular.
Assumption 5. The row and column sums of Wn and S�1n (�) are bounded uniformly5 in n, also
uniformly in � 2 � for S�1n (�).
Assumption 6. The row and column sums ofP1
h=1 abs(Ahn) are bounded uniformly in n, where
[abs(An)]ij = jAn;ij j.Assumption 7. n is a nondecreasing function of T .Assumption 1 is a standard normalization assumption in spatial econometrics. In many empirical ap-
plications, the rows of Wn sum to 1, which ensures that all the weights are between 0 and 1. Assumption
2 provides regularity assumptions for vit. Assumption 3 guarantees that Equation (2.3) is valid. When
exogenous variables Xnt are included in the model, it is convenient to assume that the exogenous regressors
are uniformly bounded as in Assumption 4. Assumption 5 is originated by Kelejian and Prucha (1998, 2001).
The uniform boundedness ofWn and S�1n (�) is a condition that limits the spatial correlation to a manageable
degree. Assumption 6 is the absolute summability condition and row/column sum boundedness condition,
which will play an important role to derive asymptotic properties of QML estimator. This assumption is
essential for the paper because it limits the dependence between time series and between cross sectional
units. In order to justify the absolute summability of An in Equation (2.4) and Assumption 6, a su¢ cient
condition is kAnk < 1 for any matrix norm (see Horn and Johnson (1985), Corollary 5.6.16) that satis�es
kAnk = kabs (An)k. When kAnk < 1,P1
h=0Ahn exists and can be de�ned as (In � An)�1. Assumption
7 allows two cases: (i) n ! 1 as T ! 1; (ii) n is �xed as T ! 1. Because (ii) is similar to a vectorautoregressive (VAR) model, our main interest is in (i). If Assumption 7 holds, then we say that n; T !1simultaneously.
3.1 Consistency
For the log likelihood function (2.11) divided by the e¤ective sample size (n�1)T , we have correspondingQn;T (�) = Emaxcn
1(n�1)T lnLn;T (�; cn). Hence,
Qn;T (�) =1
(n� 1)T E lnLn;T (�) (3.1)
= �12ln 2� � 1
2ln�2 � 1
n� 1 ln(1� �) +1
n� 1 ln jSn(�)j �1
2�21
(n� 1)T E
TXt=1
~V 0nt(�)Jn ~Vnt(�)
!.
To get the consistency proof, we need the following uniform convergence result.
5We say the row and column sums of a (sequence of n � n) matrix Pn are bounded uniformly in n ifsup1�i�n;n�1
Pnj=1 jpij;nj <1 and sup1�j�n;n�1
Pni=1 jpij;nj <1.
7
Claim 3.1 Let � be any compact parameter space. Then under Assumptions 1-7, 1(n�1)T lnLn;T (�) �
Qn;T (�)p! 0 uniformly in � 2 � and Qn;T (�) is uniformly equicontinuous for � 2 �.
Proof. See Appendix D.1.
For identi�cation, if the information matrix�E�
1(n�1)T
@2 lnLn;T (�0)@�@�0
�is nonsingular and�E
�1
(n�1)T@2 lnLn;T (�)
@�@�0
�has full rank for any � in some neighborhood N(�0) of �0, the parameters are locally identi�ed (see Rothen-
berg (1971)). For the information matrix, using Lemma B.2 in Yu, de Jong and Lee (2006), we have
��0;nT =1
�20
EHnT 0
0 0
!+
0B@ 0 0 0
0 1n�1
�tr(G0nJnGn) + tr((JnGn)
2)�
1�20(n�1)
tr(JnGn)
0 1�20(n�1)
tr(JnGn)12�40
1CA+O� 1T
�;
(3.2)
which is nonsingular if EHnT is nonsingular or 1n�1
htr(G0nJnGn) + tr((JnGn)
2)� 2tr2(JnGn)n�1
iis nonzero6 .
Also, its rank does not change in a small neighborhood of �0 (see Equation (C.8)).
When limT!1EHnT is nonsingular, we can get the global identi�cation of the parameters.
Theorem 3.2 Under Assumptions 1-7, if limT!1EHnT is nonsingular, �0 is globally identi�ed and �nTp!
�0.
Proof. See Appendix D.3.
When limT!1EHnT is singular, global identi�cation can still be obtained from the following theorem.
Denote �2n(�) =�20n�1 tr(S
0�1n S0n(�)JnSn(�)S
�1n ).
Theorem 3.3 Under Assumptions 1-7, �0 is globally identi�ed iflimn!1
�1n ln
���20S0�1n S�1n��� 1
n ln���2n(�)S0�1n (�)S�1n (�)
��� 6= 0 for � 6= �07 . And if this condition holds,
�nTp! �0.
Proof. See Appendix D.4.
3.2 Asymptotic Distribution
As Znt = (Yn;t�1;WnYn;t�1; Xnt) where Ynt is speci�ed in Equation (2.4), we can decompose Jn ~Znt such
that
Jn ~Znt = Jn ~Z(u)nt � (Jn �UnT;�1; JnWn
�UnT;�1; 0), (3.3)
where ~Z(u)nt = ((�Xn;t�1+Un;t�1); (Wn
�Xn;t�1+WnUn;t�1); ~Xnt) with
�Xn;t�1 = Xn;t�1� �XnT;�1. Hence, Jn ~Znt
has two components: one is Jn ~Z(u)nt , which is uncorrelated with Vnt; the other is�(Jn �UnT;�1; JnWn
�UnT;�1; 0),
6See Appendix D.2 for proof.
7When n is �nite, the condition is 1nln����20S0�1n S�1n
���� 1nln����2n(�)S0�1n (�)S�1n (�)
���+ � 1nln
�20(1��0)2
� 1nln
�2n(�)
(1��)2
�6= 0 for
� 6= �0. See Appendix D.4.
8
which is correlated with Vnt when t � T � 1. Therefore, from Equation (2.12), the score can be decomposed
into two parts such that
1p(n� 1)T
@ lnLn;T (�0)
@�=
1p(n� 1)T
@ lnL(u)n;T (�0)
@���nT , (3.4)
where
1p(n� 1)T
@ lnL(u)n;T (�0)
@�=
0BBBBBB@1�20
1p(n�1)T
TPt=1
~Z(u)0nt JnVnt
1�20
1p(n�1)T
TPt=1(Gn ~Z
(u)nt �0)
0JnVnt +1�20
1p(n�1)T
TPt=1(V 0ntG
0nJnVnt � �20trJnGn)
12�40
1p(n�1)T
TPt=1(V 0ntJnVnt � (n� 1)�20)
1CCCCCCA ,(3.5)
�nT =
rn� 1T
0BB@1�20
Tn�1 (Jn
�UnT;�1; JnWn�UnT;�1; 0)
0 �VnT1�20
Tn�1 (JnGn(
�UnT;�1; Wn�UnT;�1, 0)�0)0 �VnT + 1
�20
Tn�1
�V 0nTG0nJn �VnT
12�40
Tn�1
�V 0nTJn�VnT
1CCA . (3.6)
As is derived in Appendix C.3, the variance matrix of 1p(n�1)T
@ lnL(u)n;T (�0)
@� is equal to
E
1p
(n� 1)T@ lnL
(u)n;T (�0)
@�� 1p
(n� 1)T@ lnL
(u)n;T (�0)
@�0
!= ��0;nT +�0;n +O
�T�1
�, (3.7)
where ��0;nT is in (3.2) and �0;n =�4�3�40�40
0BBB@0 0 0
0 1n�1
nPi=1
[(JnGn)2]ii
12�20(n�1)
tr(JnGn)
0 12�20(n�1)
tr(JnGn)14�40
1CCCA is a symmetricmatrix with �4 being the fourth moment of vit. When Vnt are normally distributed, �0;n = 0 because
�4 � 3�40 = 0 for a normal distribution. Denote ��0 = limT!1��0;nT and �0 = limT!1�0;n, then,
limT!1
E
1p
(n� 1)T@ lnL
(u)n;T (�0)
@�� 1p
(n� 1)T@ lnL
(u)n;T (�0)
@�0
!= ��0 +�0 . (3.8)
The asymptotic distribution of 1p(n�1)T
@ lnL(u)n;T (�0)
@� can be derived from the central limit theorem for
martingale di¤erence arrays (Theorem B.3). For the term �nT , from Equation (B.3) in Lemma B.1 and
Equation (B.5) in Theorem B.2, �nT =q
n�1T a�0;n +O(
qn�1T 3 ) +Op(
1pT) where
a�0;n =
0BBBBBB@
1n�1 tr
��JnP1
h=0Ahn
�S�1n
�1
n�1 tr�Wn
�JnP1
h=0Ahn
�S�1n
�0kx�1
1n�1 0tr(Gn
�JnP1
h=0Ahn
�S�1n ) + 1
n�1�0tr(GnWn
�JnP1
h=0Ahn
�S�1n ) + 1
n�1 trJnGn12�20
1CCCCCCA (3.9)
9
is O(1). To get the asymptotic distribution of the score, we need the following additional assumption.
Assumption 8. limT!1EHnT is nonsingular or limn!11
n�1
htr(G0nJnGn) + tr((JnGn)
2)� 2tr2(JnGn)n�1
i6=
0.
Assumption 8 is a condition for the nonsingularity of the information matrix ��0 . When limT!1EHnT
is singular8 , as long as we have limn!11
n�1 trh(G0nJnGn) + tr((JnGn)
2)� 2tr2(JnGn)n�1
i6= 0, the information
matrix ��0 is still nonsingular (see Appendix D.2).
Claim 3.4 Under Assumptions 1-8,
1p(n� 1)T
@ lnLn;T (�0)
@�+�nT
d! N(0;��0 +�0). (3.10)
When fvitg, i = 1; 2; :::; n and t = 1; 2; :::; T; are normal, 1p(n�1)T
@ lnLn;T (�0)@� +�nT
d! N(0;��0).
Proof. See Appendix D.5.
Claim 3.5 Under Assumptions 1-8,
1
(n� 1)T@2 lnLn;T (�)
@�@�0� 1
(n� 1)T@2 lnLn;T (�0)
@�@�0= k� � �0k �Op(1), (3.11)
and1
(n� 1)T@2 lnLn;T (�0)
@�@�0� @
2Qn;T (�0)
@�@�0= Op
1p
(n� 1)T
!. (3.12)
Proof. See Appendix D.6.
Using Claim 3.4 and Claim 3.5, we have the following theorem for the distribution of �nT .
Theorem 3.6 Under Assumptions 1-8,
p(n� 1)T (�nT � �0) +
rn� 1T
b�0;nT +Op
max
rn� 1T 3
;
r1
T
!!d! N(0;��1�0 (��0 +�0)�
�1�0), (3.13)
where b�0;nT = ��1�0;nT
a�0;n is O(1).
When nT ! 0, p
nT (�nT � �0)d! N(0;��1�0 (��0 +�0)�
�1�0). (3.14)
When nT ! k <1, p
nT (�nT � �0) +pkb�0;nT
d! N(0;��1�0 (��0 +�0)��1�0). (3.15)
When nT !1,
T (�nT � �0) + b�0;nTp! 0. (3.16)
Proof. See Appendix D.7.8The limT!1 EHnT can be singular if, for example,�0 = 0.
10
From Equation (3:13), the QML estimator has the bias � 1T b�0;nT where
b�0;nT = ��1�0;nT
� a�0;n (3.17)
and the con�dence interval is not centered when nT ! k where 0 < k <1. Furthermore, when T is relatively
smaller than n, the presence of b�0;nT causes �nT to have a degenerated distribution. An analytical bias
reduction procedure is to correct the bias of the estimate. De�ne the bias corrected estimator as
�1
nT = �nT �BnTT, (3.18)
where, from Theorem 3.6,
BnT =h���1�;nT � a�;n
i����=�nT
. (3.19)
We will show that when n=T 3 ! 0, �1
nT ispnT consistent and asymptotically centered normal even when
n=T !1.To show our result for the bias corrected estimator, we need the following additional assumption.
Assumption 9.P1
h=0Ahn(�) and
P1h=1 hA
h�1n (�) are uniformly bounded in either row sum or column
sums, uniformly in a neighborhood of �0.
Assumption 9 can be veri�ed through the following lemma.
Lemma 3.7 If kAn(�0))k1 < 1 (resp: kAn(�0))k1 < 1), then the row sum (resp: column sum) ofP1
h=0Ahn(�)
andP1
h=1 hAh�1n (�) are bounded uniformly in n and in a neighborhood of �0.
Proof. This is Lemma 3.9 in Yu, de Jong and Lee (2006).
Our result for the bias corrected estimator is as follows.
Theorem 3.8 Under Assumptions 1-9, if nT 3 ! 0,
pnT (�
1
nT � �0)d! N(0;��1�0 +�
�1�0�0�
�1�0). (3.20)
Proof. See Appendix D.8.
3.3 Monte Carlo Results
We conduct a small Monte Carlo experiment to evaluate the performance of our ML estimator and
the bias corrected estimator for the transformation approach. We generate samples from Equation (2.1)
using �a0 = (0:2; 0:2; 1; 0:2; 1)0 and �b0 = (0:3; 0:3; 1; 0:3; 1)0 where �0 = ( 0; �0; �00; �0; �
20)0, and Xnt; cn0,
�T0 = (�1; �2; � � � ; �T ) and Vnt are generated from independent normal distributions9 and the spatial
weights matrix we use is a rook matrix. We use T = 10, 50, and n = 16, 49. For each set of generated
sample observations, we calculate the ML estimator �nT and evaluate the bias �nT � �0; we then constructthe bias corrected estimator �
1
nT and evaluate the bias �1
nT ��0. We do this for 1000 times to see if the bias is9We generated the spatial panel data with 20 + T periods and then take the last T periods as our sample. And the initial
value is generated as N(0; In) in the simulation. We have also generated the data with a much longer history 1000+ T and theresults are similar.
11
reduced on average by using the analytical bias correction procedure, i.e., to compare 11000
P1000i=1 (�nT � �0)i
with 11000
P1000i=1 (�
1
nT � �0)i. We also compare not only the empirical standard error of estimator but alsothe average of estimated standard error of the estimator10 . With two di¤erent values of �0 for each n and
T , �nite sample properties of both estimators are summarized in Table 1 and Table 3, where Table 1 is for
the magnitude of biases and Table 3 is for the standard errors of the estimator.
We can see that both estimators have some biases, but the bias corrected estimators reduce those biases.
This is consistent with our asymptotic analysis, because the bias corrected estimator will eliminate the bias
of order O(T�1). Also, the bias reduction is achieved while there is no signi�cant increase in the variance
of the estimator, as can be seen from Table 3. Also, the estimated variances and the empirical variances are
close, so the estimated variances constructed from the Hessian matrix may be reliable.
For di¤erent cases of n and T , we can see that for each given n, when T is larger, the biases of the two
sets of estimators will be smaller and the variance will be smaller; for each given T , when n is larger, the
biases of the two sets of estimators will not be smaller but the variance will be smaller. This is consistent
with our theoretical prediction, because the bias is of the order O(T�1) and the variance of the estimator is
of the order O((nT )�1).
4 An Alternative Approach: Direct Estimation
An alternative approach is to estimate both the individual e¤ects and the time e¤ects directly11 , which
will yields bias of the order O(max(n�1; T�1)).
4.1 Model, Likelihood Function, FOC and SOC
Denote �T = (�1; �2; � � � ; �T ) as the time e¤ects, the likelihood function of (2.1) is
lnLdn;T (�; cn;�T ) = �nT
2ln 2� � nT
2ln�2 + T ln jSn(�)j �
1
2�2
TXt=1
V 0nt(�; cn;�T )Vnt(�; cn;�T ), (4.1)
where Vnt(�; cn;�T ) = Sn(�)Ynt � Znt� � cn � �tln. We will use the concentrating approach to estimatethe common parameters. As is derived in Appendix E.1, the likelihood function with both cn and �Tconcentrated out is
lnLdn;T (�) = �nT
2ln 2� � nT
2ln�2 + T ln jSn(�)j �
1
2�2
TXt=1
~V 0nt(�)Jn ~Vnt(�). (4.2)
10The empirical variance is the diagonal elements of 11000
P1000i=1 [(�nT �mean(�nT ))(�nT �mean(�nT ))0]i and the estimated
variance is the diagonal elements of 11000
P1000i=1 (�
�1�nT ;nT
)i.11Remark: There is an underidenti�cation problem on cn and �T . Because components of cn and �T appear additively,
there is a normalization issue in terms of location. By adding a constant to all components of cn and subtracting the sameconstant from all components of �T , the di¤erent sets of �xed e¤ects can not be identi�ed. So a proper normalization is neededin order to identify �xed e¤ects. A convenient location normalization is to set the sum of the components of cn to be zero, i.e.,the elements of cn have a zero (empirical) mean.
12
For the concentrated likelihood function (4.2), the �rst order derivatives are in (E.3) and (E.4) in Ap-
pendix E.1. Hence, for the �rst order derivative evaluated at �0, we have
1pnT
@ lnLdn;T (�0)
@�=
0BBBBBB@1�20
1pnT
TPt=1
~Z 0ntJn ~Vnt
1�20
1pnT
TPt=1(Gn ~Znt�0)
0Jn ~Vnt +1�20
1pnT
TPt=1( ~V 0ntG
0nJn
~Vnt � �20trGn)
12�40
1pnT
TPt=1( ~V 0ntJn
~Vnt � n�20)
1CCCCCCA , (4.3)
which is a linear and quadratic form of ~Vnt. For the information matrix of (4.2), denote �d�0;nT = �E�
1nT
@2 lnLdn;T (�0)
@�@�0
�and Hd
nT =1nT
TPt=1( ~Znt; Gn ~Znt�0)
0Jn( ~Znt; Gn ~Znt�0), we have12
�d�0;nT =1
�20
EHd
nT 0
0 0
!+
0B@ 0 0 0
0 1n
�tr(G0nJnGn) + tr(G
2n)�
1�20ntr(JnGn)
0 1�20ntr(JnGn)
12�40
1CA
�
0B@ 0 � �1�20nE(Gn �VnT )
0Jn �ZnT2�20nE�(Gn �ZnT �0)
0JnGn �VnT�+ 1
nT tr(G0nJnGn) �
1�40nE��Z 0nTJn
�VnT�0 1
�40nE�(Gn �ZnT �0)
0Jn �VnT�0+ 1
�20nTtr(JnGn)
1T�40
1CA .Here, Hd
nT is the covariance matrix of regressors of the reduced form (C.1) after demeaning from both the
time dimension and the cross sectional dimension (see Appendix C.1 for the reduced form).
4.2 Asymptotic Properties
Similarly as the transformation approach, we can get the asymptotic properties of the QML estimator.
To do that, we need to strengthen Assumption 7 such that both n and T are large. This is so in order that
the time dummies can be consistently estimated. But, compared to the transformation approach, we don�t
need the row normalization of the spatial weights matrix.
Assumption 1�. Wn is a nonstochastic spatial weights matrix.
Assumption 7�. n is an increasing function of T .
Theorem 4.1 Under Assumptions 1�,2�6,7�, if limT!1EHdnT is nonsingular or
limn!1
�1
nln���20S0�1n S�1n
��� 1
nln���2n(�)S0�1n (�)S�1n (�)
��� 6= 0 for � 6= �0,�0 is globally identi�ed and �
d
nTp! �0.
Proof. See Appendix E.3.
12The HdnT di¤ers from HnT in the transformation approach only in the division by n in Hd
nT instead of (n� 1) in HnT .
13
From (4.3) and (3.3), the score can be decomposed into 3 parts such that
1pnT
@ lnLdn;T (�0)
@�=
1pnT
@ lnLd(u)n;T (�0)
@���1;nT ��2;nT , (4.4)
where
1pnT
@ lnLd(u)n;T (�0)
@�=
0BBBBBB@1�20
1pnT
TPt=1
~Z(u)0nt JnVnt
1�20
1pnT
TPt=1(Gn ~Z
(u)nt �0)
0JnVnt +1�20
1pnT
TPt=1(V 0ntG
0nJnVnt � �20trJnGn)
12�40
1pnT
TPt=1(V 0ntJnVnt � (n� 1)�20)
1CCCCCCA , (4.5)
�1;nT =
rn
T
0BB@1�20
Tn (Jn
�UnT;�1; JnWn�UnT;�1; 0)
0 �VnT1�20
Tn (JnGn(
�UnT;�1; Wn�UnT;�1, 0)�0)0 �VnT + 1
�20
Tn�V 0nTG
0nJn
�VnT12�40
Tn�V 0nTJn
�VnT
1CCA , (4.6)
�2;nT =
0BB@0q
Tn (trGn � trJnGn)q
Tn
12�20
1CCA =
rT
n
0B@ 01
1��012�20
1CA . (4.7)
The asymptotic distribution of 1pnT
@ lnLd(u)n;T (�0)
@� can be derived from the central limit theorem for mar-
tingale di¤erence arrays (Theorem B.3). For the term �1;nT , from Theorem B.2, �1;nT =p
nT a�0;n;1 +
O(p
nT 3 ) +Op(
1pT) where
a�0;n;1 =
0BBBBBB@
1n tr
��JnP1
h=0Ahn
�S�1n
�1n tr
�Wn
�JnP1
h=0Ahn
�S�1n
�0
1n 0tr(Gn
�JnP1
h=0Ahn
�S�1n ) + 1
n�0tr(GnWn
�JnP1
h=0Ahn
�S�1n ) + 1
n trJnGnn�1n
12�20
1CCCCCCA (4.8)
is O(1) and
a�0;2 = (0;1
1� �0;1
2�20)0. (4.9)
The distribution of the QML estimator is as follows.
Theorem 4.2 Under Assumption 1�,2-6,7�,8,
pnT (�
d
nT ��0)+rn
Tb�0;nT;1+
rT
nb�0;nT;2+Op
max
rn
T 3;
rT
n3;
r1
T
!!d! N(0;��1�0 (��0+�0)�
�1�0),
(4.10)
where b�0;nT;1 = (�d�0;nT
)�1a�0;n;1 and b�0;nT;2 = (�d�0;nT
)�1a�0;2 are O(1) and a�0;n;1, a�0;2 are de�ned in
(4.8) and (4.9).
14
When nT ! 0,
n(�d
nT � �0) + b�0;nT;2p! 0. (4.11)
When nT ! k <1,
pnT (�
d
nT � �0) +pkb�0;nT;1 +
pk�1b�0;nT;2
d! N(0;��1�0 (��0 +�0)��1�0). (4.12)
When nT !1,
T (�d
nT � �0) + b�0;nT;1p! 0. (4.13)
Proof. See Appendix E.4.
From Equation (4:10), the QML estimator has the bias � 1T b�0;nT;1 and �
1nb�0;nT;2 where
b�0;nT;1 = (�d�0;nT )�1 � a�0;n;1, (4.14)
b�0;nT;2 = (�d�0;nT )�1 � a�0;2, (4.15)
and the con�dence interval is not centered when nT ! k where 0 < k <1. Furthermore, when T and n do
not have the same order, the presence of b�0;nT;1 and b�0;nT;2 causes �d
nT to have a degenerated distribution.
An analytical bias reduction procedure is to correct the bias B1;nT = �b�0;nT;1 and B2;nT = �b�0;nT;2 byconstructing estimator B1;nT , B2;nT and de�ning the bias corrected estimator as
�d1
nT = �d
nT �B1;nTT
� B2;nTn
. (4.16)
From Equation 4.10, we can choose
B1;nT =��(�d�;nT )�1 � a�;n;1
����=�
dnT
, (4.17)
B2;nT =��(�d�;nT )�1 � a�;2
����=�
dnT
. (4.18)
We show that when n=T 3 ! 0 and T=n3 ! 0, �d1
nT ispnT consistent and asymptotically centered normal.
Our result for the bias corrected estimator is as follows.
Theorem 4.3 Under Assumption 1�,2-6,7�,8,9, if nT 3 ! 0 and T
n3 ! 0,pnT (�
d1
nT � �0)d! N(0;��1�0 +�
�1�0�0�
�1�0). (4.19)
Proof. See Appendix E.5.
By comparing Theorem 4.2, Theorem 4.3 with Theorem 3.6 and Theorem 3.8, we can see that both esti-
mators are consistent and have the same limiting distribution. For the transformed approach, the estimator
has the bias of the order O(T�1) and the bias correction requires n=T 3 ! 0. For the direct approach, the
estimator has the bias of the order O(max(n�1; T�1)) and the bias correction requires not only n=T 3 ! 0
but also T=n3 ! 0. Hence, the transformed approach has advantage over the direct approach especially
when n is relatively small. But the direct approach also has its own merit in that: we don�t need the weights
matrix to be row normalized. We conduct a small Monte Carlo to compare the two estimators where we use
the same simulated data. Table 2 is the counterpart of Table 1 and Table 4 is the counterpart of Table 3.
From the tables, we can see that for the biases of estimates, the transformation approach is smaller than the
direct approach when n is relatively small.
15
Table 1: Biases of Transformation Approach Estimators
Case Bias of �nT (1st line) and �1
nT (2nd line)
T n �0 � � � �2
(1) 10 49 �a0 -0.0627 -0.0045 -0.0101 -0.0013 -0.1159
-0.0038 -0.0023 -0.0026 0.0004 -0.0276
(2) 10 49 �b0 -0.0697 -0.0093 -0.0132 -0.0086 -0.1183
-0.0037 -0.0001 -0.0023 -0.0029 -0.0307
(3) 50 49 �a0 -0.0119 -0.0013 0.0001 0.0000 -0.0217
0.0000 -0.0013 0.0004 0.0001 -0.0022
(4) 50 49 �b0 -0.0131 -0.0017 -0.0001 -0.0006 -0.0218
-0.0000 -0.0013 0.0004 -0.0004 -0.0023
(5) 50 16 �a0 -0.0121 -0.0032 -0.0013 -0.0018 -0.0247
-0.0002 -0.0033 -0.0009 -0.0018 -0.0052
(6) 50 16 �b0 -0.0134 -0.0043 -0.0013 -0.0017 -0.0247
-0.0004 -0.0042 -0.0008 -0.0015 -0.0052
Note: �a0 = (0:2, 0:2, 1, 0:2, 1) and �b0 = (0:3, 0:3, 1, 0:3, 1).
Table 2: Biases of Direct Approach Estimators
Case Bias of �d
nT (1st line) and �d1
nT (2nd line)
T n �0 � � � �2
(1) 10 49 �a0 -0.0610 -0.0014 -0.0092 -0.0312 -0.1318
-0.0035 0.0026 -0.0022 -0.0009 -0.0298
(2) 10 49 �b0 -0.0669 -0.0014 -0.0117 -0.0391 -0.1327
-0.0032 0.0115 -0.0015 -0.0008 -0.0333
(3) 50 49 �a0 -0.0100 0.0051 0.0010 -0.0296 -0.0394
0.0002 0.0001 0.0007 -0.0019 -0.0031
(4) 50 49 �b0 -0.0099 0.0108 0.0016 -0.0297 -0.0379
0.0002 0.0014 0.0007 -0.0007 -0.0033
(5) 50 16 �a0 -0.0067 0.0115 -0.0030 -0.0973 -0.0842
0.0012 0.0013 0.0006 -0.0199 -0.0130
(6) 50 16 �b0 -0.0032 0.0272 -0.0008 -0.1038 -0.0785
0.0026 0.0082 0.0014 -0.0213 -0.0117
Note: �a0 = (0:2, 0:2, 1, 0:2, 1) and �b0 = (0:3, 0:3, 1, 0:3, 1).
16
Table 3: Standard Errors of Transformation Approach Estimators
Case Empirical S.E. of �nT (1st line) and �1
nT (2nd line)
T n �0 � � � �2
(1) 10 49 �a0 0.0351 0.0666 0.0482 0.0539 0.0610
0.0373 0.0707 0.0484 0.0540 0.0671
(2) 10 49 �b0 0.0349 0.0643 0.0484 0.0514 0.0610
0.0375 0.0711 0.0488 0.0523 0.0671
(3) 50 49 �a0 0.0149 0.0270 0.0203 0.0228 0.0297
0.0150 0.0273 0.0203 0.0228 0.0303
(4) 50 49 �b0 0.0147 0.0252 0.0203 0.0215 0.0298
0.0149 0.0256 0.0203 0.0215 0.0304
(5) 50 16 �a0 0.0248 0.0504 0.0372 0.0412 0.0519
0.0250 0.0510 0.0372 0.0412 0.0529
(6) 50 16 �b0 0.0239 0.0484 0.0373 0.0403 0.0518
0.0241 0.0492 0.0372 0.0403 0.0529
Case Estimated S.E. of �nT (1st line) and �1
nT (2nd line)
T n �0 � � � �2
(1) 10 49 �a0 0.0324 0.0621 0.0456 0.0504 0.0572
0.0340 0.0654 0.0478 0.0510 0.0629
(2) 10 49 �b0 0.0323 0.0602 0.0457 0.0487 0.0571
0.0339 0.0639 0.0479 0.0491 0.0628
(3) 50 49 �a0 0.0142 0.0273 0.0204 0.0224 0.0283
0.0143 0.0276 0.0206 0.0225 0.0288
(4) 50 49 �b0 0.0139 0.0258 0.0204 0.0214 0.0283
0.0141 0.0262 0.0206 0.0215 0.0289
(5) 50 16 �a0 0.0254 0.0508 0.0366 0.0422 0.0504
0.0256 0.0513 0.0369 0.0423 0.0514
(6) 50 16 �b0 0.0247 0.0491 0.0366 0.0412 0.0504
0.0249 0.0497 0.0369 0.0413 0.0514
17
Table 4: Standard Errors of Direct Approach Estimators
Case Empirical S.E. of �d
nT (1st line) and �d1
nT (2nd line)
T n �0 � � � �2
(1) 10 49 �a0 0.0350 0.0668 0.0482 0.0529 0.0600
0.0373 0.0712 0.0484 0.0539 0.0670
(2) 10 49 �b0 0.0347 0.0646 0.0484 0.0509 0.0601
0.0376 0.0801 0.0490 0.0533 0.0669
(3) 50 49 �a0 0.0149 0.0270 0.0203 0.0225 0.0292
0.0150 0.0273 0.0203 0.0228 0.0303
(4) 50 49 �b0 0.0147 0.0252 0.0203 0.0209 0.0293
0.0149 0.0258 0.0203 0.0212 0.0304
(5) 50 16 �a0 0.0249 0.0499 0.0375 0.0376 0.0489
0.0251 0.0508 0.0373 0.0394 0.0525
(6) 50 16 �b0 0.0238 0.0483 0.0375 0.0381 0.0492
0.0241 0.0504 0.0373 0.0399 0.0525
Case Estimated S.E. of �d
nT (1st line) and �d1
nT (2nd line)
T n �0 � � � �2
(1) 10 49 �a0 0.0321 0.0615 0.0452 0.0488 0.0556
0.0340 0.0653 0.0478 0.0491 0.0622
(2) 10 49 �b0 0.0320 0.0596 0.0453 0.0471 0.0558
0.0338 0.0637 0.0478 0.0469 0.0623
(3) 50 49 �a0 0.0141 0.0270 0.0202 0.0217 0.0275
0.0143 0.0276 0.0206 0.0217 0.0286
(4) 50 49 �b0 0.0138 0.0255 0.0203 0.0208 0.0277
0.0141 0.0260 0.0206 0.0206 0.0287
(5) 50 16 �a0 0.0246 0.0488 0.0354 0.0376 0.0458
0.0255 0.0507 0.0368 0.0375 0.0495
(6) 50 16 �b0 0.0239 0.0471 0.0355 0.0366 0.0463
0.0248 0.0489 0.0368 0.0360 0.0498
18
5 Conclusion
This paper establishes asymptotic properties of quasi-maximum likelihood estimator for spatial dynamic
panel data with both individual e¤ects and time e¤ects when both the number of individuals n and the
number of time periods T can be large. A possible approach is to estimate both e¤ects directly. But instead
of the direct estimation, we propose a data transformation approach so that the magnitude of the bias is
only of the order O(T�1) rather than O(max(T�1; n�1)) in the direct estimation approach. When n is
asymptotically proportional to T , the estimator ispnT consistent and asymptotically normal, but the limit
distribution is not centered around 0; when T is relatively larger than n, the estimator ispnT consistent and
asymptotically centered normal; when T is relatively smaller than n, the estimator is consistent with rate T
and has a degenerate limit distribution. We also propose a bias correction for our estimator. We show that
when T grows faster than n1=3, the correction will asymptotically eliminate the bias and yield a centered
con�dence interval. This transformation approach has advantage over the direct approach especially when
n is relatively small. When n is relatively small, the estimates of the transformed approach has faster rates
of convergence than that of the direct estimates. The direct estimates have a degenerate limit distribution
but the transformed estimates are properly centered and are asymptotically normal.
19
Appendices
A Some Notes
A.1 Data Transformation
Let Jn = In � 1n lnl
0n be the "deviation from the group mean" transformation. As In = Jn + 1
n lnl0n and
Wnln = ln, we have
JnWn = JnWn(Jn +1
nlnl
0n) = JnWnJn,
because JnWnln = Jnln = 0. Then, from (2.1), for t = 1; 2; :::; T , we have
(JnYnt) = �0(JnWn)(JnYnt)+ 0(JnYn;t�1)+�0(JnWn)(JnYn;t�1)+(JnXnt)�0+(Jncn0)+(JnVnt); (A.1)
which does not involve the time e¤ects and Jncn0 can be regarded as the individual e¤ects. A special feature
of the transformed equation (A.1) is that the variance matrix of JnVnt is equal to �20Jn, which is of the rank
n� 1.The variance of JnVnt is �20JnJ
0n = �
20Jn and the elements of JnVnt are correlated. Here, Jn is singular
because Jn has rank n�1 as Jn is an orthogonal projector with trace n�1. Hence, there is a linear dependenceamong the elements of JnVnt. An e¤ective estimation method shall eliminate the linear dependence in sample
observations. This can be done with the eigenvalues and eigenvectors decomposition as in the theory of
generalized inverse for the estimation of linear regression models (see, e.g., Theil (1971), Ch. 6).
The eigenvalues of Jn are a single zero and n�1 ones. An eigenvector corresponding to the zero eigenvalueis proportional to ln. Let (Fn;n�1, ln=
pn) be the orthonormal matrix of Jn where Fn;n�1 corresponds to
the eigenvalues of ones and ln=pn corresponds to the eigenvalue zero. Thus,
JnFn;n�1 = Fn;n�1; F 0n;n�1Fn;n�1 = In�1; Jnln = 0;
F 0n;n�1ln = 0; Fn;n�1F0n;n�1 +
1n lnl
0n = In; Fn;n�1F
0n;n�1 = Jn.
(A.2)
To eliminate the linear dependence in JnVnt, the transformation of JnYnt to Y �nt, where Y�nt = F
0n;n�1JnYnt
is a vector with dimension n� 1, gives
Y �nt = �0F0n;n�1(JnWn)(JnYnt) + 0F
0n;n�1(JnYn;t�1) + �0F
0n;n�1(JnWn)(JnYn;t�1)
+F 0n;n�1JnXnt�0 + F0n;n�1Jncn0 + F
0n;n�1JnVnt
= �0F0n;n�1JnWnJnYnt + 0Y
�n;t�1 + �0F
0n;n�1JnWnJnYn;t�1 +X
�nt�0 + c
�n0 + V
�nt, (A.3)
where Y �nt = F 0n;n�1JnYnt, X�nt = F 0n;n�1JnXnt, c
�n0 = F 0n;n�1Jncn0 and V
�nt = F 0n;n�1JnVnt. Observe
that JnWn = JnWn(Fn;n�1F0n;n�1 +
1n lnl
0n) = JnWnFn;n�1F
0n;n�1 because Jnln = 0. Denote W �
n =
F 0n;n�1JnWnFn;n�1 = F0n;n�1WnFn;n�1 (as F 0n;n�1ln = 0). One arrives at
20
Y �nt = �0W�nY
�nt + 0Y
�n;t�1 + �0W
�nY
�n;t�1 +X
�nt�0 + c
�n0 + V
�nt, (A.4)
where V �nt is a (n� 1) dimensional disturbance vector with zero mean and variance matrix �20In�1. Further-more, as l0nJnYnt = 0, l
0nJnWnYnt = 0, l0nJnXnt = 0, the singularity of JnVnt does not impose deterministic
constraints on the parameters (Theil (1971)). This is expected as (A.4) is derived by eliminating the time
e¤ects and there is no deterministic constraints imposed on the parameters in (2.1) to begin with.
Equation (A.4) shall provide the estimation of the structural parameters in the model. This equation is
in the format of a typical spatial autoregressive model in panel data, where the number of observations is
T (n� 1), reduced from the original sample observations by one for each period. Equation (A.4) is useful as
it motivates the derivation of the likelihood function for Y �nt in our approach.
A.2 Determinant and Inverse of In�1 � �W �n
We note that (In�1 � �W �n) = F
0n;n�1(In � �Wn)Fn;n�1. For the determinant, we have
[Fn;n�1; ln=pn]0(In � �Wn)[Fn;n�1; ln=
pn]
=
F 0n;n�1(In � �Wn)Fn;n�1 F 0n;n�1(In � �Wn)(ln=
pn)
(l0n=pn)(In � �Wn)Fn;n�1 (l0n=
pn)(In � �Wn)(ln=
pn)
!
=
F 0n;n�1(In � �Wn)Fn;n�1 0
��l0nWnFn;n�1=pn 1� �
!,
because F 0n;n�1Wnln = F0n;n�1ln = 0 and
1n l0nWnln = 1. Hence,
jIn�1 � �W �n j =
��F 0n;n�1(In � �Wn)Fn;n�1�� = 1
1� � jIn � �Wnj . (A.5)
Thus, the tractability in computing the determinant of In�1 � �W �n is exactly that of In � �Wn. When
Wn is constructed as a weights matrix row normalized from an original symmetric matrix, Ord (1975) has
suggested a computationally tractable method for the evaluation of jIn � �Wnj at various � for the MLmethod. Thus, this will also be useful for evaluating the determinant of (In�1 � �W �
n) even though the row
sums of the transformed spatial weights matrix W �n may not even be unity.
Furthermore, a spatial autoregressive model is an equilibrium model in the sense that the observed
outcomes are determined by the equation. That is, the matrix In�1 � �W �n shall be invertible. For the
transformed equation (2.6), In�1 � �W �n is invertible as long as the original matrices In � �Wn in (2.1) is
invertible. This is so as it shall be shown below.
The inverse of In�1 � �W �n is
(In�1 � �W �n)�1 = F 0n;n�1(In � �Wn)
�1Fn;n�1, (A.6)
21
because we have
(In�1 � �W �n) � F 0n;n�1(In � �Wn)
�1Fn;n�1
= F 0n;n�1(In � �Wn)Fn;n�1F0n;n�1(In � �Wn)
�1Fn;n�1
= F 0n;n�1(In � �Wn)Jn(In � �Wn)�1Fn;n�1
= F 0n;n�1Fn;n�1 � F 0n;n�1(In � �Wn)lnl
0n
n(In � �Wn)
�1Fn;n�1
= In�1,
as F 0n;n�1ln = 0 and F0n;n�1Wnln = 0. We have also
W �n(In�1 � �W �
n)�1 = F 0n;n�1WnFn;n�1F
0n;n�1(In � �Wn)
�1Fn;n�1
= F 0n;n�1WnJn(In � �Wn)�1Fn;n�1 = F
0n;n�1Wn(In � �Wn)
�1Fn;n�1,
and
(In�1 � �W �n)�1W �
n =W�n(In�1 � �W �
n)�1 = F 0n;n�1Wn(In � �Wn)
�1Fn;n�1.
A.3 About tr[(JnGn(�))m]
As is shown in (A.6) that S��1n (�) = (In � �W �n)�1 = F 0n;n�1S
�1n (�)Fn;n�1, it follows that G�n(�) =
W �nS
��1n (�) = F 0n;n�1WnFn;n�1 � F 0n;n�1S�1n (�)Fn;n�1 = F 0n;n�1WnJnS
�1n (�)Fn;n�1 = F 0n;n�1Gn(�)Fn;n�1
because F 0n;n�1WnJn = F0n;n�1Wn. Furthermore, as Gn(�)ln = 1
(1��) ln and F0n;n�1Gn(�)ln = 0, by induc-
tion, one has
G�mn (�) = F 0n;n�1Gmn (�)Fn;n�1; m = 0; 1; 2; � � � :
From this, tr(G�mn (�)) = tr(JnGmn (�)) because Fn;n�1F
0n;n�1 = Jn. Furthermore, because Gmn (�)ln =
1(1��)m ln,
tr(G�mn (�))� tr(Gmn (�)) = �1
ntr(Gmn (�)lnl
0n) = �
1
n(1� �m) tr(lnl0n) = �
1
(1� �m) ; m = 1; 2; � � � :
B Lemmas for Some Statistics in the Model
The following lemmas and theorems can be found in Yu, de Jong and Lee (2006).
Let Vnt = (v1t; v2t; � � � ; vnt)0 be n � 1 column vector. We assume that fvitg, i = 1; 2; � � � ; n and t =1; 2; � � � ; T; are i:i:d: across i and t with zero mean, variance �20 and E jvitj
4+�< 1 for some � > 0. Also,
let Dnt be n� 1 vector of uniformly bounded constants for all n and t. Denote
Unt =1Xh=1
PnhVn;t+1�h, (B.1)
where fPnhg1h=1 is a sequence of n� n nonstochastic square matrices.Assumption A1. The disturbances fvitg, i = 1; 2; :::; n and t = 1; 2; :::; T; are i:i:d across i and t with
zero mean, variance �20 and E jvitj4+�
<1 for some � > 0.
22
Assumption A2. The row and column sums ofP1
h=1 abs(Phn) are bounded uniformly in n.
Assumption A3. The elements of n� 1 vector Dnt are nonstochastic and bounded, uniformly in n andt.
Assumption A4. n is a nondecreasing function of T .
Lemma B.1 Under Assumptions A1 and A4, for an n� n nonstochastic matrix Bn, uniformly bounded inrow and column sums,
1
nT
TXt=1
V 0ntBnVnt � E(1
nT
TXt=1
V 0ntBnVnt) = Op
�1pnT
�, (B.2)
1
n�V 0nTBn �VnT � E(
1
n�V 0nTBn �VnT ) = Op
�1pnT 2
�, (B.3)
and1
nT
TXt=1
~V 0ntBn~Vnt � E(
1
nT
TXt=1
~V 0ntBn~Vnt) = Op
�1pnT
�; (B.4)
where E( 1nT
PTt=1 V
0ntBnVnt) =
1n�
20tr(Bn) = O(1) and E(
1n�V 0nTBn
�VnT ) =1nT �
20tr(Bn) = O
�T�1
�.
Theorem B.2 Under Assumptions A1, A2 and A4,rT
n
��U0nT;�1 �VnT � E
��U0nT;�1 �VnT
��= Op
�1pT
�, (B.5)
whereq
TnE
��U0nT;�1 �VnT
�=p
nT1n�
20tr (
P1h=1 Pnh) +O
�pnT 3
�when T !1.
For the theorem that follows, we will consider the following form:
QnT =TXt=1
�U0n;t�1Vnt +D0
ntVnt + V0ntBnVnt � �20trBn
�=
TXt=1
nXi=1
znt;i;
where Bn is a n � n nonstochastic symmetric matrix which is uniformly bounded in both row and columnsums, and znt;i = (ui;t�1 + dnti)vit + bn;ii(v2it � �20) + 2(
Pi�1j=1 bn;ijvjt)vit, where bn;ij is the (i; j) element of
Bn and dnti is the ith element of Dnt. Then, for the mean and variance of QnT , �QnT= 0 and
�2QnT= T�40tr
1Xh=1
P 0nhPnh
!+�20
TXt=1
D0ntDnt+T
��4 � 3�40
� nXi=1
b2n;ii + 2�40tr(B
2n)
!+2�3
TXt=1
nXi=1
dntibn;ii;
where �s = Evsit for s = 3; 4.
Theorem B.3 Under Assumptions A1, A2, A3, A4 and that row and column sums of Bn are bounded
uniformly in n, if the sequence 1nT �
2QnT
is bounded away from zero, then,
QnT�QnT
d! N(0; 1). (B.6)
23
Denote Znt = (Yn;t�1; WnYn;t�1; Xnt), we are going to provide some lemmas related to Jn ~Znt, Jn �ZnTand ~Vnt, �VnT of the model (2.1).
Lemma B.4 Under Assumptions 1-7, for an n�n nonstochastic matrix Bn, uniformly bounded in row andcolumn sums,
1
nT
TXt=1
(Jn ~Znt)0Bn(Jn ~Znt)� E
1
nT
TXt=1
(Jn ~Znt)0Bn(Jn ~Znt) = Op
�1pnT
�, (B.7)
where E 1nT
PTt=1(Jn
~Znt)0Bn(Jn ~Znt) is O(1);
1
nT
TXt=1
(Jn ~Znt)0Bn ~Vnt � E
1
nT
TXt=1
(Jn ~Znt)0Bn ~Vnt = Op
�1pnT
�, (B.8)
where E 1nT
PTt=1(Jn
~Znt)0Bn ~Vnt is O
�1T
�.
Lemma B.5 Under Assumptions 1-7, for an n�n nonstochastic matrix Bn, uniformly bounded in row andcolumn sums,
1
n(Jn �ZnT )
0Bn(Jn �ZnT )� E1
n(Jn �ZnT )
0Bn(Jn �ZnT ) = Op
�1pnT
�, (B.9)
where E 1n (Jn
�ZnT )0Bn(Jn �ZnT ) is O(1);
1
n(Jn �ZnT )
0Bn �VnT � E1
n(Jn �ZnT )
0Bn �VnT = Op
�1pnT
�, (B.10)
where E 1n (Jn
�ZnT )0Bn �VnT is O
�T�1
�.
Also, from Equation (3.3),
Jn ~Znt = Jn ~Z(u)nt � (Jn �UnT;�1; JnWn
�UnT;�1; 0),
where ~Z(u)nt = ((�Xn;t�1 + Un;t�1); (Wn
�Xn;t�1 +WnUn;t�1); ~Xnt) with
�Xn;t�1 = Xn;t�1 � �XnT;�1. Hence
JnZnt has two components: one is Jn ~Z(u)nt , which is uncorrelated with Vnt; the other is�(Jn �UnT;�1 JnWn
�UnT;�1 0),
but is correlated with Vnt when t � T � 1. Following is a lemma related to ~Z(u)nt and ~Znt.
Lemma B.6 Under Assumptions 1-7, for an n�n nonstochastic matrix Bn, uniformly bounded in row andcolumn sums,
E1
nT
TXt=1
~Z 0ntJnBnJn ~Znt � E1
nT
TXt=1
~Z(u)0nt JnBnJn ~Z
(u)nt = O
�1
T
�, (B.11)
where E 1nT
PTt=1
~Z(u)0nt JnBnJn ~Z
(u)nt is O(1).
24
C Concentrated QML of Transformation Approach
C.1 Reduced Form of (2.1)
From Equation (2.1), we have Ynt = S�1n (Znt�0 + cn0 + �tln + Vnt) and WnYnt = GnZnt�0 + Gncn0 +
�tGnln +GnVnt. Hence, using S�1n = In + �0Gn, we have Ynt = Znt�0 + �0GnZnt�0 + S�1n cn0 + �tS�1n ln +
S�1n Vnt. Using S�1n ln =1
1��0 ln and Jnln = 0, we have
~Ynt = ~Znt�0 + �0Gn ~Znt�0 +~�t
1� �0ln + S
�1n~Vnt,
and
Jn ~Ynt = Jn ~Znt�0 + �0JnGn ~Znt�0 + JnS�1n~Vnt. (C.1)
Similarly, as we have Wn~Ynt = Gn ~Znt�0 + ~�tGnln +Gn ~Vnt, we can get
JnWn~Ynt = JnGn ~Znt�0 + JnGn ~Vnt, (C.2)
because JnGnln = JnWnS�1n ln = JnWnJnS
�1n ln = 0.
C.2 FOC and SOC of MLE
For the concentrated likelihood function (2.11), the �rst order derivatives are
@ lnLn;T (�)
@�=
0B@@ lnLn;T (�)
@�@ lnLn;T (�)
@�@ lnLn;T (�)
@�2
1CA =
0BBBBBB@1�2
TPt=1(Jn ~Znt)
0 ~Vnt(�)
1�2
TPt=1
�(JnWn
~Ynt)0 ~Vnt(�)
�� TtrJnGn(�)
12�4
TPt=1( ~V 0nt(�)Jn ~Vnt(�)� (n� 1)�2)
1CCCCCCA ,
and the second order derivatives are
@2 lnLn;T (�)
@�@�0
= �
0BBBBBB@1�2
TPt=1
~Z 0ntJn ~Znt1�2
TPt=1
~Z 0ntJnWn~Ynt
1�4
TPt=1
~Z 0ntJn ~Vnt(�)
� 1�2
TPt=1
�(Wn
~Ynt)0JnWn
~Ynt
�+ Ttr((JnGn(�))
2) 1�4
TPt=1(Wn
~Ynt)0Jn ~Vnt(�)
� � � (n�1)T2�4 + 1
�6
TPt=1
~V 0nt(�)Jn ~Vnt(�)
1CCCCCCA .
Using trGn(�) � tr(JnGn(�)) = 11�� and tr(G
2n(�)) � tr((JnGn(�))2) = 1
(1��)2 (see Appendix A.3), we
can get
25
@ lnLn;T (�)
@�=
0B@@ lnLn;T (�)
@�@ lnLn;T (�)
@�@ lnLn;T (�)
@�2
1CA =
0BBBBBB@1�2
TPt=1(Jn ~Znt)
0 ~Vnt(�)
1�2
TPt=1
�(JnWn
~Ynt)0 ~Vnt(�)
�� TtrGn(�) + T
1��
12�4
TPt=1( ~V 0nt(�)Jn ~Vnt(�)� (n� 1)�2)
1CCCCCCA , (C.3)
and the second order derivatives are
@2 lnLn;T (�)
@�@�0(C.4)
= �
0BBBBBB@1�2
TPt=1
~Z 0ntJn ~Znt1�2
TPt=1
~Z 0ntJnWn~Ynt
1�4
TPt=1
~Z 0ntJn ~Vnt(�)
� 1�2
TPt=1
�(Wn
~Ynt)0JnWn
~Ynt
�+ Ttr(G2n(�))� T
(1��)21�4
TPt=1(Wn
~Ynt)0Jn ~Vnt(�)
� � � (n�1)T2�4 + 1
�6
TPt=1
~V 0nt(�)Jn ~Vnt(�)
1CCCCCCA .
C.3 The Variance of the Gradient
We are going to show that for 1p(n�1)T
@ lnL(u)n;T (�0)
@� , E( 1p(n�1)T
@ lnL(u)n;T (�0)
@� � 1p(n�1)T
@ lnL(u)n;T (�0)
@�0 ) =
��0;nT + �0;n + O�T�1
�where �0;n =
�4�3�40�40
0BBB@0 0 0
0 1n�1
nPi=1
[(JnGn)2]ii
12�20(n�1)
trJnGn
0 12�20(n�1)
trJnGn14�40
1CCCA. First,
we can write E( 1p(n�1)T
@ lnL(u)n;T (�0)
@� � 1p(n�1)T
@ lnL(u)n;T (�0)
@�0 ) =
E 1(n�1)T
0BBBBBBB@
1�40
�TPt=1
~Z(u)0nt JnVnt
��TPt=1
~Z(u)0nt JnVnt
�0� �
1�40
�TPt=1(Gn ~Z
(u)nt �0)
0JnVnt +TPt=1(V 0ntG
0nJnVnt � �20trJnGn)
��TPt=1
~Z(u)0nt JnVnt
�00 0
12�60
�TPt=1(V 0ntJnVnt � (n� 1)�20)
��TPt=1
~Z(u)0nt JnVnt
�00 0
1CCCCCCCA
+E 1(n�1)T
0BBBBB@0 0 0
0 1�40
�TPt=1(Gn ~Z
(u)nt �0)
0JnVnt +TPt=1(V 0ntG
0nJnVnt � �20trJnGn)
�2�
0 12�40
�TPt=1(Gn ~Z
(u)nt �0)
0JnVnt +TPt=1(V 0ntG
0nJnVnt � �20trJnGn)
��TPt=1(V 0ntJnVnt � (n� 1)�20)
�00
1CCCCCA
+E 1(n�1)T
0BBB@0 0 0
0 0 0
0 0 14�80
�TPt=1(V 0ntJnVnt � (n� 1)�20)
��TPt=1(V 0ntJnVnt � (n� 1)�20)
�01CCCA. As ~Z(u)nt is un-
26
correlated with Vnt, we have E( 1p(n�1)T
@ lnL(u)n;T (�0)
@� � 1p(n�1)T
@ lnL(u)n;T (�0)
@�0 )
=
0BBBBBBB@
1�20(n�1)T
ETPt=1
~Z(u)0nt Jn ~Z
(u)nt � �
1�20(n�1)T
ETPt=1(Gn ~Z
(u)nt �0)
0Jn ~Z(u)nt
0@ 1�20(n�1)T
ETPt=1(Gn ~Z
(u)nt �0)
0JnGn ~Z(u)nt �0
+ 1n�1
�tr(G0nJnGn) + tr((JnGn)
2)�
1A �
0 1�20(n�1)
tr(JnGn)12�40
1CCCCCCCA
+
0BBBB@0 � �
�3�40(n�1)T
nPi=1
(JnGn)iiE(TPt=1Jn ~Z
(u)nt )i
2�3�40(n�1)T
nPi=1
(JnGn)iiE(TPt=1JnGn ~Z
(u)nt �0)i +
�4�3�40�40(n�1)
nPi=1
(JnGn)2ii �
(1� 1n )
�32�60(n�1)T
l0nETPt=1Jn ~Z
(u)nt (1� 1
n )1
2�60(n�1)T�3l
0nE
TPt=1JnGn ~Z
(u)nt �0 +
�4�3�402�60(n�1)
trJnGn�4�3�404�80
1CCCCA.The �rst matrix is equal to ��0;nT + O
�T�1
�using Lemma B.6. The second matrix is equal to �0;n
using ETPt=1
~Z(u)nt = 0 and E
TPt=1Gn ~Z
(u)nt �0 = 0. Hence, E( 1p
(n�1)T@ lnL
(u)n;T (�0)
@� � 1p(n�1)T
@ lnL(u)n;T (�0)
@�0 ) =
��0;nT +�0;n+O�T�1
�. When Vnt are normally distributed, �0;n = 0 because �4� 3�40 = 0 for a normal
distribution.
C.4 About � 1(n�1)T
@2 lnLnT (�)@�@�0
Denote k� � �0k as the Euclidean norm of � � �0, and �1 as a neighborhood of �0, then, we have
1
(n� 1)T@2 lnLnT (�)
@�@�0� 1
(n� 1)T@2 lnLnT (�0)
@�@�0= k� � �0k �Op(1), (C.5)
1
(n� 1)T@2 lnLnT (�0)
@�@�0+��0;nT = Op
�1pnT
�, (C.6)
sup�2�
���� 1
(n� 1)T@2 lnLnT (�)
@�@�0� 1
(n� 1)T E@2 lnLnT (�)
@�@�0
����ij
= Op
�1pnT
�, (C.7)
and
sup�2�1
���� 1
(n� 1)T E@2 lnLnT (�)
@�@�0+��0;nT
����ij
= sup�2�1
k� � �0k �O(1) (C.8)
for all i; j = 1; 2; � � � ; kx + 4.These are Equations (C.7) to (C.10) in Yu, de Jong and Lee (2006).
D Proofs for Theorems
D.1 Proof of Claim 3.1
To prove 1(n�1)T lnLn;T (�)�Qn;T (�)
p! 0 uniformly in � in any compact parameter space �:
From ~Vnt(�) � Sn(�) ~Ynt � ~Znt� and ~Vnt = Sn ~Ynt � ~Znt�0, using Jnln = 0, we have
27
Jn ~Vnt(�) = Jn ~Vnt � (�� �0)JnWn~Ynt � Jn ~Znt(� � �0). (D.1)
Then,
~V 0nt(�)Jn ~Vnt(�) = ~V 0ntJn ~Vnt + (�� �0)2(Wn~Ynt)
0JnWn~Ynt + (� � �0)0 ~Z 0ntJn ~Znt(� � �0) (D.2)
+2(�� �0)(Wn~Ynt)
0Jn ~Znt(� � �0)� 2(�� �0)(Wn~Ynt)
0Jn ~Vnt � 2(� � �0)0 ~Z 0ntJn ~Vnt
where, using (C.2), we have
(Wn~Ynt)
0JnWn~Ynt = (Gn ~Znt�0)
0Jn(Gn ~Znt�0) + 2(Gn ~Znt�0)0JnGn ~Vnt + (Gn ~Vnt)
0JnGn ~Vnt.
Using Lemma B.1 and Lemma B.4,1
(n�1)TPT
t=1~V 0ntJn ~Vnt � E 1
(n�1)TPT
t=1~V 0ntJn ~Vnt
p! 0,1
(n�1)TPT
t=1(Wn~Ynt)
0JnWn~Ynt � E 1
(n�1)TPT
t=1(Wn~Ynt)
0JnWn~Ynt
p! 0,1
(n�1)TPT
t=1~Z 0ntJn ~Znt � E 1
(n�1)TPT
t=1~Z 0ntJn ~Znt
p! 0,1
(n�1)TPT
t=1(Wn~Ynt)
0Jn ~Znt � E 1(n�1)T
PTt=1(Wn
~Ynt)0Jn ~Znt
p! 0,1
(n�1)TPT
t=1(Wn~Ynt)
0Jn ~Vnt � E 1(n�1)T
PTt=1(Wn
~Ynt)0Jn ~Vnt
p! 0 and1
(n�1)TPT
t=1~Z 0ntJn ~Vnt � E 1
(n�1)TPT
t=1~Z 0ntJn ~Vnt
p! 0.
As � is compact so that �, � are bounded in �, we have
1
(n� 1)T
TXt=1
~V 0nt(�)Jn~Vnt(�)�
1
(n� 1)T ETXt=1
~V 0nt(�)Jn~Vnt(�)
p! 0 uniformly in � in �.
Also, from (3.1) and by using the fact that �2 is bounded away from zero in �,
1
(n� 1)T lnLn;T (�)�Qn;T (�) = �1
2�2
1
(n� 1)T
TXt=1
~V 0nt(�)Jn~Vnt(�)�
1
(n� 1)T ETXt=1
~V 0nt(�)Jn~Vnt(�)
!p! 0
uniformly in � in �.
To prove Qn;T (�) is uniformly equicontinuous in � in any compact parameter space �:
We have
QnT (�) = �12ln 2� � 1
2ln�2 � 1
n� 1 ln(1� �)
+1
n� 1 ln jSn(�)j �1
2�2(n� 1)T ETXt=1
~V 0nt(�)Jn ~Vnt(�):
As Jn ~Vnt(�) = JnhSn(�) ~Ynt � ~Znt�
iand ~Ynt = S�1n ~Znt�0 + S
�1n~Vnt +
~�t01��0 ln,
Jn ~Vnt(�) = Jn[Sn(�)S�1n~Znt�0 � ~Znt� + Sn(�)S
�1n~Vnt]
28
because Jnln = 0. Hence,
E1
(n� 1)T
TXt=1
~V 0nt(�)Jn ~Vnt(�) (D.3)
=1
(n� 1)T ETXt=1
(Sn(�)S�1n~Znt�0 � ~Znt�)
0Jn(Sn(�)S�1n~Znt�0 � ~Znt�)
+1
n� 1T � 1T
�20tr(S�10n S0n(�)JnSn(�)S
�1n )
+2
(n� 1)T ETXt=1
(Sn(�)S�1n~Znt�0 � ~Znt�)
0JnSn(�)S�1n~Vnt.
The third term 2(n�1)T E
TPt=1(Sn(�)S
�1n~Znt�0� ~Znt�)0JnSn(�)S�1n ~Vnt is O
�T�1
�according to expected values
in Lemma B.4 and the order O�T�1
�is uniformly in � in � because it is a polynomial function in � and � is a
bounded set. The �rst term is equal to (�0��00; ���0)EHnT (�0��00; ���0)0 using Sn(�)S�1n = In�(���0)Gn;
the second term is equal to T�1T �2n(�) where �
2n(�) =
�20n�1 tr(S
0�1n S0n(�)JnSn(�)S
�1n ), which are all polynomial
functions of �. To prove Qn;T (�) is uniformly equicontinuous in � in the compact parameter space �, the
followings are su¢ cient: (1) ln�2 is uniformly continuous; (2) 1n ln jSn(�)j is uniformly equicontinuous; (3)
(�0 � �00; �� �0)HnT (�0 � �00; �� �0)0 is uniformly equicontinuous; (4) �2n(�) is uniformly equicontinuous.
(1) is obvious because �2 is bounded away from zero in �. For (2), 1n ln jSn(�2)j �
1n ln jSn(�1)j =
1n tr
�WnS
�1n
�����(�2 � �1) where �� lies between �2 and �1. As S�1n (�) is uniformly bounded in row and
column sums, uniformly in � 2 �, 1n tr�WnS
�1n
�����is bounded, we have 1
n ln jS(�)j is uniformly equicontin-uous. For (3), because � and � are bounded and because EHnT is O(1) according to Lemma B.4, the result
follows. For (4), �2n(�2) � �2n(�1) =�20n�1 tr(S
0�1n S0n(�2)JnSn(�2)S
�1n ) � �20
n�1 tr(S0�1n S0n(�1)JnSn(�1)S
�1n ).
Using Sn(�)S�1n = In � (�� �0)Gn,
�2n(�2)� �2n(�1) = �20�(�2 � �1) (�2 + �1 � 2�0)
trG0nJnGnn
� (�2 � �1)tr (G0nJn + JnGn)
n
�:
As G0nJnGn and JnGn are uniformly bounded in row and column sums, �2n(�) is uniformly equicontinuous.
�
D.2 Proof of nonsingularity of the information matrix
We can prove the result by using an argument by contradiction. For ��0 � limT!1��0;nT , where ��0;nTis (3.2), we need to prove that ��0� = 0 implies � = 0 where � = (�
01; �2; �3)
0 and �2; �3 are scalars and �1is (kx + 2) � 1 vector. If this is true, then, columns of ��0 would be linear independent so that ��0 would
be nonsingular. Denote H� = plimT!11
(n�1)T
TPt=1
~Z 0ntJn ~Znt, H�� = plimT!11
(n�1)T
TPt=1
~Z 0ntJnGn ~Znt�0,
29
H�� = H0�� and H� = plimT!1
1(n�1)T
TPt=1(Gn ~Znt�0)
0JnGn ~Znt�0. Then
��0 =1
�20
0B@ H� H�� 0
H�� EH� + limn!1�20n�1
�tr(G0nJnGn) + tr((JnGn)
2)�
limn!11
n�1 tr(JnGn)
0 limn!11
n�1 tr(JnGn)12�20
1CA .Hence, ��0� = 0 implies
H� � �1 +H�� � �2 = 0,1�20H�� � �1 +
�1�20H� + limn!1
1n�1
�tr(G0nJnGn) + tr((JnGn)
2)��� �2 + limn!1
1�20(n�1)
tr(JnGn)� �3 = 0,
limn!11
(n�1) tr(JnGn)� �2 +12�20
� �3 = 0.
From the �rst equation, �1 = �(H�)�1H����2; from the third equation, �3 = �2 limn!1
�20n�1 tr(JnGn)��2.
By eliminating �1 and �3, the remaining equation becomes
��1
�20
�H� �H��H�1
� H��
��+ limn!1
1
n� 1
�tr(G0nJnGn) + tr((JnGn)
2)� 2 tr2(JnGn)
n� 1
��� �2 = 0.
Denote Cn = JnGn � tr(JnGn)n�1 Jn, then,
tr(G0nJnGn) + tr((JnGn)2)� 2tr
2(JnGn)
n� 1
=1
2tr(C 0n + Cn)(C
0n + Cn)
0,
which is nonnegative. Hence, if the limit of EHnT is nonsingular or the limit of 1n�1 (tr(G
0nJnGn) +
tr((JnGn)2) � 2tr2(JnGn)
n�1 ) is nonzero, we have �2 = 0 and hence � = 0. This proves the nonsingularity
of ��0 . �
D.3 Proof of Theorem 3.2
As ETPt=1
~V 0ntJn ~Vnt = (n� 1)(T � 1)�20 according to Lemma B.1, at �0, the expected log likelihood from
(3.1) implies
E lnLn;T (�0) = �(n� 1)T
2ln 2� � (n� 1)T
2ln�20 � T ln(1� �0) + T ln jSnj �
(n� 1)(T � 1)2
.
Denote �2n(�) =�20n�1 tr(S
�10n S0n(�)JnSn(�)S
�1n ). By using Sn(�)S�1n = In + (�0 � �)Gn for (D.3), it follows
that1
(n�1)T E lnLn;T (�)�1
(n�1)T E lnLn;T (�0)
30
= � 12 (ln�
2 � ln�20) + 1n�1 ln jSn(�)j �
1n�1 ln jSnj �
�12�2
1(n�1)T
TPt=1E ~V 0nt(�)Jn ~Vnt(�)� T�1
2T
�� 1n�1 (ln(1� �)� ln(1� �0))
= T1;n(�; �2)� 1
2�2T2;n;T (�; �) +O(T�1),
where
T1;n(�; �2) = �1
2(ln�2�ln�20)+
1
n� 1 ln jSn(�)j�1
n� 1 ln jSnj�1
2�2(�2n(�)��2)�
1
n� 1(ln(1��)�ln(1��0)),
and
T2;n;T (�; �) =1
(n� 1)T
TXt=1
En[ ~Znt(�0 � �) + (�0 � �)Gn ~Znt�0]0Jn[ ~Znt(�0 � �) + (�0 � �)Gn ~Znt�0]
o= ((�0 � �)0; (�0 � �)) � HnT � ((�0 � �)0; (�0 � �))0.
Consider the pure spatial process Ynt = �0WnYnt+�tln+Vnt for a period t. Similarly as Equation (2.10)
after data transformation, we can get the log likelihood function of this process as
lnLp;n(�; �2) = �n� 1
2ln 2� � n� 1
2ln�2 � ln(1� �) + ln jSn(�)j �
1
2�2V 0nt(�)JnV
0nt(�), (D.4)
where Vnt(�) = Sn(�)Ynt. Let Ep(�) be the expectation operator for Ynt based on this pure spatial autore-gressive process. It follows that
Ep(1
n� 1 lnLp;n(�; �2))� Ep(
1
n� 1 lnLp;n(�0; �20))
= �12(ln�2 � ln�20) +
1
n� 1 ln jSn(�)j �1
n� 1 ln jSn(�0)j
� 1
2�2(�2n(�)� �2)�
1
n� 1(ln(1� �)� ln(1� �0)),
which equals to T1;n(�; �2). By the information inequality, lnLp;n(�; �2) � lnLp;n(�0; �20) � 0. Thus,
T1;n(�; �2) � 0 for any (�; �2).
For T2;n;T (�; �), it is a quadratic function of � and �. Under the assumed condition that limT!1EHnT
is nonsingular, limT!1 T2;n;T (�; �) > 0 whenever (�; �) 6= (�0; �0). So, (�; �) is globally identi�ed. Given
�0, �20 is the unique maximizer of T1;n(�0; �2) for any given n. In the event that n ! 1, �20 is the unique
maximizer of limT!1 T1;n(�0; �2). Hence, (�; �; �2) is globally identi�ed.
Combined with uniform convergence and equicontinuity in Claim 3.1, the consistency follows. �
D.4 Proof of Theorem 3.3
From the Proof of Theorem 3.2 (see Appendix D.3),
1
(n� 1)T E lnLn;T (�)�1
(n� 1)T E lnLn;T (�0) = T1;n(�; �2)� 1
2�2T2;n;T (�; �) +O(T
�1).
31
When the limit of EHnT is singular, �0 and �0 cannot be identi�ed from T2;n;T (�; �). Global identi�cation
requires that the limit of T1;n(�; �2) is strictly less than zero. Thus, we require the global identi�cation
just from the pure spatial model with the likelihood function (D.4). Using the concentrating approach by
concentrating out �2 in (D.4), we have the concentrated likelihood function
lnLp;n(�) = �n� 12
(ln(2�) + 1)� n� 12
ln �2nt(�)� ln(1� �) + ln jSn(�)j , (D.5)
where �2nt(�) =1
n�1V0nt(�)JnVnt(�). Also, we have corresponding Qn(�) = max�2 E(lnLp;n(�; �
2)) =
�n�12 (ln(2�)+1)�n�1
2 ln�2n(�)�ln(1��)+ln jSn(�)j. Global identi�cation of �0 requires that limn!11n [Qn(�)�
Qnt(�0)] 6= 0 whenever � 6= �0, which is equivalent to limn!11n [ln jSn(�)j � ln jSnj �
n�12 (ln�2n(�) �
ln�2n(�0))� (ln(1� �)� ln(1� �0))] 6= 0 whenever � 6= �0. By rearranging the terms, Qn(�)�Qnt(�0) 6= 0whenever � 6= �0 is just equivalent to
1
nln���20S�10n S�1n
��� 1
nln���2n(�)S�1n (�)0S�1n (�)
��+ � 1nln
�20(1� �0)2
� 1
nln
�2n(�)
(1� �)2
�6= 0
for � 6= �0. When n!1, it becomes
limn!1
�1
nln���20S�10n S�1n
��� 1
nln���2n(�)S�1n (�)0S�1n (�)
��� 6= 0for � 6= �0. After �0 is identi�ed, �20 is then identi�ed. Also, given �0, �0 can be identi�ed from
limT!1 T2;n;T (�; �). Combined with uniform convergence and equicontinuity in Claim 3.1, the consistency
follows. �
D.5 Proof of Claim 3.4
As Znt = (Yn;t�1;WnYn;t�1; Xnt) where Ynt is speci�ed in Equation (2.4), we have two components of
Jn ~Znt such that
Jn ~Znt = Jn ~Z(u)nt � (Jn �UnT;�1; JnWn
�UnT;�1; 0), (D.6)
where ~Z(u)nt = ((�Xn;t�1+Un;t�1); (Wn
�Xn;t�1+WnUn;t�1); ~Xnt) with
�Xn;t�1 = Xn;t�1� �XnT;�1. Hence, Jn ~Znt
has two components: one is Jn ~Z(u)nt , which is uncorrelated with Vnt; the other is�(Jn �UnT;�1; JnWn
�UnT;�1; 0),
which is correlated with Vnt when t � T � 1. Then, the score can be decomposed into 2 parts such that
1p(n� 1)T
@ lnLn;T (�0)
@�=
1p(n� 1)T
@ lnL(u)n;T (�0)
@���nT , (D.7)
where 1p(n�1)T
@ lnL(u)n;T (�0)
@� is de�ned in (3.5) and �nT is de�ned in (3.6).
For the �rst part, it�s a linear and quadratic form of Vnt and the asymptotic distribution can be derived
from the central limit theorem for martingale di¤erence arrays (Theorem B.3). For the term �nT , from
Equation (B.3) in Lemma B.1 and Equation (B.5) in Theorem B.2, we have �nT =q
n�1T a�0;n+O(
qn�1T 3 )+
Op(1pT) where a�0;n is O(1) and speci�ed in Equation (3.9). Combined together, we get the result that
32
1p(n� 1)T
@ lnLn;T (�0)
@�+�nT
d! N(0;��0 +�0). � (D.8)
D.6 Proof of Claim 3.5
This is Equation (C.5) and (C.6).
D.7 Proof of Theorem 3.6
The Taylor expansion gives
p(n� 1)T (�nT � �0) =
�� 1
(n� 1)T@2 lnLn;T (��nT )
@�@�0
��11p
(n� 1)T@ lnLn;T (�0)
@�, (D.9)
where ��nT lies between �0 and �nT and 1p(n�1)T
@ lnLn;T (�0)@� = 1p
(n�1)T@ lnL
(u)n;T (�0)
@� ��nT .As
� 1
(n� 1)T@2 lnLn;T (��nT )
@�@�0=
�� 1
(n� 1)T@2 lnLn;T (��nT )
@�@�0��� 1
(n� 1)T@2 lnLn;T (�0)
@�@�0
��+
�� 1
(n� 1)T@2 lnLn;T (�0)
@�@�0� ��0;nT
�+��0;nT
where the �rst term is ��nT � �0 � Op(1) (Equation (C.5)) and the second term is Op
�1p
(n�1)T
�(Equa-
tion (C.6)), we have � 1(n�1)T
@2 lnLn;T (��nT )@�@�0 =
��nT � �0 � Op(1) + Op� 1p(n�1)T
�+ ��0;nT . Because (1) ��nT � �0 = op(1) as �nT is consistent and (2) ��0;nT is the nonsingular in the limit according to Assumption
8 and Appendix D.2, we have � 1(n�1)T
@2 lnLn;T (��nT )@�@�0 is invertible for large T and
�� 1(n�1)T
@2 lnLn;T (��nT )@�@�0
��1is Op(1).
Also, we have 1p(n�1)T
@ lnL(u)n;T (�0)
@�
d! N(0;��0 + �0) and �nT =q
n�1T a�0;n + O(
qn�1T 3 ) + Op(
1pT)
with a�0;n = O(1). Then, from (D.9),p(n� 1)T (�nT ��0) = Op(1) �
�Op(1) +O
�pnT
�+Op
�1pT
��, which
implies that
�nT � �0 = Op�max
�1pnT;1
T
��. (D.10)
Using the fact that13�� 1
(n� 1)T@2 lnLn;T (��nT )
@�@�0
��1= ��1�0;nT +Op
�max
�1pnT;1
T
��, (D.11)
given that ��0;nT is nonsingular and its inverse is of order O(1), we have
13For two matrices Ck and Dk which are nonsingular and Ck �Dk = Op(T��) for � > 0, we have C�1k �D�1k = C�1k (Dk �
Ck)D�1k = Op(T��) when C
�1k and D�1
k are of order O(1).
33
p(n� 1)T (�nT � �0) =
�� 1
(n� 1)T@2 lnLn;T (��nT )
@�@�0
��
1p(n� 1)T
@ lnL(u)n;T (�0)
@���nT
!
= ��1�0;nT �1p
(n� 1)T@ lnL
(u)n;T (�0)
@�+Op
�max
�1pnT;1
T
��� 1p
(n� 1)T@ lnL
(u)n;T (�0)
@�
���1�0;nT ��nT �Op�max
�1pnT;1
T
����nT ,
which implies that
p(n� 1)T (�nT � �0) + ��1�0;nT ��nT +Op
�max
�1pnT;1
T
����nT
= (��1�0;nT + op(1)) �1p
(n� 1)T@ lnL
(u)n;T (�0)
@�. (D.12)
As ��0 = limT!1��0;nT exists, then using Claim 3.4 and that �nT =q
n�1T a�0;n + O(
qn�1T 3 ) + Op(
1pT)
with a�0;n = O(1),
p(n� 1)T (�nT � �0) +
rn� 1T
��1�0;nTa�0;n +Op
max
rn
T 3;
r1
T
!!d! N(0;��1�0 (��0 +�0)�
�1�0). �
(D.13)
D.8 Proof for Theorem 3.8
Theorem 3.6 states thatp(n� 1)T (�nT ��0)+
qn�1T b�0;nT +Op
�max
�pnT 3 ;
1pT
��d! N(0;��1�0 (��0+
�0)��1�0). As the bias corrected estimator is
�1
nT = �nT +1
T
� 1
(n� 1)T E@2 lnLnT (�nT )
@�@�0
!�1� an(�nT ),
where an(�) = a�;n, we will havep(n� 1)T (�
1
nT � �0)d! N(0;��1�0 (��0 +�0)�
�1�0) if
rn� 1T
0@ � 1
(n� 1)T E@2 lnLnT (�nT )
@�@�0
!�1an(�nT )� ��1�0;nTan(�0)
1A p! 0, (D.14)
and nT 3 ! 0. Assuming that n
T 3 ! 0, we are going to prove Equation (D.14).
From Equation (D.11) and that �nT � �0 = Op�max
�1pnT; 1T
��from Equation (D.10), we have
� 1
(n� 1)T E@2 lnLnT (�nT )
@�@�0= ��1�0;nT +Op
�max
�1pnT;1
T
��.
34
Hence, rn� 1T
8<: � 1
(n� 1)T E@2 lnLnT (�nT )
@�@�0
!�1an(�nT )� ��1�0;nTan(�0)
9=;=
rn� 1T
����1�0;nT +Op
�max
�1pnT;1
T
���an(�nT )� ��1�0;nTan(�0)
�=
rn� 1T
n��1�0;nT
�an(�nT )� an(�0)
�o+
rn� 1T
an(�nT )�Op�max
�1pnT;1
T
��.
As �nT ��0 = Op�max
�1pnT; 1T
��and an(�0) is O(1), according to the Taylor expansion of an(�nT ) around
an(�0), to prove Equation (D.14) is reduced to prove that the sequence of@an(�)@�0 is bounded, uniformly in a
small neighborhood of �0 where, from (3.9),
a�;n =
0BBBBBBBB@
1n�1 tr
��JnP1
h=0Ahn(�)
�S�1n (�)
�1
n�1 tr�Wn
�P1h=0A
hn(�)
�S�1n (�)
�0
1n�1 tr(Gn(�)
�JnP1
h=0Ahn(�)
�S�1n (�))
+ 1n�1�tr(Gn(�)Wn
�JnP1
h=0Ahn(�)
�S�1n (�)) + 1
n�1 trJnGn(�)
!12�2
1CCCCCCCCA.
As An(�) = S�1n (�)( In+�Wn) and Gn(�) =WnS�1n (�), we have @An(�)
@ = S�1n (�), @An(�)@� = S�1n (�)Wn,
@An(�)@�i
= 0 for i = 1; 2; � � � ; kx and @An(�)@� = S�1n (�)WnS
�1n (�)( In + �Wn). Because14 for each element of
�, @Ahn(�)@�i
= hAh�1n (�)@An(�)@�i
for h � 1,P1
h=1@Ah
n(�)@�i
=P1
h=1 hAh�1n (�)@An(�)
@�i. As (1)
P1h=0A
hn(�) andP1
h=1 hAh�1n (�) are uniformly bounded in either row sum or column sum, uniformly in a neighborhood of
�0, (2) S�1n (�) is uniformly bounded in both row and column sums, also uniformly in � in a neighborhood
of �0 and (3) Wn is uniformly bounded in both row and column sums, we have the result that the sequence
of @an(�)@�0 will be uniformly bounded in a neighborhood of �0. As ��nT converges in probability to �0, we
conclude that elements of @an(��nT )
@�0 are Op(1). �
E Direct Approach
E.1 Concentrated Likelihood Function and its FOC and SOC
The likelihood function for all the parameters including both individual and time dummies of the model
is (4.1) and we will concentrate out �T and cn. The �rst order condition for �t is
@ lnLd(�; cn;�T )
@�t=1
�2l0nVnt(�; cn;�T ),
and it implies
14This can be proved by mathematical induction. See footnote 9 in Yu, de Jong and Lee (2006).
35
�t(�; cn) =1
nl0n(Sn(�)Ynt � Znt� � cn).
It follows that
Vnt(�; cn; �T (�; cn)) = Jn [Sn(�)Ynt � Znt� � cn] ,
where Jn = In � 1n lnl
0n, which is idempotent with rank n� 1 and Jnln = 0. Hence, the likelihood function
with �T being concentrated out is
lnLdn;T (�; cn) = �nT
2ln 2� � nT
2ln�2 + T ln jSn(�)j �
1
2�2
TXt=1
[Vnt(�; cn; �T (�; cn))]0[Vnt(�; cn; �T (�; cn))].
(E.1)
Compared to (2.10), (E.1) is the concentrated likelihood function for the true data generating process
where the time e¤ects are concentrated out; while (2.10) is the likelihood function for the transformed data.
For (E.1), the �rst order derivative of cn is@ lnLdnT (�;cn)
@cn= 1
�2
PTt=1 Vnt(�; cn; �T (�; cn)) and it implies
Jncn(�) = Jn�Sn(�) �YnT � �ZnT �
�where Znt = (Yn;t�1;WnYn;t�1; Xnt) and � = ( ; �; �
0)0. Therefore, the likelihood function with both cn and
�T concentrated out is
lnLdn;T (�) = �nT
2ln 2� � nT
2ln�2 + T ln jSn(�)j �
1
2�2
TXt=1
~V 0nt(�)Jn ~Vnt(�). (E.2)
Our approach here is the same as the "di¤erence in di¤erence" one. Denote ~Q = InT � IT 1n lnl
0n �
1T lT l
0T In + 1
nT J = (IT � 1T lT l
0T ) (In � 1
n lnl0n) where J is an nT � nT matrix of ones, and denote
Z = (Z 0n1; Z0n2; � � � ; Z 0nT )0, then,
PTt=1
~Z 0ntJn ~Znt = Z0 ~QZ. This is so because
PTt=1
~Z 0ntJn~Znt =
h(Jn ~Zn1)
0; (Jn ~Zn2)0; � � � ; (Jn ~ZnT )0
i�h(Jn ~Zn1)
0; (Jn ~Zn2)0; � � � ; (Jn ~ZnT )0
i0where
Jn ~Znt = (In � 1n lnl
0n)�
2666640BBBB@z1t
z2t...
znt
1CCCCA� 1T
0BBBB@z11 + z12 + z1T
z21 + z22 + z2T...
zn1 + zn2 + znT
1CCCCA377775
=
0BBBB@z1t
z2t...
znt
1CCCCA � 1n lnl
0n
0BBBB@z1t
z2t...
znt
1CCCCA � 1T
0BBBB@z11 + z12 + z1T
z21 + z22 + z2T...
zn1 + zn2 + znT
1CCCCA + 1nT
0BBBB@Pn
i=1 (zi1 + zi2 + ziT )Pni=1 (zi1 + zi2 + ziT )
...Pni=1 (zi1 + zi2 + ziT )
1CCCCA. Hence,
denote ~Q = InT � IT 1n lnl
0n � 1
T lT l0T In + 1
nT J where J is nT � nT matrix of ones, and denote Z =(Z 0n1; Z
0n2; � � � ; Z 0nT )0, we have
PTt=1
~Z 0ntJn ~Znt = Z0 ~QZ.
For the concentrated likelihood function (E.2), the �rst order derivatives are
36
1pnT
@ lnLdn;T (�)
@�=
1pnT
0BB@@ lnLdn;T (�)
@�@ lnLdn;T (�)
@�@ lnLdn;T (�)
@�2
1CCA =
0BBBBBB@1�2
1pnT
TPt=1(Jn ~Znt)
0 ~Vnt(�)
1�2
1pnT
TPt=1
�(JnWn
~Ynt)0 ~Vnt(�)� �2trGn(�)
�12�4
1pnT
TPt=1( ~V 0nt(�)Jn
~Vnt(�)� n�2)
1CCCCCCA , (E.3)
and the second order derivatives are
1
nT
@2 lnLdn;T (�)
@�@�0(E.4)
= � 1
nT
0BBBBBB@1�2
TPt=1
~Z 0ntJn~Znt
1�2
TPt=1
~Z 0ntJnWn~Ynt
1�4
TPt=1
~Z 0ntJn~Vnt(�)
� 1�2
TPt=1
�(Wn
~Ynt)0JnWn
~Ynt + tr(G2n(�))
�1�4
TPt=1(Wn
~Ynt)0Jn ~Vnt(�)
� � � nT2�4 +
1�6
TPt=1
~V 0nt(�)Jn ~Vnt(�)
1CCCCCCA .
Similarly to that of 1pnT
@ lnL(u)n;T (�0)
@� , the variance matrix of 1pnT
@ lnLd(u)n;T (�0)
@� of (4.5) is equal to
E
1pnT
@ lnLd(u)n;T (�0)
@�� 1pnT
@ lnLd(u)n;T (�0)
@�0
!= �d�0;nT +
d�0;n +O
�T�1
�, (E.5)
and d�0;n =�4�3�40�40
0BBB@0 0 0
0 1n
nPi=1
G2n;ii1
2�20ntrGn
0 12�20n
trGn14�40
1CCCA is a symmetric matrix with �4 being the fourth mo-
ment of vit, where Gn;ii is the (i; i) entry of Gn. When Vnt are normally distributed, �0;n = 0 because
�4 � 3�40 = 0 for a normal distribution. Denote ��0 = limT!1�d�0;nT
and �0 = limT!1d�0;n
(note that
limT!1�d�0;nT
= limT!1��0;nT and limT!1d�0;n
= limT!1�0;n), then,
limT!1
E
1pnT
@ lnLd(u)n;T (�0)
@�� 1pnT
@ lnLd(u)n;T (�0)
@�0
!= ��0 +�0 . (E.6)
Claim E.1 Under Assumption 1�,2�6,7�,8,
1pnT
@ lnLdn;T (�0)
@�+�1;nT +�2;nT
d! N(0;��0 +�0). (E.7)
When fvitg, i = 1; 2; :::; n and t = 1; 2; :::; T; are normal, 1pnT
@ lnLdn;T (�0)
@� +�1;nT +�2;nTd! N(0;��0).
Proof. See Appendix E.2.
Claim E.2 Under Assumption 1�,2�6,7�,8, 1nT
@2 lnLdn;T (�)
@�@�0 � 1nT
@2 lnLdn;T (�0)
@�@�0 = k� � �0k �Op(1).Proof. Its proof is similar to that of Equation (C.5).
37
Claim E.3 Under Assumption 1�,2�6,7�,8, 1nT
@2 lnLdn;T (�0)
@�@�0 � 1nT E
@2 lnLdn;T (�0)
@�@�0 = Op
�1pnT
�.
Proof. Its proof is similar to that of Equation (C.6).
Using Claim E.1, Claim E.2 and Claim E.3, we have Theorem 4.2.
E.2 Proof of Claim E.1
From (4.4)-(4.7), the score is decomposed into 3 parts such that
1pnT
@ lnLdn;T (�0)
@�=
1pnT
@ lnLd(u)n;T (�0)
@���1;nT ��2;nT . (E.8)
For the �rst part, it�s a linear and quadratic form of Vnt and the asymptotic distribution can be de-
rived from the central limit theorem for martingale di¤erence arrays (Theorem B.3). For the term �1;nT ,
from Equation (B.3) in Lemma B.1 and Equation (B.5) in Theorem B.2, we have �1;nT =q
n�1T a�0;n;1 +
O(q
n�1T 3 )+Op(
1pT) where a�0;n;1 is O(1) and is speci�ed in Equation (4.8). Also, �2;nT is speci�ed in (4.9).
Combining above together, we get the result. �
E.3 Proof of Theorem 4.1
For the log likelihood function (4.1) divided by the sample size nT , we have corresponding Qdn;T (�) =
Emaxcn;�T
1nT lnLn;T (�; cn;�T ). Hence,
Qdn;T (�) =1
nTE lnLdn;T (�) = �
1
2ln 2� � 1
2ln�2 +
1
nln jSn(�)j �
1
2�2E1
nT
TXt=1
~V 0nt(�) ~Vnt(�). (E.9)
Similarly as the proof of Claim 3.1 in Appendix D.1, we can show the uniform convergence of 1nT E lnL
dn;T (�)�
Qdn;T (�) and that Qdn;T (�) is uniformly equicontinuous in � in any compact parameter space �. When
limT!1HdnT is nonsingular, we can show that �
d
nT is globally identi�ed and consistent similarly as proof of
Theorem 3.2 in Appendix D.3. When limT!1HdnT is singular, as long as
limn!1
�1
nln���20S0�1n S�1n
��� 1
nln���2n(�)S0�1n (�)S�1n (�)
��� 6= 0 for � 6= �0,�0 is still globally identi�ed and �
d
nTp! �0. The proof can be done similarly as Appendix D.4.
E.4 Proof of Theorem 4.2
The Taylor expansion gives
pnT (�
d
nT � �0) = � 1
nT
@2 lnLdn;T (��nT )
@�@�0
!�11pnT
@ lnLdn;T (�0)
@�
where ��nT lies between �0 and �d
nT and1pnT
@ lnLdn;T (�0)
@� = 1pnT
@ lnLd(u)n;T (�0)
@� ��1;nT ��2;nT .
38
As
� 1
nT
@2 lnLdn;T (��nT )
@�@�0=
� 1
nT
@2 lnLdn;T (��nT )
@�@�0� � 1
nT
@2 lnLdn;T (�0)
@�@�0
!!
+
� 1
nT
@2 lnLdn;T (�0)
@�@�0� �d�0;nT
!+�d�0;nT
where the �rst term is ��nT � �0 � Op(1) (Claim E.2) and the second term is Op
�1pnT
�(Claim E.3),
we have � 1nT
@2 lnLdn;T (��nT )
@�@�0 = ��nT � �0 � Op(1) + Op � 1p
nT
�+ �d�0;nT . Because (1)
��nT � �0 = op(1)
as �d
nT is consistent and (2) �d�0;nT is the nonsingular in the limit according to Assumption 8, we have
� 1nT
@2 lnLdn;T (��nT )
@�@�0 is invertible for large n and T and�� 1nT
@2 lnLdn;T (��nT )
@�@�0
��1is Op(1).
Also, we have 1pnT
@ lnLd(u)n;T (�0)
@�
d! N(0;��0 +�0) and �dnT = �1;nT +�2;nT where �1;nT and �2;nT are
speci�ed in (4.6) and (4.7). Then,pnT (�
d
nT � �0) = Op(1) ��Op(1) +Op
�pnT
�+O
�qTn
��, which implies
that
�d
nT � �0 = Op�max
�1
n;1
T
��. (E.10)
Given that �d�0;nT is nonsingular and its inverse is O(1), we have � 1
nT
@2 lnLdn;T (��nT )
@�@�0
!�1= (�d�0;nT )
�1 +Op
�max
�1
n;1
T
��. (E.11)
Hence,
pnT (�
d
nT � �0) =
� 1
nT
@2 lnLdn;T (��nT )
@�@�0
!�
1pnT
@ lnLd(u)n;T (�0)
@���1;nT ��2;nT
!
= (�d�0;nT )�1 � 1p
nT
@ lnLd(u)n;T (�0)
@�+Op
�max
�1
n;1
T
��� 1pnT
@ lnLd(u)n;T (�0)
@�
�(�d�0;nT )�1 � (�1;nT +�2;nT )�Op
�max
�1
n;1
T
��� (�1;nT +�2;nT ),
which implies that
pnT (�
d
nT � �0) + (�d�0;nT )�1 � (�1;nT +�2;nT ) +Op
�max
�1
n;1
T
��(�1;nT +�2;nT )
= ((�d�0;nT )�1 + op(1)) �
1pnT
@ lnLd(u)n;T (�0)
@�. (E.12)
As ��0 = limT!1�d�0;nT
exists, then using Claim E.1 and that �1;nT =p
nT an;�0;1 + O(
pnT 3 ) + Op(
1pT),
we have
39
pnT (�
d
nT � �0) +rn
T(�d�0;nT )
�1an;�0;1 +
rT
n(�d�0;nT )
�1a�0;2 (E.13)
+Op
max
rn
T 3;
rT
n3;
r1
T
!!(E.14)
d! N(0;��1�0 (��0 +�0)��1�0). �
E.5 Proof for Theorem 4.3
Theorem 4.2 states thatpnT (�
d
nT � �0) +p
nT b�0;nT;1 +
qTn b�0;nT;2 + Op
�max
�pnT 3 ;q
Tn3 ;q
1T
��d!
N(0;��1�0 (��0 +�0)��1�0). As the bias corrected estimator is
�d1
nT = �d
nT +1
T(� 1
nTE@2 lnLdnT (�
d
nT )
@�@�0)�1a1;n(�
d
nT ) +1
n(� 1
nTE@2 lnLdnT (�
d
nT )
@�@�0)�1a2(�
d
nT )
where a1;n(�) = a�;n;1 and a2(�) = a�;2, we will havepnT (�
d1
nT � �0)d! N(0;��1�0 (��0 +�0)�
�1�0) if
rn
T
0@ � 1
nTE@2 lnLdnT (�
d
nT )
@�@�0
!�1a1;n(�
d
nT )� (�d�0;nT )�1a1;n(�0)
1A p! 0, (E.15)
rT
n
0@ � 1
nTE@2 lnLdnT (�
d
nT )
@�@�0
!�1a2(�
d
nT )� (�d�0;nT )�1a2(�0)
1A p! 0, (E.16)
when nT 3 ! 0, T
n3 ! 0. Assuming that nT 3 ! 0 and T
n3 ! 0, we are going to prove Equation (E.15) and
(E.16).
From Equation (E.11) and that �d
nT � �0 = Op�max
�1T ;
1n
��from Equation (E.10),
� 1
nTE@2 lnLdnT (�nT )
@�@�0= (�d�0;nT )
�1 +Op
�max
�1
T;1
n
��.
Hence, rn
T
8<: � 1
nTE@2 lnLdnT (�nT )
@�@�0
!�1a1;n(�
d
nT )� (�d�0;nT )�1a1;n(�0)
9=;=
rn
T
��(�d�0;nT )
�1 +Op
�max
�1
T;1
n
���a1;n(�
d
nT )� (�d�0;nT )�1a1;n(�0)
�=
rn
T
n(�d�0;nT )
�1�a1;n(�
d
nT )� a1;n(�0)�o+
rn
Ta1;n(�
d
nT )�Op�max
�1
T;1
n
��.
As �d
nT � �0 = Op�max
�1T ;
1n
��and a1;n(�0) is O(1), according to Taylor expansion of a1;n(�
d
nT ) around
a1;n(�0), to prove Equation (E.15) is reduced to prove that the sequence of@a1;n(�)@�0 is bounded, uniformly
in a small neighborhood of �0, which can be proved by similar argument at the end of Appendix D.8.
40
For Equation (E.16), we haverT
n
8<: � 1
nTE@2 lnLdnT (�
d
nT )
@�@�0
!�1a2(�
d
nT )� (�d�0;nT )�1a2(�0)
9=;=
rT
n
��(�d�0;nT )
�1 +Op
�max
�1
T;1
n
���a2(�
d
nT )� (�d�0;nT )�1a2(�0)
�=
rT
n
n(�d�0;nT )
�1�a2(�
d
nT )� a2(�0)�o+
rT
na2(�
d
nT )�Op�max
�1
T;1
n
��.
As �d
nT � �0 = Op�max
�1T ;
1n
��and a2(�0) is O(1), according to Taylor expansion of a2;n(�
d
nT ) around
a2(�0), to prove Equation (E.16) is reduced to prove that elements of@a2(��nT )
@�0 <1 where ��nT lies between
�d
nT and �0 underTn3 ! 0. As we have the explicit form of @a2(
��nT )@�0 implied by (4.7), it is proved. �
References
[1] Amemiya, T. (1971), The Estimation of the Variances in a Variance-Components Model, International
Economic Review 12, 1-13.
[2] Baltagi, B., Song, S.H. and W. Kon (2003), Testing Panel Data Regression Models with Spatial Error
Correlation, Journal of Econometrics, Vol.117, 123-150.
[3] Hahn, J. and G. Kuersteiner (2002), Asymptotically Unbiased Inference for a Dynamic Panel Model
with Fixed E¤ects When Both n and T Are Large, Econometrica, Vol.70, No.4, 1639-1657.
[4] Hahn, J. and W. Newey (2004), Jackknife and Analytical Bias Reduction for Nonlinear Panel Models,
Econometrica, Vol.72, No.4, 1295-1319.
[5] Horn, R. and C. Johnson (1985), Matrix Algebra, Cambridge University Press.
[6] Kapoor, M., Kelejian, H.H. and I.R. Prucha (2004), Panel Data Models with Spatially Correlated Error
Components, forthcoming in Journal of Econometrics.
[7] Kelejian, H.H. and I.R. Prucha (1998), A Generalized Spatial Two-Stage Least Squares Procedure for
Estimating a Spatial Autoregressive Model with Autoregressive Disturbance, Journal of Real Estate
Finance and Economics, Vol. 17:1, 99-121.
[8] Kelejian H.H. and I.R. Prucha (2001), On the Asymptotic Distribution of the Moran I Test Statistic
With Applications, Journal of Econometrics, 104, 219-257.
[9] Nerlove, M. (1971), A Note on Error Components Models, Econometrica 39, 383-396
[10] Ord, J.K. (1975), Estimation Methods for Models of Spatial Interaction, Journal of the American
Statistical Association 70, 120-297.
41
[11] Phillips, P.C.B. and H.R. Moon (1999), Linear Regression Limit Theory for Nonstationary Panel Data,
Econometrica, Vol.67, No.5, 1057-1111.
[12] Rothenberg, T.J. (1971), Identi�cation in Parametric Models, Econometrica, Vol. 39, No.3, 577-591.
[13] Theil, H. (1971), Principles of Econometrics, New York: John Wiley & Sons.
[14] Wallace, T.D. and A. Hussain (1969), The Use of Error Components Models in Combining Cross-Section
and Time-Series Data, Econometrica 37, 55-72
[15] White, H. (1994), Estimation, Inference and Speci�cation Analysis, Cambridge University Press.
[16] Yu, J, R. de Jong and L-f.Lee (2006), Quasi-Maximum Likelihood Estimators For Spatial Dynamic Panel
Data With Fixed E¤ects When Both n and T Are Large, Working Paper, The Ohio State University.
42