A Spatial Dynamic Panel Data Model with Both Time and Individual Fixed E⁄ects · 2011. 12. 16. · A Spatial Dynamic Panel Data Model with Both Time and Individual Fixed E⁄ects

A Spatial Dynamic Panel Data Model with Both Time and

Individual Fixed E¤ects

Lung-fei Lee�

Department of EconomicsThe Ohio State University

Jihai YuDepartment of EconomicsThe Ohio State University

July 18, 2007

Abstract

This paper establishes asymptotic properties of quasi-maximum likelihood estimators for spatial dy-

namic panel data with both time and individual �xed e¤ects when both the number of individuals n

and the number of time periods T can be large. Instead of using the direct approach where we estimate

both individual e¤ects and time e¤ects directly, we propose a data transformation approach to eliminate

the time e¤ects so that the bias of the order O(n�1) is avoided. When T is relatively larger than n,

the estimators arepnT consistent and asymptotically centered normal; when n is asymptotically pro-

portional to T , the estimators arepnT consistent and asymptotically normal, but the limit distribution

is not centered around 0; when T is relatively smaller than n, the estimators are consistent with rate

T and have a degenerate limit distribution. We also propose a bias correction for our estimators. We

show that when T grows faster than n1=3, the correction will asymptotically eliminate the bias and yield

a centered con�dence interval. This transformation approach has advantage over the direct approach

especially when n is relatively small as the direct approach has the bias of order O(n�1) remained.

JEL classi�cation: C13; C23

Keywords: Spatial autoregression, Dynamic panels, Fixed e¤ects, Time e¤ects, Quasi-maximum like-

lihood estimation, Bias correction

�We acknowledge �nancial support for the research from NSF under Grant No. SES-0519204.

1

1 Introduction

This paper investigates the properties of maximum likelihood (ML) estimator and quasi-maximum like-

lihood (QML) estimator for spatial dynamic panel data models with both individual e¤ects and time e¤ects

when both the number of individuals n and the number of time periods T can be large.

Recently, there is a growing literature on the estimation of dynamic panel data models when both n and

T are large (see Phillips and Moon (1999), Hahn and Kuersteiner (2002), Hahn and Newey (2004), etc).

For the panel data with spatial interaction, we have Baltagi, Song and Koh (2003), Kapoor, Kelejian and

Prucha (2004) and Yu, de Jong and Lee (2006), etc. Those papers focus on models with only individual

e¤ects. While in the panel data literature, we also have the two way error component regression model where

we have not only unobservable individual e¤ects but also unobserved time e¤ects (See Wallace and Hussain

(1969), Nerlove (1971) and Amemiya (1971), etc). Hence, it is natural to study the spatial dynamic panel

data models when both n and T are large with both individual e¤ects and time e¤ects.

In Yu, de Jong and Lee (2006), the consistency and asymptotic distribution of the QML estimator are

established when individual e¤ects are included. Also, a bias correction procedure for the estimator is

proposed. It is shown that as long as T grows faster than n1=3, the correction will asymptotically eliminate

the bias of the order O(T�1) and yield a centered con�dence interval. This approach can be directly

extended to the estimation of models with both individual and time �xed e¤ects, where �direct�means we

estimate also the �xed time e¤ects jointly with other parameters. When there are also time e¤ects included

in the model, we might have additional bias of the order O(n�1) in the estimation due to the presence

of time e¤ects if we estimate the time e¤ects directly (see Theorem 4.2). In this paper, we propose an

transformation procedure to eliminate the time e¤ects that can avoid the additional O(n�1) order bias with

the same asymptotic e¢ ciency as the direct QML estimates when n is not relatively smaller than T . Our

transformation procedure is particularly useful when n is relatively smaller than T . For the latter, the

estimates of the transformed approach has faster rates of convergence than that of the direct estimates. The

direct estimates have a degenerate limit distribution but the transformed estimates are properly centered

and are asymptotically normal.

This paper is organized as follows. In Section 2, the model is introduced and the data transformation

procedure is proposed. We then explain our method of estimation, which is a concentrated quasi-maximum

likelihood estimation. In Section 3, we establish the consistency and asymptotic distribution of the QML

estimator of the transformation approach. A bias correction procedure is proposed and the simulation result

is reported. Section 4 presents the asymptotic properties of the direct approach estimator and compares the

two approaches. Section 5 concludes the paper. Some useful lemmas and proofs are collected in Appendix.

2 The Model

2.1 Data Generating Process and Data Transformation

The model considered in this paper is

2

Ynt = �0WnYnt + 0Yn;t�1 + �0WnYn;t�1 +Xnt�0 + cn0 + �t0ln + Vnt; t = 1; 2; :::; T , (2.1)

where Ynt = (y1t; y2t; :::; ynt)0 and Vnt = (v1t; v2t; :::; vnt)0 are n � 1 column vectors and vit is i:i:d: across iand t with zero mean and variance �20. Also,Wn is an n�n spatial weights matrix which is nonstochastic andgenerates the spatial dependence between cross sectional units yit, Xnt is an n� kx matrix of nonstochasticregressors, cn0 is n� 1 column vector of individual �xed e¤ects, �t0 is a scalar of time e¤ect and ln is n� 1column vector of ones1 . Therefore, the total number of parameters in this model is equal to the sum of the

number of individuals n and the number of time periods T , plus the dimension of the common parameters

( ; �; �0; �; �2)0, which is kx + 4.

Wn is usually row normalized from a symmetric matrix such that its ith row is

[cn;i1; cn;i2; � � � ; cn;in]=Xn

j=1cn;ij , (2.2)

where cn;ij represents a function of the spatial distance of di¤erent units in some space. As a normalization,

cn;ii = 0. It is a common practice in empirical work that Wn is row normalized, which ensures that all the

weights are between 0 and 1 and weighting operations can be interpreted as an average of the neighboring

values. Also, a spatial weights matrix row normalized has the property Wnln = ln.

De�ne Sn(�) = In��Wn and Sn � Sn(�0) = In��0Wn. Then, presuming Sn is invertible and denoting

An = S�1n ( 0In + �0Wn), (2.1) can be rewritten as

Ynt = AnYn;t�1 + S�1n Xnt�0 + S

�1n cn0 + �t0S

�1n ln + S

�1n Vnt. (2.3)

Assuming that the in�nite sums are well-de�ned, by continuous substitution of (2.3),

Ynt =1Xh=0

AhnS�1n (cn0 +Xn;t�h�0 + �t0ln + Vn;t�h) = �n + Xnt�0 + �t0ln

1Xh=0

� 0 + �01� �0

�h+ Unt, (2.4)

where �n �1Ph=0

AhnS�1n cn0, Xnt �

1Ph=0

AhnS�1n Xn;t�h and Unt �

1Ph=0

AhnS�1n Vn;t�h.

One way to estimate (2.1) is to estimate all the parameters including the time e¤ects and individual

e¤ects in the model, which will yield a bias of the order O(max(n�1; T�1)) for the common parameters (see

Theorem 4.2). In this paper, we will introduce a data transformation approach (to eliminate the time e¤ects)

such that we can avoid the bias of the order O(n�1), where the estimator has the same asymptotic e¢ ciency

as the direct QML estimator when n is not relatively smaller than T . This transformation procedure is

particularly useful when n is relatively smaller than T where the estimates of the transformed approach has

faster rates of convergence than that of the direct estimates. Also, when n is relatively smaller than T , the

direct estimates have a degenerate limit distribution but the transformed estimates are properly centered

and are asymptotically normal.

Let Jn = In � 1n lnl

0n be the deviation from the group mean transformation. As In = Jn +

1n lnl

0n and

Wnln = ln, we have1Due to the presence of �xed individual and time e¤ects, the Xnt will not include time invariant or individual invariant

regressors.

3

JnWn = JnWn(Jn +1

nlnl

0n) = JnWnJn,

because JnWnln = Jnln = 0. Hence, for t = 1; 2; :::; T , we have

(JnYnt) = �0(JnWn)(JnYnt)+ 0(JnYn;t�1)+ �0(JnWn)(JnYn;t�1)+ (JnXnt)�0+(Jncn0)+ (JnVnt); (2.5)

which does not involve the time e¤ects and Jncn0 can be regarded as the transformed individual e¤ects.

Thus, we can estimate �0 and Jncn0 basing on the transformed equation (2.5) where relevant variables are

premultiplied by Jn.

A special feature of the transformed equation (2.5) is that the variance matrix of JnVnt is equal to �20Jnso that the elements of JnVnt are correlated. Also, Jn is singular with rank (n � 1) as Jn is an orthogonalprojector with trace (n� 1). Hence, there is a linear dependence among the elements of JnVnt. An e¤ectiveestimation method shall eliminate the linear dependence in sample observations. This can be done with

the eigenvalues and eigenvectors decomposition as in the theory of generalized inverses for the estimation of

linear regression models (see, e.g., Theil (1971), Ch. 6).

The eigenvalues of Jn are a single zero and (n � 1) ones. An eigenvector corresponding to the zeroeigenvalue is proportional to ln. Let (Fn;n�1, ln=

pn) be the orthonormal matrix of eigenvectors of Jn where

Fn;n�1 corresponds to the eigenvalues of ones and ln=pn corresponds to the eigenvalue zero. As is derived

in Appendix A.1, the transformation of JnYnt to Y �nt, where Y�nt = F

0n;n�1JnYnt is of dimension (n�1), gives

Y �nt = �0W�nY

�nt + 0Y

�n;t�1 + �0W

�nY

�n;t�1 +X

�nt�0 + c

�n0 + V

�nt, (2.6)

where W �n = F

0n;n�1WnFn;n�1, Y �nt = F

0n;n�1JnYnt = F

0n;n�1Ynt, X

�nt = F

0n;n�1JnXnt = F

0n;n�1Xnt, c

�n0 =

F 0n;n�1Jncn0 = F 0n;n�1cn0, V�nt = F 0n;n�1JnVnt = F 0n;n�1Vnt and V

�nt is an n � 1 dimensional disturbance

vector with zero mean and variance matrix �20In�1.

2.2 The Method of Maximum Likelihood Estimation

Suppose that Vnt is normally distributed N(0; �20In), the transformed V�nt will be N(0; �

20In�1) and (2.6)

can be estimated by the ML. The log likelihood function of (2.6) for Y �nt is

lnLn;T (�; c�n) = �

(n� 1)T2

ln 2� � (n� 1)T2

ln�2 + T ln jIn�1 � �W �n j �

1

2�2

TXt=1

V �0nt(�)V�nt(�), (2.7)

where V �nt(�) = (In�1 � �W �n)Y

�nt � Z�nt� � c�n, Znt = (Yn;t�1;WnYn;t�1; Xnt) and � = ( ; �; �0)0. In order

to use Equation (2.6) for e¤ective estimation, the determinant and inverse of (In�1 � �W �n) are needed. We

shall show that their computations are not more complicated than those for the original matrix In � �Wn.

As is derived in Appendix A.2, (In�1 � �W �n) = F

0n;n�1(In � �Wn)Fn;n�1 and we have

jIn�1 � �W �n j =

1

1� � jIn � �Wnj , (2.8)

4

(In�1 � �W �n)�1 = F 0n;n�1(In � �Wn)

�1Fn;n�1. (2.9)

Also, as we have

(In�1 � �W �n)Y

�nt � Z�nt� � c�n

= F 0n;n�1(In � �Wn)Fn;n�1 � F 0n;n�1Ynt � F 0n;n�1Znt� � F 0n;n�1cn

= F 0n;n�1(In � �Wn)(In �1

nlnl

0n)Ynt � F 0n;n�1Znt� � F 0n;n�1cn

= F 0n;n�1[(In � �Wn)Ynt � Znt� � cn],

because F 0n;n�1Wnln = F0n;n�1ln = 0, it follows that

V �0nt(�)V�nt(�)

= [(In�1 � �W �n)Y

�nt � Z�nt� � c�n]0[(In�1 � �W �

n)Y�nt � Z�nt� � c�n]

= [(In � �Wn)Ynt � Znt� � cn]0Fn;n�1F 0n;n�1[(In � �Wn)Ynt � Znt� � cn]= [(In � �Wn)Ynt � Znt� � cn]0Jn[(In � �Wn)Ynt � Znt� � cn],

by the property Fn;n�1F 0n;n�1 = Jn. Hence, the log likelihood function (2.7) for Y �nt can be expressed in

terms of Ynt as

lnLn;T (�; cn) = � (n� 1)T2

ln 2� � (n� 1)T2

ln�2 � T ln(1� �) + T ln jIn � �Wnj (2.10)

� 1

2�2

TXt=1

V 0nt(�)JnVnt(�),

where Vnt(�) = (In��Wn)Ynt�Znt��cn and Jn shall be read as the generalized inverse of ��20 V ar(JnVnt).

2.3 FOC and SOC of MLE

Therefore, we will �rst transform the data from Ynt to Y �nt = F0n;n�1Ynt, then we will maximize Equation

(2.7) by searching over the parameter space. This is equivalent to the estimation of the spatial dynamic

panel model with only individual e¤ects with (n� 1) cross-section units and T time periods. Alternatively,one may maximize Equation (2.10) instead. However, although the components of Vnt are i.i.d. in the model,

the elements of V �nt might not be independent in general, even though they are uncorrelated. The original

asymptotic analysis in Yu, de Jong and Lee (2006) does not directly carry over to the transformed model

with the disturbances V �nt.2 As Equation (2.7) is equivalent to Equation (2.10), it turns out that we can

analyze the asymptotic distribution of the estimator just from Equation (2.10).

2One could not treat the components of V �nt as if they were independent when the disturbances are not normally distributed.Furthermore, it is not clear whether W �

n and A�n = (In�1 � �0W �

n)�1( 0In�1 + �0W

�n) would be uniformly bounded in both

row and column sums even Wn and An are.

5

Using �rst order conditions, we concentrate out cn in (2.10) and get3

lnLn;T (�) = � (n� 1)T2

ln 2� � (n� 1)T2

ln�2 � T ln(1� �) + T ln jIn � �Wnj (2.11)

� 1

2�2

TXt=1

~V 0nt(�)Jn ~Vnt(�),

where ~Vnt(�) = (In��Wn) ~Ynt� ~Znt� and Jn ~Vnt(�) = Jn[(In��Wn) ~Ynt� ~Znt��~�tln] because Jnln = 0. Theconcentrated likelihood function is di¤erent from (4.2) of the direct approach4 in that we have an attachment

of T2 ln 2��2 � T ln(1 � �). For the concentrated likelihood function (2.11), the �rst order derivatives and

the second order derivatives are Equation (C.3) and (C.4) in Appendix C.2. Denote

Jn ~Vnt = Jn[Sn ~Ynt � ~Znt�0 � ~�t0ln].

Hence, for the �rst order derivative evaluated at �0, denote Gn =WnS�1n , we have

1p(n� 1)T

@ lnLn;T (�0)

@�=

0BBBBBB@1�20

1p(n�1)T

TPt=1

~Z 0ntJn~Vnt

1�20

1p(n�1)T

TPt=1(Gn ~Znt�0)

0Jn ~Vnt +1�20

1p(n�1)T

TPt=1( ~V 0ntG

0nJn

~Vnt � �20trJnGn)

12�40

1p(n�1)T

TPt=1( ~V 0ntJn

~Vnt � (n� 1)�20)

1CCCCCCA ,(2.12)

which is a linear and quadratic form of ~Vnt. For the information matrix, denote ��0;nT = �E�

1(n�1)T

@2 lnLn;T (�0)@�@�0

�and HnT =

1(n�1)T

TPt=1( ~Znt; Gn ~Znt�0)

0Jn( ~Znt; Gn ~Znt�0), we have

��0;nT =1

�20

EHnT 0

0 0

!+

0B@ 0 0 0

0 1n�1

�tr(G0nJnGn) + tr((JnGn)

2)�

1�20(n�1)

tr(JnGn)

0 1�20(n�1)

tr(JnGn)12�40

1CA

�

0B@ 0 � �1

�20(n�1)E(Gn �VnT )

0Jn �ZnT2

�20(n�1)E�(Gn �ZnT �0)

0JnGn �VnT�+ 1

(n�1)T tr(G0nJnGn) �

1�40(n�1)

E��Z 0nTJn

�VnT�0 1

�40(n�1)E�(Gn �ZnT �0)

0Jn �VnT�0+ 1

�20(n�1)Ttr(JnGn)

1T

1�40

1CA .3 QML Estimators

For our analysis of the asymptotic properties of estimator, we need the following assumptions:

3For notational purpose, we de�ne for any n � 1 vector at period t, �nt, we have ~�nt = �nt � ��nT and��n;t�1 =

�n;t�1 � ��nT;�1 for t = 1; 2; � � � ; T where ��nT = 1T

TPt=1

�nt and ��nT;�1 =1T

TPt=1

�n;t�1.

4The di¤erence term can be regarded as an adjustment by the transformation approach to eliminate the bias of the orderO(n�1) which will appear in the direct approach.

6

Assumption 1. Wn is a row normalized nonstochastic spatial weights matrix.

Assumption 2. The disturbances fvitg, i = 1; 2; :::; n and t = 1; 2; :::; T; are i:i:d across i and t with

zero mean, variance �20 and E jvitj4+�

<1 for some � > 0.

Assumption 3. Sn(�) is invertible for all � 2 �. Furthermore, � is compact and the true parameter �0is in the interior of �.

Assumption 4. The elements of Xnt are nonstochastic and bounded, uniformly in n and t, and

limT!11nT

PTt=1

~X 0ntJn ~Xnt exists and is nonsingular.

Assumption 5. The row and column sums of Wn and S�1n (�) are bounded uniformly5 in n, also

uniformly in � 2 � for S�1n (�).

Assumption 6. The row and column sums ofP1

h=1 abs(Ahn) are bounded uniformly in n, where

[abs(An)]ij = jAn;ij j.Assumption 7. n is a nondecreasing function of T .Assumption 1 is a standard normalization assumption in spatial econometrics. In many empirical ap-

plications, the rows of Wn sum to 1, which ensures that all the weights are between 0 and 1. Assumption

2 provides regularity assumptions for vit. Assumption 3 guarantees that Equation (2.3) is valid. When

exogenous variables Xnt are included in the model, it is convenient to assume that the exogenous regressors

are uniformly bounded as in Assumption 4. Assumption 5 is originated by Kelejian and Prucha (1998, 2001).

The uniform boundedness ofWn and S�1n (�) is a condition that limits the spatial correlation to a manageable

degree. Assumption 6 is the absolute summability condition and row/column sum boundedness condition,

which will play an important role to derive asymptotic properties of QML estimator. This assumption is

essential for the paper because it limits the dependence between time series and between cross sectional

units. In order to justify the absolute summability of An in Equation (2.4) and Assumption 6, a su¢ cient

condition is kAnk < 1 for any matrix norm (see Horn and Johnson (1985), Corollary 5.6.16) that satis�es

kAnk = kabs (An)k. When kAnk < 1,P1

h=0Ahn exists and can be de�ned as (In � An)�1. Assumption

7 allows two cases: (i) n ! 1 as T ! 1; (ii) n is �xed as T ! 1. Because (ii) is similar to a vectorautoregressive (VAR) model, our main interest is in (i). If Assumption 7 holds, then we say that n; T !1simultaneously.

3.1 Consistency

For the log likelihood function (2.11) divided by the e¤ective sample size (n�1)T , we have correspondingQn;T (�) = Emaxcn

1(n�1)T lnLn;T (�; cn). Hence,

Qn;T (�) =1

(n� 1)T E lnLn;T (�) (3.1)

= �12ln 2� � 1

2ln�2 � 1

n� 1 ln(1� �) +1

n� 1 ln jSn(�)j �1

2�21

(n� 1)T E

TXt=1

~V 0nt(�)Jn ~Vnt(�)

!.

To get the consistency proof, we need the following uniform convergence result.

5We say the row and column sums of a (sequence of n � n) matrix Pn are bounded uniformly in n ifsup1�i�n;n�1

Pnj=1 jpij;nj <1 and sup1�j�n;n�1

Pni=1 jpij;nj <1.

7

Claim 3.1 Let � be any compact parameter space. Then under Assumptions 1-7, 1(n�1)T lnLn;T (�) �

Qn;T (�)p! 0 uniformly in � 2 � and Qn;T (�) is uniformly equicontinuous for � 2 �.

Proof. See Appendix D.1.

For identi�cation, if the information matrix�E�

1(n�1)T

@2 lnLn;T (�0)@�@�0

�is nonsingular and�E

�1

(n�1)T@2 lnLn;T (�)

@�@�0

�has full rank for any � in some neighborhood N(�0) of �0, the parameters are locally identi�ed (see Rothen-

berg (1971)). For the information matrix, using Lemma B.2 in Yu, de Jong and Lee (2006), we have

��0;nT =1

�20

EHnT 0

0 0

!+

0B@ 0 0 0

0 1n�1


2)�

1�20(n�1)

tr(JnGn)

0 1�20(n�1)

tr(JnGn)12�40

1CA+O� 1T

�;

(3.2)

which is nonsingular if EHnT is nonsingular or 1n�1

htr(G0nJnGn) + tr((JnGn)

2)� 2tr2(JnGn)n�1

iis nonzero6 .

Also, its rank does not change in a small neighborhood of �0 (see Equation (C.8)).

When limT!1EHnT is nonsingular, we can get the global identi�cation of the parameters.

Theorem 3.2 Under Assumptions 1-7, if limT!1EHnT is nonsingular, �0 is globally identi�ed and �nTp!

�0.


When limT!1EHnT is singular, global identi�cation can still be obtained from the following theorem.

Denote �2n(�) =�20n�1 tr(S

0�1n S0n(�)JnSn(�)S

�1n ).

Theorem 3.3 Under Assumptions 1-7, �0 is globally identi�ed iflimn!1

�1n ln

��20S0�1n S�1n�� 1

n ln��2n(�)S0�1n (�)S�1n (�)

�� 6= 0 for � 6= �07 . And if this condition holds,

�nTp! �0.


3.2 Asymptotic Distribution

As Znt = (Yn;t�1;WnYn;t�1; Xnt) where Ynt is speci�ed in Equation (2.4), we can decompose Jn ~Znt such

that

Jn ~Znt = Jn ~Z(u)nt � (Jn �UnT;�1; JnWn

�UnT;�1; 0), (3.3)

where ~Z(u)nt = ((�Xn;t�1+Un;t�1); (Wn

�Xn;t�1+WnUn;t�1); ~Xnt) with

�Xn;t�1 = Xn;t�1� �XnT;�1. Hence, Jn ~Znt

has two components: one is Jn ~Z(u)nt , which is uncorrelated with Vnt; the other is�(Jn �UnT;�1; JnWn

�UnT;�1; 0),

6See Appendix D.2 for proof.

7When n is �nite, the condition is 1nln��20S0�1n S�1n

�� 1nln��2n(�)S0�1n (�)S�1n (�)

��+ � 1nln

�20(1��0)2

� 1nln

�2n(�)

(1��)2

�6= 0 for

� 6= �0. See Appendix D.4.

8

which is correlated with Vnt when t � T � 1. Therefore, from Equation (2.12), the score can be decomposed

into two parts such that

1p(n� 1)T

@ lnLn;T (�0)

@�=

1p(n� 1)T

@ lnL(u)n;T (�0)

@��nT , (3.4)

where

1p(n� 1)T

@ lnL(u)n;T (�0)

@�=

0BBBBBB@1�20

1p(n�1)T

TPt=1

~Z(u)0nt JnVnt

1�20

1p(n�1)T

TPt=1(Gn ~Z

(u)nt �0)

0JnVnt +1�20

1p(n�1)T

TPt=1(V 0ntG

0nJnVnt � �20trJnGn)

12�40

1p(n�1)T

TPt=1(V 0ntJnVnt � (n� 1)�20)

1CCCCCCA ,(3.5)

�nT =

rn� 1T

0BB@1�20

Tn�1 (Jn

�UnT;�1; JnWn�UnT;�1; 0)

0 �VnT1�20

Tn�1 (JnGn(

�UnT;�1; Wn�UnT;�1, 0)�0)0 �VnT + 1

�20

Tn�1

�V 0nTG0nJn �VnT

12�40

Tn�1

�V 0nTJn�VnT

1CCA . (3.6)

As is derived in Appendix C.3, the variance matrix of 1p(n�1)T

@ lnL(u)n;T (�0)

@� is equal to

E

1p

(n� 1)T@ lnL

(u)n;T (�0)

@�� 1p

(n� 1)T@ lnL

(u)n;T (�0)

@�0

!= ��0;nT +�0;n +O

�T�1

�, (3.7)

where ��0;nT is in (3.2) and �0;n =�4�3�40�40

0BBB@0 0 0

0 1n�1

nPi=1

[(JnGn)2]ii

12�20(n�1)

tr(JnGn)

0 12�20(n�1)

tr(JnGn)14�40

1CCCA is a symmetricmatrix with �4 being the fourth moment of vit. When Vnt are normally distributed, �0;n = 0 because

�4 � 3�40 = 0 for a normal distribution. Denote ��0 = limT!1��0;nT and �0 = limT!1�0;n, then,

limT!1

E

1p

(n� 1)T@ lnL

(u)n;T (�0)

@�� 1p

(n� 1)T@ lnL

(u)n;T (�0)

@�0

!= ��0 +�0 . (3.8)

The asymptotic distribution of 1p(n�1)T

@ lnL(u)n;T (�0)

@� can be derived from the central limit theorem for

martingale di¤erence arrays (Theorem B.3). For the term �nT , from Equation (B.3) in Lemma B.1 and

Equation (B.5) in Theorem B.2, �nT =q

n�1T a�0;n +O(

qn�1T 3 ) +Op(

1pT) where

a�0;n =

0BBBBBB@

1n�1 tr

��JnP1

h=0Ahn

�S�1n

�1

n�1 tr�Wn

�JnP1

h=0Ahn

�S�1n

�0kx�1

1n�1 0tr(Gn

�JnP1

h=0Ahn

�S�1n ) + 1

n�1�0tr(GnWn

�JnP1

h=0Ahn

�S�1n ) + 1

n�1 trJnGn12�20

1CCCCCCA (3.9)

9

is O(1). To get the asymptotic distribution of the score, we need the following additional assumption.

Assumption 8. limT!1EHnT is nonsingular or limn!11

n�1

htr(G0nJnGn) + tr((JnGn)


i6=

0.

Assumption 8 is a condition for the nonsingularity of the information matrix ��0 . When limT!1EHnT

is singular8 , as long as we have limn!11

n�1 trh(G0nJnGn) + tr((JnGn)


i6= 0, the information

matrix ��0 is still nonsingular (see Appendix D.2).

Claim 3.4 Under Assumptions 1-8,

1p(n� 1)T

@ lnLn;T (�0)

@�+�nT

d! N(0;��0 +�0). (3.10)

When fvitg, i = 1; 2; :::; n and t = 1; 2; :::; T; are normal, 1p(n�1)T

@ lnLn;T (�0)@� +�nT

d! N(0;��0).


Claim 3.5 Under Assumptions 1-8,

1

(n� 1)T@2 lnLn;T (�)

@�@�0� 1

(n� 1)T@2 lnLn;T (�0)

@�@�0= k� � �0k �Op(1), (3.11)

and1

(n� 1)T@2 lnLn;T (�0)

@�@�0� @

2Qn;T (�0)

@�@�0= Op

1p

(n� 1)T

!. (3.12)


Using Claim 3.4 and Claim 3.5, we have the following theorem for the distribution of �nT .

Theorem 3.6 Under Assumptions 1-8,

p(n� 1)T (�nT � �0) +

rn� 1T

b�0;nT +Op

max

rn� 1T 3

;

r1

T

!!d! N(0;��1�0 (��0 +�0)�

�1�0), (3.13)

where b�0;nT = ��1�0;nT

a�0;n is O(1).

When nT ! 0, p

nT (�nT � �0)d! N(0;��1�0 (��0 +�0)�

�1�0). (3.14)

When nT ! k <1, p

nT (�nT � �0) +pkb�0;nT

d! N(0;��1�0 (��0 +�0)��1�0). (3.15)

When nT !1,

T (�nT � �0) + b�0;nTp! 0. (3.16)

Proof. See Appendix D.7.8The limT!1 EHnT can be singular if, for example,�0 = 0.

10

From Equation (3:13), the QML estimator has the bias � 1T b�0;nT where

b�0;nT = ��1�0;nT

� a�0;n (3.17)

and the con�dence interval is not centered when nT ! k where 0 < k <1. Furthermore, when T is relatively

smaller than n, the presence of b�0;nT causes �nT to have a degenerated distribution. An analytical bias

reduction procedure is to correct the bias of the estimate. De�ne the bias corrected estimator as

�1

nT = �nT �BnTT, (3.18)

where, from Theorem 3.6,

BnT =h��1�;nT � a�;n

i��=�nT

. (3.19)

We will show that when n=T 3 ! 0, �1

nT ispnT consistent and asymptotically centered normal even when

n=T !1.To show our result for the bias corrected estimator, we need the following additional assumption.

Assumption 9.P1

h=0Ahn(�) and

P1h=1 hA

h�1n (�) are uniformly bounded in either row sum or column

sums, uniformly in a neighborhood of �0.

Assumption 9 can be veri�ed through the following lemma.

Lemma 3.7 If kAn(�0))k1 < 1 (resp: kAn(�0))k1 < 1), then the row sum (resp: column sum) ofP1

h=0Ahn(�)

andP1

h=1 hAh�1n (�) are bounded uniformly in n and in a neighborhood of �0.

Proof. This is Lemma 3.9 in Yu, de Jong and Lee (2006).

Our result for the bias corrected estimator is as follows.

Theorem 3.8 Under Assumptions 1-9, if nT 3 ! 0,

pnT (�

1

nT � �0)d! N(0;��1�0 +�

�1�0�0�

�1�0). (3.20)


3.3 Monte Carlo Results

We conduct a small Monte Carlo experiment to evaluate the performance of our ML estimator and

the bias corrected estimator for the transformation approach. We generate samples from Equation (2.1)

using �a0 = (0:2; 0:2; 1; 0:2; 1)0 and �b0 = (0:3; 0:3; 1; 0:3; 1)0 where �0 = ( 0; �0; �00; �0; �

20)0, and Xnt; cn0,

�T0 = (�1; �2; � � � ; �T ) and Vnt are generated from independent normal distributions9 and the spatial

weights matrix we use is a rook matrix. We use T = 10, 50, and n = 16, 49. For each set of generated

sample observations, we calculate the ML estimator �nT and evaluate the bias �nT � �0; we then constructthe bias corrected estimator �

1

nT and evaluate the bias �1

nT ��0. We do this for 1000 times to see if the bias is9We generated the spatial panel data with 20 + T periods and then take the last T periods as our sample. And the initial

value is generated as N(0; In) in the simulation. We have also generated the data with a much longer history 1000+ T and theresults are similar.

11

reduced on average by using the analytical bias correction procedure, i.e., to compare 11000

P1000i=1 (�nT � �0)i

with 11000

P1000i=1 (�

1

nT � �0)i. We also compare not only the empirical standard error of estimator but alsothe average of estimated standard error of the estimator10 . With two di¤erent values of �0 for each n and

T , �nite sample properties of both estimators are summarized in Table 1 and Table 3, where Table 1 is for

the magnitude of biases and Table 3 is for the standard errors of the estimator.

We can see that both estimators have some biases, but the bias corrected estimators reduce those biases.

This is consistent with our asymptotic analysis, because the bias corrected estimator will eliminate the bias

of order O(T�1). Also, the bias reduction is achieved while there is no signi�cant increase in the variance

of the estimator, as can be seen from Table 3. Also, the estimated variances and the empirical variances are

close, so the estimated variances constructed from the Hessian matrix may be reliable.

For di¤erent cases of n and T , we can see that for each given n, when T is larger, the biases of the two

sets of estimators will be smaller and the variance will be smaller; for each given T , when n is larger, the

biases of the two sets of estimators will not be smaller but the variance will be smaller. This is consistent

with our theoretical prediction, because the bias is of the order O(T�1) and the variance of the estimator is

of the order O((nT )�1).

4 An Alternative Approach: Direct Estimation

An alternative approach is to estimate both the individual e¤ects and the time e¤ects directly11 , which

will yields bias of the order O(max(n�1; T�1)).

4.1 Model, Likelihood Function, FOC and SOC

Denote �T = (�1; �2; � � � ; �T ) as the time e¤ects, the likelihood function of (2.1) is

lnLdn;T (�; cn;�T ) = �nT

2ln 2� � nT

2ln�2 + T ln jSn(�)j �

1

2�2

TXt=1

V 0nt(�; cn;�T )Vnt(�; cn;�T ), (4.1)

where Vnt(�; cn;�T ) = Sn(�)Ynt � Znt� � cn � �tln. We will use the concentrating approach to estimatethe common parameters. As is derived in Appendix E.1, the likelihood function with both cn and �Tconcentrated out is

lnLdn;T (�) = �nT

2ln 2� � nT


1

2�2

TXt=1

~V 0nt(�)Jn ~Vnt(�). (4.2)

10The empirical variance is the diagonal elements of 11000

P1000i=1 [(�nT �mean(�nT ))(�nT �mean(�nT ))0]i and the estimated

variance is the diagonal elements of 11000

P1000i=1 (�

�1�nT ;nT

)i.11Remark: There is an underidenti�cation problem on cn and �T . Because components of cn and �T appear additively,

there is a normalization issue in terms of location. By adding a constant to all components of cn and subtracting the sameconstant from all components of �T , the di¤erent sets of �xed e¤ects can not be identi�ed. So a proper normalization is neededin order to identify �xed e¤ects. A convenient location normalization is to set the sum of the components of cn to be zero, i.e.,the elements of cn have a zero (empirical) mean.

12

For the concentrated likelihood function (4.2), the �rst order derivatives are in (E.3) and (E.4) in Ap-

pendix E.1. Hence, for the �rst order derivative evaluated at �0, we have

1pnT

@ lnLdn;T (�0)

@�=

0BBBBBB@1�20

1pnT

TPt=1

~Z 0ntJn ~Vnt

1�20

1pnT

TPt=1(Gn ~Znt�0)

0Jn ~Vnt +1�20

1pnT

TPt=1( ~V 0ntG

0nJn

~Vnt � �20trGn)

12�40

1pnT

TPt=1( ~V 0ntJn

~Vnt � n�20)

1CCCCCCA , (4.3)

which is a linear and quadratic form of ~Vnt. For the information matrix of (4.2), denote �d�0;nT = �E�

1nT

@2 lnLdn;T (�0)

@�@�0

�and Hd

nT =1nT

TPt=1( ~Znt; Gn ~Znt�0)

0Jn( ~Znt; Gn ~Znt�0), we have12

�d�0;nT =1

�20

EHd

nT 0

0 0

!+

0B@ 0 0 0

0 1n

�tr(G0nJnGn) + tr(G

2n)�

1�20ntr(JnGn)

0 1�20ntr(JnGn)

12�40

1CA

�

0B@ 0 � �1�20nE(Gn �VnT )

0Jn �ZnT2�20nE�(Gn �ZnT �0)

0JnGn �VnT�+ 1

nT tr(G0nJnGn) �

1�40nE��Z 0nTJn

�VnT�0 1

�40nE�(Gn �ZnT �0)

0Jn �VnT�0+ 1

�20nTtr(JnGn)

1T�40

1CA .Here, Hd

nT is the covariance matrix of regressors of the reduced form (C.1) after demeaning from both the

time dimension and the cross sectional dimension (see Appendix C.1 for the reduced form).

4.2 Asymptotic Properties

Similarly as the transformation approach, we can get the asymptotic properties of the QML estimator.

To do that, we need to strengthen Assumption 7 such that both n and T are large. This is so in order that

the time dummies can be consistently estimated. But, compared to the transformation approach, we don�t

need the row normalization of the spatial weights matrix.

Assumption 1�. Wn is a nonstochastic spatial weights matrix.

Assumption 7�. n is an increasing function of T .

Theorem 4.1 Under Assumptions 1�,2�6,7�, if limT!1EHdnT is nonsingular or

limn!1

�1

nln��20S0�1n S�1n

�� 1

nln��2n(�)S0�1n (�)S�1n (�)

�� 6= 0 for � 6= �0,�0 is globally identi�ed and �

d

nTp! �0.

Proof. See Appendix E.3.

12The HdnT di¤ers from HnT in the transformation approach only in the division by n in Hd

nT instead of (n� 1) in HnT .

13

From (4.3) and (3.3), the score can be decomposed into 3 parts such that

1pnT

@ lnLdn;T (�0)

@�=

1pnT

@ lnLd(u)n;T (�0)

@��1;nT ��2;nT , (4.4)

where

1pnT

@ lnLd(u)n;T (�0)

@�=

0BBBBBB@1�20

1pnT

TPt=1

~Z(u)0nt JnVnt

1�20

1pnT

TPt=1(Gn ~Z

(u)nt �0)

0JnVnt +1�20

1pnT

TPt=1(V 0ntG


12�40

1pnT

TPt=1(V 0ntJnVnt � (n� 1)�20)

1CCCCCCA , (4.5)

�1;nT =

rn

T

0BB@1�20

Tn (Jn

�UnT;�1; JnWn�UnT;�1; 0)

0 �VnT1�20

Tn (JnGn(

�UnT;�1; Wn�UnT;�1, 0)�0)0 �VnT + 1

�20

Tn�V 0nTG

0nJn

�VnT12�40

Tn�V 0nTJn

�VnT

1CCA , (4.6)

�2;nT =

0BB@0q

Tn (trGn � trJnGn)q

Tn

12�20

1CCA =

rT

n

0B@ 01

1��012�20

1CA . (4.7)

The asymptotic distribution of 1pnT

@ lnLd(u)n;T (�0)

@� can be derived from the central limit theorem for mar-

tingale di¤erence arrays (Theorem B.3). For the term �1;nT , from Theorem B.2, �1;nT =p

nT a�0;n;1 +

O(p

nT 3 ) +Op(

1pT) where

a�0;n;1 =

0BBBBBB@

1n tr

��JnP1

h=0Ahn

�S�1n

�1n tr

�Wn

�JnP1

h=0Ahn

�S�1n

�0

1n 0tr(Gn

�JnP1

h=0Ahn

�S�1n ) + 1

n�0tr(GnWn

�JnP1

h=0Ahn

�S�1n ) + 1

n trJnGnn�1n

12�20

1CCCCCCA (4.8)

is O(1) and

a�0;2 = (0;1

1� �0;1

2�20)0. (4.9)

The distribution of the QML estimator is as follows.

Theorem 4.2 Under Assumption 1�,2-6,7�,8,

pnT (�

d

nT ��0)+rn

Tb�0;nT;1+

rT

nb�0;nT;2+Op

max

rn

T 3;

rT

n3;

r1

T

!!d! N(0;��1�0 (��0+�0)�

�1�0),

(4.10)

where b�0;nT;1 = (�d�0;nT

)�1a�0;n;1 and b�0;nT;2 = (�d�0;nT

)�1a�0;2 are O(1) and a�0;n;1, a�0;2 are de�ned in

(4.8) and (4.9).

14

When nT ! 0,

n(�d

nT � �0) + b�0;nT;2p! 0. (4.11)

When nT ! k <1,

pnT (�

d

nT � �0) +pkb�0;nT;1 +

pk�1b�0;nT;2

d! N(0;��1�0 (��0 +�0)��1�0). (4.12)

When nT !1,

T (�d

nT � �0) + b�0;nT;1p! 0. (4.13)


From Equation (4:10), the QML estimator has the bias � 1T b�0;nT;1 and �

1nb�0;nT;2 where

b�0;nT;1 = (�d�0;nT )�1 � a�0;n;1, (4.14)

b�0;nT;2 = (�d�0;nT )�1 � a�0;2, (4.15)

and the con�dence interval is not centered when nT ! k where 0 < k <1. Furthermore, when T and n do

not have the same order, the presence of b�0;nT;1 and b�0;nT;2 causes �d

nT to have a degenerated distribution.

An analytical bias reduction procedure is to correct the bias B1;nT = �b�0;nT;1 and B2;nT = �b�0;nT;2 byconstructing estimator B1;nT , B2;nT and de�ning the bias corrected estimator as

�d1

nT = �d

nT �B1;nTT

� B2;nTn

. (4.16)

From Equation 4.10, we can choose

B1;nT =��(�d�;nT )�1 � a�;n;1

��=�

dnT

, (4.17)

B2;nT =��(�d�;nT )�1 � a�;2

��=�

dnT

. (4.18)

We show that when n=T 3 ! 0 and T=n3 ! 0, �d1

nT ispnT consistent and asymptotically centered normal.

Our result for the bias corrected estimator is as follows.

Theorem 4.3 Under Assumption 1�,2-6,7�,8,9, if nT 3 ! 0 and T

n3 ! 0,pnT (�

d1

nT � �0)d! N(0;��1�0 +�

�1�0�0�

�1�0). (4.19)


By comparing Theorem 4.2, Theorem 4.3 with Theorem 3.6 and Theorem 3.8, we can see that both esti-

mators are consistent and have the same limiting distribution. For the transformed approach, the estimator

has the bias of the order O(T�1) and the bias correction requires n=T 3 ! 0. For the direct approach, the

estimator has the bias of the order O(max(n�1; T�1)) and the bias correction requires not only n=T 3 ! 0

but also T=n3 ! 0. Hence, the transformed approach has advantage over the direct approach especially

when n is relatively small. But the direct approach also has its own merit in that: we don�t need the weights

matrix to be row normalized. We conduct a small Monte Carlo to compare the two estimators where we use

the same simulated data. Table 2 is the counterpart of Table 1 and Table 4 is the counterpart of Table 3.

From the tables, we can see that for the biases of estimates, the transformation approach is smaller than the

direct approach when n is relatively small.

15

Table 1: Biases of Transformation Approach Estimators

Case Bias of �nT (1st line) and �1

nT (2nd line)

T n �0 � � � �2

(1) 10 49 �a0 -0.0627 -0.0045 -0.0101 -0.0013 -0.1159

-0.0038 -0.0023 -0.0026 0.0004 -0.0276

(2) 10 49 �b0 -0.0697 -0.0093 -0.0132 -0.0086 -0.1183

-0.0037 -0.0001 -0.0023 -0.0029 -0.0307

(3) 50 49 �a0 -0.0119 -0.0013 0.0001 0.0000 -0.0217

0.0000 -0.0013 0.0004 0.0001 -0.0022

(4) 50 49 �b0 -0.0131 -0.0017 -0.0001 -0.0006 -0.0218

-0.0000 -0.0013 0.0004 -0.0004 -0.0023

(5) 50 16 �a0 -0.0121 -0.0032 -0.0013 -0.0018 -0.0247

-0.0002 -0.0033 -0.0009 -0.0018 -0.0052

(6) 50 16 �b0 -0.0134 -0.0043 -0.0013 -0.0017 -0.0247

-0.0004 -0.0042 -0.0008 -0.0015 -0.0052

Note: �a0 = (0:2, 0:2, 1, 0:2, 1) and �b0 = (0:3, 0:3, 1, 0:3, 1).

Table 2: Biases of Direct Approach Estimators

Case Bias of �d

nT (1st line) and �d1

nT (2nd line)

T n �0 � � � �2

(1) 10 49 �a0 -0.0610 -0.0014 -0.0092 -0.0312 -0.1318

-0.0035 0.0026 -0.0022 -0.0009 -0.0298

(2) 10 49 �b0 -0.0669 -0.0014 -0.0117 -0.0391 -0.1327

-0.0032 0.0115 -0.0015 -0.0008 -0.0333

(3) 50 49 �a0 -0.0100 0.0051 0.0010 -0.0296 -0.0394

0.0002 0.0001 0.0007 -0.0019 -0.0031

(4) 50 49 �b0 -0.0099 0.0108 0.0016 -0.0297 -0.0379

0.0002 0.0014 0.0007 -0.0007 -0.0033

(5) 50 16 �a0 -0.0067 0.0115 -0.0030 -0.0973 -0.0842

0.0012 0.0013 0.0006 -0.0199 -0.0130

(6) 50 16 �b0 -0.0032 0.0272 -0.0008 -0.1038 -0.0785

0.0026 0.0082 0.0014 -0.0213 -0.0117

Note: �a0 = (0:2, 0:2, 1, 0:2, 1) and �b0 = (0:3, 0:3, 1, 0:3, 1).

16

Table 3: Standard Errors of Transformation Approach Estimators

Case Empirical S.E. of �nT (1st line) and �1

nT (2nd line)

T n �0 � � � �2

(1) 10 49 �a0 0.0351 0.0666 0.0482 0.0539 0.0610

0.0373 0.0707 0.0484 0.0540 0.0671

(2) 10 49 �b0 0.0349 0.0643 0.0484 0.0514 0.0610

0.0375 0.0711 0.0488 0.0523 0.0671

(3) 50 49 �a0 0.0149 0.0270 0.0203 0.0228 0.0297

0.0150 0.0273 0.0203 0.0228 0.0303

(4) 50 49 �b0 0.0147 0.0252 0.0203 0.0215 0.0298

0.0149 0.0256 0.0203 0.0215 0.0304

(5) 50 16 �a0 0.0248 0.0504 0.0372 0.0412 0.0519

0.0250 0.0510 0.0372 0.0412 0.0529

(6) 50 16 �b0 0.0239 0.0484 0.0373 0.0403 0.0518

0.0241 0.0492 0.0372 0.0403 0.0529

Case Estimated S.E. of �nT (1st line) and �1

nT (2nd line)

T n �0 � � � �2

(1) 10 49 �a0 0.0324 0.0621 0.0456 0.0504 0.0572

0.0340 0.0654 0.0478 0.0510 0.0629

(2) 10 49 �b0 0.0323 0.0602 0.0457 0.0487 0.0571

0.0339 0.0639 0.0479 0.0491 0.0628

(3) 50 49 �a0 0.0142 0.0273 0.0204 0.0224 0.0283

0.0143 0.0276 0.0206 0.0225 0.0288

(4) 50 49 �b0 0.0139 0.0258 0.0204 0.0214 0.0283

0.0141 0.0262 0.0206 0.0215 0.0289

(5) 50 16 �a0 0.0254 0.0508 0.0366 0.0422 0.0504

0.0256 0.0513 0.0369 0.0423 0.0514

(6) 50 16 �b0 0.0247 0.0491 0.0366 0.0412 0.0504

0.0249 0.0497 0.0369 0.0413 0.0514

17

Table 4: Standard Errors of Direct Approach Estimators

Case Empirical S.E. of �d


nT (2nd line)

T n �0 � � � �2

(1) 10 49 �a0 0.0350 0.0668 0.0482 0.0529 0.0600

0.0373 0.0712 0.0484 0.0539 0.0670

(2) 10 49 �b0 0.0347 0.0646 0.0484 0.0509 0.0601

0.0376 0.0801 0.0490 0.0533 0.0669

(3) 50 49 �a0 0.0149 0.0270 0.0203 0.0225 0.0292

0.0150 0.0273 0.0203 0.0228 0.0303

(4) 50 49 �b0 0.0147 0.0252 0.0203 0.0209 0.0293

0.0149 0.0258 0.0203 0.0212 0.0304

(5) 50 16 �a0 0.0249 0.0499 0.0375 0.0376 0.0489

0.0251 0.0508 0.0373 0.0394 0.0525

(6) 50 16 �b0 0.0238 0.0483 0.0375 0.0381 0.0492

0.0241 0.0504 0.0373 0.0399 0.0525

Case Estimated S.E. of �d


nT (2nd line)

T n �0 � � � �2

(1) 10 49 �a0 0.0321 0.0615 0.0452 0.0488 0.0556

0.0340 0.0653 0.0478 0.0491 0.0622

(2) 10 49 �b0 0.0320 0.0596 0.0453 0.0471 0.0558

0.0338 0.0637 0.0478 0.0469 0.0623

(3) 50 49 �a0 0.0141 0.0270 0.0202 0.0217 0.0275

0.0143 0.0276 0.0206 0.0217 0.0286

(4) 50 49 �b0 0.0138 0.0255 0.0203 0.0208 0.0277

0.0141 0.0260 0.0206 0.0206 0.0287

(5) 50 16 �a0 0.0246 0.0488 0.0354 0.0376 0.0458

0.0255 0.0507 0.0368 0.0375 0.0495

(6) 50 16 �b0 0.0239 0.0471 0.0355 0.0366 0.0463

0.0248 0.0489 0.0368 0.0360 0.0498

18

5 Conclusion

This paper establishes asymptotic properties of quasi-maximum likelihood estimator for spatial dynamic

panel data with both individual e¤ects and time e¤ects when both the number of individuals n and the

number of time periods T can be large. A possible approach is to estimate both e¤ects directly. But instead

of the direct estimation, we propose a data transformation approach so that the magnitude of the bias is

only of the order O(T�1) rather than O(max(T�1; n�1)) in the direct estimation approach. When n is

asymptotically proportional to T , the estimator ispnT consistent and asymptotically normal, but the limit

distribution is not centered around 0; when T is relatively larger than n, the estimator ispnT consistent and

asymptotically centered normal; when T is relatively smaller than n, the estimator is consistent with rate T

and has a degenerate limit distribution. We also propose a bias correction for our estimator. We show that

when T grows faster than n1=3, the correction will asymptotically eliminate the bias and yield a centered

con�dence interval. This transformation approach has advantage over the direct approach especially when

n is relatively small. When n is relatively small, the estimates of the transformed approach has faster rates

of convergence than that of the direct estimates. The direct estimates have a degenerate limit distribution

but the transformed estimates are properly centered and are asymptotically normal.

19

Appendices

A Some Notes

A.1 Data Transformation

Let Jn = In � 1n lnl

0n be the "deviation from the group mean" transformation. As In = Jn + 1

n lnl0n and

Wnln = ln, we have

JnWn = JnWn(Jn +1

nlnl

0n) = JnWnJn,

because JnWnln = Jnln = 0. Then, from (2.1), for t = 1; 2; :::; T , we have

(JnYnt) = �0(JnWn)(JnYnt)+ 0(JnYn;t�1)+�0(JnWn)(JnYn;t�1)+(JnXnt)�0+(Jncn0)+(JnVnt); (A.1)

which does not involve the time e¤ects and Jncn0 can be regarded as the individual e¤ects. A special feature

of the transformed equation (A.1) is that the variance matrix of JnVnt is equal to �20Jn, which is of the rank

n� 1.The variance of JnVnt is �20JnJ

0n = �

20Jn and the elements of JnVnt are correlated. Here, Jn is singular

because Jn has rank n�1 as Jn is an orthogonal projector with trace n�1. Hence, there is a linear dependenceamong the elements of JnVnt. An e¤ective estimation method shall eliminate the linear dependence in sample

observations. This can be done with the eigenvalues and eigenvectors decomposition as in the theory of

generalized inverse for the estimation of linear regression models (see, e.g., Theil (1971), Ch. 6).

The eigenvalues of Jn are a single zero and n�1 ones. An eigenvector corresponding to the zero eigenvalueis proportional to ln. Let (Fn;n�1, ln=

pn) be the orthonormal matrix of Jn where Fn;n�1 corresponds to

the eigenvalues of ones and ln=pn corresponds to the eigenvalue zero. Thus,

JnFn;n�1 = Fn;n�1; F 0n;n�1Fn;n�1 = In�1; Jnln = 0;

F 0n;n�1ln = 0; Fn;n�1F0n;n�1 +

1n lnl

0n = In; Fn;n�1F

0n;n�1 = Jn.

(A.2)

To eliminate the linear dependence in JnVnt, the transformation of JnYnt to Y �nt, where Y�nt = F

0n;n�1JnYnt

is a vector with dimension n� 1, gives

Y �nt = �0F0n;n�1(JnWn)(JnYnt) + 0F

0n;n�1(JnYn;t�1) + �0F

0n;n�1(JnWn)(JnYn;t�1)

+F 0n;n�1JnXnt�0 + F0n;n�1Jncn0 + F

0n;n�1JnVnt

= �0F0n;n�1JnWnJnYnt + 0Y

�n;t�1 + �0F

0n;n�1JnWnJnYn;t�1 +X

�nt�0 + c

�n0 + V

�nt, (A.3)

where Y �nt = F 0n;n�1JnYnt, X�nt = F 0n;n�1JnXnt, c

�n0 = F 0n;n�1Jncn0 and V

�nt = F 0n;n�1JnVnt. Observe

that JnWn = JnWn(Fn;n�1F0n;n�1 +

1n lnl

0n) = JnWnFn;n�1F

0n;n�1 because Jnln = 0. Denote W �

n =

F 0n;n�1JnWnFn;n�1 = F0n;n�1WnFn;n�1 (as F 0n;n�1ln = 0). One arrives at

20

Y �nt = �0W�nY

�nt + 0Y

�n;t�1 + �0W

�nY

�n;t�1 +X

�nt�0 + c

�n0 + V

�nt, (A.4)

where V �nt is a (n� 1) dimensional disturbance vector with zero mean and variance matrix �20In�1. Further-more, as l0nJnYnt = 0, l

0nJnWnYnt = 0, l0nJnXnt = 0, the singularity of JnVnt does not impose deterministic

constraints on the parameters (Theil (1971)). This is expected as (A.4) is derived by eliminating the time

e¤ects and there is no deterministic constraints imposed on the parameters in (2.1) to begin with.

Equation (A.4) shall provide the estimation of the structural parameters in the model. This equation is

in the format of a typical spatial autoregressive model in panel data, where the number of observations is

T (n� 1), reduced from the original sample observations by one for each period. Equation (A.4) is useful as

it motivates the derivation of the likelihood function for Y �nt in our approach.

A.2 Determinant and Inverse of In�1 � �W �n

We note that (In�1 � �W �n) = F

0n;n�1(In � �Wn)Fn;n�1. For the determinant, we have

[Fn;n�1; ln=pn]0(In � �Wn)[Fn;n�1; ln=

pn]

=

F 0n;n�1(In � �Wn)Fn;n�1 F 0n;n�1(In � �Wn)(ln=

pn)

(l0n=pn)(In � �Wn)Fn;n�1 (l0n=

pn)(In � �Wn)(ln=

pn)

!

=

F 0n;n�1(In � �Wn)Fn;n�1 0

��l0nWnFn;n�1=pn 1� �

!,

because F 0n;n�1Wnln = F0n;n�1ln = 0 and

1n l0nWnln = 1. Hence,

jIn�1 � �W �n j =

��F 0n;n�1(In � �Wn)Fn;n�1�� = 1

1� � jIn � �Wnj . (A.5)

Thus, the tractability in computing the determinant of In�1 � �W �n is exactly that of In � �Wn. When

Wn is constructed as a weights matrix row normalized from an original symmetric matrix, Ord (1975) has

suggested a computationally tractable method for the evaluation of jIn � �Wnj at various � for the MLmethod. Thus, this will also be useful for evaluating the determinant of (In�1 � �W �

n) even though the row

sums of the transformed spatial weights matrix W �n may not even be unity.

Furthermore, a spatial autoregressive model is an equilibrium model in the sense that the observed

outcomes are determined by the equation. That is, the matrix In�1 � �W �n shall be invertible. For the

transformed equation (2.6), In�1 � �W �n is invertible as long as the original matrices In � �Wn in (2.1) is

invertible. This is so as it shall be shown below.

The inverse of In�1 � �W �n is

(In�1 � �W �n)�1 = F 0n;n�1(In � �Wn)

�1Fn;n�1, (A.6)

21

because we have

(In�1 � �W �n) � F 0n;n�1(In � �Wn)

�1Fn;n�1

= F 0n;n�1(In � �Wn)Fn;n�1F0n;n�1(In � �Wn)

�1Fn;n�1

= F 0n;n�1(In � �Wn)Jn(In � �Wn)�1Fn;n�1

= F 0n;n�1Fn;n�1 � F 0n;n�1(In � �Wn)lnl

0n

n(In � �Wn)

�1Fn;n�1

= In�1,

as F 0n;n�1ln = 0 and F0n;n�1Wnln = 0. We have also

W �n(In�1 � �W �

n)�1 = F 0n;n�1WnFn;n�1F

0n;n�1(In � �Wn)

�1Fn;n�1

= F 0n;n�1WnJn(In � �Wn)�1Fn;n�1 = F

0n;n�1Wn(In � �Wn)

�1Fn;n�1,

and

(In�1 � �W �n)�1W �

n =W�n(In�1 � �W �

n)�1 = F 0n;n�1Wn(In � �Wn)

�1Fn;n�1.

A.3 About tr[(JnGn(�))m]

As is shown in (A.6) that S��1n (�) = (In � �W �n)�1 = F 0n;n�1S

�1n (�)Fn;n�1, it follows that G�n(�) =

W �nS

��1n (�) = F 0n;n�1WnFn;n�1 � F 0n;n�1S�1n (�)Fn;n�1 = F 0n;n�1WnJnS

�1n (�)Fn;n�1 = F 0n;n�1Gn(�)Fn;n�1

because F 0n;n�1WnJn = F0n;n�1Wn. Furthermore, as Gn(�)ln = 1

(1��) ln and F0n;n�1Gn(�)ln = 0, by induc-

tion, one has

G�mn (�) = F 0n;n�1Gmn (�)Fn;n�1; m = 0; 1; 2; � � � :

From this, tr(G�mn (�)) = tr(JnGmn (�)) because Fn;n�1F

0n;n�1 = Jn. Furthermore, because Gmn (�)ln =

1(1��)m ln,

tr(G�mn (�))� tr(Gmn (�)) = �1

ntr(Gmn (�)lnl

0n) = �

1

n(1� �m) tr(lnl0n) = �

1

(1� �m) ; m = 1; 2; � � � :

B Lemmas for Some Statistics in the Model

The following lemmas and theorems can be found in Yu, de Jong and Lee (2006).

Let Vnt = (v1t; v2t; � � � ; vnt)0 be n � 1 column vector. We assume that fvitg, i = 1; 2; � � � ; n and t =1; 2; � � � ; T; are i:i:d: across i and t with zero mean, variance �20 and E jvitj

4+�< 1 for some � > 0. Also,

let Dnt be n� 1 vector of uniformly bounded constants for all n and t. Denote

Unt =1Xh=1

PnhVn;t+1�h, (B.1)

where fPnhg1h=1 is a sequence of n� n nonstochastic square matrices.Assumption A1. The disturbances fvitg, i = 1; 2; :::; n and t = 1; 2; :::; T; are i:i:d across i and t with

zero mean, variance �20 and E jvitj4+�

<1 for some � > 0.

22

Assumption A2. The row and column sums ofP1

h=1 abs(Phn) are bounded uniformly in n.

Assumption A3. The elements of n� 1 vector Dnt are nonstochastic and bounded, uniformly in n andt.

Assumption A4. n is a nondecreasing function of T .

Lemma B.1 Under Assumptions A1 and A4, for an n� n nonstochastic matrix Bn, uniformly bounded inrow and column sums,

1

nT

TXt=1

V 0ntBnVnt � E(1

nT

TXt=1

V 0ntBnVnt) = Op

�1pnT

�, (B.2)

1

n�V 0nTBn �VnT � E(

1

n�V 0nTBn �VnT ) = Op

�1pnT 2

�, (B.3)

and1

nT

TXt=1

~V 0ntBn~Vnt � E(

1

nT

TXt=1

~V 0ntBn~Vnt) = Op

�1pnT

�; (B.4)

where E( 1nT

PTt=1 V

0ntBnVnt) =

1n�

20tr(Bn) = O(1) and E(

1n�V 0nTBn

�VnT ) =1nT �

20tr(Bn) = O

�T�1

�.

Theorem B.2 Under Assumptions A1, A2 and A4,rT

n

��U0nT;�1 �VnT � E

��U0nT;�1 �VnT

��= Op

�1pT

�, (B.5)

whereq

TnE

��U0nT;�1 �VnT

�=p

nT1n�

20tr (

P1h=1 Pnh) +O

�pnT 3

�when T !1.

For the theorem that follows, we will consider the following form:

QnT =TXt=1

�U0n;t�1Vnt +D0

ntVnt + V0ntBnVnt � �20trBn

�=

TXt=1

nXi=1

znt;i;

where Bn is a n � n nonstochastic symmetric matrix which is uniformly bounded in both row and columnsums, and znt;i = (ui;t�1 + dnti)vit + bn;ii(v2it � �20) + 2(

Pi�1j=1 bn;ijvjt)vit, where bn;ij is the (i; j) element of

Bn and dnti is the ith element of Dnt. Then, for the mean and variance of QnT , �QnT= 0 and

�2QnT= T�40tr

1Xh=1

P 0nhPnh

!+�20

TXt=1

D0ntDnt+T

��4 � 3�40

� nXi=1

b2n;ii + 2�40tr(B

2n)

!+2�3

TXt=1

nXi=1

dntibn;ii;

where �s = Evsit for s = 3; 4.

Theorem B.3 Under Assumptions A1, A2, A3, A4 and that row and column sums of Bn are bounded

uniformly in n, if the sequence 1nT �

2QnT

is bounded away from zero, then,

QnT�QnT

d! N(0; 1). (B.6)

23

Denote Znt = (Yn;t�1; WnYn;t�1; Xnt), we are going to provide some lemmas related to Jn ~Znt, Jn �ZnTand ~Vnt, �VnT of the model (2.1).

Lemma B.4 Under Assumptions 1-7, for an n�n nonstochastic matrix Bn, uniformly bounded in row andcolumn sums,

1

nT

TXt=1

(Jn ~Znt)0Bn(Jn ~Znt)� E

1

nT

TXt=1

(Jn ~Znt)0Bn(Jn ~Znt) = Op

�1pnT

�, (B.7)

where E 1nT

PTt=1(Jn

~Znt)0Bn(Jn ~Znt) is O(1);

1

nT

TXt=1

(Jn ~Znt)0Bn ~Vnt � E

1

nT

TXt=1

(Jn ~Znt)0Bn ~Vnt = Op

�1pnT

�, (B.8)

where E 1nT

PTt=1(Jn

~Znt)0Bn ~Vnt is O

�1T

�.


1

n(Jn �ZnT )

0Bn(Jn �ZnT )� E1

n(Jn �ZnT )

0Bn(Jn �ZnT ) = Op

�1pnT

�, (B.9)

where E 1n (Jn

�ZnT )0Bn(Jn �ZnT ) is O(1);

1

n(Jn �ZnT )

0Bn �VnT � E1

n(Jn �ZnT )

0Bn �VnT = Op

�1pnT

�, (B.10)

where E 1n (Jn

�ZnT )0Bn �VnT is O

�T�1

�.

Also, from Equation (3.3),


�UnT;�1; 0),

where ~Z(u)nt = ((�Xn;t�1 + Un;t�1); (Wn

�Xn;t�1 +WnUn;t�1); ~Xnt) with

�Xn;t�1 = Xn;t�1 � �XnT;�1. Hence

JnZnt has two components: one is Jn ~Z(u)nt , which is uncorrelated with Vnt; the other is�(Jn �UnT;�1 JnWn

�UnT;�1 0),

but is correlated with Vnt when t � T � 1. Following is a lemma related to ~Z(u)nt and ~Znt.


E1

nT

TXt=1

~Z 0ntJnBnJn ~Znt � E1

nT

TXt=1

~Z(u)0nt JnBnJn ~Z

(u)nt = O

�1

T

�, (B.11)

where E 1nT

PTt=1

~Z(u)0nt JnBnJn ~Z

(u)nt is O(1).

24

C Concentrated QML of Transformation Approach

C.1 Reduced Form of (2.1)

From Equation (2.1), we have Ynt = S�1n (Znt�0 + cn0 + �tln + Vnt) and WnYnt = GnZnt�0 + Gncn0 +

�tGnln +GnVnt. Hence, using S�1n = In + �0Gn, we have Ynt = Znt�0 + �0GnZnt�0 + S�1n cn0 + �tS�1n ln +

S�1n Vnt. Using S�1n ln =1

1��0 ln and Jnln = 0, we have

~Ynt = ~Znt�0 + �0Gn ~Znt�0 +~�t

1� �0ln + S

�1n~Vnt,

and

Jn ~Ynt = Jn ~Znt�0 + �0JnGn ~Znt�0 + JnS�1n~Vnt. (C.1)

Similarly, as we have Wn~Ynt = Gn ~Znt�0 + ~�tGnln +Gn ~Vnt, we can get

JnWn~Ynt = JnGn ~Znt�0 + JnGn ~Vnt, (C.2)

because JnGnln = JnWnS�1n ln = JnWnJnS

�1n ln = 0.

C.2 FOC and SOC of MLE

For the concentrated likelihood function (2.11), the �rst order derivatives are

@ lnLn;T (�)

@�=

0B@@ lnLn;T (�)

@�@ lnLn;T (�)

@�@ lnLn;T (�)

@�2

1CA =

0BBBBBB@1�2

TPt=1(Jn ~Znt)

0 ~Vnt(�)

1�2

TPt=1

�(JnWn

~Ynt)0 ~Vnt(�)

�� TtrJnGn(�)

12�4

TPt=1( ~V 0nt(�)Jn ~Vnt(�)� (n� 1)�2)

1CCCCCCA ,

and the second order derivatives are

@2 lnLn;T (�)

@�@�0

= �

0BBBBBB@1�2

TPt=1

~Z 0ntJn ~Znt1�2

TPt=1

~Z 0ntJnWn~Ynt

1�4

TPt=1

~Z 0ntJn ~Vnt(�)

� 1�2

TPt=1

�(Wn

~Ynt)0JnWn

~Ynt

�+ Ttr((JnGn(�))

2) 1�4

TPt=1(Wn

~Ynt)0Jn ~Vnt(�)

� � � (n�1)T2�4 + 1

�6

TPt=1


1CCCCCCA .

Using trGn(�) � tr(JnGn(�)) = 11�� and tr(G

2n(�)) � tr((JnGn(�))2) = 1

(1��)2 (see Appendix A.3), we

can get

25

@ lnLn;T (�)

@�=

0B@@ lnLn;T (�)

@�@ lnLn;T (�)

@�@ lnLn;T (�)

@�2

1CA =

0BBBBBB@1�2

TPt=1(Jn ~Znt)

0 ~Vnt(�)

1�2

TPt=1

�(JnWn

~Ynt)0 ~Vnt(�)

�� TtrGn(�) + T

1��

12�4

TPt=1( ~V 0nt(�)Jn ~Vnt(�)� (n� 1)�2)

1CCCCCCA , (C.3)


@2 lnLn;T (�)

@�@�0(C.4)

= �

0BBBBBB@1�2

TPt=1

~Z 0ntJn ~Znt1�2

TPt=1

~Z 0ntJnWn~Ynt

1�4

TPt=1

~Z 0ntJn ~Vnt(�)

� 1�2

TPt=1

�(Wn

~Ynt)0JnWn

~Ynt

�+ Ttr(G2n(�))� T

(1��)21�4

TPt=1(Wn

~Ynt)0Jn ~Vnt(�)

� � � (n�1)T2�4 + 1

�6

TPt=1


1CCCCCCA .

C.3 The Variance of the Gradient

We are going to show that for 1p(n�1)T

@ lnL(u)n;T (�0)

@� , E( 1p(n�1)T

@ lnL(u)n;T (�0)

@� � 1p(n�1)T

@ lnL(u)n;T (�0)

@�0 ) =

��0;nT + �0;n + O�T�1

�where �0;n =

�4�3�40�40

0BBB@0 0 0

0 1n�1

nPi=1

[(JnGn)2]ii

12�20(n�1)

trJnGn

0 12�20(n�1)

trJnGn14�40

1CCCA. First,

we can write E( 1p(n�1)T

@ lnL(u)n;T (�0)

@� � 1p(n�1)T

@ lnL(u)n;T (�0)

@�0 ) =

E 1(n�1)T

0BBBBBBB@

1�40

�TPt=1

~Z(u)0nt JnVnt

��TPt=1

~Z(u)0nt JnVnt

�0� �

1�40

�TPt=1(Gn ~Z

(u)nt �0)

0JnVnt +TPt=1(V 0ntG


��TPt=1

~Z(u)0nt JnVnt

�00 0

12�60

�TPt=1(V 0ntJnVnt � (n� 1)�20)

��TPt=1

~Z(u)0nt JnVnt

�00 0

1CCCCCCCA

+E 1(n�1)T

0BBBBB@0 0 0

0 1�40

�TPt=1(Gn ~Z

(u)nt �0)



�2�

0 12�40

�TPt=1(Gn ~Z

(u)nt �0)



��TPt=1(V 0ntJnVnt � (n� 1)�20)

�00

1CCCCCA

+E 1(n�1)T

0BBB@0 0 0

0 0 0

0 0 14�80

�TPt=1(V 0ntJnVnt � (n� 1)�20)

��TPt=1(V 0ntJnVnt � (n� 1)�20)

�01CCCA. As ~Z(u)nt is un-

26

correlated with Vnt, we have E( 1p(n�1)T

@ lnL(u)n;T (�0)

@� � 1p(n�1)T

@ lnL(u)n;T (�0)

@�0 )

=

0BBBBBBB@

1�20(n�1)T

ETPt=1

~Z(u)0nt Jn ~Z

(u)nt � �

1�20(n�1)T

ETPt=1(Gn ~Z

(u)nt �0)

0Jn ~Z(u)nt

0@ 1�20(n�1)T

ETPt=1(Gn ~Z

(u)nt �0)

0JnGn ~Z(u)nt �0

+ 1n�1


2)�

1A �

0 1�20(n�1)

tr(JnGn)12�40

1CCCCCCCA

+

0BBBB@0 � �

�3�40(n�1)T

nPi=1

(JnGn)iiE(TPt=1Jn ~Z

(u)nt )i

2�3�40(n�1)T

nPi=1

(JnGn)iiE(TPt=1JnGn ~Z

(u)nt �0)i +

�4�3�40�40(n�1)

nPi=1

(JnGn)2ii �

(1� 1n )

�32�60(n�1)T

l0nETPt=1Jn ~Z

(u)nt (1� 1

n )1

2�60(n�1)T�3l

0nE

TPt=1JnGn ~Z

(u)nt �0 +

�4�3�402�60(n�1)

trJnGn�4�3�404�80

1CCCCA.The �rst matrix is equal to ��0;nT + O

�T�1

�using Lemma B.6. The second matrix is equal to �0;n

using ETPt=1

~Z(u)nt = 0 and E

TPt=1Gn ~Z

(u)nt �0 = 0. Hence, E( 1p

(n�1)T@ lnL

(u)n;T (�0)

@� � 1p(n�1)T

@ lnL(u)n;T (�0)

@�0 ) =

��0;nT +�0;n+O�T�1

�. When Vnt are normally distributed, �0;n = 0 because �4� 3�40 = 0 for a normal

distribution.

C.4 About � 1(n�1)T

@2 lnLnT (�)@�@�0

Denote k� � �0k as the Euclidean norm of � � �0, and �1 as a neighborhood of �0, then, we have

1

(n� 1)T@2 lnLnT (�)

@�@�0� 1

(n� 1)T@2 lnLnT (�0)

@�@�0= k� � �0k �Op(1), (C.5)

1

(n� 1)T@2 lnLnT (�0)

@�@�0+��0;nT = Op

�1pnT

�, (C.6)

sup�2�

�� 1

(n� 1)T@2 lnLnT (�)

@�@�0� 1

(n� 1)T E@2 lnLnT (�)

@�@�0

��ij

= Op

�1pnT

�, (C.7)

and

sup�2�1

�� 1

(n� 1)T E@2 lnLnT (�)

@�@�0+��0;nT

��ij

= sup�2�1

k� � �0k �O(1) (C.8)

for all i; j = 1; 2; � � � ; kx + 4.These are Equations (C.7) to (C.10) in Yu, de Jong and Lee (2006).

D Proofs for Theorems

D.1 Proof of Claim 3.1

To prove 1(n�1)T lnLn;T (�)�Qn;T (�)

p! 0 uniformly in � in any compact parameter space �:

From ~Vnt(�) � Sn(�) ~Ynt � ~Znt� and ~Vnt = Sn ~Ynt � ~Znt�0, using Jnln = 0, we have

27

Jn ~Vnt(�) = Jn ~Vnt � (�� 0)JnWn~Ynt � Jn ~Znt(� � �0). (D.1)

Then,

~V 0nt(�)Jn ~Vnt(�) = ~V 0ntJn ~Vnt + (�� 0)2(Wn~Ynt)

0JnWn~Ynt + (� � �0)0 ~Z 0ntJn ~Znt(� � �0) (D.2)

+2(�� 0)(Wn~Ynt)

0Jn ~Znt(� � �0)� 2(�� 0)(Wn~Ynt)

0Jn ~Vnt � 2(� � �0)0 ~Z 0ntJn ~Vnt

where, using (C.2), we have

(Wn~Ynt)

0JnWn~Ynt = (Gn ~Znt�0)

0Jn(Gn ~Znt�0) + 2(Gn ~Znt�0)0JnGn ~Vnt + (Gn ~Vnt)

0JnGn ~Vnt.

Using Lemma B.1 and Lemma B.4,1

(n�1)TPT

t=1~V 0ntJn ~Vnt � E 1

(n�1)TPT

t=1~V 0ntJn ~Vnt

p! 0,1

(n�1)TPT

t=1(Wn~Ynt)

0JnWn~Ynt � E 1

(n�1)TPT

t=1(Wn~Ynt)

0JnWn~Ynt

p! 0,1

(n�1)TPT

t=1~Z 0ntJn ~Znt � E 1

(n�1)TPT

t=1~Z 0ntJn ~Znt

p! 0,1

(n�1)TPT

t=1(Wn~Ynt)

0Jn ~Znt � E 1(n�1)T

PTt=1(Wn

~Ynt)0Jn ~Znt

p! 0,1

(n�1)TPT

t=1(Wn~Ynt)

0Jn ~Vnt � E 1(n�1)T

PTt=1(Wn

~Ynt)0Jn ~Vnt

p! 0 and1

(n�1)TPT

t=1~Z 0ntJn ~Vnt � E 1

(n�1)TPT

t=1~Z 0ntJn ~Vnt

p! 0.

As � is compact so that �, � are bounded in �, we have

1

(n� 1)T

TXt=1

~V 0nt(�)Jn~Vnt(�)�

1

(n� 1)T ETXt=1

~V 0nt(�)Jn~Vnt(�)

p! 0 uniformly in � in �.

Also, from (3.1) and by using the fact that �2 is bounded away from zero in �,

1

(n� 1)T lnLn;T (�)�Qn;T (�) = �1

2�2

1

(n� 1)T

TXt=1

~V 0nt(�)Jn~Vnt(�)�

1

(n� 1)T ETXt=1

~V 0nt(�)Jn~Vnt(�)

!p! 0

uniformly in � in �.

To prove Qn;T (�) is uniformly equicontinuous in � in any compact parameter space �:

We have

QnT (�) = �12ln 2� � 1

2ln�2 � 1

n� 1 ln(1� �)

+1

n� 1 ln jSn(�)j �1

2�2(n� 1)T ETXt=1

~V 0nt(�)Jn ~Vnt(�):

As Jn ~Vnt(�) = JnhSn(�) ~Ynt � ~Znt�

iand ~Ynt = S�1n ~Znt�0 + S

�1n~Vnt +

~�t01��0 ln,

Jn ~Vnt(�) = Jn[Sn(�)S�1n~Znt�0 � ~Znt� + Sn(�)S

�1n~Vnt]

28

because Jnln = 0. Hence,

E1

(n� 1)T

TXt=1

~V 0nt(�)Jn ~Vnt(�) (D.3)

=1

(n� 1)T ETXt=1

(Sn(�)S�1n~Znt�0 � ~Znt�)

0Jn(Sn(�)S�1n~Znt�0 � ~Znt�)

+1

n� 1T � 1T

�20tr(S�10n S0n(�)JnSn(�)S

�1n )

+2

(n� 1)T ETXt=1

(Sn(�)S�1n~Znt�0 � ~Znt�)

0JnSn(�)S�1n~Vnt.

The third term 2(n�1)T E

TPt=1(Sn(�)S

�1n~Znt�0� ~Znt�)0JnSn(�)S�1n ~Vnt is O

�T�1

�according to expected values

in Lemma B.4 and the order O�T�1

�is uniformly in � in � because it is a polynomial function in � and � is a

bounded set. The �rst term is equal to (�0��00; ��0)EHnT (�0��00; ��0)0 using Sn(�)S�1n = In�(��0)Gn;

the second term is equal to T�1T �2n(�) where �

2n(�) =

�20n�1 tr(S

0�1n S0n(�)JnSn(�)S

�1n ), which are all polynomial

functions of �. To prove Qn;T (�) is uniformly equicontinuous in � in the compact parameter space �, the

followings are su¢ cient: (1) ln�2 is uniformly continuous; (2) 1n ln jSn(�)j is uniformly equicontinuous; (3)

(�0 � �00; �� 0)HnT (�0 � �00; �� 0)0 is uniformly equicontinuous; (4) �2n(�) is uniformly equicontinuous.

(1) is obvious because �2 is bounded away from zero in �. For (2), 1n ln jSn(�2)j �

1n ln jSn(�1)j =

1n tr

�WnS

�1n

��(�2 � �1) where �� lies between �2 and �1. As S�1n (�) is uniformly bounded in row and

column sums, uniformly in � 2 �, 1n tr�WnS

�1n

��is bounded, we have 1

n ln jS(�)j is uniformly equicontin-uous. For (3), because � and � are bounded and because EHnT is O(1) according to Lemma B.4, the result

follows. For (4), �2n(�2) � �2n(�1) =�20n�1 tr(S

0�1n S0n(�2)JnSn(�2)S

�1n ) � �20

n�1 tr(S0�1n S0n(�1)JnSn(�1)S

�1n ).

Using Sn(�)S�1n = In � (�� 0)Gn,

�2n(�2)� �2n(�1) = �20�(�2 � �1) (�2 + �1 � 2�0)

trG0nJnGnn

� (�2 � �1)tr (G0nJn + JnGn)

n

�:

As G0nJnGn and JnGn are uniformly bounded in row and column sums, �2n(�) is uniformly equicontinuous.

�

D.2 Proof of nonsingularity of the information matrix

We can prove the result by using an argument by contradiction. For ��0 � limT!1��0;nT , where ��0;nTis (3.2), we need to prove that ��0� = 0 implies � = 0 where � = (�

01; �2; �3)

0 and �2; �3 are scalars and �1is (kx + 2) � 1 vector. If this is true, then, columns of ��0 would be linear independent so that ��0 would

be nonsingular. Denote H� = plimT!11

(n�1)T

TPt=1

~Z 0ntJn ~Znt, H�� = plimT!11

(n�1)T

TPt=1

~Z 0ntJnGn ~Znt�0,

29

H�� = H0�� and H� = plimT!1

1(n�1)T

TPt=1(Gn ~Znt�0)

0JnGn ~Znt�0. Then

��0 =1

�20

0B@ H� H�� 0

H�� EH� + limn!1�20n�1


2)�

limn!11

n�1 tr(JnGn)

0 limn!11

n�1 tr(JnGn)12�20

1CA .Hence, ��0� = 0 implies

H� � �1 +H�� 2 = 0,1�20H�� 1 +

�1�20H� + limn!1

1n�1


2)�� 2 + limn!1

1�20(n�1)

tr(JnGn)� �3 = 0,

limn!11

(n�1) tr(JnGn)� �2 +12�20

� �3 = 0.

From the �rst equation, �1 = �(H�)�1H��2; from the third equation, �3 = �2 limn!1

�20n�1 tr(JnGn)��2.

By eliminating �1 and �3, the remaining equation becomes

��1

�20

�H� �H��H�1

� H��

��+ limn!1

1

n� 1


2)� 2 tr2(JnGn)

n� 1

�� 2 = 0.

Denote Cn = JnGn � tr(JnGn)n�1 Jn, then,

tr(G0nJnGn) + tr((JnGn)2)� 2tr

2(JnGn)

n� 1

=1

2tr(C 0n + Cn)(C

0n + Cn)

0,

which is nonnegative. Hence, if the limit of EHnT is nonsingular or the limit of 1n�1 (tr(G

0nJnGn) +

tr((JnGn)2) � 2tr2(JnGn)

n�1 ) is nonzero, we have �2 = 0 and hence � = 0. This proves the nonsingularity

of ��0 . �

D.3 Proof of Theorem 3.2

As ETPt=1

~V 0ntJn ~Vnt = (n� 1)(T � 1)�20 according to Lemma B.1, at �0, the expected log likelihood from

(3.1) implies

E lnLn;T (�0) = �(n� 1)T

2ln 2� � (n� 1)T

2ln�20 � T ln(1� �0) + T ln jSnj �

(n� 1)(T � 1)2

.

Denote �2n(�) =�20n�1 tr(S

�10n S0n(�)JnSn(�)S

�1n ). By using Sn(�)S�1n = In + (�0 � �)Gn for (D.3), it follows

that1

(n�1)T E lnLn;T (�)�1

(n�1)T E lnLn;T (�0)

30

= � 12 (ln�

2 � ln�20) + 1n�1 ln jSn(�)j �

1n�1 ln jSnj �

�12�2

1(n�1)T

TPt=1E ~V 0nt(�)Jn ~Vnt(�)� T�1

2T

�� 1n�1 (ln(1� �)� ln(1� �0))

= T1;n(�; �2)� 1

2�2T2;n;T (�; �) +O(T�1),

where

T1;n(�; �2) = �1

2(ln�2�ln�20)+

1

n� 1 ln jSn(�)j�1

n� 1 ln jSnj�1

2�2(�2n(�)��2)�

1

n� 1(ln(1��)�ln(1��0)),

and

T2;n;T (�; �) =1

(n� 1)T

TXt=1

En[ ~Znt(�0 � �) + (�0 � �)Gn ~Znt�0]0Jn[ ~Znt(�0 � �) + (�0 � �)Gn ~Znt�0]

o= ((�0 � �)0; (�0 � �)) � HnT � ((�0 � �)0; (�0 � �))0.

Consider the pure spatial process Ynt = �0WnYnt+�tln+Vnt for a period t. Similarly as Equation (2.10)

after data transformation, we can get the log likelihood function of this process as

lnLp;n(�; �2) = �n� 1

2ln 2� � n� 1

2ln�2 � ln(1� �) + ln jSn(�)j �

1

2�2V 0nt(�)JnV

0nt(�), (D.4)

where Vnt(�) = Sn(�)Ynt. Let Ep(�) be the expectation operator for Ynt based on this pure spatial autore-gressive process. It follows that

Ep(1

n� 1 lnLp;n(�; �2))� Ep(

1

n� 1 lnLp;n(�0; �20))

= �12(ln�2 � ln�20) +

1

n� 1 ln jSn(�)j �1

n� 1 ln jSn(�0)j

� 1

2�2(�2n(�)� �2)�

1

n� 1(ln(1� �)� ln(1� �0)),

which equals to T1;n(�; �2). By the information inequality, lnLp;n(�; �2) � lnLp;n(�0; �20) � 0. Thus,

T1;n(�; �2) � 0 for any (�; �2).

For T2;n;T (�; �), it is a quadratic function of � and �. Under the assumed condition that limT!1EHnT

is nonsingular, limT!1 T2;n;T (�; �) > 0 whenever (�; �) 6= (�0; �0). So, (�; �) is globally identi�ed. Given

�0, �20 is the unique maximizer of T1;n(�0; �2) for any given n. In the event that n ! 1, �20 is the unique

maximizer of limT!1 T1;n(�0; �2). Hence, (�; �; �2) is globally identi�ed.

Combined with uniform convergence and equicontinuity in Claim 3.1, the consistency follows. �


From the Proof of Theorem 3.2 (see Appendix D.3),

1

(n� 1)T E lnLn;T (�)�1

(n� 1)T E lnLn;T (�0) = T1;n(�; �2)� 1

2�2T2;n;T (�; �) +O(T

�1).

31

When the limit of EHnT is singular, �0 and �0 cannot be identi�ed from T2;n;T (�; �). Global identi�cation

requires that the limit of T1;n(�; �2) is strictly less than zero. Thus, we require the global identi�cation

just from the pure spatial model with the likelihood function (D.4). Using the concentrating approach by

concentrating out �2 in (D.4), we have the concentrated likelihood function

lnLp;n(�) = �n� 12

(ln(2�) + 1)� n� 12

ln �2nt(�)� ln(1� �) + ln jSn(�)j , (D.5)

where �2nt(�) =1

n�1V0nt(�)JnVnt(�). Also, we have corresponding Qn(�) = max�2 E(lnLp;n(�; �

2)) =

�n�12 (ln(2�)+1)�n�1

2 ln�2n(�)�ln(1��)+ln jSn(�)j. Global identi�cation of �0 requires that limn!11n [Qn(�)�

Qnt(�0)] 6= 0 whenever � 6= �0, which is equivalent to limn!11n [ln jSn(�)j � ln jSnj �

n�12 (ln�2n(�) �

ln�2n(�0))� (ln(1� �)� ln(1� �0))] 6= 0 whenever � 6= �0. By rearranging the terms, Qn(�)�Qnt(�0) 6= 0whenever � 6= �0 is just equivalent to

1

nln��20S�10n S�1n

�� 1

nln��2n(�)S�1n (�)0S�1n (�)

��+ � 1nln

�20(1� �0)2

� 1

nln

�2n(�)

(1� �)2

�6= 0

for � 6= �0. When n!1, it becomes

limn!1

�1

nln��20S�10n S�1n

�� 1

nln��2n(�)S�1n (�)0S�1n (�)

�� 6= 0for � 6= �0. After �0 is identi�ed, �20 is then identi�ed. Also, given �0, �0 can be identi�ed from

limT!1 T2;n;T (�; �). Combined with uniform convergence and equicontinuity in Claim 3.1, the consistency

follows. �


As Znt = (Yn;t�1;WnYn;t�1; Xnt) where Ynt is speci�ed in Equation (2.4), we have two components of

Jn ~Znt such that


�UnT;�1; 0), (D.6)

where ~Z(u)nt = ((�Xn;t�1+Un;t�1); (Wn

�Xn;t�1+WnUn;t�1); ~Xnt) with

�Xn;t�1 = Xn;t�1� �XnT;�1. Hence, Jn ~Znt

has two components: one is Jn ~Z(u)nt , which is uncorrelated with Vnt; the other is�(Jn �UnT;�1; JnWn

�UnT;�1; 0),

which is correlated with Vnt when t � T � 1. Then, the score can be decomposed into 2 parts such that

1p(n� 1)T

@ lnLn;T (�0)

@�=

1p(n� 1)T

@ lnL(u)n;T (�0)

@��nT , (D.7)

where 1p(n�1)T

@ lnL(u)n;T (�0)

@� is de�ned in (3.5) and �nT is de�ned in (3.6).

For the �rst part, it�s a linear and quadratic form of Vnt and the asymptotic distribution can be derived

from the central limit theorem for martingale di¤erence arrays (Theorem B.3). For the term �nT , from

Equation (B.3) in Lemma B.1 and Equation (B.5) in Theorem B.2, we have �nT =q

n�1T a�0;n+O(

qn�1T 3 )+

Op(1pT) where a�0;n is O(1) and speci�ed in Equation (3.9). Combined together, we get the result that

32

1p(n� 1)T

@ lnLn;T (�0)

@�+�nT

d! N(0;��0 +�0). � (D.8)


This is Equation (C.5) and (C.6).


The Taylor expansion gives

p(n� 1)T (�nT � �0) =

�� 1

(n� 1)T@2 lnLn;T (��nT )

@�@�0

��11p

(n� 1)T@ lnLn;T (�0)

@�, (D.9)

where ��nT lies between �0 and �nT and 1p(n�1)T

@ lnLn;T (�0)@� = 1p

(n�1)T@ lnL

(u)n;T (�0)

@� ��nT .As

� 1

(n� 1)T@2 lnLn;T (��nT )

@�@�0=

�� 1

(n� 1)T@2 lnLn;T (��nT )

@�@�0�� 1

(n� 1)T@2 lnLn;T (�0)

@�@�0

��+

�� 1

(n� 1)T@2 lnLn;T (�0)

@�@�0� ��0;nT

�+��0;nT

where the �rst term is ��nT � �0 � Op(1) (Equation (C.5)) and the second term is Op

�1p

(n�1)T

�(Equa-

tion (C.6)), we have � 1(n�1)T

@2 lnLn;T (��nT )@�@�0 =

��nT � �0 � Op(1) + Op� 1p(n�1)T

�+ ��0;nT . Because (1) ��nT � �0 = op(1) as �nT is consistent and (2) ��0;nT is the nonsingular in the limit according to Assumption

8 and Appendix D.2, we have � 1(n�1)T

@2 lnLn;T (��nT )@�@�0 is invertible for large T and

�� 1(n�1)T

@2 lnLn;T (��nT )@�@�0

��1is Op(1).

Also, we have 1p(n�1)T

@ lnL(u)n;T (�0)

@�

d! N(0;��0 + �0) and �nT =q

n�1T a�0;n + O(

qn�1T 3 ) + Op(

1pT)

with a�0;n = O(1). Then, from (D.9),p(n� 1)T (�nT ��0) = Op(1) �

�Op(1) +O

�pnT

�+Op

�1pT

��, which

implies that

�nT � �0 = Op�max

�1pnT;1

T

��. (D.10)

Using the fact that13�� 1

(n� 1)T@2 lnLn;T (��nT )

@�@�0

��1= ��1�0;nT +Op

�max

�1pnT;1

T

��, (D.11)

given that ��0;nT is nonsingular and its inverse is of order O(1), we have

13For two matrices Ck and Dk which are nonsingular and Ck �Dk = Op(T��) for � > 0, we have C�1k �D�1k = C�1k (Dk �

Ck)D�1k = Op(T��) when C

�1k and D�1

k are of order O(1).

33

p(n� 1)T (�nT � �0) =

�� 1

(n� 1)T@2 lnLn;T (��nT )

@�@�0

��

1p(n� 1)T

@ lnL(u)n;T (�0)

@��nT

!

= ��1�0;nT �1p

(n� 1)T@ lnL

(u)n;T (�0)

@�+Op

�max

�1pnT;1

T

�� 1p

(n� 1)T@ lnL

(u)n;T (�0)

@�

��1�0;nT ��nT �Op�max

�1pnT;1

T

��nT ,

which implies that

p(n� 1)T (�nT � �0) + ��1�0;nT ��nT +Op

�max

�1pnT;1

T

��nT

= (��1�0;nT + op(1)) �1p

(n� 1)T@ lnL

(u)n;T (�0)

@�. (D.12)

As ��0 = limT!1��0;nT exists, then using Claim 3.4 and that �nT =q

n�1T a�0;n + O(

qn�1T 3 ) + Op(

1pT)

with a�0;n = O(1),

p(n� 1)T (�nT � �0) +

rn� 1T

��1�0;nTa�0;n +Op

max

rn

T 3;

r1

T

!!d! N(0;��1�0 (��0 +�0)�

�1�0). �

(D.13)

D.8 Proof for Theorem 3.8

Theorem 3.6 states thatp(n� 1)T (�nT ��0)+

qn�1T b�0;nT +Op

�max

�pnT 3 ;

1pT

��d! N(0;��1�0 (��0+

�0)��1�0). As the bias corrected estimator is

�1

nT = �nT +1

T

� 1

(n� 1)T E@2 lnLnT (�nT )

@�@�0

!�1� an(�nT ),

where an(�) = a�;n, we will havep(n� 1)T (�

1

nT � �0)d! N(0;��1�0 (��0 +�0)�

�1�0) if

rn� 1T

0@ � 1


@�@�0

!�1an(�nT )� ��1�0;nTan(�0)

1A p! 0, (D.14)

and nT 3 ! 0. Assuming that n

T 3 ! 0, we are going to prove Equation (D.14).

From Equation (D.11) and that �nT � �0 = Op�max

�1pnT; 1T

��from Equation (D.10), we have

� 1


@�@�0= ��1�0;nT +Op

�max

�1pnT;1

T

��.

34

Hence, rn� 1T

8<: � 1


@�@�0

!�1an(�nT )� ��1�0;nTan(�0)

9=;=

rn� 1T

��1�0;nT +Op

�max

�1pnT;1

T

��an(�nT )� ��1�0;nTan(�0)

�=

rn� 1T

n��1�0;nT

�an(�nT )� an(�0)

�o+

rn� 1T

an(�nT )�Op�max

�1pnT;1

T

��.

As �nT ��0 = Op�max

�1pnT; 1T

��and an(�0) is O(1), according to the Taylor expansion of an(�nT ) around

an(�0), to prove Equation (D.14) is reduced to prove that the sequence of@an(�)@�0 is bounded, uniformly in a

small neighborhood of �0 where, from (3.9),

a�;n =

0BBBBBBBB@

1n�1 tr

��JnP1

h=0Ahn(�)

�S�1n (�)

�1

n�1 tr�Wn

�P1h=0A

hn(�)

�S�1n (�)

�0

1n�1 tr(Gn(�)

�JnP1

h=0Ahn(�)

�S�1n (�))

+ 1n�1�tr(Gn(�)Wn

�JnP1

h=0Ahn(�)

�S�1n (�)) + 1

n�1 trJnGn(�)

!12�2

1CCCCCCCCA.

As An(�) = S�1n (�)( In+�Wn) and Gn(�) =WnS�1n (�), we have @An(�)

@ = S�1n (�), @An(�)@� = S�1n (�)Wn,

@An(�)@�i

= 0 for i = 1; 2; � � � ; kx and @An(�)@� = S�1n (�)WnS

�1n (�)( In + �Wn). Because14 for each element of

�, @Ahn(�)@�i

= hAh�1n (�)@An(�)@�i

for h � 1,P1

h=1@Ah

n(�)@�i

=P1

h=1 hAh�1n (�)@An(�)

@�i. As (1)

P1h=0A

hn(�) andP1

h=1 hAh�1n (�) are uniformly bounded in either row sum or column sum, uniformly in a neighborhood of

�0, (2) S�1n (�) is uniformly bounded in both row and column sums, also uniformly in � in a neighborhood

of �0 and (3) Wn is uniformly bounded in both row and column sums, we have the result that the sequence

of @an(�)@�0 will be uniformly bounded in a neighborhood of �0. As ��nT converges in probability to �0, we

conclude that elements of @an(��nT )

@�0 are Op(1). �

E Direct Approach

E.1 Concentrated Likelihood Function and its FOC and SOC

The likelihood function for all the parameters including both individual and time dummies of the model

is (4.1) and we will concentrate out �T and cn. The �rst order condition for �t is

@ lnLd(�; cn;�T )

@�t=1

�2l0nVnt(�; cn;�T ),

and it implies

14This can be proved by mathematical induction. See footnote 9 in Yu, de Jong and Lee (2006).

35

�t(�; cn) =1

nl0n(Sn(�)Ynt � Znt� � cn).

It follows that

Vnt(�; cn; �T (�; cn)) = Jn [Sn(�)Ynt � Znt� � cn] ,

where Jn = In � 1n lnl

0n, which is idempotent with rank n� 1 and Jnln = 0. Hence, the likelihood function

with �T being concentrated out is

lnLdn;T (�; cn) = �nT

2ln 2� � nT


1

2�2

TXt=1

[Vnt(�; cn; �T (�; cn))]0[Vnt(�; cn; �T (�; cn))].

(E.1)

Compared to (2.10), (E.1) is the concentrated likelihood function for the true data generating process

where the time e¤ects are concentrated out; while (2.10) is the likelihood function for the transformed data.

For (E.1), the �rst order derivative of cn is@ lnLdnT (�;cn)

@cn= 1

�2

PTt=1 Vnt(�; cn; �T (�; cn)) and it implies

Jncn(�) = Jn�Sn(�) �YnT � �ZnT �

�where Znt = (Yn;t�1;WnYn;t�1; Xnt) and � = ( ; �; �

0)0. Therefore, the likelihood function with both cn and

�T concentrated out is

lnLdn;T (�) = �nT

2ln 2� � nT


1

2�2

TXt=1

~V 0nt(�)Jn ~Vnt(�). (E.2)

Our approach here is the same as the "di¤erence in di¤erence" one. Denote ~Q = InT � IT 1n lnl

0n �

1T lT l

0T In + 1

nT J = (IT � 1T lT l

0T ) (In � 1

n lnl0n) where J is an nT � nT matrix of ones, and denote

Z = (Z 0n1; Z0n2; � � � ; Z 0nT )0, then,

PTt=1

~Z 0ntJn ~Znt = Z0 ~QZ. This is so because

PTt=1

~Z 0ntJn~Znt =

h(Jn ~Zn1)

0; (Jn ~Zn2)0; � � � ; (Jn ~ZnT )0

i�h(Jn ~Zn1)

0; (Jn ~Zn2)0; � � � ; (Jn ~ZnT )0

i0where

Jn ~Znt = (In � 1n lnl

0n)�

2666640BBBB@z1t

z2t...

znt

1CCCCA� 1T

0BBBB@z11 + z12 + z1T

z21 + z22 + z2T...

zn1 + zn2 + znT

1CCCCA377775

=

0BBBB@z1t

z2t...

znt

1CCCCA � 1n lnl

0n

0BBBB@z1t

z2t...

znt

1CCCCA � 1T

0BBBB@z11 + z12 + z1T

z21 + z22 + z2T...

zn1 + zn2 + znT

1CCCCA + 1nT

0BBBB@Pn

i=1 (zi1 + zi2 + ziT )Pni=1 (zi1 + zi2 + ziT )

...Pni=1 (zi1 + zi2 + ziT )

1CCCCA. Hence,

denote ~Q = InT � IT 1n lnl

0n � 1

T lT l0T In + 1

nT J where J is nT � nT matrix of ones, and denote Z =(Z 0n1; Z

0n2; � � � ; Z 0nT )0, we have

PTt=1

~Z 0ntJn ~Znt = Z0 ~QZ.

For the concentrated likelihood function (E.2), the �rst order derivatives are

36

1pnT

@ lnLdn;T (�)

@�=

1pnT

0BB@@ lnLdn;T (�)

@�@ lnLdn;T (�)

@�@ lnLdn;T (�)

@�2

1CCA =

0BBBBBB@1�2

1pnT

TPt=1(Jn ~Znt)

0 ~Vnt(�)

1�2

1pnT

TPt=1

�(JnWn

~Ynt)0 ~Vnt(�)� �2trGn(�)

�12�4

1pnT

TPt=1( ~V 0nt(�)Jn

~Vnt(�)� n�2)

1CCCCCCA , (E.3)


1

nT

@2 lnLdn;T (�)

@�@�0(E.4)

= � 1

nT

0BBBBBB@1�2

TPt=1

~Z 0ntJn~Znt

1�2

TPt=1

~Z 0ntJnWn~Ynt

1�4

TPt=1

~Z 0ntJn~Vnt(�)

� 1�2

TPt=1

�(Wn

~Ynt)0JnWn

~Ynt + tr(G2n(�))

�1�4

TPt=1(Wn

~Ynt)0Jn ~Vnt(�)

� � � nT2�4 +

1�6

TPt=1


1CCCCCCA .

Similarly to that of 1pnT

@ lnL(u)n;T (�0)

@� , the variance matrix of 1pnT

@ lnLd(u)n;T (�0)

@� of (4.5) is equal to

E

1pnT

@ lnLd(u)n;T (�0)

@�� 1pnT

@ lnLd(u)n;T (�0)

@�0

!= �d�0;nT +

d�0;n +O

�T�1

�, (E.5)

and d�0;n =�4�3�40�40

0BBB@0 0 0

0 1n

nPi=1

G2n;ii1

2�20ntrGn

0 12�20n

trGn14�40

1CCCA is a symmetric matrix with �4 being the fourth mo-

ment of vit, where Gn;ii is the (i; i) entry of Gn. When Vnt are normally distributed, �0;n = 0 because

�4 � 3�40 = 0 for a normal distribution. Denote ��0 = limT!1�d�0;nT

and �0 = limT!1d�0;n

(note that

limT!1�d�0;nT

= limT!1��0;nT and limT!1d�0;n

= limT!1�0;n), then,

limT!1

E

1pnT

@ lnLd(u)n;T (�0)

@�� 1pnT

@ lnLd(u)n;T (�0)

@�0

!= ��0 +�0 . (E.6)

Claim E.1 Under Assumption 1�,2�6,7�,8,

1pnT

@ lnLdn;T (�0)

@�+�1;nT +�2;nT

d! N(0;��0 +�0). (E.7)

When fvitg, i = 1; 2; :::; n and t = 1; 2; :::; T; are normal, 1pnT

@ lnLdn;T (�0)

@� +�1;nT +�2;nTd! N(0;��0).


Claim E.2 Under Assumption 1�,2�6,7�,8, 1nT

@2 lnLdn;T (�)

@�@�0 � 1nT

@2 lnLdn;T (�0)

@�@�0 = k� � �0k �Op(1).Proof. Its proof is similar to that of Equation (C.5).

37

Claim E.3 Under Assumption 1�,2�6,7�,8, 1nT

@2 lnLdn;T (�0)

@�@�0 � 1nT E

@2 lnLdn;T (�0)

@�@�0 = Op

�1pnT

�.

Proof. Its proof is similar to that of Equation (C.6).

Using Claim E.1, Claim E.2 and Claim E.3, we have Theorem 4.2.

E.2 Proof of Claim E.1

From (4.4)-(4.7), the score is decomposed into 3 parts such that

1pnT

@ lnLdn;T (�0)

@�=

1pnT

@ lnLd(u)n;T (�0)

@��1;nT ��2;nT . (E.8)

For the �rst part, it�s a linear and quadratic form of Vnt and the asymptotic distribution can be de-

rived from the central limit theorem for martingale di¤erence arrays (Theorem B.3). For the term �1;nT ,

from Equation (B.3) in Lemma B.1 and Equation (B.5) in Theorem B.2, we have �1;nT =q

n�1T a�0;n;1 +

O(q

n�1T 3 )+Op(

1pT) where a�0;n;1 is O(1) and is speci�ed in Equation (4.8). Also, �2;nT is speci�ed in (4.9).

Combining above together, we get the result. �

E.3 Proof of Theorem 4.1

For the log likelihood function (4.1) divided by the sample size nT , we have corresponding Qdn;T (�) =

Emaxcn;�T

1nT lnLn;T (�; cn;�T ). Hence,

Qdn;T (�) =1

nTE lnLdn;T (�) = �

1

2ln 2� � 1

2ln�2 +

1

nln jSn(�)j �

1

2�2E1

nT

TXt=1

~V 0nt(�) ~Vnt(�). (E.9)

Similarly as the proof of Claim 3.1 in Appendix D.1, we can show the uniform convergence of 1nT E lnL

dn;T (�)�

Qdn;T (�) and that Qdn;T (�) is uniformly equicontinuous in � in any compact parameter space �. When

limT!1HdnT is nonsingular, we can show that �

d

nT is globally identi�ed and consistent similarly as proof of

Theorem 3.2 in Appendix D.3. When limT!1HdnT is singular, as long as

limn!1

�1

nln��20S0�1n S�1n

�� 1

nln��2n(�)S0�1n (�)S�1n (�)

�� 6= 0 for � 6= �0,�0 is still globally identi�ed and �

d

nTp! �0. The proof can be done similarly as Appendix D.4.

E.4 Proof of Theorem 4.2

The Taylor expansion gives

pnT (�

d

nT � �0) = � 1

nT

@2 lnLdn;T (��nT )

@�@�0

!�11pnT

@ lnLdn;T (�0)

@�

where ��nT lies between �0 and �d

nT and1pnT

@ lnLdn;T (�0)

@� = 1pnT

@ lnLd(u)n;T (�0)

@� ��1;nT ��2;nT .

38

As

� 1

nT


@�@�0=

� 1

nT


@�@�0� � 1

nT

@2 lnLdn;T (�0)

@�@�0

!!

+

� 1

nT

@2 lnLdn;T (�0)

@�@�0� �d�0;nT

!+�d�0;nT

where the �rst term is ��nT � �0 � Op(1) (Claim E.2) and the second term is Op

�1pnT

�(Claim E.3),

we have � 1nT


@�@�0 = ��nT � �0 � Op(1) + Op � 1p

nT

�+ �d�0;nT . Because (1)

��nT � �0 = op(1)

as �d

nT is consistent and (2) �d�0;nT is the nonsingular in the limit according to Assumption 8, we have

� 1nT


@�@�0 is invertible for large n and T and�� 1nT


@�@�0

��1is Op(1).

Also, we have 1pnT

@ lnLd(u)n;T (�0)

@�

d! N(0;��0 +�0) and �dnT = �1;nT +�2;nT where �1;nT and �2;nT are

speci�ed in (4.6) and (4.7). Then,pnT (�

d

nT � �0) = Op(1) ��Op(1) +Op

�pnT

�+O

�qTn

��, which implies

that

�d

nT � �0 = Op�max

�1

n;1

T

��. (E.10)

Given that �d�0;nT is nonsingular and its inverse is O(1), we have � 1

nT


@�@�0

!�1= (�d�0;nT )

�1 +Op

�max

�1

n;1

T

��. (E.11)

Hence,

pnT (�

d

nT � �0) =

� 1

nT


@�@�0

!�

1pnT

@ lnLd(u)n;T (�0)

@��1;nT ��2;nT

!

= (�d�0;nT )�1 � 1p

nT

@ lnLd(u)n;T (�0)

@�+Op

�max

�1

n;1

T

�� 1pnT

@ lnLd(u)n;T (�0)

@�

�(�d�0;nT )�1 � (�1;nT +�2;nT )�Op

�max

�1

n;1

T

�� (�1;nT +�2;nT ),

which implies that

pnT (�

d

nT � �0) + (�d�0;nT )�1 � (�1;nT +�2;nT ) +Op

�max

�1

n;1

T

��(�1;nT +�2;nT )

= ((�d�0;nT )�1 + op(1)) �

1pnT

@ lnLd(u)n;T (�0)

@�. (E.12)

As ��0 = limT!1�d�0;nT

exists, then using Claim E.1 and that �1;nT =p

nT an;�0;1 + O(

pnT 3 ) + Op(

1pT),

we have

39

pnT (�

d

nT � �0) +rn

T(�d�0;nT )

�1an;�0;1 +

rT

n(�d�0;nT )

�1a�0;2 (E.13)

+Op

max

rn

T 3;

rT

n3;

r1

T

!!(E.14)

d! N(0;��1�0 (��0 +�0)��1�0). �

E.5 Proof for Theorem 4.3

Theorem 4.2 states thatpnT (�

d

nT � �0) +p

nT b�0;nT;1 +

qTn b�0;nT;2 + Op

�max

�pnT 3 ;q

Tn3 ;q

1T

��d!

N(0;��1�0 (��0 +�0)��1�0). As the bias corrected estimator is

�d1

nT = �d

nT +1

T(� 1

nTE@2 lnLdnT (�

d

nT )

@�@�0)�1a1;n(�

d

nT ) +1

n(� 1

nTE@2 lnLdnT (�

d

nT )

@�@�0)�1a2(�

d

nT )

where a1;n(�) = a�;n;1 and a2(�) = a�;2, we will havepnT (�

d1

nT � �0)d! N(0;��1�0 (��0 +�0)�

�1�0) if

rn

T

0@ � 1

nTE@2 lnLdnT (�

d

nT )

@�@�0

!�1a1;n(�

d

nT )� (�d�0;nT )�1a1;n(�0)

1A p! 0, (E.15)

rT

n

0@ � 1

nTE@2 lnLdnT (�

d

nT )

@�@�0

!�1a2(�

d

nT )� (�d�0;nT )�1a2(�0)

1A p! 0, (E.16)

when nT 3 ! 0, T

n3 ! 0. Assuming that nT 3 ! 0 and T

n3 ! 0, we are going to prove Equation (E.15) and

(E.16).

From Equation (E.11) and that �d


�1T ;

1n

��from Equation (E.10),

� 1

nTE@2 lnLdnT (�nT )

@�@�0= (�d�0;nT )

�1 +Op

�max

�1

T;1

n

��.

Hence, rn

T

8<: � 1

nTE@2 lnLdnT (�nT )

@�@�0

!�1a1;n(�

d

nT )� (�d�0;nT )�1a1;n(�0)

9=;=

rn

T

��(�d�0;nT )

�1 +Op

�max

�1

T;1

n

��a1;n(�

d

nT )� (�d�0;nT )�1a1;n(�0)

�=

rn

T

n(�d�0;nT )

�1�a1;n(�

d

nT )� a1;n(�0)�o+

rn

Ta1;n(�

d

nT )�Op�max

�1

T;1

n

��.

As �d


�1T ;

1n

��and a1;n(�0) is O(1), according to Taylor expansion of a1;n(�

d

nT ) around

a1;n(�0), to prove Equation (E.15) is reduced to prove that the sequence of@a1;n(�)@�0 is bounded, uniformly

in a small neighborhood of �0, which can be proved by similar argument at the end of Appendix D.8.

40

For Equation (E.16), we haverT

n

8<: � 1

nTE@2 lnLdnT (�

d

nT )

@�@�0

!�1a2(�

d

nT )� (�d�0;nT )�1a2(�0)

9=;=

rT

n

��(�d�0;nT )

�1 +Op

�max

�1

T;1

n

��a2(�

d

nT )� (�d�0;nT )�1a2(�0)

�=

rT

n

n(�d�0;nT )

�1�a2(�

d

nT )� a2(�0)�o+

rT

na2(�

d

nT )�Op�max

�1

T;1

n

��.

As �d


�1T ;

1n

��and a2(�0) is O(1), according to Taylor expansion of a2;n(�

d

nT ) around

a2(�0), to prove Equation (E.16) is reduced to prove that elements of@a2(��nT )

@�0 <1 where ��nT lies between

�d

nT and �0 underTn3 ! 0. As we have the explicit form of @a2(

��nT )@�0 implied by (4.7), it is proved. �

References

[1] Amemiya, T. (1971), The Estimation of the Variances in a Variance-Components Model, International

Economic Review 12, 1-13.

[2] Baltagi, B., Song, S.H. and W. Kon (2003), Testing Panel Data Regression Models with Spatial Error

Correlation, Journal of Econometrics, Vol.117, 123-150.

[3] Hahn, J. and G. Kuersteiner (2002), Asymptotically Unbiased Inference for a Dynamic Panel Model

with Fixed E¤ects When Both n and T Are Large, Econometrica, Vol.70, No.4, 1639-1657.

[4] Hahn, J. and W. Newey (2004), Jackknife and Analytical Bias Reduction for Nonlinear Panel Models,

Econometrica, Vol.72, No.4, 1295-1319.

[5] Horn, R. and C. Johnson (1985), Matrix Algebra, Cambridge University Press.

[6] Kapoor, M., Kelejian, H.H. and I.R. Prucha (2004), Panel Data Models with Spatially Correlated Error

Components, forthcoming in Journal of Econometrics.

[7] Kelejian, H.H. and I.R. Prucha (1998), A Generalized Spatial Two-Stage Least Squares Procedure for

Estimating a Spatial Autoregressive Model with Autoregressive Disturbance, Journal of Real Estate

Finance and Economics, Vol. 17:1, 99-121.

[8] Kelejian H.H. and I.R. Prucha (2001), On the Asymptotic Distribution of the Moran I Test Statistic

With Applications, Journal of Econometrics, 104, 219-257.

[9] Nerlove, M. (1971), A Note on Error Components Models, Econometrica 39, 383-396

[10] Ord, J.K. (1975), Estimation Methods for Models of Spatial Interaction, Journal of the American

Statistical Association 70, 120-297.

41

[11] Phillips, P.C.B. and H.R. Moon (1999), Linear Regression Limit Theory for Nonstationary Panel Data,

Econometrica, Vol.67, No.5, 1057-1111.

[12] Rothenberg, T.J. (1971), Identi�cation in Parametric Models, Econometrica, Vol. 39, No.3, 577-591.

[13] Theil, H. (1971), Principles of Econometrics, New York: John Wiley & Sons.

[14] Wallace, T.D. and A. Hussain (1969), The Use of Error Components Models in Combining Cross-Section

and Time-Series Data, Econometrica 37, 55-72

[15] White, H. (1994), Estimation, Inference and Speci�cation Analysis, Cambridge University Press.

[16] Yu, J, R. de Jong and L-f.Lee (2006), Quasi-Maximum Likelihood Estimators For Spatial Dynamic Panel

Data With Fixed E¤ects When Both n and T Are Large, Working Paper, The Ohio State University.

42

Documents

A Spatial Dynamic Panel Data Model with Both Time and Individual Fixed E⁄ects · 2011. 12. 16. · A Spatial Dynamic Panel Data Model with Both Time and Individual Fixed E⁄ects