Lecture 1: Introduction to Spatial Autoregressive (SAR) Models
1. Some SAR models:
These are specified models for cross-sectional data which may capture possible spatial interactions across spatial units.
The following models are generalizations of autoregressive processes and autoregression models
in time series.
1.1) (First order) spatial autoregressive (SAR) process:
yi = λwi,nYn + εi, i = 1, · · · , n,
where Yn = (y1, · · · , yn)′ is the column vector of dependent variables, wi,n is an n-dimensional row vector of constants, and the εi's are i.i.d. (0, σ²). In vector/matrix form,
Yn = λWnYn + En.
The term WnYn has been called the 'spatial lag'. This is supposed to be an equilibrium model. Under the assumption that Sn(λ) = In − λWn is nonsingular, one has
Yn = Sn^{-1}(λ)En.
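As a quick numerical illustration, the reduced form above can be simulated directly. This is a minimal sketch: the row-normalized circular contiguity matrix below (each unit's neighbors are the units just before and after it) is purely illustrative and not from the text.

```python
import numpy as np

# Simulate a first-order SAR process Yn = λ Wn Yn + En by solving the
# reduced form Sn(λ) Yn = En with Sn(λ) = In − λ Wn.
rng = np.random.default_rng(0)
n, lam, sigma = 100, 0.5, 1.0

# Illustrative circular contiguity weights: row-normalized, zero diagonal.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

eps = rng.normal(0.0, sigma, n)
S = np.eye(n) - lam * W          # Sn(λ)
Y = np.linalg.solve(S, eps)      # Yn = Sn^{-1}(λ) En, without forming the inverse

assert np.allclose(Y, lam * W @ Y + eps)   # the equilibrium condition holds
```

Solving the linear system rather than inverting Sn(λ) is the usual numerical practice.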
This process model may not be useful alone in empirical econometrics, but it is used to model possible spatial correlations in the disturbances of a regression equation. The regression model with SAR disturbances Un is specified as
Yn = Xnβ + Un, Un = ρWnUn + En,
where En has zero mean and variance σ²In. In this regression model, the disturbances εi's in En are correlated across units according to a SAR process. The variance matrix of Un is σ0²Sn^{-1}(ρ0)Sn^{-1}′(ρ0). As the off-diagonal elements of Sn^{-1}(ρ0)Sn^{-1}′(ρ0) may not be zero, there are correlations of the disturbances ui's across units.
1.2) Mixed regressive, spatial autoregressive model (MRSAR):
This model generalizes the SAR process by incorporating exogenous variables xi in the SAR
process. It has also simply been called the spatial autoregressive model. In vector/matrix form,
Yn = λWnYn + Xnβ + En,
where En is (0, σ2In). This model has the feature of a simultaneous equation model and its reduced
form is
Yn = Sn^{-1}(λ)Xnβ + Sn^{-1}(λ)En.
1.3) Some Intuitions on Spatial Weights Matrix Wn
The matrix Wn is called the spatial weights matrix. Its element wn,ij, the jth element of wi,n, represents the link (or distance) from neighbor j to spatial unit i. Usually, the diagonal of Wn is specified to be zero, i.e., wn,ii = 0 for all i, because λwi,nYn represents the effect of other spatial units on spatial unit i.
In some empirical applications, it is common practice for Wn to have a zero diagonal and to be row-normalized so that the elements in each row of Wn sum to unity. In some applications, the ith row wi,n of Wn may be constructed as wi,n = (di1, di2, . . . , din)/Σ_{j=1}^n dij, where dij ≥ 0 represents a function of the spatial distance between the ith and jth units, for example, the inverse of the distance between units i and j in some (characteristic) space. The weighting operation may then be interpreted as an average of neighboring values. However, in some cases one may argue that row-normalization of the weights matrix is not meaningful (e.g., Bell and Bockstael (2000), REStat, for real estate problems with micro-level data).
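The inverse-distance construction above can be sketched numerically. The random coordinates below are purely illustrative, standing in for locations in some characteristic space.

```python
import numpy as np

# Row-normalized inverse-distance weights: wi,n = (di1, ..., din)/Σj dij,
# with dij the inverse distance between units i and j and dii = 0.
rng = np.random.default_rng(1)
n = 20
coords = rng.uniform(0, 10, size=(n, 2))     # illustrative locations

dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
d = np.zeros((n, n))
off = ~np.eye(n, dtype=bool)
d[off] = 1.0 / dist[off]                     # dij = inverse distance, zero diagonal

W = d / d.sum(axis=1, keepdims=True)         # row-normalization: rows sum to one

assert np.allclose(W.sum(axis=1), 1.0) and np.all(np.diag(W) == 0)
```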
When neighbors are defined as adjacent ones for each unit, the correlation is local in the sense
that correlations across units will be stronger for neighbors but will be weak for units far away.
This becomes clear with an expansion of Sn^{-1}(ρ). Suppose that ‖ρWn‖ < 1 for some matrix norm ‖·‖. Then
Sn^{-1}(ρ) = In + Σ_{i=1}^∞ ρ^i Wn^i
(see, e.g., Horn and Johnson 1985). As ‖Σ_{i=m}^∞ ρ^i Wn^i‖ ≤ ‖ρWn‖^m ‖Sn^{-1}(ρ)‖ and, in particular, if Wn is row-normalized, ‖Σ_{i=m}^∞ ρ^i Wn^i‖∞ ≤ Σ_{i=m}^∞ |ρ|^i = |ρ|^m/(1 − |ρ|), the remainder becomes small as m becomes large. The disturbance Un can be represented as
Un = En + ρWnEn + ρ²Wn²En + · · · ,
where ρWn may represent the influence of neighbors on each unit, ρ²Wn² represents the second-layer neighborhood influence, etc.
In the social science (or social interactions) literature, WnSn^{-1}(ρ) provides measures of centrality, which summarize the position of each spatial unit (in a network).
In conventional spatial weights matrices, neighboring units are defined by only a few adjacent ones. However, there are cases where 'neighbors' may consist of many units. An example is a social interactions model, where 'neighbors' refer to individuals in the same group. The latter may be regarded as a model with large group interactions (a kind of model with social interactions). For models with a large number of interactions for each unit, the spatial weights matrix Wn will vary with the sample size. Suppose there are R groups and m individuals in each group, so the sample size is n = mR. In a model without specific network (e.g., friendship) information, one may assume that each individual in a group gives equal weight to every other member. In that case, Wn = IR ⊗ Bm, where Bm = (lm lm′ − Im)/(m − 1), ⊗ is the Kronecker product, and lm is the m-dimensional vector of ones. Both m and R can be large in a sample, in which case asymptotic analysis will be relevant with both R and m going to infinity. Of course, in the general case, the groups may be large but of different sizes. This model has many interesting applications in the study of social interactions.
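The group-interaction weights matrix is easy to build with the Kronecker product; a small sketch with illustrative group counts:

```python
import numpy as np

# Large-group interactions: R groups of m members each, every individual
# weighting the other m−1 members of the same group equally,
# so Wn = IR ⊗ Bm with Bm = (lm lm′ − Im)/(m − 1).
R, m = 4, 5
l = np.ones((m, 1))                    # lm, the m-vector of ones
B = (l @ l.T - np.eye(m)) / (m - 1)    # Bm
W = np.kron(np.eye(R), B)              # Wn; sample size n = mR

n = m * R
assert W.shape == (n, n)
assert np.allclose(W.sum(axis=1), 1.0)   # row-normalized
assert np.all(np.diag(W) == 0)           # zero diagonal
```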
1.4) Other generalizations of these models can have high-order spatial lags and/or SAR disturbances.
A richer SAR model may combine the MRSAR equation with SAR disturbances:
Yn = λWnYn + Xnβ + Un, Un = ρMnUn + En, (1.5)
where Wn and Mn are spatial weights matrices, which may or may not be identical.
Further extension of a SAR model may allow high-order spatial lags as in
Yn = Σ_{j=1}^p λjWjnYn + Xnβ + En,
where Wjn’s are p distinct spatial weights matrices.
2. Some matrix algebra
2.1) Vector and Matrix Norms
Definition. Let V be a vector space. A function ‖ · ‖: V −→ R is a vector norm if for any
x, y ∈ V ,
(1) nonnegative, ‖ x ‖≥ 0,
(1a) positive, ‖ x ‖= 0 if and only if x = 0,
(2) homogeneous, ‖ cx ‖= |c| ‖ x ‖ for any scalar c,
(3) triangle inequality, ‖ x + y ‖≤‖ x ‖ + ‖ y ‖.
Examples:
a) The Euclidean norm (or l2 norm) is
‖x‖2 = (|x1|² + · · · + |xn|²)^{1/2}.
b) The sum norm (or l1 norm) is
‖ x ‖1= |x1| + · · ·+ |xn|.
c) The max norm (or l∞ norm) is
‖ x ‖∞= max{|x1|, · · · , |xn|}.
Definition. A function ‖ · ‖ of a square matrix is a matrix norm if, for any square matrices A
and B, it satisfies the following axioms:
(1) ‖ A ‖≥ 0,
(1a) ‖ A ‖= 0 if and only if A = 0,
(2) ‖ cA ‖= |c| ‖ A ‖ for any scalar c,
(3) triangle inequality, ‖A + B‖ ≤ ‖A‖ + ‖B‖,
(4) submultiplicativity, ‖AB‖ ≤ ‖A‖ · ‖B‖.
Associated with each vector norm is a natural matrix norm induced by the vector norm.
Let ‖·‖ be a vector norm. Define ‖A‖ = max_{‖x‖=1} ‖Ax‖ = max_{x≠0} ‖Ax‖/‖x‖. This function ‖·‖ for A is a matrix norm, and it has the properties that ‖Ax‖ ≤ ‖A‖·‖x‖ and ‖I‖ = 1.
Examples:
a) The maximum column sum matrix norm ‖·‖1 is
‖A‖1 = max_{1≤j≤n} Σ_{i=1}^n |aij|,
which is a matrix norm induced by the l1 vector norm.
b) The maximum row sum matrix norm ‖·‖∞ is
‖A‖∞ = max_{1≤i≤n} Σ_{j=1}^n |aij|,
which is a matrix norm induced by the l∞ vector norm.
One important application of matrix norms is in giving bounds for the eigenvalues of a matrix.
Definition: The spectral radius ρ(A) of a matrix A is
ρ(A) = max{|λ| : λ is an eigenvalue of A}.
Let λ be an eigenvalue of A and x a corresponding eigenvector, i.e., Ax = λx. Then
|λ|·‖x‖ = ‖λx‖ = ‖Ax‖ ≤ ‖A‖·‖x‖,
and, therefore, |λ| ≤ ‖A‖. That is, ρ(A) ≤ ‖A‖.
Another useful application of the matrix norm is that, if ‖·‖ is a matrix norm and ‖A‖ < 1, then I − A is invertible and (I − A)^{-1} = Σ_{j=0}^∞ A^j.
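Both facts are easy to verify numerically. A small sketch with an arbitrary illustrative matrix:

```python
import numpy as np

# Check ρ(A) ≤ ‖A‖ for the two induced norms, and the Neumann series
# (I − A)^{-1} = Σ_j A^j when ‖A‖ < 1.
rng = np.random.default_rng(6)
A = rng.uniform(-0.1, 0.1, size=(8, 8))      # illustrative; ‖A‖∞ < 1 by construction

norm_1 = np.abs(A).sum(axis=0).max()         # maximum column sum norm ‖A‖1
norm_inf = np.abs(A).sum(axis=1).max()       # maximum row sum norm ‖A‖∞
spec_rad = np.abs(np.linalg.eigvals(A)).max()
assert spec_rad <= norm_1 and spec_rad <= norm_inf

if norm_inf < 1:                             # then I − A is invertible
    series, term = np.zeros_like(A), np.eye(8)
    for _ in range(200):                     # partial sums of Σ_j A^j
        series += term
        term = term @ A
    assert np.allclose(series, np.linalg.inv(np.eye(8) - A))
```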
When all of the elements of Wn are non-negative and its rows are normalized to sum to one, ‖Wn‖∞ = 1. For this case, when |λ| < 1,
(In − λWn)^{-1} = Σ_{j=0}^∞ (λWn)^j = In + λWn + λ²Wn² + · · · .
2.2) Useful (important) regularity conditions on Wn and associated matrix:
Definition: Let {An} be a sequence of square matrices, where An is of dimension n.
(i) {An} is said to be uniformly bounded in row sums if {‖An‖∞} is a bounded sequence;
(ii) {An} is said to be uniformly bounded in column sums if {‖An‖1} is a bounded sequence.
A useful property is that if {An} and {Bn} are uniformly bounded in row sums (column
sums), then {AnBn} is also uniformly bounded in row sums (column sums). This follows from
the submultiplicative property of matrix norms. For example, ‖ AnBn ‖∞≤‖ An ‖∞‖ Bn ‖∞≤ c2
when ‖ An ‖∞≤ c and ‖ Bn ‖∞≤ c.
An Important Assumption: The spatial weights matrices {Wn} and {Sn^{-1}} are uniformly bounded in both row and column sums.
Equivalently, this assumes that {‖Wn‖1} and {‖Wn‖∞} are bounded sequences, and similarly that the maximum row and column sum norms of Sn^{-1} are bounded.
Ruling out unit root processes:
The assumption that the Sn^{-1} are uniformly bounded in both row and column sums rules out the unit root case (in time series). Consider the unit root process yt = y_{t−1} + εt, t = 2, · · · , n, with y1 = ε1, i.e.,
(y1, · · · , yn)′ = Ln(y1, · · · , yn)′ + (ε1, · · · , εn)′,
where Ln is the n × n matrix with ones on the first subdiagonal and zeros elsewhere. It implies that
yt = Σ_{j=1}^t εj.
That is, Sn^{-1} = (In − Ln)^{-1} is a lower triangular matrix with all nonzero elements equal to unity. The sum of its first column is n, and the sum of its last row is also n. For the unit root process, therefore, the Sn^{-1} are bounded neither in row sums nor in column sums.
3. Estimation Methods
Estimation methods, which have been considered in the existing literature, are mainly the ML
(or QML) method, the 2SLS (or IV) method, and the generalized method of moments (GMM).
The QML method usually has good finite sample properties relative to the other methods for the estimation of SAR models with a first-order spatial lag. However, the ML method is not computationally attractive for models with more than one spatial lag. For higher-order spatial lag models, the IV and GMM methods are feasible. With properly designed moment equations, the best GMM estimator exists and can be asymptotically as efficient as the ML estimator under normal disturbances (Lee 2007).
The most popular and traditional estimation method is ML under the assumption that En is N(0, σ²In). Other estimation methods are the method of moments (MOM), GMM, and 2SLS; the 2SLS is applicable only to the MRSAR model.
3.1) ML for the SAR process:
ln Ln(λ, σ²) = −(n/2) ln(2π) − (n/2) ln σ² + ln |Sn(λ)| − (1/(2σ²)) Yn′Sn′(λ)Sn(λ)Yn,
where Sn(λ) = In − λWn.
For the regression model with SAR disturbances, the log likelihood function is
ln Ln(β, λ, σ²) = −(n/2) ln(2π) − (n/2) ln σ² + ln |Sn(λ)| − (1/(2σ²))(Yn − Xnβ)′Sn′(λ)Sn(λ)(Yn − Xnβ).
The log likelihood function for the MRSAR model is
ln Ln(λ, β, σ²) = −(n/2) ln(2π) − (n/2) ln σ² + ln |Sn(λ)| − (1/(2σ²))(Sn(λ)Yn − Xnβ)′(Sn(λ)Yn − Xnβ).
The likelihood function involves the computation of the determinant of Sn(λ), which is a
function of the unknown parameter λ, and may have a large dimension n.
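To make the computation concrete, here is a sketch of QML for the MRSAR model with β and σ² concentrated out, leaving a one-dimensional search over λ; the simulated data, the grid search, and the dense determinant call are all illustrative choices, not the text's prescription.

```python
import numpy as np

# Concentrated log likelihood: for given λ, β̂(λ) = (X′X)^{-1}X′Sn(λ)Yn and
# σ̂²(λ) = (1/n)‖Sn(λ)Yn − Xβ̂(λ)‖², so
#   ln Lc(λ) = −(n/2)(ln 2π + 1) − (n/2) ln σ̂²(λ) + ln|Sn(λ)|.
rng = np.random.default_rng(7)
n, lam0, beta0 = 300, 0.3, np.array([1.0, -1.0])
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5   # illustrative row-normalized Wn
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = np.linalg.solve(np.eye(n) - lam0 * W, X @ beta0 + rng.normal(size=n))

def conc_loglik(lam):
    SY = Y - lam * (W @ Y)                          # Sn(λ)Yn
    beta = np.linalg.solve(X.T @ X, X.T @ SY)       # β̂(λ)
    sigma2 = np.mean((SY - X @ beta) ** 2)          # σ̂²(λ)
    _, logdet = np.linalg.slogdet(np.eye(n) - lam * W)   # ln|Sn(λ)|, each call O(n³)
    return -0.5 * n * (np.log(2 * np.pi) + 1) - 0.5 * n * np.log(sigma2) + logdet

grid = np.linspace(-0.9, 0.9, 181)
lam_hat = grid[np.argmax([conc_loglik(l) for l in grid])]
assert -1 < lam_hat < 1 and np.isfinite(conc_loglik(lam_hat))
```

The repeated determinant evaluation inside the search is exactly the cost the devices discussed next are designed to avoid.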
A computationally tractable method is due to Ord (1975), where Wn is a row-normalized weights matrix with Wn = DnW*n, where W*n is a symmetric matrix and Dn = Diag{Σ_{j=1}^n w*_{n,ij}}^{-1}. Note that, in this case, Wn is not a symmetric matrix. However, the eigenvalues of Wn are still all real. This is so because
|Wn − νIn| = |DnW*n − νIn| = |Dn^{1/2}| · |Dn^{1/2}W*nDn^{1/2} − νIn| · |Dn^{-1/2}| = |Dn^{1/2}W*nDn^{1/2} − νIn|,
so the eigenvalues of Wn are the same as those of Dn^{1/2}W*nDn^{1/2}, which is a symmetric matrix. As the eigenvalues of a symmetric matrix are real, the eigenvalues of Wn are real. Let the μi's be the eigenvalues of Wn, which are the same as those of Dn^{1/2}W*nDn^{1/2}, and let Γ be the orthogonal matrix such that Dn^{1/2}W*nDn^{1/2} = ΓDiag{μi}Γ′. The above relations also show that
|In − λWn| = |In − λDnW*n| = |In − λDn^{1/2}W*nDn^{1/2}| = |In − λΓDiag{μi}Γ′| = |In − λDiag{μi}| = Π_{i=1}^n (1 − λμi).
Thus, |In − λWn| can be easily updated during iterations within a maximization subroutine, as the μi's need to be computed only once.
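Ord's device is easy to check numerically. A sketch with an illustrative symmetric line-graph contiguity matrix:

```python
import numpy as np

# Ord's device: with Wn = Dn W*n (W*n symmetric, Dn = Diag{Σj w*_{n,ij}}^{-1}),
# the eigenvalues μi of the symmetric matrix Dn^{1/2} W*n Dn^{1/2} equal those
# of Wn, and ln|In − λWn| = Σi ln(1 − λμi); the μi are computed only once.
n, lam = 30, 0.4
W_star = np.eye(n, k=1) + np.eye(n, k=-1)     # illustrative symmetric adjacency
D = np.diag(1.0 / W_star.sum(axis=1))         # Dn
W = D @ W_star                                 # row-normalized Wn (not symmetric)

D_half = np.sqrt(D)
mu = np.linalg.eigvalsh(D_half @ W_star @ D_half)   # real eigenvalues of Wn

logdet_eig = np.sum(np.log(1.0 - lam * mu))
sign, logdet_direct = np.linalg.slogdet(np.eye(n) - lam * W)
assert sign > 0 and np.isclose(logdet_eig, logdet_direct)
```

Once the μi are stored, each likelihood evaluation needs only an O(n) product rather than an O(n³) determinant.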
When Wn is a sparse matrix, specific numerical methods for handling sparse matrices are useful.
Another computationally tractable method uses the characteristic polynomial. The determinant |Wn − μIn| is a polynomial in μ, called the characteristic polynomial of Wn (the zeros of the characteristic polynomial are the eigenvalues of Wn). Thus,
|In − λWn| = anλ^n + · · · + a1λ + a0,
where the constants a's depend only on Wn. So the a's need to be computed only once during the maximization algorithm.
3.2) 2SLS Estimation for the MRSAR model
For the MRSAR model Yn = λWnYn + Xnβ + En, the spatial lag WnYn can be correlated with the disturbance vector En. So, in general, OLS may not be a consistent estimator. However, there is a class of spatial weights matrices Wn (with large group interactions) for which the OLS estimator can be consistent (Lee 2002).
To avoid the bias due to the correlation of WnYn with En, Kelejian and Prucha (1998) have suggested the use of instrumental variables (IVs). Let Qn be a matrix of instrumental variables. Denote Zn = (WnYn, Xn) and θ = (λ, β′)′. The MRSAR equation can be rewritten as Yn = Znθ + En. The 2SLS estimator of θ with Qn is
θ̂2sl,n = [Zn′Qn(Qn′Qn)^{-1}Qn′Zn]^{-1}Zn′Qn(Qn′Qn)^{-1}Qn′Yn.
It can be shown that the asymptotic distribution of θ̂2sl,n follows from
√n(θ̂2sl,n − θ0) →D N(0, σ0² lim_{n→∞} {(1/n)(GnXnβ0, Xn)′Qn(Qn′Qn)^{-1}Qn′(GnXnβ0, Xn)}^{-1}),
where Gn = WnSn^{-1}, under the assumption that the limit of (1/n)Qn′(GnXnβ0, Xn) exists and has full column rank (k + 1), where k is the dimension of β or, equivalently, the number of columns of Xn. In practice, Kelejian and Prucha (1998) suggest using the linearly independent columns of (Xn, WnXn) for the construction of Qn.
By the generalized Schwarz inequality, the optimum IV matrix is (GnXnβ0, Xn).
This 2SLS method cannot be used for the estimation of the (pure) SAR process. The SAR process is a special case of the MRSAR model with β0 = 0. However, when β0 = 0, GnXnβ0 = 0 and, hence, (GnXnβ0, Xn) = (0, Xn) would have rank k but not full rank (k + 1). Intuitively, when β0 = 0, Xn does not appear in the model and there is no other IV available.
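The 2SLS estimator can be sketched on simulated data. Everything below (the contiguity matrix, the parameter values, the instrument set) is illustrative; Qn takes Xn plus Wn times the non-constant regressor, since WnXn applied to the intercept column of a row-normalized Wn reproduces the intercept.

```python
import numpy as np

# 2SLS for Yn = λ WnYn + Xnβ + En with Qn built from (Xn, WnXn).
rng = np.random.default_rng(4)
n, lam0, beta0 = 400, 0.4, np.array([1.0, -2.0])

W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5   # row-normalized, zero diagonal
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(size=n)
Y = np.linalg.solve(np.eye(n) - lam0 * W, X @ beta0 + eps)   # reduced form

Z = np.column_stack([W @ Y, X])        # Zn = (WnYn, Xn)
Q = np.column_stack([X, W @ X[:, 1]])  # Qn: linearly independent columns of (Xn, WnXn)

PQ = Q @ np.linalg.solve(Q.T @ Q, Q.T)                      # projector onto IV space
theta_2sls = np.linalg.solve(Z.T @ PQ @ Z, Z.T @ PQ @ Y)    # (λ̂, β̂′)′

assert theta_2sls.shape == (3,) and np.all(np.isfinite(theta_2sls))
```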
3.3) Method of Moments (MOM) for the SAR process:
Kelejian and Prucha (1999) have suggested a MOM estimation:
min_θ gn′(θ)gn(θ).
The moment equations are based on three moments:
E(En′En) = nσ², E(En′Wn′WnEn) = σ²tr(Wn′Wn), and E(En′WnEn) = 0.
These correspond to
gn(θ) = (Yn′Sn′(λ)Sn(λ)Yn − nσ², Yn′Sn′(λ)Wn′WnSn(λ)Yn − σ²tr(Wn′Wn), Yn′Sn′(λ)WnSn(λ)Yn)′
for the SAR process. For the regression model with SAR disturbances, Yn shall be replaced by the least squares residuals.
Remark: Kelejian and Prucha (1999) have shown only the consistency of the MOM estimator, not its asymptotic normality.
3.4) Moments for GMM estimation
For the MRSAR model, in addition to the moments based on the IV matrix Qn, other moment
equations can also be constructed for estimation.
Let Qn be an n × kx IV matrix constructed as functions of Xn and Wn. Let εn(θ) = Sn(λ)Yn − Xnβ for any possible value of θ. The kx moment functions corresponding to the orthogonality conditions are Qn′εn(θ).
Now consider a finite number, say m, of n × n constant matrices P1n, · · · , Pmn, each of which has a zero diagonal. The moment functions (Pjnεn(θ))′εn(θ) can be used in addition to Qn′εn(θ). These moment functions form a vector
gn(θ) = (P1nεn(θ), · · · , Pmnεn(θ), Qn)′εn(θ) = (εn′(θ)P1nεn(θ), · · · , εn′(θ)Pmnεn(θ), εn′(θ)Qn)′
for estimation in the GMM framework.
Proposition. For any constant n × n matrix Pn with tr(Pn) = 0, Pnεn is uncorrelated with εn, i.e., E((Pnεn)′εn) = 0.
Proof: E((Pnεn)′εn) = E(εn′Pn′εn) = E(εn′Pnεn) = σ0²tr(Pn) = 0.
This shows, in particular, that E(gn(θ0)) = 0. Thus, gn(θ) are valid moment equations for estimation.
INTUITION:
As WnYn = GnXnβ0 + Gnεn, where Gn = WnSn^{-1} and Sn = Sn(λ0), and Gnεn is correlated with the disturbance εn in the model
Yn = λWnYn + Xnβ + εn,
any Pjnεn, which is uncorrelated with εn, can be used as an IV for WnYn as long as Pjnεn and Gnεn are correlated.
4. Basic Asymptotic Foundation
Some laws of large numbers and central limit theorem
Lemma. Suppose that {An} are uniformly bounded in both row and column sums. Then E(εn′Anεn) = O(n) and var(εn′Anεn) = O(n). Furthermore, εn′Anεn = OP(n) and (1/n)εn′Anεn − (1/n)E(εn′Anεn) = oP(1).
Lemma. Suppose that An is an n × n matrix with its column sums uniformly bounded, the elements of the n × k matrix Cn are uniformly bounded, and εn1, · · · , εnn are i.i.d. with zero mean, finite variance σ², and E|ε|³ < ∞. Then (1/√n)Cn′Anεn = OP(1) and (1/n)Cn′Anεn = oP(1). Furthermore, if the limit of (1/n)Cn′AnAn′Cn exists and is positive definite, then
(1/√n)Cn′Anεn →D N(0, σ² lim_{n→∞} (1/n)Cn′AnAn′Cn).
This lemma can be shown with the Liapounov central limit theorem.
Theorem (Liapounov's theorem): Let {Xn} be a sequence of independent random variables with E(Xn) = μn and E(Xn − μn)² = σn² > 0. Denote Cn = (Σ_{i=1}^n σi²)^{1/2}. If (1/Cn^{2+δ}) Σ_{k=1}^n E|Xk − μk|^{2+δ} → 0 for some δ > 0, then
Σ_{i=1}^n (Xi − μi)/Cn →D N(0, 1).
Lemma. Suppose that An is a constant n × n matrix uniformly bounded in both row and column sums. Let cn be a column vector of constants. If (1/n)cn′cn = o(1), then (1/√n)cn′Anεn = oP(1). On the other hand, if (1/n)cn′cn = O(1), then (1/√n)cn′Anεn = OP(1).
Proof: The results follow from Chebyshev's inequality by investigating var((1/√n)cn′Anεn) = σ0²(1/n)cn′AnAn′cn. Let Λn be the diagonal matrix of eigenvalues of AnAn′ and Γn the orthonormal matrix of eigenvectors. As eigenvalues in absolute value are bounded by any norm of the matrix, the eigenvalues in Λn are uniformly bounded in absolute value because ‖An‖∞ (and ‖An‖1) are uniformly bounded. Hence, (1/n)cn′AnAn′cn ≤ (1/n)cn′ΓnΓn′cn · |λn,max| = (1/n)cn′cn|λn,max| = o(1), where λn,max is the eigenvalue of AnAn′ with the largest absolute value.
When (1/n)cn′cn = O(1), (1/n)cn′AnAn′cn ≤ (1/n)cn′cn|λn,max| = O(1). In this case, var((1/√n)cn′Anεn) = σ0²(1/n)cn′AnAn′cn = O(1). Therefore, (1/√n)cn′Anεn = OP(1). Q.E.D.
A Martingale Central Limit Theorem:
Definition: Let {zt} be a sequence of random scalars, and let {Jt} be a sequence of σ-fields
such that Jt−1 ⊆ Jt for all t. If zt is measurable with respect to Jt, then {zt,Jt} is called an
adapted stochastic sequence.
The σ-field Jt can be thought of as the σ-algebra generated by the entire current and past
history of zt and some other random variables.
Definition: Let {yt,Jt} be an adapted stochastic sequence. Then {yt,Jt} is a martingale
difference sequence if and only if E(yt|Jt−1) = 0, for all t ≥ 2.
Theorem (A Martingale Central Limit Theorem):
Let {ξTt} be a martingale-difference array. If, as T → ∞,
(i) Σ_{t=1}^T vTt →p σ², where σ² > 0 and vTt = E(ξTt²|J_{T,t−1}) denotes the conditional variance,
(ii) for any ε > 0, Σ_{t=1}^T E(ξTt² I(|ξTt| > ε)|J_{T,t−1}) converges in probability to zero (a Lindeberg condition),
then ξT1 + · · · + ξTT →D N(0, σ²).
A CLT for quadratic-linear functions of εn:
Suppose that {An} is a sequence of symmetric n × n matrices with row and column sums uniformly bounded, and bn is an n-dimensional column vector satisfying sup_n (1/n)Σ_{i=1}^n |bni|^{2+η} < ∞ for some η > 0. The εn1, · · · , εnn are i.i.d. random variables with zero mean and finite variance σ², and the moment E(|ε|^{4+δ}) exists for some δ > 0. Let σ²_{Qn} be the variance of Qn, where Qn = εn′Anεn + bn′εn − σ²tr(An). Assume that the variance σ²_{Qn} is bounded away from zero at the rate n. Then Qn/σ_{Qn} →D N(0, 1).
This theorem has been proved by Kelejian and Prucha (2001) with a martingale central limit theorem.
5. The Cliff and Ord (1973) Test for Spatial Correlation
Consider the model
Y = Xβ + U, U = ρWU + ε,
where Y is an n-dimensional column vector of the dependent variable, X is an n × k matrix of
regressors, U is an n × 1 vector of disturbances, W is a n × n spatial weights matrix with a zero
diagonal, and ε is N (0, σ2In). One may want to test H0 : ρ = 0, i.e., the test of spatial correlation
in U .
The log likelihood of this model is
ln L(θ) = −(n/2) ln(2πσ²) + ln |I − ρW| − (1/(2σ²))(y − Xβ)′(I − ρW)′(I − ρW)(y − Xβ).
The first-order conditions are
∂lnL/∂β = (1/σ²)X′(I − ρW)′(I − ρW)(y − Xβ),
∂lnL/∂σ² = −n/(2σ²) + (1/(2σ⁴))(y − Xβ)′(I − ρW)′(I − ρW)(y − Xβ),
∂lnL/∂ρ = −tr(W(I − ρW)^{-1}) + (1/(2σ²))(y − Xβ)′[(I − ρW)′W + W′(I − ρW)](y − Xβ),
where ∂ln|I − ρW|/∂ρ = −tr(W(I − ρW)^{-1}) by the matrix differentiation formula (d/dξ)ln|A| = tr(A^{-1}dA/dξ) (see, e.g., Amemiya (1985), Advanced Econometrics, p. 461).
The second-order derivatives with respect to ρ are
∂²lnL/∂β∂ρ = −(1/σ²)X′[W′(I − ρW) + (I − ρW)′W](y − Xβ),
∂²lnL/∂σ²∂ρ = −(1/(2σ⁴))(y − Xβ)′[W′(I − ρW) + (I − ρW)′W](y − Xβ),
and
∂²lnL/∂ρ² = −tr([W(I − ρW)^{-1}]²) − (1/σ²)(y − Xβ)′W′W(y − Xβ),
where ∂(I − ρW)^{-1}/∂ρ = (I − ρW)^{-1}W(I − ρW)^{-1}, by the matrix differentiation formula (d/dξ)A^{-1} = −A^{-1}(dA/dξ)A^{-1}. We see that EH0(∂²lnL/∂β∂ρ) = 0 and EH0(∂²lnL/∂σ²∂ρ) = 0 because E(ε′Wε) = σ²tr(W) = 0, as W has a zero diagonal. Furthermore,
EH0(∂²lnL/∂ρ²) = −tr(W² + W′W).
Because of the block diagonal structure of the information matrix −EH0(∂²lnL/∂θ∂θ′), an efficient score test statistic for testing H0 has a simple expression. The efficient score test is also known as the Lagrange multiplier (LM) test.
The efficient score (Rao) test statistic in its general form is
(∂lnL(θ̃)/∂θ′)[−E(∂²lnL(θ̃)/∂θ∂θ′)]^{-1}(∂lnL(θ̃)/∂θ),
where θ̃ is the constrained MLE of θ under H0.
Let β̃ and σ̃² be the MLE under H0: ρ = 0, i.e., β̃ is the usual OLS estimate and σ̃² = (1/n)e′e is the average sum of squared residuals, where e = y − Xβ̃ is the least squares residual. Then θ̃ = (β̃′, σ̃², 0)′ and
∂lnL(θ̃)/∂ρ = −tr(W) + (1/(2σ̃²))(y − Xβ̃)′(W + W′)(y − Xβ̃) = (1/(2σ̃²))(y − Xβ̃)′(W + W′)(y − Xβ̃),
because tr(W) = 0. Hence, the LM test statistic is
n e′(W + W′)e/(2e′e)/[tr(W² + W′W)]^{1/2},
which shall be asymptotically N(0, 1).
The test statistic given by Moran (1950) and Cliff and Ord (1973) is
I = e′We/e′e,
which is the LM statistic with an unscaled denominator. The LM interpretation of the Cliff and Ord (1973) test of spatial correlation is due to Burridge (1980), J. R. Statist. Soc. B, vol. 42, pp. 107-108.
Burridge (1980) has also pointed out that the LM test against a moving average alternative, i.e., U = ε + ρWε, is exactly the same as that given above.
Note that for the regression model with SAR disturbances, y = Xβ + u with u = ρWu + ε, the statistics (1/√n)û′Wû and (1/√n)u′Wu, where û is the least squares residual vector, can be asymptotically equivalent (under both H0 and H1), i.e., (1/√n)û′Wû = (1/√n)u′Wu + op(1). This can be shown as follows. As û = [I − X(X′X)^{-1}X′]u,
û′Wû = u′[I − X(X′X)^{-1}X′]W[I − X(X′X)^{-1}X′]u
= u′Wu − u′(W + W′)X(X′X)^{-1}X′u + u′X(X′X)^{-1}X′WX(X′X)^{-1}X′u.
Because u = (I − ρW)^{-1}ε, under the assumptions that the elements of X are uniformly bounded and the (I − ρW)^{-1} are uniformly bounded in both row and column sums, (1/√n)X′u = (1/√n)X′(I − ρW)^{-1}ε = Op(1) and (1/√n)X′Wu = (1/√n)X′W(I − ρW)^{-1}ε = Op(1) by the CLT for the linear form. These imply, in particular, (1/n)X′u = op(1) and (1/n)X′Wu = op(1). Hence,
(1/√n)û′Wû = (1/√n)u′Wu − (u′X/n)(X′X/n)^{-1}(1/√n)X′(W + W′)u + (u′X/n)(X′X/n)^{-1}(X′WX/n)(X′X/n)^{-1}(X′u/√n)
= (1/√n)u′Wu + oP(1).
Consequently, the Cliff-Ord test using the least squares residuals is asymptotically χ²(1) under H0.
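The residual-based test is simple to compute. A sketch under H0 on simulated data (the weights matrix and regressors are illustrative):

```python
import numpy as np

# Cliff–Ord/Moran test from OLS residuals: the scaled LM statistic
# n e′(W + W′)e / (2 e′e) / tr(W² + W′W)^{1/2} is asymptotically N(0, 1)
# under H0: ρ = 0; I = e′We/e′e is the unscaled Moran statistic.
rng = np.random.default_rng(5)
n = 200
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5   # row-normalized, zero diagonal

X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)  # H0 true: no spatial correlation

e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)      # OLS residuals
moran_I = (e @ W @ e) / (e @ e)                    # Moran/Cliff–Ord statistic
lm = n * (e @ (W + W.T) @ e) / (2 * (e @ e)) / np.sqrt(np.trace(W @ W + W.T @ W))

assert np.isfinite(moran_I) and np.isfinite(lm)
```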
6. More on GMM and 2SLS Estimation of MRSAR Models
6.1) IVs and Quadratic Moments
The MRSAR model is
Yn = λWnYn + Xnβ + εn,
where Xn is an n × k matrix of nonstochastic exogenous variables and Wn is a spatial weights matrix of known constants with a zero diagonal.
We assume that
Assumption 1. The εni's are i.i.d. with zero mean and variance σ², and a moment of order higher than the fourth exists.
Assumption 2. The elements of Xn are uniformly bounded constants, Xn has full rank k, and lim_{n→∞}(1/n)Xn′Xn exists and is nonsingular.
Assumption 3. The spatial weights matrices {Wn} and {(In − λWn)^{-1}} at λ = λ0 are uniformly bounded in both row and column sums in absolute value.
Let Qn be an n × kx matrix of IVs constructed as functions of Xn and Wn in a 2SLS approach. Denote εn(θ) = (In − λWn)Yn − Xnβ for any possible value of θ. Let P1n = {P : P is an n × n matrix, tr(P) = 0} be the class of constant n × n matrices with a zero trace. A subclass P2n of P1n is P2n = {P : P ∈ P1n, diag(P) = 0}.
Assumption 4. The matrices Pjn's from P1n are uniformly bounded in both row and column sums in absolute value, and the elements of Qn are uniformly bounded.
With the selected matrices Pjn's and IV matrix Qn, the set of moment functions forms a vector
gn(θ) = (P1nεn(θ), · · · , Pmnεn(θ), Qn)′εn(θ) = (εn′(θ)P1nεn(θ), · · · , εn′(θ)Pmnεn(θ), εn′(θ)Qn)′
for the GMM estimation.
At θ0, gn(θ0) = (εn′P1nεn, · · · , εn′Pmnεn, εn′Qn)′. It follows that E(gn(θ0)) = 0 because E(Qn′εn) = Qn′E(εn) = 0 and E(εn′Pjnεn) = σ0²tr(Pjn) = 0 for j = 1, · · · , m. The intuition is as follows:
(1) The IV variables in Qn can be used as IVs for WnYn and Xn.
(2) Pjnεn is uncorrelated with εn. As WnYn = Gn(Xnβ0 + εn), Pjnεn can be used as an IV for WnYn as long as Pjnεn and Gnεn are correlated, where Gn = WnSn^{-1}.
6.2) Identification
In the GMM framework, the identification condition requires that the limiting equations lim_{n→∞}(1/n)E(gn(θ)) = 0 have a unique solution at θ0 (Hansen 1982). As E(εn(θ)) = dn(θ), where dn(θ) = Xn(β0 − β) + (λ0 − λ)GnXnβ0, the moment equations corresponding to Qn are
lim_{n→∞}(1/n)Qn′dn(θ) = lim_{n→∞}(1/n)(Qn′Xn, Qn′GnXnβ0)((β0 − β)′, λ0 − λ)′ = 0.
They will have a unique solution at θ0 if (Qn′Xn, Qn′GnXnβ0) has full column rank, i.e., rank (k + 1), for large enough n.
This sufficient rank condition implies the necessary rank condition that (Xn, GnXnβ0) has full column rank (k + 1) and that Qn has rank at least (k + 1), for large enough n. The sufficient rank condition requires Qn to be correlated with WnYn in the limit as n goes to infinity. This is so because E(Qn′WnYn) = Qn′GnXnβ0. Under the sufficient rank condition, θ0 can thus be identified via lim_{n→∞}(1/n)Qn′dn(θ) = 0.
The necessary full rank condition on (Xn, GnXnβ0) for large n is possible only if GnXnβ0 and the columns of Xn are not asymptotically linearly dependent. This rank condition would not hold, in particular, if β0 were zero. There are other cases where this dependence can occur (see, e.g., Kelejian and Prucha 1998).
When Xn has rank k but (Xn, GnXnβ0) does not have full rank (k + 1), say GnXnβ0 = Xnc for some vector c, the corresponding moment equations Qn′dn(θ) = 0 will have many solutions, but the solutions are all described by the relation β = β0 + (λ0 − λ)c as long as Qn′Xn has full rank k. Under this scenario, β0 can be identified only if λ0 is identifiable. The identification of λ0 will rely on the remaining quadratic moment equations. In this case,
Yn = λ0(GnXnβ0) + Xnβ0 + Sn^{-1}εn = Xn(β0 + λ0c) + un,
where un = Sn^{-1}εn. The relation un = λ0Wnun + εn is a SAR process. The identification of λ0 thus comes from the SAR process un.
The following assumption summarizes some sufficient conditions for the identification of θ0 from the moment equations lim_{n→∞}(1/n)E(gn(θ)) = 0. Here and below, A^s = A + A′ for any square matrix A.
Assumption 5. Either (i) lim_{n→∞}(1/n)Qn′(GnXnβ0, Xn) has full rank (k + 1), or (ii) lim_{n→∞}(1/n)Qn′Xn has full rank k, lim_{n→∞}(1/n)tr(Pjn^sGn) ≠ 0 for some j, and lim_{n→∞}(1/n)[tr(P1n^sGn), · · · , tr(Pmn^sGn)]′ is linearly independent of lim_{n→∞}(1/n)[tr(Gn′P1nGn), · · · , tr(Gn′PmnGn)]′.
6.3) Optimum GMM
The variance matrix of these moment functions involves variances and covariances of linear and quadratic forms of εn. For any square n × n matrix A, let vecD(A) = (a11, · · · , ann)′ denote the column vector formed from the diagonal elements of A. Then, with μ3 and μ4 the third and fourth moments of εni,
E(Qn′εn · εn′Pnεn) = Qn′ Σ_{i=1}^n Σ_{j=1}^n pn,ij E(εnεniεnj) = Qn′vecD(Pn)μ3
and
E(εn′Pjnεn · εn′Plnεn) = (μ4 − 3σ0⁴)vecD′(Pjn)vecD(Pln) + σ0⁴tr(PjnPln^s).
It follows that var(gn(θ0)) = Ωn, where
Ωn = ( (μ4 − 3σ0⁴)ωnm′ωnm   μ3ωnm′Qn
       μ3Qn′ωnm             0         ) + Vn,
with ωnm = [vecD(P1n), · · · , vecD(Pmn)] and
Vn = σ0⁴ ( tr(P1nP1n^s) · · · tr(P1nPmn^s)   0
           ...                ...            ...
           tr(PmnP1n^s) · · · tr(PmnPmn^s)   0
           0            · · · 0              (1/σ0²)Qn′Qn )
   = σ0⁴ ( Δmn   0
           0     (1/σ0²)Qn′Qn ),
where Δmn = [vec(P1n′), · · · , vec(Pmn′)]′[vec(P1n^s), · · · , vec(Pmn^s)], by tr(AB) = vec(A′)′vec(B) for any conformable matrices A and B.
When εn is normally distributed, Ωn simplifies to Vn because μ3 = 0 and μ4 = 3σ0⁴. If the Pjn's are from P2n, Ωn = Vn also, because ωnm = 0. In general, Ωn is nonsingular if and only if both (vec(P1n), · · · , vec(Pmn)) and Qn have full column ranks. This is so because Ωn would be singular if and only if the moments in gn(θ0) are functionally dependent, equivalently, if and only if Σ_{j=1}^m ajPjn = 0 and Qnb = 0 for some constant vector (a1, · · · , am, b′) ≠ 0.
Let an converge to a constant full rank matrix a0.
Proposition 1. Under Assumptions 1-5, suppose that the Pjn for j = 1, · · · , m are from P1n and Qn is an n × kx IV matrix so that a0 lim_{n→∞}(1/n)E(gn(θ)) = 0 has a unique root at θ0 in Θ. Then the GMME θ̂n derived from min_{θ∈Θ} gn′(θ)an′angn(θ) is a consistent estimator of θ0, and √n(θ̂n − θ0) →D N(0, Σ), where
Σ = lim_{n→∞} [((1/n)Dn′)an′an((1/n)Dn)]^{-1}((1/n)Dn′)an′an((1/n)Ωn)an′an((1/n)Dn)[((1/n)Dn′)an′an((1/n)Dn)]^{-1}
and
Dn = ( σ0²tr(P1n^sGn) · · · σ0²tr(Pmn^sGn)   (GnXnβ0)′Qn
       0              · · · 0                Xn′Qn       )′,
under the assumption that lim_{n→∞}(1/n)anDn exists and has full rank (k + 1).
Proposition 2. Under Assumptions 1-6, suppose that (Ω̂n/n)^{-1} − (Ωn/n)^{-1} = oP(1). Then the feasible OGMME θ̂o,n derived from min_{θ∈Θ} gn′(θ)Ω̂n^{-1}gn(θ) based on gn(θ) with Pjn's from P1n has the asymptotic distribution
√n(θ̂o,n − θ0) →D N(0, (lim_{n→∞}(1/n)Dn′Ωn^{-1}Dn)^{-1}).
Furthermore,
gn′(θ̂o,n)Ω̂n^{-1}gn(θ̂o,n) →D χ²((m + kx) − (k + 1)).
6.4) Efficiency and the BGMME
The optimal GMME θ̂o,n can be compared with the 2SLSE. With Qn as the IV matrix, the 2SLSE of θ0 is
θ̂2sl,n = [Zn′Qn(Qn′Qn)^{-1}Qn′Zn]^{-1}Zn′Qn(Qn′Qn)^{-1}Qn′Yn, (4.1)
where Zn = (WnYn, Xn). The asymptotic distribution of θ̂2sl,n is
√n(θ̂2sl,n − θ0) →D N(0, σ0² lim_{n→∞} {(1/n)(GnXnβ0, Xn)′Qn(Qn′Qn)^{-1}Qn′(GnXnβ0, Xn)}^{-1}), (4.2)
under the assumptions that lim_{n→∞}(1/n)Qn′Qn is nonsingular and lim_{n→∞}(1/n)Qn′(GnXnβ0, Xn) has full column rank (k + 1) (Kelejian and Prucha 1998). Because the 2SLSE is a special case of the GMM estimation, by Proposition 2, θ̂o,n shall be efficient relative to θ̂2sl,n. The best Qn is Q*n = (GnXnβ0, Xn) by the generalized Schwarz inequality.
The remaining issue is the best selection of the Pjn's. When the disturbance εn is normally distributed or the Pjn's are from P2n, with Q*n for Qn,
Dn′Ωn^{-1}Dn = ( CmnΔmn^{-1}Cmn′   0
                 0                 0 ) + (1/σ0²)(GnXnβ0, Xn)′(GnXnβ0, Xn),
where Cmn = [tr(P1n^sGn), · · · , tr(Pmn^sGn)]. Note that, because tr(PjnPln^s) = (1/2)tr(Pjn^sPln^s), Δmn can be rewritten as
Δmn = (1/2) ( tr(P1n^sP1n^s) · · · tr(P1n^sPmn^s)
              ...                  ...
              tr(Pmn^sP1n^s) · · · tr(Pmn^sPmn^s) )
    = (1/2)[vec(P1n^s), · · · , vec(Pmn^s)]′[vec(P1n^s), · · · , vec(Pmn^s)].
(i) When the Pjn's are from P2n,
tr(Pjn^sGn) = (1/2)tr(Pjn^s[Gn − Diag(Gn)]^s) = (1/2)vec′([Gn − Diag(Gn)]^s)vec(Pjn^s)
for j = 1, · · · , m, in Cmn, where Diag(A) denotes the diagonal matrix formed by the diagonal elements of a square matrix A. Therefore, the generalized Schwarz inequality implies that
CmnΔmn^{-1}Cmn′ ≤ (1/2)vec′([Gn − Diag(Gn)]^s)vec([Gn − Diag(Gn)]^s) = tr([Gn − Diag(Gn)]^sGn).
Thus, in the subclass P2n, [Gn − Diag(Gn)] together with [GnXnβ0, Xn] provides the set of best IV functions.
(ii) For the case where εn is N(0, σ0²In): because, for any Pjn ∈ P1n,
tr(Pjn^sGn) = (1/2)vec′([Gn − (tr(Gn)/n)In]^s)vec(Pjn^s)
for j = 1, · · · , m, the generalized Schwarz inequality implies that
CmnΔmn^{-1}Cmn′ ≤ tr([Gn − (tr(Gn)/n)In]^sGn).
Hence, in the broader class P1n, [Gn − (tr(Gn)/n)In] and [GnXnβ0, Xn] provide the best set of IV functions.
Proposition 3. Under Assumptions 1-3, suppose that λ̂n is a √n-consistent estimate of λ0, β̂n is a consistent estimate of β0, and σ̂n² is a consistent estimate of σ0².
Within the class of GMMEs derived with P2n, the BGMME θ̂2b,n has the limiting distribution √n(θ̂2b,n − θ0) →D N(0, Σ2b^{-1}), where
Σ2b = lim_{n→∞}(1/n) ( tr[(Gn − Diag(Gn))^sGn] + (1/σ0²)(GnXnβ0)′(GnXnβ0)   (1/σ0²)(GnXnβ0)′Xn
                       (1/σ0²)Xn′(GnXnβ0)                                   (1/σ0²)Xn′Xn       ),
which is assumed to exist.
In the event that εn ~ N(0, σ0²In), within the broader class of GMMEs derived with P1n, the BGMME θ̂1b,n has the limiting distribution √n(θ̂1b,n − θ0) →D N(0, Σ1b^{-1}), where
Σ1b = lim_{n→∞}(1/n) ( tr[(Gn − (tr(Gn)/n)In)^sGn] + (1/σ0²)(GnXnβ0)′(GnXnβ0)   (1/σ0²)(GnXnβ0)′Xn
                       (1/σ0²)Xn′(GnXnβ0)                                        (1/σ0²)Xn′Xn       ),
which is assumed to exist.
5.5) Links between BGMME and MLE
When εn is N(0, σ0²In), the model can be estimated by the ML method. The log likelihood function of the MRSAR model via its reduced form equation is

ln Ln = −(n/2) ln(2π) − (n/2) ln σ² + ln |In − λWn|
      − (1/(2σ²)) [Yn − (In − λWn)⁻¹Xnβ]′ (In − λWn)′(In − λWn) [Yn − (In − λWn)⁻¹Xnβ].   (4.6)
The asymptotic variance of the MLE (θml,n, σml,n²) is

AsyVar(θml,n, σml,n²) = [tr(Gn²) + tr(Gn′Gn) + (1/σ0²)(GnXnβ0)′(GnXnβ0), (1/σ0²)(Xn′GnXnβ0)′, tr(Gn)/σ0²; (1/σ0²)Xn′GnXnβ0, (1/σ0²)Xn′Xn, 0; tr(Gn)/σ0², 0, n/(2σ0⁴)]⁻¹
(see, e.g., Anselin and Bera (1998), p.256). From the inverse of a partitioned matrix, the asymptotic variance of the MLE θml,n is

AsyVar(θml,n) = [tr(Gn²) + tr(Gn′Gn) + (1/σ0²)(GnXnβ0)′(GnXnβ0) − (2/n)tr²(Gn), (1/σ0²)(Xn′GnXnβ0)′; (1/σ0²)Xn′GnXnβ0, (1/σ0²)Xn′Xn]⁻¹.

As tr(Gn²) + tr(Gn′Gn) − (2/n)tr²(Gn) = tr((Gn − (tr(Gn)/n)In)^s Gn), the GMME θ1b,n has the same limiting distribution as the MLE of θ0 from Proposition 3.
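The trace identity linking the two asymptotic variances can be verified numerically with a random Gn (illustrative dimensions only):

```python
# Check: tr(G^2) + tr(G'G) - (2/n) tr(G)^2 = tr((G - (tr(G)/n) I)^s G).
import numpy as np

rng = np.random.default_rng(1)
n = 8
G = rng.standard_normal((n, n))
lhs = np.trace(G @ G) + np.trace(G.T @ G) - 2.0 / n * np.trace(G) ** 2
M = G - np.trace(G) / n * np.eye(n)
rhs = np.trace((M + M.T) @ G)       # tr((G - (tr(G)/n) I)^s G)
assert np.isclose(lhs, rhs)
print("trace identity holds")
```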
There is an intuitive link between the best GMM approach and the maximum likelihood one. The derivatives of the log likelihood in (4.6) are

∂ ln Ln/∂β = (1/σ²) Xn′εn(θ),

∂ ln Ln/∂σ² = −n/(2σ²) + (1/(2σ⁴)) εn′(θ)εn(θ),

and

∂ ln Ln/∂λ = −tr(Wn(In − λWn)⁻¹) + (1/σ²)[Wn(In − λWn)⁻¹Xnβ]′εn(θ) + (1/σ²) εn′(θ)[Wn(In − λWn)⁻¹]′εn(θ).

The equation ∂ ln Ln/∂σ² = 0 implies that the MLE is σn²(θ) = (1/n) εn′(θ)εn(θ) for a given value of θ. By substituting σn²(θ) into the remaining likelihood equations, the MLE θml,n is characterized by the moment equations Xn′εn(θ) = 0 and

[Wn(In − λWn)⁻¹Xnβ]′εn(θ) + εn′(θ)[Wn(In − λWn)⁻¹ − (1/n) tr(Wn(In − λWn)⁻¹) In]εn(θ) = 0.
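This substitution can be checked numerically: evaluated at any θ, the λ-score with σn²(θ) = εn′(θ)εn(θ)/n inserted equals n/(εn′εn) times the quadratic moment above. All matrices and the evaluation point below are random illustrations:

```python
# Verify that the lambda-score, after substituting sigma2_hat(theta),
# equals (n / eps'eps) times the quadratic moment equation.
import numpy as np

rng = np.random.default_rng(6)
n = 50
W = rng.random((n, n))
np.fill_diagonal(W, 0)
W /= W.sum(axis=1, keepdims=True)         # row-normalized weights
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
Y = rng.standard_normal(n)
lam, beta = 0.3, np.array([0.5, 1.0])     # arbitrary evaluation point

S = np.eye(n) - lam * W
G = W @ np.linalg.inv(S)                  # G = W (I - lam W)^{-1}
eps = S @ Y - X @ beta                    # eps_n(theta)
s2 = eps @ eps / n                        # sigma2_hat(theta)

# lambda-score; note eps'G'eps = eps'G eps since it is a scalar
score_lam = -np.trace(G) + (G @ X @ beta) @ eps / s2 + eps @ G @ eps / s2
moment = (G @ X @ beta) @ eps + eps @ (G - np.trace(G) / n * np.eye(n)) @ eps
assert np.isclose(score_lam, n / (eps @ eps) * moment)
print("score equals scaled quadratic moment")
```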
The similarity of the best GMM moments and the above likelihood equations is revealing. The best GMM approach has linear and quadratic moments of εn(θ) in its formation and uses consistently estimated matrices in its linear and quadratic forms.
5.6) More recent results

1) generalization to the estimation of higher-order spatial autoregressive models;
2) robust GMM estimation in the presence of unknown heteroskedasticity;
3) improved GMM estimators when disturbances are non-normal.
Lecture 2: Econometric Models with Social Interactions
We consider interactions of individuals in a group setting.
2.1) Linear in the Mean (Expectation) Model and the Reflection Problem (Manski)
Endogenous effects, wherein the propensity of an individual to behave in some way varies with
the behaviour of the group;
Exogenous (contextual) effects, wherein the propensity of an individual to behave in some way
varies with the exogenous characteristics of the group, and
Correlated effects, wherein individuals in the same group tend to behave similarly because they have similar individual characteristics or face similar institutional environments.
Manski’s linear model specification is

yri = λ0E(yrj) + xri,1β10 + E(xrj,2)β20 + αr + εri,

where E(yrj) with the coefficient λ0 captures the endogenous interactions, E(xrj,2) with β20 captures the exogenous interactions, and αr represents the correlated effect. The E(yr) is assumed to solve the “social equilibrium” equation,

E(yr) = λ0E(yr) + E(xr,1)β10 + E(xr,2)β20 + αr.

Provided that λ0 ≠ 1, this equation has a unique solution, which is

E(yr) = E(xr,1)β10/(1 − λ0) + E(xr,2)β20/(1 − λ0) + αr/(1 − λ0).
The case where xr,1 = xr,2(= xr), or more generally, where xr,1 is a subset of xr,2, has an underidentification problem.

When xr,1 = xr,2 = xr, it follows that E(yr) = E(xr)(β10 + β20)/(1 − λ0) + αr/(1 − λ0) and that the reduced form of this model is

yri = E(xr)[β20 + (λ0/(1 − λ0))(β10 + β20)] + xriβ10 + αr/(1 − λ0) + εri.

The β10 and the composite parameter vector β20 + (λ0/(1 − λ0))(β10 + β20) can be identified, but λ0 and β20 cannot be separately identified. This is so because E(yr) is linearly dependent on E(xr) and the constant intercept term αr (the “reflection problem”).
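The reflection problem can be made concrete with a tiny numeric illustration: two different structural pairs (λ0, β20) that imply the same reduced-form coefficients, so no amount of data can distinguish them. The parameter values below are arbitrary:

```python
# Two observationally equivalent parameterizations of the reduced form
# y = E(x)[b2 + lam/(1-lam)(b1+b2)] + x b1 + ... ; values are arbitrary.
def reduced_form(lam, b1, b2):
    # (coefficient on E(x_r), coefficient on x_ri)
    return (b2 + lam / (1 - lam) * (b1 + b2), b1)

b1 = 1.0
rf_a = reduced_form(0.5, b1, 0.2)
# Pick another lambda and solve for the beta20 that keeps the composite
# coefficient c = (b2 + lam*b1)/(1 - lam) unchanged: b2 = c(1-lam) - lam*b1.
lam_b = 0.3
c = rf_a[0]
b2_b = c * (1 - lam_b) - lam_b * b1
rf_b = reduced_form(lam_b, b1, b2_b)
print(rf_a, rf_b)   # identical reduced-form coefficients
```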
From the implied equation, β10 is apparently identifiable. For the identification of λ0 and β20, a sufficient identification condition is that some relevant variables appear in xri,1 but not in xri,2 (under the random components specification for the overall disturbance). With relevant variables in xri,1 but excluded from xri,2, λ0 can be identified from the composite parameter vector; the remaining coefficient β20 is then also identifiable.
2.2) A SAR Model with Social Interactions

Instead of the rational expectation formulation, an alternative model specifies direct interactions of individuals in a group setting. This model is useful in particular for cases where each group has a small or moderate number of members. Suppose there are mr members in group r. For each unit i in group r,

yri = λ0 (1/(mr − 1)) Σ_{j=1, j≠i}^{mr} yrj + xri,1β10 + (1/(mr − 1)) Σ_{j=1, j≠i}^{mr} xrj,2 β20 + αr + εri,

with i = 1, · · ·, mr and r = 1, · · ·, R, where yri is the outcome of the ith individual in the rth group, xri,1 and xri,2 are, respectively, k1- and k2-dimensional row vectors of exogenous variables, and the εri’s are i.i.d. (0, σ0²).
It is revealing to decompose this equation into two parts. Because of the group structure, the model is equivalent to the following two equations:

(1 − λ0) ȳr = x̄r,1β10 + x̄r,2β20 + αr + ε̄r, r = 1, · · ·, R,

and

(1 + λ0/(mr − 1))(yri − ȳr) = (xri,1 − x̄r,1)β10 − (1/(mr − 1))(xri,2 − x̄r,2)β20 + (εri − ε̄r),

for i = 1, · · ·, mr; r = 1, · · ·, R, where ȳr = (1/mr) Σ_{i=1}^{mr} yri, x̄r,1 = (1/mr) Σ_{i=1}^{mr} xri,1, and x̄r,2 = (1/mr) Σ_{i=1}^{mr} xri,2 are means for the rth group. The first equation may be called a ‘between’ equation and the second a ‘within’ equation, as they resemble those of a panel data regression model (Hsiao 1986). The possible effects due to interactions are revealed in the reduced-form between and within equations:

ȳr = x̄r,1 β10/(1 − λ0) + x̄r,2 β20/(1 − λ0) + αr/(1 − λ0) + ε̄r/(1 − λ0), r = 1, · · ·, R,

and

(yri − ȳr) = (xri,1 − x̄r,1) (mr − 1)β10/(mr − 1 + λ0) − (xri,2 − x̄r,2) β20/(mr − 1 + λ0) + ((mr − 1)/(mr − 1 + λ0))(εri − ε̄r),

with i = 1, · · ·, mr; r = 1, · · ·, R.
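The equivalence of the model with its between and within equations can be verified numerically on a single simulated group (all parameter values below are illustrative):

```python
# Numeric check of the between/within decomposition for one group.
import numpy as np

rng = np.random.default_rng(3)
m, lam, b1, b2, a = 5, 0.4, 1.5, -0.7, 0.3
x1 = rng.standard_normal(m)
x2 = rng.standard_normal(m)
eps = rng.standard_normal(m)
# Wg @ v gives the mean of the other members' v for each i
Wg = (np.ones((m, m)) - np.eye(m)) / (m - 1)
# solve the simultaneous system y = lam * Wg y + x1 b1 + Wg x2 b2 + a + eps
y = np.linalg.solve(np.eye(m) - lam * Wg, x1 * b1 + Wg @ x2 * b2 + a + eps)

yb, x1b, x2b, eb = y.mean(), x1.mean(), x2.mean(), eps.mean()
# between equation: (1 - lam) ybar = x1bar b1 + x2bar b2 + alpha + epsbar
assert np.isclose((1 - lam) * yb, x1b * b1 + x2b * b2 + a + eb)
# within equation: (1 + lam/(m-1))(y_i - ybar)
#   = (x1_i - x1bar) b1 - (x2_i - x2bar) b2/(m-1) + (eps_i - epsbar)
lhs = (1 + lam / (m - 1)) * (y - yb)
rhs = (x1 - x1b) * b1 - (x2 - x2b) * b2 / (m - 1) + (eps - eb)
assert np.allclose(lhs, rhs)
print("decomposition verified")
```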
Identification of the within equation

When all groups have the same number of members, i.e., mr is a constant, say m, for all r, the effect λ0 cannot be identified from the within equation. This is apparent, as only the functions (m − 1)β10/(m − 1 + λ0), β20/(m − 1 + λ0), and (m − 1)σ0²/(m − 1 + λ0) may ever be identifiable from the within equation.
Estimation of the within equation

Conditional likelihood and CMLE:

ln Lw,n(θ) = c + Σ_{r=1}^{R} (mr − 1) ln(mr − 1 + λ) − ((n − R)/2) ln σ²
           − (1/(2σ²)) Σ_{r=1}^{R} [(1/cr(λ)) Yr − Zrδm]′ Jr [(1/cr(λ)) Yr − Zrδm],

where cr(λ) = (mr − 1)/(mr − 1 + λ), zri = (xri,1, −(m/(mr − 1)) xri,2), m = (1/R) Σ_{r=1}^{R} mr, and δm = (β1, β2/m).
CMLE of β and σ²:

βn(λ) = [Ik1, 0; 0, mIk2] (Σ_{r=1}^{R} Zr′JrZr)⁻¹ Σ_{r=1}^{R} Zr′JrYr (1/cr(λ)),

and

σn²(λ) = (1/(n − R)) { Σ_{r=1}^{R} (1/cr²(λ)) Yr′JrYr − Σ_{r=1}^{R} (1/cr(λ)) Yr′JrZr (Σ_{r=1}^{R} Zr′JrZr)⁻¹ Σ_{r=1}^{R} Zr′JrYr (1/cr(λ)) }.
Concentrated log likelihood at λ:

ln Lc,n(λ) = c1 + Σ_{r=1}^{R} (mr − 1) ln(mr − 1 + λ) − ((n − R)/2) ln σn²(λ).
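A sketch of this concentrated likelihood for a simplified case with a single regressor (β20 = 0, so zri reduces to xri,1) also illustrates the identification result above: with equal group sizes, ln Lc,n(λ) is exactly flat in λ, while with unequal sizes it is not. All data-generating values are made up:

```python
# Concentrated conditional likelihood for a simplified within equation.
# Equal group sizes: flat in lambda (no identification); unequal: not flat.
import numpy as np

rng = np.random.default_rng(4)

def simulate(sizes, lam0=0.5, b0=1.0):
    groups = []
    for m in sizes:
        x = rng.standard_normal(m)
        Wg = (np.ones((m, m)) - np.eye(m)) / (m - 1)
        y = np.linalg.solve(np.eye(m) - lam0 * Wg,
                            x * b0 + rng.standard_normal(m))
        groups.append((y, x, m))
    return groups

def conc_loglik(groups, lam):
    n = sum(m for _, _, m in groups)
    R = len(groups)
    Yt, Xt, jac = [], [], 0.0
    for y, x, m in groups:
        c = (m - 1) / (m - 1 + lam)          # c_r(lambda)
        Yt.append((y - y.mean()) / c)        # J_r applied via demeaning
        Xt.append(x - x.mean())
        jac += (m - 1) * np.log(m - 1 + lam)
    Yt, Xt = np.concatenate(Yt), np.concatenate(Xt)
    b = (Xt @ Yt) / (Xt @ Xt)                # beta_hat(lambda)
    s2 = np.sum((Yt - Xt * b) ** 2) / (n - R)
    return jac - 0.5 * (n - R) * np.log(s2)

equal = simulate([5] * 40)
mixed = simulate([4] * 20 + [8] * 20)
print(conc_loglik(equal, 0.0) - conc_loglik(equal, 0.8))   # exactly 0: flat
print(conc_loglik(mixed, 0.0) - conc_loglik(mixed, 0.8))   # nonzero
```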
Instrumental Variables Estimation

The within equation can be rearranged as

yri − ȳr = −λ0 (yri − ȳr)/(mr − 1) + (xri,1 − x̄r,1)β10 − (1/(mr − 1))(xri,2 − x̄r,2)β20 + (εri − ε̄r).
The IV estimator of θ0 = (λ0, β10′, β20′)′ is

θn,IV = [Σ_{r=1}^{R} (Qr/(mr − 1), Xr,1, −Xr,2/(mr − 1))′ Jr (−Yr/(mr − 1), Xr,1, −Xr,2/(mr − 1))]⁻¹ · Σ_{r=1}^{R} (Qr/(mr − 1), Xr,1, −Xr,2/(mr − 1))′ Jr Yr,

where Qr is an IV matrix for Yr.
2.3) Expectation Model (self-influence excluded)

The social interactions model under consideration is

yri = λ0 (1/(mr − 1)) Σ_{j=1, j≠i}^{mr} E(yrj|Jr) + xri,1β10 + (1/(mr − 1)) Σ_{j=1, j≠i}^{mr} xrj,2 β20 + αr + εri,

where Jr is the information set of the rth group, which includes all observed x’s and the group fixed effect αr. This model assumes that each individual knows all the exogenous characteristics of all other members in the group but does not know their actual outcomes. Because he does not know what the outcomes of others will be, he formulates his (rational) expectation about their possible outcomes.
The group means of the expected outcomes are

(1/mr) Σ_{i=1}^{mr} E(yri|Jr) = (1/(1 − λ0)) [(1/mr) Σ_{i=1}^{mr} xri,1 β10 + (1/mr) Σ_{i=1}^{mr} xri,2 β20 + αr],

for r = 1, · · ·, R, which are linear functions of group means of exogenous variables. The model implies the between equation representation

ȳr = (1/(1 − λ0))(x̄r,1β10 + x̄r,2β20) + vr + ε̄r,

and the within equation representation

yri − ȳr = ((mr − 1)/(mr − 1 + λ0))(xri,1 − x̄r,1)β10 − (1/(mr − 1 + λ0))(xri,2 − x̄r,2)β20 + (εri − ε̄r).
From these equations, we see no implication of correlation in the disturbances due to endogenous interactions, while the mean components are exactly the same as in the simultaneous interaction model. For this model, there is underidentification when β20 + λ0β10 = 0 with xri,1 = xri,2(= xri). These features imply ((mr − 1)β10 − β20)/(mr − 1 + λ0) = β10 and, hence,

yri − ȳr = (xri − x̄r)β10 + (εri − ε̄r).

We see that, in this case, λ0 and β20 do not appear in the equation and they cannot be identified from the within equation. In the simultaneous equation interactions model, in contrast, λ0 can be identified from the reduced-form disturbances of the within equation even when that constraint on β10, β20 and λ0 holds.
Lee (2007) suggests estimating the within equation by the conditional ML method. The IV method is also feasible, and optimal IVs are available.
We note that if the model includes self-influence, as in

yri = λ0 (1/mr) Σ_{j=1}^{mr} E(yrj|Jr) + xri,1β10 + (1/mr) Σ_{j=1}^{mr} xrj,2 β20 + αr + εri,

the social effects in the model would also not be identifiable when xri,1 = xri,2. For this model, the group means of the expected outcomes are

(1/mr) Σ_{i=1}^{mr} E(yri|Jr) = (1/(1 − λ0)) [(1/mr) Σ_{i=1}^{mr} xri,1 β10 + (1/mr) Σ_{i=1}^{mr} xri,2 β20 + αr],

which is a linear function of group means of exogenous variables.
2.4) A Network Model with Social Interactions

Network model for the rth group:

Yr = λWrYr + Xrβ1 + WrXrβ2 + lmr αr + ur,

and

ur = ρMrur + εr,

where εr = (ε1r, · · ·, εmrr)′ with the εir i.i.d. (0, σ²). Notation:

R: number of groups
mr: number of members in group r
Wr, Mr: exogenous network (social) matrices
λWrYr: endogenous effect
WrXrβ2: exogenous effect
αr: group (fixed) effect, with lmr = (1, · · ·, 1)′
ρMrur: correlated effects
Lin (2005, 2006): Add Health data.

The Add Health survey:
Students in grades 7-12; 132 schools
Wave I in-school survey: 90,118 students
Friendship networks: each student names up to 5 male and 5 female friends

Special features (in this paper):
Group structure (R can be large, raising an incidental parameters concern);
Wr and Mr row-normalized, i.e., Wr lmr = lmr.

Bonacich measure of centrality of members in a group:

(Imr − λ0Wr)⁻¹ Wr lmr = (1/(1 − λ0)) lmr,

i.e., all members in the same group have the same measure of centrality.
Related works:
Lin (2005, 2006): eliminates αr by the (In − Wn) transformation and estimates by the IV method;
Calvo-Armengol, Patacchini and Zenou (2006): social networks in education;
Bramoulle, Djebbari and Fortin (2006): identification.

This study:
• allows ‘correlated effects’;
• more on identification issues;
• more efficient estimation methods;
• MC study on the sensitivity of estimated effects to the omission of certain effects;
• empirical applications with Add Health data.
2.5) Estimation
Elimination of group fixed effects:

1) Quasi-difference:

RrYr = λ0RrWrYr + RrZrβ0 + (1 − ρ0)lmr αr0 + εr,

where Zr = (Xr, WrXr) and Rr = Imr − ρ0Mr.

2) Eliminate the group fixed effect:
• take deviations from the group mean,
• and eliminate the linear dependence in the disturbances:

Rr*Yr* = λ0Rr*Wr*Yr* + Rr*Zr*β0 + εr*,

where
i) Jr = Imr − (1/mr) lmr lmr′ = FrFr′, an eigenvalue-eigenvector decomposition, with [Fr, (1/√mr) lmr] an orthonormal matrix;
ii) Rr* = Fr′RrFr and Wr* = Fr′WrFr;
iii) Yr* = Fr′Yr and Zr* = Fr′Zr;
iv) using Wr = Wr(FrFr′ + (1/mr) lmr lmr′), Wr lmr = lmr, and Fr′lmr = 0, one has Fr′Wr = Fr′WrFrFr′, and similarly for Mr;
v) εr* has zero mean and variance matrix σ0²Imr−1.

The transformed equation is again a spatial AR model.
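The properties of the Fr transformation in i)-iv) can be verified numerically. In the sketch below, Fr is obtained from the eigenvectors of Jr, and the row-normalized W is random (illustrative only):

```python
# Check: J = I - (1/m) l l' = F F' with [F, l/sqrt(m)] orthonormal,
# F' l = 0, and F'W = F'WFF' for a row-normalized W.
import numpy as np

rng = np.random.default_rng(5)
m = 6
l = np.ones(m)
J = np.eye(m) - np.outer(l, l) / m
# eigenvectors of J with eigenvalue 1 give F (m x (m-1))
vals, vecs = np.linalg.eigh(J)
F = vecs[:, np.isclose(vals, 1.0)]
assert F.shape == (m, m - 1)
assert np.allclose(F @ F.T, J)                   # J = F F'
assert np.allclose(F.T @ l, 0)                   # F' l = 0
O = np.column_stack([F, l / np.sqrt(m)])
assert np.allclose(O.T @ O, np.eye(m))           # orthonormal matrix

W = rng.random((m, m))
np.fill_diagonal(W, 0)
W /= W.sum(axis=1, keepdims=True)                # row-normalized: W l = l
assert np.allclose(F.T @ W, F.T @ W @ F @ F.T)   # F'W = F'WFF'
print("all transformation properties hold")
```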
Estimation: quasi-ML. The log likelihood is

ln Ln(θ) = −((n − R)/2) ln(2πσ²) + Σ_{r=1}^{R} ln(|Sr(λ)|/(1 − λ)) + Σ_{r=1}^{R} ln(|Rr(ρ)|/(1 − ρ))
         − (1/(2σ²)) Σ_{r=1}^{R} [Rr(ρ)(Sr(λ)Yr − Zrβ)]′ Jr [Rr(ρ)(Sr(λ)Yr − Zrβ)].
Concentrated log likelihood function:

ln Ln(λ, ρ) = −((n − R)/2)(ln(2π) + 1) − ((n − R)/2) ln σn²(λ, ρ) + ln |Sn(λ)| + ln |Rn(ρ)| − R ln[(1 − λ)(1 − ρ)].
Another (feasible) estimation method: 2SLS or G2SLS. Apply IVs to

Fr′RrYr = λ0Fr′RrWrYr + Fr′RrZrβ0 + εr*.   (∗∗)

(stacking up all the groups together gives all sample observations)

i) Estimate Fr′Yr = λ0Fr′WrYr + Fr′Zrβ0 + Fr′ur by 2SLS with some IV matrix Q1r to get an initial (consistent) estimate of λ and β;
ii) use the estimated residuals Fr′ur (= ur*) to estimate ρ via the MOM of Kelejian and Prucha (1999), based on (Imr−1 − ρ0Mr*)ur* = εr*;
iii) estimate Rr;
iv) then apply feasible 2SLS to (∗∗).

• The best IV matrix is Fr′Rr[Wr(Imr − λ0Wr)⁻¹Zrβ0, Zr].
• Under normal disturbances, the (transformed) ML is asymptotically more efficient.
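A stripped-down sketch of step i) only, for a model without group fixed effects or SAR disturbances: the ring network, the instrument set [Xn, WnXn, Wn²Xn, Wn³Xn], and the parameter values are assumptions for illustration, not part of the Add Health application:

```python
# 2SLS for Y = lam W Y + Z beta + u, with W Y instrumented by powers of W
# applied to x. All data are simulated; the design is illustrative.
import numpy as np

rng = np.random.default_rng(7)
n = 300
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[i, (i - 1) % n] = 0.5   # row-normalized ring
x = rng.standard_normal(n)
Z = np.column_stack([x, W @ x])                    # Z = (X, WX)
lam0, beta0 = 0.4, np.array([1.0, 0.5])
Y = np.linalg.solve(np.eye(n) - lam0 * W,
                    Z @ beta0 + rng.standard_normal(n))

D = np.column_stack([W @ Y, Z])                    # W Y is endogenous
Q = np.column_stack([x, W @ x, W @ (W @ x), W @ (W @ (W @ x))])
P = Q @ np.linalg.solve(Q.T @ Q, Q.T)              # projection onto IV space
est = np.linalg.solve(D.T @ P @ D, D.T @ P @ Y)    # 2SLS estimator
print("(lam, beta1, beta2) estimates:", est)       # consistent for (0.4, 1.0, 0.5)
```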