library(spsur)
library(spdep)
library(spatialreg)
library(sf)

1 Introduction

The spsur R-package (Minguez, Lopez, and Mur (\noop{2022}forthcoming)) has been designed for estimate spatial regression models in a multiequational framework. However, because of its flexibility, it is also possible to obtain useful results for uniequational models. On the oher hand, the spatialreg package ( Bivand, Pebesma, and Gómez-Rubio (2013); Bivand and Piras (2015)), is the most adequate alternative for working with uniequational models, in a pure cross-sectional setting. The purpose of this vignette is to compare the results of spsur and spatialreg in case of uniequational models. As we will see, the differences between the two are negligible.

In sum, the purpose of this vignette is to compare the results of spsur and spatialreg in the following issues:

  • Estimation of spatially independent model, or SIM models.
  • Lagrange Multiplier misspecification tests for SIM models.
  • ML estimation of SLM, SEM, SARAR models.
  • 3SLS estimation of SLM and SDM models.

2 The data set NCOVR

Throughout the vignette we use the dataset NCOVR (National Consortium on Violence Research). These data were employed by Baller et al. (2001) to analyze the incidence of homicides rates in the US counties. The dataset can be freely dowloaded from https://geodacenter.github.io/data-and-lab/ncovr/

NCOVR contains 3085 spatial units (counties), for 4 different cross-sections (1960, 1970, 1980, 1990) and 69 variables. According to Baller et al. (2001), the main variables of interest are:

  • HR: homicide rate per 100000 inhabitants
  • RD: resource deprivation
  • PS: population structure
  • MA: median age
  • DV: divorce rate (% males over 14 divorced)
  • UE: unemployment rate
  • SOUTH: dummy variable for Southern counties (South = 1)

First, we can read the NCOVR dataset as a simple feature (sf) object (named NCOVR.sf),

data(NCOVR, package = "spsur")

The first three observations in NCOVR appear below:

NCOVR <- st_drop_geometry(NCOVR.sf)
knitr::kable(
  head((NCOVR[1:3, 1:6])),
  caption = 'First observations of NCOVR dataset' )
First observations of NCOVR dataset
NAME STATE_NAME FIPS SOUTH HR60 HR70
Lake of the Woods Minnesota 27077 0 0.000000 0.000000
Ferry Washington 53019 0 0.000000 0.000000
Stevens Washington 53065 0 1.863863 1.915158

Whereas the geometry of the USA counties is shown in Figure ??:

plot(st_geometry(NCOVR.sf))

Geometry of the USA counties

Following Baller et al. (2001), we consider a W matrix based on the criterion of 10 nearest-neighbourhood, which is immediate to obtain using the spdep package (Bivand, Pebesma, and Gómez-Rubio (2013); Bivand and Wong (2018)). The resulting weighting matrix will be called listw. Note that this matrix is non-symmetric and it is row-standardized.

# Obtain coordinates of centroids
co <- sf::st_coordinates(sf::st_centroid(NCOVR.sf))
listw <- spdep::nb2listw(spdep::knn2nb(spdep::knearneigh(co, k = 10,longlat = TRUE)))

Baller et al. (2001) specify a single linear model to explain the case of HR in the year 1960 with the results shown in the Table below,

\[\begin{equation} HR_{60} = \beta_{0}+\beta_{1}RD_{60}+\beta_{2}PS_{60}+\beta_{3}MA_{60}+\beta_{4}DV_{60} +\beta_{5}UE_{60} +\beta_{6}SOUTH+\epsilon_{60} \tag{2.1} \end{equation}\]

formula_60 <- HR60 ~ RD60 + PS60 + MA60 + DV60 + UE60 + SOUTH
lm_60 <- stats::lm(formula = formula_60, data = NCOVR.sf)
summary(lm_60)
#> 
#> Call:
#> stats::lm(formula = formula_60, data = NCOVR.sf)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -14.026  -2.217  -0.635   1.393  88.312 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  8.12591    0.63465  12.804  < 2e-16
#> RD60         1.79824    0.12341  14.571  < 2e-16
#> PS60         0.35871    0.09216   3.892 0.000101
#> MA60        -0.23047    0.01932 -11.931  < 2e-16
#> DV60         1.16002    0.09483  12.233  < 2e-16
#> UE60        -0.06195    0.03515  -1.762 0.078138
#> SOUTH        2.63862    0.23325  11.312  < 2e-16
#>                
#> (Intercept) ***
#> RD60        ***
#> PS60        ***
#> MA60        ***
#> DV60        ***
#> UE60        .  
#> SOUTH       ***
#> ---
#> Signif. codes:  
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 4.743 on 3078 degrees of freedom
#> Multiple R-squared:  0.2966, Adjusted R-squared:  0.2952 
#> F-statistic: 216.3 on 6 and 3078 DF,  p-value: < 2.2e-16

The model can be estimated in usual way with the function stats::lm (R Core Team (2019)),

formula_60 <- HR60 ~ RD60 + PS60 + MA60 + DV60 + UE60 + SOUTH
lm_60 <- lm(formula = formula_60, data = NCOVR.sf)
summary(lm_60)
#> 
#> Call:
#> lm(formula = formula_60, data = NCOVR.sf)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -14.026  -2.217  -0.635   1.393  88.312 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  8.12591    0.63465  12.804  < 2e-16
#> RD60         1.79824    0.12341  14.571  < 2e-16
#> PS60         0.35871    0.09216   3.892 0.000101
#> MA60        -0.23047    0.01932 -11.931  < 2e-16
#> DV60         1.16002    0.09483  12.233  < 2e-16
#> UE60        -0.06195    0.03515  -1.762 0.078138
#> SOUTH        2.63862    0.23325  11.312  < 2e-16
#>                
#> (Intercept) ***
#> RD60        ***
#> PS60        ***
#> MA60        ***
#> DV60        ***
#> UE60        .  
#> SOUTH       ***
#> ---
#> Signif. codes:  
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 4.743 on 3078 degrees of freedom
#> Multiple R-squared:  0.2966, Adjusted R-squared:  0.2952 
#> F-statistic: 216.3 on 6 and 3078 DF,  p-value: < 2.2e-16

The same results can be obtained with the function spsurml from spsur, selecting the argument type = “sim”

ols.spsur <- spsurml(formula = formula_60, type = "sim", 
                     data = NCOVR.sf)
#> Initial point:  
#> log_lik:  -9176.338 
#> Iteration:  1  log_lik:  -9176.338 
#> Time to fit the model:  0.05  seconds
#> Time to compute covariances:  0.01  seconds
summary(ols.spsur)
#> Call:
#> spsurml(formula = formula_60, data = NCOVR.sf, type = "sim")
#> 
#>  
#> Spatial SUR model type:  sim 
#> 
#> Equation  1 
#>                Estimate Std. Error  t value
#> (Intercept)_1  8.125915   0.634033  12.8162
#> RD60_1         1.798240   0.123294  14.5850
#> PS60_1         0.358706   0.092069   3.8960
#> MA60_1        -0.230475   0.019298 -11.9428
#> DV60_1         1.160020   0.094737  12.2446
#> UE60_1        -0.061948   0.035120  -1.7639
#> SOUTH_1        2.638618   0.233024  11.3234
#>                Pr(>|t|)    
#> (Intercept)_1 < 2.2e-16 ***
#> RD60_1        < 2.2e-16 ***
#> PS60_1        9.986e-05 ***
#> MA60_1        < 2.2e-16 ***
#> DV60_1        < 2.2e-16 ***
#> UE60_1          0.07785 .  
#> SOUTH_1       < 2.2e-16 ***
#> ---
#> Signif. codes:  
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> R-squared: 0.2966 
#>   
#> Residual standard error:  4.739

The two functions, stats::lm and spsurml, produce identical results. The output of spsurml is shorter than that of lm so, depending on the necessities of the user, he/she can choose between lm and spsurml without loss of information. If you want all the estimation details, then use lm.

3 LM tests for spatial autocorrelation

The existence of omitted spatial dependence in the results of a SIM model, estimated by LS, can be tested by using the classical LM tests. The function spdep::lm.LMtests report the values of these Lagrange Multipliers

lmtest.spdep <- spdep::lm.LMtests(lm_60, listw, test = "all")
print(lmtest.spdep)
#> 
#>  Lagrange multiplier diagnostics for spatial
#>  dependence
#> 
#> data:  
#> model: lm(formula = formula_60, data =
#> NCOVR.sf)
#> weights: listw
#> 
#> LMerr = 208.32, df = 1, p-value < 2.2e-16
#> 
#> 
#>  Lagrange multiplier diagnostics for spatial
#>  dependence
#> 
#> data:  
#> model: lm(formula = formula_60, data =
#> NCOVR.sf)
#> weights: listw
#> 
#> LMlag = 232.36, df = 1, p-value < 2.2e-16
#> 
#> 
#>  Lagrange multiplier diagnostics for spatial
#>  dependence
#> 
#> data:  
#> model: lm(formula = formula_60, data =
#> NCOVR.sf)
#> weights: listw
#> 
#> RLMerr = 1.1473, df = 1, p-value = 0.2841
#> 
#> 
#>  Lagrange multiplier diagnostics for spatial
#>  dependence
#> 
#> data:  
#> model: lm(formula = formula_60, data =
#> NCOVR.sf)
#> weights: listw
#> 
#> RLMlag = 25.19, df = 1, p-value = 5.196e-07
#> 
#> 
#>  Lagrange multiplier diagnostics for spatial
#>  dependence
#> 
#> data:  
#> model: lm(formula = formula_60, data =
#> NCOVR.sf)
#> weights: listw
#> 
#> SARMA = 233.51, df = 2, p-value < 2.2e-16

The same tests can be obtained with the function lmtestspsur

lmtest.spsur <- lmtestspsur(formula = formula_60, listw = listw, 
                            data = NCOVR.sf)
print(lmtest.spsur)
#> [[1]]
#> 
#>  LM-SUR-SLM
#> 
#> data:  NCOVR.sf
#> LM-stat = 232.22, df = 1, p-value < 2.2e-16
#> 
#> 
#> [[2]]
#> 
#>  LM-SUR-SEM
#> 
#> data:  NCOVR.sf
#> LM-stat = 208.19, df = 1, p-value < 2.2e-16
#> 
#> 
#> [[3]]
#> 
#>  LM*-SUR-SLM
#> 
#> data:  NCOVR.sf
#> LM-stat = 25.181, df = 1, p-value =
#> 5.218e-07
#> 
#> 
#> [[4]]
#> 
#>  LM*-SUR-SEM
#> 
#> data:  NCOVR.sf
#> LM-stat = 1.1431, df = 1, p-value = 0.285
#> 
#> 
#> [[5]]
#> 
#>  LM-SUR-SARAR
#> 
#> data:  NCOVR.sf
#> LM-stat = 233.37, df = 2, p-value < 2.2e-16

Note that the ordering of the battery of Lagrange Multipliers is not the same. Otherwise, the results are almost identical.

4 The Spatial Lag Model

Both R packages spatialreg and spsur can estimate Spatial Lag Models for a single cross-section. Continuing with the example before, the model that we want to estimate is:

\[ \begin{equation} HR_{60} = \rho W HR_{60} +\beta_{0}+\beta_{1}RD_{60}+\beta_{2}PS_{60}+\beta_{3}MA_{60}+\beta_{4}DV_{60} +\beta_{5}UE_{60} +\beta_{6}SOUTH+\epsilon_{60}\ \tag{4.1} \end{equation} \]

The ML estimation of equation (4.1) using the function spsurml() of spsur renders the following results:

slm.spsur <- spsur::spsurml(formula = formula_60, type = "slm", 
                            listw = listw, data = NCOVR.sf)
#> neighbourhood matrix eigenvalues
#> Computing eigenvalues ...
#> 
#> Initial point:   log_lik:  -9101.642  rhos:  0.364 
#> Iteration:  1   log_lik:  -9098.484  rhos:  0.371 
#> Iteration:  2   log_lik:  -9098.484  rhos:  0.371 
#> Time to fit the model:  1.06  seconds
#> Time to compute covariances:  5.25  seconds
summary(slm.spsur)
#> Call:
#> spsur::spsurml(formula = formula_60, data = NCOVR.sf, listw = listw, 
#>     type = "slm")
#> 
#>  
#> Spatial SUR model type:  slm 
#> 
#> Equation  1 
#>                Estimate Std. Error t value
#> (Intercept)_1  5.473546   0.647627  8.4517
#> RD60_1         1.357784   0.125733 10.7989
#> PS60_1         0.284344   0.089317  3.1835
#> MA60_1        -0.165429   0.019384 -8.5342
#> DV60_1         0.884949   0.093721  9.4424
#> UE60_1        -0.021774   0.034003 -0.6404
#> SOUTH_1        1.354664   0.244606  5.5382
#> rho_1          0.370632   0.030271 12.2436
#>                Pr(>|t|)    
#> (Intercept)_1 < 2.2e-16 ***
#> RD60_1        < 2.2e-16 ***
#> PS60_1         0.001469 ** 
#> MA60_1        < 2.2e-16 ***
#> DV60_1        < 2.2e-16 ***
#> UE60_1         0.521983    
#> SOUTH_1       3.314e-08 ***
#> rho_1         < 2.2e-16 ***
#> ---
#> Signif. codes:  
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> R-squared: 0.3409 
#>   
#> Residual standard error:  4.588
#>  LMM: 22.483  p-value: (2.12e-06)

The output of the function spatialreg::lagsarlm() from spatialreg is

slm.spatialreg <- spatialreg::lagsarlm(formula = formula_60, 
                                       listw = listw, type = "lag", 
                                       data = NCOVR.sf)
summary(slm.spatialreg)
#> 
#> Call:
#> spatialreg::lagsarlm(formula = formula_60, data = NCOVR.sf, listw = listw, 
#>     type = "lag")
#> 
#> Residuals:
#>       Min        1Q    Median        3Q       Max 
#> -13.54486  -2.11156  -0.63891   1.30728  88.57308 
#> 
#> Type: lag 
#> Coefficients: (asymptotic standard errors) 
#>              Estimate Std. Error z value  Pr(>|z|)
#> (Intercept)  5.473303   0.647529  8.4526 < 2.2e-16
#> RD60         1.357744   0.125714 10.8002 < 2.2e-16
#> PS60         0.284338   0.089303  3.1840  0.001453
#> MA60        -0.165423   0.019381 -8.5352 < 2.2e-16
#> DV60         0.884924   0.093706  9.4436 < 2.2e-16
#> UE60        -0.021771   0.033998 -0.6404  0.521938
#> SOUTH        1.354546   0.244570  5.5385 3.051e-08
#> 
#> Rho: 0.37067, LR test value: 155.71, p-value: < 2.22e-16
#> Asymptotic standard error: 0.03027
#>     z-value: 12.245, p-value: < 2.22e-16
#> Wald statistic: 149.95, p-value: < 2.22e-16
#> 
#> Log likelihood: -9098.484 for lag model
#> ML residual variance (sigma squared): 21.041, (sigma: 4.5871)
#> Number of observations: 3085 
#> Number of parameters estimated: 9 
#> AIC: 18215, (AIC for lm: 18369)
#> LM test for residual autocorrelation
#> test value: 22.524, p-value: 2.0753e-06

Once again, the two estimations are almost the same. The estimated log-likelihood of the SLM can be recovered using the function logLik()

logLik(slm.spsur)
#> 'log Lik.' -9098.484 (df=9)

The log-likelihood corresponding to the SIM model is much lower, -9176.338, which points to a severe misspecification in the last model. More formally, the LR, obtained with the function anova of spsur strongly rejects the SIM model in favour of the SLM alternative; the AIC and BIC statistics indicates the same.

anova(ols.spsur, slm.spsur)
#>               logLik df   AIC   BIC LRtest
#> model 1: sim -9176.3  8 18369 18353       
#> model 2: slm -9098.5  9 18215 18197 155.71
#>                  p.val
#> model 1: sim          
#> model 2: slm 9.805e-36

The SLM model can also be estimate by Three Stages-Least-Squares (3SLS) by both R-packages. The function for spsur is spsur3sls()

slm.3sls.spsur <- spsur3sls(formula = formula_60, type = "slm", 
                            listw = listw, data = NCOVR.sf)
#> Time to fit the model:  0.05  seconds
summary(slm.3sls.spsur)
#> Call:
#> spsur3sls(formula = formula_60, data = NCOVR.sf, listw = listw, 
#>     type = "slm")
#> 
#>  
#> Spatial SUR model type:  slm 
#> 
#> Equation  1 
#>                  Estimate  Std. Error t value
#> (Intercept)_1  3.99354886  0.80980919  4.9315
#> RD60_1         1.11201434  0.14804917  7.5111
#> PS60_1         0.24285142  0.09010103  2.6953
#> MA60_1        -0.12913458  0.02271825 -5.6842
#> DV60_1         0.73146230  0.10670843  6.8548
#> UE60_1         0.00064201  0.03483656  0.0184
#> SOUTH_1        0.63822931  0.34132155  1.8699
#> rho_1          0.57744206  0.07411084  7.7916
#>                Pr(>|t|)    
#> (Intercept)_1 8.594e-07 ***
#> RD60_1        7.638e-14 ***
#> PS60_1          0.00707 ** 
#> MA60_1        1.437e-08 ***
#> DV60_1        8.595e-12 ***
#> UE60_1          0.98530    
#> SOUTH_1         0.06160 .  
#> rho_1         8.979e-15 ***
#> ---
#> Signif. codes:  
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> R-squared: 0.3447 
#>   
#> Residual standard error:  4.574

Whereas spatialreg uses the function spatialreg::stsls()

slm.3sls.spatialreg <- spatialreg::stsls(formula = formula_60, listw = listw, 
                             data = NCOVR.sf)
summary(slm.3sls.spatialreg)
#> 
#> Call:
#> spatialreg::stsls(formula = formula_60, data = NCOVR.sf, listw = listw)
#> 
#> Residuals:
#>       Min        1Q    Median        3Q       Max 
#> -14.79844  -2.06529  -0.57084   1.22908  88.71872 
#> 
#> Coefficients: 
#>                Estimate  Std. Error t value
#> Rho          0.57744206  0.07419509  7.7828
#> (Intercept)  3.99354886  0.81072980  4.9259
#> RD60         1.11201434  0.14821748  7.5026
#> PS60         0.24285142  0.09020346  2.6923
#> MA60        -0.12913458  0.02274408 -5.6777
#> DV60         0.73146230  0.10682974  6.8470
#> UE60         0.00064201  0.03487617  0.0184
#> SOUTH        0.63822931  0.34170958  1.8678
#>              Pr(>|t|)
#> Rho         7.105e-15
#> (Intercept) 8.399e-07
#> RD60        6.262e-14
#> PS60         0.007097
#> MA60        1.365e-08
#> DV60        7.542e-12
#> UE60         0.985313
#> SOUTH        0.061796
#> 
#> Residual variance (sigma squared): 20.966, (sigma: 4.5788)

There is hardly any difference between them because the estimation algorithm is quasilinear and both functions use the same set of instruments. The case of the SDM model produces identical results.

5 Spatial Error Model

The model to estimate in this case is:

\[ \begin{equation} HR_{60} = \beta_{0}+\beta_{1}RD_{60}+\beta_{2}PS_{60}+\beta_{3}MA_{60}+\beta_{4}DV_{60} +\beta_{5}UE_{60} +\beta_{6}SOUTH+ u_{60} \\ u_{60} = \lambda W u_{60} + \epsilon_{60}\ \tag{5.1} \end{equation} \]

which can be solved by ML using the function spatialreg::errorsarlm(), from spatialreg.

sem.spatialreg <- spatialreg::errorsarlm(formula = formula_60, 
                             listw = listw, data = NCOVR.sf)
summary(sem.spatialreg)
#> 
#> Call:
#> spatialreg::errorsarlm(formula = formula_60, data = NCOVR.sf, 
#>     listw = listw)
#> 
#> Residuals:
#>       Min        1Q    Median        3Q       Max 
#> -13.54039  -2.10737  -0.66853   1.27843  88.31442 
#> 
#> Type: error 
#> Coefficients: (asymptotic standard errors) 
#>              Estimate Std. Error z value  Pr(>|z|)
#> (Intercept)  7.527324   0.766690  9.8179 < 2.2e-16
#> RD60         1.742345   0.148640 11.7219 < 2.2e-16
#> PS60         0.346998   0.109230  3.1768  0.001489
#> MA60        -0.208238   0.023391 -8.9024 < 2.2e-16
#> DV60         0.957270   0.108387  8.8320 < 2.2e-16
#> UE60         0.018167   0.039935  0.4549  0.649165
#> SOUTH        2.494740   0.316266  7.8881 3.109e-15
#> 
#> Lambda: 0.38258, LR test value: 140.59, p-value: < 2.22e-16
#> Asymptotic standard error: 0.032059
#>     z-value: 11.934, p-value: < 2.22e-16
#> Wald statistic: 142.41, p-value: < 2.22e-16
#> 
#> Log likelihood: -9106.043 for error model
#> ML residual variance (sigma squared): 21.123, (sigma: 4.596)
#> Number of observations: 3085 
#> Number of parameters estimated: 9 
#> AIC: 18230, (AIC for lm: 18369)

spsur always uses the same function for the ML algorithm, spsurml(); you only have to change the type of model to estimate.

sem.spsur <- spsurml(formula = formula_60, type = "sem",
                     listw = listw, data = NCOVR.sf)
#> neighbourhood matrix eigenvalues
#> Computing eigenvalues ...
#> 
#> Initial point:   log_lik:  -9108.833  lambdas:  0.375 
#> Iteration:  1  log_lik:  -9106.044  lambdas:  0.382 
#> Iteration:  2  log_lik:  -9106.043  lambdas:  0.383 
#> Time to fit the model:  1.89  seconds
#> Time to compute covariances:  66.02  seconds
summary(sem.spsur)
#> Call:
#> spsurml(formula = formula_60, data = NCOVR.sf, listw = listw, 
#>     type = "sem")
#> 
#>  
#> Spatial SUR model type:  sem 
#> 
#> Equation  1 
#>                Estimate Std. Error t value
#> (Intercept)_1  7.527394   0.766797  9.8167
#> RD60_1         1.742351   0.148661 11.7203
#> PS60_1         0.346996   0.109246  3.1763
#> MA60_1        -0.208241   0.023395 -8.9013
#> DV60_1         0.957295   0.108403  8.8309
#> UE60_1         0.018158   0.039940  0.4546
#> SOUTH_1        2.494771   0.316304  7.8873
#> lambda_1       0.382536   0.032105 11.9151
#>                Pr(>|t|)    
#> (Intercept)_1 < 2.2e-16 ***
#> RD60_1        < 2.2e-16 ***
#> PS60_1         0.001507 ** 
#> MA60_1        < 2.2e-16 ***
#> DV60_1        < 2.2e-16 ***
#> UE60_1         0.649409    
#> SOUTH_1       4.255e-15 ***
#> lambda_1      < 2.2e-16 ***
#> ---
#> Signif. codes:  
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> R-squared: 0.3388 
#>   
#> Residual standard error:  4.597
#>  LMM: 1.1852  p-value: (0.276)

The LR test for nested models, in this case SIM vs SEM, can be obtained as before using the function anova()

anova(ols.spsur, sem.spsur)
#>               logLik df   AIC   BIC LRtest
#> model 1: sim -9176.3  8 18369 18353       
#> model 2: sem -9106.0  9 18230 18212 140.59
#>                   p.val
#> model 1: sim           
#> model 2: sem 1.9786e-32

Clearly, the SEM model is preferable to the SIM specification because there is spatial dependence in the data, which has been effectively captured by the SEM mechanism. Moreover, according to the LMM test below (this is a Marginal Lagrange Multiplier of the type \(LM(\rho|\lambda)\); see the spsur user guide), once the spatial errors are introduced in the equation, the spatial lag of the endogenous variable, \(WHR_{60}\) is not statistically significant.

print(sem.spsur$LMM)
#> [1] 1.185223

6 Spatial Simultaneous Autoregressive Model (SARAR)

The especification of the SARAR model is as usual

\[ \begin{equation} HR_{60} = \rho W HR_{60} + \beta_{0}+\beta_{1}RD_{60}+\beta_{2}PS_{60}+\beta_{3}MA_{60}+\beta_{4}DV_{60} +\beta_{5}UE_{60} +\beta_{6}SOUTH+ u_{60} \\ u_{60} = \lambda W u_{60} + \epsilon_{60}\ \tag{6.1} \end{equation} \]

spsur estimates this model by using the funtion spsurml(); you only have to adjust the argument type to sarar, that is

sarar.spsur <- spsurml(formula = formula_60, listw = listw, 
                       type ="sarar",data = NCOVR.sf)
#> neighbourhood matrix eigenvalues
#> Computing eigenvalues ...
#> 
#> Initial point:   log_lik:  -9085.318  rhos:  0.663  lambdas:  -0.603 
#> Iteration:  1  log_lik:  -9062.748  rhos:  0.748  lambdas:  -0.837 
#> Iteration:  2  log_lik:  -9060.355  rhos:  0.77  lambdas:  -0.905 
#> Iteration:  3  log_lik:  -9060.132  rhos:  0.776  lambdas:  -0.925 
#> Iteration:  4  log_lik:  -9060.111  rhos:  0.777  lambdas:  -0.931 
#> Iteration:  5  log_lik:  -9060.109  rhos:  0.778  lambdas:  -0.933 
#> Iteration:  6  log_lik:  -9060.108  rhos:  0.778  lambdas:  -0.933 
#> Time to fit the model:  42.97  seconds
#> Time to compute covariances:  10.36  seconds
summary(sarar.spsur)
#> Call:
#> spsurml(formula = formula_60, data = NCOVR.sf, listw = listw, 
#>     type = "sarar")
#> 
#>  
#> Spatial SUR model type:  sarar 
#> 
#> Equation  1 
#>                Estimate Std. Error  t value
#> (Intercept)_1  2.421946   0.391455   6.1870
#> RD60_1         0.657214   0.079674   8.2488
#> PS60_1         0.149669   0.054110   2.7660
#> MA60_1        -0.079578   0.011837  -6.7230
#> DV60_1         0.500917   0.063113   7.9369
#> UE60_1        -0.031826   0.021372  -1.4892
#> SOUTH_1        0.260660   0.127846   2.0389
#> rho_1          0.777937   0.018114  42.9474
#> lambda_1      -0.933030   0.083248 -11.2078
#>                Pr(>|t|)    
#> (Intercept)_1 6.942e-10 ***
#> RD60_1        2.344e-16 ***
#> PS60_1         0.005708 ** 
#> MA60_1        2.114e-11 ***
#> DV60_1        2.880e-15 ***
#> UE60_1         0.136550    
#> SOUTH_1        0.041550 *  
#> rho_1         < 2.2e-16 ***
#> lambda_1      < 2.2e-16 ***
#> ---
#> Signif. codes:  
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> R-squared: 0.4441 
#>   
#> Residual standard error:   4.24

In the case of spatialreg, the function for this case is spatialreg::sacsar()

sarar.spatialreg <- spatialreg::sacsarlm(formula = formula_60, 
                             listw = listw, data = NCOVR.sf)
summary(sarar.spatialreg)
#> 
#> Call:
#> spatialreg::sacsarlm(formula = formula_60, data = NCOVR.sf, listw = listw)
#> 
#> Residuals:
#>       Min        1Q    Median        3Q       Max 
#> -12.73012  -1.99470  -0.57684   1.20274  80.71945 
#> 
#> Type: sac 
#> Coefficients: (asymptotic standard errors) 
#>              Estimate Std. Error z value  Pr(>|z|)
#> (Intercept)  2.419903   0.410043  5.9016 3.600e-09
#> RD60         0.656738   0.085035  7.7232 1.132e-14
#> PS60         0.149557   0.054411  2.7486  0.005985
#> MA60        -0.079518   0.012398 -6.4137 1.420e-10
#> DV60         0.500589   0.066415  7.5373 4.796e-14
#> UE60        -0.031805   0.021375 -1.4879  0.136767
#> SOUTH        0.260047   0.131573  1.9764  0.048104
#> 
#> Rho: 0.77818
#> Asymptotic standard error: 0.023041
#>     z-value: 33.774, p-value: < 2.22e-16
#> Lambda: -0.93384
#> Asymptotic standard error: 0.072693
#>     z-value: -12.846, p-value: < 2.22e-16
#> 
#> LR test value: 232.46, p-value: < 2.22e-16
#> 
#> Log likelihood: -9060.108 for sac model
#> ML residual variance (sigma squared): 17.97, (sigma: 4.2391)
#> Number of observations: 3085 
#> Number of parameters estimated: 10 
#> AIC: 18140, (AIC for lm: 18369)

The same as before, the differences between the two codes are minimal.

Finally, according to the anova() function of spsur, the SARAR model is preferable to the SLM and SEM alternatives, which are nested in the SARAR, in spite of the high value estimated for the parameter \(\lambda\), -0.93327, close to a unit root case. The LR and both AIC and BIC statistics reach the same conclusion.

anova(slm.spsur,sarar.spsur)
#>                 logLik df   AIC   BIC LRtest
#> model 1: slm   -9098.5  9 18215 18197       
#> model 2: sarar -9060.1 10 18140 18120 76.751
#>                     p.val
#> model 1: slm             
#> model 2: sarar 1.9393e-18
anova(sem.spsur,sarar.spsur)
#>                 logLik df   AIC   BIC LRtest
#> model 1: sem   -9106.0  9 18230 18212       
#> model 2: sarar -9060.1 10 18140 18120  91.87
#>                     p.val
#> model 1: sem             
#> model 2: sarar 9.2565e-22

References

Baller, Robert D, Luc Anselin, Steven F Messner, Glenn Deane, and Darnell F Hawkins. 2001. “Structural Covariates of US County Homicide Rates: Incorporating Spatial Effects.” Criminology 39 (3): 561–88.
Bivand, Roger, Edzer Pebesma, and Virgilio Gómez-Rubio. 2013. Applied Spatial Data Analysis with R, Second Edition. Springer-Verlag, NY. https://asdar-book.org/.
Bivand, Roger, and Gianfranco Piras. 2015. “Comparing Implementations of Estimation Methods for Spatial Econometrics.” Journal of Statistical Software 63 (18): 1–36. https://doi.org/10.18637/jss.v063.i18.
Bivand, Roger, and David W. S. Wong. 2018. “Comparing Implementations of Global and Local Indicators of Spatial Association.” TEST 27 (3): 716–48. https://doi.org/10.1007/s11749-018-0599-x.
Minguez, Roman, Fernando Lopez, and Jesus Mur. \noop{2022}forthcoming. spsur: an R package for dealing with Spatial Seemingly Unrelated Regression Models.” Journal of Statistical Software, \noop{2022}forthcoming. https://github.com/rominsal/spsur/tree/master/vignettes.
R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.