spfilteR: Spatial Filtering with Eigenvectors

Sebastian Juhl

2021-08-13

Introduction

The spfilteR package provides tools to implement the eigenvector-based spatial filtering (ESF) approach put forth by Griffith (2003) in linear and generalized linear regression models. It allows users to obtain eigenvectors from a transformed connectivity matrix and supports the supervised and unsupervised creation of a synthetic spatial filter. For the unsupervised approach, eigenvectors can be selected based on either i) the maximization of model fit, ii) the minimization of residual autocorrelation, iii) the statistical significance of residual autocorrelation, or iv) the significance level of candidate eigenvectors. Alternatively, all eigenvectors in the search set may be included so that no selection takes place. While the functions provide a high flexibility, only a minimum of code is required to implement the unsupervised ESF approach.

This vignette solely focuses on the workflow to illustrate the functionality of the spfilteR package. For a discussion on the ESF methodology, interested readers may be referred to other sources, e.g., Griffith, Chun, and Li (2019) or Tiefelsdorf and Griffith (2007).

Installation

A stable release version of the spfilteR package is available on CRAN and the latest development version can be downloaded from GitHub.

# CRAN release
install.packages("spfilteR")

# GitHub
library(devtools)
devtools::install_github("sjuhl/spfilteR")

Getting Started

Together with the main functions, the package contains an artificial dataset (fakedataset) and a connectivity matrix (W) connecting the 100 cross-sections on a regular grid by using the rook criterion of adjacency. For illustrative purposes, the following examples utilize the fake dataset.

# load the package
library(spfilteR)

# load the dataset
data("fakedata")

# take a look at the connectivity matrix and the variables
dim(W)
#> [1] 100 100

head(fakedataset)
#>   ID indicator count       x1        x2         x3  negative negcount
#> 1  1         0     0 11.29514  5.527426  6.6406409 -2.338103        9
#> 2  2         0     4 13.92705  7.047870  4.0049350 -2.048737        3
#> 3  3         1     1 17.31431 10.525834 -0.7022595 -1.868379        5
#> 4  4         1     4 15.21778  7.817901  1.5948141 -2.737914        2
#> 5  5         1     3 15.56298  7.903947 -1.2708973 -1.117427        2
#> 6  6         1     2 17.43941 11.254842 -1.4158475  1.577442        2

Besides an ID variable, the dataset contains a binary indicator variable, two count variables, and four continuous variables.

Identifying Spatial Autocorrelation

In a first step, the function MI.vec() checks for the presence of spatial autocorrelation in the continuous variables by means of Moran’s I (see also Cliff and Ord 1981, 1972). The function further allows users to define the alternative hypothesis in order to obtain p-values.

# select continuous variables
cont <- cbind(fakedataset$x1, fakedataset$x2, fakedataset$x3, fakedataset$negative)
colnames(cont) <- c("x1", "x2", "x3", "negative")

# Moran test of spatial autocorrelation
(I <- MI.vec(x = cont, W = W, alternative = 'greater'))
#>                     I          EI        VarI         zI           pI    
#> x1        0.311178214 -0.01010101 0.005344193  4.3948248 5.543107e-06 ***
#> x2        0.168757531 -0.01010101 0.005344193  2.4466317 7.209904e-03  **
#> x3        0.009112739 -0.01010101 0.005344193  0.2628276 3.963417e-01    
#> negative -0.244064424 -0.01010101 0.005344193 -3.2004192 9.993139e-01

The output suggests that the variables x1 and x2 are positively autocorrelated at conventional levels of statistical significance. Moreover, the standardized value of Moran’s I (zI) indicates that the variable negative is negatively autocorrelated. We can use the function MI.vec() and specify alternative = 'lower' to assess the significance of the negative autocorrelation:

MI.vec(x = fakedataset$negative, W = W, alternative = 'lower')
#>            I          EI        VarI        zI           pI    
#> 1 -0.2440644 -0.01010101 0.005344193 -3.200419 0.0006861391 ***

Since the Moran coefficient is a global measure of spatial autocorrelation, spatial heterogeneity constitutes a problem for this statistic. More specifically, the simultaneous presence of positive and negative spatial autocorrelation at different scales cannot be revealed by the classical Moran’s I. To circumvent this problem, the function MI.decomp() decomposes the Moran coefficient into a positively and a negatively autocorrelated part and performs a permutation procedure to assess the significance (Dray 2011):

# decompose Moran's I
(I.dec <- MI.decomp(x = cont, W = W, nsim=100))
#>                 I+       VarI+        pI+            I-       VarI-        pI-
#> x1       0.3894789 0.002025092 0.00990099 ** -0.0783007 0.001730153 1.00000000
#> x2       0.3048369 0.002248250 0.03960396  * -0.1360794 0.001786746 0.98019802
#> x3       0.2247566 0.001587511 0.29702970    -0.2156438 0.001829118 0.46534653
#> negative 0.1131614 0.002175911 1.00000000    -0.3572259 0.001885981 0.00990099
#>             pItwo.sided  
#> x1           0.01980198 *
#> x2           0.07920792 .
#> x3           0.59405941  
#> negative **  0.01980198 *

Note that the global Moran’s I is the sum of I+ and I-:

# I = 'I+' + 'I-'
cbind(I[, "I"], I.dec[, "I+"] + I.dec[, "I-"])
#>                  [,1]         [,2]
#> x1        0.311178214  0.311178214
#> x2        0.168757531  0.168757531
#> x3        0.009112739  0.009112739
#> negative -0.244064424 -0.244064424

Unsupervised Spatial Filtering in Linear Models

Assume that we wish to regress x1 on x2 and test for residual autocorrelation using the function MI.resid().

# define variables
y <- fakedataset$x1
X <- cbind(1, fakedataset$x2)

# OLS regression
ols.resid <- resid(lm(y ~ X))

# Moran test of residual spatial autocorrelation
MI.resid(resid = ols.resid, W = W, alternative = 'greater')
#>          I          EI        VarI       zI           pI    
#> 1 0.350568 -0.01010101 0.005344193 4.933643 4.035495e-07 ***

The results show that the residuals are significantly autocorrelated which violates the assumption of uncorrelated errors. In order to resolve this problem of spatial autocorrelation in regression residuals, the function lmFilter() estimates a spatially filtered linear regression model using an unsupervised stepwise regression to identify relevant eigenvectors derived from the transformed connectivity matrix. Below, the unsupervised eigenvector search algorithm selects eigenvectors based on the reduction in residual autocorrelation. The output is a class spfilter object.

# ESF model
(lm.esf <- lmFilter(y = y, x = X, W = W, objfn = 'MI', positive = TRUE,
                    ideal.setsize = TRUE, tol = .2))
#> 8 out of 22 candidate eigenvectors selected

summary(lm.esf, EV = TRUE)
#> 
#>  - Spatial Filtering with Eigenvectors (Linear Model)  -
#> 
#> Coefficients (OLS):
#>              Estimate         SE      p-value    
#> (Intercept) 9.2169316 0.66572188 5.088489e-24 ***
#> beta_1      0.9947113 0.07995667 2.932913e-21 ***
#> 
#> Adjusted R-squared:
#>   Initial  Filtered 
#> 0.4673945 0.7257458 
#> 
#> Filtered for positive spatial autocorrelation
#> 8 out of 22 candidate eigenvectors selected
#> Objective Function: "MI"
#> 
#> Summary of selected eigenvectors:
#>        Estimate       SE      p-value  partialR2      VIF        MI    
#> ev_13 -9.580545 1.447552 2.550912e-09 0.23208263 1.005781 0.6302019 ***
#> ev_4  -3.239103 1.481047 3.133258e-02 0.02873360 1.049729 0.9257835   *
#> ev_10 -5.529712 1.453592 2.589498e-04 0.07901543 1.013360 0.7303271 ***
#> ev_2   4.885225 1.444218 1.064466e-03 0.06058390 1.001660 1.0004147  **
#> ev_9   3.458918 1.469630 2.076982e-02 0.03041775 1.034209 0.7638378   *
#> ev_20  4.188058 1.445305 4.720290e-03 0.04411345 1.002998 0.4539879  **
#> ev_5  -2.863850 1.452437 5.170840e-02 0.02055372 1.011899 0.8968632   .
#> ev_19  3.805433 1.443517 9.873617e-03 0.03671349 1.000798 0.4615722  **
#> 
#> Moran's I (Residuals):
#>             Observed    Expected   Variance         z      p-value    
#> Initial   0.35056797 -0.01192610 0.01207299 3.2990854 0.0004850019 ***
#> Filtered -0.03446177 -0.07716672 0.04880722 0.1933019 0.4233612531

While the print method shows that 8 eigenvectors were selected from the candidate set consisting of 22 eigenvectors, the summary method provides additional information. Besides the ordinary least squares (OLS) parameter estimates, the output shows that the ESF model filters for positive spatial autocorrelation using the minimization of residual autocorrelation as objective function during the eigenvector search. A comparison between the filtered and the nonspatial OLS model with respect to model fit and residual autocorrelation is also provided. Since the option EV is set to TRUE, the summary method also displays information on the selected eigenvectors. As the results show, the ESF model successfully removes the spatial pattern from model residuals.

The plot method allows for an easy visualization of the results. The graph displays the Moran coefficient associated with each of the eigenvectors. The shaded area signifies the candidate set and selected eigenvectors are illustrated by black dots.

plot(lm.esf)

Moreover, lmFilter() also allows users to select eigenvectors based on alternative selection criteria:

### Alternative selection criteria:
# maximization of model fit
lmFilter(y = y, x = X, W = W, objfn = 'R2', positive = TRUE, ideal.setsize = TRUE)
#> 11 out of 22 candidate eigenvectors selected

# significance of residual autocorrelation
lmFilter(y = y, x = X, W = W, objfn = 'pMI', sig = .1, bonferroni = FALSE,
         positive = TRUE, ideal.setsize = TRUE)
#> 3 out of 22 candidate eigenvectors selected

# significance of eigenvectors
lmFilter(y = y, x = X, W = W, objfn = 'p', sig = .1, bonferroni = TRUE,
         positive = TRUE, ideal.setsize = TRUE)
#> 3 out of 22 candidate eigenvectors selected

# all eigenvectors in the candidate set
lmFilter(y = y, x = X, W = W, objfn = 'all', positive = TRUE, ideal.setsize = TRUE)
#> 22 out of 22 candidate eigenvectors selected

If users wish to select eigenvectors based on individual selection criteria, they can obtain the eigenvectors using the function getEVs() and perform a supervised selection procedure using the basic stats::lm() command.

Extension to Generalized Linear Models

The ESF approach outlined above easily extends to generalized linear models (GLMs) as well (see also Griffith, Chun, and Li 2019). Therefore, the spfilteR package contains the function glmFilter() which uses maximum likelihood estimation (MLE) to fit a spatially filtered GLM and performs an unsupervised eigenvector search based on alternative objective functions.

Except for minor differences, glmFilter() works similar to the lmFilter() function discussed above. The option model defines the model that needs to be estimated. Currently, unsupervised spatial filtering is possible for logit, probit, and poisson models. Moreover, the option optim.method specifies the optimization method used to maximize the log-likelihood function. Finally, resid.type allows users to define the type of residuals used to calculate Moran’s I and boot.MI is an integer specifying the number of bootstrap permutations to obtain the variance of Moran’s I.

# define outcome variables
y.bin <- fakedataset$indicator
y.count <- fakedataset$count

# ESF logit model
(logit.esf <- glmFilter(y = y.bin, x = NULL, W = W, objfn = 'p', model = 'logit',
                        optim.method = 'BFGS', sig = .1, bonferroni = FALSE,
                        positive = TRUE, ideal.setsize = FALSE, alpha = .25,
                        resid.type = 'deviance', boot.MI = 100))
#> 5 out of 31 candidate eigenvectors selected

# ESF probit model
(probit.esf <- glmFilter(y = y.bin, x = NULL, W = W, objfn = 'p', model = 'probit',
                         optim.method = 'BFGS', sig = .1, bonferroni = FALSE,
                         positive = TRUE, ideal.setsize = FALSE, alpha = .25,
                         resid.type = 'deviance', boot.MI = 100))
#> 5 out of 31 candidate eigenvectors selected

# ESF poisson model
(poisson.esf <- glmFilter(y = y.count, x = NULL, W = W, objfn = 'BIC', model = 'poisson',
                          optim.method = 'BFGS', positive = TRUE, ideal.setsize = FALSE,
                          alpha = .25, resid.type = 'deviance', boot.MI = 100))
#> 7 out of 31 candidate eigenvectors selected

Again, if users wish to define individual selection criteria or fit a GLM currently not implemented in glmFilter(), they can obtain the eigenvectors using the getEVs() command and perform supervised eigenvector selection using the standard stats::glm() function.

References

Cliff, Andrew D, and John K. Ord. 1972. “Testing for Spatial Autocorrelation Among Regression Residuals.” Geographical Analysis 4 (3): 267–84. https://doi.org/10.1111/j.1538-4632.1972.tb00475.x.
———. 1981. Spatial Processes: Models & Applications. Advances in Spatial Science. London: Pion.
Dray, Stéphane. 2011. “A New Perspective about Moran’s Coefficient: Spatial Autocorrelation as a Linear Regression Problem.” Geographical Analysis 43 (2): 127–41. https://doi.org/10.1111/j.1538-4632.2011.00811.x.
Griffith, Daniel A. 2003. Spatial Autocorrelation and Spatial Filtering: Gaining Understanding Through Theory and Scientific Visualization. Advances in Spatial Science. Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-540-24806-4.
Griffith, Daniel A., Yongwan Chun, and Bin Li. 2019. Spatial Regression Analysis Using Eigenvector Spatial Filtering. Spatial Econometrics and Spatial Statistics. London: Elsevier. https://doi.org/10.1016/B978-0-12-815043-6.09990-0.
Tiefelsdorf, Michael, and Daniel A Griffith. 2007. “Semiparametric Filtering of Spatial Autocorrelation: The Eigenvector Approach.” Environment and Planning A: Economy and Space 39 (5): 1193–1221. https://doi.org/10.1068/a37378.