Start with the necessary packages for the vignette.
<- c("ggplot2", "ndi", "sf", "tidycensus", "tigris")
loadedPackages invisible(lapply(loadedPackages, library, character.only = TRUE))
options(tigris_use_cache = TRUE)
Set your U.S. Census Bureau access key. Follow this link to
obtain one. Specify your access key in the messer()
or
powell_wiley()
functions using the key
argument of the get_acs()
function from the
tidycensus
package called within each or by using the
census_api_key()
function from the tidycensus
package before running the messer()
or
powell_wiley()
functions (see an example of the latter
below).
::census_api_key("...") # INSERT YOUR OWN KEY FROM U.S. CENSUS API tidycensus
Compute the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., census tracts. This metric is based on Messer et al. (2006) with the following socio-economic status (SES) variables:
Characteristic | SES dimension | ACS table source | Description |
---|---|---|---|
OCC | Occupation | C24030 | Percent males in management, science, and arts occupation |
CWD | Housing | B25014 | Percent of crowded housing |
POV | Poverty | B17017 | Percent of households in poverty |
FHH | Poverty | B25115 | Percent of female headed households with dependents |
PUB | Poverty | B19058 | Percent of households on public assistance |
U30 | Poverty | B19001 | Percent households earning <$30,000 per year |
EDU | Education | B06009 | Percent earning less than a high school education |
EMP | Employment | B23001 (2010 only); B23025 (2011 onward) | Percent unemployed |
<- ndi::messer(state = "GA", year = 2010) messer2010GA
One output from messer()
function is tibble containing
the identification, geographic name, NDI (Messer) values, and raw census
characteristics for each tract.
$ndi messer2010GA
## # A tibble: 1,969 × 14
## GEOID state county tract NDI NDIQu…¹ OCC CWD POV FHH PUB U30
## <chr> <chr> <chr> <chr> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 13001… Geor… Appli… 9501 -0.0075 2-Belo… 0 0 0.1 0.1 0.1 0.3
## 2 13001… Geor… Appli… 9502 0.0458 4-Most… 0 0 0.3 0.1 0.2 0.5
## 3 13001… Geor… Appli… 9503 0.0269 3-Abov… 0 0 0.2 0 0.2 0.4
## 4 13001… Geor… Appli… 9504 -0.0083 2-Belo… 0 0 0.1 0 0.1 0.3
## 5 13001… Geor… Appli… 9505 0.0231 3-Abov… 0 0 0.2 0 0.2 0.4
## 6 13003… Geor… Atkin… 9601 0.0619 4-Most… 0 0.1 0.2 0.2 0.2 0.5
## 7 13003… Geor… Atkin… 9602 0.0593 4-Most… 0 0.1 0.3 0.1 0.2 0.4
## 8 13003… Geor… Atkin… 9603 0.0252 3-Abov… 0 0 0.3 0.1 0.2 0.4
## 9 13005… Geor… Bacon… 9701 0.0061 3-Abov… 0 0 0.2 0 0.2 0.4
## 10 13005… Geor… Bacon… 9702… 0.0121 3-Abov… 0 0 0.2 0.1 0.1 0.5
## # … with 1,959 more rows, 2 more variables: EDU <dbl>, EMP <dbl>, and
## # abbreviated variable name ¹NDIQuart
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
A second output from messer()
function is the results
from the principal component analysis used to compute the NDI (Messer)
values.
$pca messer2010GA
## Principal Components Analysis
## Call: psych::principal(r = ndi_vars_pca, nfactors = 1, n.obs = nrow(ndi_vars_pca),
## covar = FALSE, scores = TRUE, missing = imp)
## Standardized loadings (pattern matrix) based upon correlation matrix
## PC1 h2 u2 com
## OCC -0.59 0.35 0.65 1
## CWD 0.47 0.22 0.78 1
## POV 0.87 0.76 0.24 1
## FHH 0.67 0.45 0.55 1
## PUB 0.89 0.79 0.21 1
## U30 0.90 0.81 0.19 1
## EDU 0.79 0.62 0.38 1
## EMP 0.46 0.21 0.79 1
##
## PC1
## SS loadings 4.21
## Proportion Var 0.53
##
## Mean item complexity = 1
## Test of the hypothesis that 1 component is sufficient.
##
## The root mean square of the residuals (RMSR) is 0.11
## with the empirical chi square 1221.09 with prob < 2.3e-246
##
## Fit based upon off diagonal values = 0.95
A third output from messer()
function is a tibble
containing a breakdown of the missingness of the census characteristics
used to compute the NDI (Messer) values.
$missing messer2010GA
## # A tibble: 8 × 5
## # Groups: variable, total, missing [8]
## variable total missing n percent
## <chr> <int> <lgl> <int> <chr>
## 1 CWD 1969 TRUE 14 0.71 %
## 2 EDU 1969 TRUE 13 0.66 %
## 3 EMP 1969 TRUE 13 0.66 %
## 4 FHH 1969 TRUE 14 0.71 %
## 5 OCC 1969 TRUE 15 0.76 %
## 6 POV 1969 TRUE 14 0.71 %
## 7 PUB 1969 TRUE 14 0.71 %
## 8 U30 1969 TRUE 14 0.71 %
We can visualize the NDI (Messer) values geographically by linking
them to spatial information from tigris
and plotting with
the ggplot2
package suite.
# Obtain the 2010 counties from the "tigris" package
<- tigris::counties(state = "GA", year = 2010, cb = TRUE)
county2010GA # Remove first 9 characters from GEOID for compatibility with tigris information
$GEOID <- substring(county2010GA$GEO_ID, 10)
county2010GA
# Obtain the 2010 census tracts from the "tigris" package
<- tigris::tracts(state = "GA", year = 2010, cb = TRUE)
tract2010GA # Remove first 9 characters from GEOID for compatibility with tigris information
$GEOID <- substring(tract2010GA$GEO_ID, 10)
tract2010GA
# Join the NDI (Messer) values to the census tract geometry
<- merge(tract2010GA, messer2010GA$ndi, by = "GEOID")
GA2010messer
# Visualize the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., census tracts
## Continuous Index
::ggplot() +
ggplot2::geom_sf(data = GA2010messer,
ggplot2::aes(fill = NDI),
ggplot2size = 0.05,
color = "white") +
::geom_sf(data = county2010GA,
ggplot2fill = "transparent",
color = "white",
size = 0.2) +
::theme_minimal() +
ggplot2::scale_fill_viridis_c() +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2006-2010 estimates") +
::ggtitle("Neighborhood Deprivation Index (Messer)",
ggplot2subtitle = "Georgia, U.S.A., census tracts as the referent")
## Categorical Index
### Rename "9-NDI not avail" level as NA for plotting
$NDIQuartNA <- factor(replace(as.character(GA2010messer$NDIQuart),
GA2010messer$NDIQuart == "9-NDI not avail", NA),
GA2010messerc(levels(GA2010messer$NDIQuart)[-5], NA))
::ggplot() +
ggplot2::geom_sf(data = GA2010messer,
ggplot2::aes(fill = NDIQuartNA),
ggplot2size = 0.05,
color = "white") +
::geom_sf(data = county2010GA,
ggplot2fill = "transparent",
color = "white",
size = 0.2) +
::theme_minimal() +
ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
ggplot2na.value = "grey80") +
::labs(fill = "Index (Categorical)",
ggplot2caption = "Source: U.S. Census ACS 2006-2010 estimates") +
::ggtitle("Neighborhood Deprivation Index (Messer) Quartiles",
ggplot2subtitle = "Georgia, U.S.A., census tracts as the referent")
The results above are at the tract-level. The NDI (Messer) values can also be calculated at the county level.
<- ndi::messer(geo = "county", state = "GA", year = 2010)
messer2010GA_county
# Join the NDI (Messer) values to the county geometry
<- merge(county2010GA, messer2010GA_county$ndi, by = "GEOID")
GA2010messer_county
# Visualize the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., counties
## Continuous Index
::ggplot() +
ggplot2::geom_sf(data = GA2010messer_county,
ggplot2::aes(fill = NDI),
ggplot2size = 0.20,
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_c() +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2006-2010 estimates") +
::ggtitle("Neighborhood Deprivation Index (Messer)",
ggplot2subtitle = "Georgia, U.S.A., counties as the referent")
## Categorical Index
### Rename "9-NDI not avail" level as NA for plotting
$NDIQuartNA <- factor(replace(as.character(GA2010messer_county$NDIQuart),
GA2010messer_county$NDIQuart == "9-NDI not avail", NA),
GA2010messer_countyc(levels(GA2010messer_county$NDIQuart)[-5], NA))
::ggplot() +
ggplot2::geom_sf(data = GA2010messer_county,
ggplot2::aes(fill = NDIQuartNA),
ggplot2size = 0.20,
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
ggplot2na.value = "grey80") +
::labs(fill = "Index (Categorical)",
ggplot2caption = "Source: U.S. Census ACS 2006-2010 estimates") +
::ggtitle("Neighborhood Deprivation Index (Messer) Quartiles",
ggplot2subtitle = "Georgia, U.S.A., counties as the referent")
Compute the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for Maryland, Virginia, Washington, D.C., and West Virginia census tracts. This metric is based on Andrews et al. (2020) and Slotman et al. (2022) with socio-economic status (SES) variables chosen by Roux and Mair (2010):
Characteristic | SES dimension | ACS table source | Description |
---|---|---|---|
MedHHInc | Wealth and income | B19013 | Median household income (dollars) |
PctRecvIDR | Wealth and income | B19054 | Percent of households receiving dividends, interest, or rental income |
PctPubAsst | Wealth and income | B19058 | Percent of households receiving public assistance |
MedHomeVal | Wealth and income | B25077 | Median home value (dollars) |
PctMgmtBusSciArt | Occupation | C24060 | Percent in a management, business, science, or arts occupation |
PctFemHeadKids | Occupation | B11005 | Percent of households that are female headed with any children under 18 years |
PctOwnerOcc | Housing conditions | DP04 | Percent of housing units that are owner occupied |
PctNoPhone | Housing conditions | DP04 | Percent of households without a telephone |
PctNComPlmb | Housing conditions | DP04 | Percent of households without complete plumbing facilities |
PctEducHSPlus | Education | S1501 | Percent with a high school degree or higher (population 25 years and over) |
PctEducBchPlus | Education | S1501 | Percent with a college degree or higher (population 25 years and over) |
PctFamBelowPov | Wealth and income | S1702 | Percent of families with incomes below the poverty level |
PctUnempl | Occupation | S2301 | Percent unemployed |
More information about the codebook and computation of the NDI (Powell-Wiley) can be found on a GIS Portal for Cancer Research website.
<- ndi::powell_wiley(state = c("DC", "MD", "VA", "WV"), year = 2020) powell_wiley2020DMVW
One output from powell_wiley()
function is tibble
containing the identification, geographic name, NDI (Powell-Wiley)
values, and raw census characteristics for each tract.
$ndi powell_wiley2020DMVW
## # A tibble: 4,425 × 20
## GEOID state county tract NDI NDIQu…¹ MedHH…² PctRe…³ PctPu…⁴ MedHo…⁵
## <chr> <chr> <chr> <chr> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 11001000101 Distr… Distr… 1.01 -2.13 1-Leas… 187839 50.9 0.8 699100
## 2 11001000102 Distr… Distr… 1.02 -2.46 1-Leas… 184167 52.2 0.6 1556000
## 3 11001000201 Distr… Distr… 2.01 NA 9-NDI … NA NaN NaN NA
## 4 11001000202 Distr… Distr… 2.02 -2.30 1-Leas… 164261 49.6 0.9 1309100
## 5 11001000300 Distr… Distr… 3 -2.06 1-Leas… 156483 46 0.6 976500
## 6 11001000400 Distr… Distr… 4 -2.09 1-Leas… 153397 47.8 0 1164200
## 7 11001000501 Distr… Distr… 5.01 -2.11 1-Leas… 119911 44.5 0.8 674600
## 8 11001000502 Distr… Distr… 5.02 -2.21 1-Leas… 153264 46.8 0.5 1012500
## 9 11001000600 Distr… Distr… 6 -2.16 1-Leas… 154266 60.8 7.4 1109800
## 10 11001000702 Distr… Distr… 7.02 -1.20 1-Leas… 71747 22.9 0 289900
## # … with 4,415 more rows, 10 more variables: PctMgmtBusScArti <dbl>,
## # PctFemHeadKids <dbl>, PctOwnerOcc <dbl>, PctNoPhone <dbl>,
## # PctNComPlmb <dbl>, PctEducHSPlus <dbl>, PctEducBchPlus <dbl>,
## # PctFamBelowPov <dbl>, PctUnempl <dbl>, TotalPop <dbl>, and abbreviated
## # variable names ¹NDIQuint, ²MedHHInc, ³PctRecvIDR, ⁴PctPubAsst, ⁵MedHomeVal
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
A second output from powell_wiley()
function is the
results from the principal component analysis used to compute the NDI
(Powell-Wiley) values.
$pca powell_wiley2020DMVW
## $loadings
##
## Loadings:
## PC1 PC2 PC3
## logMedHHInc -0.638 -0.364
## PctNoIDRZ 0.612 0.319
## PctPubAsstZ 0.379 0.615
## logMedHomeVal -0.893
## PctWorkClassZ 0.974
## PctFemHeadKidsZ 0.128 0.697 -0.233
## PctNotOwnerOccZ -0.375 0.923
## PctNoPhoneZ 0.329 0.524
## PctNComPlmbZ -0.141 0.869
## PctEducLTHSZ 0.642 0.164
## PctEducLTBchZ 1.020 -0.121
## PctFamBelowPovZ 0.219 0.698
## PctUnemplZ 0.596
##
## PC1 PC2 PC3
## SS loadings 4.340 2.971 1.102
## Proportion Var 0.334 0.229 0.085
## Cumulative Var 0.334 0.562 0.647
##
## $rotmat
## [,1] [,2] [,3]
## [1,] 0.68516447 0.4403017 0.04517229
## [2,] -0.95446519 1.0806634 0.03196818
## [3,] -0.09078904 -0.2028145 1.02053725
##
## $rotation
## [1] "promax"
##
## $Phi
## [,1] [,2] [,3]
## [1,] 1.0000000 0.5250277 0.1649550
## [2,] 0.5250277 1.0000000 0.1923867
## [3,] 0.1649550 0.1923867 1.0000000
##
## $Structure
## [,1] [,2] [,3]
## [1,] -0.8432264 -0.71542812 -0.26037289
## [2,] 0.7750210 0.63477178 0.13357439
## [3,] 0.7001489 0.81237803 0.17181801
## [4,] -0.8931597 -0.46211477 -0.19777858
## [5,] 0.9253217 0.42037509 0.12424330
## [6,] 0.4559503 0.71962940 -0.07780352
## [7,] 0.1195748 0.73799969 0.17788881
## [8,] 0.2552300 0.42753150 0.58693047
## [9,] 0.1155170 0.05008712 0.84948765
## [10,] 0.7325510 0.50620197 0.16403994
## [11,] 0.9557341 0.41335682 0.13787215
## [12,] 0.5881663 0.81650181 0.18717861
## [13,] 0.3841044 0.63036827 0.09925687
##
## $communality
## [1] 0.5468870 0.4774810 0.5220533 0.8012746 0.9576793 0.5566938 0.9968490
## [8] 0.3829550 0.7774209 0.4398519 1.0560478 0.5359573 0.3616523
##
## $uniqueness
## logMedHHInc PctNoIDRZ PctPubAsstZ logMedHomeVal PctWorkClassZ
## 0.453113047 0.522518997 0.477946655 0.198725386 0.042320711
## PctFemHeadKidsZ PctNotOwnerOccZ PctNoPhoneZ PctNComPlmbZ PctEducLTHSZ
## 0.443306153 0.003150984 0.617045044 0.222579117 0.560148142
## PctEducLTBchZ PctFamBelowPovZ PctUnemplZ
## -0.056047809 0.464042693 0.638347726
##
## $Vaccounted
## [,1] [,2] [,3]
## SS loadings 4.5987837 3.2131788 1.10386926
## Proportion Var 0.3537526 0.2471676 0.08491302
## Cumulative Var 0.3537526 0.6009202 0.68583321
## Proportion Explained 0.5157997 0.3603902 0.12381002
## Cumulative Proportion 0.5157997 0.8761900 1.00000000
A third output from powell_wiley()
function is a tibble
containing a breakdown of the missingness of the census characteristics
used to compute the NDI (Powell-Wiley) values.
$missing powell_wiley2020DMVW
## # A tibble: 13 × 5
## # Groups: variable, total, missing [13]
## variable total missing n percent
## <chr> <int> <lgl> <int> <chr>
## 1 MedHHInc 4425 TRUE 73 1.65 %
## 2 MedHomeVal 4425 TRUE 148 3.34 %
## 3 PctEducBchPlus 4425 TRUE 47 1.06 %
## 4 PctEducHSPlus 4425 TRUE 47 1.06 %
## 5 PctFamBelowPov 4425 TRUE 63 1.42 %
## 6 PctFemHeadKids 4425 TRUE 60 1.36 %
## 7 PctMgmtBusScArti 4425 TRUE 57 1.29 %
## 8 PctNComPlmb 4425 TRUE 60 1.36 %
## 9 PctNoPhone 4425 TRUE 60 1.36 %
## 10 PctOwnerOcc 4425 TRUE 60 1.36 %
## 11 PctPubAsst 4425 TRUE 60 1.36 %
## 12 PctRecvIDR 4425 TRUE 60 1.36 %
## 13 PctUnempl 4425 TRUE 57 1.29 %
A fourth output from powell_wiley()
function is a
character string or numeric value of a standardized Cronbach’s alpha. A
value greater than 0.7 is desired.
$cronbach powell_wiley2020DMVW
## [1] 0.931138
We can visualize the NDI (Powell-Wiley) values geographically by
linking them to spatial information from tigris
and
plotting with the ggplot2
package suite.
# Obtain the 2020 census tracts from the "tigris" package
<- tigris::states(cb = TRUE)
state2020 <- state2020[state2020$STUSPS %in% c("DC", "MD", "VA", "WV"), ]
state2020DMVW <- tigris::tracts(state = "DC", year = 2020, cb = TRUE)
tract2020D <- tigris::tracts(state = "MD", year = 2020, cb = TRUE)
tract2020M <- tigris::tracts(state = "VA", year = 2020, cb = TRUE)
tract2020V <- tigris::tracts(state = "WV", year = 2020, cb = TRUE)
tract2020W <- rbind(tract2020D, tract2020M, tract2020V, tract2020W)
tracts2020DMVW
# Join the NDI (Powell-Wiley) values to the census tract geometry
<- merge(tracts2020DMVW, powell_wiley2020DMVW$ndi, by = "GEOID")
DMVW2020powell_wiley
# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS)
## Maryland, Virginia, Washington, D.C., and West Virginia census tracts
## Continuous Index
::ggplot() +
ggplot2::geom_sf(data = DMVW2020powell_wiley,
ggplot2::aes(fill = NDI),
ggplot2color = NA) +
::geom_sf(data = state2020DMVW,
ggplot2fill = "transparent",
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_c(na.value = "grey80") +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2016-2020 estimates")+
::ggtitle("Neighborhood Deprivation Index (Powell-Wiley)",
ggplot2subtitle = "Maryland, Virginia, Washington, D.C., and West Virginia tracts as the referent")
## Categorical Index (Population-weighted quintiles)
### Rename "9-NDI not avail" level as NA for plotting
$NDIQuintNA <- factor(replace(as.character(DMVW2020powell_wiley$NDIQuint),
DMVW2020powell_wiley$NDIQuint == "9-NDI not avail", NA),
DMVW2020powell_wileyc(levels(DMVW2020powell_wiley$NDIQuint)[-6], NA))
::ggplot() +
ggplot2::geom_sf(data = DMVW2020powell_wiley,
ggplot2::aes(fill = NDIQuintNA),
ggplot2color = NA) +
::geom_sf(data = state2020DMVW,
ggplot2fill = "transparent",
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
ggplot2na.value = "grey80") +
::labs(fill = "Index (Categorical)",
ggplot2caption = "Source: U.S. Census ACS 2016-2020 estimates")+
::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles",
ggplot2subtitle = "Maryland, Virginia, Washington, D.C., and West Virginia tracts as the referent")
Like the NDI (Messer), we also calculate county-level NDI (Powell-Wiley).
# Obtain the 2020 counties from the "tigris" package
<- tigris::counties(state = c("DC", "MD", "VA", "WV"), year = 2020, cb = TRUE)
county2010DMVW
# NDI (Powell-Wiley) at the county level (2016-2020)
<- ndi::powell_wiley(geo = "county", state = c("DC", "MD", "VA", "WV"), year = 2020)
powell_wiley2020DMVW_county
# Join the NDI (Powell-Wiley) values to the county geometry
<- merge(county2010DMVW, powell_wiley2020DMVW_county$ndi, by = "GEOID")
DMVW2020powell_wiley_county
# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS)
## Maryland, Virginia, Washington, D.C., and West Virginia counties
## Continuous Index
::ggplot() +
ggplot2::geom_sf(data = DMVW2020powell_wiley_county,
ggplot2::aes(fill = NDI),
ggplot2size = 0.20,
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_c() +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2016-2020 estimates") +
::ggtitle("Neighborhood Deprivation Index (Powell-Wiley)",
ggplot2subtitle = "Maryland, Virginia, Washington, D.C., and West Virginia counties as the referent")
## Categorical Index
### Rename "9-NDI not avail" level as NA for plotting
$NDIQuintNA <- factor(replace(as.character(DMVW2020powell_wiley_county$NDIQuint),
DMVW2020powell_wiley_county$NDIQuint == "9-NDI not avail", NA),
DMVW2020powell_wiley_countyc(levels(DMVW2020powell_wiley_county$NDIQuint)[-6], NA))
::ggplot() +
ggplot2::geom_sf(data = DMVW2020powell_wiley_county,
ggplot2::aes(fill = NDIQuint),
ggplot2size = 0.20,
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
ggplot2na.value = "grey80") +
::labs(fill = "Index (Categorical)",
ggplot2caption = "Source: U.S. Census ACS 2016-2020 estimates") +
::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles",
ggplot2subtitle = "Maryland, Virginia, Washington, D.C., and West Virginia counties as the referent")
In the messer()
and powell_wiley()
functions, missing census characteristics can be imputed using the
missing
and impute
arguments of the
pca()
and fa()
functions in the
psych
package called within the messer()
and
powell_wiley()
function, respectively. Impute values using
the logical imp
argument (currently only calls
impute = "median"
by default, which assigns the median
values of each missing census variable for a geography).
<- ndi::powell_wiley(state = "DC", year = 2020) # without imputation
powell_wiley2020DC <- ndi::powell_wiley(state = "DC", year = 2020, imp = TRUE) # with imputation
powell_wiley2020DCi
table(is.na(powell_wiley2020DC$ndi$NDI)) # n=2 tracts without NDI (Powell-Wiley) values
table(is.na(powell_wiley2020DCi$ndi$NDI)) # n=0 tracts without NDI (Powell-Wiley) values
# Obtain the 2020 census tracts from the "tigris" package
<- tigris::tracts(state = "DC", year = 2020, cb = TRUE)
tract2020DC
# Join the NDI (Powell-Wiley) values to the census tract geometry
<- merge(tract2020DC, powell_wiley2020DC$ndi, by = "GEOID")
DC2020powell_wiley <- merge(DC2020powell_wiley, powell_wiley2020DCi$ndi, by = "GEOID", suffixes = c("_nonimp", "_imp"))
DC2020powell_wiley
# Visualize the NDI (Powell-Wiley) values (2006-2010 5-year ACS) for Washington, D.C., census tracts
## Continuous Index
::ggplot() +
ggplot2::geom_sf(data = DC2020powell_wiley,
ggplot2::aes(fill = NDI_nonimp),
ggplot2size = 0.2,
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_c() +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2016-2020 estimates") +
::ggtitle("Neighborhood Deprivation Index (Powell-Wiley), Non-Imputed",
ggplot2subtitle = "Washington, D.C., census tracts as the referent")
::ggplot() +
ggplot2::geom_sf(data = DC2020powell_wiley,
ggplot2::aes(fill = NDI_imp),
ggplot2size = 0.2,
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_c() +
ggplot2::labs(fill = "Index (Continuous)",
ggplot2caption = "Source: U.S. Census ACS 2016-2020 estimates") +
::ggtitle("Neighborhood Deprivation Index (Powell-Wiley), Imputed",
ggplot2subtitle = "Washington, D.C., census tracts as the referent")
## Categorical Index
### Rename "9-NDI not avail" level as NA for plotting
$NDIQuintNA_nonimp <- factor(replace(as.character(DC2020powell_wiley$NDIQuint_nonimp),
DC2020powell_wiley$NDIQuint_nonimp == "9-NDI not avail", NA),
DC2020powell_wileyc(levels(DC2020powell_wiley$NDIQuint_nonimp)[-6], NA))
::ggplot() +
ggplot2::geom_sf(data = DC2020powell_wiley,
ggplot2::aes(fill = NDIQuintNA_nonimp),
ggplot2size = 0.2,
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
ggplot2na.value = "grey80") +
::labs(fill = "Index (Categorical)",
ggplot2caption = "Source: U.S. Census ACS 2016-2020 estimates") +
::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Non-Imputed",
ggplot2subtitle = "Washington, D.C., census tracts as the referent")
### Rename "9-NDI not avail" level as NA for plotting
$NDIQuintNA_imp <- factor(replace(as.character(DC2020powell_wiley$NDIQuint_imp),
DC2020powell_wiley$NDIQuint_imp == "9-NDI not avail", NA),
DC2020powell_wileyc(levels(DC2020powell_wiley$NDIQuint_imp)[-6], NA))
::ggplot() +
ggplot2::geom_sf(data = DC2020powell_wiley,
ggplot2::aes(fill = NDIQuintNA_imp),
ggplot2size = 0.2,
color = "white") +
::theme_minimal() +
ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
ggplot2na.value = "grey80") +
::labs(fill = "Index (Categorical)",
ggplot2caption = "Source: U.S. Census ACS 2016-2020 estimates") +
::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Imputed",
ggplot2subtitle = "Washington, D.C., census tracts as the referent")
To conduct a contiguous US-standardized index, compute an NDI for all
states as in the example below that replicates the nationally
standardized NDI (Powell-Wiley) values (2013-2017 ACS-5) found in Slotman et
al. (2022) and available from a GIS Portal for
Cancer Research website. To replicate the nationally standardized
NDI (Powell-Wiley) values (2006-2010 ACS-5) found in Andrews et
al. (2020) change the year
argument to
2010
(i.e., year = 2010
).
<- tigris::states()
us <- c("Commonwealth of the Northern Mariana Islands", "Guam", "American Samoa",
n51 "Puerto Rico", "United States Virgin Islands")
<- us$STUSPS[!(us$NAME %in% n51)]
y51
<- Sys.time() # record start time
start_time <- ndi::powell_wiley(state = y51, year = 2017)
powell_wiley2017US <- Sys.time() # record end time
end_time <- end_time - start_time # Calculate run time
time_srr powell_wiley2017US
## $ndi
## # A tibble: 73,056 × 20
## GEOID state county tract NDI NDIQu…¹ MedHH…² PctRe…³ PctPu…⁴ MedHo…⁵
## <chr> <chr> <chr> <chr> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 01001020… Alab… Autau… 201 -2.51e-1 2-Belo… 67826 26.9 12.1 152500
## 2 01001020… Alab… Autau… 202 8.27e-1 4-Abov… 41287 10.9 24.1 96100
## 3 01001020… Alab… Autau… 203 6.63e-1 4-Abov… 46806 12.3 12.9 98900
## 4 01001020… Alab… Autau… 204 1.90e-1 3-Aver… 55895 17.8 5.7 140800
## 5 01001020… Alab… Autau… 205 -5.44e-1 2-Belo… 68143 20.2 8.8 187900
## 6 01001020… Alab… Autau… 206 7.26e-1 4-Abov… 44549 17.8 16 93300
## 7 01001020… Alab… Autau… 207 1.03e+0 5-Most… 41250 8.4 23.2 90800
## 8 01001020… Alab… Autau… 208.… -7.12e-1 2-Belo… 80089 25.7 13.1 299100
## 9 01001020… Alab… Autau… 208.… -3.76e-4 3-Aver… 64439 22.2 9.4 163200
## 10 01001020… Alab… Autau… 209 5.07e-1 4-Abov… 49469 11.6 19 120200
## # … with 73,046 more rows, 10 more variables: PctMgmtBusScArti <dbl>,
## # PctFemHeadKids <dbl>, PctOwnerOcc <dbl>, PctNoPhone <dbl>,
## # PctNComPlmb <dbl>, PctEducHSPlus <dbl>, PctEducBchPlus <dbl>,
## # PctFamBelowPov <dbl>, PctUnempl <dbl>, TotalPop <dbl>, and abbreviated
## # variable names ¹NDIQuint, ²MedHHInc, ³PctRecvIDR, ⁴PctPubAsst, ⁵MedHomeVal
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
##
## $pca
## $pca$loadings
##
## Loadings:
## PC1 PC2 PC3
## logMedHHInc -0.508 -0.482 -0.106
## PctNoIDRZ 0.520 0.446
## PctPubAsstZ 0.309 0.686
## logMedHomeVal -0.902 0.135
## PctWorkClassZ 0.896
## PctFemHeadKidsZ 0.194 0.701 -0.145
## PctNotOwnerOccZ -0.418 1.022
## PctNoPhoneZ 0.208 0.654
## PctNComPlmbZ -0.114 0.855
## PctEducLTHSZ 0.448 0.420
## PctEducLTBchZ 0.989
## PctFamBelowPovZ 0.183 0.772
## PctUnemplZ 0.125 0.612
##
## PC1 PC2 PC3
## SS loadings 3.680 3.667 1.203
## Proportion Var 0.283 0.282 0.093
## Cumulative Var 0.283 0.565 0.658
##
## $pca$rotmat
## [,1] [,2] [,3]
## [1,] 0.55862784 0.5566946 0.05689447
## [2,] -1.07454532 1.0353417 0.11522338
## [3,] -0.02457479 -0.3181005 1.02299745
##
## $pca$rotation
## [1] "promax"
##
## $pca$Phi
## [,1] [,2] [,3]
## [1,] 1.0000000 0.5589563 0.2019482
## [2,] 0.5589563 1.0000000 0.2257122
## [3,] 0.2019482 0.2257122 1.0000000
##
## $pca$Structure
## [,1] [,2] [,3]
## [1,] -0.7983619 -0.78924511 -0.3169437
## [2,] 0.7566491 0.72262792 0.1464839
## [3,] 0.6945047 0.86105916 0.2250651
## [4,] -0.8412442 -0.38554970 -0.2228492
## [5,] 0.9209060 0.55006528 0.1613593
## [6,] 0.5562044 0.77638871 0.0519852
## [7,] 0.1563371 0.79221347 0.1626307
## [8,] 0.2466075 0.35502655 0.7010271
## [9,] 0.1201396 0.08519067 0.8311965
## [10,] 0.6827484 0.67044690 0.1843296
## [11,] 0.9475443 0.48138341 0.1669005
## [12,] 0.6232212 0.88362870 0.2527666
## [13,] 0.4705114 0.68557375 0.1788853
##
## $pca$communality
## [1] 0.5009586 0.4719001 0.5669373 0.8372646 0.8066373 0.5499017 1.2207095
## [8] 0.4716959 0.7435617 0.3773519 0.9823819 0.6309002 0.3905570
##
## $pca$uniqueness
## logMedHHInc PctNoIDRZ PctPubAsstZ logMedHomeVal PctWorkClassZ
## 0.49904136 0.52809991 0.43306269 0.16273538 0.19336271
## PctFemHeadKidsZ PctNotOwnerOccZ PctNoPhoneZ PctNComPlmbZ PctEducLTHSZ
## 0.45009834 -0.22070951 0.52830412 0.25643832 0.62264809
## PctEducLTBchZ PctFamBelowPovZ PctUnemplZ
## 0.01761813 0.36909983 0.60944297
##
## $pca$Vaccounted
## [,1] [,2] [,3]
## SS loadings 4.0565234 4.0415662 1.21158031
## Proportion Var 0.3120403 0.3108897 0.09319849
## Cumulative Var 0.3120403 0.6229300 0.71612846
## Proportion Explained 0.4357322 0.4341256 0.13014213
## Cumulative Proportion 0.4357322 0.8698579 1.00000000
##
##
## $missing
## # A tibble: 13 × 5
## # Groups: variable, total, missing [13]
## variable total missing n percent
## <chr> <int> <lgl> <int> <chr>
## 1 MedHHInc 73056 TRUE 995 1.36 %
## 2 MedHomeVal 73056 TRUE 1948 2.67 %
## 3 PctEducBchPlus 73056 TRUE 646 0.88 %
## 4 PctEducHSPlus 73056 TRUE 646 0.88 %
## 5 PctFamBelowPov 73056 TRUE 867 1.19 %
## 6 PctFemHeadKids 73056 TRUE 816 1.12 %
## 7 PctMgmtBusScArti 73056 TRUE 752 1.03 %
## 8 PctNComPlmb 73056 TRUE 816 1.12 %
## 9 PctNoPhone 73056 TRUE 1377 1.88 %
## 10 PctOwnerOcc 73056 TRUE 816 1.12 %
## 11 PctPubAsst 73056 TRUE 816 1.12 %
## 12 PctRecvIDR 73056 TRUE 816 1.12 %
## 13 PctUnempl 73056 TRUE 751 1.03 %
##
## $cronbach
## [1] 0.9372104
::ggplot(powell_wiley2017US$ndi, ggplot2::aes(x = NDI)) +
ggplot2::geom_histogram(color = "black", fill = "white") +
ggplot2::theme_minimal() +
ggplot2::ggtitle("Histogram of US-standardized NDI (Powell-Wiley) values (2013-2017)",
ggplot2subtitle = "U.S. census tracts as the referent (including Alaska, Hawaii, and Washington, D.C.)")
The process to compute a US-standardized NDI (Powell-Wiley) took about 4.5 minutes to run on an machine with the following features:
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] tigris_1.6.1 tidycensus_1.2.2 sf_1.0-7 ndi_0.0.1
## [5] ggplot2_3.3.6 knitr_1.39
##
## loaded via a namespace (and not attached):
## [1] httr_1.4.3 sass_0.4.2 tidyr_1.2.0 jsonlite_1.8.0
## [5] viridisLite_0.4.0 bslib_0.4.0 sp_1.5-0 highr_0.9
## [9] yaml_2.3.5 pillar_1.8.0 lattice_0.20-45 glue_1.6.2
## [13] uuid_1.1-0 digest_0.6.29 rvest_1.0.2 colorspace_2.0-3
## [17] htmltools_0.5.3 psych_2.2.5 pkgconfig_2.0.3 s2_1.1.0
## [21] purrr_0.3.4 scales_1.2.0 tzdb_0.3.0 tibble_3.1.8
## [25] proxy_0.4-27 generics_0.1.3 farver_2.1.1 ellipsis_0.3.2
## [29] cachem_1.0.6 withr_2.5.0 cli_3.3.0 mnormt_2.1.0
## [33] magrittr_2.0.3 crayon_1.5.1 maptools_1.1-4 evaluate_0.15
## [37] fansi_1.0.3 nlme_3.1-158 MASS_7.3-57 xml2_1.3.3
## [41] foreign_0.8-82 class_7.3-20 tools_4.2.1 hms_1.1.1
## [45] lifecycle_1.0.1 stringr_1.4.0 munsell_0.5.0 compiler_4.2.1
## [49] jquerylib_0.1.4 e1071_1.7-11 rlang_1.0.4 classInt_0.4-7
## [53] units_0.8-0 grid_4.2.1 rstudioapi_0.13 rappdirs_0.3.3
## [57] labeling_0.4.2 rmarkdown_2.14 wk_0.6.0 gtable_0.3.0
## [61] DBI_1.1.3 curl_4.3.2 R6_2.5.1 dplyr_1.0.9
## [65] rgdal_1.5-30 fastmap_1.1.0 utf8_1.2.2 KernSmooth_2.23-20
## [69] readr_2.1.2 stringi_1.7.8 parallel_4.2.1 Rcpp_1.0.9
## [73] vctrs_0.4.1 tidyselect_1.1.2 xfun_0.31