This is an appendix to the main vignette, “Covariate Balance Tables and Plots: A Guide to the cobalt Package”. It contains descriptions and demonstrations of several utility functions in cobalt and the use of bal.tab() with twang, Matching, optmatch, CBPS, ebal, designmatch, sbw, MatchThem, and cem. Note that MatchIt can perform most of the functions that Matching, optmatch, and cem can, and WeightIt can perform most of the functions that twang, CBPS, ebal, and sbw can. Because cobalt has been optimized to work with MatchIt and WeightIt, it is recommended to use those packages to simplify preprocessing and balance assessment, but we recognize that users may prefer to use the packages described in this vignette.
In addition to its main balance assessment functions, cobalt contains several utility functions. These are meant to reduce the typing and programming burden that often accompanies the use of R with a diverse set of packages.
f.build()
f.build() is a small tool that can be helpful in quickly specifying formula inputs to functions. An example is provided below:
data("lalonde", package = "cobalt")
covs <- subset(lalonde, select = -c(treat, re78))
f.build("treat", covs)
## treat ~ age + educ + race + married + nodegree + re74 + re75
## <environment: 0x7fe89f2d3d20>
The function creates a formula object from two inputs: the first argument is the quoted name of the variable to be the left-hand side (response) variable in the formula; the second argument is a vector of right-hand side (predictor) variable names or a data frame, the variable names of which are to be the predictor variables. The utility of f.build() is that the user does not have to manually type out the name of every covariate when entering a formula into a function. It can be used simply in place of a formula, as in the following examples, which make use of the objects defined above:
# Generating propensity scores using logistic regression
p.score <- glm(f.build("treat", covs), data = lalonde, family = "binomial")$fitted.values
# Using matchit() from the MatchIt package
library("MatchIt")
m.out <- matchit(f.build("treat", covs), data = lalonde, method = "nearest")
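For comparison, base R's stats::reformulate() builds a formula of the same shape from a character vector of term labels. The sketch below illustrates only the idea behind f.build() and is not cobalt's implementation. The small data frame covs_demo is hypothetical and stands in for the covs object used above.

```r
# Hypothetical stand-in for the `covs` data frame used above
covs_demo <- data.frame(age = c(37, 22), educ = c(11, 9), married = c(1, 0))

# Build "treat ~ age + educ + married" from the column names (base R only)
fml <- reformulate(termlabels = names(covs_demo), response = "treat")
print(fml)
```

Unlike this sketch, f.build() also accepts a data frame directly as its second argument, which saves the call to names().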
splitfactor() and unsplitfactor()
Some functions (outside of cobalt) are not friendly to factor or character variables and require numeric variables to operate correctly. For example, some regression-style functions, such as ebalance() in ebal, can only take in non-singular numeric matrices. Other functions will process factor variables but will return output in terms of dummy-coded versions of the factors. For example, lm() will create dummy variables out of a factor and drop the reference category to create regression coefficients.
To prepare data sets for use in functions that do not allow factors, or to mimic the output of functions that split factor variables, users can use splitfactor(), which takes in a data set and the names of variables to split, and outputs a new data set with newly created dummy variables. Below is an example splitting the race variable in the Lalonde data set into dummies, eliminating the reference category ("black"):
head(lalonde)
treat | age | educ | race | married | nodegree | re74 | re75 | re78 |
---|---|---|---|---|---|---|---|---|
1 | 37 | 11 | black | 1 | 1 | 0 | 0 | 9930.0460 |
1 | 22 | 9 | hispan | 0 | 1 | 0 | 0 | 3595.8940 |
1 | 30 | 12 | black | 0 | 0 | 0 | 0 | 24909.4500 |
1 | 27 | 11 | black | 0 | 1 | 0 | 0 | 7506.1460 |
1 | 33 | 8 | black | 0 | 1 | 0 | 0 | 289.7899 |
1 | 22 | 9 | black | 0 | 1 | 0 | 0 | 4056.4940 |
lalonde.split <- splitfactor(lalonde, "race")
head(lalonde.split)
treat | age | educ | race_hispan | race_white | married | nodegree | re74 | re75 | re78 |
---|---|---|---|---|---|---|---|---|---|
1 | 37 | 11 | 0 | 0 | 1 | 1 | 0 | 0 | 9930.0460 |
1 | 22 | 9 | 1 | 0 | 0 | 1 | 0 | 0 | 3595.8940 |
1 | 30 | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 24909.4500 |
1 | 27 | 11 | 0 | 0 | 0 | 1 | 0 | 0 | 7506.1460 |
1 | 33 | 8 | 0 | 0 | 0 | 1 | 0 | 0 | 289.7899 |
1 | 22 | 9 | 0 | 0 | 0 | 1 | 0 | 0 | 4056.4940 |
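The core of what splitfactor() does can be sketched in base R. The factor column is replaced, in place, by 0/1 indicator columns named "<variable>_<level>", and the first (reference) level is dropped. split_factor_sketch() is a hypothetical helper written for illustration. cobalt's splitfactor() has more options, such as its drop.first argument.

```r
# Hypothetical helper illustrating the idea behind splitfactor():
# replace a factor column with 0/1 dummies, dropping the first (reference) level
split_factor_sketch <- function(data, var) {
  f <- factor(data[[var]])
  lev <- levels(f)
  # One indicator column per non-reference level
  dummies <- sapply(lev[-1], function(l) as.integer(f == l))
  dummies <- matrix(dummies, nrow = nrow(data))  # keep a matrix even for one row
  colnames(dummies) <- paste(var, lev[-1], sep = "_")
  pos <- match(var, names(data))
  # Put the dummies where the original column was
  cbind(data[seq_len(pos - 1)], as.data.frame(dummies), data[-seq_len(pos)])
}

df <- data.frame(treat = c(1, 1, 0),
                 race = c("black", "hispan", "white"),
                 age = c(37, 22, 30))
split_factor_sketch(df, "race")
```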
It is possible to undo the action of splitfactor() with unsplitfactor(), which takes in a data set with dummy variables formed from splitfactor() or otherwise and recreates the original factor variable. If the reference category was dropped, its value needs to be supplied.
lalonde.unsplit <- unsplitfactor(lalonde.split, "race",
                                 dropped.level = "black")
head(lalonde.unsplit)
treat | age | educ | race | married | nodegree | re74 | re75 | re78 |
---|---|---|---|---|---|---|---|---|
1 | 37 | 11 | black | 1 | 1 | 0 | 0 | 9930.0460 |
1 | 22 | 9 | hispan | 0 | 1 | 0 | 0 | 3595.8940 |
1 | 30 | 12 | black | 0 | 0 | 0 | 0 | 24909.4500 |
1 | 27 | 11 | black | 0 | 1 | 0 | 0 | 7506.1460 |
1 | 33 | 8 | black | 0 | 1 | 0 | 0 | 289.7899 |
1 | 22 | 9 | black | 0 | 1 | 0 | 0 | 4056.4940 |
Notice that the original data set and the unsplit data set look identical. If the input to unsplitfactor() is the output of a call to splitfactor() (as it was here), you don’t need to tell unsplitfactor() the name of the split variable or the value of the dropped level. It was done here for illustration purposes.
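The reverse operation can also be sketched in base R. Dummy columns sharing the prefix "<variable>_" are collapsed back into one factor, and rows in which every dummy is 0 are assigned the dropped reference level. unsplit_factor_sketch() and its dropped_level argument are hypothetical and are not cobalt's API.

```r
# Hypothetical helper illustrating the idea behind unsplitfactor():
# collapse 0/1 dummy columns named "<var>_<level>" back into one factor
unsplit_factor_sketch <- function(data, var, dropped_level) {
  prefix <- paste0(var, "_")
  cols <- names(data)[startsWith(names(data), prefix)]
  lev <- substring(cols, nchar(prefix) + 1)
  d <- as.matrix(data[cols])
  # Rows with no dummy equal to 1 belong to the dropped reference level
  value <- ifelse(rowSums(d) == 0, dropped_level,
                  lev[max.col(d, ties.method = "first")])
  pos <- match(cols[1], names(data))
  out <- data[setdiff(names(data), cols)]
  out[[var]] <- factor(value, levels = c(dropped_level, lev))
  # Move the rebuilt factor back to where the first dummy column was
  out[append(setdiff(names(out), var), var, after = pos - 1)]
}

split_df <- data.frame(treat = c(1, 1, 0), race_hispan = c(0, 1, 0),
                       race_white = c(0, 0, 1), age = c(37, 22, 30))
unsplit_factor_sketch(split_df, "race", dropped_level = "black")
```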
get.w()
get.w() allows users to extract weights from the output of a call to a preprocessing function in one of the supported packages. Because each package stores weights in different ways, it can be helpful to have a single function that applies equally to all outputs. twang has a function called get.weights() that performs the same function, with slightly finer control, for the output of a call to ps().
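In R, a single extractor that works across object types is usually built with S3 generic dispatch. The sketch below uses two hypothetical classes, fake_weighting and fake_matching, that store their results differently. It illustrates the design only; it is not how cobalt's get.w() is written.

```r
# Generic extractor: dispatches on the class of the fitted object
get_weights_sketch <- function(obj, ...) UseMethod("get_weights_sketch")

# Hypothetical weighting object that stores weights directly
get_weights_sketch.fake_weighting <- function(obj, ...) obj$w

# Hypothetical matching object that stores matched-set membership instead;
# here a unit gets weight 1 if it was matched and 0 otherwise
get_weights_sketch.fake_matching <- function(obj, ...) {
  as.numeric(!is.na(obj$subclass))
}

w_obj <- structure(list(w = c(1, 0.5, 2)), class = "fake_weighting")
m_obj <- structure(list(subclass = c(1, NA, 1)), class = "fake_matching")
get_weights_sketch(w_obj)
get_weights_sketch(m_obj)
```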
bal.tab()
The next sections describe the use of bal.tab() with packages other than those described in the main vignette. Even if you are using bal.tab() with one of these packages, it may be useful to read the main vignette to understand bal.tab()’s main options, which are not detailed here.
bal.tab() with twang
Generalized boosted modeling (GBM), as implemented in twang, can be an effective way to generate propensity scores and weights for use in propensity score weighting. bal.tab() functions similarly to the functions bal.table() and summary() when used with GBM in twang. Below is a simple example of its use:
library("twang")
data("lalonde", package = "cobalt") #If not yet loaded
covs0 <- subset(lalonde, select = -c(treat, re78))
ps.out <- ps(f.build("treat", covs0), data = lalonde,
             stop.method = c("es.mean", "es.max"),
             estimand = "ATT", n.trees = 1000, verbose = FALSE)
bal.tab(ps.out, stop.method = "es.mean")
## Call
## ps.fast(formula = formula, data = data, params = params, n.trees = nrounds,
## interaction.depth = max_depth, shrinkage = eta, bag.fraction = subsample,
## n.minobsinnode = min_child_weight, perm.test.iters = perm.test.iters,
## print.level = print.level, verbose = verbose, estimand = estimand,
## stop.method = stop.method, sampw = sampw, multinom = multinom,
## ks.exact = ks.exact, version = version, tree_method = tree_method,
## n.keep = n.keep, n.grid = n.grid, keep.data = keep.data)
##
## Balance Measures
## Type Diff.Adj
## prop.score Distance 0.5189
## age Contin. 0.0400
## educ Contin. -0.0819
## race_black Binary 0.0250
## race_hispan Binary -0.0008
## race_white Binary -0.0242
## married Binary -0.0116
## nodegree Binary 0.0864
## re74 Contin. 0.0691
## re75 Contin. 0.0953
##
## Effective sample sizes
## Control Treated
## Unadjusted 429. 185
## Adjusted 33.03 185
The output looks a bit different from twang’s bal.table() output. First is the original call to ps(). Next is the balance table containing mean differences for the covariates included in the input to ps(). Last is a table displaying sample size information, similar to what would be generated using twang’s summary() function. The “effective” sample size is displayed when weighting is used; it is calculated as is done in twang. See the twang documentation, ?bal.tab, or “Details on Calculations” in the main vignette for details on this calculation.
When using bal.tab() with twang, the user must specify the ps object, the output of a call to ps(), as the first argument. The second argument, stop.method, is the name of the stop method(s) for which balance is to be assessed, since a ps object may contain more than one if so specified. bal.tab() can display the balance for more than one stop method at a time by specifying a vector of stop method names. If this argument is left empty or if the argument to stop.method does not correspond to any of the stop methods in the ps object, bal.tab() will default to displaying balance for all stop methods available. Abbreviations are allowed for the stop method, which is not case sensitive.
The other arguments to bal.tab() when using it with twang have the same form and function as those given when using it without a conditioning package, except for s.d.denom. If the estimand of the stop method used is the ATT, s.d.denom will default to "treated" if not specified, and if the estimand is the ATE, s.d.denom will default to "pooled", mimicking the behavior of twang. The user can specify their own argument to s.d.denom, but using the defaults is advised.
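The role of s.d.denom is easiest to see in a small base R sketch of the standardized mean difference. It is the weighted difference in group means divided by a standard deviation taken from the treated group, the control group, or the two pooled. smd_sketch() and its argument names are hypothetical. The sketch assumes unweighted standard deviations in the denominator and ignores details that cobalt handles, such as sampling weights and binary variables.

```r
smd_sketch <- function(x, treat, w = rep(1, length(x)),
                       denom = c("treated", "control", "pooled")) {
  denom <- match.arg(denom)
  # Weighted group means (the "adjusted" difference)
  m1 <- weighted.mean(x[treat == 1], w[treat == 1])
  m0 <- weighted.mean(x[treat == 0], w[treat == 0])
  # Standard deviations from the unweighted sample (an assumption of this sketch)
  s1 <- sd(x[treat == 1])
  s0 <- sd(x[treat == 0])
  s <- switch(denom,
              treated = s1,
              control = s0,
              pooled  = sqrt((s1^2 + s0^2) / 2))
  (m1 - m0) / s
}

x <- c(2, 4, 6, 1, 2, 3)
treat <- c(1, 1, 1, 0, 0, 0)
smd_sketch(x, treat, denom = "treated")  # ATT-style denominator
smd_sketch(x, treat, denom = "pooled")   # ATE-style denominator
```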
If sampling weights are used in the call to ps(), they will be automatically incorporated into the bal.tab() calculations for both the adjusted and unadjusted samples, just as twang does.
mnps objects resulting from fitting models in twang with multi-category treatments are also compatible with cobalt. See the section “Using cobalt with multi-category treatments” in the main vignette. iptw objects resulting from fitting models in twang with longitudinal treatments are also compatible with cobalt. See the Appendix 3 vignette. ps.cont objects resulting from using ps.cont() in WeightIt, which implements GBM for continuous treatments, are also compatible. See the section “Using cobalt with continuous treatments” in the main vignette.
bal.tab() with Matching
The Matching package is used for propensity score matching and was also the first package to implement genetic matching. MatchIt calls Matching to use genetic matching and can accomplish many of the matching methods Matching can, but Matching is still a widely used package with its own strengths. bal.tab() functions similarly to Matching’s MatchBalance() command, which yields a thorough presentation of balance. Below is a simple example of the use of bal.tab() with Matching:
library("Matching")
data("lalonde", package = "cobalt") #If not yet loaded
covs0 <- subset(lalonde, select = -c(treat, re78))
fit <- glm(f.build("treat", covs0), data = lalonde, family = binomial)
p.score <- fit$fitted.values
match.out <- Match(Tr = lalonde$treat, X = p.score, estimand = "ATT")
bal.tab(match.out, formula = f.build("treat", covs0), data = lalonde,
distance = ~ p.score)
## Balance Measures
## Type Diff.Adj
## p.score Distance 0.0043
## age Contin. 0.2106
## educ Contin. 0.0201
## race_black Binary 0.0054
## race_hispan Binary -0.0051
## race_white Binary -0.0003
## married Binary 0.0661
## nodegree Binary -0.0079
## re74 Contin. -0.0772
## re75 Contin. -0.0127
##
## Sample sizes
## Control Treated
## All 429. 185
## Matched (ESS) 49.17 185
## Matched (Unweighted) 136. 185
## Unmatched 293. 0
The output looks quite different from Matching’s MatchBalance() output. Rather than being stacked vertically, balance statistics are arranged horizontally in a table format, allowing for quick balance checking. Below the balance table is a summary of the sample size before and after matching, similar to what Matching’s summary() command would display. The sample size can include an “ESS” and an “Unweighted” value; the “ESS” value is the effective sample size resulting from the matching weights, while the “Unweighted” value is the count of units with nonzero matching weights.
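The ESS and Unweighted values can diverge when some control units are reused as matches, for example when matching with replacement. The sketch below computes both numbers from a made-up vector of control matching weights.

```r
# Hypothetical matching weights for 6 control units: one unit was used
# as a match three times, two were used once, and three were never matched
w_control <- c(3, 1, 1, 0, 0, 0)

n_unweighted <- sum(w_control > 0)                  # units with nonzero weight
n_ess <- sum(w_control)^2 / sum(w_control^2)        # effective sample size
c(unweighted = n_unweighted, ESS = n_ess)
```

Three distinct controls were matched, but because one of them carries most of the weight, the effective sample size is only about 2.3.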
The input to bal.tab() is similar to that given to MatchBalance(): the Match object resulting from the call to Match(), a formula relating treatment to the covariates for which balance is to be assessed, and the original data set. This is not the only way to call bal.tab(): instead of a formula and a data set, one can also input a data frame of covariates and a vector of treatment status indicators, just as when using bal.tab() without a conditioning package. For example, the code below will yield the same results as the call to bal.tab() above:
bal.tab(match.out, treat = lalonde$treat, covs = covs0,
distance = ~ p.score)
The other arguments to bal.tab() when using it with Matching have the same form and function as those given when using it without a conditioning package, except for s.d.denom. If the estimand of the original call to Match() is the ATT, s.d.denom will default to "treated" if not specified; if the estimand is the ATE, s.d.denom will default to "pooled"; if the estimand is the ATC, s.d.denom will default to "control". The user can specify their own argument to s.d.denom, but using the defaults is advisable. In addition, the use of the addl argument is unnecessary because the covariates are entered manually as arguments, so all covariates for which balance is to be assessed can be entered through the formula or covs argument. If the covariates are stored in two separate data frames, it may be useful to include one in formula or covs and the other in addl.
bal.tab() with optmatch
The optmatch package is useful for performing optimal pairwise or full matching. Most functions in optmatch are subsumed in MatchIt, but optmatch sees use from those who want finer control of the matching process than MatchIt allows. The output of calls to functions in optmatch is an optmatch object, which contains matching stratum membership for each unit in the given data set. Units that are matched with each other are assigned the same matching stratum. The user guide for optmatch recommends using the RItools package for balance assessment, but below is an example of how to use bal.tab() for the same purpose. Note that some results will differ between cobalt and RItools because of differences in how balance is calculated in each.
#Optimal full matching on the propensity score
library("optmatch")
data("lalonde", package = "cobalt") #If not yet loaded
covs0 <- subset(lalonde, select = -c(treat, re78))
fit <- glm(f.build("treat", covs0), data = lalonde, family = binomial)
p.score <- fit$fitted.values #get the propensity score
fm <- fullmatch(treat ~ p.score, data = lalonde)
bal.tab(fm, covs = covs0, distance = ~ p.score)
## Call
## fullmatch(x = treat ~ p.score, data = lalonde)
##
## Balance Measures
## Type Diff.Adj
## p.score Distance 0.0052
## age Contin. 0.1623
## educ Contin. -0.0185
## race_black Binary 0.0086
## race_hispan Binary -0.0064
## race_white Binary -0.0022
## married Binary 0.0573
## nodegree Binary 0.0026
## re74 Contin. -0.0643
## re75 Contin. -0.0084
##
## Sample sizes
## Control Treated
## All 429. 185
## Matched (ESS) 52.52 185
## Matched (Unweighted) 429. 185
Most details for the use of bal.tab() with optmatch are similar to those when using bal.tab() with Matching. Users can enter either a formula and a data set or a vector of treatment status and a set of covariates. Unlike with Matching, entering the treatment variable is optional as it is already stored in the optmatch object. bal.tab() is compatible with both pairmatch() and fullmatch() output.
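An optmatch object records stratum (matched set) membership rather than weights, so balance statistics require first turning strata into weights. A common convention for the ATT gives each treated unit a weight of 1 and each control unit the ratio of treated to control units in its stratum. Unmatched units get 0. The sketch below assumes that convention and is only an illustration, since cobalt may rescale the weights differently. strata_to_att_weights() is a hypothetical helper.

```r
strata_to_att_weights <- function(strata, treat) {
  w <- numeric(length(treat))
  matched <- !is.na(strata)              # unmatched units keep weight 0
  for (s in unique(strata[matched])) {
    in_s <- matched & strata == s
    n1 <- sum(treat[in_s] == 1)
    n0 <- sum(treat[in_s] == 0)
    w[in_s & treat == 1] <- 1            # treated units count once
    w[in_s & treat == 0] <- n1 / n0      # controls share the stratum's treated count
  }
  w
}

strata <- c("a", "a", "a", "b", "b", NA)
treat  <- c(1, 0, 0, 1, 0, 0)
strata_to_att_weights(strata, treat)
```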
bal.tab() with CBPS
The CBPS (Covariate Balancing Propensity Score) package is a great tool for generating covariate balancing propensity scores, a class of propensity scores that are quite effective at balancing covariates among groups. CBPS includes functions for estimating propensity scores for binary, multi-category, and continuous treatments. bal.tab() functions similarly to CBPS’s balance() command. Below is a simple example of its use with a binary treatment:
library("CBPS")
data("lalonde", package = "cobalt") #If not yet loaded
covs0 <- subset(lalonde, select = -c(treat, re78))
#Generating covariate balancing propensity score weights for ATT
cbps.out <- CBPS(f.build("treat", covs0), data = lalonde)
## [1] "Finding ATT with T=1 as the treatment. Set ATT=2 to find ATT with T=0 as the treatment"
bal.tab(cbps.out)
## Call
## CBPS(formula = f.build("treat", covs0), data = lalonde)
##
## Balance Measures
## Type Diff.Adj
## prop.score Distance -0.0057
## age Contin. -0.0052
## educ Contin. -0.0017
## race_black Binary 0.0019
## race_hispan Binary -0.0002
## race_white Binary -0.0017
## married Binary -0.0029
## nodegree Binary 0.0042
## re74 Contin. -0.0078
## re75 Contin. 0.0061
##
## Effective sample sizes
## Control Treated
## Unadjusted 429. 185
## Adjusted 99.97 185
First is the original call to CBPS(). Next is the balance table containing mean differences for the covariates included in the input to CBPS(). Last is a table displaying sample size information. The “effective” sample size is displayed when weighting (rather than matching or subclassification) is used; it is calculated as is done in twang. See the twang documentation, ?bal.tab, or “Details on Calculations” in the main vignette for details on this calculation.
The other arguments to bal.tab() when using it with CBPS have the same form and function as those given when using it without a conditioning package, except for s.d.denom. If the estimand of the original call to CBPS() is the ATT, s.d.denom will default to "treated" if not specified, and if the estimand is the ATE, s.d.denom will default to "pooled". The user can specify their own argument to s.d.denom, but using the defaults is advisable.
CBPSContinuous objects resulting from fitting models in CBPS with continuous treatments are also compatible with cobalt. See the section “Using cobalt with continuous treatments” in the main vignette. CBPS objects resulting from fitting models in CBPS with multi-category treatments are also compatible with cobalt. See the section “Using cobalt with multi-category treatments” in the main vignette. CBMSM objects resulting from fitting models in CBPS with longitudinal treatments are also compatible with cobalt. See the Appendix 3 vignette.
bal.tab() with ebal
The ebal package implements entropy balancing, a method of weighting for the ATT that yields perfect balance on all desired moments of the covariate distributions between groups. Rather than estimate a propensity score, entropy balancing generates weights directly that satisfy a user-defined moment condition, specifying which moments are to be balanced. Note that all the functionality of ebal is contained within WeightIt. ebal does not have its own balance assessment function; thus, cobalt is the only way to assess balance without the manual programming that the ebal documentation instructs. Below is a simple example of using bal.tab() with ebal:
library("ebal")
data("lalonde", package = "cobalt") #If not yet loaded
covs0 <- subset(lalonde, select = -c(treat, re78, race))
#Generating entropy balancing weights
e.out <- ebalance(lalonde$treat, covs0)
## Converged within tolerance
bal.tab(e.out, treat = lalonde$treat, covs = covs0)
## Balance Measures
## Type Diff.Adj
## age Contin. -0
## educ Contin. -0
## married Binary -0
## nodegree Binary 0
## re74 Contin. -0
## re75 Contin. -0
##
## Effective sample sizes
## Control Treated
## Unadjusted 429. 185
## Adjusted 247.64 185
First is the balance table containing mean differences for covariates included in the original call to ebalance(). In general, these will all be very close to 0. Next is a table displaying effective sample size information. The “effective” sample size is calculated as is done in twang. See the twang documentation, ?bal.tab, or “Details on Calculations” in the main vignette for details on this calculation. A common issue when using entropy balancing is small effective sample size, which can yield low precision in effect estimation when using weighted regression, so it is important that users pay attention to this measure.
The input is similar to that for using bal.tab() with Matching or optmatch. In addition to the ebalance object, one must specify either both a formula and a data set or both a treatment vector and a data frame of covariates.
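The idea behind entropy balancing can be sketched for first moments only. Control units receive weights proportional to exp(λ'(x_i − m)), where m holds the treated means, and λ minimizes the convex dual log Σ exp(λ'(x_i − m)). At that minimum the weighted control means equal m exactly. The sketch below solves the dual with stats::optim() and BFGS. It is a simplified illustration under these assumptions, not ebal's implementation, which also supports base weights and higher moments. ebal_att_sketch() and the simulated data are hypothetical.

```r
ebal_att_sketch <- function(X_control, target_means) {
  # Center control covariates at the treated (target) means
  Xc <- sweep(as.matrix(X_control), 2, target_means)
  # Convex dual objective: log-sum-exp of the linear predictor
  dual <- function(lambda) log(sum(exp(Xc %*% lambda)))
  # Its gradient is the weighted mean of the centered covariates
  grad <- function(lambda) {
    w <- exp(Xc %*% lambda)
    as.vector(crossprod(Xc, w / sum(w)))
  }
  fit <- optim(rep(0, ncol(Xc)), dual, grad, method = "BFGS",
               control = list(reltol = 1e-14, maxit = 1000))
  w <- as.vector(exp(Xc %*% fit$par))
  w / sum(w)  # normalized weights for the control units
}

set.seed(123)
X_treated <- cbind(x1 = rnorm(100, mean = 0.5), x2 = rnorm(100, mean = -0.3))
X_control <- cbind(x1 = rnorm(300), x2 = rnorm(300))
w <- ebal_att_sketch(X_control, colMeans(X_treated))

# Weighted control means should now match the treated means
rbind(treated = colMeans(X_treated),
      weighted_control = colSums(X_control * w))
```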
bal.tab() with designmatch
The designmatch package implements various matching methods that use optimization to find matches that satisfy certain balance constraints. bal.tab() functions similarly to designmatch’s meantab() command but provides additional flexibility and convenience. Below is a simple example of using bal.tab() with designmatch:
library("designmatch")
data("lalonde", package = "cobalt") #If not yet loaded
covs0 <- subset(lalonde, select = -c(treat, re78, race))
#Matching for balance on covariates
dmout <- bmatch(lalonde$treat,
                dist_mat = NULL,
                subset_weight = NULL,
                mom = list(covs = covs0,
                           tols = absstddif(covs0, lalonde$treat, .005)),
                n_controls = 1,
                total_groups = 185)
## Building the matching problem...
## GLPK optimizer is open...
## Finding the optimal matches...
## Optimal matches found
bal.tab(dmout, treat = lalonde$treat, covs = covs0)
## Balance Measures
## Type Diff.Adj
## age Contin. 0.0038
## educ Contin. 0.0054
## married Binary 0.0000
## nodegree Binary 0.0054
## re74 Contin. -0.0120
## re75 Contin. -0.0076
##
## Sample sizes
## Control Treated
## All 429 185
## Matched 185 185
## Unmatched 244 0
The input is similar to that for using bal.tab() with Matching or optmatch. In addition to the output object from bmatch(), one must specify either both a formula and a data set or both a treatment vector and a data frame of covariates. The output is similar to that of optmatch.
bal.tab() with sbw
The sbw package implements optimization-based weighting to estimate weights that satisfy certain balance constraints and have minimal variance. bal.tab() functions similarly to sbw’s summarize() function but provides additional flexibility and convenience. Below is a simple example of using bal.tab() with sbw:
library("sbw")
data("lalonde", package = "cobalt") #If not yet loaded
lalonde_split <- splitfactor(lalonde, drop.first = "if2")
cov.names <- c("age", "educ", "race_black", "race_hispan",
               "race_white", "married", "nodegree",
               "re74", "re75")
#Estimating balancing weights for the ATT
sbw.out <- sbw(lalonde_split,
               ind = "treat",
               bal = list(bal_cov = cov.names,
                          bal_alg = FALSE,
                          bal_tol = .001),
               par = list(par_est = "att"))
bal.tab(sbw.out, un = TRUE, disp.means = TRUE)
The output is similar to the output of a call to summarize(). Rather than stacking several balance tables vertically, each with its own balance summary, bal.tab() displays them horizontally. Note that because of differences in how sbw and cobalt compute the standardization factor in the standardized mean difference, values may not be identical between bal.tab() and summarize(). Also note that bal.tab()’s default is to display raw rather than standardized mean differences for binary variables.
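A stripped-down version of the "minimal variance subject to balance" problem has a closed form. It keeps three things: exact mean balance, weights that sum to one, and no other constraints. Once the sum is fixed, minimizing the variance of the weights is the same as minimizing Σw², and the minimum-norm solution of the linear constraints A w = b is w = A'(AA')⁻¹b. The sketch below assumes this simplified problem. sbw itself solves a constrained optimization with balance tolerances (bal_tol above) and, as typically formulated, nonnegative weights, so this is only an illustration. min_var_weights_sketch() and the simulated data are hypothetical.

```r
min_var_weights_sketch <- function(X_control, target_means) {
  X <- as.matrix(X_control)
  # Constraint rows: sum(w) = 1, then t(X) %*% w = target_means
  A <- rbind(1, t(X))
  b <- c(1, target_means)
  # Minimum-norm solution of A w = b (equivalently, minimum variance here)
  as.vector(t(A) %*% solve(A %*% t(A), b))
}

set.seed(42)
X_control <- cbind(x1 = rnorm(200), x2 = rnorm(200))
target <- c(0.2, -0.1)
w <- min_var_weights_sketch(X_control, target)
colSums(X_control * w)   # weighted means of the controls
```

Without a nonnegativity constraint, some weights could come out negative for targets far from the control means. This is one reason the full problem needs a quadratic programming solver.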
bal.tab() with MatchThem
The MatchThem package is essentially a wrapper for matchit() from MatchIt and weightit() from WeightIt, but for use with multiply imputed data. Using bal.tab() on mimids or wimids objects from MatchThem activates the features that accompany multiply imputed data; balance is assessed within each imputed dataset and aggregated across imputations. See ?bal.tab.imp or the accompanying Appendix 2 for more information about using cobalt with multiply imputed data. Below is a simple example of using bal.tab() with MatchThem:
library("mice"); library("MatchThem")
data("lalonde_mis", package = "cobalt")
#Generate imputed data sets
m <- 10 #number of imputed data sets
imp.out <- mice(lalonde_mis, m = m, print = FALSE)
#Weighting for balance on covariates
wt.out <- weightthem(treat ~ age + educ + married +
                       race + re74 + re75,
                     datasets = imp.out,
                     approach = "within",
                     method = "ps",
                     estimand = "ATE")
bal.tab(wt.out)
## Balance summary across all imputations
## Type Min.Diff.Adj Mean.Diff.Adj Max.Diff.Adj
## prop.score Distance 0.1488 0.1586 0.1696
## age Contin. -0.1925 -0.1872 -0.1831
## educ Contin. 0.0776 0.0817 0.0914
## married Binary -0.1159 -0.1054 -0.0965
## race_black Binary 0.0540 0.0576 0.0620
## race_hispan Binary 0.0060 0.0093 0.0121
## race_white Binary -0.0717 -0.0668 -0.0604
## re74 Contin. -0.3152 -0.2900 -0.2683
## re75 Contin. -0.1740 -0.1602 -0.1477
##
## Average effective sample sizes across imputations
## Control Treated
## Unadjusted 429. 185.
## Adjusted 331.82 67.45
The input is similar to that for using bal.tab() with MatchIt or WeightIt.
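The Min, Mean, and Max columns of the "Balance summary across all imputations" table summarize, for each covariate, the balance statistic computed separately in each imputed data set. The sketch below reproduces that aggregation from a small matrix of per-imputation mean differences. The numbers are made up for illustration.

```r
# Hypothetical standardized mean differences: rows = covariates, columns = imputations
smd_by_imp <- rbind(age  = c(-0.19, -0.18, -0.19),
                    educ = c( 0.08,  0.09,  0.08))

# Summarize each covariate across imputations (signed values, as in the table above)
summary_tab <- cbind(Min.Diff.Adj  = apply(smd_by_imp, 1, min),
                     Mean.Diff.Adj = rowMeans(smd_by_imp),
                     Max.Diff.Adj  = apply(smd_by_imp, 1, max))
round(summary_tab, 4)
```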
bal.tab() with cem
The cem package implements coarsened exact matching for binary and multi-category treatments. bal.tab() functions similarly to cem’s imbalance(). Below is a simple example of using bal.tab() with cem:
library("cem")
data("lalonde", package = "cobalt") #If not yet loaded
#Matching for balance on covariates
cem.out <- cem("treat", data = lalonde, drop = "re78")
##
## Using 'treat'='1' as baseline group
bal.tab(cem.out, data = lalonde, stats = c("m", "ks"))
## Balance Measures
## Type Diff.Adj KS.Adj
## age Contin. 0.0512 0.1581
## educ Contin. -0.0441 0.0445
## race_black Binary 0.0000 0.0000
## race_hispan Binary 0.0000 0.0000
## race_white Binary 0.0000 0.0000
## married Binary 0.0000 0.0000
## nodegree Binary 0.0000 0.0000
## re74 Contin. -0.0341 0.2418
## re75 Contin. -0.0528 0.1162
##
## Sample sizes
## Control Treated
## All 429. 185
## Matched (ESS) 36.29 68
## Matched (Unweighted) 78. 68
## Unmatched 351. 117
The input is similar to that for using bal.tab() with Matching or optmatch. In addition to the cem() output object, one must specify either both a formula and a data set or both a treatment vector and a data frame of covariates. Unlike with Matching, entering the treatment variable is optional as it is already stored in the output object. The output is similar to that of optmatch.
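Coarsened exact matching itself can be sketched in base R in three steps. First, coarsen each continuous covariate into bins. Next, treat each combination of coarsened values as a stratum. Finally, keep only the strata that contain both treated and control units. The fixed number of equal-width bins is an assumption made for illustration. By default cem chooses cutpoints automatically, and it lets users supply their own. cem_sketch() is a hypothetical helper.

```r
cem_sketch <- function(data, treat, continuous, n_bins = 4) {
  coarse <- data
  for (v in continuous) {
    # Equal-width bins; the bin labels replace the raw values
    coarse[[v]] <- cut(data[[v]], breaks = n_bins, include.lowest = TRUE)
  }
  # Each observed combination of coarsened covariate values defines a stratum
  stratum <- interaction(coarse[setdiff(names(coarse), treat)], drop = TRUE)
  # Keep strata containing at least one treated and one control unit
  has_t <- tapply(data[[treat]] == 1, stratum, any)
  has_c <- tapply(data[[treat]] == 0, stratum, any)
  keep <- names(has_t)[has_t & has_c]
  matched <- as.character(stratum) %in% keep
  data.frame(stratum = ifelse(matched, as.character(stratum), NA),
             matched = matched)
}

toy <- data.frame(treat   = c(1, 0, 1, 0, 0, 1),
                  age     = c(20, 21, 40, 60, 61, 59),
                  married = c(1, 1, 0, 0, 0, 1))
cem_sketch(toy, "treat", continuous = "age", n_bins = 2)
```

Only the first two units share a stratum containing both groups, so the other four are dropped. This mirrors how the "Unmatched" row in the bal.tab() output above arises.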
When using cem() with multiply imputed data (i.e., by supplying a list of data frames to the datalist argument in cem()), an argument to imp should be specified to bal.tab(), or a mids object from the mice package should be given as the argument to data. See ?bal.tab.imp or the accompanying Appendix 2 for more information about using cobalt with multiply imputed data. Below is an example of using cem with multiply imputed data from mice:
library("mice"); library("cem")
data("lalonde_mis", package = "cobalt")
#Generate imputed data sets
m <- 10 #number of imputed data sets
imp.out <- mice(lalonde_mis, m = m, print = FALSE)
imp.data.list <- lapply(1:m, complete, data = imp.out)
#Match within each imputed dataset
cem.out.imp <- cem("treat", datalist = imp.data.list, drop = "re78")
##
## Using 'treat'='1' as baseline group
##
## Using 'treat'='1' as baseline group
##
## Using 'treat'='1' as baseline group
##
## Using 'treat'='1' as baseline group
##
## Using 'treat'='1' as baseline group
##
## Using 'treat'='1' as baseline group
##
## Using 'treat'='1' as baseline group
##
## Using 'treat'='1' as baseline group
##
## Using 'treat'='1' as baseline group
##
## Using 'treat'='1' as baseline group
bal.tab(cem.out.imp, data = imp.out)
## Balance summary across all imputations
## Type Min.Diff.Adj Mean.Diff.Adj Max.Diff.Adj
## age Contin. 0.0426 0.0468 0.0494
## educ Contin. -0.0386 -0.0347 -0.0165
## race_black Binary -0.0000 -0.0000 0.0000
## race_hispan Binary -0.0000 -0.0000 0.0000
## race_white Binary -0.0000 -0.0000 0.0000
## married Binary -0.0000 -0.0000 0.0000
## nodegree Binary -0.0000 -0.0000 0.0000
## re74 Contin. -0.0416 -0.0362 -0.0325
## re75 Contin. -0.0725 -0.0546 -0.0476
##
## Average sample sizes across imputations
## Control Treated
## All 429. 185.
## Matched (ESS) 35.79 66.1
## Matched (Unweighted) 78.1 66.1
## Unmatched 350.9 118.9
bal.tab() with other packages
It is possible to use bal.tab() with objects that don’t come from these packages by using the default method. If an object that doesn’t correspond to the output from one of the specifically supported packages is passed as the first argument to bal.tab(), bal.tab() will do its best to process that object as if it did come from a supported package. It will search through the components of the object for items with names like "treat", "covs", "data", "weights", etc., that have the correct object types. Any additional arguments can be specified by the user.
The goal of the default method is to allow package authors to rely on cobalt as a substitute for any balancing function they might otherwise write. By ensuring compatibility with the default method, package authors can have their users simply supply the output of a compatible function to cobalt functions without having to write a specific method in cobalt. A package author would need to make sure the output of their package contains enough information with correctly named components; if so, cobalt functions can be used as conveniently with that output as they are with the output of specifically supported packages.
Below, we demonstrate this capability with the output of optweight, which performs a version of propensity score weighting using optimization, similar to sbw. No bal.tab() method has been written with optweight output in mind; rather, optweight was written to have output compatible with the default method of bal.tab().
library("optweight")
data("lalonde", package = "cobalt")
#Estimate the weights using optimization
ow.out <- optweight(treat ~ age + educ + married + race + re74 + re75,
                    data = lalonde, estimand = "ATE", tols = .01)
#Note the contents of the output object:
names(ow.out)
## [1] "weights" "treat" "covs" "s.weights" "estimand" "focal"
## [7] "call" "tols" "duals" "info"
#Use bal.tab() directly on the output
bal.tab(ow.out)
## Call
## optweight(formula = treat ~ age + educ + married + race + re74 +
## re75, data = lalonde, tols = 0.01, estimand = "ATE")
##
## Balance Measures
## Type Diff.Adj
## age Contin. -0.0000
## educ Contin. 0.0100
## married Binary -0.0100
## race_black Binary 0.0100
## race_hispan Binary -0.0000
## race_white Binary -0.0100
## re74 Contin. -0.0100
## re75 Contin. 0.0085
##
## Effective sample sizes
## Control Treated
## Unadjusted 429. 185.
## Adjusted 349.42 52.04
The output is treated as output from a specifically supported package. See ?bal.tab.default for more details and another example.
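A package author aiming for compatibility with the default method might return a plain list whose components carry the names the default method searches for, much like the components of the optweight output listed above. The constructor, its class name, and the trivial equal weights below are hypothetical. Whether a particular object is processed as intended should be checked against ?bal.tab.default.

```r
# Hypothetical output constructor for a package that wants to work with
# bal.tab()'s default method: components named "treat", "covs", "weights"
make_balance_output <- function(treat, covs, weights, estimand = "ATE") {
  stopifnot(length(treat) == nrow(covs), length(weights) == length(treat))
  structure(list(treat = treat, covs = covs, weights = weights,
                 estimand = estimand),
            class = "my_weighting_output")  # hypothetical class, no cobalt method
}

set.seed(1)
covs_demo <- data.frame(x1 = rnorm(50), x2 = rbinom(50, 1, 0.5))
treat_demo <- rbinom(50, 1, 0.4)
out <- make_balance_output(treat_demo, covs_demo, weights = rep(1, 50))
names(out)

# With cobalt installed, the object could then be passed directly:
# cobalt::bal.tab(out)
```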