library("metagam")
library("mgcv")
library("metafor")
#> Loading required package: Matrix
#>
#> Loading the 'metafor' package (version 3.0-2). For an
#> introduction to the package please type: help(metafor)
library("ggplot2")
theme_set(theme_bw())
Previous versions of metagam had built-in aggregation of p-values, using the metap package. However, this had drawbacks; in particular, combining p-values ignores the direction of the effects. This vignette briefly describes that issue, and shows how the metap package can be used directly on output from metagam.
Consider the two datasets below, which have completely opposite effects. I have also included the code for simulating the data.
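The data-simulation code itself did not survive extraction. A minimal sketch that produces a list `dd` with two such datasets is shown below; the names `dd`, `dat1`, and `dat2` and the sample size match how the objects are used later, but the exact data-generating process is assumed, not the original vignette's code.

```r
# Hypothetical reconstruction: two datasets in which x has opposite effects on y.
set.seed(1)
n <- 100
dat1 <- data.frame(x = runif(n))
dat1$y <- dat1$x + rnorm(n, sd = .65)   # positive effect of x
dat2 <- data.frame(x = runif(n))
dat2$y <- -dat2$x + rnorm(n, sd = .7)   # negative effect of x
dd <- list(dat1 = dat1, dat2 = dat2)
```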
Fitting a GAM to each dataset separately, we see that the effect is significant at the \(\alpha = 0.05\) level.
mod1 <- gam(y ~ s(x, bs = "cr"), data = dd$dat1)
summary(mod1)
#>
#> Family: gaussian
#> Link function: identity
#>
#> Formula:
#> y ~ s(x, bs = "cr")
#>
#> Parametric coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.50554 0.06587 7.674 1.25e-11 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Approximate significance of smooth terms:
#> edf Ref.df F p-value
#> s(x) 1 1 24.26 3.61e-06 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> R-sq.(adj) = 0.19 Deviance explained = 19.8%
#> GCV = 0.4428 Scale est. = 0.43394 n = 100
mod2 <- gam(y ~ s(x, bs = "cr"), data = dd$dat2)
summary(mod2)
#>
#> Family: gaussian
#> Link function: identity
#>
#> Formula:
#> y ~ s(x, bs = "cr")
#>
#> Parametric coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.46800 0.07367 -6.352 6.72e-09 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Approximate significance of smooth terms:
#> edf Ref.df F p-value
#> s(x) 1.237 1.436 10.65 0.00112 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> R-sq.(adj) = 0.117 Deviance explained = 12.8%
#> GCV = 0.55517 Scale est. = 0.54275 n = 100
Now let us try a metagam fit.
models <- list(strip_rawdata(mod1), strip_rawdata(mod2))
metafit <- metagam(models)
Plotting the meta-analytic fit, we see that the 95 % confidence bands completely cover zero.
ggplot(metafit$meta_estimates, aes(x = x, y = estimate, ymin = ci.lb, ymax = ci.ub)) +
geom_line() +
geom_ribbon(alpha = .3) +
geom_hline(yintercept = 0, linetype = "dashed")
The object returned by metagam has a pvals element which contains p-values for each term in each individual fit.
metafit$pvals
#>       term      edf   Ref.df        F      p-value
#> s(x)  s(x) 1.000000 1.000000 24.25930 3.612591e-06
#> s(x)1 s(x) 1.236565 1.436059 10.64571 1.121220e-03
Combining the p-values using any of the methods available in metap, however, we see that the effects come out as significant at \(\alpha = 0.05\).
library("metap")
allmetap(p = metafit$pvals$`p-value`, method = "all")
#> p valid eponym
#> logitp 5.618196e-07 2
#> maximump 1.257134e-06 2
#> meanp NA 2
#> meanz 6.758351e-08 2
#> minimump 7.225169e-06 2 Tippett
#> sumlog 8.232425e-08 2 Fisher
#> sump 6.32624e-07 2 Edgington
#> sumz 4.810753e-08 2 Stouffer
This happens because the combination of p-values is not aware of the direction of the effects. Hence, rather than making this part of the package, we suggest that users who know what they are doing combine the p-values using the metap package directly.
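To see concretely why direction matters, a direction-aware combination can be sketched by hand. This signed Stouffer variant is an illustration only, not part of metagam: each two-sided p-value is converted to a z-score carrying the sign of its estimated effect before summing.

```r
# Two-sided p-values from the individual smooth-term tests above,
# together with the signs of the estimated effects
p <- c(3.61e-06, 1.12e-03)
signs <- c(1, -1)  # the two effects point in opposite directions
z <- signs * qnorm(p / 2, lower.tail = FALSE)
# Signed Stouffer combination: opposite directions largely cancel,
# and the combined p-value is far from significant
2 * pnorm(abs(sum(z) / sqrt(length(z))), lower.tail = FALSE)
```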
For comparison, fitting a linear model to each dataset and combining the slopes using metafor gives a p-value of 0.9296, which makes much more sense.
mod1 <- lm(y ~ x, data = dd$dat1)
mod2 <- lm(y ~ x, data = dd$dat2)

estimates <- c(coef(mod1)[["x"]], coef(mod2)[["x"]])
sampling_variances <- c(vcov(mod1)[["x", "x"]], vcov(mod2)[["x", "x"]])
rma(estimates, vi = sampling_variances)
#>
#> Random-Effects Model (k = 2; tau^2 estimator: REML)
#>
#> tau^2 (estimated amount of total heterogeneity): 2.4526 (SE = 3.5661)
#> tau (square root of estimated tau^2 value): 1.5661
#> I^2 (total heterogeneity / total variability): 97.26%
#> H^2 (total variability / sampling variability): 36.54
#>
#> Test for Heterogeneity:
#> Q(df = 1) = 36.5396, p-val < .0001
#>
#> Model Results:
#>
#> estimate se zval pval ci.lb ci.ub
#> 0.0992 1.1229 0.0884 0.9296 -2.1015 2.3000
#>
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let’s create another GAM fit, this time using the gamSim function. The dashed lines in the plot correspond to two standard errors above and below the estimate. However, this is a pointwise confidence interval, which for our practical purposes means that under repeated sampling from the population, it will contain the true function less than 95 % of the time. See, e.g., Chapter 6.5 of Ruppert, Wand, and Carroll (2003), Sørensen, Walhovd, and Fjell (2021), or this blog post. What we need is a simultaneous confidence interval.
set.seed(123)
dat <- gamSim(verbose = FALSE)
mod <- gam(y ~ s(x0, bs = "cr"), data = dat)
plot(mod)
We see that the critical value for a 95 % simultaneous interval is above 2.
grid <- with(dat, data.frame(x0 = seq(min(x0), max(x0), length = 200)))
masd <- getmasd(mod, grid, 10000, terms = "s(x0)")
(crit <- quantile(masd, prob = .95, type = 8))
#>      95%
#> 2.405748
We can plot the two together, and we see that the simultaneous confidence interval is wider.
fit <- predict(mod, newdata = grid, se.fit = TRUE)
dat <- data.frame(
  x0 = grid$x0,
  pred = fit$fit,
  ci_pt_lb = fit$fit + qnorm(.025) * fit$se.fit,
  ci_pt_ub = fit$fit + qnorm(.975) * fit$se.fit,
  ci_sim_lb = fit$fit - crit * fit$se.fit,
  ci_sim_ub = fit$fit + crit * fit$se.fit
)
ggplot(dat, aes(x = x0, y = pred)) +
geom_line() +
geom_ribbon(aes(ymin = ci_pt_lb, ymax = ci_pt_ub), alpha = .3) +
geom_ribbon(aes(ymin = ci_sim_lb, ymax = ci_sim_ub), alpha = .3)
In the current version of metagam, it is possible to combine simultaneous confidence intervals from the separate fits. The critical value multiplied by the standard errors is then used as the weight in the meta-analytic combination. Significance testing amounts to checking whether a constant function can be fit within the simultaneous confidence bands: the term is significant exactly when the maximum of the lower band across the range of x-values exceeds the minimum of the upper band. We illustrate this here.
set.seed(124)
datasets <- lapply(1:5, function(x) gamSim(scale = 5, verbose = FALSE))

models <- lapply(datasets, function(dat){
  model <- gam(y ~ s(x2, bs = "cr"), data = dat)
  strip_rawdata(model)
})

meta_analysis <- metagam(models, terms = "s(x2)", grid_size = 100,
                         nsim = 10000, ci_alpha = .05)
Convenience functions will be added later, but for now the process is somewhat manual. Notice that the outer bands are wider.
sim_conf <- meta_analysis$sim_ci
ptw_conf <- meta_analysis$meta_estimates
ggplot(sim_conf, aes(x = x2)) +
  geom_line(aes(y = pred)) +
  geom_ribbon(aes(ymin = ci_sim_lb, ymax = ci_sim_ub), alpha = .3) +
  geom_ribbon(data = ptw_conf, aes(ymin = ci.lb, ymax = ci.ub), alpha = .3)
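The band-based significance test described earlier can be carried out directly on these objects (assuming, as in the plot above, that sim_conf has columns ci_sim_lb and ci_sim_ub):

```r
# The smooth term is significant if no horizontal line fits inside the
# simultaneous bands, i.e., if the highest point of the lower band exceeds
# the lowest point of the upper band.
with(sim_conf, max(ci_sim_lb) > min(ci_sim_ub))
```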
The p-value is printed with the model summary. To get more digits after the decimal point in the p-value, increase the nsim argument.
summary(meta_analysis)
#> Meta-analysis of GAMs from 5 cohorts, using method FE.
#>
#> Smooth terms analyzed: s(x2). P-value for smooth term: <1e-04
We shouldn’t trust a single example, so here we show some simulations. We simulate a case where the true relationship is zero, and check whether the distribution of p-values is close to uniform.
library(parallel)
cl <- makeCluster(10)
pvals <- parLapply(
  cl = cl, X = 1:100, fun = function(x){
    models <- lapply(1:5, function(i){
      dat <- data.frame(x = runif(100), y = rnorm(100))
      mod <- mgcv::gam(y ~ s(x, bs = "cr"), data = dat)
      metagam::strip_rawdata(mod)
    })
    metagam::metagam(models, nsim = 1000, ci_alpha = .05)$meta_pval
  }
)
stopCluster(cl)
plot(qunif(seq(from = 0, to = 1, length.out = 100)),
sort(as.numeric(unlist(pvals))),
xlab = "Theoretical quantile",
ylab = "Data quantile")
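As an additional check, not part of the original analysis, the simulated p-values can be compared to the uniform distribution with a Kolmogorov-Smirnov test:

```r
# Under the null of a zero effect, the p-values should be uniform on (0, 1)
ks.test(as.numeric(unlist(pvals)), "punif")
```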
The quantile plot above looks sufficiently satisfactory to include this in the package. We will conduct much more extensive simulation studies in a future paper.