The standard deviation over the bootstrap samples \(X\) is defined as \[ \mathop{\mathrm{sd}}(X) = \sqrt{\frac{1}{N - 1} \sum_{i = 1}^N (x_i - \bar x_.)^2} \,. \]
In the notation \(\bar x_.\) the bar means that averaging has been performed over the indices that are replaced by periods.
The jackknife error over the jackknife samples \(Y\) is defined as \[ \mathop{\mathrm{jse}}(Y) = \sqrt{\frac{N - 1}{N} \sum_{i = 1}^N (y_i - \bar y_.)^2} \,. \]
From the equations we expect a factor \(\sqrt{(N-1)^2 / N}\) between the two. We can therefore expect to express the jackknife error simply as \[ \mathop{\mathrm{jse}}(Y) = \sqrt{\frac{(N-1)^2}{N}} \mathop{\mathrm{sd}}(Y) \,. \]
We want to test this numerically using the implementation of the second equation.
jackknife_error <- function (samples, na.rm = FALSE) {
## Number of jackknife samples.
N <- length(samples)
if (na.rm) {
selection <- !is.na(samples)
samples <- samples[selection]
## Number of non-NA samples.
m <- sum(selection)
factor <- N / m
} else {
factor <- 1.0
}
sqrt(factor * (N - 1) / N * sum((samples - mean(samples))^2))
}
Using a little data set we conclude that we got the factor right.
N = 10
data = rnorm(N)
be = sd(data)
je = jackknife_error(data)
expected_factor = sqrt((N-1)^2 / N)
actual_factor = je / be
actual_factor / expected_factor
## [1] 1
The covariance is similarly defined, and we have the same normalization factor that we need to take care of. Since the diagonal elements of the covariance matrix are the variances, we need to apply the same normalization factor. The big complication is that the R cov
function can either be called with one matrix or two vectors. We need to support both for the jackknife such that it has the same API.
jackknife_cov <- function (x, y = NULL, na.rm = FALSE, ...) {
factor <- 1.0
if (is.null(y)) {
N <- nrow(x)
if (na.rm) {
na_values <- apply(x, 2, function (row) any(is.na(row)))
m <- sum(na_values)
x <- x[!na_values, ]
factor <- N / m
}
} else {
N <- length(x)
if (na.rm) {
na_values <- is.na(x) | is.na(y)
m <- sum(na_values)
x <- x[!na_values]
y <- y[!na_values]
factor <- N / m
}
}
(N-1)^2 / N * factor * cov(x, y, ...)
}
x <- rnorm(10)
y <- rnorm(10)
cov(x, y)
## [1] 0.1857572
jackknife_cov(x, y)
## [1] 1.504633
cov(cbind(x, y))
## x y
## x 0.6324084 0.1857572
## y 0.1857572 0.4636721
jackknife_cov(cbind(x, y))
## x y
## x 5.122508 1.504633
## y 1.504633 3.755744
x <- rnorm(1000)
jackknife_samples_1 <- sapply(1:length(x), function (i) mean(x[-i]))
jackknife_samples_2 <- sapply(2:length(x), function (i) mean(x[-c(i-1, i)]))
jse1 <- jackknife_error(jackknife_samples_1)
jse2 <- jackknife_error(jackknife_samples_2)
jse2/jse1
## [1] 1.421094