Bootstrap confidence intervals in R

statistics

bootstrap

ecology tutorial

Bootstrap confidence intervals in R when no formula applies: percentile and BCa methods on a Shannon index and a regression slope, and where it breaks.

Author

Tidy Ecology

Published

2026-06-30

The standard confidence interval needs a formula for the standard error, and that formula needs an assumption about the sampling distribution, usually normality delivered by the central limit theorem. For a mean that assumption is fine. For a Shannon diversity index, a median, a ratio of two counts, or a regression slope built on skewed residuals, there is often no clean formula and no reason to trust the normal approximation. The bootstrap sidesteps both problems. It replaces the missing theory with brute force: resample the data you have, recompute the statistic thousands of times, and read the confidence interval off the spread of those replicates.

The idea is that your sample stands in for the population. Drawing a new sample of the same size from your data with replacement mimics drawing a fresh sample from the population, so the variation across those resamples approximates the sampling variation you cannot observe directly. We use the boot package, which ships with R, and walk through one statistic that has no tidy interval, one that does, and one case where the bootstrap quietly fails.

library(boot)
library(vegan)
library(ggplot2)

paper <- "#f5f4ee"; ink <- "#16241d"; forest <- "#275139"
gold  <- "#cda23f"; miss_red <- "#b5534e"; faint <- "#5d6b61"; line <- "#dad9ca"

theme_te <- theme_minimal(base_size = 12) +
  theme(panel.grid.minor = element_blank(),
        panel.grid.major = element_line(colour = line, linewidth = 0.3),
        plot.background = element_rect(fill = paper, colour = NA),
        panel.background = element_rect(fill = paper, colour = NA),
        axis.title = element_text(colour = ink),
        axis.text = element_text(colour = faint),
        plot.title = element_text(colour = ink, face = "bold"))

A statistic with no formula: Shannon diversity

Take one sampled assemblage of 286 individuals sorted into eight species. The Shannon index summarises its diversity, but its sampling distribution has no convenient closed form, so a textbook interval is not available. Resample the individuals instead.

counts <- c(120, 64, 38, 25, 17, 11, 7, 4)     # abundance of 8 species
species_vec <- rep(seq_along(counts), counts)   # one entry per individual

shannon <- function(v) {
  p <- table(v) / length(v)
  -sum(p * log(p))
}
H_obs <- shannon(species_vec)

boot_stat <- function(d, i) shannon(d[i])       # statistic on a resample
set.seed(501)
bA <- boot(species_vec, boot_stat, R = 3000)
ciA <- boot.ci(bA, type = c("norm", "perc", "bca"))

round(c(H_obs = H_obs, boot_SE = sd(bA$t),
        perc_lo = ciA$percent[4], perc_hi = ciA$percent[5],
        bca_lo = ciA$bca[4], bca_hi = ciA$bca[5]), 4)

  H_obs boot_SE perc_lo perc_hi  bca_lo  bca_hi 
 1.6243  0.0509  1.5073  1.7053  1.5358  1.7269

The observed index is 1.624. The bootstrap standard error, simply the standard deviation of the 3000 replicate indices, is 0.051. From the replicate distribution boot.ci() returns several intervals. The percentile interval just takes the 2.5th and 97.5th percentiles of the replicates, here 1.507 to 1.705. The BCa interval (bias-corrected and accelerated) adjusts for two things the percentile method ignores: a median bias in the replicates and a skew that changes with the parameter value. It lands at 1.536 to 1.727, nudged upward here because resampling individuals tends to lose rare species and pull diversity estimates slightly down, a bias BCa partly repairs.

boot_df <- data.frame(H = bA$t)
ggplot(boot_df, aes(H)) +
  geom_histogram(bins = 40, fill = forest, colour = paper, linewidth = 0.2) +
  geom_vline(xintercept = H_obs, colour = ink, linewidth = 0.8) +
  geom_vline(xintercept = c(ciA$percent[4], ciA$percent[5]),
             colour = gold, linewidth = 0.7, linetype = "dashed") +
  geom_vline(xintercept = c(ciA$bca[4], ciA$bca[5]),
             colour = miss_red, linewidth = 0.7, linetype = "dotted") +
  labs(x = "bootstrapped Shannon index", y = "replicate count",
       title = "Bootstrap distribution of Shannon diversity") +
  theme_te

A histogram of bootstrapped Shannon diversity values, roughly bell shaped, with one solid vertical line at the observed value and two pairs of vertical lines marking the percentile and BCa interval bounds. — Figure 1: Distribution of 3000 bootstrapped Shannon indices. The solid line is the observed value; dashed lines mark the percentile interval and dotted lines the BCa interval. BCa sits a little higher, correcting the downward bias of resampling.

A statistic that has a formula, stress-tested

A regression slope does have a standard interval, but it assumes roughly normal residuals. When the noise is skewed, the bootstrap is a useful second opinion. Here species richness rises along a gradient with a true slope of 0.8 and strongly right-skewed errors. Rather than resample residuals, we resample whole rows, the pairs of x and y, which makes no assumption about the error shape.

set.seed(77)
ndat <- 40
x <- runif(ndat, 0, 10)
err <- (rexp(ndat, rate = 1) - 1) * 2.2        # skewed, centred near zero
y <- 2 + 0.8 * x + err                          # true slope = 0.8
dat <- data.frame(x, y)

ols <- lm(y ~ x, dat)

slope_stat <- function(d, i) coef(lm(y ~ x, d[i, ]))["x"]
set.seed(909)
bB <- boot(dat, slope_stat, R = 3000)
ciB <- boot.ci(bB, type = c("perc", "bca"))

round(c(ols_slope = coef(ols)["x"],
        ols_lo = confint(ols)["x", 1], ols_hi = confint(ols)["x", 2],
        boot_SE = sd(bB$t),
        bca_lo = ciB$bca[4], bca_hi = ciB$bca[5]), 4)

ols_slope.x      ols_lo      ols_hi     boot_SE      bca_lo      bca_hi 
     0.6655      0.4109      0.9201      0.1312      0.3498      0.8694

The normal-theory interval runs from 0.411 to 0.92, and the BCa bootstrap interval from 0.35 to 0.869. Both contain the true 0.8, and they are close, which is itself the lesson: with 40 points the central limit theorem rescues the classical interval even though the errors are skewed, and the bootstrap confirms it while leaning slightly wider. When the two disagree, trust the bootstrap, because it is the one not relying on an assumption the data may violate.

When the bootstrap fails

The bootstrap is not magic. It struggles for statistics that depend on the extreme order statistics, because a resample can never contain a value larger than the largest one you observed. Bootstrapping the maximum makes this vivid.

set.seed(31)
samp <- rgamma(30, 4, 0.2)
max_stat <- function(d, i) max(d[i])
set.seed(32)
bMax <- boot(samp, max_stat, R = 3000)

c(observed_max = round(max(samp), 2),
  distinct_bootstrap_values = length(unique(bMax$t)))

             observed_max distinct_bootstrap_values 
                    63.65                      8.00

maxdf <- data.frame(m = bMax$t)
ggplot(maxdf, aes(m)) +
  geom_histogram(bins = 40, fill = miss_red, colour = paper, linewidth = 0.2) +
  geom_vline(xintercept = max(samp), colour = ink, linewidth = 0.8) +
  labs(x = "bootstrapped maximum", y = "replicate count",
       title = "A statistic the bootstrap cannot handle") +
  theme_te

A spiky bar chart with a few tall bars at the right edge and nothing beyond, showing the bootstrap maximum piling onto the observed maximum. — Figure 2: Bootstrap distribution of the sample maximum. It collapses onto a handful of the largest observed values and is hard against the observed maximum on the right, so no sensible upper bound exists. The bootstrap cannot extrapolate beyond the data.

Only 8 distinct values appear across 3000 resamples, every one at or below the observed maximum of 63.7. The distribution is truncated by construction, and any interval read from it is meaningless. The same caution applies, less dramatically, to minima, ranges, and any sample size small enough that a few influential points dominate every resample.

What to carry forward

Reach for the bootstrap when the statistic has no standard error formula, or when it has one that rests on an assumption your data break. Resample the natural unit: individuals for a pooled diversity index, whole rows for a regression, plots for a community-level summary. Prefer the BCa interval over the plain percentile interval, since it corrects for bias and skew at little extra cost, and use a few thousand resamples so the percentiles are stable. Keep its limits in mind: the bootstrap cannot invent data beyond what you observed, so it fails for extreme order statistics and gets shaky at very small sample sizes. Within those bounds it turns a hard theoretical problem into a short, honest computation.

References

Efron B, Tibshirani RJ 1993 An Introduction to the Bootstrap. Chapman and Hall. ISBN 9780412042317

DiCiccio TJ, Efron B 1996 Statistical Science 11(3):189-228 (10.1214/ss/1032280214)

Davison AC, Hinkley DV 1997 Bootstrap Methods and their Application. Cambridge University Press. ISBN 9780521574716

--- title: "Bootstrap confidence intervals in R" description: "Bootstrap confidence intervals in R when no formula applies: percentile and BCa methods on a Shannon index and a regression slope, and where it breaks." date: "2026-06-30 11:00" categories: [statistics, bootstrap, ecology tutorial, R] image: thumbnail.png image-alt: "A histogram of bootstrapped Shannon diversity values with the observed value and two confidence interval bounds marked." --- The standard confidence interval needs a formula for the standard error, and that formula needs an assumption about the sampling distribution, usually normality delivered by the central limit theorem. For a mean that assumption is fine. For a Shannon diversity index, a median, a ratio of two counts, or a regression slope built on skewed residuals, there is often no clean formula and no reason to trust the normal approximation. The bootstrap sidesteps both problems. It replaces the missing theory with brute force: resample the data you have, recompute the statistic thousands of times, and read the confidence interval off the spread of those replicates. The idea is that your sample stands in for the population. Drawing a new sample of the same size from your data with replacement mimics drawing a fresh sample from the population, so the variation across those resamples approximates the sampling variation you cannot observe directly. We use the `boot` package, which ships with R, and walk through one statistic that has no tidy interval, one that does, and one case where the bootstrap quietly fails. ```{r} #| label: setup #| message: false #| warning: false library(boot) library(vegan) library(ggplot2) paper <- "#f5f4ee"; ink <- "#16241d"; forest <- "#275139" gold <- "#cda23f"; miss_red <- "#b5534e"; faint <- "#5d6b61"; line <- "#dad9ca" theme_te <- theme_minimal(base_size = 12) + theme(panel.grid.minor = element_blank(), panel.grid.major = element_line(colour = line, linewidth = 0.3), plot.background = element_rect(fill = paper, colour = NA), panel.background = element_rect(fill = paper, colour = NA), axis.title = element_text(colour = ink), axis.text = element_text(colour = faint), plot.title = element_text(colour = ink, face = "bold")) ``` ## A statistic with no formula: Shannon diversity Take one sampled assemblage of 286 individuals sorted into eight species. The Shannon index summarises its diversity, but its sampling distribution has no convenient closed form, so a textbook interval is not available. Resample the individuals instead. ```{r} #| label: shannon-boot counts <- c(120, 64, 38, 25, 17, 11, 7, 4) # abundance of 8 species species_vec <- rep(seq_along(counts), counts) # one entry per individual shannon <- function(v) { p <- table(v) / length(v) -sum(p * log(p)) } H_obs <- shannon(species_vec) boot_stat <- function(d, i) shannon(d[i]) # statistic on a resample set.seed(501) bA <- boot(species_vec, boot_stat, R = 3000) ciA <- boot.ci(bA, type = c("norm", "perc", "bca")) round(c(H_obs = H_obs, boot_SE = sd(bA$t), perc_lo = ciA$percent[4], perc_hi = ciA$percent[5], bca_lo = ciA$bca[4], bca_hi = ciA$bca[5]), 4) ``` The observed index is `r round(H_obs, 3)`. The bootstrap standard error, simply the standard deviation of the 3000 replicate indices, is `r round(sd(bA$t), 3)`. From the replicate distribution `boot.ci()` returns several intervals. The percentile interval just takes the 2.5th and 97.5th percentiles of the replicates, here `r round(ciA$percent[4], 3)` to `r round(ciA$percent[5], 3)`. The BCa interval (bias-corrected and accelerated) adjusts for two things the percentile method ignores: a median bias in the replicates and a skew that changes with the parameter value. It lands at `r round(ciA$bca[4], 3)` to `r round(ciA$bca[5], 3)`, nudged upward here because resampling individuals tends to lose rare species and pull diversity estimates slightly down, a bias BCa partly repairs. ```{r} #| label: fig-shannon-boot #| fig-cap: "Distribution of 3000 bootstrapped Shannon indices. The solid line is the observed value; dashed lines mark the percentile interval and dotted lines the BCa interval. BCa sits a little higher, correcting the downward bias of resampling." #| fig-alt: "A histogram of bootstrapped Shannon diversity values, roughly bell shaped, with one solid vertical line at the observed value and two pairs of vertical lines marking the percentile and BCa interval bounds." #| fig-width: 7 #| fig-height: 4.2 boot_df <- data.frame(H = bA$t) ggplot(boot_df, aes(H)) + geom_histogram(bins = 40, fill = forest, colour = paper, linewidth = 0.2) + geom_vline(xintercept = H_obs, colour = ink, linewidth = 0.8) + geom_vline(xintercept = c(ciA$percent[4], ciA$percent[5]), colour = gold, linewidth = 0.7, linetype = "dashed") + geom_vline(xintercept = c(ciA$bca[4], ciA$bca[5]), colour = miss_red, linewidth = 0.7, linetype = "dotted") + labs(x = "bootstrapped Shannon index", y = "replicate count", title = "Bootstrap distribution of Shannon diversity") + theme_te ``` ## A statistic that has a formula, stress-tested A regression slope does have a standard interval, but it assumes roughly normal residuals. When the noise is skewed, the bootstrap is a useful second opinion. Here species richness rises along a gradient with a true slope of 0.8 and strongly right-skewed errors. Rather than resample residuals, we resample whole rows, the pairs of x and y, which makes no assumption about the error shape. ```{r} #| label: slope-boot set.seed(77) ndat <- 40 x <- runif(ndat, 0, 10) err <- (rexp(ndat, rate = 1) - 1) * 2.2 # skewed, centred near zero y <- 2 + 0.8 * x + err # true slope = 0.8 dat <- data.frame(x, y) ols <- lm(y ~ x, dat) slope_stat <- function(d, i) coef(lm(y ~ x, d[i, ]))["x"] set.seed(909) bB <- boot(dat, slope_stat, R = 3000) ciB <- boot.ci(bB, type = c("perc", "bca")) round(c(ols_slope = coef(ols)["x"], ols_lo = confint(ols)["x", 1], ols_hi = confint(ols)["x", 2], boot_SE = sd(bB$t), bca_lo = ciB$bca[4], bca_hi = ciB$bca[5]), 4) ``` The normal-theory interval runs from `r round(confint(ols)["x", 1], 3)` to `r round(confint(ols)["x", 2], 3)`, and the BCa bootstrap interval from `r round(ciB$bca[4], 3)` to `r round(ciB$bca[5], 3)`. Both contain the true 0.8, and they are close, which is itself the lesson: with 40 points the central limit theorem rescues the classical interval even though the errors are skewed, and the bootstrap confirms it while leaning slightly wider. When the two disagree, trust the bootstrap, because it is the one not relying on an assumption the data may violate. ## When the bootstrap fails The bootstrap is not magic. It struggles for statistics that depend on the extreme order statistics, because a resample can never contain a value larger than the largest one you observed. Bootstrapping the maximum makes this vivid. ```{r} #| label: max-fail set.seed(31) samp <- rgamma(30, 4, 0.2) max_stat <- function(d, i) max(d[i]) set.seed(32) bMax <- boot(samp, max_stat, R = 3000) c(observed_max = round(max(samp), 2), distinct_bootstrap_values = length(unique(bMax$t))) ``` ```{r} #| label: fig-max-fail #| fig-cap: "Bootstrap distribution of the sample maximum. It collapses onto a handful of the largest observed values and is hard against the observed maximum on the right, so no sensible upper bound exists. The bootstrap cannot extrapolate beyond the data." #| fig-alt: "A spiky bar chart with a few tall bars at the right edge and nothing beyond, showing the bootstrap maximum piling onto the observed maximum." #| fig-width: 7 #| fig-height: 3.8 maxdf <- data.frame(m = bMax$t) ggplot(maxdf, aes(m)) + geom_histogram(bins = 40, fill = miss_red, colour = paper, linewidth = 0.2) + geom_vline(xintercept = max(samp), colour = ink, linewidth = 0.8) + labs(x = "bootstrapped maximum", y = "replicate count", title = "A statistic the bootstrap cannot handle") + theme_te ``` Only `r length(unique(bMax$t))` distinct values appear across 3000 resamples, every one at or below the observed maximum of `r round(max(samp), 1)`. The distribution is truncated by construction, and any interval read from it is meaningless. The same caution applies, less dramatically, to minima, ranges, and any sample size small enough that a few influential points dominate every resample. ## What to carry forward Reach for the bootstrap when the statistic has no standard error formula, or when it has one that rests on an assumption your data break. Resample the natural unit: individuals for a pooled diversity index, whole rows for a regression, plots for a community-level summary. Prefer the BCa interval over the plain percentile interval, since it corrects for bias and skew at little extra cost, and use a few thousand resamples so the percentiles are stable. Keep its limits in mind: the bootstrap cannot invent data beyond what you observed, so it fails for extreme order statistics and gets shaky at very small sample sizes. Within those bounds it turns a hard theoretical problem into a short, honest computation. ## References Efron B, Tibshirani RJ 1993 An Introduction to the Bootstrap. Chapman and Hall. ISBN 9780412042317 DiCiccio TJ, Efron B 1996 Statistical Science 11(3):189-228 (10.1214/ss/1032280214) Davison AC, Hinkley DV 1997 Bootstrap Methods and their Application. Cambridge University Press. ISBN 9780521574716 ## Related tutorials - [Standard errors and confidence intervals](../standard-errors-confidence-intervals/) - [A reproducible statistical workflow](../reproducible-statistical-workflow/) - [Publication-quality ggplot figures](../publication-quality-ggplot-figures/)