Fitting species abundance distributions in R

R
community ecology
biodiversity
vegan
Fit and compare rank-abundance (dominance/diversity) models in R with vegan’s radfit: niche pre-emption, lognormal, Zipf and Zipf-Mandelbrot, with AIC model selection and Preston octave plots for community ecology.
Author

Tidy Ecology

Published

2026-06-24

A diversity index turns a whole community into one number. Shannon, Simpson and evenness are useful, but they throw away the shape of the abundance distribution: how steeply the common species tower over the rare ones, and whether there is a mode in the middle. That shape is the species abundance distribution (SAD), one of the oldest descriptions in community ecology (Fisher et al. 1943; Preston 1948).

This post fits parametric SAD models to ranked counts with vegan::radfit, compares them with AIC, and shows how two communities with similar species lists end up under different models. The point is not that a fitted model reveals a mechanism. It is that the rank-abundance curve carries structure that a single index hides, and that model selection gives a compact, comparable summary of that structure.

Two synthetic communities

The data are illustrative. One community has strong dominance, like an early-successional site where one or two species pre-empt most of the available space. The other is more even, with many species of intermediate abundance and a few rare ones, closer to a mature stand.

library(vegan)
library(ggplot2)
library(dplyr)
library(tidyr)

# Quarry: early-successional, niche pre-emption (geometric dominance).
# Each species takes a fixed fraction k of the resource left by the previous one.
set.seed(19)
S1 <- 15; k <- 0.38; Ntot1 <- 2600
rem <- 1; props <- numeric(S1)
for (i in 1:S1) { props[i] <- k * rem; rem <- rem - props[i] }
props <- props / sum(props)
quarry <- rpois(S1, props * Ntot1); quarry <- quarry[quarry > 0]

# Forest: mature, lognormal SAD (more even, with an internal mode in log abundance).
set.seed(204)
raw <- rlnorm(55, meanlog = 1.7, sdlog = 1.05)
forest <- round(raw * 4200 / sum(raw)); forest <- forest[forest > 0]

c(quarry_S = length(quarry), quarry_N = sum(quarry),
  forest_S = length(forest), forest_N = sum(forest))
quarry_S quarry_N forest_S forest_N 
      15     2556       55     4199 

The quarry holds 15 species in 2556 individuals; the forest holds 55 species in 4199. Sorted from most to least abundant, the quarry falls away fast (951 down to a single individual), while the forest descends more gently.

What the indices say, and what they hide

idx <- function(x) {
  H <- diversity(x, "shannon")
  data.frame(
    S = sum(x > 0), N = sum(x),
    Shannon = round(H, 2),
    Simpson = round(diversity(x, "simpson"), 3),
    Pielou_J = round(H / log(sum(x > 0)), 2),
    Hill_q1 = round(exp(H), 1))
}
knitr::kable(rbind(Quarry = idx(quarry), Forest = idx(forest)))
S N Shannon Simpson Pielou_J Hill_q1
Quarry 15 2556 1.79 0.775 0.66 6.0
Forest 55 4199 3.51 0.953 0.88 33.5

The forest is more diverse and far more even (Pielou’s J of 0.88 against 0.66). These numbers tell you the forest is flatter, but they do not tell you the form of the flattening: whether the abundances decline geometrically, follow a lognormal hump, or fall on a power law. For that you need the curve itself.

Rank-abundance curves and the radfit models

A rank-abundance (dominance/diversity) plot puts species rank on the x-axis and abundance on a log y-axis (Whittaker 1965). radfit fits five models to these ranked counts as Poisson generalized linear models, choosing parameters by minimizing deviance (Wilson 1991):

  • Null (broken stick): no free parameters, a neutral partition of the resource axis.
  • Pre-emption (geometric series): one parameter, a constant fraction taken at each rank. A straight line on the log axis.
  • Lognormal: two parameters, a hump in log abundance.
  • Zipf: two parameters, a power law with a heavy tail of rare species.
  • Zipf-Mandelbrot: three parameters, a more flexible version of Zipf.
rq <- radfit(quarry)
rf <- radfit(forest)
rq

RAD models, family poisson 
No. of species 15, total abundance 2556

           par1     par2        par3        Deviance AIC      BIC     
Null                                        898.0054 980.0930 980.0930
Preemption  0.36481                           4.2137  88.3013  89.0093
Lognormal   4.0429   1.6287                 164.3905 250.4780 251.8941
Zipf        0.46207 -1.462                  409.4065 495.4940 496.9101
Mandelbrot    Inf   -1.2575e+06  2.7709e+06   4.2106  92.2982  94.4223
rf

RAD models, family poisson 
No. of species 55, total abundance 4199

           par1      par2     par3    Deviance AIC    BIC   
Null                                  328.26   639.40 639.40
Preemption  0.065396                  566.15   879.29 881.30
Lognormal   3.836     1.0223          160.36   475.50 479.52
Zipf        0.15122  -0.80827         348.64   663.78 667.79
Mandelbrot  0.35425  -1.0595   1.6867 300.84   617.98 624.01
ink <- "#16241d"; forest_col <- "#275139"; gold <- "#cda23f"; red <- "#b5534e"
faint <- "#5d6b61"; line_col <- "#dad9ca"; paper <- "#f5f4ee"

fitcurve <- function(r, model, site) {
  f <- r$models[[model]]$fitted.values
  data.frame(rank = seq_along(f), fit = f, model = model, site = site)
}
mods <- c("Preemption", "Lognormal", "Zipf")
fit <- bind_rows(
  lapply(mods, fitcurve, r = rq, site = "Quarry (pre-emption)"),
  lapply(mods, fitcurve, r = rf, site = "Forest (lognormal)"))
obs <- bind_rows(
  data.frame(rank = seq_along(quarry), abund = sort(quarry, decreasing = TRUE),
             site = "Quarry (pre-emption)"),
  data.frame(rank = seq_along(forest), abund = sort(forest, decreasing = TRUE),
             site = "Forest (lognormal)"))
lev <- c("Quarry (pre-emption)", "Forest (lognormal)")
fit$site <- factor(fit$site, lev); obs$site <- factor(obs$site, lev)
fit$model <- factor(fit$model, levels = mods)

te <- theme_minimal(base_size = 12) + theme(
  panel.grid.minor = element_blank(),
  panel.grid.major = element_line(color = line_col, linewidth = 0.3),
  plot.background = element_rect(fill = paper, color = NA),
  panel.background = element_rect(fill = paper, color = NA),
  strip.text = element_text(face = "bold", color = ink, hjust = 0),
  axis.title = element_text(color = "#46604a"),
  legend.position = "top", legend.title = element_blank())

ggplot() +
  geom_line(data = fit, aes(rank, fit, color = model), linewidth = 0.9) +
  geom_point(data = obs, aes(rank, abund), color = ink, size = 1.6) +
  facet_wrap(~site, scales = "free_x") +
  scale_y_log10(labels = scales::label_number()) +
  scale_color_manual(values = c(Preemption = forest_col, Lognormal = gold, Zipf = red)) +
  labs(x = "Species rank (most to least abundant)", y = "Abundance (log scale)") + te
Rank-abundance curves for two communities on a log axis with fitted SAD models: the quarry follows a straight pre-emption line while the forest bends toward a lognormal.
Figure 1: Rank-abundance curves with three fitted SAD models. The quarry follows a straight pre-emption line on the log axis; the forest bends away from it toward a lognormal.

For the quarry, pre-emption wins outright: an AIC of 88.3 against 250.5 for the lognormal and 980.1 for the null. The fitted fraction is about 0.36, so each species in rank order holds roughly a third of what the one above it held. On the log axis that is a straight line, and the points sit on it. For the forest the order reverses: the lognormal wins (AIC 475.5), the pre-emption model is the worst fit (879.3), and the points trace the gentle S-curve that a geometric series cannot follow.

The Preston octave plot

Preston (1948) binned species into octaves, doubling abundance classes (1, 2 to 3, 4 to 7, and so on). A lognormal community shows an internal mode: a peak at some intermediate octave, with fewer species both rarer and commoner. A geometric community has no such peak; species are spread roughly evenly across octaves.

octdf <- function(x, site) {
  cl <- floor(log2(sort(x, decreasing = TRUE)))
  tb <- tabulate(cl + 1, nbins = max(cl) + 1)
  data.frame(octave = seq_along(tb) - 1, nsp = tb, site = site)
}
oct <- bind_rows(octdf(quarry, "Quarry (pre-emption)"),
                 octdf(forest, "Forest (lognormal)"))
oct$site <- factor(oct$site, lev)

ggplot(oct, aes(octave, nsp)) +
  geom_col(fill = forest_col, width = 0.85) +
  facet_wrap(~site) +
  scale_x_continuous(breaks = 0:9) +
  labs(x = expression(paste("Abundance octave (", log[2], " class)")),
       y = "Number of species") + te
Preston octave plots of species counts: the forest shows an internal peak near octave six while the quarry stays flat across octaves.
Figure 2: Preston octave plots. The forest has a clear internal mode near octave 6, the signature of a lognormal; the quarry is flat across octaves.

The forest piles up near octave 6, exactly where a lognormal puts its mode. The quarry has no peak, which is why the lognormal fits it poorly and the geometric line fits it well.

Model selection across all five models

Ranking by AIC makes the contrast explicit. The best model has a delta of zero; everything else is measured against it.

getaic <- function(r) sapply(r$models, function(m) tryCatch(AIC(m), error = function(e) NA))
aicdf <- bind_rows(
  data.frame(model = names(getaic(rq)), aic = getaic(rq), site = "Quarry (pre-emption)"),
  data.frame(model = names(getaic(rf)), aic = getaic(rf), site = "Forest (lognormal)")) %>%
  group_by(site) %>% mutate(dAIC = aic - min(aic, na.rm = TRUE), best = dAIC == 0) %>% ungroup()
aicdf$site <- factor(aicdf$site, lev)
aicdf$model <- factor(aicdf$model,
  levels = rev(c("Null", "Preemption", "Lognormal", "Zipf", "Mandelbrot")))

ggplot(aicdf, aes(dAIC, model)) +
  geom_segment(aes(x = 0, xend = dAIC, yend = model), color = line_col, linewidth = 1) +
  geom_point(aes(color = best), size = 3) +
  facet_wrap(~site, scales = "free_x") +
  scale_color_manual(values = c(`FALSE` = faint, `TRUE` = forest_col), guide = "none") +
  labs(x = expression(paste(Delta, "AIC (0 = best model)")), y = NULL) + te
Bar chart of delta-AIC by model, with pre-emption lowest for the quarry and lognormal lowest for the forest, and Zipf-Mandelbrot always close behind but never best.
Figure 3: Delta-AIC by model. Pre-emption is best for the quarry, lognormal for the forest. Zipf-Mandelbrot is close behind the winner each time but never wins, because AIC charges for its extra parameter.

One detail rewards attention. For the quarry, Zipf-Mandelbrot fits the data exactly as well as pre-emption: both reach a deviance of about 4.2. But Mandelbrot carries one extra parameter, so its AIC is 92.3 against 88.3, a delta of 4. Equal fit, worse score. This is parsimony in action: when two models describe the data equally well, the simpler one is preferred, and AIC does the bookkeeping. The extra flexibility of Mandelbrot only pays off in the forest, where the lognormal still beats it.

How far to read these fits

A fitted SAD model is a description, not a diagnosis. Many processes can generate the same curve, and the SAD on its own rarely discriminates among them (McGill et al. 2007). A geometric fit is consistent with sequential niche pre-emption, but it does not prove it; a lognormal can arise from the central limit theorem acting on multiplicative growth, with no special ecology required. Treat the winning model as a compact summary of community structure that lets you compare sites, not as evidence for a mechanism.

A few practical notes. The models are fitted to ranked counts, so they describe relative abundance, not absolute density; rescale before comparing communities sampled at different efforts. With strongly geometric data the three-parameter Mandelbrot model can fail to converge and radfit returns NA for it, which is informative rather than alarming: the data do not support the extra parameters. And radfit works on a single community vector here, but it also accepts a site-by-species matrix and fits every row at once, which is the usual way to compare many plots.

References

Fisher RA, Corbet AS, Williams CB (1943). The relation between the number of species and the number of individuals in a random sample of an animal population. Journal of Animal Ecology 12(1):42-58. doi:10.2307/1411

Preston FW (1948). The commonness and rarity of species. Ecology 29(3):254-283. doi:10.2307/1930989

Whittaker RH (1965). Dominance and diversity in land plant communities. Science 147(3655):250-260. doi:10.1126/science.147.3655.250

Wilson JB (1991). Methods for fitting dominance/diversity curves. Journal of Vegetation Science 2(1):35-46. doi:10.2307/3235896

McGill BJ, Etienne RS, Gray JS, et al. (2007). Species abundance distributions: moving beyond single prediction theories to integration within an ecological framework. Ecology Letters 10(10):995-1015. doi:10.1111/j.1461-0248.2007.01094.x

Oksanen J, Simpson G, Blanchet F, et al. (2022). vegan: Community Ecology Package. R package version 2.6-4.