When not to use the Shannon diversity index

diversity

ecology tutorial

vegan

The Shannon index blends richness and evenness into one number, which can mislead. When Hill numbers, evenness, or richness tell a clearer ecological story.

Author

Tidy Ecology

Published

2026-06-30

The Shannon index is the reflex choice for summarising diversity, and that is part of the problem. It compresses two distinct properties of a community, how many species are present and how evenly the individuals are spread among them, into a single number. When those two properties move together the summary is fine. When they move in opposite directions, a single Shannon value can hide the very difference you are trying to detect, and can even rank communities backwards. This post shows that failure with a worked example, then points to the tools that keep richness and evenness visible.

library(vegan)
library(ggplot2)

paper <- "#f5f4ee"; ink <- "#16241d"; forest <- "#275139"
gold  <- "#cda23f"; abandoned <- "#b5534e"; faint <- "#5d6b61"; line <- "#dad9ca"

theme_te <- theme_minimal(base_size = 12) +
  theme(panel.grid.minor = element_blank(),
        panel.grid.major = element_line(colour = line, linewidth = 0.3),
        plot.background = element_rect(fill = paper, colour = NA),
        panel.background = element_rect(fill = paper, colour = NA),
        axis.title = element_text(colour = ink),
        axis.text = element_text(colour = faint),
        plot.title = element_text(colour = ink, face = "bold"))

Two communities, one misleading number

Consider two sampled communities. Community A is species-rich but dominated by one abundant species. Community B has fewer species but spreads its individuals almost evenly.

A <- c(50, 25, 12, 6, 3, 2, 1, 1)   # 8 species, uneven
B <- c(22, 20, 19, 18, 16)          # 5 species, even

H <- function(x) { p <- x / sum(x); -sum(p * log(p)) }

c(shannon_A = H(A), shannon_B = H(B),
  richness_A = sum(A > 0), richness_B = sum(B > 0),
  evenness_A = H(A) / log(sum(A > 0)),
  evenness_B = H(B) / log(sum(B > 0)))

 shannon_A  shannon_B richness_A richness_B evenness_A evenness_B 
 1.3919240  1.6038766  8.0000000  5.0000000  0.6693740  0.9965445

Community A holds eight species, Community B only five, so A is plainly richer. Yet its Shannon index, 1.39, is lower than B’s 1.6. By Shannon alone you would call B the more diverse community and never learn that A has more species. The reason sits in the evenness column: A’s Pielou evenness is 0.67 against B’s 1. Shannon weighs evenness heavily, so B’s near-perfect balance outweighs A’s extra species. Whether that is the right conclusion depends entirely on the ecological question, and the single number does not let you ask it.

Hill numbers keep both properties visible

The cleaner approach reports diversity as a function of one parameter, the order q, which sets how much rare species count. Hill numbers express each value as an effective number of species, the number of equally-common species that would give the same diversity. At q = 0 rare species count fully and the Hill number is simply richness. At q = 1 species count in proportion to their abundance, which is the exponential of the Shannon index. At q = 2 common species dominate, giving the inverse Simpson concentration. Plotting the whole profile shows the trade-off directly.

qfull <- seq(0, 3, by = 0.05)
prof <- rbind(
  data.frame(q = qfull, D = as.numeric(renyi(A, scales = qfull, hill = TRUE)),
             community = "A: rich, uneven"),
  data.frame(q = qfull, D = as.numeric(renyi(B, scales = qfull, hill = TRUE)),
             community = "B: poor, even")
)

ggplot(prof, aes(q, D, colour = community)) +
  geom_line(linewidth = 1) +
  geom_vline(xintercept = 0.6, colour = faint, linetype = "dashed", linewidth = 0.4) +
  scale_colour_manual(values = c("A: rich, uneven" = abandoned,
                                 "B: poor, even" = forest)) +
  labs(x = "order q (weight on common species)",
       y = "effective number of species",
       colour = NULL, title = "Which community is more diverse? It depends on q") +
  theme_te + theme(legend.position = "top")

Figure 1: Diversity profiles for the two communities as a function of the order q. The richer but uneven community A starts higher at q = 0, but the curves cross near q = 0.6, after which the even community B is more diverse. No single value of q, and so no single index, settles which community wins.

The crossing is the whole point. At q = 0 community A leads, because it has more species. By q = 2 community B leads, because its individuals are evenly distributed. A single index picks one point on the horizontal axis and pretends the rest does not exist. Reporting the profile, or at least the trio of richness, exponential Shannon, and inverse Simpson, makes the trade-off explicit and reproducible.

Shannon also shifts with sample size

A second, quieter trap: the Shannon index is biased downward in small samples, because rare species go undetected and the estimate of evenness is distorted. Subsample one community at increasing effort and the estimate drifts upward toward its true value.

set.seed(88)
pool <- rep(seq_along(A), A * 40)        # large community with A's proportions
subsizes <- c(20, 50, 100, 300, 800)
res <- data.frame(
  n = subsizes,
  H = sapply(subsizes, function(n) {
    mean(replicate(200, { s <- sample(pool, n); p <- table(s) / n; -sum(p * log(p)) }))
  })
)

ggplot(res, aes(n, H)) +
  geom_hline(yintercept = H(A), colour = faint, linetype = "dashed") +
  geom_line(colour = gold, linewidth = 1) + geom_point(colour = gold, size = 2) +
  scale_x_log10(breaks = subsizes) +
  labs(x = "subsample size (log scale)", y = "estimated Shannon index",
       title = "Shannon drifts upward with sampling effort") +
  theme_te

A rising curve of estimated Shannon index against subsample size on a log axis, climbing from about one point two and flattening near one point four, with a horizontal line at the true value. — Figure 2: Shannon index of subsamples drawn from a fixed community at increasing sample size. The estimate climbs from well below the true value and only stabilises with several hundred individuals. Comparing Shannon across samples of different sizes is comparing the sampling effort as much as the diversity.

The estimate starts near 1.22 at twenty individuals and only reaches the true 1.39 after several hundred. If two sites were sampled with different effort, a difference in Shannon could be nothing more than a difference in how hard you looked.

What to do instead

None of this means abandoning Shannon, which is a reasonable q = 1 summary when communities are sampled comparably and you want one evenness-sensitive number. It means not leaning on it alone. Report richness and an evenness measure separately, so the two properties stay legible. Better still, report Hill numbers at q = 0, 1, and 2, or the full diversity profile, which expresses everything in the same interpretable unit of effective species and reveals crossings a single index would bury. And when comparing across samples, standardise the effort first, by rarefaction or coverage, so the comparison is of diversity rather than of sampling. The extra columns cost almost nothing and stop a single number from quietly making the decision for you.

References

Hill MO 1973 Ecology 54(2):427-432 (10.2307/1934352)

Jost L 2006 Oikos 113(2):363-375 (10.1111/j.2006.0030-1299.14714.x)

--- title: "When not to use the Shannon diversity index" description: "The Shannon index blends richness and evenness into one number, which can mislead. When Hill numbers, evenness, or richness tell a clearer ecological story." date: "2026-06-30 12:00" categories: [diversity, ecology tutorial, vegan, R] image: thumbnail.png image-alt: "Two diversity profiles plotted against the order q, crossing each other, showing that which community is more diverse depends on q." --- The Shannon index is the reflex choice for summarising diversity, and that is part of the problem. It compresses two distinct properties of a community, how many species are present and how evenly the individuals are spread among them, into a single number. When those two properties move together the summary is fine. When they move in opposite directions, a single Shannon value can hide the very difference you are trying to detect, and can even rank communities backwards. This post shows that failure with a worked example, then points to the tools that keep richness and evenness visible. ```{r} #| label: setup #| message: false #| warning: false library(vegan) library(ggplot2) paper <- "#f5f4ee"; ink <- "#16241d"; forest <- "#275139" gold <- "#cda23f"; abandoned <- "#b5534e"; faint <- "#5d6b61"; line <- "#dad9ca" theme_te <- theme_minimal(base_size = 12) + theme(panel.grid.minor = element_blank(), panel.grid.major = element_line(colour = line, linewidth = 0.3), plot.background = element_rect(fill = paper, colour = NA), panel.background = element_rect(fill = paper, colour = NA), axis.title = element_text(colour = ink), axis.text = element_text(colour = faint), plot.title = element_text(colour = ink, face = "bold")) ``` ## Two communities, one misleading number Consider two sampled communities. Community A is species-rich but dominated by one abundant species. Community B has fewer species but spreads its individuals almost evenly. ```{r} #| label: two-communities A <- c(50, 25, 12, 6, 3, 2, 1, 1) # 8 species, uneven B <- c(22, 20, 19, 18, 16) # 5 species, even H <- function(x) { p <- x / sum(x); -sum(p * log(p)) } c(shannon_A = H(A), shannon_B = H(B), richness_A = sum(A > 0), richness_B = sum(B > 0), evenness_A = H(A) / log(sum(A > 0)), evenness_B = H(B) / log(sum(B > 0))) ``` Community A holds eight species, Community B only five, so A is plainly richer. Yet its Shannon index, `r round(H(A), 2)`, is lower than B's `r round(H(B), 2)`. By Shannon alone you would call B the more diverse community and never learn that A has more species. The reason sits in the evenness column: A's Pielou evenness is `r round(H(A)/log(sum(A>0)), 2)` against B's `r round(H(B)/log(sum(B>0)), 2)`. Shannon weighs evenness heavily, so B's near-perfect balance outweighs A's extra species. Whether that is the right conclusion depends entirely on the ecological question, and the single number does not let you ask it. ## Hill numbers keep both properties visible The cleaner approach reports diversity as a function of one parameter, the order q, which sets how much rare species count. Hill numbers express each value as an effective number of species, the number of equally-common species that would give the same diversity. At q = 0 rare species count fully and the Hill number is simply richness. At q = 1 species count in proportion to their abundance, which is the exponential of the Shannon index. At q = 2 common species dominate, giving the inverse Simpson concentration. Plotting the whole profile shows the trade-off directly. ```{r} #| label: fig-profile #| fig-cap: "Diversity profiles for the two communities as a function of the order q. The richer but uneven community A starts higher at q = 0, but the curves cross near q = 0.6, after which the even community B is more diverse. No single value of q, and so no single index, settles which community wins." #| fig-alt: "Two descending curves of effective number of species against q from zero to three. The community A curve starts at eight and falls steeply; the community B curve starts at five and is nearly flat; they cross close to q equals zero point six." #| fig-width: 7 #| fig-height: 4.4 qfull <- seq(0, 3, by = 0.05) prof <- rbind( data.frame(q = qfull, D = as.numeric(renyi(A, scales = qfull, hill = TRUE)), community = "A: rich, uneven"), data.frame(q = qfull, D = as.numeric(renyi(B, scales = qfull, hill = TRUE)), community = "B: poor, even") ) ggplot(prof, aes(q, D, colour = community)) + geom_line(linewidth = 1) + geom_vline(xintercept = 0.6, colour = faint, linetype = "dashed", linewidth = 0.4) + scale_colour_manual(values = c("A: rich, uneven" = abandoned, "B: poor, even" = forest)) + labs(x = "order q (weight on common species)", y = "effective number of species", colour = NULL, title = "Which community is more diverse? It depends on q") + theme_te + theme(legend.position = "top") ``` The crossing is the whole point. At q = 0 community A leads, because it has more species. By q = 2 community B leads, because its individuals are evenly distributed. A single index picks one point on the horizontal axis and pretends the rest does not exist. Reporting the profile, or at least the trio of richness, exponential Shannon, and inverse Simpson, makes the trade-off explicit and reproducible. ## Shannon also shifts with sample size A second, quieter trap: the Shannon index is biased downward in small samples, because rare species go undetected and the estimate of evenness is distorted. Subsample one community at increasing effort and the estimate drifts upward toward its true value. ```{r} #| label: fig-samplesize #| fig-cap: "Shannon index of subsamples drawn from a fixed community at increasing sample size. The estimate climbs from well below the true value and only stabilises with several hundred individuals. Comparing Shannon across samples of different sizes is comparing the sampling effort as much as the diversity." #| fig-alt: "A rising curve of estimated Shannon index against subsample size on a log axis, climbing from about one point two and flattening near one point four, with a horizontal line at the true value." #| fig-width: 7 #| fig-height: 3.8 set.seed(88) pool <- rep(seq_along(A), A * 40) # large community with A's proportions subsizes <- c(20, 50, 100, 300, 800) res <- data.frame( n = subsizes, H = sapply(subsizes, function(n) { mean(replicate(200, { s <- sample(pool, n); p <- table(s) / n; -sum(p * log(p)) })) }) ) ggplot(res, aes(n, H)) + geom_hline(yintercept = H(A), colour = faint, linetype = "dashed") + geom_line(colour = gold, linewidth = 1) + geom_point(colour = gold, size = 2) + scale_x_log10(breaks = subsizes) + labs(x = "subsample size (log scale)", y = "estimated Shannon index", title = "Shannon drifts upward with sampling effort") + theme_te ``` The estimate starts near `r round(res$H[1], 2)` at twenty individuals and only reaches the true `r round(H(A), 2)` after several hundred. If two sites were sampled with different effort, a difference in Shannon could be nothing more than a difference in how hard you looked. ## What to do instead None of this means abandoning Shannon, which is a reasonable q = 1 summary when communities are sampled comparably and you want one evenness-sensitive number. It means not leaning on it alone. Report richness and an evenness measure separately, so the two properties stay legible. Better still, report Hill numbers at q = 0, 1, and 2, or the full diversity profile, which expresses everything in the same interpretable unit of effective species and reveals crossings a single index would bury. And when comparing across samples, standardise the effort first, by rarefaction or coverage, so the comparison is of diversity rather than of sampling. The extra columns cost almost nothing and stop a single number from quietly making the decision for you. ## References Hill MO 1973 Ecology 54(2):427-432 (10.2307/1934352) Jost L 2006 Oikos 113(2):363-375 (10.1111/j.2006.0030-1299.14714.x) ## Related tutorials - [Bootstrap confidence intervals](../bootstrap-confidence-intervals/) - [Diversity indices in R](../diversity-indices-in-r/) - [Rarefaction and accumulation curves](../rarefaction-accumulation-curves/) - [Species abundance distributions](../species-abundance-distributions-radfit/)