Computing diversity indices in R with vegan

biodiversity

tutorial

Species richness, Shannon and Simpson from a community matrix, with code you can reuse.

Author

Tidy Ecology

Published

2026-06-18

Alpha diversity, meaning how diverse a single site or sample is, is usually the first number an ecologist wants out of a species-abundance table. This post covers the three indices you’ll reach for most often: species richness, the Shannon index, and the Simpson index, all computed with the vegan package.

We’ll go from a raw community matrix to a tidy results table and a finished plot.

A community matrix

vegan expects a community matrix: one row per site (or sample), one column per species, cells holding abundances (counts or cover). Here’s a small worked example, with counts of five plant species across four meadow plots:

library(vegan)
library(dplyr)
library(ggplot2)

community <- data.frame(
  row.names = c("Plot_A", "Plot_B", "Plot_C", "Plot_D"),
  Festuca   = c(10,  0, 25,  4),
  Trifolium = c( 8, 12,  2,  5),
  Plantago  = c( 5,  3,  1,  6),
  Achillea  = c( 2,  9,  0,  5),
  Lotus     = c( 0, 15,  0,  4)
)

community

       Festuca Trifolium Plantago Achillea Lotus
Plot_A      10         8        5        2     0
Plot_B       0        12        3        9    15
Plot_C      25         2        1        0     0
Plot_D       4         5        6        5     4

Each row is a plot; each column a species. Plot_C is dominated by Festuca (25 of its 28 individuals), while Plot_D spreads its 24 individuals almost evenly across all five species.
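To see that dominance and evenness directly, it can help to turn counts into relative abundances before computing any index. The snippet below is a minimal base-R sketch, not part of vegan's workflow; the names totals, rel_abund and dominance are illustrative only, and the toy matrix is repeated so the snippet runs on its own.

```r
# Same toy matrix as above, repeated so this snippet is self-contained
community <- data.frame(
  row.names = c("Plot_A", "Plot_B", "Plot_C", "Plot_D"),
  Festuca   = c(10,  0, 25,  4),
  Trifolium = c( 8, 12,  2,  5),
  Plantago  = c( 5,  3,  1,  6),
  Achillea  = c( 2,  9,  0,  5),
  Lotus     = c( 0, 15,  0,  4)
)

# Total number of individuals counted in each plot
totals <- rowSums(community)

# Relative abundances: each row divided by its own total, so every row sums to 1
rel_abund <- prop.table(as.matrix(community), margin = 1)
round(rel_abund, 2)

# Share of each plot held by its single most abundant species
dominance <- apply(rel_abund, 1, max)
round(dominance, 2)
```

The most abundant species holds about 0.89 of Plot_C but only 0.25 of Plot_D, which is the contrast the indices below pick up.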

The three indices

vegan gives you each of these in a single call:

richness <- specnumber(community)                 # number of species present
shannon  <- diversity(community, index = "shannon")
simpson  <- diversity(community, index = "simpson")

richness

Plot_A Plot_B Plot_C Plot_D 
     4      4      3      5

shannon

   Plot_A    Plot_B    Plot_C    Plot_D 
1.2550811 1.2658568 0.4086977 1.5974167

simpson

   Plot_A    Plot_B    Plot_C    Plot_D 
0.6912000 0.6982249 0.1964286 0.7951389

A few things worth knowing:

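specnumber() counts the species with non-zero abundance in each row. It’s the most basic measure, and blind to abundance.
Shannon (H′) rises with both richness and evenness. diversity() uses natural logarithms by default; for real communities H′ usually falls between about 1.5 and 3.5, whereas the toy plots here, with at most five species, cannot exceed ln(5) ≈ 1.61.
Simpson is returned here as 1 − D, where D is the sum of squared species proportions: the probability that two individuals drawn at random (with replacement) belong to different species. It runs from 0 to 1 and is less sensitive to rare species than Shannon.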
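If you want to see what sits behind those two calls, both indices can be reproduced in a few lines of base R. This is a sketch of the textbook formulas (H′ as minus the sum of p × ln p, and Simpson as 1 minus the sum of p²), not vegan's internal code; the helper names shannon_by_hand and simpson_by_hand are made up for illustration.

```r
# Same toy matrix as above, repeated so this snippet is self-contained
community <- data.frame(
  row.names = c("Plot_A", "Plot_B", "Plot_C", "Plot_D"),
  Festuca   = c(10,  0, 25,  4),
  Trifolium = c( 8, 12,  2,  5),
  Plantago  = c( 5,  3,  1,  6),
  Achillea  = c( 2,  9,  0,  5),
  Lotus     = c( 0, 15,  0,  4)
)

# Hypothetical helper: Shannon index from one row of counts
shannon_by_hand <- function(counts) {
  p <- counts[counts > 0] / sum(counts)  # proportions; absent species are dropped
  -sum(p * log(p))                       # natural log
}

# Hypothetical helper: Simpson index in its 1 - D form
simpson_by_hand <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Apply each helper row by row (one value per plot)
h <- apply(community, 1, shannon_by_hand)
s <- apply(community, 1, simpson_by_hand)

round(h, 4)
round(s, 4)
```

Rounded to four decimals, the results agree with the shannon and simpson vectors printed above.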

A tidy results table

Let’s pull everything into one data frame, rounded for reading:

diversity_summary <- tibble(
  plot     = rownames(community),
  richness = richness,
  shannon  = round(shannon, 2),
  simpson  = round(simpson, 2)
) |>
  arrange(desc(shannon))

diversity_summary

# A tibble: 4 × 4
  plot   richness shannon simpson
  <chr>     <int>   <dbl>   <dbl>
1 Plot_D        5    1.6     0.8 
2 Plot_B        4    1.27    0.7 
3 Plot_A        4    1.26    0.69
4 Plot_C        3    0.41    0.2

Plot_D comes out most diverse. It carries all five species at similar abundances, so its evenness is high. Plot_A and Plot_B land in the middle and almost tie (1.26 vs 1.27), despite very different species makeups. Plot_C, dominated by a single grass, scores lowest: it has only one species fewer than Plot_A and Plot_B, yet its Shannon value is about a third of theirs. That gap between richness and Shannon is the evenness effect at work.

A plot

Finally, the bit you’ll actually paste into a report: Shannon diversity per plot.

ggplot(diversity_summary, aes(x = reorder(plot, shannon), y = shannon)) +
  geom_col(fill = "#2f6b3e", width = 0.7) +
  geom_text(aes(label = shannon), hjust = -0.2, size = 3.5) +
  coord_flip() +
  labs(x = NULL, y = "Shannon diversity (H′)") +
  ylim(0, max(diversity_summary$shannon) * 1.15) +
  theme_minimal(base_size = 12)

Figure 1: Shannon diversity (H′) across the four meadow plots.

Where to go next

This is the foundation. Realistic next steps:

swap the toy matrix for your own data, read in with read.csv() into the same site-by-species shape;
add Pielou’s evenness (shannon / log(richness)) to separate richness from evenness explicitly;
once you have many sites, move from alpha to beta diversity and ordination (vegdist(), metaMDS()).
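The first two steps might look something like the sketch below. It assumes your CSV has the site names in its first column and one column per species; the inline csv_text simply stands in for a real file so the example runs as-is, and the file name in the comment is a placeholder.

```r
library(vegan)

# Stand-in for a real file. With your own data you would call something like
# read.csv("your_file.csv", row.names = 1), where the file name is a placeholder.
csv_text <- "plot,Festuca,Trifolium,Plantago,Achillea,Lotus
Plot_A,10,8,5,2,0
Plot_B,0,12,3,9,15
Plot_C,25,2,1,0,0
Plot_D,4,5,6,5,4"

# row.names = 1 moves the first column into the row names,
# leaving a purely numeric site-by-species table
community <- read.csv(text = csv_text, row.names = 1)

shannon  <- diversity(community, index = "shannon")
richness <- specnumber(community)

# Pielou's evenness: observed Shannon relative to its maximum, log(richness)
pielou <- shannon / log(richness)
pielou[richness < 2] <- NA  # log(1) = 0, so J is undefined for single-species plots

round(pielou, 2)
```

On the toy data Plot_D comes out close to 1 and Plot_C far below the other three, which puts a number on the evenness contrast discussed earlier.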

Ordination is the next post. If you hit a snag adapting this to your own data, let me know.