Start here: ecological data analysis in R
This page is a reading order, not a feed. The tutorials below are grouped by the stage of a typical analysis, so you can follow a path instead of picking posts at random. Each section is roughly self-contained; jump to the stage you need, or read top to bottom.
If you are new to R for ecology, start with the foundations, then move to whichever data type you work with: community tables (diversity, ordination), counts and presence-absence (GLMs), grouped or repeated measurements (mixed models), or spatial layers (GIS).
Foundations
Get the basics of estimation and a reproducible setup in place before modelling anything.
- Standard errors and confidence intervals - what SE actually measures, t-based CIs for a mean, and why coverage matters.
- Bootstrap confidence intervals - intervals when there is no formula: percentile and BCa, with worked ecological cases.
- A reproducible statistical workflow - where to put set.seed, recording package versions with sessionInfo, relative paths, and clean-session rendering.
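As a taste of what the foundations posts cover, here is a minimal sketch in base R of a standard error, a t-based 95% interval for a mean, and a percentile bootstrap interval. The data are simulated and the object names are placeholders chosen for illustration; the posts themselves may structure this differently.

```r
# Simulated data; the values are illustrative, not taken from any post.
set.seed(42)                       # fix the RNG so the numbers reproduce
x <- rnorm(20, mean = 12, sd = 3)  # hypothetical measurements, n = 20

n      <- length(x)
se     <- sd(x) / sqrt(n)          # standard error of the mean
t_crit <- qt(0.975, df = n - 1)    # two-sided 95% critical value
ci_t   <- mean(x) + c(-1, 1) * t_crit * se

# Percentile bootstrap: resample with replacement, recompute the mean,
# then take the 2.5% and 97.5% quantiles of the resampled means.
boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))
ci_boot    <- unname(quantile(boot_means, c(0.025, 0.975)))

ci_t
ci_boot
```

Calling set.seed once at the top, before any random step, is what makes the bootstrap interval repeatable from run to run.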
Diversity and community description
Summarise what is in your samples: richness, evenness, and the structure of an assemblage.
- Diversity indices in R - richness, Shannon and Simpson from a site-by-species matrix.
- When not to use Shannon - why one number hides richness and evenness, and what Hill numbers give you instead.
- Rarefaction and accumulation curves - comparing richness fairly across uneven sampling effort.
- Species abundance distributions - rank-abundance and fitting SAD models.
- Functional diversity - diversity from continuous and categorical traits.
- Phylogenetic diversity with picante - Faith PD and mean pairwise distance.
- Beta diversity partitioning - splitting turnover from nestedness in the Baselga framework.
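To make the first two entries concrete, the sketch below computes richness, Shannon, Simpson and the matching Hill numbers from a small site-by-species matrix using base R only. The matrix is made up for illustration, and the Gini-Simpson form (1 minus the sum of squared proportions) is an assumption about which Simpson variant is meant.

```r
# Hypothetical site-by-species abundance matrix (rows = sites)
comm <- matrix(
  c(10, 10, 10, 10,
    37,  1,  1,  1,
     5,  0, 15,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("siteA", "siteB", "siteC"), paste0("sp", 1:4))
)

p <- comm / rowSums(comm)            # relative abundances within each site

richness <- rowSums(comm > 0)        # number of species present
shannon  <- apply(p, 1, function(v) -sum(v[v > 0] * log(v[v > 0])))
simpson  <- 1 - rowSums(p^2)         # Gini-Simpson form

# Hill numbers: effective numbers of species
hill_q1 <- exp(shannon)              # order q = 1
hill_q2 <- 1 / rowSums(p^2)          # order q = 2

round(cbind(richness, shannon, simpson, hill_q1, hill_q2), 3)
```

siteA and siteB have the same richness but very different evenness, which is the contrast that a single index tends to blur and that Hill numbers put on a common "effective species" scale.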
Ordination and multivariate structure
Explore gradients and group differences in multivariate community data.
- Choosing a dissimilarity index - the decision that sits underneath every ordination and PERMANOVA.
- NMDS ordination - non-metric ordination of community composition.
- PCA on environmental data - ordination for abiotic variables, with standardisation.
- Constrained ordination with dbRDA - relating composition to measured predictors.
- capscale vs dbRDA - the two distance-based constrained ordinations in vegan.
- RDA vs CCA and gradient length - choosing a linear or unimodal method from DCA gradient length.
- Variance partitioning - how much variation environment and space each explain.
- Non-linear gradients with ordisurf - a smooth GAM surface where a straight arrow would mislead.
- envfit and PERMANOVA - fitting vectors and testing group differences with adonis2 and betadisper.
- Pairwise PERMANOVA - which groups differ after a significant omnibus test.
- Common PERMANOVA mistakes - dispersion vs location, permutation structure, and unbalanced designs.
- Hierarchical clustering and dendrograms - grouping sites from a Bray-Curtis distance.
- Mantel tests - correlating two distance matrices.
- Indicator species analysis - which species characterise which groups.
- Co-occurrence null models - testing whether species associations differ from random.
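Most of the methods in this group start from a dissimilarity matrix. As a rough illustration, the sketch below computes Bray-Curtis dissimilarity by hand and feeds it to hierarchical clustering in base R. The site names and counts are invented, and in practice you would normally use vegan's vegdist rather than the hand-written helper shown here.

```r
# Hypothetical abundance data: two forest and two meadow sites
comm <- matrix(
  c(12, 8,  0, 0,
    10, 9,  1, 0,
     0, 1, 14, 6,
     1, 0, 11, 9),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("forest1", "forest2", "meadow1", "meadow2"),
                  paste0("sp", 1:4))
)

# Bray-Curtis between two abundance vectors: sum|a - b| / sum(a + b)
bray <- function(a, b) sum(abs(a - b)) / sum(a + b)

n <- nrow(comm)
d <- matrix(0, n, n, dimnames = list(rownames(comm), rownames(comm)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- bray(comm[i, ], comm[j, ])
  }
}

d_bc   <- as.dist(d)                       # lower-triangle distance object
hc     <- hclust(d_bc, method = "average") # UPGMA clustering
groups <- cutree(hc, k = 2)                # cut the tree into two groups
groups
```

Swapping the index (for example, to a presence-absence measure) can change the clustering, which is why the choice of dissimilarity is listed first in this section.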
Regression, GLMs and the modelling workflow
Model how a response depends on predictors, for continuous, count and presence-absence data.
- Classical tests as linear models - the t-test and ANOVA seen as one framework.
- Contrasts and post-hoc tests - which groups differ after a significant ANOVA.
- Logistic regression for presence-absence - binomial GLMs with a logit link.
- GLMs for count data - why not to log-transform counts, and what to fit instead.
- Zero-inflated count models - ZINB and hurdle models for excess zeros.
- Offsets for rates and densities - modelling per-effort rates correctly in a Poisson GLM.
- GAM species response curves - smooth response shapes with mgcv.
- GLM residual diagnostics - why raw Pearson and deviance residuals can mislead, and what to check instead.
- Predicting on the response scale - building CIs on the link scale, then back-transforming.
- Interaction terms in GLMs - reading an interaction through predictions, not coefficients.
- Contrasts after a GLM - marginal means and pairwise contrasts with multiplicity correction.
- Collinearity and VIF - how correlated predictors inflate standard errors.
- Model selection and AIC - what AIC measures and how to use it.
- Power analysis by simulation - estimating power when there is no closed-form formula.
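Two of the ideas above, offsets for effort and intervals built on the link scale, fit together in a few lines. The following is a minimal sketch on simulated data; the variable names (effort, canopy, count) are hypothetical, and the 1.96 multiplier assumes an approximate normal-based 95% interval.

```r
set.seed(1)
# Simulated survey data; variable names are hypothetical
n      <- 60
effort <- runif(n, 1, 5)                  # e.g. hours searched per visit
canopy <- runif(n, 0, 1)                  # predictor scaled 0 to 1
count  <- rpois(n, lambda = effort * exp(0.5 + 1.2 * canopy))

# offset(log(effort)) makes the model describe counts per unit effort
fit <- glm(count ~ canopy + offset(log(effort)), family = poisson)

# Predict the rate for one unit of effort across the canopy gradient
newd <- data.frame(canopy = seq(0, 1, by = 0.25), effort = 1)
pr   <- predict(fit, newdata = newd, type = "link", se.fit = TRUE)

# Build the interval on the log (link) scale, then back-transform
newd$rate  <- exp(pr$fit)
newd$lower <- exp(pr$fit - 1.96 * pr$se.fit)
newd$upper <- exp(pr$fit + 1.96 * pr$se.fit)
newd
```

Building the interval before back-transforming keeps the lower bound above zero, which an interval computed directly on the response scale does not guarantee.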
Spatial data and GIS
Work with coordinates, rasters and the link between QGIS and R.
- Richness mapping with sf - spatial vector work and mapping in R.
- Raster basics with terra - raster structure and map algebra.
- QGIS to R spatial join - a hybrid workflow between the QGIS GUI and sf.
- Spatial autocorrelation and Moran’s I - testing for spatial dependence with spdep.
- Kriging and spatial interpolation - continuous surfaces from scattered point samples.
Communicating results
- Publication-quality ggplot figures - physical size, resolution and clean export with ggsave.
Common pitfalls
Several posts are dedicated to mistakes that are easy to make and hard to spot. If something looks wrong, reread these: pseudoreplication, common PERMANOVA mistakes, collinearity and VIF, when not to use Shannon, GLM residual diagnostics, and offsets for rates and densities.