Estimating population trends in R

population dynamics

time series

GAMs

mgcv

ecology tutorial

Why one linear rate can hide a population crash and recovery, using a GAM in R to show trend shape, and why count data need a count model rather than log-OLS.

Author

Tidy Ecology

Published

2026-07-08

A population trend is often reported as a single number: so many percent per year. That summary answers one question, the average rate of change, but it quietly assumes the change has a constant rate. When a population declines and then recovers, or holds steady and then collapses, the single rate can be close to zero while the trajectory tells a story a manager would need to act on. This post fits a smooth trend with a generalised additive model in R to recover the shape a linear rate throws away, and then makes a second point that matters for monitoring data: counts should be modelled as counts, not as log-transformed responses in an ordinary regression.

A trend that averages to nothing

Consider thirty years of counts from a population that falls to less than half its size and then climbs back. The true mean follows a smooth dip, from about ninety down to the high thirties near the middle and back to ninety, and the counts are Poisson draws around it.

T<-30; year<-1:T
logmu <- log(90) - 0.9*sin(pi*(year-1)/(T-1))   # dip to the high 30s, then recover
set.seed(311); count <- rpois(T, exp(logmu))
dat <- data.frame(year, count)

Fit two trend models. The first is a log-linear Poisson regression, the constant-rate trend that a single percentage per year comes from. The second is a Poisson generalised additive model with a smooth term in year, whose wiggliness is chosen by restricted maximum likelihood rather than fixed in advance.

glin <- glm(count ~ year, family=poisson, data=dat)
gam1 <- gam(count ~ s(year), family=poisson, data=dat, method="REML")

sl <- summary(glin)$coefficients["year", ]
edf <- sum(gam1$edf[-1]); de <- summary(gam1)$dev.expl
c(pct_per_yr = round(100*(exp(sl["Estimate"])-1), 2),
  lin_p      = round(sl["Pr(>|z|)"], 2),
  gam_edf    = round(edf, 1),
  gam_dev_expl = round(de, 2),
  dAIC       = round(AIC(glin)-AIC(gam1), 0))

pct_per_yr.Estimate      lin_p.Pr(>|z|)             gam_edf        gam_dev_expl 
               0.02                0.95                4.30                0.86 
               dAIC 
             162.00

The linear model reports a change of $0.02\%$ per year, with a $p$ value of $0.95$: essentially zero on the log scale. Read on its own that is a flat line, no trend, a stable population. The additive model tells a different story. Its smooth term uses about $4.3$ effective degrees of freedom, explains $86\%$ of the deviance, and improves the AIC by $162$ over the straight line. The population did not sit still; it crashed and recovered, and only the model that allows a shape can see it.

gg <- data.frame(year=seq(1,T,length.out=200))
pr <- predict(gam1, gg, type="link", se.fit=TRUE)
gg$fit<-exp(pr$fit); gg$lo<-exp(pr$fit-1.96*pr$se.fit); gg$hi<-exp(pr$fit+1.96*pr$se.fit)
gl <- data.frame(year=seq(1,T,length.out=200)); gl$fit <- exp(predict(glin, gl))
ggplot()+
  geom_ribbon(data=gg, aes(year, ymin=lo, ymax=hi), fill=forest, alpha=.15)+
  geom_point(data=dat, aes(year, count), colour=ink, size=2, alpha=.85)+
  geom_line(data=gl, aes(year, fit, colour="Linear trend"), linewidth=1)+
  geom_line(data=gg, aes(year, fit, colour="GAM smooth"), linewidth=1)+
  scale_colour_manual(values=c("Linear trend"=gold,"GAM smooth"=forest))+
  labs(x="year", y="count", colour=NULL,
       title="A flat linear trend hides a crash and recovery")+
  theme_te()

Scatter of counts over thirty years, a nearly flat gold linear trend line, and a green GAM smooth that dips near year 16 and climbs back, with a shaded confidence band. — Figure 1: Monitoring counts with two fitted trends. The linear Poisson fit is nearly flat and reports no trend, while the GAM smooth (with a 95 percent band) recovers the decline and recovery.

The smooth is not a licence to chase every wiggle. The penalty on its wiggliness, tuned by restricted maximum likelihood, is what keeps it from interpolating the noise, and the same fit collapses back towards a straight line when the data really are linear, because a straight line is the smoothest option available to it. The effective degrees of freedom, here about four, report how much curvature the data actually support. This flexible-trend approach is the basis of the generalised additive models used for national bird monitoring by Fewster and colleagues.

Model counts as counts

The second point is about the response, not its shape. Monitoring counts are non-negative integers, often small, sometimes zero, and their variance grows with their mean. A common habit is to take logs and run an ordinary regression, but $\log 0$ is undefined, so people add a constant first, and that constant is an arbitrary choice that changes the answer. Take a shorter series of low counts with a genuine decline and a couple of zeros.

T2<-25; yr2<-1:T2; logmu2 <- log(9) - 0.075*(yr2-1)
set.seed(69); y2 <- rpois(T2, exp(logmu2))
dat2 <- data.frame(year=yr2, count=y2)

gp <- glm(count ~ year, family=poisson, data=dat2)         # count model
o1 <- lm(log(count+1)   ~ year, data=dat2)                 # log(y+1) OLS
o5 <- lm(log(count+0.5) ~ year, data=dat2)                 # log(y+0.5) OLS
round(c(n_zeros=sum(y2==0),
        poisson_pct=100*(exp(coef(gp)["year"])-1),
        logp1_pct=100*(exp(coef(o1)["year"])-1),
        logp0.5_pct=100*(exp(coef(o5)["year"])-1)), 2)

         n_zeros poisson_pct.year   logp1_pct.year logp0.5_pct.year 
            2.00            -9.50            -7.59            -8.97

The Poisson model estimates a decline of about $9.5\%$ per year and needs no fudge for the two zeros. The log-transformed regressions give $7.6\%$ per year when the added constant is one and $9.0\%$ per year when it is a half: a swing of roughly a fifth of the estimate, driven entirely by a number the analyst picked to avoid taking the log of zero. The count model has no such lever. It handles zeros directly, its log link keeps predictions positive, and its Poisson variance matches the way count scatter grows with the mean, none of which the log-plus-constant regression gets right.

gg2 <- data.frame(year=seq(1,T2,length.out=200))
gg2$pois <- exp(predict(gp, gg2))
gg2$logols <- exp(predict(o1, gg2)) - 1
ggplot()+
  geom_point(data=dat2, aes(year, count), colour=ink, size=2, alpha=.85)+
  geom_line(data=gg2, aes(year, pois, colour="Poisson GLM"), linewidth=1)+
  geom_line(data=gg2, aes(year, logols, colour="log(y+1) OLS"), linewidth=1, linetype="22")+
  scale_colour_manual(values=c("Poisson GLM"=forest,"log(y+1) OLS"=aban))+
  labs(x="year", y="count", colour=NULL,
       title="Count model against log-transformed regression")+
  theme_te()

Scatter of low counts over twenty-five years with two zeros, a green Poisson GLM decline and a red dashed log-transformed regression decline that sit apart from each other. — Figure 2: Low counts with two fitted declines. The Poisson GLM and the log(y+1) regression diverge, because the added constant needed to log the zeros shifts the transformed fit.

What to take from this

Two habits are worth keeping. First, before quoting a single rate of change, fit a smooth trend and look at the shape; if the smooth is close to a straight line the single rate is a fair summary, and if it is not, the rate is hiding something. Second, model counts with a count distribution, moving to a negative binomial or a mixed model when the scatter is wider than Poisson allows, rather than logging the counts and reaching for an ordinary regression. Neither step is more work than the analysis it replaces, and both keep the trend honest. The rate of change also says nothing about the mechanism behind it, which is where the earlier tools for density dependence and process versus observation error come back in.

References

Fewster RM, Buckland ST, Siriwardena GM, Baillie SR & Wilson JD 2000 Ecology 81(7):1970-1984 (10.1890/0012-9658(2000)081[1970:AOPTFF]2.0.CO;2)

Wood SN 2011 Journal of the Royal Statistical Society Series B 73(1):3-36 (10.1111/j.1467-9868.2010.00749.x)

O’Hara RB & Kotze DJ 2010 Methods in Ecology and Evolution 1(2):118-122 (10.1111/j.2041-210X.2010.00021.x)

--- title: "Estimating population trends in R" description: "Why one linear rate can hide a population crash and recovery, using a GAM in R to show trend shape, and why count data need a count model rather than log-OLS." date: "2026-07-08 11:00" categories: [R, population dynamics, time series, GAMs, mgcv, ecology tutorial] image: thumbnail.png image-alt: "Monitoring counts that dip and recover, with a flat linear trend line and a curved GAM smooth revealing the trajectory." --- A population trend is often reported as a single number: so many percent per year. That summary answers one question, the average rate of change, but it quietly assumes the change has a constant rate. When a population declines and then recovers, or holds steady and then collapses, the single rate can be close to zero while the trajectory tells a story a manager would need to act on. This post fits a smooth trend with a generalised additive model in R to recover the shape a linear rate throws away, and then makes a second point that matters for monitoring data: counts should be modelled as counts, not as log-transformed responses in an ordinary regression. ```{r} #| label: setup #| include: false suppressPackageStartupMessages({library(mgcv); library(ggplot2); library(dplyr)}) ink<-"#16241d"; body<-"#2c3a31"; forest<-"#275139"; gold<-"#c9b458" paper<-"#f5f4ee"; line<-"#dad9ca"; faint<-"#5d6b61"; aban<-"#b5534e"; mown<-"#2f8f63" theme_te <- function(){ theme_minimal(base_size=12)+ theme(plot.background=element_rect(fill=paper,colour=NA), panel.background=element_rect(fill=paper,colour=NA), panel.grid.minor=element_blank(), panel.grid.major=element_line(colour=line,linewidth=.3), axis.title=element_text(colour=ink), axis.text=element_text(colour=faint), plot.title=element_text(colour=ink,size=12,face="bold"), strip.text=element_text(colour=ink,face="bold"), legend.position="bottom", legend.title=element_text(colour=ink), legend.text=element_text(colour=body)) } ``` ## A trend that averages to nothing Consider thirty years of counts from a population that falls to less than half its size and then climbs back. The true mean follows a smooth dip, from about ninety down to the high thirties near the middle and back to ninety, and the counts are Poisson draws around it. ```{r} #| label: simulate T<-30; year<-1:T logmu <- log(90) - 0.9*sin(pi*(year-1)/(T-1)) # dip to the high 30s, then recover set.seed(311); count <- rpois(T, exp(logmu)) dat <- data.frame(year, count) ``` Fit two trend models. The first is a log-linear Poisson regression, the constant-rate trend that a single percentage per year comes from. The second is a Poisson generalised additive model with a smooth term in year, whose wiggliness is chosen by restricted maximum likelihood rather than fixed in advance. ```{r} #| label: fit glin <- glm(count ~ year, family=poisson, data=dat) gam1 <- gam(count ~ s(year), family=poisson, data=dat, method="REML") sl <- summary(glin)$coefficients["year", ] edf <- sum(gam1$edf[-1]); de <- summary(gam1)$dev.expl c(pct_per_yr = round(100*(exp(sl["Estimate"])-1), 2), lin_p = round(sl["Pr(>|z|)"], 2), gam_edf = round(edf, 1), gam_dev_expl = round(de, 2), dAIC = round(AIC(glin)-AIC(gam1), 0)) ``` The linear model reports a change of $0.02\%$ per year, with a $p$ value of $0.95$: essentially zero on the log scale. Read on its own that is a flat line, no trend, a stable population. The additive model tells a different story. Its smooth term uses about $4.3$ effective degrees of freedom, explains $86\%$ of the deviance, and improves the AIC by $162$ over the straight line. The population did not sit still; it crashed and recovered, and only the model that allows a shape can see it. ```{r} #| label: fig-trend #| fig-cap: "Monitoring counts with two fitted trends. The linear Poisson fit is nearly flat and reports no trend, while the GAM smooth (with a 95 percent band) recovers the decline and recovery." #| fig-alt: "Scatter of counts over thirty years, a nearly flat gold linear trend line, and a green GAM smooth that dips near year 16 and climbs back, with a shaded confidence band." gg <- data.frame(year=seq(1,T,length.out=200)) pr <- predict(gam1, gg, type="link", se.fit=TRUE) gg$fit<-exp(pr$fit); gg$lo<-exp(pr$fit-1.96*pr$se.fit); gg$hi<-exp(pr$fit+1.96*pr$se.fit) gl <- data.frame(year=seq(1,T,length.out=200)); gl$fit <- exp(predict(glin, gl)) ggplot()+ geom_ribbon(data=gg, aes(year, ymin=lo, ymax=hi), fill=forest, alpha=.15)+ geom_point(data=dat, aes(year, count), colour=ink, size=2, alpha=.85)+ geom_line(data=gl, aes(year, fit, colour="Linear trend"), linewidth=1)+ geom_line(data=gg, aes(year, fit, colour="GAM smooth"), linewidth=1)+ scale_colour_manual(values=c("Linear trend"=gold,"GAM smooth"=forest))+ labs(x="year", y="count", colour=NULL, title="A flat linear trend hides a crash and recovery")+ theme_te() ``` The smooth is not a licence to chase every wiggle. The penalty on its wiggliness, tuned by restricted maximum likelihood, is what keeps it from interpolating the noise, and the same fit collapses back towards a straight line when the data really are linear, because a straight line is the smoothest option available to it. The effective degrees of freedom, here about four, report how much curvature the data actually support. This flexible-trend approach is the basis of the generalised additive models used for national bird monitoring by Fewster and colleagues. ## Model counts as counts The second point is about the response, not its shape. Monitoring counts are non-negative integers, often small, sometimes zero, and their variance grows with their mean. A common habit is to take logs and run an ordinary regression, but $\log 0$ is undefined, so people add a constant first, and that constant is an arbitrary choice that changes the answer. Take a shorter series of low counts with a genuine decline and a couple of zeros. ```{r} #| label: lowcount T2<-25; yr2<-1:T2; logmu2 <- log(9) - 0.075*(yr2-1) set.seed(69); y2 <- rpois(T2, exp(logmu2)) dat2 <- data.frame(year=yr2, count=y2) gp <- glm(count ~ year, family=poisson, data=dat2) # count model o1 <- lm(log(count+1) ~ year, data=dat2) # log(y+1) OLS o5 <- lm(log(count+0.5) ~ year, data=dat2) # log(y+0.5) OLS round(c(n_zeros=sum(y2==0), poisson_pct=100*(exp(coef(gp)["year"])-1), logp1_pct=100*(exp(coef(o1)["year"])-1), logp0.5_pct=100*(exp(coef(o5)["year"])-1)), 2) ``` The Poisson model estimates a decline of about $9.5\%$ per year and needs no fudge for the two zeros. The log-transformed regressions give $7.6\%$ per year when the added constant is one and $9.0\%$ per year when it is a half: a swing of roughly a fifth of the estimate, driven entirely by a number the analyst picked to avoid taking the log of zero. The count model has no such lever. It handles zeros directly, its log link keeps predictions positive, and its Poisson variance matches the way count scatter grows with the mean, none of which the log-plus-constant regression gets right. ```{r} #| label: fig-count #| fig-cap: "Low counts with two fitted declines. The Poisson GLM and the log(y+1) regression diverge, because the added constant needed to log the zeros shifts the transformed fit." #| fig-alt: "Scatter of low counts over twenty-five years with two zeros, a green Poisson GLM decline and a red dashed log-transformed regression decline that sit apart from each other." gg2 <- data.frame(year=seq(1,T2,length.out=200)) gg2$pois <- exp(predict(gp, gg2)) gg2$logols <- exp(predict(o1, gg2)) - 1 ggplot()+ geom_point(data=dat2, aes(year, count), colour=ink, size=2, alpha=.85)+ geom_line(data=gg2, aes(year, pois, colour="Poisson GLM"), linewidth=1)+ geom_line(data=gg2, aes(year, logols, colour="log(y+1) OLS"), linewidth=1, linetype="22")+ scale_colour_manual(values=c("Poisson GLM"=forest,"log(y+1) OLS"=aban))+ labs(x="year", y="count", colour=NULL, title="Count model against log-transformed regression")+ theme_te() ``` ## What to take from this Two habits are worth keeping. First, before quoting a single rate of change, fit a smooth trend and look at the shape; if the smooth is close to a straight line the single rate is a fair summary, and if it is not, the rate is hiding something. Second, model counts with a count distribution, moving to a negative binomial or a mixed model when the scatter is wider than Poisson allows, rather than logging the counts and reaching for an ordinary regression. Neither step is more work than the analysis it replaces, and both keep the trend honest. The rate of change also says nothing about the mechanism behind it, which is where the earlier tools for [density dependence](../detecting-density-dependence/) and [process versus observation error](../gompertz-state-space-model/) come back in. ## References Fewster RM, Buckland ST, Siriwardena GM, Baillie SR & Wilson JD 2000 Ecology 81(7):1970-1984 (10.1890/0012-9658(2000)081[1970:AOPTFF]2.0.CO;2) Wood SN 2011 Journal of the Royal Statistical Society Series B 73(1):3-36 (10.1111/j.1467-9868.2010.00749.x) O'Hara RB & Kotze DJ 2010 Methods in Ecology and Evolution 1(2):118-122 (10.1111/j.2041-210X.2010.00021.x) ## Related tutorials - [Species response curves with GAMs](../gam-species-response-curves/) - [Model selection with AIC](../model-selection-aic/) - [Detecting density dependence](../detecting-density-dependence/) - [The Gompertz state-space model](../gompertz-state-space-model/)