Fitting single-season occupancy models in R

occupancy

maximum likelihood

detection probability

ecology tutorial

Build the MacKenzie occupancy likelihood from scratch and fit it with optim to separate true site occupancy from detection probability, using base R only.

Author

Tidy Ecology

Published

2026-07-02

A companion tutorial showed that counting the sites where a species was detected underestimates how widespread it is, because a quiet site looks the same as an empty one. The repeat visits you make to each site are what rescue the estimate: the sequence of detections and blanks across visits tells you how often the species is missed, and from that you can back out how many silent sites were occupied all along.

This post writes the single-season occupancy likelihood by hand and fits it with optim. No specialised package is needed. Working through the likelihood makes clear what the model assumes and why it can separate occupancy from detection at all.

The same simulated data

We reuse the landscape from the detection-bias tutorial: 200 sites, true occupancy psi = 0.55, per-visit detection p = 0.30, four visits each. Naive occupancy came out at 0.48 against a realised truth of 0.605.

set.seed(2026)
R <- 200L; K <- 4L; psi_true <- 0.55; p_true <- 0.30
z <- rbinom(R, 1, psi_true)
y <- matrix(rbinom(R * K, 1, z * p_true), nrow = R, ncol = K)
d <- rowSums(y)                # detections per site
mean(d >= 1)                   # naive occupancy

[1] 0.48

The likelihood, site by site

Each site contributes one term to the likelihood, and there are only two cases.

If a site was detected on at least one visit, it must be occupied. The probability of its exact history is the occupancy probability times the detection pattern: psi for being occupied, p for each of the d visits with a detection, and 1 - p for each of the K - d visits without one. That gives

\[ L_i = \psi \, p^{\,d_i} (1 - p)^{\,K - d_i}, \qquad d_i \ge 1. \]

If a site was never detected across all K visits, two explanations remain. Either it is occupied and every visit missed the species, with probability psi (1 - p)^K, or it is genuinely empty, with probability 1 - psi. Since both would produce an all-blank history, the term is their sum:

\[ L_i = \psi (1 - p)^K + (1 - \psi), \qquad d_i = 0. \]

That second line is where the model earns its keep. The all-blank sites are a mixture of missed occupancy and true absence, and the relative weight of the two is pinned down by how detectable the species proves to be at the sites where it was seen. A species detected on most visits leaves few plausible misses, so most blanks are real absences; a rarely detected species leaves room for many occupied blanks.

We estimate on the logit scale so the optimiser can range over the whole real line while psi and p stay inside zero and one.

nll <- function(par, y, K) {
  psi <- plogis(par[1]); p <- plogis(par[2])
  d <- rowSums(y)
  lik <- ifelse(d > 0,
                psi * p^d * (1 - p)^(K - d),
                psi * (1 - p)^K + (1 - psi))
  -sum(log(lik))
}
fit <- optim(c(0, 0), nll, y = y, K = K, method = "BFGS", hessian = TRUE)
psi_hat <- plogis(fit$par[1])
p_hat   <- plogis(fit$par[2])
c(psi_hat = round(psi_hat, 3), p_hat = round(p_hat, 3))

psi_hat   p_hat 
  0.635   0.297

The model recovers psi_hat = 0.635 and p_hat = 0.297. The occupancy estimate has moved from the naive 0.48 up towards the realised 0.605, and the detection estimate sits almost exactly on the true 0.30, rather than the inflated 0.393 the raw detected-site average gave. The figure below places the three occupancy numbers side by side.

V <- solve(fit$hessian); sl <- sqrt(diag(V))
psi_ci <- plogis(fit$par[1] + c(-1.96, 1.96) * sl[1])
est <- data.frame(
  what = factor(c("Naive\n(>=1 detection)", "Occupancy\nmodel", "True\noccupancy"),
                levels = c("Naive\n(>=1 detection)", "Occupancy\nmodel", "True\noccupancy")),
  val  = c(mean(d >= 1), psi_hat, mean(z)),
  lo   = c(NA, psi_ci[1], NA), hi = c(NA, psi_ci[2], NA),
  kind = c("naive", "model", "truth"))
ggplot(est, aes(what, val, colour = kind)) +
  geom_hline(yintercept = mean(z), colour = te_line, linewidth = 0.5, linetype = "22") +
  geom_errorbar(aes(ymin = lo, ymax = hi), width = 0.12, linewidth = 0.8, na.rm = TRUE) +
  geom_point(size = 4) +
  geom_text(aes(label = sprintf("%.2f", val)), vjust = -1.1, size = 3.6, colour = te_ink) +
  scale_colour_manual(values = c(naive = te_red, model = te_forest, truth = te_gold), guide = "none") +
  scale_y_continuous(limits = c(0, 0.8)) +
  labs(title = "Occupancy model recovers what naive counting misses",
       subtitle = "200 sites, 4 visits, true detection p = 0.30",
       x = NULL, y = "Occupancy probability") +
  theme_te(13)

Point-range chart with naive occupancy well below true occupancy and the model estimate close to truth with a confidence interval spanning it. — Figure 1: Naive occupancy, the occupancy-model estimate with its 95 per cent interval, and the realised true occupancy. The model closes most of the gap the naive count leaves open.

How precise is the estimate?

The Hessian that optim returns is the curvature of the negative log-likelihood at its minimum, so its inverse is the covariance of the parameters on the logit scale. To report a standard error on the probability scale, transform the logit-scale error with the delta method: the derivative of plogis at a point equals the fitted probability times one minus itself.

se_logit <- sqrt(diag(V))
se_psi <- se_logit[1] * psi_hat * (1 - psi_hat)
se_p   <- se_logit[2] * p_hat   * (1 - p_hat)
round(c(se_psi = se_psi, se_p = se_p), 3)

se_psi   se_p 
 0.058  0.030

plogis(fit$par[1] + c(-1.96, 1.96) * se_logit[1])   # 95% CI for psi

[1] 0.5146637 0.7399558

plogis(fit$par[2] + c(-1.96, 1.96) * se_logit[2])   # 95% CI for p

[1] 0.2419659 0.3595317

Occupancy carries a standard error of 0.058, with a 95 per cent interval of about 0.515 to 0.740, so the true 0.55 sits comfortably inside it. Detection is pinned down more tightly, at 0.297 with an interval of 0.242 to 0.360. Note that building the interval on the logit scale and then transforming keeps it inside zero and one, which a plain estimate plus or minus two standard errors would not guarantee near the boundary.

The surface being climbed

It helps to see what optim is doing. The negative log-likelihood is a surface over the two parameters, and the fit is its lowest point. Evaluating it across a grid shows a single well, with the maximum-likelihood estimate at the bottom and the true values close by.

gp <- seq(0.30, 0.90, length.out = 120); gq <- seq(0.10, 0.55, length.out = 120)
Z <- outer(gp, gq, Vectorize(function(P, Q) nll(c(qlogis(P), qlogis(Q)), y, K)))
dfz <- expand.grid(psi = gp, p = gq); dfz$nll <- as.vector(Z)
ggplot(dfz, aes(psi, p, z = nll)) +
  geom_raster(aes(fill = nll)) +
  geom_contour(colour = "#ffffff", alpha = 0.35, linewidth = 0.25, bins = 14) +
  scale_fill_gradient(low = te_deep, high = te_paper, name = "-log L") +
  annotate("point", x = psi_hat, y = p_hat, colour = te_gold, size = 3.2) +
  annotate("point", x = mean(z), y = p_true, colour = te_red, shape = 4, size = 3, stroke = 1.1) +
  annotate("text", x = psi_hat, y = p_hat, label = "  MLE", hjust = 0, colour = te_ink, size = 3.3) +
  labs(title = "Occupancy and detection are estimated jointly",
       subtitle = "Gold point: maximum-likelihood estimate; red cross: true values",
       x = "Occupancy psi", y = "Detection p") +
  theme_te(12)

Filled contour surface over occupancy and detection probability with a single minimum, marked by a point near a cross for the true values. — Figure 2: The negative log-likelihood over occupancy and detection. The gold point is the maximum-likelihood estimate; the red cross marks the true values used to simulate the data.

More visits, sharper separation

The single dataset gives one draw of the estimator. The precision improves as visits accumulate, because more visits pin down p, and a well-estimated p leaves less ambiguity about the all-blank sites. Refitting with eight visits instead of four narrows the occupancy standard error markedly.

set.seed(2026)
K8 <- 8L
z8 <- rbinom(R, 1, psi_true)
y8 <- matrix(rbinom(R * K8, 1, z8 * p_true), R, K8)
fit8 <- optim(c(0, 0), nll, y = y8, K = K8, method = "BFGS", hessian = TRUE)
se8 <- sqrt(diag(solve(fit8$hessian)))[1] * plogis(fit8$par[1]) * (1 - plogis(fit8$par[1]))
c(psi_hat = round(plogis(fit8$par[1]), 3),
  p_hat   = round(plogis(fit8$par[2]), 3),
  se_psi  = round(se8, 3))

psi_hat   p_hat  se_psi 
  0.609   0.317   0.037

With eight visits the occupancy estimate is 0.609 with a standard error of 0.037, down from 0.058 at four visits. How to trade the number of sites against the number of visits under a fixed budget is a design question in its own right, taken up in a later tutorial.

What the model assumes

The likelihood leans on a few conditions worth stating plainly. Occupancy is treated as fixed across the visit window, so the site does not gain or lose the species between visits, which is what single-season means. Detection is assumed independent across visits given occupancy, and false detections are ruled out, so a recorded detection is never a mistake. When those hold, the repeat-visit structure identifies psi and p separately. When they do not, for instance if the species arrives partway through the season, the estimates shift, and a dynamic or multi-season model becomes the honest choice.

The next step is to let occupancy and detection depend on covariates, so that psi can vary with habitat and p with survey effort, which is where the model becomes useful for real questions.

References

MacKenzie D I, Nichols J D, Lachman G B, Droege S, Royle J A, Langtimm C A 2002 Ecology 83(8):2248-2255 (10.1890/0012-9658(2002)083[2248:ESORWD]2.0.CO;2)
MacKenzie D I, Nichols J D, Royle J A, Pollock K H, Bailey L L, Hines J E 2018 Occupancy Estimation and Modeling, 2nd ed. Academic Press. ISBN 978-0-12-407197-1
Fiske I, Chandler R B 2011 Journal of Statistical Software 43(10):1-23 (10.18637/jss.v043.i10)

--- title: "Fitting single-season occupancy models in R" description: "Build the MacKenzie occupancy likelihood from scratch and fit it with optim to separate true site occupancy from detection probability, using base R only." date: "2026-07-02 20:00" categories: [R, occupancy, maximum likelihood, detection probability, ecology tutorial] image: thumbnail.png image-alt: "Point-range chart comparing naive occupancy, the occupancy-model estimate with confidence interval, and true occupancy." --- ```{r} #| label: setup #| include: false suppressMessages({library(ggplot2)}) te_ink<-"#16241d"; te_body<-"#2c3a31"; te_forest<-"#275139"; te_paper<-"#f5f4ee" te_line<-"#dad9ca"; te_faint<-"#5d6b61"; te_gold<-"#c9b458"; te_deep<-"#1d5b4e" te_red<-"#b5534e"; te_sage<-"#93a87f"; te_green<-"#2f8f63" theme_te <- function(base_size = 12) { theme_minimal(base_size = base_size) + theme(text=element_text(colour=te_body), plot.title=element_text(colour=te_ink, face="bold", size=rel(1.05)), plot.subtitle=element_text(colour=te_faint, size=rel(0.9)), axis.title=element_text(colour=te_ink), axis.text=element_text(colour=te_body), panel.grid.minor=element_blank(), panel.grid.major=element_line(colour=te_line, linewidth=0.3), plot.background=element_rect(fill=te_paper, colour=NA), panel.background=element_rect(fill=te_paper, colour=NA), legend.key=element_blank(), legend.position="bottom", plot.margin=margin(10,12,10,10)) } ``` A companion tutorial showed that counting the sites where a species was detected underestimates how widespread it is, because a quiet site looks the same as an empty one. The repeat visits you make to each site are what rescue the estimate: the sequence of detections and blanks across visits tells you how often the species is missed, and from that you can back out how many silent sites were occupied all along. This post writes the single-season occupancy likelihood by hand and fits it with `optim`. No specialised package is needed. Working through the likelihood makes clear what the model assumes and why it can separate occupancy from detection at all. ## The same simulated data We reuse the landscape from the detection-bias tutorial: 200 sites, true occupancy `psi = 0.55`, per-visit detection `p = 0.30`, four visits each. Naive occupancy came out at 0.48 against a realised truth of 0.605. ```{r} #| label: data set.seed(2026) R <- 200L; K <- 4L; psi_true <- 0.55; p_true <- 0.30 z <- rbinom(R, 1, psi_true) y <- matrix(rbinom(R * K, 1, z * p_true), nrow = R, ncol = K) d <- rowSums(y) # detections per site mean(d >= 1) # naive occupancy ``` ## The likelihood, site by site Each site contributes one term to the likelihood, and there are only two cases. If a site was detected on at least one visit, it must be occupied. The probability of its exact history is the occupancy probability times the detection pattern: `psi` for being occupied, `p` for each of the `d` visits with a detection, and `1 - p` for each of the `K - d` visits without one. That gives $$ L_i = \psi \, p^{\,d_i} (1 - p)^{\,K - d_i}, \qquad d_i \ge 1. $$ If a site was never detected across all `K` visits, two explanations remain. Either it is occupied and every visit missed the species, with probability `psi (1 - p)^K`, or it is genuinely empty, with probability `1 - psi`. Since both would produce an all-blank history, the term is their sum: $$ L_i = \psi (1 - p)^K + (1 - \psi), \qquad d_i = 0. $$ That second line is where the model earns its keep. The all-blank sites are a mixture of missed occupancy and true absence, and the relative weight of the two is pinned down by how detectable the species proves to be at the sites where it was seen. A species detected on most visits leaves few plausible misses, so most blanks are real absences; a rarely detected species leaves room for many occupied blanks. We estimate on the logit scale so the optimiser can range over the whole real line while `psi` and `p` stay inside zero and one. ```{r} #| label: nll nll <- function(par, y, K) { psi <- plogis(par[1]); p <- plogis(par[2]) d <- rowSums(y) lik <- ifelse(d > 0, psi * p^d * (1 - p)^(K - d), psi * (1 - p)^K + (1 - psi)) -sum(log(lik)) } fit <- optim(c(0, 0), nll, y = y, K = K, method = "BFGS", hessian = TRUE) psi_hat <- plogis(fit$par[1]) p_hat <- plogis(fit$par[2]) c(psi_hat = round(psi_hat, 3), p_hat = round(p_hat, 3)) ``` The model recovers `psi_hat = 0.635` and `p_hat = 0.297`. The occupancy estimate has moved from the naive 0.48 up towards the realised 0.605, and the detection estimate sits almost exactly on the true 0.30, rather than the inflated 0.393 the raw detected-site average gave. The figure below places the three occupancy numbers side by side. ```{r} #| label: fig-compare #| fig-cap: "Naive occupancy, the occupancy-model estimate with its 95 per cent interval, and the realised true occupancy. The model closes most of the gap the naive count leaves open." #| fig-alt: "Point-range chart with naive occupancy well below true occupancy and the model estimate close to truth with a confidence interval spanning it." #| warning: false V <- solve(fit$hessian); sl <- sqrt(diag(V)) psi_ci <- plogis(fit$par[1] + c(-1.96, 1.96) * sl[1]) est <- data.frame( what = factor(c("Naive\n(>=1 detection)", "Occupancy\nmodel", "True\noccupancy"), levels = c("Naive\n(>=1 detection)", "Occupancy\nmodel", "True\noccupancy")), val = c(mean(d >= 1), psi_hat, mean(z)), lo = c(NA, psi_ci[1], NA), hi = c(NA, psi_ci[2], NA), kind = c("naive", "model", "truth")) ggplot(est, aes(what, val, colour = kind)) + geom_hline(yintercept = mean(z), colour = te_line, linewidth = 0.5, linetype = "22") + geom_errorbar(aes(ymin = lo, ymax = hi), width = 0.12, linewidth = 0.8, na.rm = TRUE) + geom_point(size = 4) + geom_text(aes(label = sprintf("%.2f", val)), vjust = -1.1, size = 3.6, colour = te_ink) + scale_colour_manual(values = c(naive = te_red, model = te_forest, truth = te_gold), guide = "none") + scale_y_continuous(limits = c(0, 0.8)) + labs(title = "Occupancy model recovers what naive counting misses", subtitle = "200 sites, 4 visits, true detection p = 0.30", x = NULL, y = "Occupancy probability") + theme_te(13) ``` ## How precise is the estimate? The Hessian that `optim` returns is the curvature of the negative log-likelihood at its minimum, so its inverse is the covariance of the parameters on the logit scale. To report a standard error on the probability scale, transform the logit-scale error with the delta method: the derivative of `plogis` at a point equals the fitted probability times one minus itself. ```{r} #| label: se se_logit <- sqrt(diag(V)) se_psi <- se_logit[1] * psi_hat * (1 - psi_hat) se_p <- se_logit[2] * p_hat * (1 - p_hat) round(c(se_psi = se_psi, se_p = se_p), 3) plogis(fit$par[1] + c(-1.96, 1.96) * se_logit[1]) # 95% CI for psi plogis(fit$par[2] + c(-1.96, 1.96) * se_logit[2]) # 95% CI for p ``` Occupancy carries a standard error of 0.058, with a 95 per cent interval of about 0.515 to 0.740, so the true 0.55 sits comfortably inside it. Detection is pinned down more tightly, at 0.297 with an interval of 0.242 to 0.360. Note that building the interval on the logit scale and then transforming keeps it inside zero and one, which a plain estimate plus or minus two standard errors would not guarantee near the boundary. ## The surface being climbed It helps to see what `optim` is doing. The negative log-likelihood is a surface over the two parameters, and the fit is its lowest point. Evaluating it across a grid shows a single well, with the maximum-likelihood estimate at the bottom and the true values close by. ```{r} #| label: fig-surface #| fig-cap: "The negative log-likelihood over occupancy and detection. The gold point is the maximum-likelihood estimate; the red cross marks the true values used to simulate the data." #| fig-alt: "Filled contour surface over occupancy and detection probability with a single minimum, marked by a point near a cross for the true values." #| warning: false #| message: false gp <- seq(0.30, 0.90, length.out = 120); gq <- seq(0.10, 0.55, length.out = 120) Z <- outer(gp, gq, Vectorize(function(P, Q) nll(c(qlogis(P), qlogis(Q)), y, K))) dfz <- expand.grid(psi = gp, p = gq); dfz$nll <- as.vector(Z) ggplot(dfz, aes(psi, p, z = nll)) + geom_raster(aes(fill = nll)) + geom_contour(colour = "#ffffff", alpha = 0.35, linewidth = 0.25, bins = 14) + scale_fill_gradient(low = te_deep, high = te_paper, name = "-log L") + annotate("point", x = psi_hat, y = p_hat, colour = te_gold, size = 3.2) + annotate("point", x = mean(z), y = p_true, colour = te_red, shape = 4, size = 3, stroke = 1.1) + annotate("text", x = psi_hat, y = p_hat, label = " MLE", hjust = 0, colour = te_ink, size = 3.3) + labs(title = "Occupancy and detection are estimated jointly", subtitle = "Gold point: maximum-likelihood estimate; red cross: true values", x = "Occupancy psi", y = "Detection p") + theme_te(12) ``` ## More visits, sharper separation The single dataset gives one draw of the estimator. The precision improves as visits accumulate, because more visits pin down `p`, and a well-estimated `p` leaves less ambiguity about the all-blank sites. Refitting with eight visits instead of four narrows the occupancy standard error markedly. ```{r} #| label: k8 set.seed(2026) K8 <- 8L z8 <- rbinom(R, 1, psi_true) y8 <- matrix(rbinom(R * K8, 1, z8 * p_true), R, K8) fit8 <- optim(c(0, 0), nll, y = y8, K = K8, method = "BFGS", hessian = TRUE) se8 <- sqrt(diag(solve(fit8$hessian)))[1] * plogis(fit8$par[1]) * (1 - plogis(fit8$par[1])) c(psi_hat = round(plogis(fit8$par[1]), 3), p_hat = round(plogis(fit8$par[2]), 3), se_psi = round(se8, 3)) ``` With eight visits the occupancy estimate is 0.609 with a standard error of 0.037, down from 0.058 at four visits. How to trade the number of sites against the number of visits under a fixed budget is a design question in its own right, taken up in a later tutorial. ## What the model assumes The likelihood leans on a few conditions worth stating plainly. Occupancy is treated as fixed across the visit window, so the site does not gain or lose the species between visits, which is what single-season means. Detection is assumed independent across visits given occupancy, and false detections are ruled out, so a recorded detection is never a mistake. When those hold, the repeat-visit structure identifies `psi` and `p` separately. When they do not, for instance if the species arrives partway through the season, the estimates shift, and a dynamic or multi-season model becomes the honest choice. The next step is to let occupancy and detection depend on covariates, so that `psi` can vary with habitat and `p` with survey effort, which is where the model becomes useful for real questions. ## References - MacKenzie D I, Nichols J D, Lachman G B, Droege S, Royle J A, Langtimm C A 2002 Ecology 83(8):2248-2255 (10.1890/0012-9658(2002)083[2248:ESORWD]2.0.CO;2) - MacKenzie D I, Nichols J D, Royle J A, Pollock K H, Bailey L L, Hines J E 2018 Occupancy Estimation and Modeling, 2nd ed. Academic Press. ISBN 978-0-12-407197-1 - Fiske I, Chandler R B 2011 Journal of Statistical Software 43(10):1-23 (10.18637/jss.v043.i10) ## Related tutorials - [Imperfect detection and occupancy bias in R](../imperfect-detection-occupancy/) - [Occupancy and detection covariates in R](../occupancy-detection-covariates/) - [Logistic regression for presence/absence data](../logistic-regression-presence-absence/) - [N-mixture models for abundance in R](../n-mixture-models-abundance/)