Fitting single-season occupancy models in R

R
occupancy
maximum likelihood
detection probability
ecology tutorial
Build the MacKenzie occupancy likelihood from scratch and fit it with optim to separate true site occupancy from detection probability, using base R only.
Author

Tidy Ecology

Published

2026-07-02

A companion tutorial showed that counting the sites where a species was detected underestimates how widespread it is, because a quiet site looks the same as an empty one. The repeat visits you make to each site are what rescue the estimate: the sequence of detections and blanks across visits tells you how often the species is missed, and from that you can back out how many silent sites were occupied all along.

This post writes the single-season occupancy likelihood by hand and fits it with optim. No specialised package is needed. Working through the likelihood makes clear what the model assumes and why it can separate occupancy from detection at all.

The same simulated data

We reuse the landscape from the detection-bias tutorial: 200 sites, true occupancy psi = 0.55, per-visit detection p = 0.30, four visits each. Naive occupancy came out at 0.48 against a realised truth of 0.605.

set.seed(2026)
R <- 200L; K <- 4L; psi_true <- 0.55; p_true <- 0.30
z <- rbinom(R, 1, psi_true)
y <- matrix(rbinom(R * K, 1, z * p_true), nrow = R, ncol = K)
d <- rowSums(y)                # detections per site
mean(d >= 1)                   # naive occupancy
[1] 0.48

The likelihood, site by site

Each site contributes one term to the likelihood, and there are only two cases.

If a site was detected on at least one visit, it must be occupied. The probability of its exact history is the occupancy probability times the detection pattern: psi for being occupied, p for each of the d visits with a detection, and 1 - p for each of the K - d visits without one. That gives

\[ L_i = \psi \, p^{\,d_i} (1 - p)^{\,K - d_i}, \qquad d_i \ge 1. \]

If a site was never detected across all K visits, two explanations remain. Either it is occupied and every visit missed the species, with probability psi (1 - p)^K, or it is genuinely empty, with probability 1 - psi. Since both would produce an all-blank history, the term is their sum:

\[ L_i = \psi (1 - p)^K + (1 - \psi), \qquad d_i = 0. \]

That second line is where the model earns its keep. The all-blank sites are a mixture of missed occupancy and true absence, and the relative weight of the two is pinned down by how detectable the species proves to be at the sites where it was seen. A species detected on most visits leaves few plausible misses, so most blanks are real absences; a rarely detected species leaves room for many occupied blanks.

We estimate on the logit scale so the optimiser can range over the whole real line while psi and p stay inside zero and one.

nll <- function(par, y, K) {
  psi <- plogis(par[1]); p <- plogis(par[2])
  d <- rowSums(y)
  lik <- ifelse(d > 0,
                psi * p^d * (1 - p)^(K - d),
                psi * (1 - p)^K + (1 - psi))
  -sum(log(lik))
}
fit <- optim(c(0, 0), nll, y = y, K = K, method = "BFGS", hessian = TRUE)
psi_hat <- plogis(fit$par[1])
p_hat   <- plogis(fit$par[2])
c(psi_hat = round(psi_hat, 3), p_hat = round(p_hat, 3))
psi_hat   p_hat 
  0.635   0.297 

The model recovers psi_hat = 0.635 and p_hat = 0.297. The occupancy estimate has moved from the naive 0.48 up towards the realised 0.605, and the detection estimate sits almost exactly on the true 0.30, rather than the inflated 0.393 the raw detected-site average gave. The figure below places the three occupancy numbers side by side.

V <- solve(fit$hessian); sl <- sqrt(diag(V))
psi_ci <- plogis(fit$par[1] + c(-1.96, 1.96) * sl[1])
est <- data.frame(
  what = factor(c("Naive\n(>=1 detection)", "Occupancy\nmodel", "True\noccupancy"),
                levels = c("Naive\n(>=1 detection)", "Occupancy\nmodel", "True\noccupancy")),
  val  = c(mean(d >= 1), psi_hat, mean(z)),
  lo   = c(NA, psi_ci[1], NA), hi = c(NA, psi_ci[2], NA),
  kind = c("naive", "model", "truth"))
ggplot(est, aes(what, val, colour = kind)) +
  geom_hline(yintercept = mean(z), colour = te_line, linewidth = 0.5, linetype = "22") +
  geom_errorbar(aes(ymin = lo, ymax = hi), width = 0.12, linewidth = 0.8, na.rm = TRUE) +
  geom_point(size = 4) +
  geom_text(aes(label = sprintf("%.2f", val)), vjust = -1.1, size = 3.6, colour = te_ink) +
  scale_colour_manual(values = c(naive = te_red, model = te_forest, truth = te_gold), guide = "none") +
  scale_y_continuous(limits = c(0, 0.8)) +
  labs(title = "Occupancy model recovers what naive counting misses",
       subtitle = "200 sites, 4 visits, true detection p = 0.30",
       x = NULL, y = "Occupancy probability") +
  theme_te(13)
Point-range chart with naive occupancy well below true occupancy and the model estimate close to truth with a confidence interval spanning it.
Figure 1: Naive occupancy, the occupancy-model estimate with its 95 per cent interval, and the realised true occupancy. The model closes most of the gap the naive count leaves open.

How precise is the estimate?

The Hessian that optim returns is the curvature of the negative log-likelihood at its minimum, so its inverse is the covariance of the parameters on the logit scale. To report a standard error on the probability scale, transform the logit-scale error with the delta method: the derivative of plogis at a point equals the fitted probability times one minus itself.

se_logit <- sqrt(diag(V))
se_psi <- se_logit[1] * psi_hat * (1 - psi_hat)
se_p   <- se_logit[2] * p_hat   * (1 - p_hat)
round(c(se_psi = se_psi, se_p = se_p), 3)
se_psi   se_p 
 0.058  0.030 
plogis(fit$par[1] + c(-1.96, 1.96) * se_logit[1])   # 95% CI for psi
[1] 0.5146637 0.7399558
plogis(fit$par[2] + c(-1.96, 1.96) * se_logit[2])   # 95% CI for p
[1] 0.2419659 0.3595317

Occupancy carries a standard error of 0.058, with a 95 per cent interval of about 0.515 to 0.740, so the true 0.55 sits comfortably inside it. Detection is pinned down more tightly, at 0.297 with an interval of 0.242 to 0.360. Note that building the interval on the logit scale and then transforming keeps it inside zero and one, which a plain estimate plus or minus two standard errors would not guarantee near the boundary.

The surface being climbed

It helps to see what optim is doing. The negative log-likelihood is a surface over the two parameters, and the fit is its lowest point. Evaluating it across a grid shows a single well, with the maximum-likelihood estimate at the bottom and the true values close by.

gp <- seq(0.30, 0.90, length.out = 120); gq <- seq(0.10, 0.55, length.out = 120)
Z <- outer(gp, gq, Vectorize(function(P, Q) nll(c(qlogis(P), qlogis(Q)), y, K)))
dfz <- expand.grid(psi = gp, p = gq); dfz$nll <- as.vector(Z)
ggplot(dfz, aes(psi, p, z = nll)) +
  geom_raster(aes(fill = nll)) +
  geom_contour(colour = "#ffffff", alpha = 0.35, linewidth = 0.25, bins = 14) +
  scale_fill_gradient(low = te_deep, high = te_paper, name = "-log L") +
  annotate("point", x = psi_hat, y = p_hat, colour = te_gold, size = 3.2) +
  annotate("point", x = mean(z), y = p_true, colour = te_red, shape = 4, size = 3, stroke = 1.1) +
  annotate("text", x = psi_hat, y = p_hat, label = "  MLE", hjust = 0, colour = te_ink, size = 3.3) +
  labs(title = "Occupancy and detection are estimated jointly",
       subtitle = "Gold point: maximum-likelihood estimate; red cross: true values",
       x = "Occupancy psi", y = "Detection p") +
  theme_te(12)
Filled contour surface over occupancy and detection probability with a single minimum, marked by a point near a cross for the true values.
Figure 2: The negative log-likelihood over occupancy and detection. The gold point is the maximum-likelihood estimate; the red cross marks the true values used to simulate the data.

More visits, sharper separation

The single dataset gives one draw of the estimator. The precision improves as visits accumulate, because more visits pin down p, and a well-estimated p leaves less ambiguity about the all-blank sites. Refitting with eight visits instead of four narrows the occupancy standard error markedly.

set.seed(2026)
K8 <- 8L
z8 <- rbinom(R, 1, psi_true)
y8 <- matrix(rbinom(R * K8, 1, z8 * p_true), R, K8)
fit8 <- optim(c(0, 0), nll, y = y8, K = K8, method = "BFGS", hessian = TRUE)
se8 <- sqrt(diag(solve(fit8$hessian)))[1] * plogis(fit8$par[1]) * (1 - plogis(fit8$par[1]))
c(psi_hat = round(plogis(fit8$par[1]), 3),
  p_hat   = round(plogis(fit8$par[2]), 3),
  se_psi  = round(se8, 3))
psi_hat   p_hat  se_psi 
  0.609   0.317   0.037 

With eight visits the occupancy estimate is 0.609 with a standard error of 0.037, down from 0.058 at four visits. How to trade the number of sites against the number of visits under a fixed budget is a design question in its own right, taken up in a later tutorial.

What the model assumes

The likelihood leans on a few conditions worth stating plainly. Occupancy is treated as fixed across the visit window, so the site does not gain or lose the species between visits, which is what single-season means. Detection is assumed independent across visits given occupancy, and false detections are ruled out, so a recorded detection is never a mistake. When those hold, the repeat-visit structure identifies psi and p separately. When they do not, for instance if the species arrives partway through the season, the estimates shift, and a dynamic or multi-season model becomes the honest choice.

The next step is to let occupancy and detection depend on covariates, so that psi can vary with habitat and p with survey effort, which is where the model becomes useful for real questions.

References

  • MacKenzie D I, Nichols J D, Lachman G B, Droege S, Royle J A, Langtimm C A 2002 Ecology 83(8):2248-2255 (10.1890/0012-9658(2002)083[2248:ESORWD]2.0.CO;2)
  • MacKenzie D I, Nichols J D, Royle J A, Pollock K H, Bailey L L, Hines J E 2018 Occupancy Estimation and Modeling, 2nd ed. Academic Press. ISBN 978-0-12-407197-1
  • Fiske I, Chandler R B 2011 Journal of Statistical Software 43(10):1-23 (10.18637/jss.v043.i10)