What You Can Do Now
You can look at a dataset with coordinates and ask good questions. Are these points clustered or random? Is there spatial structure in this variable, and where does it live? If you know the value at these locations, what’s your best estimate at the ones in between? And if you want to run a regression, are your residuals clean?
Those questions fall into three different categories, and it helps to be explicit about which one you are in.
The first is description. Before you model anything, you want to know what you’re dealing with. Is the variable spatially structured at all? At what scale? Is the autocorrelation isotropic or does it vary by direction? A variogram or correlogram early in an analysis is like plotting your data before fitting a model (and there is no such thing as too much plotting!): it’s how you avoid being surprised later. And sometimes the spatial pattern is the finding. A LISA map that shows a cluster of high values in one corner of the study area, or a correlogram that flattens out at 500 meters, is telling you something about the process that generated the data.
The second is prediction. IDW, TPS, and kriging are about accuracy. You have measurements somewhere and want estimates somewhere else. The goal is a surface that holds up on withheld data. Whether the variogram parameters are ecologically meaningful is secondary. Cross-validation tells you how well you’re doing.
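To make "holds up on withheld data" concrete, here is a minimal sketch of leave-one-out cross-validation for IDW, written in plain Python rather than with the R packages a real analysis would use. The sample coordinates and values, the function names `idw_predict` and `loocv_rmse`, and the candidate powers are all made up for illustration.

```python
import math

def idw_predict(known, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` from (x, y, value) tuples."""
    num = 0.0
    den = 0.0
    for x, y, value in known:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a sample location
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

def loocv_rmse(samples, power=2.0):
    """Leave-one-out RMSE: predict each sample from all the others."""
    sq_errors = []
    for i, (x, y, value) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        pred = idw_predict(others, (x, y), power)
        sq_errors.append((pred - value) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Hypothetical samples on a smooth west-to-east gradient (value = 2 * x)
samples = [
    (0, 0, 0.0), (1, 0, 2.0), (2, 0, 4.0),
    (0, 1, 0.0), (1, 1, 2.0), (2, 1, 4.0),
]

# Compare candidate powers by their withheld-data error, not by how they look
for p in (1.0, 2.0, 4.0):
    print(f"power={p}: LOOCV RMSE = {loocv_rmse(samples, p):.3f}")
```

The same loop works for any interpolator: swap `idw_predict` for another prediction function and compare the errors on the same withheld points.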
The third is inference. You have a hypothesis about a relationship between variables and you want a coefficient and a standard error you can trust. Spatial autocorrelation threatens that goal not because it makes the predictions wrong, but because it makes the uncertainty estimates wrong. GLS and SAR are corrections to that problem, and skipping them means understating how uncertain you are.
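One way to see how much uncertainty gets understated is the simplest possible case: estimating a mean (an intercept-only model) from correlated observations. The sketch below assumes an exponential correlation function and a made-up 5 x 5 sampling grid. It uses the standard result that the variance of a sample mean is proportional to the sum of all pairwise correlations, so the effective sample size is n squared divided by that sum. The function name `effective_sample_size` and the range values are illustrative assumptions, not anything from a particular package.

```python
import math

def effective_sample_size(points, corr_range):
    """n_eff = n^2 / (sum of all pairwise correlations).

    Assumes equal variances and an exponential correlation
    exp(-distance / corr_range) between every pair of points.
    """
    n = len(points)
    total = 0.0
    for p in points:
        for q in points:
            total += math.exp(-math.dist(p, q) / corr_range)
    return n * n / total

# Hypothetical 5 x 5 grid of sample sites, 100 m apart
grid = [(100.0 * i, 100.0 * j) for i in range(5) for j in range(5)]

for corr_range in (1.0, 100.0, 300.0, 1000.0):
    n_eff = effective_sample_size(grid, corr_range)
    # Factor by which the naive standard error of the mean is too small
    inflation = math.sqrt(len(grid) / n_eff)
    print(f"range={corr_range:7.1f} m  n_eff={n_eff:6.2f}  SE understated by x{inflation:.2f}")
```

As the correlation range grows relative to the site spacing, the 25 sites carry less independent information, and a standard error computed as if they were independent gets further from the truth.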
Mixing these categories up is common, because it’s easy to forget which question you are asking of a dataset. Using a kriging surface to make causal claims, or worrying about prediction accuracy when all you need is a valid p-value, will lead you somewhere you don’t want to be.
But spatial analysis is not just about avoiding statistical mistakes. The spatial structure in your data is information. Where lead is high tells you about flood history. Where birds are scarce tells you about habitat connectivity. Where your regression residuals are autocorrelated tells you that something is going on in that part of the landscape that your covariates don’t explain yet. Learning to read that structure is part of the fundamental job of environmental science.
The thread running through all the methods in this book is Tobler’s First Law: nearby things are more related than distant things. KDE, Moran’s I, the variogram, kriging, GLS, SAR – they’re all taking that one idea seriously in different ways. You’ve got the tools now.
There are a lot of things this book didn’t get to. Areal data – watersheds, census tracts, habitat patches – has its own analytical world built around polygon adjacency rather than continuous distance. If that’s where your data lives, look at Bivand (2008) chapters 9 and 10. Spatiotemporal methods, where you have both spatial and temporal autocorrelation, are a natural extension and an active area of development in R. Machine learning approaches to spatial prediction (random forests with spatial cross-validation, for instance) are worth knowing about, though they tend to trade interpretability for flexibility in ways that aren’t always a good deal for environmental science.
And then there are networks. Stream networks are a good example of why standard spatial methods can mislead you: two sites might be 200 meters apart in Euclidean space but separated by 10 kilometers of channel, with a waterfall in between. Distances along the network – flow distance, not straight-line distance – are what govern organism dispersal, nutrient transport, and hydrologic connectivity. The SSN2 package in R is built for exactly this, fitting spatial models where autocorrelation is structured by the stream network rather than by the map. If your data lives in a drainage network, start there.
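The gap between map distance and channel distance is easy to show with a toy network. The sketch below is plain Python and does not use SSN2. The site names, coordinates, and channel lengths are invented, and the network is treated as undirected, so it ignores flow direction, which a real stream-network model would not. It computes the straight-line distance between two sites on neighboring tributaries and the shortest along-channel distance between them using Dijkstra's algorithm.

```python
import heapq
import math

# Hypothetical node coordinates in meters (map space)
coords = {
    "site_a": (0.0, 0.0),
    "site_b": (200.0, 0.0),
    "confluence": (100.0, -4000.0),
    "outlet": (100.0, -6000.0),
}

# Hypothetical channel lengths in meters between connected nodes
channels = [
    ("site_a", "confluence", 5000.0),
    ("site_b", "confluence", 5000.0),
    ("confluence", "outlet", 2500.0),
]

def build_graph(edges):
    """Undirected adjacency list: node -> [(neighbor, channel length), ...]."""
    graph = {}
    for u, v, length in edges:
        graph.setdefault(u, []).append((v, length))
        graph.setdefault(v, []).append((u, length))
    return graph

def network_distance(graph, start, goal):
    """Shortest along-channel distance from start to goal (Dijkstra)."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for nbr, length in graph.get(node, []):
            new = dist + length
            if new < best.get(nbr, math.inf):
                best[nbr] = new
                heapq.heappush(queue, (new, nbr))
    return math.inf  # goal is not reachable from start

def euclidean(a, b):
    """Straight-line map distance between two named nodes."""
    return math.dist(coords[a], coords[b])

graph = build_graph(channels)
print("straight-line:", euclidean("site_a", "site_b"), "m")
print("along channel:", network_distance(graph, "site_a", "site_b"), "m")
```

With these made-up numbers the two sites are 200 m apart on the map but 10,000 m apart along the channel, because the only connection between them runs down one tributary to the confluence and back up the other.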
For deeper reading: Bivand (2008) is the R-focused reference and you should own it. Cressie (1993) is the statistical bible for geostatistics, dense but authoritative. Fortin et al. (2016) is readable and ecologically grounded. Diggle and Ribeiro (2007) takes a model-based approach to geostatistics that connects cleanly to the likelihood ideas that show up in variogram fitting. And if you want to go deeper into landscape ecology as a discipline – the spatial structure of habitat, the movement of organisms through patchy environments, the consequences of fragmentation – Dean Urban’s books are the place to go: Urban (2023) and Urban (2024). Urban is the one who kicked off the preface, and his work is still the best argument I know for why space isn’t just a nuisance to correct for.