What You Can Do Now

Way back in the last century Steve Herman, my undergraduate mentor, told me that science is the search for patterns in nature. Google doesn’t really help me with the provenance of that definition but I like it and think about it all the time. Patterns are what we’ve been learning about.

Now? You can look at a dataset with coordinates and ask good questions. Are these points clustered or random? Is there spatial structure in this variable, and where does it live? If you know the value at these locations, what’s your best estimate at the ones in between? And if you want to run a regression, are you making the right inference?

Those questions fall into three different categories, and it’s good to be explicit about which is which.

The first is description. Before you model anything, you want to know what you’re dealing with. Is the variable spatially structured at all? At what scale? Is the autocorrelation isotropic or does it vary by direction? A variogram or correlogram early in an analysis is like plotting your data before fitting a model (and there is no such thing as too much plotting). It’s how you avoid being surprised later. And sometimes the spatial pattern is the finding. A LISA map that shows a cluster of high values in one corner of the study area, or a correlogram that flattens out at 500 meters, is telling you something about the process that generated the pattern.
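As a reminder of how little machinery the descriptive step needs, here is a minimal base-R sketch of an empirical semivariogram. The coordinates and the variable are simulated, the range parameter of 0.2 and the bin width of 0.05 are assumptions chosen for illustration, and names like vg are hypothetical. In practice you would likely reach for a dedicated package instead.

```r
# Empirical semivariogram by hand, on simulated data.
set.seed(42)
n  <- 150
xy <- cbind(x = runif(n), y = runif(n))   # hypothetical site coordinates

# Simulate a spatially structured variable with an exponential covariance
# (range parameter 0.2 is an assumption for illustration).
D     <- as.matrix(dist(xy))
Sigma <- exp(-D / 0.2)
z     <- drop(t(chol(Sigma)) %*% rnorm(n))

# Semivariance for every pair of sites: half the squared difference.
pairs <- which(upper.tri(D), arr.ind = TRUE)
h     <- D[pairs]
gam   <- 0.5 * (z[pairs[, 1]] - z[pairs[, 2]])^2

# Average the pairwise values within distance bins.
breaks <- seq(0, 0.7, by = 0.05)
bin    <- cut(h, breaks)
vg <- data.frame(
  dist  = as.vector(tapply(h, bin, mean)),    # mean separation in each bin
  gamma = as.vector(tapply(gam, bin, mean)),  # mean semivariance in each bin
  npair = as.vector(table(bin))               # number of pairs in each bin
)
print(vg)
```

Plotting gamma against dist should show semivariance that is low at short distances and rises as sites get farther apart, which is the spatial structure you are looking for before you model anything.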

The second is prediction. IDW, TPS, and kriging are about accuracy. You have measurements somewhere and want estimates somewhere else. The goal is a surface that holds up on withheld data. Whether the variogram parameters are ecologically meaningful is secondary. Cross-validation tells you how well you’re doing.
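A minimal sketch of that last point, in base R: leave-one-out cross-validation for IDW across a few values of the power parameter. The surface is simulated, and idw_predict and loo_rmse are hypothetical helper names written for this example, not functions from any package.

```r
# Leave-one-out cross-validation for inverse distance weighting.
set.seed(1)
n  <- 100
xy <- cbind(runif(n), runif(n))
# Hypothetical smooth surface plus a little measurement noise.
z  <- sin(4 * xy[, 1]) + cos(3 * xy[, 2]) + rnorm(n, sd = 0.1)

# IDW prediction at a single location x0 from the sites in xy.
idw_predict <- function(x0, xy, z, power = 2) {
  d <- sqrt((xy[, 1] - x0[1])^2 + (xy[, 2] - x0[2])^2)
  if (any(d == 0)) return(mean(z[d == 0]))  # exact hit: use the observed value
  w <- 1 / d^power
  sum(w * z) / sum(w)
}

# Withhold each site in turn, predict it from the rest, and score the errors.
loo_rmse <- function(power) {
  pred <- vapply(seq_len(n), function(i) {
    idw_predict(xy[i, ], xy[-i, , drop = FALSE], z[-i], power)
  }, numeric(1))
  sqrt(mean((z - pred)^2))
}

powers <- c(1, 2, 3, 4)
rmse   <- vapply(powers, loo_rmse, numeric(1))
names(rmse) <- paste0("p=", powers)
print(round(rmse, 3))

# Baseline: predict each site with the mean of all the other sites.
rmse_mean <- sqrt(mean((z - (sum(z) - z) / (n - 1))^2))
print(round(rmse_mean, 3))
```

The comparison that matters is the withheld-data error, both across settings and against a naive baseline. Nothing in it asks whether the power parameter means anything ecologically.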

The third is inference. You have a hypothesis about a relationship between variables and you want a coefficient and a standard error you can trust. Spatial autocorrelation threatens that goal not because it makes the predictions wrong, but because it makes the uncertainty estimates wrong. GLS and SAR are corrections to that problem, and skipping them means understating how uncertain you are.
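You can see the problem with a small simulation. This is a sketch in base R under assumed settings (100 sites, an exponential covariance with range parameter 0.3, 500 replicates), and it shows the problem rather than the correction. The predictor and the response are generated independently, so every rejection at the 0.05 level is a false positive.

```r
# How often does OLS "find" a relationship that is not there?
set.seed(7)
n  <- 100
xy <- cbind(runif(n), runif(n))
# Cholesky factor of an exponential covariance (range 0.3 is an assumption).
L  <- t(chol(exp(-as.matrix(dist(xy)) / 0.3)))

nsim <- 500
p_spatial <- p_indep <- numeric(nsim)
for (s in seq_len(nsim)) {
  # Spatially structured predictor and response, truly unrelated.
  x <- drop(L %*% rnorm(n))
  y <- drop(L %*% rnorm(n))
  p_spatial[s] <- summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]

  # The same test with no spatial structure in either variable.
  x0 <- rnorm(n)
  y0 <- rnorm(n)
  p_indep[s] <- summary(lm(y0 ~ x0))$coefficients["x0", "Pr(>|t|)"]
}

rate_spatial <- mean(p_spatial < 0.05)  # false positive rate, autocorrelated data
rate_indep   <- mean(p_indep < 0.05)    # false positive rate, independent data
print(c(spatial = rate_spatial, independent = rate_indep))
```

The independent case should land near the nominal 0.05 and the autocorrelated case well above it. The coefficient estimates are not biased in either case. What goes wrong is the standard error, which is the part a GLS fit with a spatial correlation structure (for example gls() in nlme with corExp()) is meant to repair.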

Mixing these categories up is common, and it’s easy to forget which question you are asking of a dataset. Using a kriging surface to make causal claims, or worrying about prediction accuracy when all you need is a valid p-value, will lead you somewhere you don’t want to be.

But spatial analysis is not about avoiding statistical mistakes. The spatial structure in your data is information. Where lead is high tells you about flood history. Where birds are scarce tells you about habitat connectivity. Where your regression residuals are autocorrelated tells you that something is going on in that part of the landscape that your covariates don’t explain yet. Learning to read that structure is part of the fundamental job of environmental science.

Back in the autocorrelation chapter we met the older idea behind all of this. Alex Watt called it pattern and process (Watt 1947), and Dean Urban has spent a career on it. Process is what we care about: dispersal, disturbance, fate and transport. Process is also the thing we usually can’t watch. It’s too slow, or too big, or it finished before we showed up. Pattern is what we can measure, and it feeds back to process. Measuring pattern is what you have been doing all along, whether you were fitting a variogram, mapping a LISA cluster, or testing whether your residuals carry structure your covariates missed. Every method in this book is a way of describing pattern precisely enough to say something disciplined about process. If we are lucky, we can open up the feedbacks between pattern and process too.

The step from pattern to process is rarely clean, because the same pattern can come from more than one process. And processes make patterns. A clustered point pattern might be seeds falling near a parent, or a patch of good soil that several unrelated plants found, or both at once. A variogram with a 500-meter range gives you the scale of the structure, not the thing that built it. Pattern narrows the field and rules some explanations out, which is progress, but it seldom closes the case on its own. The spatial lag and error models are the sharpest version of this in the book: same data, same map, and you still have to decide whether the structure is nuisance in the residuals or a spillover process in the response. The statistics can point. You have to know the system well enough to choose.

The thread running through all the methods in this book is Tobler’s First Law: nearby things are more related than distant things. KDE, Moran’s I, variograms, kriging, GLS, and SAR all take that one idea seriously in different ways.

There are a lot of things this book didn’t get to. Areal data (e.g., watersheds, census tracts, habitat patches) has its own analytical world built around polygon adjacency rather than continuous distance. If that’s your data, look at Bivand (2008) chapters 9 and 10. Spatiotemporal methods, where you have both spatial and temporal autocorrelation, are a natural extension and an active area of development in R. Machine learning approaches to spatial prediction (random forests with spatial cross-validation, for instance) are worth knowing about, though they tend to trade interpretability for flexibility in ways that aren’t always a good deal for science. And then there are networks. Stream networks are a good example of why standard spatial methods can mislead you: two sites might be 200 meters apart in Euclidean space but separated by 10 kilometers of channel, with a waterfall in between. Distances along the network (flow distance, not straight-line distance) are what govern organism dispersal, nutrient transport, and hydrologic connectivity. The SSN2 package in R is built for exactly this, fitting spatial models where autocorrelation is structured by the stream network rather than by the map. If your data lives in a drainage network, it’s worth knowing about.
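The stream example is easy to make concrete. Here is a base-R sketch with made-up coordinates: two sites on neighboring tributaries that meet at a confluence downstream. The vertices are hypothetical and the channel is reduced to two straight segments, so this is arithmetic on a toy network, not a stand-in for what SSN2 does.

```r
# Straight-line versus along-channel distance for two stream sites.
# Hypothetical vertices in meters: site A, downstream to the confluence,
# then upstream along the second tributary to site B.
channel <- rbind(
  A          = c(0,    0),
  confluence = c(5000, 100),
  B          = c(0,    200)
)

# Euclidean distance between the two sites.
euclid <- sqrt(sum((channel["A", ] - channel["B", ])^2))

# Along-channel distance: sum the lengths of consecutive segments.
seg   <- sqrt(rowSums(diff(channel)^2))
along <- sum(seg)

print(c(euclidean_m = euclid, along_channel_m = along))
```

The two sites are 200 meters apart on the map and roughly 10 kilometers apart by water. A variogram built on the first number would treat them as near neighbors, which is exactly the mistake network-based models are designed to avoid.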

For deeper reading: Bivand (2008) is the R-focused reference and you should own it. Cressie (1993) is the statistical bible for geostatistics, dense but authoritative. Fortin et al. (2016) is readable and ecologically grounded. Diggle and Ribeiro (2007) takes a model-based approach to geostatistics that connects cleanly to the likelihood ideas that show up in variogram fitting. And if you want to go deeper into landscape ecology as a discipline (the spatial structure of habitat, the movement of organisms through patchy environments, the consequences of fragmentation), Dean Urban’s books are the place to go: Urban (2023) and Urban (2024). Urban is the one who kicked off the preface, and his work is still the best argument I know for why space isn’t just a nuisance to correct for.

Go find some patterns in nature.