Getting Started

Why Spatial Analysis

Species don’t distribute themselves at random. Wildfire risk is not uniform across a landscape. Metal contamination in soils clusters near roads and demolished buildings. Precipitation gradients follow topography. In each case, where something is happening is part of the story.

The underlying idea has a name. Paraphrased, Tobler’s First Law of Geography states that everything is related to everything else, but near things are more related than distant things. It sounds obvious until you realize how many standard practices in data analysis ignore it.

If you treat spatially dependent observations as independent, your standard errors might be too small, your p-values too optimistic, and your predictions might not hold up. None of this is a pedantic quibble. The result is a systematic bias, and it runs through a lot of published environmental science.
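
A small simulation makes the standard-error problem concrete. This is a minimal sketch in base R, not an analysis from the book: the 1-D transect, the exponential covariance, and the correlation range of 2 are all assumptions picked for illustration. It draws many spatially correlated samples, then compares the standard error the textbook formula reports with how much the sample mean actually varies from one sample to the next.

```r
# Simulate spatially autocorrelated data on a 1-D transect and compare the
# naive standard error of the mean with the mean's actual sampling variability.
set.seed(42)

n         <- 50                        # observations per simulated survey
coords    <- seq(0, 10, length.out = n)
range_par <- 2                         # assumed correlation range (hypothetical)

# Exponential covariance: correlation decays with distance between points
d     <- as.matrix(dist(coords))
Sigma <- exp(-d / range_par)
L     <- t(chol(Sigma))                # lower-triangular factor, Sigma = L %*% t(L)

n_sim <- 2000
sims  <- replicate(n_sim, {
  y <- as.vector(L %*% rnorm(n))       # one spatially correlated sample
  c(mean = mean(y), naive_se = sd(y) / sqrt(n))
})

true_se  <- sd(sims["mean", ])         # observed spread of the sample mean
naive_se <- mean(sims["naive_se", ])   # what the independence formula reports

round(c(true_se = true_se, naive_se = naive_se, ratio = true_se / naive_se), 2)
```

Under these assumptions the naive standard error should come out well below the true one. Shrinking `range_par` toward zero weakens the dependence, and the two numbers should move closer together.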

Spatial pattern is also information in its own right. Where something is happening, and where it isn’t, tells you something about the processes driving it. Learning to read that structure is part of the fundamental job of environmental science: figuring out why things are patterned the way they are.

The tools to understand space have existed for decades. What’s changed is that they are now implemented in open, well-documented software, with enough worked examples that you don’t need a geostatistics degree to use them correctly. There’s no longer a good reason to leave spatial structure out of your analysis.

This is a hands-on book. You will write code, look at data, make maps, fit models, and interpret results. Each chapter builds on what came before. The emphasis throughout is on doing spatial analysis and understanding what you’re doing. The math shows up when it needs to, but we don’t derive things for their own sake.

How This Book Is Laid Out

Most chapters follow a natural progression, each building on the last. Scattered through the book you’ll find Asides, which are side quests that dig into something that doesn’t quite fit in the main flow. An Aside might work through the math behind a method, explain how R handles something under the hood, or fill in background that makes the surrounding chapters make more sense.

The Asides are referenced from the chapters that depend on them. They are separate from the main chapters not because they are less important, but because pulling them out keeps the main flow readable.

What’s in This Book

The book is organized into four parts. Each builds on the one before it.

Foundations gets you oriented on the two data types that show up everywhere in this book: point observations and raster surfaces. This isn’t a GIS-in-R book, so we won’t linger here long.

- Spatial Data in R: A Quick Orientation: A fast introduction to sf for point data and terra for rasters. Enough to follow along with the rest of the book. For the full treatment, see Geocomputation with R¹ or the Data Carpentries spatial curriculum².
  - Aside: Methods and Generics in R: A look under the hood at how R uses generic functions. Explains why plot() and summary() behave differently depending on what you hand them.

Point Patterns & Autocorrelation is where the analysis starts. We ask a basic spatial question: are things randomly distributed, or is there structure? We then move on to measuring spatial dependence in continuous variables.

- Point Patterns: How to quantify whether points in space are randomly distributed, clustered, or repulsed, using kernel density estimation and Ripley’s K.
  - Aside: Distance Matrices: A detour into how distances between points are computed and stored. This comes up enough that it’s worth understanding before you need it.
- Global Spatial Autocorrelation: How to measure and describe spatial structure in a continuous variable using variograms, Moran’s I, and correlograms.
  - Aside: Correlation and Regression Are Not the Same Thing: Correlation and regression get conflated constantly. This aside untangles them before we lean on regression in later chapters.
- Local Spatial Autocorrelation: Global Moran’s I tells you whether spatial structure exists. LISA tells you where. We decompose the global statistic into location-specific values and map the hot spots, cold spots, and spatial outliers.

Interpolation & Geostatistics is about prediction. We have measurements at some locations and want to estimate values at others. We start with deterministic methods and work up to kriging, which is the probabilistic workhorse of geostatistics.

- Inverse Distance Weighting: Predicting values at unsampled locations using a deterministic, distance-weighted approach. Simple, intuitive, and a good place to start.
  - Aside: Model Skill and Cross Validation: A model that fits your data well is not necessarily a model that predicts well. This aside covers how to tell the difference before we start building interpolation models that depend on getting this right.
- Thin-Plate Splines: A smooth, flexible interpolation method that minimizes curvature across the surface.
- Kriging: Uses the spatial structure of the data to predict unknown values and, unlike IDW, gives you a measure of uncertainty.
  - Aside: Kriging Variance By Hand: Cracks open the gstat black box. We calculate kriging weights and prediction variance using matrix algebra on a toy dataset and check our work against the package.
- Regression Kriging: Kriging with external predictors folded in. Useful when covariates explain part of the spatial pattern.
  - Aside: Predicting New Data with a Fitted Model: A refresher on using predict() on new data.

Regression closes the loop. You already know how to fit a linear model: a response variable, some predictors, minimize the sum of squared residuals. That’s ordinary least squares (OLS), and it works great when the residuals are independent. The problem is that spatial data often violate that assumption. If you fit an OLS model and map the residuals, you’ll sometimes see patches of positive residuals in one corner and negative residuals in another. That’s spatial autocorrelation in the errors, and it means your standard errors are wrong and your inference is unreliable. The fix is to model that structure explicitly rather than pretend it isn’t there.
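
Here is a minimal sketch of that residual check in base R, using simulated data rather than any dataset from the book. The 15 x 15 grid, the predictor name `elev`, the exponential covariance with a range of 3, and the neighbor distance of 1.5 are all assumptions made for illustration. It fits an OLS model to data with spatially correlated errors and computes Moran’s I on the residuals by hand.

```r
set.seed(1)

# Hypothetical survey: a 15 x 15 grid of sample locations
grid <- expand.grid(x = 1:15, y = 1:15)
n    <- nrow(grid)
d    <- as.matrix(dist(grid))          # pairwise distances between locations

# Spatially correlated errors (exponential covariance, assumed range of 3)
L   <- t(chol(exp(-d / 3)))
err <- as.vector(L %*% rnorm(n))

# Response depends on one made-up predictor plus the correlated errors
grid$elev <- rnorm(n)
grid$resp <- 2 + 0.5 * grid$elev + err

fit <- lm(resp ~ elev, data = grid)    # ordinary least squares
res <- residuals(fit)

# Moran's I with binary weights: neighbors are the cells within distance 1.5
W <- (d > 0 & d <= 1.5) * 1
z <- res - mean(res)
moran_i <- (n / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
moran_i
```

With independent residuals, Moran’s I sits near -1/(n - 1), which is close to zero here. A clearly positive value says neighboring residuals resemble each other. To see the patches, plot `grid$x` against `grid$y` and color the points by the sign of `res`.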

- GLS with Autocorrelated Residuals: What to do when your regression residuals have spatial structure. Spoiler: you can use generalized least squares (GLS) to account for that structure.
  - Aside: OLS via Algebra and Matrices: Derives the OLS and GLS solutions algebraically and in matrix form, then implements both in R. This is background that makes the GLS chapter make more sense.
- Spatial Regression: Lag and Error Models: GLS treats space as a nuisance in the residuals. Spatial autoregressive (SAR) models let you go further: test whether the structure is in the errors or in the response itself, and fit a model accordingly.

What You Can Do Now closes the book. A short synthesis of the three modes of spatial analysis we covered (description, prediction, inference) and where to go from here.

Points and Grids, Not Polygons

One thing you will not find in this book is areal data analysis. Watersheds, land cover polygons, census tracts, habitat patches: that world has its own analytical framework, its own vocabulary, and its own rabbit holes, and going there turns a spatial analysis book into a GIS book fast. The methods here are built around point observations and continuous raster surfaces, which is where most field-based environmental and ecological data lives anyway.

What You Should Know Beforehand

This isn’t a book for absolute beginners, and a few things will go more smoothly if you’ve seen them before. None of it is a hard prerequisite, but here’s a plain accounting of what I’m assuming.

R basics. You should be comfortable enough in R to read a data file, poke at a data frame, write a function, and not panic when something throws an error. You don’t need to be an expert. If you can follow along and look things up when you’re stuck, you’re ready.

Some statistics. We lean on the standard introductory toolkit throughout: correlation, standard errors, p-values, and linear models. You don’t need to have aced a theory course, but the ideas should feel familiar rather than brand new. If “fit a linear model and look at the residuals” means nothing to you, a stats class first will make this book a lot more rewarding.

Matrix algebra (gently). It comes up, mostly in the Asides, when I want to show what a method is actually doing under the hood. I keep it gentle and you do not need to bring any of it with you. If you’ve never multiplied two matrices, the main chapters will still work fine. The Asides are there if you want to look deeper, not a wall you have to climb.

GIS is helpful but not required. If you’ve made a map in QGIS or ArcGIS, some of the spatial concepts will already make sense. If you haven’t, no problem. We cover what you need as we go, and this isn’t a GIS book anyway.

One more thing about the code. We mostly use tidyverse syntax: dplyr for wrangling, ggplot2 for plots, the pipe to chain it together. If that style is new to you, don’t let it slow you down. There are loads and loads of free resources out there for learning it. The place to start is tidyverse.org³, and from there R for Data Science (Wickham et al. 2023) is the canonical free book.

Setup

The book is built around R, so you’ll need a working R installation. The code here was written and tested with R version 4.5.2 (2025-10-31). The package versions used in each chapter are recorded in the References section, so if something breaks you have a fixed point to compare against. You should be reasonably up to date on your versions of R, RStudio, and relevant packages. R and RStudio are updated by installing a new release. To update your installed packages, run:

```r
update.packages()
```

Do it now, and anytime it occurs to you. It’s almost always the right thing to do!
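
To see where you stand before or after updating, a quick check along these lines can help. It is only a sketch: the package list is an assumption drawn from packages named in this chapter (sf, terra, gstat, dplyr, ggplot2), and the `status` data frame is a made-up name. Adjust the list to whatever a given chapter loads.

```r
# Report the running R version
R.version.string

# Packages named in this chapter; an illustrative list, not a complete one
pkgs <- c("sf", "terra", "gstat", "dplyr", "ggplot2")

# Installed version as a string, or NA when the package is missing.
# system.file() returns "" for packages that are not installed.
versions <- vapply(pkgs, function(p) {
  if (nzchar(system.file(package = p))) {
    as.character(packageVersion(p))
  } else {
    NA_character_
  }
}, character(1))

status <- data.frame(
  package   = pkgs,
  version   = unname(versions),
  installed = !is.na(versions)
)
status
```

Anything that shows up as not installed can be added with `install.packages()`.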

Project Structure

To follow along, you’ll want a working RStudio project.

Create a new RStudio project
Go to File → New Project → New Directory → New Project. Give it a name (something like spatial-analysis) and choose where to save it.
Download the data/ folder
The datasets used in the examples are bundled into a single data.zip. Download it and unzip it inside your project directory.

You can download the data directly:
```
https://spatial.andybunn.org/data.zip
```
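
If you would rather fetch the data from inside R, something like the following should work when run from the project root. Treat it as a sketch: `fetch_book_data()` is a hypothetical helper, not something the book provides, and it assumes the archive unpacks into a top-level data/ folder.

```r
# Hypothetical helper: download data.zip into the project root and unzip it.
fetch_book_data <- function(url = "https://spatial.andybunn.org/data.zip",
                            zip_path = "data.zip") {
  # Skip the download if a data/ folder is already present
  if (dir.exists("data")) {
    message("data/ already exists; skipping download.")
    return(invisible(FALSE))
  }
  # mode = "wb" keeps the zip file intact on Windows
  download.file(url, destfile = zip_path, mode = "wb")
  # Assumes the archive contains a top-level data/ folder
  unzip(zip_path, exdir = ".")
  invisible(TRUE)
}

# Run once from the project root:
# fetch_book_data()
```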
Once it’s unzipped, your folder structure should look something like this:
```
spatial-analysis/
├── data/
│   ├── birdDiv.csv
│   ├── californiaOzonePoints.csv
│   └── ...
└── spatial-analysis.Rproj
```
Use relative paths in your code
Use paths like "data/birdRichnessMexico.rds" rather than full file paths. This keeps the code portable. It will run on any machine without modification.
```r
birdRichness <- readRDS("data/birdRichnessMexico.rds")
```
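
Relative paths resolve against R’s working directory, which is the project root when you open the .Rproj file. If a read fails, a small check like this one can tell you whether the path or the working directory is the problem. It is a sketch only, and `rds_path` is just an illustrative name.

```r
# Build the path relative to the project root
rds_path <- file.path("data", "birdRichnessMexico.rds")

# Where R is looking from right now
getwd()

if (file.exists(rds_path)) {
  birdRichness <- readRDS(rds_path)
} else {
  message("Can't find ", rds_path, " from ", getwd(),
          ". Is the project open, and is the data unzipped here?")
}
```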
Save your .R, .qmd or .Rmd files in the project root
Keep your working files in the project’s root directory. Quarto (.qmd) and R Markdown (.Rmd) are close cousins. The syntax is nearly identical and either will work fine here. After working through the point pattern chapter, your structure might look like this:
```
spatial-analysis/
├── data/
│   ├── birdDiv.csv
│   ├── californiaOzonePoints.csv
│   └── ...
├── pointPatternWork.qmd
└── spatial-analysis.Rproj
```

The Final Frontier

You are going to learn to see the world spatially. By the end you will be able to look at a dataset, ask where things are happening and why, and have tools to answer those questions.

The people who do this work well are not necessarily the ones who find it easiest. They’re the ones who run the code when it breaks, read the error messages, ask questions, and keep going. That’s the whole secret.

You’ve got R running and the data on your machine. Time to use it.