Getting Started

Why Spatial Analysis

Species don’t distribute themselves at random. Wildfire risk is not uniform across a landscape. Lead contamination in soils clusters near roads and demolished buildings. Precipitation gradients drive vegetation boundaries that don’t care about county lines. In each case, where something is happening is part of the story.

The underlying idea has a name. Tobler’s First Law of Geography states that everything is related to everything else, but near things are more related than distant things. It sounds obvious until you realize how much standard statistical practice ignores it.

If you treat spatially dependent observations as independent, your standard errors are likely to be too small, your p-values too optimistic, and your predictions may not hold up at new locations. This isn’t a minor technical complaint. It’s a systematic bias, and it runs through a lot of published environmental science.

The tools to do this right have existed for decades. What’s changed is that they are now implemented in open, well-documented software, with enough worked examples that you don’t need a geostatistics PhD to use them correctly. There’s no longer a good reason to leave spatial structure out of your analysis.

Spatial pattern is also information in its own right. Where something is happening, and where it isn’t, tells you something about the processes driving it. Learning to read that structure is central to the job of environmental science: figuring out why things are patterned the way they are.

How to Use This Book

This is a hands-on book. You will write code, look at data, make maps, fit models, and interpret results. Each chapter builds on what came before. The math shows up when it needs to, but the emphasis throughout is on doing spatial analysis and understanding what you’re doing, not on deriving things for their own sake.

How This Book Is Laid Out

Most chapters follow a natural progression, each building on the last. Scattered through the book you’ll find Asides, side quests that dig into something that doesn’t quite fit the main flow. An Aside might work through the math behind a method, explain how R handles something under the hood, or fill in background that makes the surrounding chapters easier to follow.

The Asides are linked from the chapters that depend on them. Skipping one will likely leave you confused when that chapter references it. They are separate from the main chapters not because they are less important, but because pulling them out keeps the main flow readable. When you hit a link to an Aside, follow it.

What’s in This Book

The book is organized into four parts. Each builds on the one before it.

Foundations gets you oriented on the two data types that show up everywhere in this book: point observations and raster surfaces. This isn’t a GIS-in-R book, so we won’t linger here long.

  • Spatial Data: A Quick Orientation – A fast introduction to sf for point data and terra for rasters. Enough to follow along with the rest of the book. For the full treatment, see Geocomputation with R or the Data Carpentry geospatial curriculum.
    • Aside: Methods and Generics in R – A look under the hood at how R uses generic functions. Explains why plot() and summary() behave differently depending on what you hand them.

Point Patterns & Autocorrelation is where the analysis starts. We ask the most basic spatial question – are things randomly distributed, or is there structure? – and then move on to measuring spatial dependence in continuous variables.

  • Point Patterns – How to quantify whether points in space are randomly distributed, clustered, or repulsed, using kernel density estimation and Ripley’s K.
    • Aside: Distance Matrices – A detour into how distances between points are computed and stored. This comes up enough that it’s worth understanding before you need it.
  • Global Spatial Autocorrelation – How to measure and describe spatial structure in a continuous variable using variograms, Moran’s I, and correlograms.
    • Aside: Correlation and Regression Are Not the Same Thing – These two ideas get conflated constantly. This aside untangles them before we lean on regression in later chapters.
  • Local Spatial Autocorrelation – Global Moran’s I tells you whether spatial structure exists. Local indicators of spatial association (LISA) tell you where. We decompose the global statistic into location-specific values and map the hot spots, cold spots, and spatial outliers.

Interpolation & Geostatistics is about prediction. We have measurements at some locations and want to estimate values at others. We start with deterministic methods and work up to kriging, which is the probabilistic workhorse of geostatistics.

  • Inverse Distance Weighting – Predicting values at unsampled locations using a deterministic, distance-weighted approach. Simple, intuitive, and a good place to start.
    • Aside: Model Skill and Cross Validation – A model that fits your data well is not necessarily a model that predicts well. This aside covers how to tell the difference before we start building interpolation models that depend on getting this right.
  • Thin-Plate Splines – A smooth, flexible interpolation method that minimizes curvature across the surface.
  • Kriging – Uses the spatial structure of the data to predict unknown values and, unlike IDW, gives you a measure of uncertainty.
    • Aside: Kriging Variance By Hand – Cracks open the gstat black box. We calculate kriging weights and prediction variance using matrix algebra on a toy dataset and check our work against the package.
  • Regression Kriging – Kriging with external predictors folded in. Useful when covariates explain part of the spatial pattern.
    • Aside: Predicting New Data with a Fitted Model – A refresher on using predict() with new data.

Regression closes the loop. You already know how to fit a linear model: a response variable, some predictors, minimize the sum of squared residuals. That’s ordinary least squares (OLS), and it works great when the residuals are independent. The problem is that spatial data often violates that assumption. If you fit an OLS model and map the residuals, you’ll sometimes see patches of positive residuals in one corner and negative residuals in another. That’s spatial autocorrelation in the errors, and it means your standard errors are wrong and your inference is unreliable. The fix is to model that structure explicitly rather than pretend it isn’t there.

  • GLS with Autocorrelated Residuals – What to do when your regression residuals have spatial structure. Spoiler: you use generalized least squares (GLS), and it works.
    • Aside: OLS via Algebra and Matrices – Derives the OLS solution algebraically and in matrix form, then implements both in R. Background that makes the GLS chapter easier to follow.
  • Spatial Regression: Lag and Error Models – GLS treats space as a nuisance in the residuals. Simultaneous autoregressive (SAR) models let you go further: test whether the structure is in the errors or in the response itself, and fit accordingly.

What You Can Do Now closes the book. A short synthesis of the three modes of spatial analysis we covered – description, prediction, inference – and where to go from here.

Note: Points and Grids, Not Polygons

One thing you will not find in this book is areal data analysis. Watersheds, land cover polygons, census tracts, habitat patches: that world has its own analytical framework, its own vocabulary, and its own rabbit holes, and going there turns a spatial analysis book into a GIS book fast. The methods here are built around point observations and continuous raster surfaces, which is where most field-based environmental and ecological data lives anyway.

The Final Frontier

You are going to learn to see the world spatially. By the end you will be able to look at a dataset, ask where things are happening and why, and have the tools to answer those questions.

The people who do this work well are not necessarily the ones who find it easiest. They’re the ones who run the code when it breaks, read the error messages, ask questions, and keep going. That’s the whole secret.

So: set up your project, download the data, and let’s get to work.

Setup

The book is built around R, so you’ll need a working R installation. The code here was written and tested with R version 4.5.2 (2025-10-31). The package versions used in each chapter are recorded in the References section, so if something breaks you have a fixed point to compare against. You should be reasonably up to date on R, RStudio, and the relevant packages. R and RStudio are updated by downloading and running their current installers. To update your installed packages, run:

update.packages()

Do it now, and anytime it occurs to you. It’s almost always the right thing to do!
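If you want to check where you stand before starting, the sketch below compares your R version with the one the book was tested on and reports whether a few packages are installed, along with their versions. The package list (sf, terra, gstat) is an assumption drawn from the chapter descriptions above, not the book’s complete list; each chapter’s packages are recorded in the References section.

```r
# Is this R at least the version the book was tested on?
getRversion() >= "4.5.2"

# Packages named in this chapter; later chapters may load others (assumed list)
pkgs <- c("sf", "terra", "gstat")

# TRUE/FALSE for each package; loads the namespace without attaching it
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Version string for each installed package, NA for missing ones
versions <- vapply(pkgs, function(p) {
  if (installed[[p]]) as.character(packageVersion(p)) else NA_character_
}, character(1))
versions

# Point out anything missing rather than installing it automatically
if (any(!installed)) {
  message("Missing: ", paste(pkgs[!installed], collapse = ", "),
          ". Install with install.packages().")
}
```

Comparing the printed versions with those in the References section is a quick first step when a chapter’s code behaves differently on your machine.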

Project Structure

To follow along, you’ll want a working RStudio project.

  1. Create a new RStudio project
    Go to File → New Project → New Directory → New Project. Give it a name (something like spatial-analysis) and choose where to save it.

  2. Download the data/ folder
    The datasets used in the examples live in the data/ folder of the book’s GitHub repository. The easiest way to get them is to download the zipped folder, unzip it, and place the data/ folder inside your project directory.

    The direct download link is:

    https://github.com/AndyBunn/spatialAnalysisBook/releases/download/data-latest/data.zip

    Once it’s unzipped, your folder structure should look something like this:

    spatial-analysis/
    ├── data/
    │   ├── birdRichnessMexico.rds
    │   ├── prcpCA.rds
    │   └── ...
    └── spatial-analysis.Rproj
  3. Use relative paths in your code
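    Use relative paths like "data/birdRichnessMexico.rds" rather than absolute paths tied to your own machine. When you open the project, RStudio sets the working directory to the project folder, so relative paths resolve from there. This keeps the code portable – it will run on any machine without modification.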

    birdRichness <- readRDS("data/birdRichnessMexico.rds")
  4. Save your .qmd or .Rmd files in the project root
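    Keep your working files in the project’s root directory. Quarto (.qmd) and R Markdown (.Rmd) are close cousins – the syntax is nearly identical and either will work fine here. By default, both run code with the document’s own folder as the working directory, so a file saved in the project root finds "data/..." paths the same way the console does. After working through the point pattern chapter, your structure might look like this: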

    spatial-analysis/
    ├── data/
    │   ├── birdRichnessMexico.rds
    │   ├── prcpCA.rds
    │   └── ...
    ├── pointPatternWork.qmd
    └── spatial-analysis.Rproj
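If you’d rather fetch the data from R than by hand, the function below is one way to do it. It is a sketch, not part of the book’s materials: the name get_book_data() is hypothetical, and it assumes the zip archive unpacks to a top-level data/ folder like the tree above. Run it from the project root, which is the working directory whenever the project is open in RStudio.

```r
# Hypothetical helper (not from the book): download and unzip the data into `dest`.
# Assumes the archive contains a top-level data/ folder.
get_book_data <- function(url, dest = ".") {
  data_dir <- file.path(dest, "data")
  # Skip the download if the folder is already there
  if (dir.exists(data_dir)) {
    message("data/ already exists in ", normalizePath(dest), "; skipping download")
    return(invisible(data_dir))
  }
  zipfile <- tempfile(fileext = ".zip")
  # mode = "wb" writes the file in binary mode so the zip isn't corrupted on Windows
  utils::download.file(url, zipfile, mode = "wb")
  utils::unzip(zipfile, exdir = dest)
  unlink(zipfile)
  if (!dir.exists(data_dir)) {
    warning("No data/ folder found after unzipping; check the archive layout")
  }
  invisible(data_dir)
}

data_url <- "https://github.com/AndyBunn/spatialAnalysisBook/releases/download/data-latest/data.zip"

# Run once from the project root:
# get_book_data(data_url)

# Sanity check: the file from the relative-path example should now resolve
file.exists("data/birdRichnessMexico.rds")
```

Because the function returns early when data/ already exists, it is safe to leave a call to it at the top of a working file without re-downloading every time.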