Spatial Data in R: A Quick Orientation

Big Idea

There is a lot to know about spatial data in R. A lot. Object classes, coordinate reference systems, projections, spatial joins, raster math, vector operations. It can get deep pretty fast but we aren’t going to go there. This is not a GIS-in-R book, and it is not a substitute for one.

If you want the full treatment, two resources will serve you well. Geocomputation with R¹ by Lovelace, Nowosad, and Muenchow is the best book I know on the subject and it’s free online. The Data Carpentries spatial curriculum² covers the fundamentals in a hands-on workshop format. Work through either of those and you’ll come out knowing a lot more than we need here.

What we do need is a working vocabulary for two things: point data and raster data. Those are the main characters in this book. Polygons and lines show up occasionally but they aren’t the focus.

So that’s the plan for this chapter. Quick introduction to sf for points. Quick introduction to terra for rasters. Then we move on.

Packages

We’ll use four packages in this chapter. tidyverse (Wickham 2023) for general data wrangling, sf (Pebesma 2026) for vector spatial data, terra (Hijmans 2026), and tidyterra (Hernangómez 2023) for rasters. You’ll see these same packages throughout the book. Full citations are in the References chapter at the back.

When you encounter a package that you haven’t used you can install it. E.g., install.packages("sf"). You just have to do that once. Then you can load your packages with library.

Code

library(tidyverse)
library(sf)
library(terra)
library(tidyterra)

Points with `sf`

The sf package handles vector data in R: points, lines, and polygons. We care about points mostly. An sf object looks like a data frame. In fact, it is a data frame, with one extra column that holds the geometry.

Let’s look at a simple csv file.

Code

californiaOzonePoints <- read_csv("data/californiaOzonePoints.csv")
head(californiaOzonePoints)

# A tibble: 6 × 3
  ozone longitude latitude
  <dbl>     <dbl>    <dbl>
1  24.9     -121.     38.7
2  36.6     -120.     36.8
3  28.1     -120.     36.7
4  26.9     -122.     36.5
5  20.6     -122.     37.1
6  39.3     -118.     34.7

That’s just a plain data frame right now. It has coordinates in it, but R doesn’t know they mean anything spatial yet. We will make this into a sf object by telling it:

Code

californiaOzonePointsSF <- st_as_sf(californiaOzonePoints, coords = c("longitude", "latitude"), crs = 4326)
head(californiaOzonePointsSF)

Simple feature collection with 6 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -121.9694 ymin: 34.7122 xmax: -118.1391 ymax: 38.6988
Geodetic CRS:  WGS 84
# A tibble: 6 × 2
  ozone            geometry
  <dbl>         <POINT [°]>
1  24.9 (-121.2711 38.6988)
2  36.6 (-119.7402 36.8136)
3  28.1 (-119.7413 36.7055)
4  26.9 (-121.7333 36.4819)
5  20.6 (-121.9694 37.0508)
6  39.3 (-118.1391 34.7122)

The coords argument says which columns are the coordinates. The crs = 4326 sets the coordinate reference system, in this case WGS84 geographic coordinates (longitude/latitude), which is what you get from a GPS or most downloaded occurrence data.

Now R knows these are spatial points. You can plot them:

Code

plot(californiaOzonePointsSF)

A few things worth knowing about sf objects before we move on:

Generic functions work on them. E.g., plot(), summary(), and print() all know what to do with an sf object. We’ll talk more about how R handles this in the Methods and Generics aside, but the short version is: the same function call does sensible things regardless of what you hand it.

The geometry column sticks around. When you use dplyr verbs like filter() or select() on an sf object, the geometry column comes along for the ride. You don’t have to do anything special.

CRS matters. If you try to do spatial operations on two sf objects with different coordinate systems, you’ll get an error. Good! R is trying to help you by making sure your data are in the right place on earth. Transform with st_transform() when you need to match them up.

Rasters with `terra`

Raster data is a grid, a matrix of cells each holding a value, covering some extent of the Earth’s surface. Elevation, temperature, precipitation, land cover: these are all commonly stored as rasters.

The .tif format (GeoTIFF) is the most common way you’ll encounter rasters in the wild. Think of it as an image file with spatial metadata baked in. If you’ve opened a satellite image or a scanned map, you’ve seen a TIFF. The “Geo” part means it knows where on earth it lives: the extent, the resolution, the CRS. terra reads it the same way you’d read any raster, and the spatial info comes along automatically.

The terra package handles rasters in R. Its main object is a SpatRaster.

Code

californiaElevation <- rast("data/californiaElevation.tif")
californiaElevation

class       : SpatRaster
size        : 233, 279, 1  (nrow, ncol, nlyr)
resolution  : 0.04174499, 0.04174499  (x, y)
extent      : -124.9883, -113.3414, 32.42926, 42.15584  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
source      : californiaElevation.tif
name        :        elev
min value   :  -73.910347
max value   : 3773.893066

That output tells you the essentials: dimensions (rows and columns), resolution (how big each cell is), extent (the geographic bounding box), and CRS. Check those before you do anything else with a new raster. Surprises in any of those fields will cause problems later.

Code

plot(californiaElevation)

A few things worth knowing about terra objects:

Layers. A SpatRaster can have multiple layers, think of it like a stack of grids, one per variable or time step. You index into them with [[]]. One layer is still a SpatRaster, just a simple one.

Raster math is cell-wise. Operations apply to every cell at once, like working with a vector. No loops needed.

Code

californiaElevationFt <- californiaElevation * 3.281
plot(californiaElevationFt)

californiaElevation > 500 gives you a logical raster of cells above 500 meters. It’s fast because terra handles the iteration internally.

We can plot the ozone stations on top of the elevation surface in one shot. Let’s do that with ggplot.

Code

ggplot() +
  geom_spatraster(data = californiaElevation) +
  scale_fill_terrain_c(name = "Elevation (m)") +
  geom_spatraster_contour(
    data = californiaElevation,
    color = "black", alpha = 0.4, bins = 10
  ) +
  geom_sf(
    data = californiaOzonePointsSF, pch = 21, color = "grey",
    fill = "white",
    size = 2, alpha = 0.5
  ) +
  theme_minimal()

The monitoring stations cluster in the valleys and along the coast, which makes sense. Nobody builds an ozone monitor on top of a mountain.

Extracting values at points. You’ll do this quite a bit: you have point observations and a raster surface, and you want to know the raster value at each point location. terra::extract() does it. The terra:: prefix is intentional. extract() exists in other packages too, and if you’ve loaded tidyverse, there’s a good chance the wrong one is sitting on top. Being explicit saves you a confusing error.

Code

elevationAtOzonePoints <- terra::extract(californiaElevation, californiaOzonePointsSF)
head(elevationAtOzonePoints)

  ID      elev
1  1  67.73187
2  2 109.31101
3  3  90.42530
4  4 299.21814
5  5 249.27463
6  6 732.43298

A field ecologist records species observations at GPS points. You want to know the elevation, the climate, the land cover at each location. terra::extract() is how you get there.

We could attach those to the ozone points

Code

californiaOzonePointsSF <- californiaOzonePointsSF %>%
  add_column(elevation = elevationAtOzonePoints$elev)
head(californiaOzonePointsSF)

Simple feature collection with 6 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -121.9694 ymin: 34.7122 xmax: -118.1391 ymax: 38.6988
Geodetic CRS:  WGS 84
# A tibble: 6 × 3
  ozone            geometry elevation
  <dbl>         <POINT [°]>     <dbl>
1  24.9 (-121.2711 38.6988)      67.7
2  36.6 (-119.7402 36.8136)     109. 
3  28.1 (-119.7413 36.7055)      90.4
4  26.9 (-121.7333 36.4819)     299. 
5  20.6 (-121.9694 37.0508)     249. 
6  39.3 (-118.1391 34.7122)     732.

And now we have a nice new data column that we can use any way we like. E.g.,

Code

californiaOzonePointsSF %>%
  ggplot(mapping = aes(x = elevation, y = ozone)) +
  geom_point() +
  geom_smooth() +
  labs(x = "Elevation (m)", y = "Ozone (ppb)") +
  theme_minimal()

Projections: Why You Want to Be on a Plane

Longitude and latitude are angles, not distances. A degree of longitude at the equator is about 111 km wide. At 60 degrees north, it’s roughly half that. This matters because almost everything we do in spatial analysis, computing distances between points, fitting variograms, running kernel density estimation, assumes we’re working in flat Euclidean space where a unit in the x-direction is the same size as a unit in the y-direction, everywhere on the map.

In geographic coordinates, that assumption is wrong. And the further you get from the equator, the more wrong it gets.

The fix is to project your data onto a plane. For California, a sensible choice is CA Albers (EPSG:3310), an equal-area projection designed for the state. Coordinates come out in meters, distances are meaningful, and the analysis methods in the rest of this book will behave the way they’re supposed to.

For an sf object, use st_transform():

Code

californiaOzonePointsAlbers <- st_transform(californiaOzonePointsSF, crs = 3310)
californiaOzonePointsAlbers

Simple feature collection with 345 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -331109.4 ymin: -593561.6 xmax: 449656.9 ymax: 432274.9
Projected CRS: NAD83 / California Albers
# A tibble: 345 × 3
   ozone              geometry elevation
 * <dbl>           <POINT [m]>     <dbl>
 1  24.9  (-110428.2 76601.22)      67.7
 2  36.6   (23145.3 -133669.2)     109. 
 3  28.1  (23080.09 -145684.3)      90.4
 4  26.9 (-155083.4 -169148.9)     299. 
 5  20.6 (-174891.7 -105519.2)     249. 
 6  39.3  (170356.7 -365466.7)     732. 
 7  30.0  (63581.31 -186931.7)     101. 
 8  28.8  (269110.7 -571070.3)     105. 
 9  45.1 (-42655.93 -79315.44)      47.7
10  22.8  (188196.9 -466797.9)      31.3
# ℹ 335 more rows

For a SpatRaster, use project():

Code

californiaElevationAlbers <- project(californiaElevation, "EPSG:3310")
californiaElevationAlbers

class       : SpatRaster
size        : 268, 267, 1  (nrow, ncol, nlyr)
resolution  : 4102.271, 4102.271  (x, y)
extent      : -469783.1, 625523.3, -620345.7, 479063  (xmin, xmax, ymin, ymax)
coord. ref. : NAD83 / California Albers (EPSG:3310)
source(s)   : memory
name        :        elev
min value   :         -68
max value   : 3645.142334

You’ll notice the resolution changes when you reproject a raster. terra has to resample the grid to fit the new coordinate system. That’s normal. Check the extent and resolution in the output to make sure things look right.

From here on, we’ll work in projected coordinates. If you load data in geographic coordinates, reproject it before doing any analysis. Make it a habit.

Wrapping Up

You now know enough to follow along. We have sf for our point observations and terra for our gridded surfaces. The chapters ahead use both, and you’ll get more comfortable with them as we go.

Starting with Point Patterns, we’re going to ask the most basic spatial question: are these points randomly distributed, or is there structure? You’d be surprised how much that question opens up.

Big Idea

Packages

Points with sf

Rasters with terra

Projections: Why You Want to Be on a Plane

Wrapping Up

Points with `sf`

Rasters with `terra`