Spatial Data in R: A Quick Orientation

There is a lot to know about spatial data in R. A lot. Object classes, coordinate reference systems, projections, spatial joins, raster math, vector operations. It goes deep fast. We aren’t going to go there. This is not a GIS-in-R book, and it is not a substitute for one.

If you want the full treatment, two resources will serve you well. Geocomputation with R by Lovelace, Nowosad, and Muenchow is the best book on the subject and it’s free online. The Data Carpentries spatial curriculum covers the fundamentals in a hands-on workshop format. Work through either of those and you’ll come out knowing a lot more than we need here.

What we do need is a working vocabulary for two things: point data and raster data. Those are the main characters in this book. Polygons and lines show up occasionally but they aren’t the focus.

So that’s the plan for this chapter. Quick introduction to sf for points. Quick introduction to terra for rasters. Then we move on.

We’ll use three packages in this chapter. tidyverse (Wickham 2023) for general data wrangling, sf (Pebesma 2026) for vector spatial data, and terra (Hijmans 2026) for rasters. You’ll see these same packages throughout the book. Full citations are in the References chapter at the back.

Points with sf

The sf package handles vector data in R: points, lines, and polygons. We care about points. An sf object looks like a data frame, which is intentional. It is a data frame, with one extra column that holds the geometry.

Code
californiaOzonePoints <- read_csv("data/californiaOzonePoints.csv")
head(californiaOzonePoints)
# A tibble: 6 × 3
  ozone longitude latitude
  <dbl>     <dbl>    <dbl>
1  24.9     -121.     38.7
2  36.6     -120.     36.8
3  28.1     -120.     36.7
4  26.9     -122.     36.5
5  20.6     -122.     37.1
6  39.3     -118.     34.7

That’s just a plain data frame right now. It has coordinates in it, but R doesn’t know they mean anything spatial yet. We tell it:

Code
californiaOzonePointsSF <- st_as_sf(californiaOzonePoints, coords = c("longitude", "latitude"), crs = 4326)
californiaOzonePointsSF
Simple feature collection with 345 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -123.9812 ymin: 32.6314 xmax: -115.1844 ymax: 41.8457
Geodetic CRS:  WGS 84
# A tibble: 345 × 2
   ozone            geometry
 * <dbl>         <POINT [°]>
 1  24.9 (-121.2711 38.6988)
 2  36.6 (-119.7402 36.8136)
 3  28.1 (-119.7413 36.7055)
 4  26.9 (-121.7333 36.4819)
 5  20.6 (-121.9694 37.0508)
 6  39.3 (-118.1391 34.7122)
 7  30.0 (-119.2908 36.3325)
 8  28.8 (-117.1289 32.8364)
 9  45.1 (-120.4819 37.3019)
10  22.8  (-117.968 33.7958)
# ℹ 335 more rows

The coords argument says which columns are the coordinates. The crs = 4326 sets the coordinate reference system, in this case WGS84 geographic coordinates (longitude/latitude), which is what you get from a GPS or most downloaded occurrence data.

Now R knows these are spatial points. You can plot them:

Code
plot(californiaOzonePointsSF)

A few things worth knowing about sf objects before we move on:

Generic functions work on them. plot(), summary(), and print() all know what to do with an sf object. We’ll talk more about how R handles this in the Methods and Generics aside, but the short version is: the same function call does sensible things regardless of what you hand it.

The geometry column sticks around. When you use dplyr verbs like filter() or select() on an sf object, the geometry column comes along for the ride. You don’t have to do anything special.

CRS matters. If you try to do spatial operations on two sf objects with different coordinate systems, you’ll get an error. Good. R is catching a problem. Transform with st_transform() when you need to match them up.

Rasters with terra

Raster data is a grid, a matrix of cells each holding a value, covering some extent of the Earth’s surface. Elevation, temperature, precipitation, land cover: these are all commonly stored as rasters.

The .tif format (GeoTIFF) is the most common way you’ll encounter rasters in the wild. Think of it as an image file with spatial metadata baked in. If you’ve opened a satellite image or a scanned map, you’ve seen a TIFF. The “Geo” part means it knows where on Earth it lives: the extent, the resolution, the CRS. terra reads it the same way you’d read any raster, and the spatial info comes along automatically.

The terra package handles rasters in R. Its main object is a SpatRaster.

Code
californiaElevation <- rast("data/californiaElevation.tif")
californiaElevation
class       : SpatRaster 
size        : 233, 279, 1  (nrow, ncol, nlyr)
resolution  : 0.04174499, 0.04174499  (x, y)
extent      : -124.9883, -113.3414, 32.42926, 42.15584  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : californiaElevation.tif 
name        :       elev 
min value   :  -73.91035 
max value   : 3773.89307 

That output tells you the essentials: dimensions (rows and columns), resolution (how big each cell is), extent (the geographic bounding box), and CRS. Check those before you do anything else with a new raster. Surprises in any of those fields will cause problems later.

Code
plot(californiaElevation)

A few things worth knowing about terra objects:

Layers. A SpatRaster can have multiple layers, think of it like a stack of grids, one per variable or time step. You index into them with [[]]. One layer is still a SpatRaster, just a simple one.

Raster math is cell-wise. Operations apply to every cell at once, like working with a vector. No loops needed.

Code
californiaElevationFt <- californiaElevation * 3.281
plot(californiaElevationFt)

californiaElevation > 500 gives you a logical raster of cells above 500 meters. It’s fast because terra handles the iteration internally.

Before we extract, here’s the payoff for having both objects in the same space. We can plot the ozone stations on top of the elevation surface in one shot. To do that with ggplot2, we first convert the raster to a plain data frame, then layer in the points with geom_sf().

Code
elevDF <- as.data.frame(californiaElevation, xy = TRUE)
names(elevDF)[3] <- "elevation"

ggplot() +
  geom_raster(data = elevDF, aes(x = x, y = y, fill = elevation)) +
  geom_contour(data = elevDF, aes(x = x, y = y, z = elevation),
               color = "white", alpha = 0.4, bins = 15) +
  geom_sf(data = californiaOzonePointsSF, color = "red", size = 1.5) +
  scale_fill_viridis_c(name = "Elevation (m)") +
  coord_sf() +
  theme_minimal() +
  labs(x = NULL, y = NULL)

The monitoring stations cluster in the valleys and along the coast, which makes sense. Nobody builds an ozone monitor on top of a mountain.

Extracting values at points. You’ll do this constantly: you have point observations and a raster surface, and you want to know the raster value at each point location. terra::extract() does it. The terra:: prefix is intentional. extract() exists in other packages too, and if you’ve loaded tidyverse, there’s a good chance the wrong one is sitting on top. Being explicit saves you a confusing error.

Code
elevation_at_ozone_points <- terra::extract(californiaElevation, californiaOzonePointsSF)
head(elevation_at_ozone_points)
  ID      elev
1  1  67.73187
2  2 109.31101
3  3  90.42530
4  4 299.21814
5  5 249.27463
6  6 732.43298

A field ecologist records species observations at GPS points. You want to know the elevation, the climate, the land cover at each location. terra::extract() is how you get there.

Projections: Why You Want to Be on a Plane

Longitude and latitude are angles, not distances. A degree of longitude at the equator is about 111 km wide. At 60 degrees north, it’s roughly half that. This matters because almost everything we do in spatial analysis, computing distances between points, fitting variograms, running kernel density estimation, assumes we’re working in flat Euclidean space where a unit in the x-direction is the same size as a unit in the y-direction, everywhere on the map.

In geographic coordinates, that assumption is wrong. And the further you get from the equator, the more wrong it gets.

The fix is to project your data onto a plane. For California, a sensible choice is CA Albers (EPSG:3310), an equal-area projection designed for the state. Coordinates come out in meters, distances are meaningful, and the analysis methods in the rest of this book will behave the way they’re supposed to.

For an sf object, use st_transform():

Code
californiaOzonePointsAlbers <- st_transform(californiaOzonePointsSF, crs = 3310)
californiaOzonePointsAlbers
Simple feature collection with 345 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -331109.4 ymin: -593561.6 xmax: 449656.9 ymax: 432274.9
Projected CRS: NAD83 / California Albers
# A tibble: 345 × 2
   ozone              geometry
 * <dbl>           <POINT [m]>
 1  24.9  (-110428.2 76601.22)
 2  36.6   (23145.3 -133669.2)
 3  28.1  (23080.09 -145684.3)
 4  26.9 (-155083.4 -169148.9)
 5  20.6 (-174891.7 -105519.2)
 6  39.3  (170356.7 -365466.7)
 7  30.0  (63581.31 -186931.7)
 8  28.8  (269110.7 -571070.3)
 9  45.1 (-42655.93 -79315.44)
10  22.8  (188196.9 -466797.9)
# ℹ 335 more rows

For a SpatRaster, use project():

Code
californiaElevationAlbers <- project(californiaElevation, "EPSG:3310")
californiaElevationAlbers
class       : SpatRaster 
size        : 268, 267, 1  (nrow, ncol, nlyr)
resolution  : 4102.271, 4102.271  (x, y)
extent      : -469783.1, 625523.3, -620345.7, 479063  (xmin, xmax, ymin, ymax)
coord. ref. : NAD83 / California Albers (EPSG:3310) 
source(s)   : memory
name        :     elev 
min value   :  -68.000 
max value   : 3645.142 

You’ll notice the resolution changes when you reproject a raster. terra has to resample the grid to fit the new coordinate system. That’s normal. Check the extent and resolution in the output to make sure things look right.

From here on, we’ll work in projected coordinates. If you load data in geographic coordinates, reproject it before doing any analysis. Make it a habit.

Wrapping Up

You now know enough to follow along. We have sf for our point observations and terra for our gridded surfaces. The chapters ahead use both, and you’ll get more comfortable with them as we go.

Starting with Point Patterns, we’re going to ask the most basic spatial question: are these points randomly distributed, or is there structure? You’d be surprised how much that question opens up.