P3D3: Programming with spatial data
These slides map to the R example slides on Day 23 or P3D9
GESTALT: AN ORGANIZED WHOLE THAT IS PERCEIVED AS MORE THAN THE SUM OF ITS PARTS
Andrew Gelman on spatial data
Mapping raw data can lead to spurious spatial features. For example, regions can appear highly variable because of small sample sizes in spatial sub-units (as in the radon example) or small populations (as in the cancer example), and these apparently variable regions contain a disproportionate number of very high (or low) observed parameter values
Furthermore, maps really do make convenient look-up tables (what is the cancer rate, or mean radon level, in my county?). Unfortunately, even maps that are intended to be used only as look-up tables are almost sure to be used for identifying spatial features – we find it very hard to suppress this instinct ourselves ref
Understanding spatial references
WHICH AREA IS LARGER - THE CONTINENTAL 48 STATES OR THE COUNTRY OF BRAZIL?
- https://spatialreference.org/ref/
- Boston has decided to change their projections used in school
- Let’s watch a video referenced the Guardian article
Leveraging the SF package
All functions in the sf
and stars
packages that operate on simple feature objects start with st_
which stands for spatial type.
Getting lat and long columns to spatial objects
You can pipe the data into this function to convert from ordinary text columns to an SF
geometry. Note the crs
and coords
arguments.
st_as_sf(coords = c("longitude", "latitude"), crs = 4326)
Practicing with polygons and points
The USAboundaries
package makes it very easy for us to leverage polygons for US political boundaries.
install.packages("USAboundaries")
install.packages("USAboundariesData", repos = "http://packages.ropensci.org", type = "source")
cal <- us_counties(states = "California") %>%
select(countyfp, countyns, name, aland, awater, state_abbr, geometry)
Let’s plot our polygons
ggplot(data = cal) +
geom_sf(aes(fill = awater)) +
geom_sf_text(aes(label = name), color = "lightgrey") +
theme_bw() +
labs(fill = "Area of county\ncovered by water", title = "What are the units?")
Calculating areas
- What units are the
awater
andaland
variables in? -
Does
aland
includeawater
? - st_area
Counting Chipotle stores by county
- st_join()
- st_within()
- Great reference
Plotting with ggplot2
Let’s create this chart in ggplot2.
Plotting with leaflet
# need to define calw
bins <- c(0, 10, 20, 30, 50, 70, 90, 110)
pal <- colorBin("YlOrRd", domain = calw$n)
m <- leaflet(calw) %>%
addPolygons(
data = calw,
fillColor = ~pal(n),
fillOpacity = .5,
color = "darkgrey",
weight = 2) %>%
addCircleMarkers(
data = filter(dat, region == "CA"),
radius = 3,
color = "grey") %>%
addProviderTiles(providers$CartoDB.Positron)