P3D10: Spatial Calculations and Manipulations with GeoPandas
These slides map to the R example slides on Day 18 or P3D4
- Make sure you have GeoPandas installed based on the guide from day P3D9
import pandas as pd
import numpy as np
import geopandas as gpd
import folium
import rtree
from plotnine import *
First, let's get our SafeGraph data into a usable form.
The code below reads from GitHub, but you can use your local file instead.
url_loc = "https://github.com/KSUDS/p3_spatial/raw/main/SafeGraph%20-%20Patterns%20and%20Core%20Data%20-%20Chipotle%20-%20July%202021/Core%20Places%20and%20Patterns%20Data/chipotle_core_poi_and_patterns.csv"
dat = pd.read_csv(url_loc)
Now we can subset the Chipotle stores to California and build a geometry column.
dat_cal = dat.query("region=='CA'")
dat_cal = gpd.GeoDataFrame(
dat_cal.filter(["placekey", "latitude", "longitude", "median_dwell", "region"]),
geometry=gpd.points_from_xy(dat_cal.longitude, dat_cal.latitude),
crs='EPSG:4326')
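The subsetting step above can be sketched with plain pandas on a small made-up table (the rows and values below are hypothetical, not SafeGraph data). One detail worth knowing: `.filter()` with a list of column names keeps only the names that exist and silently skips the rest, so a misspelled or missing column does not raise an error.

```python
import pandas as pd

# Hypothetical toy rows standing in for the SafeGraph table.
stores = pd.DataFrame({
    "placekey": ["p1", "p2", "p3"],
    "region": ["CA", "GA", "CA"],
    "latitude": [34.05, 33.75, 37.77],
    "longitude": [-118.24, -84.39, -122.42],
})

# Keep only the California rows.
ca = stores.query("region == 'CA'")

# "median_dwell" is not in this toy table, so filter() drops it without an error.
subset = ca.filter(["placekey", "latitude", "longitude", "median_dwell"])
kept_columns = list(subset.columns)
```

If a required column must be present, selecting with `ca[["placekey", "median_dwell"]]` would raise a `KeyError` instead of skipping it.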
Let's read in our county polygons.
county = gpd.read_parquet("personal/usa_counties.parquet")
Now we can build out some spatial calculations on our California counties. In the example below we want to calculate the distance of each county center to KSU.
The code to get our KSU point.
from shapely.geometry import Point
ksu_df = pd.DataFrame({"lat":[34.037876],
"long":[-84.58102]})
ksu = gpd.GeoDataFrame(ksu_df,
geometry=gpd.points_from_xy(ksu_df.long, ksu_df.lat),
crs='EPSG:4326')
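# Project KSU to EPSG:3310 (meters) and build a shapely Point from its scalar coordinates
ksu_3310 = ksu.geometry.to_crs(epsg = 3310)
point = Point(ksu_3310.x.iloc[0], ksu_3310.y.iloc[0])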
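A minimal sketch of why the point is projected before measuring: shapely's `distance()` is a plain planar calculation in whatever units the coordinates use, so it only yields meters when both geometries are in a meter-based CRS such as EPSG:3310. The coordinates below are made-up values chosen to give a round answer.

```python
from shapely.geometry import Point

# Hypothetical projected coordinates in meters.
a = Point(0, 0)
b = Point(3000, 4000)

# Planar (Euclidean) distance in coordinate units, here meters.
dist_m = a.distance(b)

# Convert meters to miles for reporting.
dist_miles = dist_m / 1609.344
```

Running the same call on longitude/latitude pairs would return a number in degrees, which is not a usable ground distance.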
Our new wrangled California GeoDataFrame is calw. The code assumes cal already holds the California rows of county.
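calw = (cal
    .assign(
        # measurements use EPSG:3310 (California Albers), whose units are meters
        gp_area = lambda x: x.geometry.to_crs(epsg = 3310).area,
        gp_acres = lambda x: x.gp_area * 0.000247105,  # square meters to acres
        aland_acres = lambda x: x.aland * 0.000247105,  # assumes aland is in square meters
        percent_water = lambda x: x.awater / x.aland,  # water area relative to land area (a ratio, not multiplied by 100)
        gp_center = lambda x: x.geometry.to_crs(epsg = 3310).centroid,
        gp_length = lambda x: x.geometry.to_crs(epsg = 3310).length,  # boundary length in meters
        gp_distance = lambda x: x.gp_center.distance(point),  # meters from county center to KSU
        gp_buffer = lambda x: x.geometry.to_crs(epsg = 3310).buffer(24140.2)  # 24140.2 m is about 15 miles
    ))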
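The two constants in the block above can be checked with a little arithmetic. This sketch assumes the standard definitions of the international acre and mile; reading the buffer distance as 15 miles is an inference from the number, not something the slides state.

```python
# Standard conversion factors (international acre and mile).
SQ_METERS_PER_ACRE = 4046.8564224
METERS_PER_MILE = 1609.344

# Acres per square meter, the multiplier used for gp_acres and aland_acres.
acres_per_sq_meter = 1 / SQ_METERS_PER_ACRE

# The buffer distance expressed in miles.
buffer_miles = 24140.2 / METERS_PER_MILE


def sq_meters_to_acres(area_m2):
    """Convert an area in square meters to acres."""
    return area_m2 * acres_per_sq_meter


one_acre = sq_meters_to_acres(SQ_METERS_PER_ACRE)
```

Naming the factors this way keeps the units visible when the values are reused across several columns.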
Plotting Chipotle stores in California
We can plot the spatial series that we created in calw.
calw.gp_buffer.plot()
calw.gp_center.plot(color= "black")
With dat_cal from above we can plot the store locations on the county map.
base = calw.plot(color="white", edgecolor="darkgrey")
dat_cal.plot(ax=base, color="red", markersize=5)
Counting Chipotle stores by county
To use the spatial join functions of GeoPandas we need to make sure that rtree is installed. We can use gpd.sjoin() for our task.
# Attach county attributes to each store with a spatial join
dat_join_s1 = gpd.sjoin(dat_cal, calw)
# Now count stores by county
dat_join_merge = (dat_join_s1
    .groupby("name")
    .agg(counts = ('percent_water', 'size'))
    .reset_index())
calw_join = (calw
.merge(dat_join_merge, on="name", how="left")
.fillna(value={"counts":0}))
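The join, count, and merge steps above can be sketched end to end on toy geometries. Everything in this example is hypothetical (three square "counties" and three "stores"); it only illustrates the pattern, including why the left merge plus `fillna` is needed for counties with no stores.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy polygons: three unit squares standing in for counties.
counties = gpd.GeoDataFrame(
    {"name": ["West", "East", "North"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, 1, 1, 2)],
    crs="EPSG:4326",
)

# Hypothetical toy points: two stores in West, one in East, none in North.
stores = gpd.GeoDataFrame(
    {"store_id": ["a", "b", "c"]},
    geometry=[Point(0.2, 0.5), Point(0.7, 0.3), Point(1.5, 0.5)],
    crs="EPSG:4326",
)

# Default sjoin is an inner join on intersecting geometries.
joined = gpd.sjoin(stores, counties)

# Count the joined rows per county.
counts = (joined
    .groupby("name")
    .agg(counts=("store_id", "size"))
    .reset_index())

# A left merge keeps counties with no stores; fillna turns their missing count into 0.
counties_counts = (counties
    .merge(counts, on="name", how="left")
    .fillna(value={"counts": 0}))

result = dict(zip(counties_counts["name"], counties_counts["counts"]))
```

Without the left merge, North would drop out of the table entirely, because the inner spatial join never produces a row for it.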
Now we can plot the counties colored by their respective number of stores with the store locations.
base = calw_join.plot(
edgecolor="darkgrey",
column = "counts")
dat_cal.plot(ax=base, color="red", markersize=4)