Link Search Menu Expand Document

P3D11: Managing spatial and nested data in Python

These slides map to the R example slides on Day 16 or P3D2

Github gists

Gists in Github - They aren’t just for Geeks!

The above quote is from a short post about Gists by Amit Agarwal.

Sharing gists

When we have short code snippets we want to share we can use Github’s gists.

  1. Click on your user icon once you are logged into Github.
  2. Then select Your gists
  3. Now you are at gist.github.com/<YOURUSERNAME>
  4. From here you can click on the + in the top right corner to create your own gist.
  5. You can also fork a gist just like a repo. Let’s fork mine - hathaway/safegraph_functions.py

Downloading public gists (without using Github)

In the example below we are ‘web scraping’ to get the python script from our gist. Notice the .get() that ‘scrapes’ that url. Finally we can use open().write() to save the script.

import requests 

your_location = "safegraph_functions.py"

url = "https://github.com/KSUDS/p3_spatial/raw/main/SafeGraph%20-%20Patterns%20and%20Core%20Data%20-%20Chipotle%20-%20July%202021/Core%20Places%20and%20Patterns%20Data/chipotle_core_poi_and_patterns.csv"

response = requests.get(url)

print(response.headers.get('content-type'))

open(your_location, "wb").write(response.content)

Let’s explore the safegraph_functions.py script.

Starting our exploratory data analysis (EDA)

With our file in the same folder as our exploration python script we can use the following code.

import pandas as pd
import numpy as np
import geopandas as gpd
import folium
from plotnine import *
import safegraph_functions as sgf

Now let’s load the SafeGraph Chipotle data. You can load your local file if you would like.

url_loc = "https://github.com/KSUDS/p3_spatial/raw/main/SafeGraph%20-%20Patterns%20and%20Core%20Data%20-%20Chipotle%20-%20July%202021/Core%20Places%20and%20Patterns%20Data/chipotle_core_poi_and_patterns.csv"
dat = pd.read_csv(url_loc)

datl = dat.iloc[:10,:]

Now we can use sgf.expand_json() and sgf.expand.list() to get the embedded data out of the dataframe.

list_cols = ['visits_by_day', 'popularity_by_hour']
json_cols = ['open_hours','visitor_home_cbgs', 'visitor_country_of_orgin', 'bucketed_dwell_times', 'related_same_day_brand', 'related_same_month_brand', 'popularity_by_day', 'device_type', 'visitor_home_aggregation', 'visitor_daytime_cbgs']

dat_pbd = sgf.expand_json('popularity_by_day', datl)
dat_rsdb = sgf.expand_json('related_same_day_brand', datl)
dat_vbd = sgf.expand_list("visits_by_day", datl)
dat_pbh = sgf.expand_list("popularity_by_hour", datl)

What are the top three brands that Chipotle customers visit on the same day?

Create a bar chart of the top 10 to show us.

Over all the days of the week, which day of the week has the most variability across the Chipotle brand?

Over the hours in a day which has to most variability across the Chipotle brand? Which has the highest median visit rate?

Create a boxplot by hour of the day to help answer this question