P2D5: Writing Python code
Using pd.read_fwf()
The documentation for pd.read_fwf() doesn’t provide much detail on how to use the function.
- Let’s start with one of our files
- What arguments do we have available to us in pd.read_fwf()?
  - filepath_or_buffer
  - colspecs
  - widths
  - **kwds: What are these? Keyword arguments from pd.read_table()
- Ok, now we have one dataset locally and we have it parsed into Python. Let’s write pseudocode for our Python script.
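Here is a minimal sketch of how colspecs, widths, and **kwds fit together in pd.read_fwf(). The three-field layout, the sample records, and the column names are made up for illustration; the real file will need its own positions from the data dictionary.

```python
import io

import pandas as pd

# Hypothetical fixed-width records: 4-char year, 4-char cause code, 1-char sex.
sample_text = (
    "2012X72 M\n"
    "2012W34 F\n"
    "2013Y870M\n"
)
column_names = ["year", "underlying_cause", "sex"]

# colspecs: half-open (start, end) character positions for each field.
by_colspecs = pd.read_fwf(
    io.StringIO(sample_text),
    colspecs=[(0, 4), (4, 8), (8, 9)],
    names=column_names,  # names and header are passed through **kwds
    header=None,
)

# widths: a shortcut for the same layout when the fields are contiguous.
by_widths = pd.read_fwf(
    io.StringIO(sample_text),
    widths=[4, 4, 1],
    names=column_names,
    header=None,
)

print(by_colspecs)
```

Use colspecs when you only want some of the fields or the fields have gaps between them; widths is shorter to write when every character in the line belongs to a field.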
range objects and list comprehension
In R we have access to vectors of numbers using the : operator. It is intuitive to me and easy to read in code.
range_variable <- 60:84
R also converts numbers to text automatically with paste.
paste0("X", range_variable)
Then we can add more items with c().
c(paste0("X", range_variable), "U03", "Y870")
Python is not as friendly to the R user mind. We have to understand a few programming constructs to create a similar variable.
- range() objects
- Convert numbers to strings with str()
- list comprehension
- joining lists
# My python code to make the magic happen??
range(60, 85) # to start, but why 85?
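One possible way to finish the thought, using the four constructs listed above. The variable names x_codes and suicide_code are assumptions chosen to mirror the R snippets, not names fixed by the notes.

```python
# range() stops *before* its second argument, so 85 is needed to include 84.
range_variable = range(60, 85)

# The list comprehension plays the role of paste0("X", range_variable);
# str() does the number-to-text conversion that R performs automatically.
x_codes = ["X" + str(number) for number in range_variable]

# Joining two lists with + plays the role of c().
suicide_code = x_codes + ["U03", "Y870"]

print(suicide_code[:3], suicide_code[-3:])
```

An f-string such as f"X{number}" would work in place of "X" + str(number); either way the conversion to text is explicit in Python.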
Creating the suicide table in Pandas
Here is the example R code.
suicide <- raw_file %>%
  dplyr::filter(underlying_cause %in% suicide_code) %>%
  dplyr::mutate(
    gun = ifelse(underlying_cause %in% c("X72", "X73", "X74"), 1, 0),
    year = year
  )
.query()
.assign()
.isin()
np.where()
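The four tools above can be combined into a rough Pandas translation of the R code. This is a sketch, not the only way to do it: the small raw_file DataFrame and the year value are made-up stand-ins so the example runs on its own, and suicide_code is rebuilt here under the assumption that it holds X60 through X84 plus U03 and Y870.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for objects the R code assumes already exist.
year = 2012
suicide_code = ["X" + str(n) for n in range(60, 85)] + ["U03", "Y870"]
raw_file = pd.DataFrame(
    {"underlying_cause": ["X72", "W34", "X60", "Y870", "X74"]}
)

suicide = (
    raw_file
    # dplyr::filter(underlying_cause %in% suicide_code)
    .query("underlying_cause in @suicide_code")
    # dplyr::mutate(gun = ifelse(...), year = year)
    .assign(
        gun=lambda df: np.where(
            df.underlying_cause.isin(["X72", "X73", "X74"]), 1, 0
        ),
        year=year,
    )
)

print(suicide)
```

The lambda inside .assign() matters: it makes np.where() run on the filtered rows coming out of .query(), not on the full raw_file.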
Creating the guns table in Pandas
Pandas does not have a clean case_when() like dplyr. We can use np.select() to get close.
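import numpy as np

# One boolean condition per intent, checked in order.
intent_cond = [
    raw_file.underlying_cause.isin(["W32", "W33", "W34"]),
    raw_file.underlying_cause.isin(["*U01.4", "X93", "X94", "X95", "Y350"]),
    raw_file.underlying_cause.isin(["X72", "X73", "X74"]),
    raw_file.underlying_cause.isin(["Y22", "Y23", "Y24"]),
]
intent_val = ["Accidental", "Homicide", "Suicide", "Undetermined"]

# default=None leaves unmatched rows missing. Mixing string labels with
# np.nan either raises or yields the text "nan", depending on NumPy version.
guns = raw_file.assign(
    intent=np.select(intent_cond, intent_val, default=None)
)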