
P2D1: Getting data and gun deaths

Partner presentations (10 minutes).

Find a partner in the class and share your project 1 code and charts.

  • Compare your work to identify different ways that each of you completed the job.
  • Discuss with each other how long it took for you to complete each chart.
  • Share any cool Google resources that you found while completing the project.

Exploring gun deaths

What is the problem with raw counts and minority groups?

# These census figures were used to build the values.
# https://www.census.gov/quickfacts/fact/table/US/POP010220
library(tibble)

dat_pop <- tibble(
    table_var = c("Asian/Pacific Islander",
        "Black", "Hispanic",
        "Native American/Native Alaskan", "White"),
    # 2020 census total population times each group's share
    N = 331449281 * c(.061, .134, .185, .013, .763))

What are some solutions to displaying this small population impact? Issues with those solutions?
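One commonly used option is to scale counts by population and report a rate per 100,000 people. The sketch below rebuilds dat_pop so it runs on its own. The `deaths` column is a hypothetical placeholder (the same count for every group), not real data, and `dat_counts`, `dat_rate`, and `rate_per_100k` are names invented for this illustration.

```r
library(tibble)
library(dplyr)

# Population by group, built the same way as dat_pop above.
dat_pop <- tibble(
  table_var = c("Asian/Pacific Islander", "Black", "Hispanic",
                "Native American/Native Alaskan", "White"),
  N = 331449281 * c(.061, .134, .185, .013, .763)
)

# HYPOTHETICAL counts for illustration only: every group gets the
# same raw count so that only the population size differs.
dat_counts <- tibble(
  table_var = c("Asian/Pacific Islander", "Black", "Hispanic",
                "Native American/Native Alaskan", "White"),
  deaths = c(100, 100, 100, 100, 100)
)

# Join counts to population and convert to a rate per 100,000 people.
dat_rate <- dat_counts %>%
  left_join(dat_pop, by = "table_var") %>%
  mutate(rate_per_100k = deaths / N * 1e5) %>%
  arrange(desc(rate_per_100k))

dat_rate
```

With identical raw counts, the smallest population ends up with the highest rate, which is exactly what a chart of raw counts hides. Rates have their own issue: for small groups, a handful of events can swing the rate substantially.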

Trying the waffle chart

waffle

The plague of bar charts

It is difficult to understand why statisticians commonly limit their inquiries to Averages, and do not revel in more comprehensive views. Their souls seem as dull to the charm of variety as that of the native of one of our flat English counties, whose retrospect of Switzerland was that, if its mountains could be thrown into its lakes, two nuisances would be got rid of at once.

Francis Galton

The problem with bar charts:

  • They display summaries of data, not the data.
  • They make it hard to visualize uncertainty.
  • They don’t facilitate easy comparisons beyond one category.
  • They should be avoided with time-series data.
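The first point can be illustrated with a small sketch. The values below are made up and chosen so that both groups have the same mean; `dat` and `dat_summary` are names invented for this example.

```r
library(tibble)
library(dplyr)

# Made-up values: both groups sum to 300, so both means are 50.
dat <- tibble(
  group = rep(c("a", "b"), each = 6),
  value = c(48, 49, 50, 50, 51, 52,   # tightly clustered around 50
            0, 10, 20, 80, 90, 100)   # two clumps far from 50
)

# The per-group summary is all a bar chart of means would display.
dat_summary <- dat %>%
  group_by(group) %>%
  summarise(mean = mean(value), sd = sd(value), n = n())

dat_summary
```

A bar chart of `mean` would draw two bars of identical height, even though no observation in group b is anywhere near 50. Plotting the individual values (or at least the spread) makes that difference visible.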

Two excellent references:

Munging data with dplyr

Understanding case_when()

case_when() is particularly useful inside mutate() when you want to create a new variable that relies on a complex combination of existing variables. Write a short sentence that says what this code is doing.

library(dplyr)  # provides starwars, %>%, select(), mutate(), case_when()

starwars %>%
  select(name:mass, gender, species) %>%
  mutate(
    type = case_when(
      height > 200 | mass > 200 ~ "large",
      species == "Droid"        ~ "robot",
      TRUE                      ~ "other"
    )
  )

#> # A tibble: 87 x 6
#>                  name height  mass gender species  type
#>                 <chr>  <int> <dbl>  <chr>   <chr> <chr>
#>  1     Luke Skywalker    172    77   male   Human other
#>  2              C-3PO    167    75   <NA>   Droid robot
#>  3              R2-D2     96    32   <NA>   Droid robot
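case_when() checks its conditions from top to bottom and uses the first one that is TRUE for each row, so the order of the conditions matters. The sketch below uses a small hypothetical table (`toy`, with invented characters) rather than starwars so the effect is easy to see.

```r
library(tibble)
library(dplyr)

# Hypothetical characters, invented for this illustration.
toy <- tibble(
  name    = c("tall droid", "small droid", "tall human", "small human"),
  height  = c(250, 90, 210, 170),
  species = c("Droid", "Droid", "Human", "Human")
)

toy_typed <- toy %>%
  mutate(
    # Size is checked first, so a tall droid is labelled "large".
    size_first = case_when(
      height > 200       ~ "large",
      species == "Droid" ~ "robot",
      TRUE               ~ "other"
    ),
    # Species is checked first, so the same tall droid is labelled "robot".
    droid_first = case_when(
      species == "Droid" ~ "robot",
      height > 200       ~ "large",
      TRUE               ~ "other"
    )
  )

toy_typed
```

Only the "tall droid" row differs between the two columns, because it is the only row that satisfies more than one condition. The final `TRUE ~ "other"` line acts as a catch-all for rows that match nothing above it.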