P2D4: Being a data scientist

Reading discussion

Big Data and the Underground Railroad

Big Data and the Underground Railroad from Slate.com.

There is a moral lag in the way we treat data. Far too often, today’s discrimination was yesterday’s national security or public health necessity. An approach that advocates ubiquitous data collection and protects privacy solely through post-collection use restrictions doesn’t account for that.

These examples may seem extreme. But they highlight an important and uncomfortable fact: Throughout our history, the survival of our most vulnerable communities has often turned on their ability to avoid detection.

The problem is not that the proponents of big data are nefarious—they’re not. The problem is that all of us, as a society, are startlingly bad at protecting the data of vulnerable communities, and recognizing when use of that data crosses a line.

One Month, 500,000 Face Scans: How China Is Using A.I. to Profile a Minority

BIG DATA IS OUR GENERATION’S CIVIL RIGHTS ISSUE, AND WE DON’T KNOW IT

when things become so cheap that they’re practically free, big changes happen—just look at the advent of steam power, or the copying of digital music, or the rise of home printing. Abundance replaces scarcity, and we invent new business models.

Python and Pandas

object-oriented: describes a programming language. Specifically, one whose constructs make object-based solutions convenient to implement.

functionally-oriented: describes a programming language. Specifically, one whose constructs make function-based solutions convenient to implement.

| Characteristic | Object-Based Approach | Function-Based Approach |
|---|---|---|
| Programmer paradigm | Imperative: how to perform the task in gory detail | Declarative: what result you want; what information is desired |
| State changes | Important | Non-existent (avoids mutable state objects) |
| Order of execution | Important | Only in the context of affecting the output |
| Primary flow | Loops and methods | Functions |
| Manipulation paradigm | Functionality as methods within objects | Functions first; data is an input to the function |
| Readability | Concise: easier to test and comprehend as it is explicit | Obfuscation: complex hierarchies and abstractions make testing and comprehensibility more difficult |
| Fans | Jesse Dickey here | Ilya Suzdalnitski discussion here |
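
To make the comparison more concrete, here is a minimal sketch of the same small task written both ways. The names `Cart`, `add_item`, and `total` are hypothetical and only for illustration; they do not come from the readings.

```python
# Object-based: the data lives inside an object and methods change its state.
class Cart:
    def __init__(self):
        self.items = []            # mutable state held by the object

    def add(self, price):
        self.items.append(price)   # state change, no value returned

    def total(self):
        return sum(self.items)


cart = Cart()
cart.add(2.50)
cart.add(4.00)
object_total = cart.total()


# Function-based: data is an input, and a new value comes back.
def add_item(items, price):
    return items + [price]         # builds a new list; the input is untouched


def total(items):
    return sum(items)


start = []
after_two = add_item(add_item(start, 2.50), 4.00)
function_total = total(after_two)

print(object_total, function_total)  # 6.5 6.5
print(start)                         # [] (the original list never changed)
```

In the object-based version the order of the `add` calls changes the state of `cart`. In the function-based version each call returns a new list and leaves its input alone.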

Imperative and Declarative

With an imperative approach, a developer writes code that specifies the steps that the computer must take to accomplish the goal. This is sometimes referred to as algorithmic programming. In contrast, a functional approach involves composing the problem as a set of functions to be executed. -ref-

This Stack Overflow thread provides some great examples.

Imperative examples

  1. Imperative goes to a restaurant and orders a 6oz. steak (cooked rare), fries (with ketchup), a side-salad (with ranch), and a Coke (with no ice). The waiter delivers exactly what he asked for, and he’s charged $14.50.
  2. Imperative: Librarian please 1) Go into the Library, 2) Find Book Organization System, 3) Research how to use Card Catalogs, 5) Figure out how shelves are labeled and organized, 6) Figure out how books are organized on a shelf, 7) Cross-reference book location from card catalog with organization system to find said book. 8) Take book to check-out system, 9) Check out book for me.
  3. Imperative directions say, “Go to 1st Street, turn left onto Main, drive two blocks, turn right onto Maple, and stop at the third house on the left.”
  4. Imperative language is when you say how to get what you want.
  5. In English Imperative sentences give a command or make some sort of request.

Declarative examples

  1. Declarative goes to a restaurant and tells the waiter that he only wants to pay around 12 dollars for dinner, and he’s in the mood for steak. The waiter returns with a 6oz. steak (cooked medium), a side of mashed potatoes, steamed broccoli, a dinner roll, and a glass of water. He’s charged $11.99.
  2. Declarative: Librarian, please check me out a copy of Moby Dick. (Librarian, at their discretion chooses the best method for performing the request)
  3. Declarative directions are something like this: “Drive to Sue’s house.”
  4. Declarative language is when you say what you want.
  5. In English Declarative sentences state something.
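
The same contrast can be sketched in Python. The small `scores` table below is made up for illustration. The loop spells out every step, while the Pandas `query` call only states which rows are wanted.

```python
import pandas as pd

# Hypothetical data, only for illustration.
scores = pd.DataFrame({"name": ["Ann", "Bo", "Cy"], "score": [7, 4, 9]})

# Imperative: say how to get the result, one step at a time.
passing_loop = []
for i in range(len(scores)):
    if scores["score"].iloc[i] >= 5:
        passing_loop.append(scores["name"].iloc[i])

# Declarative: say what you want and let Pandas decide how to get it.
passing_query = scores.query("score >= 5")["name"].tolist()

print(passing_loop)   # ['Ann', 'Cy']
print(passing_query)  # ['Ann', 'Cy']
```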

Don’t I live in a state?

Understanding the phrase ‘state change’ in computer science can be a bit complicated (and I know I don’t have it completely right). I think we can use Pandas to help us understand the concept. When we make a variable assignment in Python, we create a state for the object df.

import pandas as pd
import numpy as np
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, np.nan]})

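When we assign df to a new name, we create a variable df2 that refers to the same object, and so the same state, as df. We may think that df2 is a separate object that we can manipulate without worrying about changing df. However, the two variables stay entangled because both names point to the same DataFrame. This comes from how Python assignment works rather than from anything specific to Pandas.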

df2 = df

We can break this entanglement by calling a method that returns a new DataFrame, even an empty .assign().

df3 = df.assign()
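
One way to check which names are entangled is Python's `is` operator, which tests whether two names refer to the same object. This is a small sketch that rebuilds the same `df` so it runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, np.nan]})
df2 = df            # a second name for the same object
df3 = df.assign()   # a new DataFrame with the same contents

same_object_2 = df2 is df
same_object_3 = df3 is df
same_values_3 = df3.equals(df)

print(same_object_2)  # True: df2 and df are one object
print(same_object_3)  # False: df3 is a separate object
print(same_values_3)  # True: the contents still match
```

Calling `df.copy()` is a more explicit way to ask for an independent DataFrame.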

Generally, we don’t have to think about these issues with Pandas because we shouldn’t use inplace=True. However, the argument is available within a few methods which makes it worth our time to discuss. You can read why You Should Probably Never Use pandas inplace=True on Medium to get more details.

Pandas does have some methods that can perform state changes (no explicit assignment using =), which are accessed with the inplace=True argument. In this first case, the change to df3 does not affect df. However, the state of df3 was changed without an explicit =.

df3.fillna(0, inplace=True)
print(df3)
print(df)

What do you think will happen with the following code chunk?

df2.fillna(0, inplace=True)
print(df2)
print(df)

Notice how df and df2 were both changed. The state of df was changed without an explicit = and without calling .fillna() on df itself.
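
A minimal sketch of the alternative, assuming we want to leave the original data alone: call `.fillna()` without `inplace=True` and assign the result to a new name. The name `filled` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, np.nan]})
df2 = df                  # still the same object as df

filled = df2.fillna(0)    # returns a new DataFrame; df and df2 are untouched

missing_in_df = int(df.isna().sum().sum())
missing_in_filled = int(filled.isna().sum().sum())

print(missing_in_df)      # 1: the NaN in df is still there
print(missing_in_filled)  # 0: only the new object was filled
```

Here every state change is visible as an explicit `=`, which makes it easier to see what a line of code can affect.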

Lists, Tuples, and Dictionaries

A Byte of Python is a great free resource to get the basics of Python. I am going to quote this book heavily below.

Lists

The list of items should be enclosed in square brackets ([]) so that Python understands that you are specifying a list. Once you have created a list, you can add, remove or search for items in the list. Since we can add and remove items, we say that a list is a mutable data type i.e. this type can be altered.

mylist = ["toast", "butter", "cheese", "dig a whole"]
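
A short sketch of adding, removing, and searching, using a hypothetical list named `shopping`:

```python
shopping = ["toast", "butter", "cheese"]

shopping.append("jam")       # add an item to the end
shopping.remove("butter")    # remove the first matching item

has_cheese = "cheese" in shopping    # search: is the item present?
position = shopping.index("cheese")  # search: where is it?

print(shopping)    # ['toast', 'cheese', 'jam']
print(has_cheese)  # True
print(position)    # 1
```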

Tuples

Think of them as similar to lists, but without the extensive functionality that the list class gives you. One major feature of tuples is that they are immutable like strings i.e. you cannot modify tuples. Tuples are defined by specifying items separated by commas within an optional pair of parentheses (()).

mytupple = ("thing1", "thing2")
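
To see what immutable means in practice, this sketch tries to change an item in a tuple. The names `pair` and `longer` are only for illustration.

```python
pair = ("thing1", "thing2")

try:
    pair[0] = "thing3"       # item assignment is not allowed on a tuple
    changed = True
except TypeError as err:
    changed = False
    print("Tuples cannot be modified:", err)

# Building a new tuple from an old one is fine; the original stays the same.
longer = pair + ("thing3",)

print(changed)  # False
print(pair)     # ('thing1', 'thing2')
print(longer)   # ('thing1', 'thing2', 'thing3')
```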

Dictionary

A dictionary is like an address-book where you can find the address (value) or contact details of a person by knowing only his/her name (key). The key must be unique, just as you cannot get the correct response when two people share the exact same name.

Pairs of keys and values are specified in a dictionary by placing a colon (:) between each key and its value. The pairs are separated by commas, and the whole set is enclosed in curly brackets ({}).

my_dictionary = {'col1': 1, 'col2': 3}
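
A small sketch of the address-book idea, with made-up names and addresses:

```python
address_book = {"Ada": "ada@example.com", "Grace": "grace@example.com"}

ada = address_book["Ada"]                      # look up a value by its key
address_book["Linus"] = "linus@example.com"    # add a new key/value pair
address_book["Ada"] = "ada@new.example.com"    # same key: the old value is replaced

missing = address_book.get("Nobody", "not found")  # safe lookup with a default

print(ada)                # ada@example.com
print(len(address_book))  # 3 (keys stay unique, so "Ada" appears once)
print(missing)            # not found
```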

Pandas examples

Notice how we create our DataFrames using these tools.

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, np.nan]})
data = [('Peter', 18, 7),
        ('Riff', 15, 6),
        ('John', 17, 8),
        ('Michel', 18, 7),
        ('Sheli', 17, 5)]
df2 = pd.DataFrame(data, columns=['Name', 'Age', 'Score'])
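
The two examples above use a dictionary of lists and a list of tuples. As a further sketch, a list of dictionaries also works, with the keys becoming column names. The name `df_records` is hypothetical.

```python
import pandas as pd

# Each dictionary is one row; the keys become the column names.
records = [
    {"Name": "Peter", "Age": 18, "Score": 7},
    {"Name": "Riff", "Age": 15, "Score": 6},
]
df_records = pd.DataFrame(records)

print(df_records.columns.tolist())  # ['Name', 'Age', 'Score']
print(df_records.shape)             # (2, 3)
```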

List comprehension (Pythonista)

Real Python provides a clear guide on list comprehension. You can follow this guide as well and then use their cheat sheet.

  1. List comprehensions are often described as being more Pythonic than loops.
  2. One main benefit of using a list comprehension in Python is that it’s a single tool that you can use in many different situations.
  3. List comprehensions are also more declarative than loops, which means they’re easier to read and understand.

An example

What will the list prices be?

original_prices = [1.25, -9.45, 10.22, 3.78, -5.92, 1.16]
prices = [i if i > 0 else 0 for i in original_prices]

Same process using functions

def get_price(price):
    return price if price > 0 else 0
prices = [get_price(i) for i in original_prices]
prices
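
For comparison, here is a sketch of the same process written as an explicit loop. It should produce the same list as the comprehension: negative prices are replaced with 0.

```python
original_prices = [1.25, -9.45, 10.22, 3.78, -5.92, 1.16]

# Loop version: build the list one item at a time.
loop_prices = []
for price in original_prices:
    if price > 0:
        loop_prices.append(price)
    else:
        loop_prices.append(0)

# Comprehension version, repeated here so the two can be compared.
comprehension_prices = [p if p > 0 else 0 for p in original_prices]

print(loop_prices)                          # [1.25, 0, 10.22, 3.78, 0, 1.16]
print(loop_prices == comprehension_prices)  # True
```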

Functions in Python

Functions are reusable pieces of programs. Functions are defined using the def keyword. After this keyword comes an identifier name for the function, followed by a pair of parentheses (()) which may enclose some names of variables, and by the final colon (:) that ends the line. Next follows the block of statements that are part of this function, indented one level (conventionally four spaces).

def magic_calcs(myDF):
    # Column-wise mean of the DataFrame
    a = myDF.mean()
    return a
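
A short usage sketch: the function is repeated here so the example runs on its own, and it is called on the same small DataFrame used earlier.

```python
import numpy as np
import pandas as pd


def magic_calcs(myDF):
    # Column-wise mean of the DataFrame
    a = myDF.mean()
    return a


df = pd.DataFrame({'col1': [1, 2], 'col2': [3, np.nan]})
result = magic_calcs(df)   # a Series with one mean per column

print(result["col1"])  # 1.5
print(result["col2"])  # 3.0 (missing values are skipped by default)
```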