D1: Introduction
My history
My family
I know. I have a lot of kids. My oldest is 21, and my youngest is 4.
My school/professional journey
- 1998: Start my undergraduate at BYU
- 2000: Transfer to the University of Utah
- 2003: Undergraduate degree in Economics (er, Socialist History) from the U.
- 2003-2005: Master’s degree in Statistics from BYU.
- 2005-2012: Statistician: Pacific Northwest National Laboratory (PNNL)
- 2012-2015: Reformed statistician: PNNL
- 2015-Current: Data Science Professor: BYU-I
- 2015-Current: Owner and Data Scientist of Data-Driven Consulting (Medical records and Child Health Analytics, Environmental Sampling, Business Consulting)
My data DNA
- Building energy data (real and simulated)
- Climate model data using Hadoop
- Rat trial data (smoking and cancer)
- Bomb fragmentation
- Lidar from retired World War II ranges
- Lidar from warehouse scans
- Spatial data with the power grid and climate models
- Power Grid data
- HR/Employment data
- Banking data
- University records data
- Newborn and Child Health
- Library records data
- Movement data from SafeGraph
- Electronic Health Records
Questions
- What things do you (not) like in the data science space?
- Why did you stop calling yourself a statistician?
- Why did you leave PNNL?
- What do(n’t) you like about academia?
- What is your favorite music group?
- Why did you choose Kennesaw State for your sabbatical?
- Which data science programming language is your favorite?
- What are some jobs you had before you finished graduate school?
- How do you define ‘big data’?
- What do you think about mathematics education?
- How does Kennesaw compare to Rexburg?
- How is BYU-I related to BYU and The Church of Jesus Christ of Latter-day Saints?
Class introductions
- Your major and plans after you graduate.
- How long you have lived in Georgia.
- Why you picked Kennesaw State for your education.
Anonymous note
- Why are you taking this class?
- What would you like to remember from this class five years from now?
Data science programming
I see data engineering as serving a ‘big client’ (building pipelines and tools that touch thousands) with small daily change (refining systems and delivering quicker results). I see data science as serving a ‘small client’ (addressing the needs of tens in management) with ‘big change’ and new modeling (proposing the latest methods and demonstrating the data munging and its value).
A data engineer would spend more of their time talking with IT and CS folk. They would interact heavily with the data scientists as well. The data engineer would translate for the data scientist into the IT and CS space, and the data scientist would translate for the data engineer into the business and business-need space. A data scientist would spend less of their time talking with IT and CS than a data engineer. Many people will wear the data science and data engineering hats at the same time.
“Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.” (Josh Wills)
The dark side of that quote is real! Data scientists don’t program as well as software engineers. Data scientists are also reasonably soft when it comes to understanding the larger field of statistical analysis. We can improve over time. However, our domain structure often demands that we don’t specialize in the technical areas, as we are often scaling up in other domains. If we did specialize, then we would be called statisticians or software engineers.
Where are we going?
Everything in this class starts with data. We need programming to handle data. After data and programming, all other skills are a distant third. I want to help you have a solid foundation and introduce you to the different abilities of data science.
Class structure
Let’s look at the syllabus and make sure we understand the plan.
- Objectives
- Teaching DS principles
- Course communication
- Projects
- Tri-weekly routine
- Readings
- Course grading
Why are we doing this?
Data science job growth and employment are strong, even for skilled undergraduates. We have had over 50 graduates from BYU-I over the last three years, and almost all have found and maintained employment in analytics. The students who can’t program well and do poorly in school are still commanding ~$50k a year. The better programmers and performers in school are starting in the high $70k range.
On teaching and learning
- Can you read the few quotes under each heading of my learning manifesto and pick your favorite?
What is next?
On Wednesday, we will help everyone finalize their installation and computer setup. Please review the guides on Slack, VS Code, and Git before class. I would even try to get everything installed and working.
Connecting to Slack
Join ksuds.slack.com with your kennesaw.edu email to connect with our class workspace. Slack has a set of videos that can help you get comfortable with the tool. You can also use the quick start guide to familiarize yourself.
I highly recommend the apps over the web interface. Please download them for your phone and computer.