Data Science Programming in R and Python Syllabus
Materials for STAT 4490 Data Science Programming (Special Topics). The syllabus is below. Our class material can be found in the slides folder.
Course objectives
- Use functions, data structures, and other programming constructs efficiently to process and find meaning in data.
- Programmatically load data from various data sources, including files, databases, and remote services.
- Use data manipulation libraries to perform straightforward analysis, produce charts, and prepare data for machine learning algorithms.
- Use machine learning libraries to discover insights, make predictions, and interpret the success of these algorithms.
- Use industry-leading tools to collaborate and share your work.
Principles of data science teaching
You may want to read my teaching philosophy to see how I will manage the class. You may also find some joy in reading student reviews of my courses at BYU-I.
I follow the principles of data science teaching listed in A Guide to Teaching Data Science.
- Organize the course around a set of diverse projects
- Integrate computing into every aspect of the course
- Teach abstraction, but minimize reliance on mathematical notation
- Structure course activities to realistically mimic a data scientist’s experience
- Demonstrate the importance of critical thinking/skepticism through examples
Course communication
- Slack is used heavily in the data science space. We will use it for most non-graded course communication. Please use your kennesaw.edu email to join our workspace.
- GitHub is where data science is shared. Data scientists use GitHub for project collaboration and methodology dissemination. We will use GitHub for our work collaborations and some course communication.
- The Kennesaw LMS is used for graded assignments.
Course outline
- Class Data projects: We will complete each project in Python and R over three weeks.
- Personal projects: You will also be responsible for picking a new data set to demonstrate your creative analysis using our course skills.
- Data science readings: We will have data science ‘being’ readings, which we will discuss in class once during each project.
- Coding challenges: A few times during the semester, you will have a coding challenge that lets you prove your developed programming skills.
Use the course repo template for your personal data projects.
3-week routine
After the first week of class, we will follow this class time routine for each project throughout the semester.
- Day 1: Review your completed work on the previous project with your team. Learn about the upcoming project.
- Day 2: Discuss the programming principles for R or Python based on student questions from completed readings.
- Day 3: Open programming time to complete the project in R/Python. Submit your assignment on GitHub the following day.
- Day 4: Class discussion of the being readings and an introduction to the second language’s principles.
- Day 5: Discuss the programming principles for the second language based on student questions from completed readings.
- Day 6: Open programming time to complete the project in the second language. The finalized project in both languages is due before the next class period.
Class Data Project topics
- Project 1: Principles of visualization and GitHub
- Project 2: Importing and tidying data
- Project 3: Spatial data visualization and analysis
- Project 4: Machine Learning Introduction
- Project 5: Proving your point (SQL)
Personal projects
You don’t need to make these projects complicated. They are meant to show your work using the skills you have developed during the course. Make sure they are presentable in your GitHub space, and use them to demonstrate your creativity. You could use the following links to find a new data set. Please use the template provided in our GitHub Org.
Course readings
We will leverage the R for Data Science book written by RStudio employees. I have adapted their book to cover Python for Data Science as well.
Supplemental readings
We will leverage materials developed by other data scientists available on the internet. We will have two types of supplemental readings.
Course grading
Grading is a nasty side effect of mass learning and academia. We are in a class at a university and will have to manage this side effect. However, we don’t have to let it control our learning and thinking in this class. Learning and thinking should motivate each activity.
As a team, teacher and student, we have the challenge to become more! I have worked hard to identify the specifications needed for a data science programmer. Our goal is to align your grade with the skill specifications you have mastered. In other words, the grade you want will determine how much work you do. We will not score individual tasks on a percentage scale; if your work meets the specified criteria, you will get full credit.
In a specifications-grading system, all tasks are evaluated on a high-standards pass/fail basis using detailed checklists of task requirements and expectations. You earn your letter grade by earning passing marks on a set of tasks. This system gives you choices and is closer to how learning and work occur in the real world. It will be easy for us to tell if work is complete, done in good faith, and consistent with the requirements.
Semester deliverables
- Completed LinkedIn and GitHub profiles.
- A grade request letter that states the key concepts and techniques you learned during our projects, your goals for continued learning in this area, and the grade you are requesting based on your knowledge and task completion.
- A resume that includes the skills you have learned during our projects.
- A semester task form that records your completed tasks during the semester.
- Personal project submissions on GitHub.
Competency scale
The following scale lists the work that must be completed to warrant each grade. If you don’t quite meet the specifications, a half-step adjustment to your grade can be negotiated in your grade request letter. We will have two coding challenges during the semester. The final coding challenge will occur on 12/8.
A: I am a data scientist
- All four projects fully completed in both languages.
- Team lead presentation at least twice.
- Three independent data projects completed.
- Active participation in being readings.
- Semester deliverables completed.
- Score at least a 3 out of 4 on the final coding challenge.
B: Becoming a data scientist
- All four projects completed in at least one language and two completed in both.
- Team lead presentation at least once.
- Two independent data projects.
- Active participation in being readings.
- Semester deliverables completed.
- Score at least a 3 out of 4 on the final coding challenge.
C: I understand data scientists
- Three projects completed in at least one language and one completed in both.
- Team lead presentation at least once.
- One independent data project.
- Attendance at being readings.
- Semester deliverables completed.
- Score at least a 2 out of 4 on the final coding challenge.
D: I think I am awake
- One project completed in both languages.
Policies
Kennesaw State University
- Accommodations: Any student with a documented disability or medical condition needing academic accommodations of class-related activities or schedules must contact the instructor immediately. Written verification from the KSU Student Disability Services (https://sds.kennesaw.edu/index.php) is required. Accommodations are not required to be made before this approved University documentation is completed. All discussions will remain confidential.
- Academic Integrity Statement: http://scai.kennesaw.edu/codes.php
- Federal, BOR, & KSU Course Syllabus Policies: https://cia.kennesaw.edu/instructional-resources/syllabus-policy.php
- Resources: https://cia.kennesaw.edu/instructional-resources/syllabus-resources.php
COVID-19
All faculty, staff and students are strongly encouraged to receive a COVID-19 vaccine.
Based on guidance from the University System of Georgia (USG), all vaccinated and unvaccinated individuals are encouraged to wear a face covering while inside campus facilities. Unvaccinated individuals are also strongly encouraged to continue to socially distance while inside campus facilities, when possible.