P1D1: Workflow and Github
Finishing up technology installation
Does everyone have R, Python, Git, and VS Code installed?
Git(hub) workflow
Git
Download and install
- Download git.
- Make sure Git is working on your computer (https://git-scm.com/). If it is not working on a Mac:
  A. Fix the paths.
  B. Download Xcode and update (about a 10 GB download).
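A quick way to check the installation from the terminal is sketched below. It assumes only that Git is supposed to be on your PATH; if the first command reports that `git` is not found, the path fix above is the likely next step.

```shell
# Print the installed Git version. An error here usually means Git is
# missing or is not on your PATH.
git --version

# Show which git executable the shell is actually using.
command -v git
```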
Configure Git
We have to set some configurations.
- Open the Terminal in VS Code.
- Set your username:
git config --global user.name 'FIRST_NAME LAST_NAME'
- Set your email address:
git config --global user.email 'MY_NAME@example.com'
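To confirm the settings took effect, you can read them back. This is a minimal sketch; the reminder messages are my own wording, not Git output.

```shell
# Each command prints the stored value, or a reminder if nothing is set yet.
git config --global --get user.name || echo "user.name is not set yet"
git config --global --get user.email || echo "user.email is not set yet"
```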
The Git workflow
Folder management suggestions on your computer
I don’t think you should organize your computer files by course, especially not by course number. If you need to have your information in class folders, think about shortcuts to your git folders.
- Git repos are usually connected to GitHub, so don’t keep them in your OneDrive, Google Drive, or iCloud.
- I store all my GitHub repos within a `git` folder, with each organization having its own folder.
- We are going to have more than 9 repositories cloned to our computers for this class.
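The folder layout described above could be set up roughly as follows. The folder names `my-class-org` and `my-username` are placeholders for illustration, and the sketch assumes your home directory is not itself synced by OneDrive, Google Drive, or iCloud.

```shell
# One top-level git folder, with one subfolder per GitHub organization.
# "my-class-org" and "my-username" are hypothetical names; use your own.
GIT_ROOT="$HOME/git"
mkdir -p "$GIT_ROOT/my-class-org" "$GIT_ROOT/my-username"

# List the organization folders that now exist.
ls "$GIT_ROOT"
```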
Github
Make sure you realize that GitHub is key to your employment as a Data Scientist.
This is GitHub, the world’s largest code repository platform online. A platform used by some 50 million software developers to host their coding projects, most of them open-source — meaning others can access their codes and modify them to create better versions if they feel like.
Most of the internet is produced or hosted on GitHub in the form of code. “What Gmail is to email, GitHub is to writing software,” says Kiran Jonnalagadda, cofounder of HasGeek, a platform to build and discover peer groups.
Read more here.
A key differentiator between an analyst and a data scientist
It signals that you are a programmer as well as an analyst.
Github is our version control and we have everything on Github. Definitely having strong git experience is very helpful. The way my team is using it is through forking. We fork the main file and then pull from and to it to update the code.
Keaton Sant, Data Scientist at John Deere
Is it going to hurt?
Yes.
It feels weird at first but quickly becomes second nature. Our pain will be short-lived because students primarily work in their own repositories. More bad news: do you use GitHub to work with other people or to coordinate your own work from multiple computers? If so, after you recover from the initial setup, Git will crush you again with merge conflicts. And this is not one-time pain; this could be a dull ache for a long time. The best remedy is prevention, but also understanding how to back out of tricky situations and tackle them on your own terms.
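To make "merge conflict" and "backing out" concrete, here is a small sketch that creates a conflict in a throwaway repository and then abandons the merge with `git merge --abort`. The file name, branch name, and identity are made up for the demo; nothing here touches your real repositories.

```shell
set -e
# Work in a scratch folder so nothing real is affected.
demo="$(mktemp -d)"
cd "$demo"
git init -q
# Throwaway identity for this scratch repo only (hypothetical values).
git config user.name "Demo Student"
git config user.email "demo@example.com"

echo "line from the original" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
# Remember the name of the starting branch (main or master, depending on setup).
base="$(git symbolic-ref --short HEAD)"

# Two branches change the same line in different ways.
git checkout -q -b feature
echo "line from feature" > notes.txt
git commit -q -am "Edit on feature"

git checkout -q "$base"
echo "line from main" > notes.txt
git commit -q -am "Edit on main"

# The merge stops with a conflict; '|| true' lets the script continue.
git merge feature || true
git status --short   # notes.txt is listed as conflicted

# Back out: restore the working tree to its state before the merge.
git merge --abort
git status --short   # no conflicted files remain
```

Aborting is only one option. The other is to edit the conflicted file, pick the lines you want, and commit the result.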
Managing a project via Git/GitHub is much more like the Google Doc scenario and enjoys many of the same advantages. It is definitely more complicated than collaborating on a Google Doc, but this puts you in the right mindset. ref
Github and education guidelines
- Don’t post assignments
- Do post unique code and projects using skills from your classes
- Use private repos with a student education account to manage your course work
- Use it to communicate
Managing your Github space
If you are trying to get a job, then your Github space should be organized. Take the time to make this space your coding ‘social media’ where people see the best side of your work.
- Make your landing page stand out by managing your profile README. Use this guide for additional inspiration.
- Track your work and share it with the world.
- Organize and document your repositories. Here are some great examples.
- Find a project you could support (long term goal).
Github’s other tools
GitHub wants to be the social communication tool for coders (reference). Versioning and sharing code is the core. However, ignoring the other available tools is not wise.
- Github pages
- Project and Organization Wikis (D3 Example)
- Issues
- Discussions
- Projects
- GitHub Actions (I use the peaceiris action for Hugo for our data science programming course at BYUI. The R for Data Science book does as well.)
Your personal data projects workflow
You don’t need to make these projects complicated. These projects are built to show your work using the skills you have developed during the course. I would make sure that these are presentable in your GitHub space. You want to demonstrate your creativity. You could use the following links to find a new data set. Please use the template provided in our GitHub org.
Let’s start our space for our first personal data project
- Go to our org’s personal project template
- Click on the green button that says `Use this template`.
- Pick a location. Use your space as the owner.
- Name your project `[title]-[lastname]-[languagesused]`. You can change this later.
- Decide on public or private. I recommend public. You can also change this later.
- Clone this project to your local computer using Github.
Our class projects workflow
- Go to our first class project
- Click on the green button that says `Use this template`.
- Pick a location. Use our org as the owner.
- Name your project `p1-[lastname]`. You can change this later.
- Make this repo private.
- Add the class as viewers to your repo.
- Clone this project to your local computer using Github.
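If you prefer the terminal for the final clone step, the command-line equivalent might look like the sketch below. So that it runs without a network connection, a local empty repository stands in for the one GitHub creates from the template. With a real project, `REPO_URL` would be the address GitHub shows for your repository, and `p1-lastname` and `class-org` are placeholder names.

```shell
set -e
# Stand-in for the repository created from the template (assumption for the demo).
scratch="$(mktemp -d)"
git init -q --bare "$scratch/p1-lastname.git"
REPO_URL="$scratch/p1-lastname.git"

# Clone into the folder where you keep this organization's repositories.
mkdir -p "$scratch/git/class-org"
cd "$scratch/git/class-org"
git clone -q "$REPO_URL"

# Confirm the local copy is linked back to the remote named origin.
cd p1-lastname
git remote -v
```

Git may warn that you cloned an empty repository. That is expected here because the stand-in has no commits.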