VS Code for Data Science
Visual Studio Code (VS Code) is taking over the open-source code editing space. Even RStudio provides VS Code in its production environments. The 2019 Stack Overflow Developer Survey put its usage at about 50% of respondents. It is fantastic for data science programming in Python and competes with RStudio for programming in R.
Installation
The layout
VS Code comes with a simple and intuitive layout that maximizes the space provided for the editor while leaving ample room to browse and access the full context of your folder or project. The UI is divided into five areas:
- Editor - The main area to edit your files. You can open as many editors as you like side by side vertically and horizontally.
- Side Bar - Contains different views like the Explorer to assist you while working on your project.
- Status Bar - Information about the opened project and the files you edit.
- Activity Bar - Located on the far left-hand side, this lets you switch between views and gives you additional context-specific indicators, like the number of outgoing changes when Git is enabled.
- Panels - You can display different panels below the editor region for output or debug information, errors and warnings, or an integrated terminal. The panel can also be moved to the right for more vertical space.
Each time you start VS Code, it opens in the same state it was in when you last closed it. The folder, layout, and opened files are preserved.
Interactive Python
Jupyter, an open-source project, is the standard tool for interactive Python in data science and scientific computing. However, notebooks have some drawbacks in a development environment. VS Code offers the best of Python scripts and Jupyter notebooks through its Python Interactive window.
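You will need to install the jupyter Python package using pip for the interactive Python window to work. The code chunk below installs jupyter along with the other packages listed. The leading ! is IPython shell syntax, so run the chunk from a notebook or interactive session rather than as a plain Python script.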
import sys
!{sys.executable} -m pip install jupyter pandas altair altair_saver numpy plotnine scikit-learn
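After the install finishes, a quick check can confirm that each package is importable from the same interpreter. The sketch below is one way to do that with the standard library. The missing_packages helper and the mapping from pip names to module names are illustrative assumptions: scikit-learn is imported as sklearn, and jupyter_core is used here as a stand-in for the jupyter metapackage.

```python
# %%
from importlib.util import find_spec

# Assumed mapping from pip package name to importable module name.
# jupyter is a metapackage, so jupyter_core stands in for it here.
PACKAGES = {
    "jupyter": "jupyter_core",
    "pandas": "pandas",
    "altair": "altair",
    "altair_saver": "altair_saver",
    "numpy": "numpy",
    "plotnine": "plotnine",
    "scikit-learn": "sklearn",
}


def missing_packages(packages):
    """Return the pip names whose modules cannot be found by this interpreter."""
    return [
        pip_name
        for pip_name, module_name in packages.items()
        if find_spec(module_name) is None
    ]


# %%
print("Missing packages:", missing_packages(PACKAGES))
```

An empty list means every module was found. Any name that is printed can be passed back to the pip command above.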
Using the VS Code functionality, you will work with a standard .py file instead of the .ipynb file typically used with Jupyter notebooks. The Python extension in VS Code recognizes # %% as the start of a cell, or chunk, of Python code and adds notebook options such as ‘Run Cell’ and other actions. The code example below defines two such cells. Microsoft’s documentation goes into more detail (https://code.visualstudio.com/docs/python/jupyter-support-py).
# %%
msg = "Hello World"
print(msg)
# %%
msg = "Hello again"
print(msg)
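The same markers scale to a small analysis script. As a minimal sketch, the file below adds a # %% [markdown] cell, which the Python extension should render as text in the interactive window, followed by two code cells that share state the way notebook cells do. The data values are made up for illustration.

```python
# %% [markdown]
# # Summary statistics
# This cell holds notes rather than code. To plain Python it is only comments.

# %%
import statistics

# Hypothetical measurements, invented for this example.
values = [2.5, 3.1, 4.7, 5.0]

# %%
# Variables defined in an earlier cell stay available in later cells.
mean_value = statistics.mean(values)
print(mean_value)
```

Because the markers are ordinary comments, the same file also runs unchanged as a regular script with python.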
Settings adjustments
To make the interactive window more functional, press ctrl + , (cmd + , on a Mac) to open the settings. From there you can alter a few defaults.
- Search ‘Send Selection to Interactive Window’ and make sure the box is checked. Now you will be able to use shift + return to send a selected chunk of code or an entire cell.
- Search ‘Collapse Cell Input Code By Default’ and uncheck the box. Now, your code will show expanded in the interactive Python console history by default.
- Search ‘Always Scroll on New Cell’ and make sure the box is checked. Now, each time you run a chunk or command from your .py script, the interactive window will scroll to the output.
R Extension for Visual Studio Code
Yuki Ueda’s R extension is the leading extension for using R within VS Code. The extension’s wiki is a good guide. I recommend the following elements from that guide.
- R Extension for Visual Studio Code
- Radian for the R terminal
- httpgd for chart management
- Editing your VS Code configuration
"Edit in settings.json"
{
    "r.bracketedPaste": true,
    "r.rterm.windows": "C:\\Users\\user\\AppData\\Local\\Programs\\Python\\Python37\\Scripts\\radian.exe",
    "r.rterm.mac": "/usr/local/bin/radian"
}
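The radian paths above are specific to one machine. As a hedged sketch, the Python snippet below looks up radian on the current PATH and prints a matching fragment to paste into settings.json. The helper names rterm_key and rterm_setting are hypothetical, and r.rterm.linux is assumed to be the corresponding key on Linux.

```python
import json
import shutil
import sys


def rterm_key(platform):
    """Map a sys.platform value to the R extension's terminal setting name."""
    if platform.startswith("win"):
        return "r.rterm.windows"
    if platform == "darwin":
        return "r.rterm.mac"
    return "r.rterm.linux"  # assumed key for Linux and other platforms


def rterm_setting(executable="radian"):
    """Return a one-entry settings dict, or an empty dict if not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return {}
    return {rterm_key(sys.platform): path}


# json.dumps escapes Windows backslashes the way settings.json expects.
print(json.dumps(rterm_setting(), indent=4))
```

An empty {} in the output means radian was not found on the PATH of the interpreter that ran the snippet, which usually points to radian being installed in a different Python environment.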
Git and GitHub
Microsoft owns GitHub and is the lead developer of VS Code. As such, it is actively working on integrating GitHub and VS Code functionality (see Codespaces). We will use the Git source control manager (SCM) that comes with VS Code. The Git support guide from VS Code will suffice for our needs. We will only leverage a small portion of Git within our class.