Three Ways To Run Jupyter In Windows

· 27 Feb 2016

Let’s look at a few different ways to run Jupyter notebooks on Windows.

The “Pure Python” Way

Head over to python.org, then download and install the latest version (3.5.1 as of this writing). Wherever you install it, make sure the directory containing python.exe is in your system PATH environment variable. I like to install it in the root of my C: drive, e.g. C:\Python35, so my PATH contains that directory.
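If you want to confirm the PATH step worked, Python's standard library can tell you which executable the name "python" resolves to. This is only a minimal sketch: the helper name find_on_path is made up for illustration and is not part of the installer.

```python
import shutil


def find_on_path(executable="python"):
    """Return the full path that PATH resolves `executable` to, or None."""
    # shutil.which searches the directories in PATH (and honours PATHEXT on
    # Windows), much as the shell does when you type a command name.
    return shutil.which(executable)


if __name__ == "__main__":
    location = find_on_path("python")
    if location is None:
        print("python was not found on PATH")
    else:
        print("python resolves to", location)
```

If python isn't on PATH yet, you can still run the script by giving the full path to the interpreter, for example C:\Python35\python.exe followed by the script's file name.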

Once that’s installed, you’ll want to create a virtual environment: a lightweight, disposable, isolated Python installation where you can experiment and install third-party libraries without affecting your “main” installation. To do this, open a PowerShell window and enter the following commands (“myenv” is the name of the virtualenv we’re going to create; you can use any name you like):

PS C:\> python -m venv myenv
PS C:\> myenv\Scripts\activate
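To double-check that the activation took effect, you can ask the interpreter itself. The sketch below assumes an environment created with the standard venv module, as above. The function name in_virtual_environment is my own, not something Python ships with.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the "main" installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("prefix:     ", sys.prefix)
    print("base prefix:", sys.base_prefix)
    print("in a virtual environment:", in_virtual_environment())
```

Run it with the python that your prompt resolves after activation. If the two prefixes are identical, you are still using the main installation.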

Then, let’s install jupyter and start up a notebook:

PS C:\> pip install jupyter
PS C:\> jupyter notebook

Incidentally, if you get a warning about upgrading pip, make sure to use the following incantation to upgrade (this avoids an issue on Windows where pip is unable to upgrade its own executable in place):

PS C:\> python -m pip install --upgrade pip

Advantages: Uses “pure” python, official tools, and no external dependencies. Well supported, with plenty of online documentation and support communities.

Disadvantages: While many popular data analysis and scientific Python libraries can be installed by pip on Windows (including Pandas and Matplotlib), some (for example, SciPy) require a C compiler and third-party C libraries on the system, both of which are difficult to install on Windows.

Who is it for? Python users comfortable with the command line and the tools that ship with Python itself.
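Since the disadvantage above is that some libraries may not install cleanly, it can help to check which ones the current environment can actually import. Here is a rough sketch using only the standard library. The helper name missing_packages and the list of package names are just examples.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    # find_spec returns None when no importer can locate a top-level module.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    wanted = ["pandas", "matplotlib", "scipy"]
    missing = missing_packages(wanted)
    if missing:
        print("Not installed in this environment:", ", ".join(missing))
    else:
        print("All of these are importable:", ", ".join(wanted))
```

Note that this only reports whether a package can be located. It does not prove that a compiled extension inside it will load correctly.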

The Python Distributions

Because of the difficulty mentioned above in getting packages like SciPy installed on Windows, a few commercial entities have put together pre-packaged Python “distributions” that contain most, if not all, of the commonly used libraries for data analysis and/or scientific computing.

Anaconda is an excellent option for this. Download their Python 3.5 installer for Windows, run it, and in your Start menu you’ll have a bunch of neat new tools, including an entry for Jupyter Notebook. Click to start it up and it’ll launch in the background and open up your browser to the notebook console. It doesn’t get any easier than that.

Advantages: Simplest, fastest way to get started, and it comes with probably everything you need for your scientific computing projects. Anything it doesn’t ship with you can still install via its built-in conda package manager.

Disadvantages: No virtualenv support, although the conda package manager provides very similar functionality with the conda create command. Relies on a commercial 3rd party for support.

Who is it for? People who want the quickest, easiest way to get Jupyter Notebook up and running (i.e., most people).

Docker

Docker is a platform for running software in “containers”, or self-contained, isolated processes. While it may sound similar in concept to python virtual environments, Docker containers are an entirely different kind of technology offering vast flexibility and power. Don’t let the flexibility and power and confusing terminology put you off though – Docker can be easy to get up and running on your PC and has some advantages of its own with respect to Python and Jupyter.

To get started on Windows, download the Docker Toolbox, which contains the tools you need to get up and running. Run the installer and make sure the checkbox to install VirtualBox is checked if you don’t already have VirtualBox or another virtualization platform (like VMware Workstation) installed.

Once installed, you’ll have a “Docker Quickstart Terminal” shortcut in your Start Menu. Double-click that shortcut and it will create your first Docker engine for you and set up everything you need automatically. Once you see a prompt in the terminal, you can use the docker run command to run Docker “images”, which you can think of as pre-packaged bundles of software that are automatically downloaded from the Docker Hub when you run them. There are many images on Docker Hub that offer Jupyter, including the official Jupyter Notebook image, and Anaconda itself if you want the full SciPy stack.

To run just the official Jupyter Notebook image in your Docker engine, type the following into the Docker Quickstart Terminal:

$ docker run --rm -it -p 8888:8888 -v "$(pwd):/notebooks" jupyter/notebook

After all the image’s “layers” are downloaded, it will start up. Make a note of the IP address listed in the terminal (mine is usually 192.168.99.100), and point your browser at that IP address, port 8888 (e.g. http://192.168.99.100:8888) and you’ll see the familiar Jupyter console, with both Python 2 and Python 3 kernels available.
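If the browser doesn't connect, it can be useful to check whether anything is listening at that address before digging into Docker itself. The sketch below builds the notebook URL and attempts a plain TCP connection. The function names are hypothetical, and the IP address is simply the one from my terminal, so substitute whatever yours prints.

```python
import socket


def notebook_url(host, port=8888):
    """Build the URL to open in the browser for the published port."""
    return "http://{}:{}".format(host, port)


def is_listening(host, port=8888, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises an OSError subclass if the connection is
        # refused, the host is unreachable, or the attempt times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host = "192.168.99.100"  # assumption: the address shown in your terminal
    print(notebook_url(host))
    print("reachable:", is_listening(host))
```

A result of False only means the TCP connection failed. The container may still be starting, or the address may differ on your machine.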

Advantages: Use the flexibility and power of Docker! Honestly one of my favorite things about Docker is thinking of it as an open software distribution platform for things like the SciPy stack that are hard to install.

Disadvantages: Grapple with the flexibility and power of Docker! There are quite a few “gotchas” to be aware of when dealing with Docker, such as immutable containers, data volumes, arcane commands, and rapidly developing, occasionally buggy tooling.

Who is it for? Users who either already are comfortable with Docker or are willing to dive into bleeding edge technology :)

Conclusion

For the work I do, where Jupyter running Python 3 notebooks with Pandas and SQLAlchemy is enough, I prefer to use the “pure Python” method, because the tools are well understood and well supported, and a tremendous amount of work has been done by the Python community to make the tools work well on Windows. And if I am ever working on a data set large enough that my laptop alone can’t handle it, using Docker to run my notebooks on cloud providers’ platforms is wonderfully easy.

That said, if you are coming into this just looking to use Python to tackle a data analysis or scientific computing problem and want to get started with minimal fuss, a distribution like Anaconda is without a doubt the fastest way to get started.