Multi-user notebooks on the Cloud

with JupyterHub & Kubernetes

Yuvi Panda

Developer @ Project Jupyter, DevOps @ Data Science Education Program, UC Berkeley

Jupyter Notebooks

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.

LIGO example

These are great, but installing them locally and getting everything set up with all the libraries is a pain point.

Also, you often want to use more or different resources than your laptop has, such as a cluster you have access to.

JupyterHub

A multi-user deployment of the notebook designed for companies, classrooms and research labs

  • Plug into whatever clustering system you are using with Spawners
    • Docker, Torque, GridEngine, SLURM, Kubernetes, Systemd, just processes, etc
  • Works across various authentication providers too
    • Kerberos, LDAP, Google, GitHub, Generic OAuth, MediaWiki, PAM, keep-passwords-in-google-sheets, etc
  • Gives users a web URL they can go to, log in and get a compute environment!
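
To make the "plug in a Spawner, plug in an authenticator" idea concrete, here is a minimal sketch that renders the two lines a jupyterhub_config.py would need. The helper render_hub_config and the two lookup tables are hypothetical names made up for this illustration; the class paths are the ones published by the kubespawner, dockerspawner, oauthenticator and jupyterhub packages as far as I know, so check them against the versions you install.

```python
# Hypothetical lookup tables for this sketch: short name -> importable class path.
SPAWNERS = {
    "kubernetes": "kubespawner.KubeSpawner",
    "docker": "dockerspawner.DockerSpawner",
}
AUTHENTICATORS = {
    "github": "oauthenticator.github.GitHubOAuthenticator",
    "pam": "jupyterhub.auth.PAMAuthenticator",
}


def render_hub_config(spawner: str, authenticator: str) -> str:
    """Return the text of a tiny jupyterhub_config.py for the chosen pair."""
    lines = [
        # JupyterHub reads both settings as dotted class paths.
        f"c.JupyterHub.spawner_class = {SPAWNERS[spawner]!r}",
        f"c.JupyterHub.authenticator_class = {AUTHENTICATORS[authenticator]!r}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_hub_config("kubernetes", "github"), end="")
```

Swapping "kubernetes" for "docker", or "github" for "pam", changes one line of the output each; nothing else in the hub has to know.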

PAWS Demo

The Zero to JupyterHub guide

https://z2jh.jupyter.org

An opinionated way to set up JupyterHub on any Kubernetes cluster that is easy to scale, maintain & upgrade.

Currently used by many workshops, by about 1500 students in UC Berkeley's data science program, by Wikimedia, and others.

  • Fully reproducible infrastructure - a YAML file fully captures the state of the cluster
  • Built to be continuously deployable, can do several deploys a day without interrupting users
  • Cloud / provider agnostic, easy to port between various cloud providers / on-prem
  • Properly layered, so when it does break it breaks in debuggable ways
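
As a rough illustration of "a YAML file captures the state", the sketch below builds a small config as a Python dict and prints it as YAML. The key names (singleuser.image, singleuser.memory) follow the Zero to JupyterHub Helm chart as I understand it, and the image name, tag and memory sizes are made-up placeholders; treat the whole thing as an assumption to check against the guide. The tiny to_yaml helper is hypothetical and only handles nested dicts of scalars.

```python
import json


def to_yaml(mapping: dict, indent: int = 0) -> list:
    """Render a nested dict of scalars as a list of YAML lines (sketch only)."""
    lines = []
    pad = " " * indent
    for key, value in mapping.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 2))
        else:
            # JSON scalars are valid YAML scalars, so json.dumps handles quoting.
            lines.append(f"{pad}{key}: {json.dumps(value)}")
    return lines


# Placeholder values; none of these come from a real deployment.
config = {
    "singleuser": {
        "image": {"name": "example/user-image", "tag": "v1"},
        "memory": {"guarantee": "1G", "limit": "2G"},
    },
}

if __name__ == "__main__":
    print("\n".join(to_yaml(config)))
```

Keeping this one file in version control is what makes the deployment reproducible: changing the image tag or the memory guarantee is a one-line diff followed by a redeploy.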

Why Kubernetes?

  1. Provides very high-level abstractions that seem non-leaky
  2. Abstracts away most of the underlying hardware while still allowing it to be taken advantage of
  3. Fairly self-healing once set up, so most faults are automatically fixed
  4. Has an amazingly fast-growing community that is really friendly, diverse & quite innovative

Learn more at https://kubernetes.io

Demo

  1. Set up JupyterHub with a default environment
  2. Build a different environment with repo2docker, and use that for everyone
  3. Give each user guaranteed memory
  4. Resize the cluster up and down, as your needs change
  5. Convert a GitHub repository into a running Jupyter Notebook automatically
  6. Convert a GitHub repository into a running Jupyter Notebook on beta.mybinder.org
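
Steps 3 and 4 are linked by simple arithmetic: once each user has a memory guarantee, the guarantee decides how many users fit on a node, and so how many nodes the cluster needs. Here is a back-of-the-envelope sketch, assuming the scheduler packs users purely by their memory guarantee. The function names and the 26 GB node size are hypothetical, and real nodes reserve some memory for system components, so treat the result as an upper bound on density.

```python
import math


def users_per_node(node_memory_gb: int, guarantee_gb: int) -> int:
    """How many users fit on one node if each is guaranteed guarantee_gb."""
    if guarantee_gb <= 0:
        raise ValueError("guarantee_gb must be positive")
    return node_memory_gb // guarantee_gb


def nodes_needed(users: int, node_memory_gb: int, guarantee_gb: int) -> int:
    """Smallest node count that can hold `users` simultaneous users."""
    per_node = users_per_node(node_memory_gb, guarantee_gb)
    if per_node == 0:
        raise ValueError("a single user does not fit on one node")
    return math.ceil(users / per_node)


if __name__ == "__main__":
    # 1500 simultaneous users, 1 GB each, on hypothetical 26 GB nodes.
    print(nodes_needed(1500, 26, 1))  # prints 58
    # Doubling the guarantee roughly doubles the cluster size.
    print(nodes_needed(1500, 26, 2))  # prints 116
```

In practice not every user is logged in at once, so the number to plug in is peak concurrent users rather than total enrolment; resizing the cluster up and down follows that peak.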