Recently I had the opportunity to implement a JupyterHub solution on Google Cloud in a practical way, working through every stage of this small project and adding progressively more advanced features.
You can find most of the information on the Zero to JupyterHub website, which this article is based on and which covers the topic in much greater depth.
We’ll use Google Cloud for this project, so let’s start with something very basic: create a Google Cloud account if you don’t have one yet.
Now it is useful to create a new project for our JupyterHub.
Create a new project in Google Cloud
We do that by selecting the “New Project” button
and entering our project name.
Once the new project is created we need to enable Kubernetes. The easiest way to do this is to go to the Kubernetes Engine:
Then go to Clusters; if the service is not enabled yet, you will be offered the option to enable it.
OK — we are ready to create our Kubernetes cluster. We could do this manually, via the Create Cluster button,
but to ensure repeatability, I recommend doing it through the command line.
For this we will use gcloud commands. Instructions for installing the Google Cloud SDK in your environment can be found on the official website.
Once installed, run the command:
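The command itself is not shown in the text; assuming the standard SDK bootstrap flow, it would be:

```shell
# Start the interactive SDK setup: log in, pick a project,
# and optionally set a default compute zone
gcloud init
```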
and authenticate to allow managing your Google Cloud account from your computer.
In the next step, choose the newly created project from the list; if you want, you can also choose a default zone. I’ll skip that for the moment and define the zone later.
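If you prefer to script these choices rather than answer the interactive prompts, the same settings can be made with gcloud config. The project ID below is an assumption based on the cluster URL shown later in the output; replace it with your own:

```shell
# Select the project and (optionally) a default compute zone
gcloud config set project jupyterhubmedium
gcloud config set compute/zone us-central1-a
```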
Now we are ready to start.
As the basis of our project I chose two e2-standard-2 machines in the US central region and named the cluster jhubmedium. Each machine costs about 50 USD/month, so this project costs about 100 USD/month. If you need to calculate your own prices, go to the Google Cloud pricing calculator. Let’s execute this command:
gcloud container clusters create jhubmedium \
--machine-type e2-standard-2 \
--num-nodes 2 \
--zone us-central1-a \
--cluster-version latest
After about 3 minutes, we can see that our cluster is up and running.
Creating cluster jhubmedium in us-central1-a... Cluster is being health-checked (master is healthy)...done.
To inspect the contents of your cluster, go to: https://console.cloud.google.com/kubernetes/workload_/gcloud/us-central1-a/jhubmedium?project=jupyterhubmedium
kubeconfig entry generated for jhubmedium.
NAME LOCATION MASTER_VERSION MASTER_IP MACHINE_TYPE NODE_VERSION NUM_NODES STATUS
jhubmedium us-central1-a 1.18.14-gke.1200 34.70.XX.XX e2-standard-2 1.18.14-gke.1200 2 RUNNING
And we can also see it in the console panel.
At this point, since we have the Kubernetes cluster, we will need one more tool: kubectl. How to install it depends on your environment and is very well explained on the official website.
Once we have this tool operational we can check our nodes.
$ kubectl get node
NAME STATUS ROLES AGE VERSION
gke-jhubmedium-default-pool-9967fb45-7lh2 Ready <none> 8m v1.18.14-gke.1200
gke-jhubmedium-default-pool-9967fb45-hwjh Ready <none> 8m v1.18.14-gke.1200
Give your account the permissions needed to perform administrative actions:
kubectl create clusterrolebinding cluster-admin-binding \
--clusterrole=cluster-admin \
--user=<your-google-cloud-email>
At this point we have a Google account set up and a Kubernetes cluster, and we are ready to deploy the basic version of JupyterHub.
You could install it step by step, but there is a more convenient, tested, and secure way to deploy JupyterHub: a Helm chart, a packaged set of all the Kubernetes configuration JupyterHub needs to work. For this you will need the helm tool, which you can install according to the instructions on its official website, depending on your environment.
Once Helm is installed, we can add the JupyterHub chart repository. We do it like this:
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
This should show output like:
Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "stable" chart repository
...Successfully got an update from the "jupyterhub" chart repository
Update Complete. ⎈ Happy Helming!⎈
For the Helm chart to work we need to set up a file with some minimal configuration. Let’s call this file config.yaml. All we need at this point is a secret token.
Let’s generate this token
openssl rand -hex 32
and insert that token into the config.yaml file, so config.yaml will look like this:
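The file’s contents are not shown in the text; in the chart versions from that period (late 2020 / early 2021), the token went under `proxy.secretToken`, so a minimal config.yaml would look something like this, with the placeholder replaced by the generated token:

```yaml
proxy:
  secretToken: "<output of openssl rand -hex 32>"
```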
At this point I also define some variables that I will use later. On Linux we do:
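The exact definitions are not shown in the text, but the names can be read off the helm output further down, where the release and namespace are both jhub:

```shell
# Name of the Helm release and of the Kubernetes namespace;
# both are "jhub", matching the helm output shown later
RELEASE=jhub
NAMESPACE=jhub
echo "release=$RELEASE namespace=$NAMESPACE"
```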
These variables make future project management easier. Now we can deploy JupyterHub:
helm upgrade --cleanup-on-fail \
--install $RELEASE jupyterhub/jupyterhub \
--namespace $NAMESPACE \
--create-namespace \
--values config.yaml
After a few minutes we can see our deployment ready:
Release "jhub" does not exist. Installing it now.
LAST DEPLOYED: Mon Jan 25 15:22:38 2021
TEST SUITE: None
Thank you for installing JupyterHub!

Your release is named jhub and installed into the namespace jhub.

You can find if the hub and proxy is ready by doing:

  kubectl --namespace=jhub get pod

and watching for both those pods to be in status 'Running'.

You can find the public IP of the JupyterHub by doing:

  kubectl --namespace=jhub get svc proxy-public

It might take a few minutes for it to appear!

Note that this is still an alpha release! If you have questions, feel free to
1. Read the guide at https://z2jh.jupyter.org
2. Chat with us at https://gitter.im/jupyterhub/jupyterhub
3. File issues at https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues
We can see all the created pods:
$ kubectl --namespace=jhub get pod
NAME READY STATUS RESTARTS AGE
continuous-image-puller-4jpft 1/1 Running 0 52s
continuous-image-puller-z5zbw 1/1 Running 0 52s
hub-8dfb7797f-pt5ft 1/1 Running 0 52s
proxy-79b56996cf-zk9dn 1/1 Running 0 52s
user-scheduler-599dd58d74-4c6vb 1/1 Running 0 51s
user-scheduler-599dd58d74-7b7cq 1/1 Running 0 52s
and we can see which IP our proxy has:
$ kubectl --namespace=jhub get svc proxy-public
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
proxy-public LoadBalancer 10.3.242.214 34.121.XX.XX 80:30738/TCP 43s
Now our basic JupyterHub is ready. If you go to the External-IP address in your browser, you will see:
At this moment any username and password will be accepted. So let’s log in with user user and password pass, and our system will start creating resources.
Let’s create a notebook:
If we want to see what is inside our cluster, we can also do that through the console panel. Workloads shows all the pods created before, plus one new one: a jupyter-<userName> pod is created for every user that logs into our system.
Under Storage there are two disks: one for our system and, from now on, a claim-<userName> disk for every user that logs into our system.
In Compute Engine we can see the “physical” machines used by our cluster.
And in the Disks menu we see the disks used for the cluster, for the hub, and the persistent disks for users that have logged into our system.
That’s all if you want to try creating a basic JupyterHub system. That’s what this part is for: to see that it works. Now comes the last step — how to delete it to free up the resources, so that we can then create a new, improved, more practical version.
We can delete the whole cluster using a gcloud command:
gcloud container clusters delete jhubmedium --zone us-central1-a
or using Delete option in console
In the next part we will see how to add some useful components like shared drives and basic authentication. We’ll also see how to add some interesting options to the configuration of our JupyterHub project.