This repository contains Terraform and Ansible code to set up AWS resources and deploy a JupyterHub instance using The Littlest JupyterHub (TLJH).
The infrastructure code here offers an alternative to the manual steps in TLJH's Installing on Amazon Web Services guide. The manual steps work, but if you are running a production instance of JupyterHub, you will likely benefit from having your infrastructure defined in code: creating, replacing, and destroying it becomes simple and reproducible.
The downside of this approach is that you need to install additional dependencies (at least Terraform and Ansible), whereas the manual approach in the guide is done entirely from the AWS console.
The code in this repository also sets up HTTPS, which is covered in a different TLJH guide: Enable HTTPS.
The process is broken down into steps:
- Set up infrastructure with Terraform
- Configure the DNS record for your domain
- Set up JupyterHub on the EC2 instance using Ansible
- Deny all SSH access
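One way the last step could work is sketched below in Terraform. It assumes SSH is controlled by a security group rule rather than by anything on the instance itself. The variable `allow_ssh` and the resource name `jupyterhub_web` are hypothetical and may not match this repository's code. Security groups deny inbound traffic unless a rule allows it, so removing the port 22 rule is enough to block SSH.

```hcl
# Hypothetical toggle: keep SSH open while Ansible runs, then set to false.
variable "allow_ssh" {
  description = "Whether to allow inbound SSH (needed only while provisioning)"
  type        = bool
  default     = true
}

resource "aws_security_group" "jupyterhub_web" {
  name        = "jupyterhub-web"
  description = "HTTP/HTTPS for JupyterHub; SSH only while provisioning"

  # Web traffic: HTTP.
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Web traffic: HTTPS.
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # This rule is rendered only when var.allow_ssh is true.
  # Consider narrowing cidr_blocks to your own IP address.
  dynamic "ingress" {
    for_each = var.allow_ssh ? [1] : []
    content {
      from_port   = 22
      to_port     = 22
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
    }
  }

  # Allow all outbound traffic (package installs, Let's Encrypt, etc.).
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

After the Ansible run, re-applying with `terraform apply -var="allow_ssh=false"` would drop the SSH rule while leaving HTTP and HTTPS open.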
When the process is complete, you will have a web service through which users can work on Jupyter notebooks without needing to set up Python and Jupyter on their own machines.
I set this up because I was curious about what operationalizing JupyterHub would look like, but I also made an effort to keep it general enough that others can easily adapt it to their own situation. If you find this useful, please let me know (star the repository or send me an email); that would mean a lot to me. Feel free to send feedback or other thoughts as well.
If I were using this in a shared environment with real users, I would do more. If a feature would help you, let me know, and I might find time to add it. Some potential features follow.
There should be a way to back up and restore the hub in case something goes wrong. The most obvious option is snapshotting the EBS volume, but I would also investigate how to back up JupyterHub without taking a complete snapshot.
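For the snapshot option, one possible sketch uses Amazon Data Lifecycle Manager (DLM) to take EBS snapshots on a schedule. It assumes the instance's volume carries the tag `Backup = "jupyterhub"` (for example, set through `volume_tags` on the `aws_instance` resource). The tag, resource names, schedule, and retention count are all hypothetical choices, not part of this repository.

```hcl
# IAM role that DLM assumes to create and delete snapshots.
resource "aws_iam_role" "dlm" {
  name = "jupyterhub-dlm" # hypothetical name
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "dlm.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# AWS-managed policy granting DLM the snapshot permissions it needs.
resource "aws_iam_role_policy_attachment" "dlm" {
  role       = aws_iam_role.dlm.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSDataLifecycleManagerServiceRole"
}

resource "aws_dlm_lifecycle_policy" "jupyterhub" {
  description        = "Daily snapshots of the JupyterHub volume"
  execution_role_arn = aws_iam_role.dlm.arn
  state              = "ENABLED"

  policy_details {
    resource_types = ["VOLUME"]

    # Only volumes with this (hypothetical) tag are snapshotted.
    target_tags = {
      Backup = "jupyterhub"
    }

    schedule {
      name = "daily-7-day-retention"

      # One snapshot every 24 hours, starting at 03:00 UTC.
      create_rule {
        interval      = 24
        interval_unit = "HOURS"
        times         = ["03:00"]
      }

      # Keep the 7 most recent snapshots; older ones are deleted.
      retain_rule {
        count = 7
      }

      copy_tags = true
    }
  }
}
```

Restoring would still be a separate step: creating a new volume from one of the snapshots and attaching it to an instance.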
Users on JupyterHub can install packages into their own environments, but some packages are likely to be used by many users. In that case, it makes sense to install them once for everyone. TLJH has a guide for this, Install conda / pip packages for all users, but I would investigate doing it in an Ansible playbook so that the environment is more reproducible.
For a fully coded setup, I would manage DNS with Route 53. To avoid extra complexity for potential users, I didn't do that here. However, it might be nice to offer optional configuration for Route 53 users.
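Optional Route 53 support could look something like the following sketch: the DNS record is created only when a hosted zone name is supplied, so users who manage DNS elsewhere are unaffected. The variable names and `aws_instance.jupyterhub` are hypothetical stand-ins for whatever this repository actually defines.

```hcl
# Hypothetical: leave empty to keep managing DNS outside Route 53.
variable "route53_zone_name" {
  description = "Existing Route 53 hosted zone, e.g. example.com"
  type        = string
  default     = ""
}

variable "domain_name" {
  description = "Hostname for the hub, e.g. hub.example.com"
  type        = string
}

locals {
  use_route53 = var.route53_zone_name != ""
}

# Look up the hosted zone only when Route 53 is in use.
data "aws_route53_zone" "selected" {
  count = local.use_route53 ? 1 : 0
  name  = var.route53_zone_name
}

# A record pointing the hub's hostname at the instance's public IP.
resource "aws_route53_record" "jupyterhub" {
  count   = local.use_route53 ? 1 : 0
  zone_id = data.aws_route53_zone.selected[0].zone_id
  name    = var.domain_name
  type    = "A"
  ttl     = 300
  records = [aws_instance.jupyterhub.public_ip] # hypothetical instance resource
}
```

Using `count` keeps the record entirely optional; with an empty zone name, neither the lookup nor the record is created.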
Attaching an Elastic IP would simplify DNS configuration, at least whenever you need to replace the EC2 instance: the public IP address stays fixed across instance redeployments, so the DNS record does not have to change.
In the past, Elastic IP addresses were free while in use. However, since February 2024, AWS charges an hourly fee for in-use addresses as well. Given that, I did not expect much demand for an Elastic IP and did not implement it.
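For anyone who wants a fixed address despite the fee, a minimal sketch follows, written for version 5 of the AWS Terraform provider (where `domain = "vpc"` replaces the older `vpc = true`). Keeping the association in a separate `aws_eip_association` resource means that replacing the instance only recreates the association, not the address. `aws_instance.jupyterhub` is a hypothetical resource name.

```hcl
# The address itself; it survives instance replacement.
resource "aws_eip" "jupyterhub" {
  domain = "vpc"

  # Make Terraform refuse any plan that would release this address.
  lifecycle {
    prevent_destroy = true
  }
}

# Binds the address to the current instance (hypothetical resource name).
resource "aws_eip_association" "jupyterhub" {
  instance_id   = aws_instance.jupyterhub.id
  allocation_id = aws_eip.jupyterhub.id
}

# Point the domain's DNS record at this value once; it should not change.
output "jupyterhub_public_ip" {
  value = aws_eip.jupyterhub.public_ip
}
```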