Add information about using NFS with z2jh #421
@yuvipanda I'd like to be a guinea pig on this. I am trying to set up a persistent EFS volume and use that as the user storage. So far I've created and applied:
After that I added the following to my config.yaml:
I am pretty sure I am missing a few key ideas here. Edit: |
w00t, thanks for volunteering :) The other two things to keep in mind are:
|
I'll take a look through the nfs-flex-volume repo |
@cam72cam another way to check that everything works right now except for permissions is to set:

```yaml
singleuser:
  uid: 0
  fsGid: 0
  cmd:
    - jupyterhub-singleuser
    - --allow-root
```

If you can launch servers with that, then we can confirm that the uid situation is the only problem. |
I am currently getting the following response:
I suspect it has to do with:
Actually:
it appears that subPath is not being set correctly |
So it turns out specifying subPath manually of |
The PVC needs to be in the same namespace as JupyterHub, so the pods can find it. |
The PVC needs to be told how to find the PV to match, and this is done by using:
So
Things to note:
|
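A minimal sketch of that matching, assuming the label-selector approach plus an empty storageClassName (the names, labels, and sizes below are placeholders, not the exact manifests used here):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-persist
spec:
  # An empty storageClassName opts out of dynamic provisioning, so the
  # claim binds to a pre-created PV instead of asking a provisioner.
  storageClassName: ""
  # Optionally pin the claim to a specific PV via labels set on that PV.
  selector:
    matchLabels:
      volume: efs-home
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
```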
Ok, I've got the mount working. I did not do the label stuff yet; I simply set storageClassName: "" in the claim, and that seemed to work just fine.

I ran into a speed bump where I had to change the security groups to allow access from EC2 to EFS. As a temporary measure I added both the EFS volume and the EC2 instances to the "default" security group. Eventually part of the initial kops config should add the correct security groups.

I am now getting a permission error: I am going to try to change the permissions on the EFS drive first, and if that does not work, try the root hack that @yuvipanda mentioned.

EDIT: A manual chown on the EFS volume to 1000:1000 seems to have worked! |
EFS Success! Process:

Set up an EFS volume. It must be in the same VPC as your cluster; this can be changed in the AWS settings after it has been created.

Created test_efs.yaml:
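Roughly, test_efs.yaml is an NFS-type PersistentVolume pointing at the EFS endpoint; the manifest below is an illustrative sketch (the name, server, and capacity are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-persist
spec:
  capacity:
    storage: 100Gi  # not enforced by EFS; it just has to satisfy the claim
  accessModes:
    - ReadWriteMany
  nfs:
    server: fs-xxxxxxxx.efs.us-east-1.amazonaws.com  # placeholder EFS DNS name
    path: "/"
```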
Created test_efs_claim.yaml
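And test_efs_claim.yaml is, roughly, a claim with an empty storageClassName so that it binds to the PV above rather than to a dynamic provisioner (again, names and sizes are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-persist
  namespace: cmesh-test
spec:
  storageClassName: ""  # empty string, so no dynamic provisioner is used
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
```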
kubectl --namespace=cmesh-test apply -f test_efs.yaml

The sizes in these files don't mean what you think: there is no quota enforced with EFS. In the future we want to set the EFS PersistentVolume size to something ridiculously large like 8Ei and the PersistentVolumeClaim to 10GB (neither matters, as far as I can tell).

This is my rough understanding and could be incorrect: a PersistentVolume defines a service which can perform a mount inside of a container, while a PersistentVolumeClaim is a way of reserving a portion of the PersistentVolume and potentially locking access to it.

The storageClassName setting looks innocuous, but it is critically important. The only PV in the cluster without a storage class is the one we defined above. In the future we should label different PVs and use label selectors in the PVC instead of relying on a default of "".

We are going to configure JupyterHub to use the same "static" claim for all of the containers. This means that all of our users will share the same EFS volume, which should be able to scale as high as we need. We now add the following to config.yaml:
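The addition to config.yaml is, approximately, a static storage block referencing that claim (pvcName must match the PVC above; the subPath shown is illustrative):

```yaml
singleuser:
  storage:
    type: static
    static:
      pvcName: efs-persist
      subPath: "home/{username}"
```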
type: static tells JupyterHub not to use a storage class and instead use the PVC specified under static:. It turns out there is a bug in JupyterHub where the default subPath does not work, and setting the subPath to "{username}" breaks in the same way.

At this point, if we tried to start our cluster it would fail. The directory created on the mount at subPath is created with uid 0 and gid 0, which means that when the single-user server is launched it won't be able to create any files, will complain, and then self-destruct. What we need to do is tell the container to run our JupyterHub setup as root and then switch to the jovyan user before starting the JupyterHub process. While we are running as root we can do our own chown to fix the permissions of the created directory. First we merge the following into our config.yaml:
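Roughly, the merged settings look like this (exact keys may differ slightly between chart versions):

```yaml
singleuser:
  uid: 0
  fsGid: 0
  cmd:
    - start-singleuser.sh
```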
This tells JupyterHub to enter the container as root and run the start-singleuser.sh script. start-singleuser.sh calls a helper start.sh script, which we will use later on. This gets JupyterHub to provision the container and attempt to start, but the process will still fail because the chown has not taken place. In order to have a properly chowned directory at /home/jovyan, mounted from $EFS_ROOT/home/{username}, we need to create our own docker container. Here are some terse steps:
The base Dockerfile came from 967b2d2#diff-aed8b29ee8beb1247469956c481040c2. Merge the following into config.yaml:
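Something along these lines, with the image name and tag as placeholders for whatever you built and pushed:

```yaml
singleuser:
  image:
    name: your-registry/custom-singleuser  # placeholder
    tag: "0.1"                             # placeholder
```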
You may be able to do a helm upgrade, but I ended up purging and reinstalling via helm just to be safe. At this point you should be all set with a semi-fragile (but functional) EFS-backed JupyterHub setup.

Debugging tools (all with --namespace=):
|
Thank you for getting this working, @cam72cam! To summarize, this mostly works, except for the issue of permissions:
It'll be great if we can fix EFS or Kubernetes to have options around 'what user / group / mode should this directory be created as?' |
Could we add the chown hack I put in my image to the start.sh script in the stacks repo? Would that break any existing users' setups? |
I ran into an issue when trying to get this to work with an NFS server which I did not have direct control over (EFS). As part of the PersistentVolumeClaim, there is no easy way to set the UID and GID of the directory created on the networked FS. My only concern with this chown is that some user out there might be running JupyterHub in an odd configuration where $NB_USER is not supposed to have these exact permissions on the storage. I think this is quite unlikely, but it is worth mentioning. I chronicled my experiences with working around this issue and setting up z2jh on EFS in jupyterhub/zero-to-jupyterhub-k8s#421 with @yuvipanda.
There are a few issues in Kubernetes which seem to disagree on whether Kubernetes should set the permissions on subpaths or not: |
I figure doing the chown ourselves resolves it for now (behind a config setting) and can be removed once K8S finalizes if/how/when the subpath permissions should be set |
Just confirmed the fix in my test env |
I was able to use my NFS-backed persistent claim on Google Cloud as the user storage by following the steps @cam72cam outlined, so I can attest to his solution. Thanks for paving the way, guys! |
Just to clarify: do we do a helm installation first using a config file with the start-singleuser.sh command inserted, and then do a helm upgrade using an updated config file with the singleuser image? |
Either should work, though I'd recommend a clean install just to be safe. |
Hey all - it sounds like there's some useful information in this thread that hasn't made its way into Z2JH yet. Might I suggest that either:
What do folks think? |
We are currently doing an internal alpha with a setup similar to the one mentioned above and working out any minor issues which come up. @mfox22 I'd be up for either, what do you think? My biggest concern with how it works at the moment is that a clever user could look at the system mounts and figure out how to do a userspace nfs mount with someone else's directory. I think we could get around that with configuring the PV differently, but I still have a lot to learn in regards to that. |
Well if this is "useful functionality that may introduce some bugs because it's in an 'alpha' state" kinda functionality, maybe a blog post kinda thing is better? One reason we added https://zero-to-jupyterhub.readthedocs.io/en/latest/#resources-from-the-community was to make it easier for people to add more knowledge to the internet without needing to be "official" z2jh instructions. :-) If you like, I'm happy to give a writeup a look-through, and if it ever gets to a point that your team is happy with, we can revisit bringing it into Z2JH? |
@albertmichaelj So, I don't think you've tried my solution. I have the same requirement as yours, mounting one PVC multiple times on the same pod. How I got the name home is when you execute |
@fifar I thought I had tried your solution, but I didn't quite understand what you were suggesting. Just for the sake of posterity, I'll lay out the problem I was having. My initial config was (basically) this:

```yaml
storage:
  type: "static"
  static:
    pvcName: "nfs-home"
    subPath: 'home/{username}'
  extraVolumes:
    - name: home
      persistentVolumeClaim:
        claimName: "nfs-home"
  extraVolumeMounts:
    - name: home
      mountPath: "/mnt/data"
      subPath: "shared_data"
```

When I had this, I got the following error message when I tried to spawn the pod:
basically saying that I had a duplicate resource (which I did: two volumes named home). When I instead use:

```yaml
storage:
  type: "static"
  static:
    pvcName: "nfs-home"
    subPath: 'home/{username}'
  extraVolumeMounts:
    - name: home
      mountPath: "/mnt/data"
      subPath: "shared_data"
```

it works! The problem is that I had the redundant extraVolumes entry, which defined a second volume with the same name as the one the static storage configuration already creates. This now works great, and I can use a single PV and PVC for as many mounts as I'd like! I had been creating dozens of PVs and PVCs (which is not that hard to do with a helm template, but it is annoying) in order to mount multiple shared volumes (groups, data, whole class, etc.). This is much more elegant. Thanks @fifar. |
@albertmichaelj Good to know it works for you. The config without |
Hi, I was also struggling to get z2jh up and running on prem, and was inspired by:
My solution was to helm install the
and make it the default storage class on my Kubernetes cluster. Specific steps:
I hope that helps others as well. Best regards,

PS1: This could help here:
PS2: Similar issue here:
PS3: This solution works when you have your own NFS server deployed. If not, you should try this other chart instead: |
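For context: once an NFS-backed storage class is the cluster default, z2jh's dynamic storage works with little or no extra configuration; if it is not the default, a minimal sketch of pointing the chart at it explicitly (the class name is a placeholder) would be:

```yaml
singleuser:
  storage:
    dynamic:
      storageClass: nfs-client  # placeholder: whatever class your provisioner registers
```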
Thanks to everyone who commented on this. I was having issues with long login times when using JupyterHub on Azure Kubernetes Service (AKS), and was able to take the login times from two minutes to 20 seconds by using NFS. For any who are interested, here's how I did it:
|
@akaszynski was the delay related to attaching the default PVCs that were dynamically created by JupyterHub, or perhaps to their creation? Did you notice this big time difference only between the first login ever and the second login, or was there a big delay even on second logins and JupyterHub user pod startups? |
@consideRatio: The delay was for both. Maybe 1-2 minutes to create, 30-60 seconds to mount. When the end user expects things to load near instantly, having the progress bar hang there will drive them crazy! I've even moved the hub-db-dir over to NFS, as it was taking forever to create the hub pod every time I upgraded the cluster:
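A hedged sketch of what that override can look like in config.yaml (the storage class name is a placeholder; check the hub.db.pvc options for your chart version):

```yaml
hub:
  db:
    pvc:
      storageClassName: nfs-client  # placeholder NFS-backed storage class
```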
|
@akaszynski thanks for sharing this insight! ❤️ |
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there: https://discourse.jupyter.org/t/efs-multi-user-home-directories/3032/3 |
Just finished setting up user NFS storage and a shared NFS folder based on this excellent issue and the chown stuff in the base image (thanks @cam72cam!). Just checking in to see if this is still being worked on, either from the K8s perspective or in terms of making it simpler in the Z2JH setup? I plan on writing a blog post about my experience, just because there were small things that took me forever to figure out (like running |
As far as I know no one is actively working on it. The problem with NFS is its flexibility: there are lots of ways to set it up, ranging from managed public cloud fileservers to NFS provisioners you manage yourself, so it'd be difficult to add something to the Z2JH chart. It'd be nice to make the documentation clearer though. Perhaps a good start for someone would be to collate all the information in this issue, as it's grown quite a lot and there are several configurations mentioned. One option might be a Discourse post? I think it's possible to create an editable wiki post (@consideRatio?). |
Yepp! The first "post" in a "topic" can be made into a "wiki post" by users of a certain trust level. This trust is gained after being registered for 24 hours on the forum. |
Great! I am doing a write-up internally; I'll take a crack at condensing that into a topic on Discourse this weekend. Thanks! |
@josibake ❤️ thanks for taking the time to work on this, I think it would be very valuable summarizing this! |
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there: https://discourse.jupyter.org/t/multi-nfs-mount-points/4610/2 |
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there: https://discourse.jupyter.org/t/additional-storage-volumes/5012/2 |
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there: https://discourse.jupyter.org/t/custom-dockerimage-for-jupyterhub-on-kubernetes/6118/6 |
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there: https://discourse.jupyter.org/t/warning-unable-to-mount-volumes-for-pod/6404/2 |
I am still running into this issue on AWS EFS with the https://github.com/kubernetes-sigs/aws-efs-csi-driver. When the
and mounted in the container as I was expecting. I'm using the config discussed in this issue and in the z2jh documentation.
|
Following up on a tip from @consideRatio on the Gitter channel, I have finally been able to get this operational using initContainers. The relevant storage and initContainers section is here: the |
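For anyone landing here, a minimal sketch of that initContainers approach (the volume name home comes from the static storage configuration discussed above; the claim name, image, UID/GID, and subPath are assumptions to adapt):

```yaml
singleuser:
  storage:
    type: static
    static:
      pvcName: efs-home            # placeholder claim name
      subPath: "home/{username}"
  initContainers:
    - name: nfs-fixer
      image: alpine:3.14           # any small image with chown/chmod
      securityContext:
        runAsUser: 0               # run as root so the chown is permitted
      volumeMounts:
        - name: home               # the volume created for the static claim
          mountPath: /nfs
          subPath: "home/{username}"
      command:
        - sh
        - -c
        - chmod 0755 /nfs && chown 1000:100 /nfs
```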
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there: https://discourse.jupyter.org/t/sql-operationalerror-with-jh-default-config/12207/5 |
I love this, because it lets us launch JupyterHub as a normal user, not root. I think this should be the recommended way. And I think it is better to use
|
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there: |
NFS is still a very popular storage setup, and is a good fit for use with z2jh in several cases:
While we don't want to be on the hook for teaching users to set up and maintain NFS servers, we should document how to use an NFS server that already exists.