🐛 [BUG]: Cowbird is not backward compatible with existing Jupyter users #425

Open
tlvu opened this issue Feb 17, 2024 · 14 comments · May be fixed by #480
Labels
bug Something isn't working

Comments

@tlvu
Collaborator

tlvu commented Feb 17, 2024

Summary

Activating Cowbird on a system with existing Jupyter users has many roadblocks. This is in contrast with the usual "just enable the new component in env.local and it should play nice with all existing components" message we are trying to convey in the stack.

A migration guide for systems with existing Jupyter users would have been helpful.

Below are the various problems I have faced so far and any work-arounds I was able to find. I will add more to this list as I try out Cowbird.

Details

  • For each existing Jupyter user, /data/user_workspaces/$USER has to be manually created

    Otherwise this error in docker logs jupyterhub: [E 2024-01-16 15:30:36.478 JupyterHub user:884] Unhandled error starting lvu's server: The user lvu's workspace doesn't exist in the workspace directory, but should have been created by Cowbird already.

  • Conflict with the existing poor man's public share

    If the poor man's public share in

    #export JUPYTERHUB_CONFIG_OVERRIDE="
    #
    # Sample below will allow for sharing notebooks between Jupyter users.
    # Note all shares are public.
    #
    ### public-read paths
    #
    ## /data/jupyterhub_user_data/public-share/
    #public_read_on_disk = join(jupyterhub_data_dir, 'public-share')
    #
    ## /notebook_dir/public/
    #public_read_in_container = join(notebook_dir, 'public')
    #
    #c.DockerSpawner.volumes[public_read_on_disk] = {
    #    'bind': public_read_in_container,
    #    'mode': 'ro',
    #}
    #
    ### public-share paths
    #
    ## /data/jupyterhub_user_data/public-share/{username}-public
    #public_share_on_disk = join(public_read_on_disk, '{username}-public')
    #
    ## /notebook_dir/mypublic
    #public_share_in_container = join(notebook_dir, 'mypublic')
    #
    #c.DockerSpawner.volumes[public_share_on_disk] = {
    #    'bind': public_share_in_container,
    #    'mode': 'rw',
    #}
    #
    ### create dir with proper permissions
    #
    #def custom_create_dir_hook(spawner):
    #    username = spawner.user.name
    #
    #    perso_public_share_dir = public_share_on_disk.format(username=username)
    #
    #    for dir_to_create in [public_read_on_disk, perso_public_share_dir]:
    #        if not os.path.exists(dir_to_create):
    #            os.mkdir(dir_to_create, 0o755)
    #
    #    subprocess.call(['chown', '-R', '1000:1000', public_read_on_disk])
    #
    #    # call original create_dir_hook() function
    #    create_dir_hook(spawner)
    #
    #c.Spawner.pre_spawn_hook = custom_create_dir_hook
    #"
    is enabled, then we have to set PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR in env.local to a different value than public.

    Otherwise this error when spawning a new Jupyterlab server: Spawn failed: 500 Server Error for http+docker://localhost/v1.43/containers/2239816099ea7b8bf440b76fc0a1d4a43248bb1e5073fc043ef1c1062cdd3cff/start: Internal Server Error ("failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/data/user_workspaces/public/wps_outputs" to rootfs at "/notebook_dir/public/wps_outputs": mkdir /pvcs1/var-lib/docker/overlay2/ec7672b5d034e55d21465dd1e41c0333e0c5db2adb2dcec9f0f2a37bb968fe10/merged/notebook_dir/public/wps_outputs: read-only file system: unknown").

    See 🐛 [BUG]: jupyterlab server fails to spawn due to read-only volume mount #392 (comment)

  • Content of /notebook_dir/writable-workspace for all existing Jupyter users seems to have disappeared

    This is because, without Cowbird enabled, /notebook_dir/writable-workspace is bound to /data/jupyterhub_user_data/$USER. But with Cowbird enabled, /notebook_dir/writable-workspace is bound to /data/user_workspaces/$USER, which is a new, empty dir.

    No work-around found so far (a speculative, untested data-copy sketch follows this list).
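
For illustration only, a naive and untested data-copy sketch of what a manual migration could look like (not a confirmed work-around). The 1000:1000 ownership and the public-share exclusion are assumptions based on the defaults discussed in this issue:

# Hypothetical migration sketch (untested): copy each existing user's old JupyterHub
# data into the new Cowbird workspace root, then restore the expected ownership.
import os
import shutil
import subprocess

OLD_ROOT = "/data/jupyterhub_user_data"   # pre-Cowbird user data
NEW_ROOT = "/data/user_workspaces"        # Cowbird user workspaces

for username in os.listdir(OLD_ROOT):
    old_dir = os.path.join(OLD_ROOT, username)
    if not os.path.isdir(old_dir) or username == "public-share":
        continue
    new_dir = os.path.join(NEW_ROOT, username)
    os.makedirs(new_dir, mode=0o755, exist_ok=True)
    # merge old content into the (possibly already created) new workspace
    shutil.copytree(old_dir, new_dir, dirs_exist_ok=True)
    subprocess.call(["chown", "-R", "1000:1000", new_dir])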

To Reproduce

Steps to reproduce the behavior:

  1. Use birdhouse-deploy at any version before 2.0.0
  2. Enable the poor man's public share in env.local by uncommenting the JUPYTERHUB_CONFIG_OVERRIDE section shown in the Details above
  3. Create a Jupyter user via Magpie
  4. Login to JupyterHub and create some data under writable-workspace
  5. Update birdhouse-deploy to any version after 2.0.0 where Cowbird is enabled by default
  6. Re-enable in env.local any components that are no longer enabled by default, e.g. ./components/jupyterhub

Environment

Information             Value
Server/Platform URL     My dev PAVICS stack
Version Tag/Commit      2.0.5
Related issues/PR       #392
Related components      Jupyter, Cowbird, possibly Weaver, Magpie
Custom configuration

Concerned Organizations

@fmigneault @ChaamC @Nazim-crim @mishaschwartz @eyvorchuk

@fmigneault
Collaborator

For each existing Jupyter user, /data/user_workspaces/$USER has to be manually created

Otherwise this error in docker logs jupyterhub: [E 2024-01-16 15:30:36.478 JupyterHub user:884] Unhandled error starting lvu's server: The user lvu's workspace doesn't exist in the workspace directory, but should have been created by Cowbird already.

This looks like the volume mounted as /data/user_workspaces could be owned by root or some other user, such that the internal jupyter spawner user does not have sufficient permissions to create the user-specific workspace; or /data/user_workspaces/$USER already exists but has a higher/root owner, so jupyter cannot run the chown command, and Cowbird will then fail any following step since it uses the same UID:GID. The same applies to /data/jupyterhub_user_data/ and /data/jupyterhub_user_data/$USER.

Just a wild guess: the order in which the volumes are created could be the source of the root ownership. Since there is a step for jupyter persistence volume creation, it might not play nice with a docker-compose configuration that auto-creates volume mount locations (as root) when they do not exist.
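
To test this guess, a quick way is to check who owns the relevant paths on the host; a minimal sketch (the paths are the ones mentioned above, everything else is illustrative):

# Minimal ownership check for the directories discussed above (run on the host).
import os

for path in ("/data/user_workspaces", "/data/jupyterhub_user_data"):
    if not os.path.exists(path):
        print(f"{path}: missing")
        continue
    st = os.stat(path)
    print(f"{path}: uid={st.st_uid} gid={st.st_gid} mode={oct(st.st_mode & 0o777)}")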

The creation is performed by this hook:

def create_dir_hook(spawner):
    username = spawner.user.name
    jupyterhub_user_dir = join(jupyterhub_data_dir, username)
    if not os.path.exists(jupyterhub_user_dir):
        os.mkdir(jupyterhub_user_dir, 0o755)

https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/components/jupyterhub/jupyterhub_config.py.template#L173

Note that care should be taken with overrides if they play with similar properties.

@mishaschwartz
Collaborator

This is the same issue as #392

  • first {notebook_dir}/public gets created in read-only mode

#public_read_in_container = join(notebook_dir, 'public')
#
#c.DockerSpawner.volumes[public_read_on_disk] = {
#    'bind': public_read_in_container,
#    'mode': 'ro',
#}

  • then {notebook_dir}/public/wps_outputs can't be created later because the containing folder is read-only:

c.DockerSpawner.volumes[join(os.environ['WORKSPACE_DIR'], os.environ['PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR'])] = {
    "bind": join(notebook_dir, os.environ['PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR']),
    "mode": "ro"
}

I don't know why the DockerSpawner decides to create them in that order, but that's how it's done consistently.

@tlvu
Collaborator Author

tlvu commented Feb 21, 2024

I don't know why the DockerSpawner decides to create them in that order, but that's how it's done consistently.

I am happy it is consistent. The worst kind of problems are intermittent ones.

But I think the sequence is appropriate: {notebook_dir}/public is the parent dir, so it is volume-mounted first. The {notebook_dir}/public/wps_outputs volume-mount follows because it is the child dir. But since the parent dir is read-only, the volume-mount of the child dir errors out because it cannot create the mount point. This makes sense.
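
Condensed, the two conflicting mounts look roughly like this (paths taken from the error message above; this is an illustration, not the exact template code):

# Illustration of the conflicting mounts (paths from the error above). The parent is
# mounted read-only first, so Docker cannot create the child mount point
# /notebook_dir/public/wps_outputs inside it.
c = get_config()  # noqa -- standard JupyterHub config object

c.DockerSpawner.volumes["/data/jupyterhub_user_data/public-share"] = {
    "bind": "/notebook_dir/public",
    "mode": "ro",  # parent dir, read-only
}
c.DockerSpawner.volumes["/data/user_workspaces/public/wps_outputs"] = {
    "bind": "/notebook_dir/public/wps_outputs",
    "mode": "ro",  # child mount point cannot be created inside the read-only parent
}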

@tlvu
Collaborator Author

tlvu commented Feb 21, 2024

For each existing Jupyter user, /data/user_workspaces/$USER has to be manually created
Otherwise this error in docker logs jupyterhub: [E 2024-01-16 15:30:36.478 JupyterHub user:884] Unhandled error starting lvu's server: The user lvu's workspace doesn't exist in the workspace directory, but should have been created by Cowbird already.

This looks like the volume mounted as /data/user_workspaces could be owned by root or some other user, such that the internal jupyter spawner user does not have sufficient permissions to create the user-specific workspace,

This is a reasonable hint, but it should not happen since the jupyterhub container runs as root, so it can mkdir and chown all the paths it needs before spawning the Jupyterlab server container.

or that /data/user_workspaces/$USER already exists

No, the error happens only when that dir does not exist yet. If I manually create it before spawning the Jupyter server (which is my documented work-around), the error is gone and we can spawn the Jupyter server successfully.

The order in which the volumes are created could be the source of the root ownership. Since there is a step for jupyter persistence volume creation.

No, the JupyterHub persistence data-volume is for the session tokens only. User data is not in a data-volume but is a direct volume-mount from disk.

@mishaschwartz
Collaborator

For each existing Jupyter user, /data/user_workspaces/$USER has to be manually created

Isn't this just because the webhook action that creates the directory is only triggered when the user is created:

- name: cowbird_create_user
  action: create_user

And the user is already created so the webhook isn't triggered (see: https://pavics-magpie.readthedocs.io/en/latest/configuration.html#webhook-user-create)

@fmigneault
Collaborator

fmigneault commented Feb 21, 2024

This code was added to consider the situation where the user already exists, and no webhook would be triggered.

if not os.path.exists(jupyterhub_user_dir):
    os.mkdir(jupyterhub_user_dir, 0o755)
    subprocess.call(["chown", "-R", f"{os.environ['USER_WORKSPACE_UID']}:{os.environ['USER_WORKSPACE_GID']}",
                     jupyterhub_user_dir])

I'm not sure why it doesn't resolve the same way as when the directory is manually created.

Could it be that jupyterhub tries to mount the volumes before c.Spawner.pre_spawn_hook gets called? If so, the name would be somewhat counter-intuitive.

c.Spawner.pre_spawn_hook = create_dir_hook
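
One hypothetical way to check that ordering is to log from the hook and compare timestamps against the mount failure in docker logs jupyterhub; a sketch (not part of the template):

# Hypothetical debugging aid: log when the hook actually runs, then compare its
# timestamp with the container-start/mount errors in the jupyterhub logs.
def logging_pre_spawn_hook(spawner):
    spawner.log.info("pre_spawn_hook running for user %s", spawner.user.name)
    create_dir_hook(spawner)  # the existing hook from jupyterhub_config.py.template

c.Spawner.pre_spawn_hook = logging_pre_spawn_hook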

@fmigneault
Collaborator

Does adding a mkdir here fix it instead of raising?

if not os.path.exists(workspace_user_dir):
    raise FileNotFoundError(f"The user {username}'s workspace doesn't exist in the workspace directory, "
                            "but should have been created by Cowbird already.")

@tlvu
Collaborator Author

tlvu commented Feb 22, 2024

This code was added to consider the situation where the user already exists, and no webhook would be triggered.

if not os.path.exists(jupyterhub_user_dir):
    os.mkdir(jupyterhub_user_dir, 0o755)
    subprocess.call(["chown", "-R", f"{os.environ['USER_WORKSPACE_UID']}:{os.environ['USER_WORKSPACE_GID']}",
                     jupyterhub_user_dir])

This code (mkdir + chown) was already there before Cowbird was added to the stack, and I can confirm it works fine on /data/jupyterhub_user_data/. It is really odd that after switching to /data/user_workspaces/ it does not work anymore.

Below is the old code with the existing mkdir + chown:

def create_dir_hook(spawner):
    username = spawner.user.name
    user_dir = join(jupyterhub_data_dir, username)
    if not os.path.exists(user_dir):
        os.mkdir(user_dir, 0o755)
        subprocess.call(["chown", "-R", "1000:1000", user_dir])

Is it possible Cowbird volume-mounts /data/user_workspaces/ read-only, which makes JupyterHub unable to write to it? This would still be weird since JupyterHub has root access; it should be able to write to any path it sees.

Does adding a mkdir here fix it instead of raising?

if not os.path.exists(workspace_user_dir):
    raise FileNotFoundError(f"The user {username}'s workspace doesn't exist in the workspace directory, "
                            "but should have been created by Cowbird already.")

Or maybe it should add a symlink instead? See this comment:

# Case for the cowbird setup, where the workspace_dir contains a symlink to the jupyterhub dir.
# The jupyterhub dir must also be mounted in this case.
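
A hypothetical sketch of that symlink alternative (untested; the variable names follow the snippets quoted above), so the pre-existing JupyterHub data dir is reused instead of an empty new dir:

# Hypothetical alternative (untested): link the new workspace to the pre-existing
# JupyterHub data dir so old content remains visible, and only create an empty
# dir when there is nothing to reuse.
if not os.path.exists(workspace_user_dir):
    jupyterhub_user_dir = join(jupyterhub_data_dir, username)
    if os.path.exists(jupyterhub_user_dir):
        os.symlink(jupyterhub_user_dir, workspace_user_dir)
    else:
        os.mkdir(workspace_user_dir, 0o755)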

@tlvu
Collaborator Author

tlvu commented Feb 22, 2024

For each existing Jupyter user, /data/user_workspaces/$USER has to be manually created

Isn't this just because the webhook action that creates the directory is only triggered when the user is created:

- name: cowbird_create_user
  action: create_user

And the user is already created so the webhook isn't triggered (see: https://pavics-magpie.readthedocs.io/en/latest/configuration.html#webhook-user-create)

Oh interesting. How does this hook know whether to create a new dir or a symlink to an existing /data/jupyterhub_user_data/$USER dir?

@fmigneault
Collaborator

fmigneault commented Feb 22, 2024

The Magpie Webhook registered to occur on create_user is sent to Cowbird's /webhooks/users endpoint with event created when the action happens (see https://pavics-magpie.readthedocs.io/en/latest/configuration.html#config-webhook-actions for all available Magpie Webhooks and when they trigger). Each active Cowbird handler in https://github.com/bird-house/birdhouse-deploy/blob/13645f324c1bcef3decd91ba8a5462862b1e8d5a/birdhouse/components/cowbird/config/cowbird/config.yml.template that implements user_created is then called. For the user-workspace, that happens here: https://github.com/Ouranosinc/cowbird/blob/e2aa5337e32cd87efb5600f3fe62882d8d4d8b1f/cowbird/handlers/impl/filesystem.py#L118

@mishaschwartz
Collaborator

Does adding a mkdir here fix it instead of raising?

Yes that should solve the problem (when old users were created before cowbird was enabled)

@mishaschwartz
Collaborator

We can solve the issue of having read-only volumes mounted on top of each other by changing the location of one or the other.
I would recommend changing this line:

#public_read_in_container = join(notebook_dir, 'public')

to:

 #public_read_in_container = join(notebook_dir, 'public-shared') 

Or similar.

I also think it would be a good idea to move this code out of env.local.example and into an optional component.

@tlvu
Collaborator Author

tlvu commented Feb 27, 2024

We can solve the issue of having read-only volumes mounted on top of each other by changing the location of one or the other. I would recommend changing this line:

#public_read_in_container = join(notebook_dir, 'public')

to:

 #public_read_in_container = join(notebook_dir, 'public-shared') 

Or similar.

Yes, or export PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR=somethingelse works, and it could default to something other than public. Note that PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR might already work properly; I just did not have time to confirm.

Same idea: both sharing solutions get their own public folder so they do not step on each other's feet.

I also think it would be a good idea to move this code out of env.local.example and into an optional component.

Yes! At the beginning, I thought of this as a live example of how env.local can be used to extend the JupyterHub config. In retrospect, it should have been an optional component: it has been very useful for us and could benefit others.

@tlvu
Collaborator Author

tlvu commented Feb 27, 2024

Does adding a mkdir here fix it instead of raising?

Yes that should solve the problem (when old users were created before cowbird was enabled)

Should it create the dir or the symlink? See this comment in the code:

# Case for the cowbird setup, where the workspace_dir contains a symlink to the jupyterhub dir.
# The jupyterhub dir must also be mounted in this case.
