
Jupyterhub: allow users created before Cowbird was enabled to spawn jupyterlab #480

Merged (5 commits, Jan 16, 2025)

Conversation

@mishaschwartz (Collaborator)

Overview

Users created before Cowbird was enabled will not have a "workspace directory" created. A workspace directory is a symlink to the directory that contains their Jupyterhub data.

When Cowbird is enabled, Jupyterhub checks if the workspace directory exists and raises an error if it doesn't.

This change makes Jupyterhub create the symlink if it doesn't exist, instead of raising an error.
As a result, users without a "workspace directory" can continue using Jupyterhub as they did before, without a system administrator having to create the symlink for them manually.
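The check-then-create behavior described above can be sketched roughly as follows. This is a minimal sketch with hypothetical paths and a hypothetical helper name; the actual change lives in the birdhouse JupyterHub configuration:

```python
from pathlib import Path

# Hypothetical roots for illustration only; real deployments use paths such as
# /data/user_workspaces and /data/jupyterhub_user_data.
USER_WORKSPACES = Path("/tmp/demo_workspaces/user_workspaces")
JUPYTERHUB_USER_DATA = Path("/tmp/demo_workspaces/jupyterhub_user_data")

def ensure_workspace_symlink(user_name: str) -> Path:
    """Create the user's workspace symlink if it is missing, instead of raising."""
    target = JUPYTERHUB_USER_DATA / user_name
    workspace = USER_WORKSPACES / user_name
    target.mkdir(parents=True, exist_ok=True)
    USER_WORKSPACES.mkdir(parents=True, exist_ok=True)
    if not workspace.is_symlink() and not workspace.exists():
        workspace.symlink_to(target)  # previously this case raised an error
    return workspace
```

Run during spawn, such a check is idempotent: users created after Cowbird was enabled keep their existing symlink, while pre-Cowbird users get one created on the fly.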

Changes

Non-breaking changes

  • changes jupyterhub configuration

Breaking changes
None

Related Issue / Discussion

Additional Information

CI Operations

birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false

@github-actions github-actions bot added component/jupyterhub Related to JupyterHub as development frontend with notebooks documentation Improvements or additions to documentation labels Nov 22, 2024
@mishaschwartz (Collaborator, Author)

@tlvu please test this in your staging environment to ensure that this works for all of your users who were created before cowbird was enabled by default (version 2.0)

@tlvu (Collaborator) commented Nov 22, 2024

@mishaschwartz great! Will test this next week.

By the way, do you plan to roll all the changes that make Cowbird compatible with existing Magpie users and the poor-man sharing into one PR? Basically all the work-arounds found in #425, whenever they make sense and are possible, of course.

@mishaschwartz (Collaborator, Author)

By the way, do you plan to roll all the changes that make Cowbird compatible with existing Magpie users and the poor-man sharing into one PR? Basically all the work-arounds found in #425, whenever they make sense and are possible, of course.

The workaround described here #425 (comment) can be enabled just by updating your env.local file. I don't think there are any additional code changes that are needed.

@fmigneault (Collaborator) left a comment


Looks good. Thanks for the fix!
I'll let @tlvu do a more in-depth test to validate.

@mishaschwartz (Collaborator, Author)

By the way, do you plan to roll all the changes that make Cowbird compatible with existing Magpie users and the poor-man sharing into one PR? Basically all the work-arounds found in #425, whenever they make sense and are possible, of course.

The workaround described here #425 (comment) can be enabled just by updating your env.local file. I don't think there are any additional code changes that are needed.

Check out #481 which adds better documentation to avoid this for other users in the future.

@tlvu (Collaborator) commented Nov 26, 2024

The workaround described here #425 (comment) can be enabled just by updating your env.local file. I don't think there are any additional code changes that are needed.

Check out #481 which adds better documentation to avoid this for other users in the future.

Right, sorry, I forgot that all the work-arounds are just configs in env.local and no code change is required.

@tlvu (Collaborator) commented Dec 10, 2024

@mishaschwartz
I backported your change to the state that Ouranos is at, as of commit efb8485.

Then, from env.local.example, I enabled the poor-man sharing plus:

export EXTRA_CONF_DIRS="
    ./components/cowbird
    ./optional-components/canarie-api-full-monitoring
    ./optional-components/all-public-access
    ./optional-components/secure-thredds
    ./optional-components/wps-healthchecks
"
PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR="share"

So with this very minimal env.local, everything works fine: I can log in to Jupyter and see previous contents.

Then I enabled a config that is closer to our production:

export EXTRA_CONF_DIRS="
  ./components/canarie-api
  ./components/geoserver
  ./components/finch
  ./components/raven
  ./components/hummingbird
  ./components/thredds
  ./components/portainer
  ./components/jupyterhub
  ./components/cowbird
  ./optional-components/canarie-api-full-monitoring
  ./optional-components/wps-healthchecks
  ./optional-components/testthredds
  ./optional-components/generic_bird
  ./optional-components/x-robots-tag-header
  ./optional-components/proxy-json-logging
  ./components/weaver
  ./components/monitoring
  /path/no/exist
"

With this I am unable to log in to Jupyter anymore. Here are the logs from docker logs jupyterhub:

[I 2024-12-10 06:18:04.527 JupyterHub log:191] 200 GET /jupyter/hub/login (@172.21.0.1) 2.13ms
[D 2024-12-10 06:18:09.731 JupyterHub log:191] 200 GET /jupyter/hub/static/components/font-awesome/fonts/fontawesome-webfont.woff2?v=4.7.0 (@10.10.10.6) 1.05ms
[W 2024-12-10 06:18:09.920 JupyterHub base:843] Failed login for lvu
[I 2024-12-10 06:18:09.922 JupyterHub log:191] 200 POST /jupyter/hub/login?next=%2Fjupyter%2Fhub%2F (@10.10.10.6) 178.20ms

Direct login to https://HOST/magpie with my user and password works, so there is no problem on Magpie.

I removed ./components/cowbird from EXTRA_CONF_DIRS and I am able to log in to Jupyter again!

This is a bit weird; I do not see how the other components can affect cowbird. I'll continue investigating another time.

@mishaschwartz (Collaborator, Author) commented Dec 10, 2024

@tlvu

If you are unable to login, that is a different issue than the one addressed here. This fixes the issue that users were not able to spawn a jupyterlab container after they had logged in.

I'm also confused by your examples: in the first EXTRA_CONF_DIRS you're not enabling the jupyterhub service but you say that you can log in to jupyterhub?? Is there some other setting in your env.local that's overriding this?

In your second example can you please try with ./components/cowbird before ./components/jupyterhub in EXTRA_CONF_DIRS? Since cowbird is a required component in version 2, it will always be loaded before jupyterhub.

@tlvu (Collaborator) commented Dec 10, 2024

in the first EXTRA_CONF_DIRS you're not enabling the jupyterhub service but you say that you can log in to jupyterhub??

This is the Ouranos stack pre 2.0.0; jupyterhub, magpie and others are enabled by default 😄

In your second example can you please try with ./components/cowbird before ./components/jupyterhub in EXTRA_CONF_DIRS? Since cowbird is a required component in version 2, it will always be loaded before jupyterhub.

Oh right! I never thought about this one, but it is very true that the ordering could matter.

I have something to do today, will retry this investigation probably Thursday.

@mishaschwartz (Collaborator, Author)

This is Ouranos stack pre 2.0.0, jupyterhub, magpie and other are enabled by default 😄

right, right.. forgot about that sorry

I have something to do today, will retry this investigation probably Thursday.

sounds good

@tlvu (Collaborator) commented Dec 19, 2024

I re-added ./components/cowbird to EXTRA_CONF_DIRS, in the same place, not before ./components/jupyterhub, and suddenly I can still log in to Jupyterhub! Basically I cannot reproduce the problem from my earlier comment #480 (comment).

So I continued my testing: I created a new user in Magpie to see how the Cowbird trigger works, and I noticed it creates the following:

$ ls -l /data/user_workspaces/testcowbird01
total 4
lrwxrwxrwx. 1 root  root   40 Dec 19 20:58 notebooks -> /data/jupyterhub_user_data/testcowbird01
drwxrwxrwx+ 2 root  root    6 Dec 19 20:58 shapefile_datastore

So maybe in this PR we might want to replicate the same behavior as the real Cowbird instead?

Then I was curious about this new shapefile_datastore dir, so I looked up the Cowbird code and found this:

$ ack shapefile_datastore                     
docs/components.rst                                 
38:    /user_workspaces/<user_name>/shapefile_datastore  # Managed by the `GeoServer` handler                                                                                                                      

cowbird/handlers/impl/geoserver.py                  
72:DEFAULT_DATASTORE_DIR_NAME = "shapefile_datastore"                                                    
686:        return f"shapefile_datastore_{workspace_name}"   

Notice there is a {workspace_name} suffix after shapefile_datastore. I have an older Cowbird installed; the code I searched is from the tip of Cowbird's master branch. I hope the folder name did not change?

This leads me to think that maybe we should not try to replicate Cowbird's behavior here manually, but instead trigger Cowbird's new-user creation hook again so that any naming change is transparent for us. That assumes we can call the same Magpie new-user trigger from Jupyterhub.

@tlvu (Collaborator) commented Dec 20, 2024

Trying to trigger the hook manually:

$ curl -X POST "http://lvu8.ouranos.ca:7000/cowbird/webhooks/users?format=application%2Fjson" -H  "accept: application/json" -H  "Accept: application/json" -H  "Content-Type: application/json" -d "{  \"event\": \"created\",  \"user_name\": \"testcowbird02\"}"

Got back this error and I am unable to understand why it fails:

{"param": {"conditions": {"not_none": false, "is_type": false}, "value": null, "name": "callback_url", "compare": "Type[str]"}, "code": 422, "detail": "Invalid value specified.", "type": "application/json", "path": "/webhooks/users", "url": "http://lvu8.ouranos.ca:7000/cowbird/webhooks/users?format=application%2Fjson", "method": "POST"}

docker logs cowbird gives

[2024-12-20 02:48:22,409] INFO       [ThreadPoolExecutor-0_0][cowbird.utils] Request: [POST lvu8.ouranos.ca:7000 /cowbird/webhooks/users]
[2024-12-20 02:48:22,411] DEBUG      [ThreadPoolExecutor-0_0][cowbird.utils] Request details:
URL: http://lvu8.ouranos.ca:7000/cowbird/webhooks/users?format=application%2Fjson
Path: /cowbird/webhooks/users
Method: POST
Headers:
  Host: lvu8.ouranos.ca:7000
  User-Agent: curl/7.76.1
  Accept: application/json
  Content-Type: application/json
  Content-Length: 53
Parameters:
  format: application/json
Body:
  b'{  "event": "created",  "user_name": "testcowbird02"}'
[2024-12-20 02:48:22,413] DEBUG      [ThreadPoolExecutor-0_0][cowbird.api.webhooks.views] Received user webhook event [created] for user [testcowbird02].

docker logs cowbird-worker has nothing useful.

I turned on DEBUG logging and exposed the port this way:

$ git diff
diff --git a/birdhouse/components/cowbird/config/cowbird/cowbird.ini.template b/birdhouse/components/cowbird/config/cowbird/cowbird.ini.template
index 3aa33da2..b4355c15 100644
--- a/birdhouse/components/cowbird/config/cowbird/cowbird.ini.template
+++ b/birdhouse/components/cowbird/config/cowbird/cowbird.ini.template
@@ -75,7 +75,7 @@ keys = console
 keys = generic
 
 [logger_root]
-level = INFO
+level = DEBUG
 handlers = console
 formatter = generic
 
diff --git a/birdhouse/components/cowbird/default.env b/birdhouse/components/cowbird/default.env
index 0d160735..54a17a4d 100644
--- a/birdhouse/components/cowbird/default.env
+++ b/birdhouse/components/cowbird/default.env
@@ -45,7 +45,7 @@ export COWBIRD_MONGODB_PORT=27017
 #   DEBUG:  logs detailed information about operations/settings (not for production, could leak sensitive data)
 #   INFO:   reports useful information, not leaking details about settings
 #   WARN:   only potential problems/unexpected results reported
-export COWBIRD_LOG_LEVEL=INFO
+export COWBIRD_LOG_LEVEL=DEBUG
 
 # Subdirectory of DATA_PERSIST_SHARED_ROOT containing the user workspaces used by Cowbird
 export USER_WORKSPACES="user_workspaces"
diff --git a/birdhouse/components/cowbird/docker-compose-extra.yml b/birdhouse/components/cowbird/docker-compose-extra.yml
index 5ad76749..d7a59413 100644
--- a/birdhouse/components/cowbird/docker-compose-extra.yml
+++ b/birdhouse/components/cowbird/docker-compose-extra.yml
@@ -11,6 +11,8 @@ services:
   cowbird:
     image: pavics/cowbird:${COWBIRD_VERSION}-webservice
     container_name: cowbird
+    ports:
+      - 7000:7000
     environment:
       HOSTNAME: $HOSTNAME
       FORWARDED_ALLOW_IPS: "*"

I followed this documentation to craft my curl command:

(two screenshots of the Cowbird webhook API documentation)

How do we debug this kind of error? Is there a way to also turn on DEBUG logging for the cowbird-worker container?

@mishaschwartz (Collaborator, Author)

@tlvu

So maybe in this PR, we might want to replicate the same behavior as the real Cowbird instead?

It's a good idea but I think that this needs to be tackled in a different PR. The issue you're describing is much bigger and needs careful consideration to figure out how to implement properly. Consider all of these scenarios that have to be handled:

  • a user is created after cowbird is enabled
  • a new cowbird user_created action is implemented after a user is created
  • a new cowbird handler is implemented
  • a user_created action in cowbird is modified to create/delete/modify a different resource than before
  • etc.

The PR here is really just supposed to fix the immediate issue that some users can't spawn jupyterlab containers.

Got back this error and I am unable to understand why it fails:

It's telling you that you need to specify a callback URL

@tlvu (Collaborator) commented Dec 20, 2024

It's telling you that you need to specify a callback URL

OMG! Apparently I cannot read a JSON response. Now that you point it out, it looks clear, but I didn't catch it yesterday.

So I did this:

$ curl -X POST "http://lvu8.ouranos.ca:7000/cowbird/webhooks/users?format=application%2Fjson" -H  "accept: application/json" -H  "Accept: application/json" -H  "Content-Type: application/json" -d "{  \"event\": \"created\",  \"user_name\": \"lvu2\", \"callback_url\": \"\"}"

And it actually works, the folder structures are created on disk.

But the returned message is quite misleading, with a whole bunch of NotImplementedError entries in docker logs cowbird as well.

{"webhook": {"event": "created", "user_name": "lvu2", "callback_url": ""}, "exception": "WebhookDispatchException([NotImplementedError(), NotImplementedError(), NotImplementedError()])", "code": 500, "detail": "Failed to handle user webhook event.", "type": "application/json", "path": "/webhooks/users", "url": "http://lvu8.ouranos.ca:7000/cowbird/webhooks/users?format=application%2Fjson", "method": "POST"}

I am guessing this is the reason for your fix in #488.

So given that this works, how about we call this hook from JupyterHub if the folder structure is missing on disk?

@tlvu (Collaborator) commented Dec 20, 2024

So if the cowbird container is responding to hooks, what is the role of the cowbird-worker container then? Sorry, I did not have time to fully RTFM.

@fmigneault (Collaborator)

@tlvu
The worker runs the actual operations without blocking the API. Basically, any function decorated with @shared_task that is called somewhere in the code is handled by the worker "at some point".

For example, when a user_created event is received by the API, it iterates over all enabled "handlers" and calls the corresponding method. Each of those methods can then sprawl out into many operations that take more or less time, or that even depend on one another, depending on how the configuration is defined. Trying to do the operations directly in the API could lead to "soft lock" combinations.

In the case of the Geoserver handler, for example (https://github.com/Ouranosinc/cowbird/blob/4990fd505d5bc76cc73019daa27979adcfa2e35f/cowbird/handlers/impl/geoserver.py#L183-L188),
the chain(create_workspace.si(user_name), create_datastore.si(user_name)) only puts the workspace creation and datastore creation "in queue". Since they themselves need to request GeoServer's API, parse results, create directories, etc., this could cause the Cowbird API to become unresponsive if there are too many or slow operations (e.g. batch-creating many users), or it could cause some operations to fail if they have "race condition"-like dependencies (the directory creation, for example).

class Geoserver(Handler, FSMonitor):
    #[...]
    def user_created(self, user_name: str) -> None:
        self._create_datastore_dir(user_name)
        res = chain(create_workspace.si(user_name), create_datastore.si(user_name))
        res.delay()
        LOGGER.info("Start monitoring datastore of created user [%s]", user_name)
        Monitoring().register(self._shapefile_folder_dir(user_name), True, Geoserver)
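The API/worker split described above can be illustrated with a stdlib-only toy (this is not Cowbird code; the task names are borrowed from the snippet above): the "API" side only enqueues task names and returns immediately, while a worker thread drains the queue later, much like Celery's @shared_task and chain hand work to the cowbird-worker container.

```python
import queue
import threading

tasks: "queue.Queue[str]" = queue.Queue()
results = []

def worker() -> None:
    # Stand-in for the cowbird-worker process: runs queued operations later.
    while True:
        name = tasks.get()
        if name is None:  # shutdown sentinel
            break
        results.append(f"done:{name}")  # stand-in for the real slow operation
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()

# "user_created" on the API side: enqueue the chained steps, return immediately.
for step in ("create_workspace", "create_datastore"):
    tasks.put(step)

tasks.join()   # the demo waits here; the real API would not block
tasks.put(None)
t.join()
```

The point of the design is exactly the part the demo waits on: the real API never calls the equivalent of tasks.join(), so slow GeoServer calls cannot make it unresponsive.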

@mishaschwartz (Collaborator, Author)

@tlvu

So given that this works, how about we call this hook from JupyterHub if the folder structure is missing on disk?

I still think that this should be done in a different PR (#480 (comment))

But to add another reason...

If we call the hook, it will create the folder structure asynchronously using the cowbird-worker (see explanation: #480 (comment)). This means that there is a very real possibility that the folder will not have been created when jupyterhub tries to mount it as a volume to the jupyterlab container.

Creating a symlink directly is synchronous and solves the problem in the short term so that PAVICS can upgrade. The general problem you've identified should be handled in a more complete update to cowbird itself in a later PR.

@tlvu (Collaborator) commented Jan 10, 2025

If we call the hook, it will create the folder structure asynchronously using the cowbird-worker (see explanation: #480 (comment)). This means that there is a very real possibility that the folder will not have been created when jupyterhub tries to mount it as a volume to the jupyterlab container.

Creating a symlink directly is synchronous and solves the problem in the short term so that PAVICS can upgrade. The general problem you've identified should be handled in a more complete update to cowbird itself in a later PR.

Right, this is a very good point about the async problem so we should not call the hook from JupyterHub.

Fresh after the holidays, I took some time to ponder about this.

1 - I am not sure I understand the reasoning for the new structure:

$ ls -l /data/user_workspaces/testcowbird01
total 4
lrwxrwxrwx. 1 root  root   40 Dec 19 20:58 notebooks -> /data/jupyterhub_user_data/testcowbird01
drwxrwxrwx+ 2 root  root    6 Dec 19 20:58 shapefile_datastore

From the user's point of view, that's the new content they see in the writable-workspace of their Jupyter session.

That "new" writable-workspace is still writable all the way down, so the user can still add new notebooks at the root, outside of the notebooks symlink. This is pretty confusing, because some notebooks will end up outside the notebooks symlink and some inside it.

If the point of this new structure is to force all user-created content under the notebooks symlink and all cowbird-created content under shapefile_datastore, why not just create a new folder in Jupyter, at the same level as writable-workspace, that is read-only so only cowbird can write to it, and keep writable-workspace mapped to the old /data/jupyterhub_user_data/USERNAME?

2 - With this symlink work-around, some users will have the new structure and some will have the old structure (meaning writable-workspace points directly to /data/jupyterhub_user_data/USERNAME, missing the new shapefile_datastore folder and the notebooks symlink). Will cowbird function properly when the structure it expects is not there? Users will also be confused when comparing their workspace structures.

So at the minimum, I think we should manually re-create the actual structure that cowbird expects (the shapefile_datastore folder and the notebooks symlink).
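For administrators doing that migration by hand, the layout from the ls -l listing above could be replicated with a short script. This is a sketch only, using hypothetical root paths; ownership, permissions, and any newer Cowbird naming (such as the shapefile_datastore_{workspace_name} variant noted earlier) would still need to be verified against the deployed Cowbird version:

```python
from pathlib import Path

# Hypothetical roots for illustration; real deployments use /data/user_workspaces
# and /data/jupyterhub_user_data as shown in the listing above.
USER_WORKSPACES = Path("/tmp/demo_migrate/user_workspaces")
JUPYTERHUB_USER_DATA = Path("/tmp/demo_migrate/jupyterhub_user_data")

def replicate_cowbird_layout(user_name: str) -> None:
    """Recreate the per-user layout Cowbird builds for new users:
    <workspace>/notebooks -> jupyterhub data, plus a shapefile_datastore dir."""
    workspace = USER_WORKSPACES / user_name
    workspace.mkdir(parents=True, exist_ok=True)
    (workspace / "shapefile_datastore").mkdir(exist_ok=True)
    data_dir = JUPYTERHUB_USER_DATA / user_name
    data_dir.mkdir(parents=True, exist_ok=True)
    notebooks = workspace / "notebooks"
    if not notebooks.is_symlink():
        notebooks.symlink_to(data_dir)
```

The function is idempotent, so it could be run over every existing user without disturbing those who already have the new structure.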

But personally, I think cowbird taking over the existing writable-workspace is just weird. I think it should leave it untouched and create a new folder in the user's Jupyter session for the new files it will have total control over, leaving writable-workspace for the user's personal files. Taking over writable-workspace means the user can change/damage the structure expected by cowbird, and I am not sure that is in cowbird's best interest.

When I implemented the poor-man sharing, I created the new structure outside of writable-workspace exactly for the reasons above.

I also found out why, when I overrode DEFAULT_CONF_DIRS to disable cowbird while updating to v2.0.0+, I was not able to go back to pre v2.0.0.

Now that I have an exit strategy (I can go back to pre v2.0.0 if things go really south), I'll move forward with cowbird disabled in order to catch up faster with the tip of birdhouse. We can take our time to debate the new cowbird file structure and, when ready, implement a proper manual work-around that is consistent for both existing and new users and is compatible with existing Magpie users.

Below is what our users see when they log in to Ouranos' Jupyter instance:

(screenshot of the Jupyter file browser showing the folders described below)

public (read-only) and mypublic (writable) are the structure for the poor-man sharing.

tutorial-notebooks and pavics-homepage are our read-only folders for our various tutorial notebooks.

Cowbird can create another folder for its own files and leave writable-workspace for the user.

Users are not allowed to write directly at the root /, so they cannot add anything next to these folders and cannot damage them either.

@mishaschwartz (Collaborator, Author)

I'll let @fmigneault comment on #480 (comment) since he was involved in developing cowbird at the time.

My only suggestion is that we move the discussion about how cowbird can be re-configured to a different place (probably a new issue on the cowbird repo) and make a decision here about what we're doing with this PR.

@tlvu can I merge this PR as a halfway measure?

@tlvu (Collaborator) commented Jan 10, 2025

@tlvu can I merge this PR as a halfway measure?

@fmigneault if cowbird does not see the shapefile_datastore folder and the notebooks symlink, is that fine, with no error or crash? If so, then we can merge as-is.

@fmigneault (Collaborator)

The idea of the notebooks directory/link is two-fold.

  1. It gives a common root location to Cowbird about what to potentially sync between services and between Magpie user/groups permissions when involving the Jupyter service. Without such reference, anything (including notebooks) can be dumped at the top level. Therefore, a handler that needs to interact only with Jupyter Notebooks would have to deal with many random files, potentially including credential files, nested symlinks, and other configs.

  2. It has become very common for most data science repositories to include a top-level notebooks directory. This gives an intuitive reference for such developers.

Definitely, they could be combined or merged. So far, there have not been massive integrations or sharing involving notebooks, so we don't have strong limitations against applying changes.

Regarding the shapefile_datastore. This is a directory intended to allow quick addition of shapefiles to GeoServer, which would be served under the corresponding user workspace (https://.../geoserver/USERS:<user>/ows?service=...&LAYERS=file). However, with increasing use of STAC definitions in projects on our end, the GeoServer and older WxS APIs have not been used massively, so this feature has not been further developed.

I'm all for updating and modifying Cowbird to adapt to more concrete needs by platform users.

@github-actions github-actions bot added the ci/operations Continuous Integration components label Jan 16, 2025
@mishaschwartz mishaschwartz merged commit 7f5756d into master Jan 16, 2025
4 of 5 checks passed
@mishaschwartz mishaschwartz deleted the migrate-pre-cowbird-users branch January 16, 2025 18:36
Successfully merging this pull request may close these issues.

🐛 [BUG]: Cowbird is not backward compatible with existing Jupyter users