Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] - Add Argo Workflows integration to Qhub #1230

Closed
Adam-D-Lewis opened this issue Apr 6, 2022 · 9 comments
Closed

[ENH] - Add Argo Workflows integration to Qhub #1230

Adam-D-Lewis opened this issue Apr 6, 2022 · 9 comments
Assignees
Labels
type: enhancement 💅🏼 New feature or request

Comments

@Adam-D-Lewis
Copy link
Member

Adam-D-Lewis commented Apr 6, 2022

This issue will document the progress/choices made in integrating Argo Workflows with QHub.

@Adam-D-Lewis
Copy link
Member Author

Adam-D-Lewis commented Apr 6, 2022

I've read these resources:

Some of my takeaways are:

  • I think we should do a namespace or managed namespace install (Argo can only execute workflows in 1 namespace rather than on all namespaces) of Argo Workflows since I've seen deployments of Qhub which occur alongside unrelated k8s client resources. So doing a namespace install would help maintain some isolation there. If we do so, we'd need to not allow the users to create workflows in the same namespace as the Argo Deployment since this introduces the ability for the user to create workflows which modify the argo workflow controller itself (according to Argo Workflows docs). So maybe we deploy Qhub in e.g. dev and all the workflows run in dev-workflows namespace.
  • The docs also say that using emissary executor + containerset is more secure than using a Docker executor since it can't run as root, and faster since all the containers in a workflow run on the same pod, and don't need to communicate through the k8s api. This could be good to try out, but not as important. The default executor and pod security context can be set with default workflow spec (https://argoproj.github.io/argo-workflows/default-workflow-specs/).
  • TLS is disabled by default on Argo Workflows Helm chart. I imagine we should enable this at some point, but I think it'd be better to get it running without it first.
  • Argo Server supports SSO. Can and should we use Keycloak to enable users to log into the web UI? The helm options for configuration are below:
  # -- SSO configuration when SSO is specified as a server auth mode.    
  sso: {}    
    ## All the values are required. SSO is activated by adding --auth-mode=sso    
    ## to the server command line.    
    #    
    ## The root URL of the OIDC identity provider.    
    # issuer: https://accounts.google.com    
    ## Name of a secret and a key in it to retrieve the app OIDC client ID from.    
    # clientId:    
    #   name: argo-server-sso    
    #   key: client-id    
    ## Name of a secret and a key in it to retrieve the app OIDC client secret from.    
    # clientSecret:    
    #   name: argo-server-sso    
    #   key: client-secret    
    ## The OIDC redirect URL. Should be in the form <argo-root-url>/oauth2/callback.    
    # redirectUrl: https://argo/oauth2/callback    
    # rbac:    
    #   enabled: true    
    ## When present, restricts secrets the server can read to a given list.    
    ## You can use it to restrict the server to only be able to access the    
    ## service account token secrets that are associated with service accounts    
    ## used for authorization.    
    #   secretWhitelist: []    
    ## Scopes requested from the SSO ID provider.  The 'groups' scope requests    
    ## group membership information, which is usually used for authorization    
    ## decisions.    
    # scopes:    
    # - groups    
  • Eventually, we'll want to enable Prometheus to collect data from Argo Server. There are some helm chart options around this as well.

There are further additions that we may want to add eventually like artifact repository, adding a database to offload large workflows (https://argoproj.github.io/argo-workflows/offloading-large-workflows/), but we can get to it later.

@iameskild iameskild self-assigned this Apr 6, 2022
@costrouc
Copy link
Member

costrouc commented Apr 7, 2022

@Adam-D-Lewis great points.

I think we should do a namespace or managed namespace install

Totally agree. In fact I would have liked to do this with other qhub resources as well. For example the jupyterlab and dask workers being launched for users. There isn't much reason they need to be in the same namespace as other pods. I think argo would be a good test of this.

The docs also say that using emissary executor + containerset is more secure than using a Doc...

I don't know enough about this. But I'd guess that after we have a stable argo deployment we could investigate this. We should wait to implement this if it adds any complexity for our deployment.

TLS is disabled by default on Argo Workflows Helm chart. I imagine we should enable this at some point, but I think it'd be better to get it running without it first.

I'd like all our components to communicate over tls.. but I don't know exactly how to do this. I agree this would be nice to have but not something we can implement in the short term.

Argo Server supports SSO. Can and should we use Keycloak to enable users to log into the web UI? The helm options for configuration are below:

Yes we should absolutely do this in our deployment and configure the roles/groups appropriately. I think this is essential to the complete deployment.

Eventually, we'll want to enable Prometheus to collect data from Argo Server. There are some helm chart options around this as well.

Yeup we should enable this and I think this is part of the complete deployment.

@Adam-D-Lewis
Copy link
Member Author

Adam-D-Lewis commented Apr 12, 2022

I think there are many different points at which we could call this "done". I think the minimum done does the following:

Minimum addition of Argo Workflows

I think we should add Argo in the following order. Minimum done is Step 1

Step 1

  • Add default helm chart deployment to Nebari
    • expose Argo UI at /argo endpoint
  • use Keycloak for Auth on Argo Server (under review)
  • deploy argo in same namespace as rest of Qhub
  • give UI permission to start workflows also (don't address risk of user creating a workflow that modifies argo at this time (cluster install has the same issue))
  • Add docs on how to enable/use Argo Workflows in Nebari
  • Add prometheus integration + simple dashboard

Step 2

  • add default security context
    • use emissary executor by default
  • set up argo roles
  • fine tune permissions
  • deploy argo in it's own namespace
    • set up cross namespace communication so Prometheus can scrape Argo metrics still

Step 3

  • turn on TLS for Argo
  • Add Artifact repo
  • Add database

@iameskild
Copy link
Member

Thanks a lot for putting this together @Adam-D-Lewis! This gives us a clear target to aim for.

I do have a question around how we would go about "Can deploy Rich's sample workflows in the UI", does this not depend on having some user-facing solution in place? Jupyterflow, Yason etc?

@Adam-D-Lewis
Copy link
Member Author

@iameskild

I do have a question around how we would go about "Can deploy Rich's sample workflows in the UI", does this not depend on having some user-facing solution in place? Jupyterflow, Yason etc?

Good point, maybe we need to just use dummy workflows (long running and scheduled) for now. We'd run his workflows later when we get to adding a front end.

@Adam-D-Lewis
Copy link
Member Author

Adam-D-Lewis commented Apr 14, 2022

@iameskild and I are getting reprioritized on another issue, but we hope to get this to a point where it's added into Qhub. However, because there is a blocker on the Prometheus integration above, I suggest we finish integrating the argo server deployment with keycloak's permissions, add some rudimentary docs and try to merge leaving Prometheus and step 2 and 3 above for future work.

@trallard
Copy link
Member

AFAIK this is now enable (at least the "backend" or the orchestrator) does this issue need closing?

@iameskild
Copy link
Member

I'm fine with closing this. @viniciusdc and I did some preliminary research for the "front-end" interface. We can open another issue to track that discussion 👍

@iameskild
Copy link
Member

Closing as completed. Opening #1495 to track front-end proposals.

Repository owner moved this from Needs Triage 🔍 to Done 💪🏾 in QHub Project Mangement 🚀 Oct 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement 💅🏼 New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants