[Feature Request]: Execute code independently of the IDE #216

Open
RHRolun opened this issue Oct 19, 2023 · 6 comments
Labels
kind/enhancement New feature or request priority/normal An issue with the product; fix when possible

Comments

@RHRolun

RHRolun commented Oct 19, 2023

Feature description

The current workbenches execute user code inside the same environment that the IDE is running.
This can in some cases be undesirable as the dependencies needed to run the environment may collide with the desired dependencies for the code execution, or provide results that differ from running on a slimmer environment in production.
The goal of this feature would be to separate the IDE environment from the code execution environment, so that the dependencies do not get mixed and the code execution environment can easily be replaced by another one with different dependencies.

Describe alternatives you've considered

Separate the IDE environment from the code execution environment, either through virtual environments (set default kernel in the notebooks) or through remote execution on a different pod.

Anything else?

No response

@RHRolun RHRolun added kind/enhancement New feature or request priority/normal An issue with the product; fix when possible labels Oct 19, 2023
@shalberd

shalberd commented Oct 19, 2023

mmh, isn't that the idea behind pipelines and dedicated / separate runtime containers for every step of a pipeline?

@guimou I think you have experience in this, too, I noticed once how you talked about Jupyter env dependencies.

@RHRolun
Author

RHRolun commented Oct 19, 2023

@shalberd - yes, pipelines let you do this in a nice way, but having to go through a pipeline all the time while prototyping is quite a hassle.
This brings up another good point: if you wanted to develop a script for a specific step of the pipeline with specific dependencies, it would be great to quickly swap out your kernel/execution env and run it in the IDE without having to execute the pipeline.

@andrewballantyne
Member

cc @harshad16

@lucferbux

/transfer kubeflow

@openshift-ci openshift-ci bot transferred this issue from opendatahub-io/odh-dashboard Oct 23, 2023
@guimou
Member

guimou commented Oct 23, 2023

This has always been the issue with the way our workbench images are built. Several aspects to that:

  • UBI images are built with an already existing Python venv (/opt/app-root). Everything Python that happens will be in this venv. The rationale is to keep app/user packages from colliding with the ones built into the OS (there are some, for DNF and such). While the intent is good, it prevents creating and using other venvs.
  • Jupyter is a Python app. So it needs to run from somewhere... In local development mode, you will have as many Jupyter deployments as you have venvs. Meaning you switch to a specific venv (manually, with Anaconda, whatever...), and only then launch Jupyter. There you can manage some consistency and compatibility. This is not doable in our containerized environment as Jupyter IS the UI.
  • If you modify currently loaded packages, then what happens to Jupyter? It's a chicken-and-egg problem that I have never fully investigated.
  • Working in a single fixed venv has advantages though, the first one being immutability/consistency. If you let people create multiple ones, you're back at square one in terms of being able to share notebooks and data as different people will have different venvs, more or less properly maintained or in sync.
  • Now, most if not all of our compatibility issues don't come from Jupyter itself, but from our extensions (Elyra, KFP,...), which are either awfully lagging in terms of dependencies, or have very strict fixed dependencies that prevent installing anything else alongside. I'm really close to simply ditching Elyra (or even Codeflare, which had the same kinds of issues until recently) from my custom images, as it's a nightmare to make it work with some recent Python libraries...
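
To make the single-venv point above concrete, here is a minimal sketch that reports which interpreter a notebook cell is actually running on. `describe_environment` is a hypothetical helper, not something shipped in the workbench images. In a default workbench kernel one would expect `prefix` to be the /opt/app-root venv described above, but that is an assumption about the image and has not been verified here.

```python
import sys


def describe_environment():
    """Return basic facts about the interpreter running this code."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        # Inside a venv, sys.prefix points at the venv, while
        # sys.base_prefix points at the Python it was created from.
        "in_venv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running the same cell under two different kernels and comparing `executable` is a quick way to tell whether the code really left the IDE's environment.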

Some possible paths from there:

  • Switch to VSCode for specific jobs... As it's not a Python-based UI, you have fewer constraints in terms of compatibility, while still being able to work with notebooks. However, you lose Elyra...
  • Investigate how to manually create and persist other kernels in a persistent volume. People would be able to create and populate those kernels with what they want, and Jupyter would execute them as "external" stuff, meaning not using the same Python installation as the one it's currently running on.
  • Have some kind of custom selector at the beginning of a session, so something before/in front of Jupyter that would let you select a specific venv to run on. Somewhat similar to the Anaconda approach. However, it's not that different from having different workbenches.
  • Update Elyra/KFP/Codeflare (and surely other extensions) to make sure they keep up with the rest of the world and don't introduce incompatibilities.
  • Don't include Elyra/KFP/Codeflare in all images, and have specific workbenches when you want to use those features. Definitely not ideal...
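
The "persist other kernels in a persistent volume" path could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested workbench setup: `write_kernelspec` is a hypothetical helper, and the paths in the usage note are placeholders. What it relies on is the standard Jupyter kernelspec layout: a `kernel.json` file inside `<data dir>/kernels/<name>/`, whose `argv` starts the kernel with whatever interpreter it names.

```python
import json
from pathlib import Path


def write_kernelspec(kernels_dir, name, python_executable, display_name=None):
    """Write a minimal kernel.json that launches ipykernel with the given Python.

    kernels_dir       -- a "kernels" directory inside a Jupyter data dir
    name              -- directory name of the kernelspec
    python_executable -- interpreter of the separate venv (hypothetical path)
    """
    spec_dir = Path(kernels_dir) / name
    spec_dir.mkdir(parents=True, exist_ok=True)
    spec = {
        # Jupyter replaces {connection_file} when it starts the kernel.
        "argv": [
            str(python_executable),
            "-m",
            "ipykernel_launcher",
            "-f",
            "{connection_file}",
        ],
        "display_name": display_name or name,
        "language": "python",
    }
    path = spec_dir / "kernel.json"
    path.write_text(json.dumps(spec, indent=2))
    return path
```

Usage would be something like `write_kernelspec("/path/on/pv/jupyter/kernels", "my-env", "/path/on/pv/venvs/my-env/bin/python")`, with the venv created beforehand via `python -m venv`, `ipykernel` installed into it, and the data dir on the volume added to `JUPYTER_PATH` so Jupyter can find it. Whether a second venv can be created cleanly inside the UBI-based images is exactly the open question raised above.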

@shalberd

shalberd commented Oct 23, 2023

Working in a single fixed venv has advantages though, the first one being immutability/consistency.

That is THE reason we in our corporation would always aim to have one container image mean one specific env, with clear dependencies. When developers want flexibility, we'd just build them another image, for which, by the way, @guimou made a great modular folder structure and toolset (interactive-image-builder.sh) that makes the whole thing a breeze: with an IDE, or without an IDE just for runtimes, e.g. in Airflow or Kubeflow pipelines, and so on.

I have worked with Anaconda and other toolchains as well, so I know both perspectives. Our data scientists, used to working locally on their laptops, initially gave us exactly the point of view mentioned here, but there are clear advantages to doing it the immutable / always-consistent per-container way.

I'm really close to simply ditch Elyra

Elyra is having issues with its community, I believe, plus trying to be too many things all at once, for all kinds of deployments, container PaaSs and so on.

5 participants