Headaches caused by multiple environments #27

Closed
koaning opened this issue Jul 6, 2024 · 6 comments
Labels
question Further information is requested

Comments

@koaning
Contributor

koaning commented Jul 6, 2024

I have thus far assumed that mandr will only need to worry about a single Python environment, but maybe that is not realistic in the long run. Maybe a single user will keep their Python version up to date, which means we are dealing with updated packages and Python versions. And once multiple teams are involved, we are for sure going to have to deal with this.

Since we are dealing with pickles and multiple environments a bunch, I am wondering if there is something we can or should do to minimise headaches.

@adrinjalali
Contributor

In skops I ended up having to store the versions of the dependencies used to produce the model objects; here, I think that translates to storing them alongside each mandr.
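
A minimal sketch of what storing dependency versions alongside an artifact could look like, using only the standard library. The function names `capture_environment` and `find_mismatches`, the JSON sidecar layout, and the package names are all assumptions made for illustration; none of this is the actual skops or mandr implementation.

```python
import json
import platform
from importlib import metadata


def capture_environment(packages):
    """Return the Python version and installed versions of the given packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {"python": platform.python_version(), "packages": versions}


def find_mismatches(recorded, current):
    """List (name, recorded_version, current_version) for entries that differ."""
    mismatches = []
    if recorded["python"] != current["python"]:
        mismatches.append(("python", recorded["python"], current["python"]))
    for name, version in recorded["packages"].items():
        now = current["packages"].get(name)
        if now != version:
            mismatches.append((name, version, now))
    return mismatches


# Hypothetical usage: the snapshot is serialized next to the stored object.
snapshot = capture_environment(["pip", "a-package-that-does-not-exist"])
sidecar = json.dumps(snapshot, sort_keys=True)
```

When the object is read back, the recorded snapshot can be compared against the current environment, and a non-empty mismatch list can be surfaced as a warning rather than a hard failure.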

And I guess this is only an issue for the "visualizing things from mandr objects" part rather than the "storing stuff" part, and only if we're dealing with Python artifacts rather than stored information or logs. We might then need the webapp or backend to create and launch a new environment for each mandr.

I wouldn't mind letting this be more of an advanced feature for now and assuming same / compatible versions to start with.

@koaning
Contributor Author

koaning commented Jul 8, 2024

Right, so kind of like assuming that all views are static at some point and won't change anymore?

@tuscland
Member

tuscland commented Jul 8, 2024

Am I right to understand this has to do with environment reproducibility?
In the future, I see a nice role played by mandr to capture the environment, ensuring it stays consistent over time.

@adrinjalali
Contributor

> Right, so kind of like assuming that all views are static at some point and won't change anymore?

As long as we don't do realtime interactive dashboard-y stuff, this assumption can be true.

> In the future, I see a nice role played by mandr to capture the environment, ensuring it stays consistent over time.

To some extent, but pixi already does a very good job of that, so we shouldn't be recreating any tools for it.

@tuscland
Member

Reading again from the start, to me there are two interesting issues:

How to deal with environment reproducibility? Adrin suggests it is out of our scope.

How to introduce the notion of environment? To me, this extends into MLOps. For the time being, on the storage side, maybe we can just let users deal with environments at the path level? Something like probabl-ai/dev/my-experiment or probabl-ai/prod/my-experiment?
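
To make the path-level idea concrete, here is a small sketch that builds such paths. The helper `project_path` and the fixed set of allowed environment names are hypothetical; nothing in the thread says mandr would validate or construct paths this way.

```python
from pathlib import PurePosixPath

# Assumption for illustration: only these two environment names are accepted.
ALLOWED_ENVIRONMENTS = {"dev", "prod"}


def project_path(organization, environment, experiment):
    """Build a storage path of the form <organization>/<environment>/<experiment>."""
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    return PurePosixPath(organization) / environment / experiment


dev_path = project_path("probabl-ai", "dev", "my-experiment")
prod_path = project_path("probabl-ai", "prod", "my-experiment")
```

With this layout the environment is just a naming convention chosen by the user, so the storage layer itself needs no notion of environments.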

> Right, so kind of like assuming that all views are static at some point and won't change anymore?

Views can be dynamic as long as the underlying data does not require executing a user program (as in loading a pickled model). It would be sad if the user could not change the layout of their dashboard just because we decided the views are computed once and for all ;)

This is why we should be careful to separate things that are data, which can be visualized (any serializable data structure containing scalar values), from things that are code (including pickles), which need to be computed in a reproduced environment.
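
One way to picture the data/code split is a recursive check that accepts only scalars and containers of scalars. This is a sketch under my own assumptions: `is_plain_data` is a hypothetical helper, and the exact set of accepted types is a guess, not something mandr defines.

```python
def is_plain_data(value):
    """Return True if value consists only of scalars, lists/tuples, and str-keyed dicts."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return True
    if isinstance(value, (list, tuple)):
        return all(is_plain_data(item) for item in value)
    if isinstance(value, dict):
        return all(
            isinstance(key, str) and is_plain_data(item)
            for key, item in value.items()
        )
    # Anything else (functions, class instances, fitted models, ...) is treated
    # as code: rendering it would require running user-provided logic.
    return False


metrics = {"accuracy": 0.93, "folds": [1, 2, 3], "notes": None}
with_callable = {"scorer": len}
```

Values that pass the check can be rendered in any environment, while everything else would have to go through whichever environment produced it.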

@tuscland tuscland added the question Further information is requested label Jul 15, 2024
@augustebaum augustebaum changed the title Headackes caused by multiple environments Headaches caused by multiple environments Aug 23, 2024
@tuscland
Member

Revisiting this issue in light of recent developments.

Since #303, we no longer persist pickles, or at least, it has become an implementation detail. Items stored in a project can only be of types that are independent of external libraries, with the exception of scikit-learn models that are serialized using skops.io.

For environment reproducibility, as Adrin said, an integration with pixi or an environment management tool would be interesting.
