Support for another public cloud - Microsoft Azure #40

Closed
leifericf opened this issue Dec 10, 2019 · 21 comments · Fixed by #1091

@leifericf

Currently, Metaflow is set up to work with AWS as the default public cloud. The architecture of Metaflow allows for additional public clouds to be supported.

Adding support for Microsoft Azure might broaden the potential user base, which could increase the adoption rate. This, in turn, could lead to increased community attention.

leifericf changed the title from "Add support for another public cloud - Microsoft Azure" to "Support for another public cloud - Microsoft Azure" on Dec 10, 2019
@gplusplus314

I can dedicate a few hours here and there for Azure support, but I don't have time to take the reins on this one. If someone goes through the trouble of designing and proposing a solution and could use some extra hands for the implementation, loop me in.

savingoyal added the "enhancement" (New feature or request) label on Dec 10, 2019
@leifericf
Author

leifericf commented Dec 12, 2019

@gerryhernandez: I have asked my contacts at Microsoft (Norwegian HQ) whether they would be willing to pitch in with funding and/or time from their engineers.

@jwang01

jwang01 commented Feb 24, 2020

Yes, any update from Microsoft? If they don't have a plan to do so, can we fork a branch and add Azure enhancements on our own?

@webmaxru

Hello! I'm in discussion with my colleagues from Microsoft Norway about this project. @jwang01, do you want to help with the implementation?

@ylulloa

ylulloa commented Feb 27, 2020

What's the main challenge you can see now? Converting the AWS CloudFormation templates to ARM templates?

@webmaxru

That might be a good start :)

@nabsul

nabsul commented Feb 28, 2020

I wonder if Kubernetes/Helm would be a better option than ARM? The result would then potentially be cloud-agnostic.

@vermaakarsh

Any chance of this getting traction?

@onacrame

Any chance of this getting traction?

Ditto. Any luck?

@pikulmar

pikulmar commented Oct 4, 2021

Also curious about this feature. Any updates, @webmaxru or @gerryhernandez?

@savingoyal
Collaborator

@pikulmar With the new datastore implementation (#580), it should now be rather straightforward to integrate with Azure Blob Store. With the Kubernetes support for compute and orchestration (#644), one can reliably run workloads on AKS. Let us know if you would like to help test out #644!

@pikulmar

@savingoyal Yes, definitely! I will give it a try and let you know how things go.

@pikulmar

pikulmar commented Oct 12, 2021


@savingoyal Ok, I had a first go at this:

  1. Submitting Kubernetes jobs to AKS using the branch from #644 (Kubernetes support for Metaflow) works. As expected, jobs fail to complete because they cannot access the code package in S3. So, the name of the game is to use #580 (New Datastore abstraction for Metaflow) to add Azure Blob Storage support.

  2. In order to bring in #580, I merged current master into plugin-linter (the target branch of #644), yielding https://github.com/fortum-tech/metaflow/tree/plugin-linter-update. I subsequently merged #644 into that branch, yielding https://github.com/fortum-tech/metaflow/tree/plugin-linter-update-k8s. Finally, I implemented DataStoreStorage using cloudpathlib; the result can be found in https://github.com/fortum-tech/metaflow/tree/plugin-linter-update-k8s-cloudpathlib, and a rough sketch of the approach follows after this list. This yielded a first successful step execution on AKS. However, there is still work left to do. I tried to summarize the open issues in https://github.com/fortum-tech/metaflow/blob/plugin-linter-update-k8s-cloudpathlib/metaflow/datastore/cloudpathlib_storage.py#L13.
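
For context, here is a minimal sketch of what a cloudpathlib-backed storage backend can look like. The class and method names (CloudPathStorage, is_file, save_bytes, load_bytes) are illustrative only and merely approximate the DataStoreStorage interface; the actual implementation lives in the branch linked above.

```python
# Minimal sketch only: a cloud-agnostic storage backend built on cloudpathlib.
# Method names approximate, but do not exactly match, Metaflow's
# DataStoreStorage interface. Credentials are taken from the environment.
from cloudpathlib import CloudPath


class CloudPathStorage:
    def __init__(self, root):
        # `root` can be any URI cloudpathlib understands, e.g. "az://container/prefix"
        self.root = CloudPath(root)

    def is_file(self, keys):
        # Return one boolean per key indicating whether the blob exists.
        return [(self.root / key).is_file() for key in keys]

    def save_bytes(self, keys_and_blobs, overwrite=False):
        # `keys_and_blobs` is an iterable of (key, bytes) pairs.
        for key, blob in keys_and_blobs:
            path = self.root / key
            if path.exists() and not overwrite:
                continue
            path.write_bytes(blob)

    def load_bytes(self, keys):
        # Yield (key, bytes-or-None) pairs.
        for key in keys:
            path = self.root / key
            yield key, (path.read_bytes() if path.exists() else None)
```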

I could use some input as to how to proceed. I see two paths:

In either case, if you think the approach is promising, we could consider opening a (separate) PR for the cloudpathlib data store feature.

Let me know what you think.

UPDATE: Using the latest https://github.com/fortum-tech/metaflow/tree/plugin-linter-update-k8s-cloudpathlib, the @conda decorator also works as expected. As already noted, there is some redundancy between metaflow.datatools.S3 and DataStoreStorage; getting @conda to work on AKS only required using DataStoreStorage. Perhaps metaflow.datatools.S3 could be removed completely at some point? Alternatively, one could replace metaflow.datatools.S3 with metaflow.datatools.DataStoreTools, which would implement (some of) the existing metaflow.datatools.S3 API in a cloud-agnostic manner, based on DataStoreStorage (a rough sketch of that idea follows below).
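
To make the last suggestion concrete, here is a hypothetical sketch of such a wrapper. The name DataStoreTools and its get/put methods are assumptions for illustration only and cover a tiny slice of what metaflow.datatools.S3 actually provides.

```python
# Hypothetical sketch only: a cloud-agnostic stand-in for a subset of the
# metaflow.datatools.S3 API, delegating all I/O to a DataStoreStorage-like
# backend (for example, the CloudPathStorage sketch earlier in this thread).
class DataStoreTools:
    def __init__(self, storage):
        # `storage` is any backend exposing save_bytes()/load_bytes().
        self._storage = storage

    def put(self, key, blob):
        # Store a single blob under `key`, overwriting any existing object.
        self._storage.save_bytes([(key, blob)], overwrite=True)

    def get(self, key):
        # Return the blob stored under `key`, or None if it does not exist.
        for _, blob in self._storage.load_bytes([key]):
            return blob
        return None
```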

@savingoyal
Collaborator

@pikulmar

  • Yes, we can get rid of metaflow.datatools.S3 from the @conda implementation entirely; if you would like to submit a PR, please let me know!
  • PRs #580 (New Datastore abstraction for Metaflow) and #644 (Kubernetes support for Metaflow) have been merged into the Metaflow codebase. I would be happy to take a look at your Azure datastore PR. I am not very familiar with cloudpathlib; is there a specific reason to opt for it in lieu of Azure's Python SDK?

@renato-umeton

Any update on supporting MSFT Azure?

@webmaxru

I posted an invitation on my social media. Please amplify it to reach potential contributors:

https://www.linkedin.com/posts/webmax_support-for-another-public-cloud-microsoft-activity-6869342352914309120-V4U8

https://twitter.com/webmaxru/status/1463576009781956616?s=20

@pikulmar

@savingoyal It appears that we might not require Azure Blob Storage support in Metaflow after all (we might decide to share details on this later), which is why I am not sure how much time we would be able to dedicate to a corresponding PR at this time.

Regarding your question, there is a trade-off between versatility and performance:

  • Using cloudpathlib makes implementing the Datastore API particularly straightforward and has the advantage of supporting every cloud storage service cloudpathlib supports, including those added in the future (see the short illustration after this list).

  • Using cloud-specific libraries like boto3, on the other hand, can provide better performance through parallel data transfers and the like (I am not aware of cloudpathlib supporting that yet). But, of course, it requires additional work for each cloud storage service to be supported.
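
As a short illustration of the versatility point above, the same code path can handle several providers; the URIs below are placeholders, and each scheme requires the corresponding credentials to be configured.

```python
# cloudpathlib dispatches CloudPath to S3Path, AzureBlobPath, or GSPath
# based on the URI scheme, so the same logic works across providers.
from cloudpathlib import CloudPath

for uri in ("s3://bucket/artifact", "az://container/artifact", "gs://bucket/artifact"):
    path = CloudPath(uri)
    if path.exists():
        print(uri, len(path.read_bytes()), "bytes")
```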

Generally, development could be split into multiple PRs:

  1. Refactor metaflow.datatools.S3 (and perhaps other parts of the code) to exclusively rely on the new Datastore abstraction.

  2. a. Implement a cloudpathlib-based Datastore similar to what was linked/discussed above.

    b. Add an Azure-specific, performance-optimized Datastore implementation if this is of interest.

Question: Can Datastore implementations also be managed as plug-ins (via the metaflow_extensions mechanism) and, if so, would such an approach be preferred for contributions 2a and/or 2b?

@romain-intel
Contributor

A few notes:

  • Currently, datastores cannot be added via the metaflow_extensions mechanism, but that should be possible very shortly (it's trivial to do and just requires a tiny code reorg; it's planned, but I haven't done it yet given all the other changes that were in flight).
  • I would rather keep metaflow.datatools.S3 as is; the S3 storage implementation for the datastore currently relies on it, so it would be a bit circular if we made it depend on the datastore. Everything else should go through the datastore. It's a good question, though, whether we keep it as part of core Metaflow or move it to a more S3-specific portion of the codebase.
  • Adding an implementation with cloudpathlib should hopefully be very easy; you just have to implement the storage part, which has only a few methods. The same applies to an Azure-specific one. Everything in the datastore boils down to those functions. There is a requirement to store "metadata" about each file, but that can be stored as a separate blob, as in the local implementation (a rough sketch of that idea follows below).
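
The following is a rough sketch of the metadata-as-a-separate-blob idea, assuming a cloudpathlib-style backend; the <key>.meta naming and the helper functions are purely illustrative, not what Metaflow actually uses.

```python
# Illustrative only: persist per-file metadata as its own blob next to the data,
# in the spirit of the comment above about the local implementation.
import json

from cloudpathlib import CloudPath


def save_with_metadata(root, key, data, metadata):
    # Write the data blob, then a sibling blob holding its metadata as JSON.
    (CloudPath(root) / key).write_bytes(data)
    (CloudPath(root) / (key + ".meta")).write_bytes(
        json.dumps(metadata).encode("utf-8")
    )


def load_with_metadata(root, key):
    # Read the data blob and, if present, its JSON metadata sibling.
    blob = (CloudPath(root) / key).read_bytes()
    meta_path = CloudPath(root) / (key + ".meta")
    metadata = json.loads(meta_path.read_bytes()) if meta_path.exists() else None
    return blob, metadata
```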

@savingoyal
Collaborator

This work is currently in flight. We expect a feature-complete PR to be available over the next couple of weeks. It will cover Azure Blob Storage as the Azure datastore and run on top of Kubernetes (AKS or BYOC).

@renato-umeton

renato-umeton commented Jul 13, 2022 via email

savingoyal linked a pull request on Jul 20, 2022 that will close this issue
@savingoyal
Collaborator

savingoyal commented Jul 20, 2022

@Dana-Farber The PR is now available for testing.
