S3 support for auto-sklearn to store and load models and configurations for each run #986

Open
pkvprakash opened this issue Oct 27, 2020 · 2 comments

@pkvprakash

As far as I can see, auto-sklearn currently has no support for reading from or writing to S3. Adding S3 support would open the door to running auto-sklearn pipelines in distributed mode in the cloud, or on any on-prem cluster.

From a quick code walkthrough, I can see that auto-sklearn interacts with the filesystem directly in many places, using the shutil, os, and lockfile modules.

This means we need to tackle this issue in two steps.

  1. Create an abstraction layer for filesystem access and refactor the code to use this layer for all filesystem-related activities.
  2. Add support for S3 by providing a concrete implementation of those abstractions for S3.
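
To make the two steps concrete, here is a rough sketch of what such a layer might look like. Everything in it is an assumption for illustration only: `StorageBackend`, `LocalBackend`, `S3Backend`, and the key layout are hypothetical names and are not part of auto-sklearn. The S3 variant assumes `boto3` is installed and credentials are configured, and it makes no attempt to replace the locking that the lockfile module provides on a local filesystem.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical storage interface (step 1); not an auto-sklearn API."""

    @abstractmethod
    def write_bytes(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def read_bytes(self, key: str) -> bytes: ...

    @abstractmethod
    def exists(self, key: str) -> bool: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class LocalBackend(StorageBackend):
    """Filesystem implementation; keys are '/'-separated relative paths."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        return os.path.join(self.root, *key.split("/"))

    def write_bytes(self, key: str, data: bytes) -> None:
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        # Write to a temporary file in the same directory, then rename it
        # over the target so readers never observe a partially written file.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        os.replace(tmp, path)

    def read_bytes(self, key: str) -> bytes:
        with open(self._path(key), "rb") as fh:
            return fh.read()

    def exists(self, key: str) -> bool:
        return os.path.exists(self._path(key))

    def delete(self, key: str) -> None:
        os.remove(self._path(key))


class S3Backend(StorageBackend):
    """S3 implementation (step 2); assumes boto3 and valid credentials."""

    def __init__(self, bucket: str, prefix: str = "") -> None:
        import boto3  # imported lazily so local-only use needs no boto3

        self.client = boto3.client("s3")
        self.bucket = bucket
        self.prefix = prefix.strip("/")

    def _key(self, key: str) -> str:
        return f"{self.prefix}/{key}" if self.prefix else key

    def write_bytes(self, key: str, data: bytes) -> None:
        self.client.put_object(Bucket=self.bucket, Key=self._key(key), Body=data)

    def read_bytes(self, key: str) -> bytes:
        response = self.client.get_object(Bucket=self.bucket, Key=self._key(key))
        return response["Body"].read()

    def exists(self, key: str) -> bool:
        from botocore.exceptions import ClientError

        try:
            self.client.head_object(Bucket=self.bucket, Key=self._key(key))
            return True
        except ClientError as err:
            # Treat "not found" as absent; re-raise anything else
            # (for example, permission errors).
            if err.response["Error"]["Code"] in ("404", "NoSuchKey", "NotFound"):
                return False
            raise

    def delete(self, key: str) -> None:
        self.client.delete_object(Bucket=self.bucket, Key=self._key(key))
```

With something like this, code that stores a model or a run configuration would call `backend.write_bytes("runs/1/model.pkl", payload)` and would not need to know whether `backend` is a `LocalBackend` or an `S3Backend`.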

What are your thoughts?

@mfeurer
Contributor

mfeurer commented Nov 10, 2020

> Currently I see no support for auto-sklearn to read and write from s3

That's correct. Auto-sklearn is 100% filesystem based.

> Providing support for s3 opens up a door in running auto-sklearn pipelines in distributed mode in cloud or in any other on-prem cluster

Auto-sklearn can run fully distributed in an on-prem setting if all nodes have a shared file system (which is the case in most academic settings). I assume this is different for cloud services? If yes, is this different for all cloud services?

> Create an abstraction layer for filesystem access and refactor the code to use this layer for all filesystem related activities.

Yes and no. There is an abstraction layer but it's not complete yet.

Assuming the only goal is to let Auto-sklearn run where there is no shared file system, would there be any other advantages?

@github-actions
Contributor

github-actions bot commented May 5, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs for the next 7 days. Thank you for your contributions.
