Change fullpath to a URL #16

Open
aidanheerdegen opened this issue Jan 27, 2023 · 2 comments

Comments

@aidanheerdegen
Member

fullpath could be changed to accept a URL, so that prepending file:// to an existing path would be equivalent to the current behaviour. To stay backwards compatible, a value with no scheme would be assumed to be a file:// URL.
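A minimal sketch of that backwards-compatible reading, using only the standard library. The function name `normalise_fullpath` is hypothetical and not part of any existing code; it only illustrates the "no scheme means file://" assumption.

```python
from pathlib import Path
from urllib.parse import urlsplit


def normalise_fullpath(fullpath: str) -> str:
    """Return fullpath as a URL, treating a bare path as a file:// URL.

    Hypothetical helper for illustration only.
    Caveat: a Windows path such as C:\\data would be parsed as scheme "c",
    so this sketch assumes POSIX-style paths.
    """
    if urlsplit(fullpath).scheme:
        # Already a URL (file://, https://, s3://, ...): leave it untouched.
        return fullpath
    # Bare path: current behaviour, expressed as a file:// URL.
    return Path(fullpath).absolute().as_uri()
```

With this, existing configs that hold plain paths would keep working, while new configs could name any scheme that the chosen access library understands.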

This has the advantage of supporting any storage location or protocol that can be expressed as a URL, as long as Python is able to access it.

There was interest in having some system for being able to locate important input files from experiment git repositories. Suggestions included git-lfs and git-annex. This is potentially another alternative to those.

Some examples of possible URL storage endpoints are S3 and the InterPlanetary File System (IPFS).

Possible modification: add a URL field that is used if fullpath is absent or the file it points to is missing. In the first case the file would be downloaded and stored in a default "scratch" space; in the second case it would be downloaded and saved to the fullpath location.
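The fallback described above could be sketched roughly as follows. Everything here is an assumption for illustration: the function name `resolve_input`, its parameters, and the choice of `urllib.request.urlopen` for the download (which handles file://, http:// and https:// URLs, but not schemes such as s3://).

```python
import shutil
from pathlib import Path
from typing import Optional
from urllib.parse import urlsplit
from urllib.request import urlopen


def resolve_input(fullpath: Optional[str], url: Optional[str], scratch: Path) -> Path:
    """Return a local path for an input file, downloading it if needed.

    Hypothetical helper: fullpath, url and scratch are illustrative names.
    """
    if fullpath is not None:
        target = Path(fullpath)
        if target.exists():
            # Current behaviour: the file is already where fullpath says.
            return target
    else:
        if url is None:
            raise ValueError("need at least one of fullpath or url")
        # fullpath absent: store under scratch, named after the URL's last segment.
        name = Path(urlsplit(url).path).name
        if not name:
            raise ValueError(f"cannot derive a file name from {url!r}")
        target = scratch / name

    if url is None:
        raise FileNotFoundError(f"{target} is missing and no url was given")

    # File missing (or fullpath absent): download from url to the target location.
    target.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

A real implementation would probably also want to verify the downloaded file against a stored hash, if one is available, before trusting it.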

@dougiesquire
Contributor

fsspec immediately comes to mind. Is that the sort of thing you were thinking of? Also smart_open

@aidanheerdegen
Member Author

Maybe. fsspec has more functionality than is strictly required; all that is needed is to be able to access the URL.

For example, IPFS is accessible through an HTTP gateway, so any library that can handle HTTP, e.g. urllib, could be used. There are also dedicated Python clients, but their status seems uncertain.
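As a rough illustration of the gateway approach, an ipfs:// URL can be rewritten to a path-style gateway URL and then fetched with urllib. The function names and the default gateway below are assumptions chosen for the sketch; any gateway that serves content under /ipfs/ could be substituted.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen

# Assumption: a public path-style gateway; a local or institutional one works too.
DEFAULT_GATEWAY = "https://ipfs.io"


def ipfs_to_gateway_url(url: str, gateway: str = DEFAULT_GATEWAY) -> str:
    """Rewrite ipfs://<cid>/<path> as <gateway>/ipfs/<cid>/<path> (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme != "ipfs":
        raise ValueError(f"not an ipfs:// URL: {url!r}")
    # netloc holds the content identifier; path is whatever follows it.
    return f"{gateway.rstrip('/')}/ipfs/{parts.netloc}{parts.path}"


def fetch_bytes(url: str) -> bytes:
    """Read a URL with urllib, routing ipfs:// URLs through the HTTP gateway."""
    if urlsplit(url).scheme == "ipfs":
        url = ipfs_to_gateway_url(url)
    with urlopen(url) as response:
        return response.read()
```

This keeps the dependency footprint at the standard library, at the cost of relying on a gateway being reachable.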

I'm just suggesting that the path could be a URL. That covers the current usage but allows more flexibility about where the data resides, so a published config could be used anywhere, provided the data were publicly accessible.
