Migrate from Radiant MLHub to Source Cooperative #1830
Comments
Hi @adamjstewart, I am interested in contributing to this, but I am fairly new and will need more guidance. Where is a good place for me to start? |
Hi @Haimantika, thanks for volunteering! Let's pick a single dataset, maybe This is the new dataset website: https://beta.source.coop/technoserve/cashews-benin/ If you create an account, log in, and click generate credentials, you'll see that the Azure URI is https://radiantearth.blob.core.windows.net/mlhub/technoserve-cashew-benin We'll add a new dependency on azure-storage-blob in We'll probably add something similar to Let me know if anything is unclear. The first dataset is going to be a bit of work, but once we have one working, the rest should be easy. |
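For readers following along, here is a minimal sketch of how the Azure URI above could be split into the pieces that azure-storage-blob expects and then used to copy every blob under a prefix. It is not the implementation that was eventually merged. The helper names `split_azure_uri` and `download_prefix`, the `root` argument, and the `credential` argument (for example, a token produced by the "generate credentials" step) are assumptions made for illustration; `ContainerClient`, `list_blobs`, and `download_blob` are existing azure-storage-blob calls.

```python
import os
from typing import Optional, Tuple
from urllib.parse import urlparse


def split_azure_uri(uri: str) -> Tuple[str, str, str]:
    """Split an Azure blob URI into (account_url, container, prefix)."""
    parsed = urlparse(uri)
    # The first path component is the container; the rest is the blob prefix.
    container, _, prefix = parsed.path.lstrip("/").partition("/")
    return f"{parsed.scheme}://{parsed.netloc}", container, prefix


def download_prefix(uri: str, root: str, credential: Optional[str] = None) -> None:
    """Copy every blob under the URI's prefix into the local directory `root`.

    Hypothetical helper: `credential` is assumed to be whatever token the
    dataset website hands out; it is not taken from the thread.
    """
    # Imported lazily so split_azure_uri works without azure-storage-blob installed.
    from azure.storage.blob import ContainerClient

    account_url, container, prefix = split_azure_uri(uri)
    client = ContainerClient(account_url, container, credential=credential)
    # Append "/" so that sibling prefixes sharing the same stem are not matched.
    starts_with = prefix.rstrip("/") + "/" if prefix else None
    for blob in client.list_blobs(name_starts_with=starts_with):
        target = os.path.join(root, blob.name)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as f:
            client.download_blob(blob.name).readinto(f)
```

With the URI quoted above, `split_azure_uri` returns the account URL `https://radiantearth.blob.core.windows.net`, the container `mlhub`, and the prefix `technoserve-cashew-benin`.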
This is very helpful. Thanks a lot. I will start working on it and get back to you with any questions. |
Hi @adamjstewart, I finally got some time to work on it. I see a PR has been raised. Is the issue solved already? |
I have not seen any PRs that implement download support for Source Cooperative. Which PR are you referring to? |
My bad. This one just mentioned the issue. |
Yes, this is a 9th dataset that will benefit from your contribution. P.S. I reached out to the folks at Source Cooperative. One thing to note is that azure-storage-blob will copy raw files/directories, not zip/tar files. So there won't be an easy way to checksum these. For now, let's just focus on downloading and ignore checksumming. |
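To illustrate why raw directories are harder to checksum than a single archive: there is no one file to hash, so any checksum has to walk the tree in a fixed order and fold every relative path and file body into one digest. The sketch below shows one possible way to do that, should checksumming be revisited later; it is not something the thread adopts. The function name `directory_md5` and the choice of MD5 are assumptions for illustration.

```python
import hashlib
import os


def directory_md5(root: str) -> str:
    """Return one MD5 digest covering every file under `root`."""
    digest = hashlib.md5()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Hash the relative path too, so renames change the digest.
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            digest.update(rel.encode("utf-8"))
            with open(path, "rb") as f:
                # Read in 1 MiB chunks to keep memory use bounded.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
    return digest.hexdigest()
```

A digest like this only matches if the local tree contains exactly the same files as the reference copy, which is part of why skipping checksums is the simpler option for a first pass.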
Hi, I was doing a bit of research, and the latest version of Source Cooperative that I could find was beta.source.coop. Is that it, or am I missing something? I have made the changes and can open a PR for you to take a look. |
Yes, that's the new website. |
@adamjstewart I have raised a PR. There is a chance that this is not the solution you are looking for. However, I would like to give it one more try after your review and then, to respect your time, unassign myself if it does not work. :) |
review of MSFT azure-sdk-for-python that includes examples like this. Second view of the |
We definitely don't need all of azure; azure-storage-blob would suffice. |
This file appears to implement the basic functionality: https://github.com/kartAI/kartAI/blob/master/azure/blobstorage.py |
@Haimantika @darkblue-b all preliminary work is now complete. If you want to claim 1 or more datasets from the above list and start working on them, #2068 will show you what is required to convert them. Note that most of the file changes in that PR are auto-generated by |
Thanks Adam. I will take a look at the code and the dataset tonight and update you on which one I take up. |
Pinging the original dataset contributors:
If any of you have time, would you be interested in revamping these datasets to download from Source Cooperative? |
Hey @adamjstewart I will be taking up the NASA Marine Debris dataset. Will start working from this weekend. |
@ashnair1 it looks like SpaceNet is no longer hosted by Source Cooperative and is only on AWS, is that correct? We can use the same CLI function I wrote to update that dataset to its new download location. There's also a new SpaceNet8 released if you want to add it. |
Summary
We currently have several datasets from the recently defunct Radiant MLHub that we need to switch to Source Cooperative if we want to be able to automatically download them.
Rationale
Downloads are currently broken, and many of these datasets have completely changed their file structure.
Implementation
See #2068 for an example that converts Tropical Cyclone. The resulting implementation is actually significantly simpler than the original code.
If you would like to volunteer to convert a particular dataset, please comment on this issue to say that you're working on it.
Alternatives
We can also rehost most datasets (depending on license) on Hugging Face.