Allow RasterDataset to accept list of files #1442
Here is the initial proposal. If Are there any collections of types that are appropriate for checking either "path-like" or "not list-like"? Note that we cannot check Also, mypy does not like that
For directory vs. file, what if we use `os.path.exists` to determine whether or not the path is on the local filesystem? If it's local, use `os.path.isdir` or `os.path.isfile`. If it isn't, assume it's a path.
The alternative is to only support `str` directories and `list` files. I would also be satisfied with that solution if you want to keep it simple.
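A minimal sketch of the first suggestion, to make the branching concrete. `classify_path` and its return labels are hypothetical names used only for illustration; this is not the code that ended up in the PR.

```python
import os


def classify_path(path: str) -> str:
    """Hypothetical helper: label a path as 'directory', 'file', or 'assumed'."""
    if os.path.exists(path):
        # The path is on the local filesystem, so it can be inspected directly.
        if os.path.isdir(path):
            return "directory"
        if os.path.isfile(path):
            return "file"
    # Not found locally (e.g. a URL): assume it is a valid path and let the
    # reader that eventually opens it decide.
    return "assumed"
```

With this approach a remote URL is never rejected up front; any error would only surface when the file is actually opened.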
I think this is sufficient for a proof of concept. Wondering what @calebrob6 thinks as he's the one who's been asking for this feature for the longest.
@adamjstewart I am moving this discussion over here. If we remove We could overcome this by checking both
Hey all -- first, sorry it has taken me so long to get to this! I just tested this and it behaves exactly as I would expect, so that's awesome. My only concern is the situation in which someone overrides
This might produce unexpected behavior (e.g. if I make a RemoteRasterDataset that takes a list of URLs to Azure blob containers that have SAS tokens appended to the end of them, then a string passed to This seems minor and something that we could handle if it comes up.
1b7ed03 to 93738fb
I'll try to take a closer look at this later this week. I definitely want to get this into the 0.5.0 release in August.
Apologies for taking so long to review this, I graduated and moved to Germany so I've been very preoccupied.
I still really want to sneak this into 0.5.0, which I still want to finish by tomorrow. I know this is very last minute, so if you need me to I'm happy to take over at any point to finish up this PR. Depending on which time zone you're in this may work very well.
Last request: can we do the same thing for VectorDataset? The code will be basically identical, and we only have 2 of them, so shouldn't take long.
93738fb to 40c6d18
What is not yet fixed is:
@adamjstewart if you want to you can take over now. I hope I did not leave too big of a mess 😅
Same way we used to, with a call to `glob.glob`
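For readers unfamiliar with that approach, recursive file discovery with `glob.glob` might look roughly like the sketch below. `find_files` and the default `"*.tif"` pattern are illustrative assumptions, not the implementation in this PR.

```python
import glob
import os


def find_files(root: str, filename_glob: str = "*.tif") -> list[str]:
    """Hypothetical helper: recursively list files under `root` matching a glob."""
    # With recursive=True, "**" matches zero or more nested directories,
    # so files directly inside `root` are found as well.
    pattern = os.path.join(root, "**", filename_glob)
    return sorted(glob.glob(pattern, recursive=True))
```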
Too tired to finish this tonight, will work on this tomorrow.
7141d0d to 628801a
@@ -80,14 +80,17 @@ def __init__(
cache: if True, cache file handle to speed up repeated sampling
@adamjstewart can you take a look at the docstring for `download`, just above this line?
download: if True, download dataset and store it in the root directory
It may be hard to know what that root is
I think the majority of users will still use a single string in `paths`, and the docs do say that `paths` is one or more root directories, so hopefully they can figure it out. If users report confusion we can make this more clear or add a better error message in a future patch release.
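As a rough illustration of why the single-string case needs explicit handling: a `str` is itself iterable, so it has to be checked for before `paths` is treated as a collection. `as_path_list` below is a hypothetical helper written for this note, not a function from the PR.

```python
import os
from collections.abc import Iterable
from typing import Union

PathLike = Union[str, os.PathLike]


def as_path_list(paths: Union[PathLike, Iterable[PathLike]]) -> list[str]:
    """Hypothetical helper: normalize one path or many paths to a list of str."""
    # Check for a single path first; iterating over a str would yield characters.
    if isinstance(paths, (str, os.PathLike)):
        return [os.fspath(paths)]
    return [os.fspath(p) for p in paths]
```

Under this sketch, `as_path_list("data")` and `as_path_list(["data"])` produce the same result, so downstream code only ever deals with a list.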
I think this is ready to merge now!
At the moment, we don't support passing a list if you also want to download/extract a dataset. I think this limitation makes sense, as people won't have an easy way to get a list of files if they don't already have them extracted anyway. But if someone asks for this feature, we can think about how to do it in a future patch release.
Thanks for a great first contribution, I think this is worth highlighting in the release notes as it unlocks a lot of potential for remote files or file filtering.
Looks good to me! Good job! Hopefully not my last contrib! :) I'll let you update the branch and push the button 🚀
Fix #1427