-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use gdal driver detection based on output path for writing files #268
Conversation
@@ -1,6 +1,8 @@ | |||
import warnings | |||
import os | |||
|
|||
from osgeo_utils.auxiliary.util import GetOutputDriversFor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather avoid this dependency and only rely on the parts of GDAL that we are binding against in Cython.
Looking at the implementation in GDAL, it looks like it is mostly a matter of iterating over drivers that support write and checking the driver's DMD_EXTENSIONS
metadata item.
From there, it seems like we could handle this in a couple different ways:
- build a map of extensions to drivers when pyogrio is first initialized (roughly similar to how we use
DRIVERS
now) - do the driver lookup at runtime based on the extension of a specific path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, it might be possible to make some minor modifications to _ogr.pyx::ogr_list_drivers()
to return both the read / write capability of a driver as well as associated extensions, which we could then use to create an inverse map of extension to driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that we don't want to depend on the GDAL python bindings (that would be an additional dependency, which also has no wheels, and thus would defeat the purpose of our wheels as a way to get GDAL).
As Brendan mentioned, I think we can indeed use the metadata from GDAL, now that we have bindings to getting that information (_get_driver_metadata_item
). We have an open issue for this feature that mentions this as well (#220), and that issue also links to the equivalent PR in fiona to add this feature (Toblerity/Fiona#948), that might be closer to our code to look at compared to the GDAL implementation (EDIT: on a closer look, this specific fiona PR doesn't actually include their cython changes to support it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I also already had a look at the implementation in gdal and it is indeed mainly based on the drivers DMD_EXTENSIONS
metadata, with some extra logic.
I'm wondering, I also encountered a similar C++ function in gdal with even some more logic in it, would it be an option that is gets exposed in the C API so we could use it like that? This could deduplicate the logic in gdal itself as wel as make it reusable in fiona, rasterio, pyogrio,... communutils.cpp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When reviewing #270 I also noticed there is a bug in the python version of GetOutputDriversFor
so I opened a ticket in GDAL to report this: OSGeo/gdal#8277
Superseded by #270 |
Because I wanted to test .xlsx for #267, I was reminded that the driver detection for writing is really minimal at the moment.
For reading the driver detection is implicitly done by gdal resulting in automatic support of +- all gdal drivers. This PR is a proposal to follow the same principle: rely as much as possible on existing gdal logic to infer the driver from the output file.
Because it introduces a dependency on the gdal python bindings, I started with this first version as a POC. If the idea is supported I'll finish the implementation.
PS: I started this PR based on #267 to have the xlsx test case handy... but only the last commit is relevant for this PR.