Creating a lightweight version of gdal dependencies #722
Comments
I considered this in the past. I'm not sure what the implications are of mixing the variants and/or making them incompatible with one another. For example, if we build fiona with the lite version but install the full one, will it work as expected? Maybe it will if all the symbols are there, but someone looking for a codec from the full version may be frustrated. Or will it work? I don't know the answer to all these questions. One could create lite versions of all the downstream packages, but that would be quite confusing to users. Maybe we can reduce the size of the end package with some binary stripping and by removing/moving files that are not usually used, like docs, etc. |
Normally, yes, as the API of a lite or a full build is the same. Well, to be more exact: it is almost the same. The only difference is in the GDALRegister_XXXX() or RegisterOGRXXXX() functions of each driver XXXX. So it could be an issue if some GDAL user explicitly called a GDALRegister_XXXX()/RegisterOGRXXXX() that is in the full package but not in the lite one, but I bet > 99% of GDAL users just call GDALAllRegister(). Individual registration of drivers has never been promoted as a good practice in the GDAL documentation, and the doc points at GDALAllRegister() instead. Typically, searching in Debian sources, the only code that explicitly registers the GTiff driver (https://codesearch.debian.net/search?q=GDALRegister_GTiff&perpkg=1) or the Shapefile driver (https://codesearch.debian.net/search?q=RegisterOGRShape&perpkg=1) is GDAL itself.
That said, the definition of a minimum/lite version of GDAL is going to be difficult to agree on. Someone with vector-only workflows will have a very different idea from someone with raster-only workflows. An alternative would be to have a smaller libgdal and additional libgdal-XXXXX packages, as we have done for Arrow/Parquet.
But I should point out that a plugin approach with too many plugins has consequences: in my development build, I build all drivers as plugins (for drivers that support being built as a plugin) to detect issues specific to plugin building, and this significantly increases the GDALAllRegister() time to ~ 200 ms (instead of ~ 10 ms for an all-drivers-in-libgdal approach). Of course that's a bit extreme. For < 10 plugins, the perf should still be reasonable. Another downside of the plugin approach is that users must remember to install them... |
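To illustrate the point about relying on GDALAllRegister() rather than per-driver registration functions: downstream code can ask by name whether a driver is present, and that check behaves the same against a full or a lite build. The following is a minimal sketch using the GDAL Python bindings (`gdal.AllRegister()` and `gdal.GetDriverByName()`). The helper name `missing_drivers` and its `lookup` parameter are assumptions made for illustration, so the check can be tried without GDAL installed; they are not part of GDAL.

```python
from typing import Callable, Iterable, List, Optional


def missing_drivers(
    required: Iterable[str],
    lookup: Optional[Callable[[str], object]] = None,
) -> List[str]:
    """Return the names in `required` for which no driver is registered.

    `lookup` is a hypothetical injection point (not part of GDAL): any
    callable that returns None for an unknown driver name.
    """
    if lookup is None:
        # Imported lazily so the helper can be exercised without GDAL installed.
        from osgeo import gdal

        gdal.AllRegister()  # registers every driver this build knows about
        lookup = gdal.GetDriverByName  # returns None when the driver is absent
    return [name for name in required if lookup(name) is None]


# Stand-in registry playing the role of a "lite" build without the PDF driver.
lite_build = {"GTiff": object(), "ESRI Shapefile": object()}
result = missing_drivers(["GTiff", "PDF"], lookup=lite_build.get)
print(result)
```

Calling `missing_drivers(["GTiff", "PDF"])` with no `lookup` would query the installed GDAL instead of the stand-in dictionary.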
I like your idea of having all non-core drivers in separate packages. Then we could have various metapackages that include different subsets (ie. all raster drivers, all vector drivers, common drivers, all drivers, minimal drivers etc). Then we wouldn't need to install all drivers each time we want to build a package that uses GDAL.... But yes, it won't be obvious to a user who is trying to read their file that they have the wrong driver metapackage installed. |
How about a version that has the drivers to read vectors and rasters, but without the graphing/printing libraries? Also maybe without postgres, not sure what GDAL uses this for.
My idea was more to keep the existing libgdal so people would have them by default but if you need the lightweight version there is a way to not install stuff like poppler and jpeg libraries (poppler+poppler-data alone is 53MB) |
GDAL interaction with PostgreSQL/PostGIS is paramount. I couldn't imagine a build without it. No PG, no ETL. Any lite build should support it. |
Just seeing that Alpine Linux has packaged GDAL 3.6.2 with a number of drivers as plugins in extra packages. |
Hi all, I understand that everyone will have a different idea of what a minimal gdal is depending on their workflow. However, maybe we could take a non-controversial first step to make gdal more modular,
I think poppler may be a good candidate for that:
What do you guys think? |
So this would mean that gdal wouldn't support pdf's 'out of the box'? Would this be confusing for users? I don't use pdf's with gdal personally, but would be interested to hear from anyone who does... |
QGIS GeoPDF functionality relies on the GDAL PDF driver |
@gillins yes, it would mean that users who need the GDAL PDF driver would have to install QGIS could choose to depend on And yes, users / downstream packages may be impacted by any split of GDAL functionality into separate packages. I still think it is worth it in the long term. And that for poppler, that number may be rather limited. |
If we want, we can also avoid a change for existing users by making the core libgdal a package named Of course that also limits the usefulness of the change, as initially everyone will still use the meta package with everything, but it allows for packages depending on gdal to gradually move to depending on libgdal-core, leaving it to the end user to add specific gdal drivers to their requirements. (in any case, I am a big +1 on the idea of having a smaller core package!) |
Just to reinforce what @jorisvandenbossche said above. If we do this, in conda-forge, it has to be that way to avoid breakages. While that limits the impact, users who want a smaller gdal will know what to look for, while current users who are OK with the current package won't feel the change. |
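The backwards-compatible layout discussed above (a small core package, with the existing `libgdal` name kept as a metapackage that pulls in everything) can be sketched as a transitive dependency walk. This is only a toy model: `libgdal`, `libgdal-core`, `libgdal-pdf`, poppler and poppler-data are names that appear in this thread, but the dependency table below is an assumption made for illustration and does not reflect the actual feedstock recipe.

```python
from typing import Dict, List, Set

# Hypothetical dependency table (assumed for illustration only).
DEPENDS: Dict[str, List[str]] = {
    "libgdal-core": [],
    "libgdal-pdf": ["libgdal-core", "poppler"],
    "libgdal": ["libgdal-core", "libgdal-pdf"],  # metapackage: core + drivers
    "poppler": ["poppler-data"],
    "poppler-data": [],
}


def closure(package: str, table: Dict[str, List[str]]) -> Set[str]:
    """Return `package` plus everything it pulls in transitively."""
    seen: Set[str] = set()
    stack = [package]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(table.get(name, []))
    return seen


full = closure("libgdal", DEPENDS)
core = closure("libgdal-core", DEPENDS)
# Packages an environment avoids by depending on the core package only.
extra = sorted(full - core)
print(extra)
```

In this model, existing users who keep depending on `libgdal` get the same set of packages as before, while a downstream package that switches to `libgdal-core` drops everything listed in `extra`.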
+1 from me about not breaking existing users, but being able to select a smaller gdal. How do we progress this @conda-forge/gdal ? Do we have a vote? |
Sounds good @jorisvandenbossche. This also means that |
I would prefer to make sure we have consensus than to have a vote. I suggest we put forward @jorisvandenbossche's suggestion in #722 (comment) as a way to proceed. It feels like we have a pretty good consensus on that option. If you like that idea, give it a thumbs up (if you haven't already). If you have concerns, give it a thumbs-down for now and comment about what your objection would be. Hopefully, we can address it and you'll become a thumbs up. There are remaining questions about which packages go in How does that sound? |
FWIW, I am |
Likewise. What's the benefit or is this just an exercise? I'm of the mind of adding more packages and making the default install even more robust, ie, hdf4, hdf5, geoparquet, latest CGAL, SFCGAL, GEOS, etc. |
@akrherz @PostholerCom, not sure what pain you are referring to? Downstream packages could choose to continue to depend on gdal / libgdal, which would not change anything as per @jorisvandenbossche's proposal. Also, unless I am assessing this incorrectly (I may well be, plus maybe some deps of poppler would be required anyway), I am afraid we are not talking about a few kB.
|
Thanks @olivier-lacroix for quantifying the impact of poppler. The pain is that the original message denoted |
I do believe that Fiona's author would bless linking a lightweight GDAL (as Fiona binary wheels use a quite minimal GDAL), especially since the PDF driver is of little practical use for Fiona / vector (well, there are some GeoPDFs with features in them, but that's quite an edge case). |
Another advantage of a libgdal-core version would be that creating conda environments should speed up significantly. Especially on Windows, this would likely be quite a significant difference. E.g. for my CI tests on Windows, creating and cleaning up the conda environment using mamba takes 12 minutes for an environment with gdal. For an environment without gdal it takes 5 minutes: |
This is going to be addressed in GDAL 3.9 per OSGeo/gdal#8648 + OSGeo/gdal#8695 |
Just a heads-up that GDAL 3.9 has been released now. Or is GDAL 3.9.1 recommended for this due to OSGeo/gdal#10096? |
OSGeo/gdal#10096 should have moderate consequences on pure conda-forge builds. The effect of this PR is more to allow (again) someone to use libgdal from conda-forge and build a driver as a plugin (typically a proprietary one) against that libgdal, which wasn't aware of that driver at build time. There are ongoing discussions about potentially more modular builds of GDAL in conda-forge, but that might require evolutions in conda-build or using features we don't use yet (like the ability to dispatch build artifacts into multiple output packages). CC @hobu |
Speaking of Poppler, another motivation for a libgdal-pdf package (with the Poppler backend) is that Poppler is GPL licenced. |
I confirm the above patch works for me in that situation in relation to our efforts with MrSID and Oracle plugins as described in #936. I also would very much like to see a Not only would this speed up solve time, it would help reduce the amount of rerendering churn that GDAL and its downstream users have to endure. |
are we sure that if we have multiple output packages for the same feedstock (let's say A=libgdal and B=gdal-poppler packages), and that only a dependency of B is updated (poppler), only B gets rebuilt? |
Nope. Both will get rebuilt. |
😦 Why? What is the point of multiple outputs then? |
It helps downstream packages to get something specific, or to split different licenses, and has other benefits that are mostly for downstream use. However, for gdal itself, if any of the gdal dependencies get updated, the whole feedstock will get rebuilt. |
We can have
The first one is not backwards compatible, while the second one is. |
We usually do the second option to avoid breakages. |
There's an interesting question regarding the "gdal" package (the Python bindings): should they depend on "libgdal-core" (most logical choice), or still "libgdal" (but that means that users couldn't use the GDAL Python bindings without installing all drivers, which could be undesirable). It seems difficult here to be fully backwards compatible |
We control what gdal depends on. We can change it to depend on libgdal-core without any concerns about backwards compatibility that I can see. |
my point was more for external users that install the "gdal" package and expect all GDAL drivers (but "libgdal-arrow-parquet") to be installed too. |
I see. I think that's hopefully a relatively minor inconvenience but I get your point. |
I guess we need to weigh the pros and cons here: backward compatibility by making I don't have a horse in this race b/c I'm "in" the know and I can easily fix my workflows. Breaking changes are annoying, but sometimes they are the opportunity we have to fix long-standing issues/annoyances. Maybe the compromise would be to patch |
That's exactly a mechanism now available in core GDAL since https://gdal.org/development/rfc/rfc96_deferred_plugin_loading.html and currently used by libgdal-arrow-parquet: gdal-feedstock/recipe/build.sh Line 28 in 5edf3ab
|
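The idea behind deferred plugin loading, as it is used in this discussion, is that a driver can be known to the core library up front while its plugin is only loaded on first use, so a missing plugin can produce an error that tells the user what to install. The following is a toy Python model of that idea, not GDAL's implementation: the class names, the `loader` callable and the wording of the install hint are all hypothetical.

```python
from typing import Callable, Optional


class PluginNotInstalled(RuntimeError):
    """Raised when a known driver's plugin cannot be loaded."""


class DeferredDriver:
    """Placeholder registered up front; the real plugin is loaded on first use."""

    def __init__(
        self,
        name: str,
        loader: Callable[[], Optional[object]],
        install_hint: str,
    ) -> None:
        self.name = name
        self._loader = loader  # hypothetical: returns None if the plugin is absent
        self._install_hint = install_hint
        self._real: Optional[object] = None
        self.load_attempts = 0

    def open(self, path: str) -> object:
        if self._real is None:
            self.load_attempts += 1
            self._real = self._loader()
        if self._real is None:
            raise PluginNotInstalled(
                f"{path}: driver {self.name} is known but its plugin is not "
                f"installed. {self._install_hint}"
            )
        return self._real


# Registering the placeholder is cheap: nothing is loaded yet.
pdf = DeferredDriver(
    "PDF",
    loader=lambda: None,  # simulates an environment without the plugin
    install_hint="You may install it with 'conda install libgdal-pdf'.",
)
attempts_before_open = pdf.load_attempts

try:
    pdf.open("map.pdf")
    message = ""
except PluginNotInstalled as exc:
    message = str(exc)
print(message)
```

With this kind of message, a user whose file fails to open learns which package is missing instead of only seeing an unsupported-format error.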
Wow, that is awesome. I'm inclined for a breaking change then b/c it is super easy for the user to fix with that error message. |
I've split off the following packages in #948.
Any other packages that we should split? |
Number of deps went down from 113 to 57. Here's the dep list on macOS
We can remove a bit more. |
This is awesome! |
@olivier-lacroix, that's a dependency only on macOS |
Ah great @isuruf ! I am looking forward to this! Thanks a lot for your work on it :-) |
#948 is now merged to main, I think we can close this as completed. |
Comment:
Currently the libgdal feedstock includes a large number of dependencies that are not strictly required to use GDAL (e.g. poppler, postgres, ...). This bloats image sizes every time we want to install a GDAL-related package like geopandas. Would it be possible to create a lightweight version of the feedstock?
Maybe something like