From 860772c553e9c54421f7759cb9441fff87eb0d27 Mon Sep 17 00:00:00 2001 From: Even Rouault Date: Thu, 2 Nov 2023 12:31:57 +0100 Subject: [PATCH 1/8] Add RFC96 text: Deferred in-tree C++ plugin loading --- doc/source/development/rfc/index.rst | 1 + .../rfc/rfc96_deferred_plugin_loading.rst | 448 ++++++++++++++++++ 2 files changed, 449 insertions(+) create mode 100644 doc/source/development/rfc/rfc96_deferred_plugin_loading.rst diff --git a/doc/source/development/rfc/index.rst b/doc/source/development/rfc/index.rst index fc4d7d783344..af3c8827d9c3 100644 --- a/doc/source/development/rfc/index.rst +++ b/doc/source/development/rfc/index.rst @@ -102,3 +102,4 @@ RFC list rfc93_update_feature rfc94_field_precision_width_metadata rfc95_standard_int_types + rfc96_deferred_plugin_loading diff --git a/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst b/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst new file mode 100644 index 000000000000..f3bad9c479f3 --- /dev/null +++ b/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst @@ -0,0 +1,448 @@ +.. _rfc-96: + +================================================================== +RFC 96: Deferred in-tree C++ plugin loading +================================================================== + +============== ============================================= +Author: Even Rouault +Contact: even.rouault @ spatialys.com +Started: 2023-Nov-01 +Status: In development +Target: GDAL 3.9 +============== ============================================= + +Summary +------- + +This RFC adds a mechanism to defer the loading of in-tree C++ plugin drivers to +the point where their executable code is actually needed, and converts a number +of relevant drivers to use that mechanism. The aim is to allow for more modular +GDAL builds, while improving the performance of plugin loading. + +Context and motivation +---------------------- + +There are currently two ways of loading a GDAL C++ driver: + +- embedded in the core libgdal library. This is the default behavior + for drivers in the official GDAL source repository. + +- available in a shared library (.so, .dll, .dylib) in a directory where it + is dynamically loaded when GDALAllRegister() is called. This is what + out-of-tree drivers use, or in-tree drivers if enabling the + [GDAL|OGR]_ENABLE_DRIVER_FOO_PLUGIN=ON or GDAL_ENABLE_PLUGINS=ON + CMake options (cf https://gdal.org/development/building_from_source.html#build-drivers-as-plugins) + to build them as plugins. + +For packagers/distributors, the second option is convenient for in-tree drivers +that depend on external libraries that are big and/or have a big number of +dependencies (libparquet, etc) and that would substantially increase the size of +the core libgdal library, or which have licenses more restrictive than the MIT +license used by libgdal. + +However, there is a penalty at GDALAllRegister() time. For example, on Linux, +it takes ~ 300 ms to complete for a build with 126 plugins, which is a substantial +time for short lived GDAL-based processes (for example a script which would run +gdalinfo or ogrinfo on many files). This time is entirely spent in the dlopen() +method of the operating system and there is, to the best of our knowledge, +nothing we can do reduce it... besides limiting the amount of dynamic loading +(attempts have been made to load plugins in parallel in multiple threads, but +this does not improve total loading time) +For developers, that plugin loading phase is actually considerably slower, of +the order of ten seconds or more, when debugging GDAL under GDB with many plugin +drivers. + +Furthermore, loading drivers that are not needed also involves some +startup/teardown code of external libraries to be run, as well as more virtual +memory to be consumed. Hence this proposal of deferring the actual loading of +the shared library of the plugins until it is really needed. + +Design constraints +------------------ + +We want the new mechanism to be opt-in and fully backwards compatible: + +- to still allow out-of-tree drivers. + +- to still allow in-tree drivers, that are compatible of being built as plugins, + to be built in libgdal core, or as plugins depending on the CMake variables. + +- to progressively convert existing in-tree drivers to use it. + +Details +------- + +The main idea if that drivers using the new capability will register a proxy +driver with a new GDALDriverManager::DeclareDeferredPluginDriver() method. + +The proxy driver uses the hints provided in the GDALPluginDriverFeatures argument +to declare a minimum set of capabilities (GDAL_DCAP_RASTER, GDAL_DCAP_MULTIDIM_RASTER, +GDAL_DCAP_VECTOR, GDAL_DCAP_OPEN, etc.) to which it can answer directly, and +which are the ones used by GDALOpen() to open a dataset. For other metadata items, +it will fallback to loading the actual driver and forward the requests to it. + + +.. code-block:: cpp + + /** Declare a driver that will be loaded as a plugin, when actually needed. + * + * @param pszDriverName Driver name, such as returned by GetDescription() + * @param pszPluginFileName Plugin filename. e.g "ogr_Parquet.so" + * @param oFeatures Driver features + * + * @since 3.9 + */ + void GDALDriverManager::DeclareDeferredPluginDriver( + const char *pszDriverName, + const char *pszPluginFileName, + const GDALPluginDriverFeatures &oFeatures); + + +That method will also keep track of pszPluginFileName to avoid automatically +loading it in the GDALDriverManager::AutoLoadDrivers() method (that method +will only load out-of-tree drivers or in-tree drivers that have not been +converted to use DeclareDeferredPluginDriver()). + + +The GDALPluginDriverFeatures structure is defined below: + +.. code-block:: cpp + + /** Data structure used for plugin declaration. */ + struct CPL_DLL GDALPluginDriverFeatures + { + /** Identify method. Required for driver with open capabilities. + * If the method returns -1 (unknown), the underlying driver will be + * loaded by GDALDataset::Open(). + */ + int (*pfnIdentify)(GDALOpenInfo *) = nullptr; + + /** + * Returns a (possibly null) pointer to the Subdataset informational function + * from the subdataset file name. + */ + GDALSubdatasetInfo *(*pfnGetSubdatasetInfoFunc)(const char *pszFileName) = + nullptr; + + /** Long name. Must be equal to the value of + * GDAL_DMD_LONG_NAME on the underlying driver. */ + const char *pszLongName = nullptr; + + /** Extensions. Must be equal to the value of + * GDAL_DMD_EXTENSIONS on the underlying driver. */ + const char *pszExtensions = nullptr; + + /** Open option list. Must be equal to the value of + * GDAL_DMD_OPENOPTIONLIST on the underlying driver.*/ + const char *pszOpenOptionList = nullptr; + + /** Whether the driver exposes GDAL_DCAP_RASTER */ + bool bHasRasterCapabilities = false; + + /** Whether the driver exposes GDAL_DCAP_MULTIDIM_RASTER */ + bool bHasMultiDimRasterCapabilities = false; + + /** Whether the driver exposes GDAL_DCAP_VECTOR */ + bool bHasVectorCapabilities = false; + + /** Whether the driver exposes GDAL_DCAP_OPEN */ + bool bHasOpen = true; + + /** Whether the driver exposes GDAL_DCAP_CREATE */ + bool bHasCreate = false; + + /** Whether the driver exposes GDAL_DCAP_CREATE_MULTIDIMENSIONAL */ + bool bHasCreateMultiDimensional = false; + + /** Whether the driver exposes GDAL_DCAP_CREATE_COPY */ + bool bHasCreateCopy = false; + + /** Whether the driver exposes GDAL_DMD_SUBDATASETS */ + bool bHasSubdatasets = false; + + /** Whether the driver exposes GDAL_DCAP_MULTIPLE_VECTOR_LAYERS */ + bool bHasMultipleVectorLayers = false; + + /** Whether the driver exposes GDAL_DCAP_NONSPATIAL */ + bool bIsNonspatial = false; + }; + + +The main point is that they provide the Identify() method to the proxy driver. +That Identify() method must be compiled in libgdal itself, and thus be +defined in a C++ file that does not depend on any external library. +Similarly for the GetSubdatasetInfoFunc() optional method. + +When loading the actual driver, the GDALPluginDriverProxy::GetRealDriver() +method will check that all information set in GDALPluginDriverFeatures is +consistent with the actual metadata of the underlying driver, and will warn +when there are differences. + +GDALDataset::Open(), Create(), CreateCopy() methods are modified to not use +directly the pfnOpen, pfnCreate, pfnCreateCopy callbacks (that would be the ones +of the proxy driver, and thus nullptr), but to call new GetOpenCallback()/ +GetCreateCallback()/GetCreateCopyCallback() methods that the GDALProxyDriver +class overloads to return the function pointers of the real driver, once it +has loaded it. + +The DeclareDeferredPluginDriver() method checks if the file of the plugin +exists before registering it. If it is not available, a CPLDebug() message is +emitted. This allows to build a "universal" core libgdal, with plugins that can +be optionally available at runtime. + +Cherry-on-the-cake: GDALOpen() will given an explicit error message if it +identifies a dataset to a plugin that is not available at runtime. Example:: + + $ gdalinfo test.h5 + ERROR 4: `test.h5' not recognized as a supported file format. It could have + been recognized by driver HDF5, but plugin gdal_HDF5.so is not available + in your installation. + + +For each driver supporting deferred plugin loading, GDALAllRegister() must be +modified to call a driver-specific function that calls +GDALDriverManager::DeclareDeferredPluginDriver() (see example in below +paragraph). This code path is enabled only when the driver is built as plugin. + + +Example of changes to do on a simplified driver +----------------------------------------------- + +In the :file:`CMakeLists.txt` file of a driver, the new option CORE_SOURCES can be +passed to ``add_gdal_driver()`` to define source file(s) that must be built in +libgdal, even when the driver is built as a plugin. + +:: + + add_gdal_driver(TARGET gdal_FOO + SOURCES foo.cpp + CORE_SOURCES foo_core.cpp + PLUGIN_CAPABLE + STRONG_CXX_WFLAGS) + if (NOT TARGET gdal_FOO) + return() + endif() + gdal_standard_includes(gdal_FOO) + +A typical :file:`mydrivercore.h`` header will declare the identify method: + +.. code-block:: cpp + + #include "gdal_priv.h" + + // Used by both DeclareDeferredFOOPlugin() and GDALRegisterFoo() + constexpr const char* FOO_DRIVER_NAME = "FOO"; + constexpr const char* FOO_LONG_NAME = "The FOO format"; + constexpr const char* FOO_EXTENSIONS = "foo"; + + int CPL_DLL FOODatasetIdentify(GDALOpenInfo* poOpenInfo); + + +And :file:`mydrivercore.cpp` will contain the implementation of the identify method, +as well as a DeclareDeferredXXXPlugin() method that will be called by +GDALAllRegister() when the driver is built as a plugin (the PLUGIN_FILENAME +macro is automatically set by the CMake scripts with the filename of the +plugin, e.g. "gdal_FOO.so"): + +.. code-block:: cpp + + int FOODatasetIdentify(GDALOpenInfo* poOpenInfo) + { + return poOpenInfo->nHeaderBytes >= 3 && + memcmp(poOpenInfo->pabyHeader, "FOO", 3) == 0; + } + + #ifdef PLUGIN_FILENAME + void DeclareDeferredFOOPlugin() + { + if (GDALGetDriverByName(FOO_DRIVER_NAME) != nullptr) + { + return; + } + GDALPluginDriverFeatures oFeatures; + oFeatures.pfnIdentify = FOODatasetIdentify; + oFeatures.pszLongName = FOO_LONG_NAME; + oFeatures.pszExtensions = FOO_EXTENSIONS; + oFeatures.bHasRasterCapabilities = true; + GetGDALDriverManager()->DeclareDeferredPluginDriver( + FOO_DRIVER_NAME, PLUGIN_FILENAME, oFeatures); + } + #endif + + +The GDALRegisterFoo() method itself is just slightly modified to change +its pfnIdentify (and pfnGetSubdatasetInfoFunc) function pointers to the +function that has been moved to :file:`mydrivercore.cpp`. + +The modified :file:`gdalallregister.cpp` file will look like: + +.. code-block:: cpp + + void GDALAllRegister() + { + auto poDriverManager = GetGDALDriverManager(); + + // Deferred driver declarations must be done *BEFORE* AutoLoadDrivers() + #if defined(DEFERRED_FOO_DRIVER) + DeclareDeferredFOOPlugin(); + #endif + + // This will not load gdal_FOO if above DeclareDeferredFOOPlugin() + // has been called + poDriverManager->AutoLoadDrivers(); + + // Standard driver declarations below for drivers built inside libgdal + // ... + #if FRMT_foo + GDALRegisterFoo(); + #endif + } + + +Limitations +----------- + +That mechanism only applies to in-tree plugins, since it requires a fraction +of the driver code to be embedded in libgdal. Out-of-tree plugins will +still be fully loaded at :cpp:func:`GDALAllRegister` time (or at +:cpp:func:`GDALDriverManager::LoadPlugin` time) + +One could imagine a further enhancement for out-of-tree plugins where they +would be accompanied by a sidecar text file that would for example declare the +information of GDALPluginDriverFeatures, as well as a limited implementation +of the identify method as a regular expression. But that is out-of-scope of +this RFC. + +Changes in the loading of OGR Python drivers (see :ref:`rfc-76`) are also +out-of-scope of this RFC (they will continue to be loaded at +:cpp:func:`GDALAllRegister` time). + +Candidate implementation +------------------------ + +A candidate implementation has been started to implement all the core mechanism, +and convert the Parquet, netCDF and HDF5 drivers. The HDF5 plugin is actually +a good stress test for the deferred loading mechanism, since it incorporates 4 +drivers (HDF5, HDF5Image, BAG and S102) in the same shared object. The plan +is to update progressively all in-tree drivers that depend on third-party +libraries (that is the one that are built as plugins when setting the +GDAL_ENABLE_PLUGINS=YES CMake options). + +Tests have also been done with QGIS (with the changes at +https://github.com/qgis/QGIS/pull/55115) to check that the declared set of +metadata items in GDALPluginDriverFeatures is sufficient to avoid loading of the +actual drivers at QGIS startup (they are only loaded when a dataset of the format +handled by the driver is identified) + +Backward compatibility +---------------------- + +Expected to be backward compatible for most practical purposes. + +Pedantically, if external code would directly use the pfnOpen, pfnCreate, +pfnCreateCopy function pointers of a GDALDriver instance, it would see them +null before the actual driver is loading, but direct access to +those function pointers has never been documented (instead users should use +GDALOpen(), GDALCreate(), GDALCreateCopy() etc), and is not expected to be +done by code external to libgdal core. + +However, the candidate implementation hits an issue with the way the GDAL +CondaForge builds work currently. At time of writing, the GDAL CondaForge +build recipee does: + +- a regular GDAL build without Arrow/Parquet dependency (and thus without the + driver), whose libgdal.so goes in to the libgdal package. +- installs libarrow and libparquet +- does an incremental GDAL build with -DOGR_ENABLE_DRIVER_FOO_PLUGIN=ON to + generate ogr_Arrow.so and ogr_Parquet.so. However with the above new mechanism, + this will result in libgdal to be modified to have a DeclareDeferredOGRParquetPlugin + function, as well as including the identification method of the Parquet plugin. + But that modified libgdal.so is discarded currently, and the ogr_Parquet.so + plugin then depends on a identify method that is not implemented. + +The initial idea was that the build recipee would have to be modified to produce +all artifacts (libgdal.so and libparquet.so) at a single time, and dispatch +them appropriately in libgdal and libgdal-arrow-parquet packages, rather than +doing two builds. However, CondaForge builds support several libarrow versions, +and produce thus different Arrow/Parquet plugins, so this approach would not be +practial. + +To solve this, the following idea was implemented. Extract from the updated +:ref:`building_from_source` document:: + + Starting with GDAL 3.9, a number of in-tree drivers, that can be built as + plugins, are loaded in a deferred way. This involves that some part of their + code, which does not depend on external libraries, is included in core libgdal, + whereas most of the driver code is in a separated dynamically loaded library. + For builds where libgdal and its plugins are built in a single operation, this + is fully transparent to the user. + + For more specific builds where libgdal would be first built, and then plugin + drivers built in later incremental builds, this approach would not work, given + that the core libgdal built initially would lack code needed to declare the + plugin(s). + + In that situation, the user building GDAL will need to explicitly declare at + initial libgdal build time that one or several plugin(s) will be later built. + Note that it is safe to distribute such a libgdal library, even if the plugins + are not always available at runtime. + + This can be done with the following option: + + .. option:: GDAL_REGISTER_DRIVER__FOR_LATER_PLUGIN:BOOL=ON + + .. option:: OGR_REGISTER_DRIVER__FOR_LATER_PLUGIN:BOOL=ON + + Declares that a driver will be later built as a plugin. + + Setting this option to drivers not ready for it will lead to an explicit + CMake error. + + + For some drivers, like netCDF (only case at time of writing), the dataset + identification code embedded in libgdal, will depend on optional capabilities + of the dependent library (libnetcdf) + In that situation, it is desirable that the dependent library is available at + CMake configuration time for the core libgdal built, but disabled with + GDAL_USE_NETCDF=OFF. It must of course be re-enabled later when the plugin is + built. + + For example:: + + cmake .. -DGDAL_REGISTER_DRIVER_NETCDF_FOR_LATER_PLUGIN=ON -DGDAL_USE_NETCDF=OFF + cmake --build . + + cmake .. -DGDAL_USE_NETCDF=ON -DGDAL_ENABLE_DRIVER_NETCDF=ON -DGDAL_ENABLE_DRIVER_NETCDF_PLUGIN=ON + cmake --build . --target gdal_netCDF + + + For other drivers, GDAL_REGISTER_DRIVER__FOR_LATER_PLUGIN / + OGR_REGISTER_DRIVER__FOR_LATER_PLUGIN can be declared at + libgdal build time without requiring the dependent libraries needed to build + the pluging later to be available. + + +Documentation +------------- + +:ref:`raster_driver_tut` and :ref:`vector_driver_tut` will be updated to point +to this RFC. +:ref:`building_from_source` will receive the new paragraph mentionned above. + +Testing +------- + +A C++ test will be added testing that for one of the updated drivers, the +plugin is loaded in a deferred way in situations where this is expected, and +is not loaded in other situations. + +Related issues and PRs +---------------------- + +- https://github.com/OSGeo/gdal/compare/master...rouault:gdal:deferred_plugin?expand=1: candidate implementation + +Voting history +-------------- + +TBD From b7053db2f433275edf16286596e71f3ceb9738a5 Mon Sep 17 00:00:00 2001 From: Even Rouault Date: Mon, 6 Nov 2023 14:47:41 +0100 Subject: [PATCH 2/8] RFC 96: revise to avoid GDALPluginDriverFeatures --- .../rfc/rfc96_deferred_plugin_loading.rst | 187 +++++++++--------- 1 file changed, 94 insertions(+), 93 deletions(-) diff --git a/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst b/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst index f3bad9c479f3..a09964f82aea 100644 --- a/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst +++ b/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst @@ -74,9 +74,54 @@ Details ------- The main idea if that drivers using the new capability will register a proxy -driver with a new GDALDriverManager::DeclareDeferredPluginDriver() method. +driver (of type GDALPluginDriverProxy, or extending it) with a new +GDALDriverManager::DeclareDeferredPluginDriver() method. -The proxy driver uses the hints provided in the GDALPluginDriverFeatures argument +.. code-block:: cpp + + /** Proxy for a plugin driver. + * + * Such proxy must be registered with + * GDALDriverManager::DeclareDeferredPluginDriver(). + * + * If the real driver defines any of the following metadata items, the + * proxy driver should also define them with the same value: + *
    + *
  • GDAL_DMD_LONGNAME
  • + *
  • GDAL_DMD_EXTENSIONS
  • + *
  • GDAL_DMD_EXTENSION
  • + *
  • GDAL_DCAP_RASTER
  • + *
  • GDAL_DCAP_MULTIDIM_RASTER
  • + *
  • GDAL_DCAP_VECTOR
  • + *
  • GDAL_DCAP_GNM
  • + *
  • GDAL_DMD_OPENOPTIONLIST
  • + *
  • GDAL_DMD_SUBDATASETS
  • + *
  • GDAL_DCAP_MULTIPLE_VECTOR_LAYERS
  • + *
  • GDAL_DCAP_NONSPATIAL
  • + *
+ * + * The pfnIdentify and pfnGetSubdatasetInfoFunc callbacks, if they are + * defined in the real driver, should also be set on the proxy driver. + * + * Furthermore, the following metadata items must be defined if the real + * driver sets the corresponding callback: + *
    + *
  • GDAL_DCAP_OPEN: must be set to YES if the real driver defines pfnOpen
  • + *
  • GDAL_DCAP_CREATE: must be set to YES if the real driver defines pfnCreate
  • + *
  • GDAL_DCAP_CREATE_MULTIDIMENSIONAL: must be set to YES if the real driver defines pfnCreateMultiDimensional
  • + *
  • GDAL_DCAP_CREATECOPY: must be set to YES if the real driver defines pfnCreateCopy
  • + *
+ * + * @since 3.9 + */ + class GDALPluginDriverProxy : public GDALDriver + { + public: + GDALPluginDriverProxy(const std::string &osPluginFileName); + } + + +The proxy driver uses the metadata items that have been set on it to declare a minimum set of capabilities (GDAL_DCAP_RASTER, GDAL_DCAP_MULTIDIM_RASTER, GDAL_DCAP_VECTOR, GDAL_DCAP_OPEN, etc.) to which it can answer directly, and which are the ones used by GDALOpen() to open a dataset. For other metadata items, @@ -87,95 +132,25 @@ it will fallback to loading the actual driver and forward the requests to it. /** Declare a driver that will be loaded as a plugin, when actually needed. * - * @param pszDriverName Driver name, such as returned by GetDescription() - * @param pszPluginFileName Plugin filename. e.g "ogr_Parquet.so" - * @param oFeatures Driver features + * @param poProxyDriver Plugin driver proxy * * @since 3.9 */ - void GDALDriverManager::DeclareDeferredPluginDriver( - const char *pszDriverName, - const char *pszPluginFileName, - const GDALPluginDriverFeatures &oFeatures); + void GDALDriverManager::DeclareDeferredPluginDriver(GDALPluginDriverProxy *poProxyDriver); -That method will also keep track of pszPluginFileName to avoid automatically +DeclareDeferredPluginDriver() method will also keep track of the plugin filename to avoid automatically loading it in the GDALDriverManager::AutoLoadDrivers() method (that method will only load out-of-tree drivers or in-tree drivers that have not been converted to use DeclareDeferredPluginDriver()). - -The GDALPluginDriverFeatures structure is defined below: - -.. code-block:: cpp - - /** Data structure used for plugin declaration. */ - struct CPL_DLL GDALPluginDriverFeatures - { - /** Identify method. Required for driver with open capabilities. - * If the method returns -1 (unknown), the underlying driver will be - * loaded by GDALDataset::Open(). - */ - int (*pfnIdentify)(GDALOpenInfo *) = nullptr; - - /** - * Returns a (possibly null) pointer to the Subdataset informational function - * from the subdataset file name. - */ - GDALSubdatasetInfo *(*pfnGetSubdatasetInfoFunc)(const char *pszFileName) = - nullptr; - - /** Long name. Must be equal to the value of - * GDAL_DMD_LONG_NAME on the underlying driver. */ - const char *pszLongName = nullptr; - - /** Extensions. Must be equal to the value of - * GDAL_DMD_EXTENSIONS on the underlying driver. */ - const char *pszExtensions = nullptr; - - /** Open option list. Must be equal to the value of - * GDAL_DMD_OPENOPTIONLIST on the underlying driver.*/ - const char *pszOpenOptionList = nullptr; - - /** Whether the driver exposes GDAL_DCAP_RASTER */ - bool bHasRasterCapabilities = false; - - /** Whether the driver exposes GDAL_DCAP_MULTIDIM_RASTER */ - bool bHasMultiDimRasterCapabilities = false; - - /** Whether the driver exposes GDAL_DCAP_VECTOR */ - bool bHasVectorCapabilities = false; - - /** Whether the driver exposes GDAL_DCAP_OPEN */ - bool bHasOpen = true; - - /** Whether the driver exposes GDAL_DCAP_CREATE */ - bool bHasCreate = false; - - /** Whether the driver exposes GDAL_DCAP_CREATE_MULTIDIMENSIONAL */ - bool bHasCreateMultiDimensional = false; - - /** Whether the driver exposes GDAL_DCAP_CREATE_COPY */ - bool bHasCreateCopy = false; - - /** Whether the driver exposes GDAL_DMD_SUBDATASETS */ - bool bHasSubdatasets = false; - - /** Whether the driver exposes GDAL_DCAP_MULTIPLE_VECTOR_LAYERS */ - bool bHasMultipleVectorLayers = false; - - /** Whether the driver exposes GDAL_DCAP_NONSPATIAL */ - bool bIsNonspatial = false; - }; - - -The main point is that they provide the Identify() method to the proxy driver. +The main point is that drivers set the Identify() method on the proxy driver. That Identify() method must be compiled in libgdal itself, and thus be defined in a C++ file that does not depend on any external library. Similarly for the GetSubdatasetInfoFunc() optional method. When loading the actual driver, the GDALPluginDriverProxy::GetRealDriver() -method will check that all information set in GDALPluginDriverFeatures is +method will check that all information set in its metadata is consistent with the actual metadata of the underlying driver, and will warn when there are differences. @@ -233,16 +208,17 @@ A typical :file:`mydrivercore.h`` header will declare the identify method: // Used by both DeclareDeferredFOOPlugin() and GDALRegisterFoo() constexpr const char* FOO_DRIVER_NAME = "FOO"; - constexpr const char* FOO_LONG_NAME = "The FOO format"; - constexpr const char* FOO_EXTENSIONS = "foo"; int CPL_DLL FOODatasetIdentify(GDALOpenInfo* poOpenInfo); + void CPL_DLL FOODriverSetCommonMetadata(GDALDriver *poDriver); And :file:`mydrivercore.cpp` will contain the implementation of the identify method, -as well as a DeclareDeferredXXXPlugin() method that will be called by -GDALAllRegister() when the driver is built as a plugin (the PLUGIN_FILENAME -macro is automatically set by the CMake scripts with the filename of the +a ``FOODriverSetCommonMetadata()`` method (with most of the content of the normal +driver registration method, except for function pointers such as pfnOpen, pfnCreate, +pfnCreateCopy or pfnCreateMultiDimensional), as well as a ``DeclareDeferredXXXPlugin()`` +method that will be called by GDALAllRegister() when the driver is built as a plugin +(the PLUGIN_FILENAME macro is automatically set by the CMake scripts with the filename of the plugin, e.g. "gdal_FOO.so"): .. code-block:: cpp @@ -253,6 +229,17 @@ plugin, e.g. "gdal_FOO.so"): memcmp(poOpenInfo->pabyHeader, "FOO", 3) == 0; } + // Called both by DeclareDeferredFOOPlugin() and GDALRegisterFoo() + void FOODriverSetCommonMetadata(GDALDriver* poDriver) + { + poDriver->SetDescription(FOO_DRIVER_NAME); + poDriver->SetMetadataItem(GDAL_DMD_LONGNAME, "The FOO format"); + poDriver->SetMetadataItem(GDAL_DCAP_RASTER, "YES"); + poDriver->SetMetadataItem(GDAL_DMD_EXTENSION, "foo"); + poDriver->pfnIdentify = FOODatasetIdentify; + poDriver->SetMetadataItem(GDAL_DCAP_OPEN, "YES"); // since the actual driver defines pfnOpen + } + #ifdef PLUGIN_FILENAME void DeclareDeferredFOOPlugin() { @@ -260,20 +247,34 @@ plugin, e.g. "gdal_FOO.so"): { return; } - GDALPluginDriverFeatures oFeatures; - oFeatures.pfnIdentify = FOODatasetIdentify; - oFeatures.pszLongName = FOO_LONG_NAME; - oFeatures.pszExtensions = FOO_EXTENSIONS; - oFeatures.bHasRasterCapabilities = true; - GetGDALDriverManager()->DeclareDeferredPluginDriver( - FOO_DRIVER_NAME, PLUGIN_FILENAME, oFeatures); + auto poDriver = new GDALPluginDriverProxy(PLUGIN_FILENAME); + FOODriverSetCommonMetadata(poDriver); + GetGDALDriverManager()->DeclareDeferredPluginDriver(poDriver); } #endif -The GDALRegisterFoo() method itself is just slightly modified to change -its pfnIdentify (and pfnGetSubdatasetInfoFunc) function pointers to the -function that has been moved to :file:`mydrivercore.cpp`. +The GDALRegisterFoo() method itself, which is defined in the plugin code, +calls ``FOODriverSetCommonMetadata``, +and defines the pfnOpen, pfnCreate, pfnCreateCopy, pfnCreateMultiDimensional +callbacks when they exist: + +.. code-block:: cpp + + void GDALRegisterFoo() + { + if (!GDAL_CHECK_VERSION(DRIVER_NAME)) + return; + + if (GDALGetDriverByName(DRIVER_NAME) != nullptr) + return; + + GDALDriver *poDriver = new GDALDriver(); + FOODriverSetCommonMetadata(poDriver); + poDriver->pfnOpen = FOODataset::Open; + GetGDALDriverManager()->RegisterDriver(poDriver); + } + The modified :file:`gdalallregister.cpp` file will look like: @@ -310,7 +311,7 @@ still be fully loaded at :cpp:func:`GDALAllRegister` time (or at One could imagine a further enhancement for out-of-tree plugins where they would be accompanied by a sidecar text file that would for example declare the -information of GDALPluginDriverFeatures, as well as a limited implementation +driver capabilities, as well as a limited implementation of the identify method as a regular expression. But that is out-of-scope of this RFC. From 492dd735bc2ee420b72c1b53cf45ec6e4448d94b Mon Sep 17 00:00:00 2001 From: Even Rouault Date: Tue, 7 Nov 2023 15:34:38 +0100 Subject: [PATCH 3/8] RFC96: add out-of-tree deferred loaded plugin capability --- .../rfc/rfc96_deferred_plugin_loading.rst | 31 ++++++++++++++----- 1 file changed, 24 insertions(+), 7 deletions(-) diff --git a/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst b/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst index a09964f82aea..c50eab764ab0 100644 --- a/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst +++ b/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst @@ -1,7 +1,7 @@ .. _rfc-96: ================================================================== -RFC 96: Deferred in-tree C++ plugin loading +RFC 96: Deferred C++ plugin loading ================================================================== ============== ============================================= @@ -15,11 +15,15 @@ Target: GDAL 3.9 Summary ------- -This RFC adds a mechanism to defer the loading of in-tree C++ plugin drivers to +This RFC adds a mechanism to defer the loading of C++ plugin drivers to the point where their executable code is actually needed, and converts a number of relevant drivers to use that mechanism. The aim is to allow for more modular GDAL builds, while improving the performance of plugin loading. +It mostly targets in-tree plugin drivers, but it also provides a way for +out-of-tree plugin drivers to benefit from deferred loading capabilities, +provided libgdal is built in a specific way + Context and motivation ---------------------- @@ -70,6 +74,10 @@ We want the new mechanism to be opt-in and fully backwards compatible: - to progressively convert existing in-tree drivers to use it. +- to provide the capability to out-of-tree drivers to optionally benefit from + the new capability, provided they build GDAL with the code needed to declared + a plugin proxy driver. + Details ------- @@ -300,15 +308,24 @@ The modified :file:`gdalallregister.cpp` file will look like: #endif } +Out-of-tree deferred loaded plugins ++++++++++++++++++++++++++++++++++++ + +Out-of-tree drivers can also benefit from the deferred loading capability, provided +libgdal is built with CMake variable(s) pointing to external code containing the +code for registering a proxy driver. + +This can be done with the following CMake option: + +.. option:: ADD_EXTERNAL_DEFERRED_PLUGIN_:FILEPATH=/path/to/some/file.cpp + +The pointed file must declare a ``void DeclareDeferred(void)`` +method with C linkage that takes care of creating a GDALPluginDriverProxy +instance and calling GDALDriverManager::DeclareDeferredPluginDriver() on it. Limitations ----------- -That mechanism only applies to in-tree plugins, since it requires a fraction -of the driver code to be embedded in libgdal. Out-of-tree plugins will -still be fully loaded at :cpp:func:`GDALAllRegister` time (or at -:cpp:func:`GDALDriverManager::LoadPlugin` time) - One could imagine a further enhancement for out-of-tree plugins where they would be accompanied by a sidecar text file that would for example declare the driver capabilities, as well as a limited implementation From e6fddcda6b4e55f0dab12e2b8fd8517a2b9c4230 Mon Sep 17 00:00:00 2001 From: Even Rouault Date: Wed, 8 Nov 2023 15:01:01 +0100 Subject: [PATCH 4/8] RFC96: mention another backward compatibility issue --- .../development/rfc/rfc96_deferred_plugin_loading.rst | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst b/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst index c50eab764ab0..4301047bf7c9 100644 --- a/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst +++ b/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst @@ -358,8 +358,15 @@ Backward compatibility Expected to be backward compatible for most practical purposes. -Pedantically, if external code would directly use the pfnOpen, pfnCreate, -pfnCreateCopy function pointers of a GDALDriver instance, it would see them +Drivers that would request a driver instance with GDALGetDriverByName() may +now get a GDALPluginDriverProxy instance instead of the "real" driver instance. +This is usually not an issue as few drivers subclass GDALDriver, but that issue +was hit on the PostGISRasterDriver that did subclass it. The solution was to +store the real PostGISRasterDriver instance when it is built in a global variable, +and use that global variable instead of the one returned by GDALGetDriverByName(). + +Another potential issue is that if external code would directly use the pfnOpen, pfnCreate, +pfnCreateCopy, etc. function pointers of a GDALDriver instance, it would see them null before the actual driver is loading, but direct access to those function pointers has never been documented (instead users should use GDALOpen(), GDALCreate(), GDALCreateCopy() etc), and is not expected to be From 0aeac9a92856841da0b79c108caf53cfbcec0788 Mon Sep 17 00:00:00 2001 From: Even Rouault Date: Sun, 12 Nov 2023 19:42:45 +0100 Subject: [PATCH 5/8] RFC96; update with custom installation message --- .../rfc/rfc96_deferred_plugin_loading.rst | 40 +++++++++++++++---- 1 file changed, 32 insertions(+), 8 deletions(-) diff --git a/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst b/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst index 4301047bf7c9..b7f9cabec821 100644 --- a/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst +++ b/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst @@ -391,9 +391,9 @@ all artifacts (libgdal.so and libparquet.so) at a single time, and dispatch them appropriately in libgdal and libgdal-arrow-parquet packages, rather than doing two builds. However, CondaForge builds support several libarrow versions, and produce thus different Arrow/Parquet plugins, so this approach would not be -practial. +practical. -To solve this, the following idea was implemented. Extract from the updated +To solve this, the following idea has been implemented. Extract from the updated :ref:`building_from_source` document:: Starting with GDAL 3.9, a number of in-tree drivers, that can be built as @@ -403,6 +403,29 @@ To solve this, the following idea was implemented. Extract from the updated For builds where libgdal and its plugins are built in a single operation, this is fully transparent to the user. + When a plugin driver is known of core libgdal, but not available as a plugin at + runtime, GDAL will inform the user that the plugin is not available, but could + be installed. It is possible to give more hints on how to install a plugin + by setting the following option: + + .. option:: GDAL_DRIVER__PLUGIN_INSTALLATION_MESSAGE:STRING + + .. option:: OGR_DRIVER__PLUGIN_INSTALLATION_MESSAGE:STRING + + Custom message to give a hint to the user how to install a missing plugin + + + For example, if doing a build with:: + + cmake .. -DOGR_DRIVER_PARQUET_PLUGIN_INSTALLATION_MESSAGE="You may install it with with 'conda install -c conda-forge libgdal-arrow-parquet'" + + and opening a Parquet file while the plugin is not installed will display the + follwing error:: + + $ ogrinfo poly.parquet + ERROR 4: `poly.parquet' not recognized as a supported file format. It could have been recognized by driver Parquet, but plugin ogr_Parquet.so is not available in your installation. You may install it with with 'conda install -c conda-forge libgdal-arrow-parquet' + + For more specific builds where libgdal would be first built, and then plugin drivers built in later incremental builds, this approach would not work, given that the core libgdal built initially would lack code needed to declare the @@ -425,15 +448,16 @@ To solve this, the following idea was implemented. Extract from the updated CMake error. - For some drivers, like netCDF (only case at time of writing), the dataset - identification code embedded in libgdal, will depend on optional capabilities - of the dependent library (libnetcdf) + For some drivers (ECW, HEIF, JP2KAK, JPEG, JPEGXL, KEA, LERC, MrSID, + MSSQLSpatial, netCDF, OpenJPEG, PDF, TileDB, WEBP), the metadata and/or dataset + identification code embedded on libgdal, will depend on optional capabilities + of the dependent library (e.g. libnetcdf for netCDF) In that situation, it is desirable that the dependent library is available at CMake configuration time for the core libgdal built, but disabled with - GDAL_USE_NETCDF=OFF. It must of course be re-enabled later when the plugin is + GDAL_USE_=OFF. It must of course be re-enabled later when the plugin is built. - For example:: + For example for netCDF:: cmake .. -DGDAL_REGISTER_DRIVER_NETCDF_FOR_LATER_PLUGIN=ON -DGDAL_USE_NETCDF=OFF cmake --build . @@ -465,7 +489,7 @@ is not loaded in other situations. Related issues and PRs ---------------------- -- https://github.com/OSGeo/gdal/compare/master...rouault:gdal:deferred_plugin?expand=1: candidate implementation +- https://github.com/OSGeo/gdal/pull/8695: candidate implementation Voting history -------------- From 123d30f57db53c2bc1cfd03b9a6ca3f234abaefb Mon Sep 17 00:00:00 2001 From: Even Rouault Date: Tue, 14 Nov 2023 20:16:01 +0100 Subject: [PATCH 6/8] RFC96 text: add GDAL_DMD_CONNECTION_PREFIX and GDAL_DCAP_VECTOR_TRANSLATE_FROM to the list of items that must be set on the proxy --- .../development/rfc/rfc96_deferred_plugin_loading.rst | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst b/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst index b7f9cabec821..f8b237bdffae 100644 --- a/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst +++ b/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst @@ -98,14 +98,16 @@ GDALDriverManager::DeclareDeferredPluginDriver() method. *
  • GDAL_DMD_LONGNAME
  • *
  • GDAL_DMD_EXTENSIONS
  • *
  • GDAL_DMD_EXTENSION
  • + *
  • GDAL_DMD_OPENOPTIONLIST
  • + *
  • GDAL_DMD_SUBDATASETS
  • + *
  • GDAL_DMD_CONNECTION_PREFIX
  • *
  • GDAL_DCAP_RASTER
  • *
  • GDAL_DCAP_MULTIDIM_RASTER
  • *
  • GDAL_DCAP_VECTOR
  • *
  • GDAL_DCAP_GNM
  • - *
  • GDAL_DMD_OPENOPTIONLIST
  • - *
  • GDAL_DMD_SUBDATASETS
  • *
  • GDAL_DCAP_MULTIPLE_VECTOR_LAYERS
  • *
  • GDAL_DCAP_NONSPATIAL
  • + *
  • GDAL_DCAP_VECTOR_TRANSLATE_FROM
  • * * * The pfnIdentify and pfnGetSubdatasetInfoFunc callbacks, if they are From 50e2f7de03c464549108886b924a97bec4784ab2 Mon Sep 17 00:00:00 2001 From: Even Rouault Date: Wed, 15 Nov 2023 18:01:25 +0100 Subject: [PATCH 7/8] RFC96 text: add missing word --- doc/source/development/rfc/rfc96_deferred_plugin_loading.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst b/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst index f8b237bdffae..f161cf8a89e1 100644 --- a/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst +++ b/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst @@ -50,7 +50,7 @@ it takes ~ 300 ms to complete for a build with 126 plugins, which is a substanti time for short lived GDAL-based processes (for example a script which would run gdalinfo or ogrinfo on many files). This time is entirely spent in the dlopen() method of the operating system and there is, to the best of our knowledge, -nothing we can do reduce it... besides limiting the amount of dynamic loading +nothing we can do to reduce it... besides limiting the amount of dynamic loading (attempts have been made to load plugins in parallel in multiple threads, but this does not improve total loading time) For developers, that plugin loading phase is actually considerably slower, of From ce1f2038d41bd808022fc6bae9551c6bb2157443 Mon Sep 17 00:00:00 2001 From: Even Rouault Date: Fri, 17 Nov 2023 11:17:46 +0100 Subject: [PATCH 8/8] RFC96: update voting history and status --- doc/source/development/rfc/rfc96_deferred_plugin_loading.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst b/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst index f161cf8a89e1..244f86b09b06 100644 --- a/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst +++ b/doc/source/development/rfc/rfc96_deferred_plugin_loading.rst @@ -8,7 +8,7 @@ RFC 96: Deferred C++ plugin loading Author: Even Rouault Contact: even.rouault @ spatialys.com Started: 2023-Nov-01 -Status: In development +Status: Adopted Target: GDAL 3.9 ============== ============================================= @@ -496,4 +496,4 @@ Related issues and PRs Voting history -------------- -TBD ++1 from PSC members KurtS, HowardB, JukkaR, JavierJS and EvenR