Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VSI plugins #1289

Merged
merged 8 commits into from
Feb 15, 2019
Merged

VSI plugins #1289

merged 8 commits into from
Feb 15, 2019

Conversation

tbonfort
Copy link
Member

This PR allows to dynamically register a VSI handler at runtime by passing a select number of callback functions. Our principle use-case is to be able to bypass the vsicurl handler when accessing remote datasets, to provide observability, custom error handling, retrying, logging, fine-grained concurrency control, fine-grained caching, and better stability under heavily multi-threaded loads.

gdal/port/cpl_vsi.h Outdated Show resolved Hide resolved
gdal/port/GNUmakefile Show resolved Hide resolved
gdal/port/cpl_vsil_plugin.cpp Outdated Show resolved Hide resolved
gdal/port/cpl_vsil_plugin.cpp Outdated Show resolved Hide resolved
@rcoup
Copy link
Member

rcoup commented Feb 12, 2019

@tbonfort possible to share some of that work so we can improve what is in GDAL core?

@tbonfort
Copy link
Member Author

tbonfort commented Feb 13, 2019

@rcoup I'm not able to share more code, but there really is nothing special to it. I should have explained a bit more that the use case is to call back into scripted language (python, go in our case), so you can easily add your own logging semantics, error handling, authentication, etc... i.e. nothing really applicable to the core gdal implementation.

Two take-backs that could be useful for vsicurl et. al. when dealing with COGs:

  • use a two-tiered block cache: one for the first block (supposing the blocksize is big enough to contain the full COG headers), and one for the remaining blocks. This can help lru evictions to not affect the header blocks, and the two caches can be sized based on expected usage.
  • implement a global lock on a block when downloading it, i.e. if a block is not present in the block-cache, ensure that there's a single GET to fetch it instead of having each concurrent thread do the same GET multiple times

@tbonfort
Copy link
Member Author

cc @sgillies , could this spark some interest for rasterio ?

@sgillies
Copy link
Contributor

@tbonfort I'm interested!

@rouault
Copy link
Member

rouault commented Feb 14, 2019

If anyone has concerns on the interface speak up. I'll merge this soon otherwise

@tbonfort
Copy link
Member Author

I'm wondering if it would make sense to add a way for the callbacks to explicitely return an error condition, without relying on it to correctly set errno and/or call CPLError.

Something in the lines of:

typedef struct {
  int code;
  char *message
} CallbackError;

typedef size_t (*VSIFilesystemPluginReadCallback) ( void *pFile,
               void *pBuffer, size_t nSize, size_t nCount , error *CallbackError);

@rouault
Copy link
Member

rouault commented Feb 14, 2019

What is the advantage of this compared to the callback implementation calling VSIError / CPLError ?

@tbonfort
Copy link
Member Author

I was thinking it could be easier for the e.g. python code to set an error message, but I'm fine if that is handled by the C callback function itself.

@rouault
Copy link
Member

rouault commented Feb 14, 2019

I see that the behavior of joining consecutive ranges can be disabled with a config option in the vsicurl implementation, but fail to see when that could be desirable.

Yes that's what I mentionned. In some rare circumstances it might help not to merge consecutive ranges, but this is not a good default

Maybe this part could be moved up a bit higher in the vsi call tree?

There's not so many places where that could be done, except in the VSIFReadMultiRangeL() function itself, that will be the one generally called by drivers. The VSICurlHandle::ReadMultiRange() implementation probably saves a buffer allocation/copy compared to your generic implementation though.

@tbonfort
Copy link
Member Author

Lets move the discussion about multi range requests to #1295

@rouault
Copy link
Member

rouault commented Feb 15, 2019

@tbonfort Are you happy with the PR as it ?

@tbonfort
Copy link
Member Author

@rouault ready for merge, can I let you squash this through the github interface to avoid re-running the CI?

@rouault rouault merged commit dd6412b into OSGeo:master Feb 15, 2019
@rouault
Copy link
Member

rouault commented Feb 15, 2019

Done

craigds pushed a commit to koordinates/gdal that referenced this pull request Dec 16, 2019
craigds pushed a commit to koordinates/gdal that referenced this pull request Jan 27, 2021
@mmomtchev
Copy link
Contributor

@rouault, @tbonfort, is there any documentation for this or an example plugin available somewhere?

@tbonfort
Copy link
Member Author

@mmomtchev there are no public implementations that I know of, and the only documentation available is the doxygen one. There is however a similar implementation that does not use this plugin but replicates its intent at https://github.com/airbusgeo/godal/blob/ee62c71eebf81d25ef07983ada2e4ba6ea7baaf7/godal.cpp#L1102 and is used to hand off vsi read-only access to a go function.

@sgillies
Copy link
Contributor

@mmomtchev @tbonfort there's a recent example in rasterio/rasterio#2141.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants