Feature request: Retry opening datasets #144
Comments
Thanks for raising this @charalamm. Better error handling and tracking is certainly needed, see #101. It can be a little tricky to support consistently across Dask and direct loads though.

Right now, a major refactor of the loading code is taking place to support hyperspectral data sources. As part of that work we are adding an IO driver abstraction that allows users to bring their own loader, mostly to enable efficient access to data sources that rasterio/GDAL struggle with. Once completed, we should be in a much better position to experiment with various error handling approaches and to give library users more control over that aspect of things when they need it. Initially that would be implemented with various forms of callbacks into user code, to make a decision or to keep track of failures. As we develop a better understanding, we will provide non-code mechanisms, like your suggested regex-based matching.

My concern is with the rasterio/GDAL boundary. At least in the past, it was not always possible to bubble up GDAL errors into Python code without losing some fidelity in error reporting (just because you see an error printed to stderr doesn't mean Python has access to that same information in the exception object).

In the meantime, have you experimented with settings available within GDAL, things like |
Hello @Kirill888, thanks for your immediate response. Yes, unfortunately my network is not great. I have experimented with the GDAL environment variables, but I did not notice any difference. I think that is because the read status codes are 500, or GDAL cannot even connect, so |
Hello,
We are planning to use odc-stac for some analysis. We have the data on Azure and we are accessing it with the az:// prefix. In every analysis, some network errors occur while reading the files, which results in data missing from the final data structure. So far I have caught the following errors:
Do you think it would be useful to add a mechanism to retry reading on some errors? I think I can work on a PR if you are interested in this feature. Feel free to close this issue if you are not interested.
A possible approach?
Since some of these errors can be valid ones, it should be up to the user to decide whether they want to retry and on which errors. One option would be to allow the user to define a list of regexes or strings, which odc-stac can check to decide whether it should retry. One problem is that GDAL caches these errors, so it might be necessary to use CPL_VSIL_CURL_NON_CACHED.
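To make the regex idea concrete, here is a minimal sketch of what such a retry wrapper could look like. It is not part of odc-stac: the function name `retry_on_matching_error` and its parameters (`load`, `patterns`, `max_attempts`, `delay`) are hypothetical, and it assumes the error text GDAL reports actually ends up in the Python exception message, which may not always be the case.

```python
import re
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def retry_on_matching_error(
    load: Callable[[], T],
    patterns: Iterable[str],
    max_attempts: int = 3,
    delay: float = 0.0,
) -> T:
    """Call ``load`` and retry only when the error message matches a pattern.

    Hypothetical helper for illustration, not an odc-stac function.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    # User-supplied regexes decide which errors count as transient.
    compiled = [re.compile(p) for p in patterns]

    for attempt in range(1, max_attempts + 1):
        try:
            return load()
        except Exception as exc:
            message = str(exc)
            retryable = any(p.search(message) for p in compiled)
            # Errors that do not match, or the last attempt, propagate unchanged.
            if not retryable or attempt == max_attempts:
                raise
            # Simple linear backoff between attempts.
            time.sleep(delay * attempt)

    raise RuntimeError("unreachable")
```

A caller would wrap whatever function reads a single file, for example `retry_on_matching_error(read_one_file, [r"HTTP response code: 5\d\d"])`, where `read_one_file` and the pattern are placeholders. Errors that do not match any pattern are raised on the first attempt, so valid failures are never retried silently.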