GDAL does not refresh IAMRole creds on EC2 or ECS after 6 hours #1593
The process which uncovered this behaviour uses GDAL's vsis3 driver to open warped virtual mosaics held on S3, which in turn reference imagery held on S3, and performs the delayed-compute warping and clipping specified by the VRTs. The process can take a while, and as Chris mentioned, the credentials time out.
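A minimal sketch of that kind of workflow, assuming hypothetical bucket and object names: a VRT mosaic is opened through /vsis3/ and a warp/clip is run whose pixel reads are deferred, so the S3 file handles can stay alive for hours.

```python
from osgeo import gdal

gdal.UseExceptions()

# Hypothetical path; the VRT itself references further imagery on S3.
src = gdal.Open("/vsis3/my-bucket/mosaics/region.vrt")

# Delayed-compute warp and clip: pixels are only fetched and resampled
# as the output is written, which can keep the /vsis3/ handles open
# well past the lifetime of the IAM role credentials.
gdal.Warp(
    "/tmp/clip.tif",
    src,
    dstSRS="EPSG:3857",
    outputBounds=(16100000, -4200000, 16200000, -4100000),  # illustrative extent
)
```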
@chris-bateman @adamsteer Can you test rouault@68ef68a ? (this is against master, but applies on top of 2.4 as well). I believe this should fix your issue, but I haven't tested it.
@chris-bateman @adamsteer ping?
Thanks for the patch. The system went in another direction so it wasn't easy to test. I will give it a try in the next few weeks on a dev system and let you know how I go.
/vsis3/: for a long-living file handle, refresh credentials coming from EC2/IAM (fixes #1593)
OK, I've merged this and backported it to the 3.0 and 2.4 branches as I think it should be safe, so that it can be included in the coming bugfix releases. Confirmation that it does fix the issue would still be great.
Update: I now have reason to believe this fix is sound and that the problem I am experiencing lies elsewhere. See the following update for more info and also https://lists.osgeo.org/pipermail/gdal-dev/2020-February/051719.html My experience with this fix is as follows:
Now, I can't completely rule out that I have made a dumb error somewhere along the line, but I am reasonably sure I am running code derived from the gdal-2.4.2 source. If I can produce a standalone test case for the problem I will raise a separate issue documenting that. I am noting these issues here for consideration by others who have this fix and are still experiencing similar issues.
A further update to the above. I have now experienced the same "ERROR 4:" symptoms even when using the explicit AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables in my container (i.e. not using IAM roles). The error did seem to happen after a long pause in system usage, so it still seems to be related to some kind of timeout, but it doesn't seem to be explained by the IAM role credentials timing out, since in theory I am not using them currently. I am able to call gdal.VSICurlClearCache() inside my container, and when I did, the symptom disappeared and I was able to access the previously failing file successfully. So, in summary, it could well be that the fix is sound, but that there is a second issue which causes similar symptoms even when IAM role credentials are not in use. See also: https://lists.osgeo.org/pipermail/gdal-dev/2020-February/051719.html
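For reference, a sketch of the workaround described above, assuming a hypothetical dataset path: on a read failure, clear the VSI curl cache (which also drops cached connections and credentials) and retry once.

```python
from osgeo import gdal

gdal.UseExceptions()

def open_with_retry(path):
    """Open a /vsis3/ dataset, clearing the curl/credential cache on failure."""
    try:
        return gdal.Open(path)
    except RuntimeError:
        # Drops cached connections, byte ranges and credentials so the
        # next request re-authenticates from scratch.
        gdal.VSICurlClearCache()
        return gdal.Open(path)

ds = open_with_retry("/vsis3/my-bucket/imagery/scene.tif")  # hypothetical path
```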
The symptoms that this problem produced are somewhat similar to #1244, although the cause is quite different.
GDAL version 2.4.0
Using Python
Running on Amazon Linux 2 with Docker running Ubuntu 19.04
Reading and writing to S3.
After 6 hours GDAL fails to talk to S3, and the process eventually ends due to continuous failures.
The 6-hour limit appears to be built into ECS and EC2, despite the IAM role having its own session duration. Confirmed by checking the token expiration on the instance.
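As a rough illustration of that check: on a plain EC2 instance the expiry can be read from the instance metadata service (sketch below uses IMDSv1 for brevity; ECS tasks expose credentials through a different, task-level endpoint).

```python
import json
import urllib.request

# EC2 instance metadata service: list the attached role, then fetch its credentials.
base = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
role = urllib.request.urlopen(base, timeout=2).read().decode().strip()
creds = json.loads(urllib.request.urlopen(base + role, timeout=2).read())

# 'Expiration' shows when the temporary token stops working.
print(creds["Expiration"])
```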
Haven't been able to generate useful logs at this stage but running GDAL in debug mode now.
Also pulling ECS logs.
Confirmed the issue is not present when using environment variables with AWS keys and no IAM role attached.
Expected behavior: refresh the temporary AWS credentials token correctly when required.
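A rough reproduction sketch under the assumptions above (instance/task role credentials, no AWS key environment variables set, hypothetical bucket and object names): a long-running loop that keeps reading through the same /vsis3/ handle for more than six hours starts failing once the original token expires, where the expected behaviour is a transparent refresh.

```python
import time
from osgeo import gdal

gdal.UseExceptions()
gdal.SetConfigOption("CPL_DEBUG", "ON")  # surface /vsis3/ and AWS debug messages

path = "/vsis3/my-bucket/tiles/scene.tif"  # hypothetical object

ds = gdal.Open(path)
band = ds.GetRasterBand(1)

# Keep the same file handle alive well past the 6-hour credential lifetime.
for hour in range(8):
    time.sleep(3600)
    # Before the fix, these reads start failing once the cached
    # IAM role token has expired.
    _ = band.ReadRaster(0, 0, 256, 256)
    print(f"hour {hour + 1}: read OK")
```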