
GDAL does not refresh IAMRole creds on EC2 or ECS after 6 hours #1593

Closed
chris-bateman opened this issue May 28, 2019 · 8 comments

@chris-bateman

GDAL version 2.4.0
Using Python
Running on Amazon Linux 2 with Docker running Ubuntu 19.04
Reading and writing to S3.

After 6 hours GDAL fails to talk to S3 and the process eventually exits due to continuous failures.
The 6 hour limit appears to be built into ECS and EC2, despite the IAM role having its own session duration. This was confirmed by checking the token expiration on the instance.
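
(For reference, a minimal sketch of how the token expiration can be checked from inside the instance, assuming the IMDSv1 metadata endpoint is reachable; on ECS the container credential endpoint would be queried instead.)

```python
import json
import urllib.request

# IMDSv1 endpoint listing the role(s) attached to the EC2 instance.
METADATA = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"

role = urllib.request.urlopen(METADATA, timeout=2).read().decode().strip()

# The per-role document contains the temporary keys and their "Expiration".
creds = json.loads(urllib.request.urlopen(METADATA + role, timeout=2).read().decode())
print("Token expires at:", creds["Expiration"])
```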

Haven't been able to generate useful logs at this stage but running GDAL in debug mode now.
Also pulling ECS logs.
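
(A rough sketch of the debug settings being enabled, using GDAL's standard configuration options:)

```python
from osgeo import gdal

# Turn on CPL debug output and verbose curl traces so the failing S3
# requests (e.g. 403s on an expired token) show up in the logs.
gdal.SetConfigOption("CPL_DEBUG", "ON")
gdal.SetConfigOption("CPL_CURL_VERBOSE", "YES")
```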

Confirmed the issue is not present when using environment variables with AWS keys and no IAM role attached.
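
(A minimal sketch of that configuration; the key values and bucket path are placeholders. With static keys set as GDAL config options or environment variables, the EC2/ECS instance-role lookup is bypassed entirely.)

```python
from osgeo import gdal

# Static keys bypass the instance-role credential lookup; the values and
# the bucket path below are placeholders.
gdal.SetConfigOption("AWS_ACCESS_KEY_ID", "AKIA...")
gdal.SetConfigOption("AWS_SECRET_ACCESS_KEY", "...")

ds = gdal.Open("/vsis3/my-bucket/mosaic.vrt")
```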

Expected behavior: refresh the temporary AWS credentials token when required.

@adamsteer

The process which uncovered this behaviour uses GDAL's /vsis3/ driver to open warped virtual mosaics (VRTs) held on S3, which in turn reference imagery held on S3, and performs the delayed-compute warping and clipping specified by the VRTs. The process can take a while, and as Chris mentioned, the credentials time out.
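
(Roughly what that workflow looks like; the bucket names and bounds below are hypothetical.)

```python
from osgeo import gdal

# Open a warped virtual mosaic on S3; the VRT itself references source
# imagery that also lives on S3.
src = gdal.Open("/vsis3/imagery-bucket/mosaic.vrt")

# The warping/clipping described by the VRT is only evaluated when pixels
# are requested, so a large clip like this can keep reading from S3 for
# many hours through the same handle.
gdal.Warp("clip.tif", src, outputBounds=(140.0, -38.0, 141.0, -37.0))
```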

@rouault

rouault commented Jun 7, 2019

@chris-bateman @adamsteer Can you test rouault@68ef68a? (This is against master, but applies on top of 2.4 as well.) I believe this should fix your issue, but I haven't tested it.

@rouault

rouault commented Jun 12, 2019

@chris-bateman @adamsteer ping?

@ghost

ghost commented Jun 14, 2019

Thanks for the patch. The system went in another direction so it wasn't easy to test.

I will give it a try in the next few weeks on a dev system and let you know how I go.

rouault added a commit that referenced this issue Jun 19, 2019
/vsis3/: for a long living file handle, refresh credentials coming from EC2/IAM (fixes #1593)
@rouault

rouault commented Jun 19, 2019

OK, I've merged this and backported it to the 3.0 and 2.4 branches, as I think it should be safe, so that it can be included in the coming bugfix releases. Confirmation that it does fix the issue would still be great.
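
(One way to confirm it, sketched under assumed bucket/object names: keep a single /vsis3/ handle alive past the instance-role token lifetime and keep reading from it.)

```python
import time
from osgeo import gdal

# Hypothetical object; keep one long-lived /vsis3/ handle open and read
# from it well past the ~6 hour credential expiry.
ds = gdal.Open("/vsis3/imagery-bucket/large.tif")
band = ds.GetRasterBand(1)

for hour in range(8):
    time.sleep(3600)
    data = band.ReadRaster(0, 0, 256, 256)   # should still succeed after the refresh
    print("hour", hour + 1, ": read", len(data), "bytes")
```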

@rouault rouault added this to the 2.4.2 milestone Jun 20, 2019
@jonseymour

jonseymour commented Feb 12, 2020

Update: I now have reason to believe this fix is sound and the problem I am experiencing lies elsewhere. See the following update for more information and also https://lists.osgeo.org/pipermail/gdal-dev/2020-February/051719.html

My experience with this fix is as follows:

  • I installed my own build of GDAL 2.4.2 from the source tarball.
  • I still experienced issues with GDAL over VRT files hosted on AWS S3 failing after the container in which GDAL was running had been up longer than the AWS token expiry period (~6 hours).
  • I didn't experience the issue with physical TIFF files hosted on S3, only with VRT files across physical TIFF files hosted on S3.
  • I also experienced similar issues for some files even without the 6 hour delay, but not for all files of the failing type.
  • I could not reproduce the issues with the same code and the same files running in an environment that does not use IAM role-based authentication (e.g. one that uses a credentials profile).
  • The issues with these files disappeared when I injected AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables into the containers.

Now, I can't completely rule out that I have made a dumb error somewhere along the line, but I am reasonably sure I am running code derived from the gdal-2.4.2 source. If I can produce a standalone test case for the problem, I will raise a separate issue documenting it.

I am noting these issues here for the consideration of others who have this fix and are still experiencing similar issues.

@jonseymour

jonseymour commented Feb 24, 2020

A further update to the above. I have now experienced the same symptoms ("ERROR 4:") even when using the explicit AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables in my container (i.e. not using IAM roles). The error did seem to happen after a long pause in system usage, so it still seems to be related to some kind of timeout, but it doesn't seem to be explained by the expiry of IAM role credentials, since in theory I am not using them at the moment.

I am able to call gdal.VSICurlClearCache() inside my container, and when I did so the symptom disappeared and I was able to access the previously failing file successfully.
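
(A rough workaround sketch based on that observation; the path is hypothetical.)

```python
from osgeo import gdal

def open_with_cache_reset(path):
    """Retry an open once after clearing the /vsicurl/ cache state."""
    ds = gdal.Open(path)
    if ds is None:
        gdal.VSICurlClearCache()   # drops cached connections, headers and file regions
        ds = gdal.Open(path)
    return ds

ds = open_with_cache_reset("/vsis3/imagery-bucket/mosaic.vrt")
```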

So, in summary: it could well be that the fix is sound, but there is a second issue which causes similar symptoms even when IAM role credentials are not in use.

See also: https://lists.osgeo.org/pipermail/gdal-dev/2020-February/051719.html

@jonseymour

The symptoms that this problem produced are somewhat similar to #1244 although the cause is quite different.
