-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v6.8.0 and upwards lags when multiple AWS profiles exist in the environment #3044
Comments
Thanks for reporting and for sharing the logs @megbetts. We’ve also been investigating an issue that sounds similar in #2987, but have so far been unable to reproduce and root cause. The details you provided here already may help narrow this down. See in particular #2987 (comment), which matches what I see in the logs you provided here. If by any chance you could also capture a trace that includes upstream provider logs - that may be very helpful in pinning this down. (Do note that you may need to redact some account details before sharing this):
|
Thanks for reporting this @megbetts indeed. I'm going to go ahead with editing the suspect code to switch to a lock-free implementation, and add ability to turn it off, on the hypothesis that sync.Once locks things up further. In the meanwhile any more bits of information you can provide are super helpful, in particular I am wondering if:
Thank you for your patience. @lukehoban we have a few traces and log files in the context of #2987 from a different customer and I'm working with @jaxxstorm to make a better instrumented build of AWS provider. I'll post it here once available. In particular I'd love to land improved TF_LOG=trace capture and additional spans around Configure before trying to repro again. |
We (w/ @megbetts) tested 6.7.0 last week and found no performance regressions. The issue didn't start till 6.8.0. |
Hi @weirdion we're releasing v6.11.0-alpha.1 https://github.com/pulumi/pulumi-aws/actions/runs/7010825633 that should have a mitigation for the sync.Once change (allow an hour or so for automation to finish up propagating the artifacts). If you could try that version out that'd be a great help. While the above change is likely to mitigate the issue, we're still looking for a repro to understand and fix why Configure became slow. A minimal Pulumi program reproducing the issue would be surer helpful, absent that:
|
@t0yv0 I can test this first thing tomorrow morning, I see the release on pypi. Regarding the minimal Pulumi program, I believe @megbetts actually tested with a simple s3 bucket creation python script to isolate the issue - so nothing special about the program.
|
Thanks so much. If you can share that simple program that could help. I've got 4 profiles in ~/.aws/credentials but it does not seem to be affecting the speed of Configure/CheckConfig. I'm starting to think perhaps how you configure the AWS Provider is relevant for the repro here - are you using an explicit provider or configure it with |
OK, tried several times - the issue still persists in 6.11.0-alpha1.
"""A Python Pulumi program"""
import pulumi
import pulumi_aws
a = pulumi_aws.s3.Bucket("pewlewmibucket")
pulumi.export("a", a) with config config:
aws:region: us-west-2
aws:skipCredentialsValidation: true
aws:skipMetadataApiCheck: true
aws:skipRegionValidation: true
aws:skipRequestingAccountId: true
We are not configuring the provider in any specific way here, simply using it out of the box as you can see from the code above. |
Very interesting, thank you for cleaning up the logs! This program takes 14s for me but your logs indicate 8min slowdown 2x around provider configuration under SSO-enabled AWS_PROFILE. We'll need to try to repo with an assumed role under STS. |
Noticing that https://github.com/hashicorp/terraform-provider-aws/blob/816aa99a12ca13bf225d34a226e797f1494ce8ca/internal/conns/config.go#L132 call is suspect for slowing things down. Unfortunately it seems to remain a black box in the logs.
And skimming through upstream issues found hashicorp/aws-sdk-go-base#453 |
The logs helped to track this down. I feel confident it is due to this set of issues:
Notably, the rollback fix from upstream (hashicorp/terraform-provider-aws#34300) didn't take affect for us because something in our full set of dependencies was forcing a higher version than what the upstream was using (which had the perf regression). However, the fix in aws/aws-sdk-go-v2#2355 was just picked up in our provider in #3056 - which is about to be released. So most likely this will be addressed via that. But we may want to do a deeper review of all of the relevant dependencies. |
We released https://github.com/pulumi/pulumi-aws/releases/tag/v6.12.0 this morning, which includes the fix from aws/aws-sdk-go-v2#2355. We've also verified that all versions of So we'll closed this out as fixed with the latest release. If you continue to see any performance issue, please do reopen! |
Was able to test 6.12, and can confirm that the same deploy from my last comment (was ~16 minutes with 6.8.0+) is now back to ~30 seconds. Thank you everyone! |
What happened?
With v6.8.0 and upwards, out deployment pipeline has started experiencing massive delays with Pulumi.
We saw this change v6.7.0...v6.8.0#diff-34c57e622183cb0d8dd0d3f9eaa0861b3340120e9b2ad811bac7ac7be4cea4b1R619 which could be why.
Our pipeline has multiple AWS profiles loaded in the environment - could the new version of pulumi-aws be trying to validate all the credentials it finds before the run?
We tried setting the env var to skip credentials validation but we still ran into the delay.
Attached is a trace from a test deploy where we tried to deploy a single s3 bucket. We saw it stuck on
Marshaling property for RPC[config]: version={6.9.0}
and then onTerraform input __defaults = []interface {}{}
Example
console (4).txt
Output of
pulumi about
pulumi_aws: v6.9.0
Additional context
No response
Contributing
Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).
The text was updated successfully, but these errors were encountered: