DynamoDB rate limit error without throttling on the table #1665
Comments
Setting KeepAlive to -1 didn't make any difference. Adding some pprof information: mem.prof (-raw) and cpu.prof.
I believe you may be confusing the TCP socket keepalive with the HTTP keep-alive; your code snippet sets the former, not the latter. See the similar thread in #1434 about this topic.
For your first issue, can you enable logging of service responses and retries? See the logging developer guide entry and the sketch below.
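A minimal sketch of enabling that logging with aws-sdk-go-v2, assuming the standard config.WithClientLogMode option; the DynamoDB client setup is illustrative:

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
)

func main() {
	// Log every retry attempt plus service responses, so retried or
	// rate-limited requests show up in the client output.
	cfg, err := config.LoadDefaultConfig(context.TODO(),
		config.WithClientLogMode(aws.LogRetries|aws.LogResponse),
	)
	if err != nil {
		log.Fatal(err)
	}
	_ = dynamodb.NewFromConfig(cfg)
}
```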
I've done a few additional tests with some interesting results:
Now, running against DynamoDB in eu-west-1
About DisableKeepAlives: you were right, this seems to do the trick. I played with DisableKeepAlives before and didn't see much difference, but trying again, this setting solves the problem at hand. That raises the question of why the client isn't better at managing its pool of existing connections. Disabling keep-alives helps, but it makes performance much worse (30% less TPS even against local DynamoDB), since each new item requires a reconnect, including the TLS handshake. In other AWS SDKs, the recommendation is to enable keep-alives.
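A sketch of disabling HTTP keep-alives through the SDK's buildable HTTP client, as described above; the transport tweak is the relevant part and the rest of the setup is illustrative:

```go
package main

import (
	"context"
	"log"
	"net/http"

	awshttp "github.com/aws/aws-sdk-go-v2/aws/transport/http"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
)

func main() {
	// Force a fresh connection (including TLS handshake) per request. This
	// worked around the rate-limit errors in the tests above, at a
	// significant throughput cost.
	httpClient := awshttp.NewBuildableClient().WithTransportOptions(func(tr *http.Transport) {
		tr.DisableKeepAlives = true
	})

	cfg, err := config.LoadDefaultConfig(context.TODO(), config.WithHTTPClient(httpClient))
	if err != nil {
		log.Fatal(err)
	}
	_ = dynamodb.NewFromConfig(cfg)
}
```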
OK, additional tests show that forcing the SDK client to use the HTTP endpoint avoids the crashes I kept seeing. This lets me enable keep-alives again and regain performance, but with the obvious risk of unencrypted traffic between my client and the server.
I'm seeing this same issue. This is fully reproducible in a unit test I have against a local DynamoDB.
Is there a workaround for this problem? Can this same problem be reproduced with the v1 Go SDK package?
I was experiencing a similar issue with the Kinesis client. Based on my investigation, this error is triggered by the client-side throttling implemented in the SDK's standard retryer. In my case, I was using the same client instance across a very large number of goroutines. As a result, when retries are attempted, all goroutines use the same token bucket and run out of tokens.
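A minimal sketch of what such a change could look like with aws-sdk-go-v2's standard retryer: swapping in a much larger retry token bucket. The 100000 value is illustrative, not the number from the original snippet, and the Kinesis client setup is an assumption.

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/aws/ratelimit"
	"github.com/aws/aws-sdk-go-v2/aws/retry"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/kinesis"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO(),
		config.WithRetryer(func() aws.Retryer {
			return retry.NewStandard(func(o *retry.StandardOptions) {
				// The default retry token bucket is shared by all retries on a
				// client, so heavy goroutine fan-out can drain it and surface
				// "retry quota exceeded" errors.
				o.RateLimiter = ratelimit.NewTokenRateLimit(100000)
			})
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
	_ = kinesis.NewFromConfig(cfg)
}
```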
I have the same problem. I'm going to test @buddhike's solution, and maybe after that disable it (or reduce the time from the current setting).
Hey guys! I had a somewhat similar problem with AWS SDK v1 and running PutItem in multiple goroutines. The solution was using a custom HTTPClient with a timeout. Something like (SDK v1):
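A rough sketch of that kind of SDK v1 setup; the original snippet isn't shown here, so the 30-second timeout and the region are placeholders:

```go
package main

import (
	"net/http"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

func newClient() *dynamodb.DynamoDB {
	sess := session.Must(session.NewSession(&aws.Config{
		Region: aws.String("eu-west-1"),
		// Bound every request (dial, TLS handshake, response) instead of
		// relying on Go's default http.Client, which has no timeout.
		HTTPClient: &http.Client{Timeout: 30 * time.Second},
	}))
	return dynamodb.New(sess)
}
```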
You can read some details here: https://medium.com/@nate510/don-t-use-go-s-default-http-client-4804cb19f779
@buddhike I tried it, and it resolved the throttling problem, but I think that number is too high, and every time a retry is required it creates a new limiter. Do you think that causes a performance issue? Is there any way to increase the token limit only when the error type is token throttling, rather than for all retries?
I am experiencing the same with Athena Workgroups using the AWS Terraform provider.
I've done some more investigations but haven't been able to publish them, as they are hard to repro and I haven't had enough time on my hands so far. I'll see if I can publish something clearer during the holidays, but as far as I've been able to determine, the issue is with the underlying Go crypto/tls package. I was able to repro the same issue even without using the AWS SDK, and so far all my analysis points to that. I'll try to write a bug report against Go once I get some time.
This behavior is expected, but I'd say we've clearly done a poor job of communicating how this works in our public documentation. This error surfaces from a client-side rate-limiting mechanism introduced in SDK v2 (it's part of our newer cross-SDK specification for retry behavior, which was meant to standardize how this works across SDKs). @buddhike's comment above is accurate at a high level. Fortunately, this is configurable, and @buddhike's solution is going to be functionally adequate for most in the absence of an explicit "off" switch. @yuseferi mentioned performance concerns, but there shouldn't be any: each operation still obeys the configured maximum number of attempts. I'm going to address two things here as follow-up.
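As a concrete example of the attempt count mentioned above, a sketch that lowers it via the config helper, assuming config.WithRetryMaxAttempts is available in the module versions in use:

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
)

func main() {
	// Fewer attempts per operation means fewer retry tokens consumed when
	// requests fail, which slows the drain on the shared token bucket.
	cfg, err := config.LoadDefaultConfig(context.TODO(),
		config.WithRetryMaxAttempts(2),
	)
	if err != nil {
		log.Fatal(err)
	}
	_ = dynamodb.NewFromConfig(cfg)
}
```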
This issue is now closed. Comments on closed issues are hard for our team to see.
Describe the bug
I'm trying to debug a problem where my application is crashing on leaked goroutines. While trying to root-cause the problem, I've found some issues that seem to be related to the SDK:
I'm launching 500 goroutines to do a PutItem (different PK on each). Only between 200 and 350 succeed (depending on the run), with the rest returning the rate-limit error quoted under Current Behavior below.
Interestingly, the DDB table is configured as On Demand, and CloudWatch doesn't show any throttling errors.
Also, when running goleak.VerifyNone(t), the library detects lots of leaked goroutines in different states in various parts of the http libraries. I've looked into #1434, but PutItem doesn't seem to have any response with io.ReadCloser, so I'm assuming these aren't related.

Expected Behavior

Current Behavior
operation error DynamoDB: PutItem, failed to get rate limit token, retry quota exceeded, 0 available, 5 requested
without any throttling evidenced in DynamoDB CloudWatch metrics

Reproduction Steps
The sample below includes commented code to customise the HTTP settings, which I used to see how that could change behaviour, but I couldn't find any settings that would solve the perceived problem.
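A minimal sketch of the scenario described, not the original sample: 500 concurrent PutItem calls against one shared client; the table and attribute names are placeholders.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"sync"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	client := dynamodb.NewFromConfig(cfg)

	var wg sync.WaitGroup
	for i := 0; i < 500; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_, err := client.PutItem(context.TODO(), &dynamodb.PutItemInput{
				TableName: aws.String("test-table"),
				Item: map[string]types.AttributeValue{
					"pk": &types.AttributeValueMemberS{Value: fmt.Sprintf("item-%d", i)},
				},
			})
			if err != nil {
				// A subset of these fail with the rate-limit token error quoted
				// under Current Behavior, with no server-side throttling visible.
				log.Printf("put %d: %v", i, err)
			}
		}(i)
	}
	wg.Wait()
}
```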
Possible Solution
No response
Additional Information/Context
The error is reproducible on both Linux and macOS.
AWS Go SDK version used
Go 1.18 (and Go 1.16)
Compiler and Version used
go version go1.18 darwin/amd64
Operating System and version
macOS 12.2.1 and Amazon Linux 2