Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise lambda flush timeout to 10 seconds #2855

Merged
merged 1 commit into from
Apr 23, 2021
Merged

Raise lambda flush timeout to 10 seconds #2855

merged 1 commit into from
Apr 23, 2021

Conversation

anuraaga
Copy link
Contributor

@anuraaga anuraaga commented Apr 23, 2021

@kubawach Please check this out :)

I've been playing the the wrapper recently (much more than I want...) and it generally works quite well. However, I notice that with the current 1s timeout, it takes many requests before the JVM has warmed up enough for any traces to be sent, maybe around 5+. Raising it allows the first request to be traced. I don't know a great value for this timeout but wondering if you have any ideas on raising it so first request can also be traced? In particular, I'm worried about very low QPS functions - a customer may not care about cold start time and have a function that is always cold started. Such a function effectively can't be traced with the default timeout. Though it becomes a tradeoff between "defaults allow as many use cases as possible" to "majority use case may prefer the safety of a lower timeout on flush since they'll be warm anyways".

Edit: I'm also worried about people thinking tracing isn't working since requiring 5+ requests to get them is a lot.

@anuraaga
Copy link
Contributor Author

We could also consider pausing on this sort of change and implementing dynamic timeouts in the SDK

@ghost
Copy link

ghost commented Apr 23, 2021

@kubawach Please check this out :)

I've been playing the the wrapper recently (much more than I want...) and it generally works quite well. However, I notice that with the current 1s timeout, it takes many requests before the JVM has warmed up enough for any traces to be sent, maybe around 5+. Raising it allows the first request to be traced. I don't know a great value for this timeout but wondering if you have any ideas on raising it so first request can also be traced? In particular, I'm worried about very low QPS functions - a customer may not care about cold start time and have a function that is always cold started. Such a function effectively can't be traced with the default timeout. Though it becomes a tradeoff between "defaults allow as many use cases as possible" to "majority use case may prefer the safety of a lower timeout on flush since they'll be warm anyways".

Edit: I'm also worried about people thinking tracing isn't working since requiring 5+ requests to get them is a lot.

I think currently that's the only way. I've seen it too and always recommend setting this to a higher value. Weak safeguard here is that it's "up to X seconds", meaning it may happen but won't too often (hopefully). I believe our problems will be gone once we have an ability to delivery span async (run a collector as an extension without getting frozen). I remember we discussed this, so just saying aloud for the sake of the rest of the team :)

@trask trask merged commit 7410744 into main Apr 23, 2021
@trask trask deleted the anuraaga-patch-1 branch April 23, 2021 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants