Lambda support appears to be broken in v2.x #827
@Qard let's timebox some investigation on this and see if we get to an easy fix. |
Related to #798 |
Has there been any progress on this issue? I can submit our workaround as a PR, but I'm not sure that it's the best long-term solution. |
I was unable to reproduce it in my initial testing. I did do some restructuring recently to support async/await style though, which has not yet been released. That might impact it. |
Unfortunately it is still an issue; from my poking around, the cause is:
Our workaround is to track this error state and ensure that the transport is open before sending any transactions: status200@629cd9f. A larger restructure to track the lifecycle and state of the transport seems more appropriate, though, since this could also be an issue for clients running long enough to encounter network hiccups. |
I'm able to reproduce the ECONNRESET, but the client just reconnects and seems to send the new transaction just fine. I'm not getting the write after end error at all. Are you able to provide a code example that can reproduce the write after end? |
For what it's worth, I got it working by editing the code from the docs. I have Node 8.10 on Lambda. What the docs show, and what caused "APM Server transport error: Error: write after end":
What worked for me:
|
Sorry for the delay, @HuddleHouse is correct, the example from the docs still produces the issue. There are two methods I can see which stop this:

```js
// Pass promise to lambda
exports.handler = apm.lambda(async (payload, context, callback) => {
  callback(null, `Hello, ${payload.name}!`)
})
```

```js
// callbackWaitsForEmptyEventLoop
exports.handler = apm.lambda(function handler (payload, context, callback) {
  context.callbackWaitsForEmptyEventLoop = false;
  callback(null, `Hello, ${payload.name}!`)
})
```

Both of these seem to depend on Lambda's handling of the event loop, and I wouldn't like to count on this behaviour staying consistent over time. |
I still get this error when using the latest apm in a lambda. I start apm like this:

```js
exports.handler = apm.lambda(async (event, context, callback) => {
  ...
  callback(null, `Successfully processed ${items} events`);
})
```

2019-04-15T10:48:16.268Z 4f47d833-61f4-4af4-b824-cf46e5d0880d APM Server transport error: Error: write after end |
Yes, it's still a known issue. It's proving to be complicated to fix. Basically, when Lambda freezes the VM it disconnects any sockets still in use and, when the process wakes up again, it's unaware the sockets have been closed until after it tries to write to them. This results in the write after end error. |
I've been thinking that we need a way to either force-close the underlying TCP socket after each lambda function execution (though that would increase the function running-time overhead), or somehow ignore those errors when running in a lambda context (though that might hide some real errors). We should probably also update the docs to suggest disabling metrics when running in lambda, as metrics collection will keep trying to reopen the TCP socket - and metrics inside a lambda don't make much sense anyway. |
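For readers looking for the "disable metrics" suggestion in config form: assuming the elastic-apm-node `metricsInterval` option (where `'0s'` disables periodic metrics collection; verify against the current agent docs), it would look roughly like this. The service name is hypothetical.

```javascript
// Config sketch: disable periodic metrics so the agent doesn't keep
// reopening the TCP socket between invocations. `metricsInterval: '0s'`
// turns off metrics collection in elastic-apm-node.
const apm = require('elastic-apm-node').start({
  serviceName: 'my-lambda-service', // hypothetical name
  metricsInterval: '0s'
})
```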
Hi all, do NOT upgrade to or use Elastic Stack version 7.x, because elastic-apm-node version 1.x is not supported with APM Server version 7.x. If using elastic-apm-node in AWS Lambda, use Elastic Stack version < 6.5.x and elastic-apm-node version < 2.x. Martin |
Hi @watson,
We've tried disabling metrics by passing the We're running server Thank you so much for all the work here! |
@willbrazil Setting If your lambda function is only called once every 20 minutes, this might mean nothing or only very little data gets through. But if your lambda function is called several times a minute or more, 99+% of the collected data should get through as expected. |
Thank you for such a quick response @watson! Some of our lambdas are handling ~60 tps. We've reproduced it in a more controlled environment where we sent a request every second, and we still experience the issue unfortunately. We'll keep digging here. I know this is a tricky one, especially because it's hard to repro locally. Thank you again for all the info! |
Any thoughts on the wisdom of the approach New Relic took (CloudWatch log ingestion, basically) versus writing a transport class that writes the serialized message to EventBridge or SQS before the lambda returns, with a worker or another lambda picking up the serialized form and batch-sending it? I suspect the performance penalty of this versus CloudWatch, which is highly optimized, is non-trivial (awaiting even a publish to EB or SQS is probably much slower than a console.log, which likely exits quickly and gets handled later), and CloudWatch is probably cheaper than paying the EB event and/or SQS message tax. Anyone with more knowledge here care to confirm or disagree with my thoughts on the approach? We are considering writing a solution and would be happy to PR it/work on a collective effort if it makes sense. |
Sorry for my absence here :)

@Qard There's not much I could share; the only thing that isn't specific to our usage is calling

@24601 You're right, using anything other than the stdio stream to CloudWatch incurs the network tax, adding either extra latency to each lambda invoke or the possibility of dropped transactions. New Relic's solution was a pain to implement in an enterprise environment; they provide a Python script which sets everything up, which sounds like an alright idea until:
I think publishing an app to SAR would be the cleanest approach:
[0] https://aws.amazon.com/serverless/serverlessrepo/ |
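The queue-buffered transport discussed in the last two comments can be sketched independently of any AWS SDK. All names here are hypothetical and an in-memory array stands in for SQS/EventBridge: the handler only enqueues a serialized event before returning, and a separate worker drains and batch-sends.

```javascript
// In-memory stand-in for SQS/EventBridge. The handler enqueues a
// serialized event (cheap, done before returning); a separate worker
// drains the queue and batch-sends to APM Server.
const queue = []

function handler (event) {
  queue.push(JSON.stringify({ ts: Date.now(), event }))
  return 'ok'
}

// Worker side: pull up to batchSize messages and hand them to a sender.
function drainBatch (send, batchSize = 10) {
  const batch = queue.splice(0, batchSize)
  if (batch.length > 0) send(batch)
  return batch.length
}

handler({ name: 'a' })
handler({ name: 'b' })

const sent = []
console.log(drainBatch((batch) => sent.push(...batch))) // → 2
```

With a real queue service, the `queue.push` becomes a network publish, which is exactly the extra invoke latency weighed against CloudWatch ingestion above.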
@Qard @watson in case it helps with the prioritization, we are new users thus we have elastic v7 and thus it seems we need to use the latest APM. Most of our infrastructure is on lambda, so we it seems we reached a dead end here until APM supports lambda again. Do you have any date on when this will happen? |
@chemalopezp No timeline yet, but we currently have an active working group designing our future serverless support. Stay tuned! |
Thank you @basepi, that's great news. I'll keep an eye out for updates here ;) |
Similar to the |
@spectorar that would be great. Currently we are able to stream logs from |
@basepi do you have by chance any news regarding the AWS Lambda support for the APM agent? I can imagine the actual implementation might take much longer but it would be great to know that will eventually be supported in the near term. Thanks! |
No ETA, but we have an active working group finalizing the design for AWS. |
Hi @basepi ! Any updates regarding the APM support for AWS Lambda? Maybe some scope of work and estimated date, or at least if there was a decision to support it to some extent? Thank you very much! |
I think I can safely say we are planning to support it, but we don't have anything to share yet on roadmap timing. |
Hi @basepi I hope you don't mind bugging you again! Is there by chance any update about APM support for AWS Lambda? Thanks! |
No update at the moment. Sorry for the slow progress here, between summer vacations and some additional testing we still haven't settled on the design. |
Quick update here -- while we have a pretty good idea of the design, I don't know when the actual development work will reach the top of the roadmap. As a result, I still can't give any clear idea of when we'll support AWS lambda. @chemalopezp if you're still interested in this, would you mind opening an issue in https://github.com/elastic/apm/issues ? That repo is a good place for cross-agent feature requests and is used heavily for roadmapping and gauging community interest. Thanks! |
I'm closing this in favour of elastic/apm#352. |
See https://discuss.elastic.co/t/apm-server-transport-error-error-write-after-end/165484 for details