How to properly use Opentelemetry in Google Cloud Functions #1739

Open
1 of 2 tasks
sk- opened this issue Dec 10, 2020 · 2 comments

Comments

@sk-

sk- commented Dec 10, 2020

  • This only affects the JavaScript OpenTelemetry library
  • This may affect other libraries, but I would like to get opinions here first

We have been using OpenTelemetry in Firebase Functions for a while now. However, there's still one issue that is quite problematic.

On Google Cloud Functions (Firebase Functions), the instance is not killed once the request ends (globals are still available). However, if code then tries to use resources such as the network, an error is raised and the instance is terminated, which means losing the cache and paying for a cold start on the next request.

I saw #1296 added forceFlush and shutdown, but I'm under the impression that neither is really a great fit for Google Cloud Functions.

shutdown is ruled out, as subsequent requests would then not generate any traces.

forceFlush may work, at the expense of adding extra latency to the endpoint. However, I think it's still not guaranteed that nothing will trigger a trace after the flush.
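For illustration, here is a rough sketch of the force-flush-per-request approach. It is not taken from the OpenTelemetry codebase: `Flushable` is a local stand-in for anything exposing `forceFlush(): Promise<void>` (as the SDK's span processors do), and `withFlush`, `timeoutMs`, and `fakeProcessor` are hypothetical names made up for the example.

```typescript
// Local stand-in (assumption) for anything that can flush buffered spans,
// so the sketch runs without the OpenTelemetry packages installed.
interface Flushable {
  forceFlush(): Promise<void>;
}

type Handler<Req, Res> = (req: Req) => Promise<Res>;

// Hypothetical helper: runs the handler, then flushes before the result is
// returned to the platform, even when the handler throws.
function withFlush<Req, Res>(
  flushable: Flushable,
  handler: Handler<Req, Res>,
  timeoutMs = 500,
): Handler<Req, Res> {
  return async (req: Req): Promise<Res> => {
    try {
      return await handler(req);
    } finally {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<void>((resolve) => {
        timer = setTimeout(resolve, timeoutMs);
      });
      try {
        // Bound the added latency: stop waiting for the flush after timeoutMs.
        await Promise.race([flushable.forceFlush(), timeout]);
      } catch {
        // A failed export should not fail the request itself.
      } finally {
        if (timer !== undefined) clearTimeout(timer);
      }
    }
  };
}

// Fake processor used only to make the sketch runnable.
const fakeProcessor = {
  flushed: 0,
  async forceFlush(): Promise<void> {
    this.flushed += 1;
  },
};

const handler = withFlush(fakeProcessor, async (name: string) => `hello ${name}`);
```

Note that the timeout only limits how long the request waits. If the flush is still in flight when the timeout fires, it can still touch the network after the function returns, which is exactly the failure mode described above, so the trade-off between latency and safety remains.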

One alternative would be a pause/unpause mechanism: at the end of the request we pause OpenTelemetry so that it makes no network calls, then resume processing when the next request starts, although that could miss some spans at the beginning of a new request. This could alternatively be done by registering an isPaused or isActive function.
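To make the pause/unpause idea concrete, here is a minimal sketch of a wrapper around a span processor. Nothing here exists in OpenTelemetry today: `PausableProcessor`, `SpanLike`, `SpanProcessorLike`, and `maxHeld` are hypothetical names, and the interfaces are cut down to what the example needs.

```typescript
// Local stand-ins (assumptions) for the SDK's span and span-processor shapes.
interface SpanLike {
  name: string;
}

interface SpanProcessorLike {
  onEnd(span: SpanLike): void;
  forceFlush(): Promise<void>;
}

// Hypothetical wrapper: while paused, ended spans are held in memory instead
// of being handed to the delegate, so this layer triggers no export work.
class PausableProcessor implements SpanProcessorLike {
  private readonly delegate: SpanProcessorLike;
  private readonly maxHeld: number;
  private paused = false;
  private held: SpanLike[] = [];

  constructor(delegate: SpanProcessorLike, maxHeld = 100) {
    this.delegate = delegate;
    this.maxHeld = maxHeld;
  }

  pause(): void {
    this.paused = true;
  }

  // Replays held spans to the delegate, in the order they ended.
  resume(): void {
    this.paused = false;
    for (const span of this.held.splice(0)) {
      this.delegate.onEnd(span);
    }
  }

  isPaused(): boolean {
    return this.paused;
  }

  onEnd(span: SpanLike): void {
    if (!this.paused) {
      this.delegate.onEnd(span);
      return;
    }
    // Cap memory use: spans beyond maxHeld are dropped while paused.
    if (this.held.length < this.maxHeld) {
      this.held.push(span);
    }
  }

  forceFlush(): Promise<void> {
    // Flushing is a no-op while paused, to avoid network use.
    return this.paused ? Promise.resolve() : this.delegate.forceFlush();
  }
}
```

This version holds spans rather than dropping them, which is one possible answer to the missed-spans concern. It does not stop work the delegate has already scheduled (a batching processor's own export timer, for example), so it would probably need to be paired with a flush just before pausing.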

Another alternative would be to have a forceDiscard method that would be similar to shutdown, but leaving the state as _isShutdown = false instead, so that further requests can still be traced.
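A toy sketch of the difference between the proposed forceDiscard and shutdown, using a plain in-memory buffer rather than the real processor. `DiscardableBuffer` and its methods are hypothetical names for illustration only. A real batching processor would presumably also have to cancel any pending export timer.

```typescript
// Hypothetical in-memory buffer illustrating forceDiscard vs. shutdown.
class DiscardableBuffer<T> {
  private items: T[] = [];
  private isShutdown = false;

  // Returns false when the item was rejected because of a prior shutdown.
  add(item: T): boolean {
    if (this.isShutdown) return false;
    this.items.push(item);
    return true;
  }

  // Drops everything pending but leaves isShutdown = false, so items from
  // later requests are still accepted. Returns how many items were dropped.
  forceDiscard(): number {
    const dropped = this.items.length;
    this.items = [];
    return dropped;
  }

  // Drops everything pending and rejects all future items.
  shutdown(): void {
    this.items = [];
    this.isShutdown = true;
  }

  size(): number {
    return this.items.length;
  }
}
```

After forceDiscard the buffer keeps accepting items, whereas after shutdown every later add is rejected, which is the behaviour that rules shutdown out for reused instances.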

What do you guys think? Is there something I'm missing? Or do you have any recommendations on how to tackle this?

@dyladan
Member

dyladan commented Dec 10, 2020

That's a tricky one. I've not worked with Firebase, but in Lambda there is a similar problem. In Lambda, the runtime may (or may not) be suspended and there's no real way to know. Also, a suspended runtime may never wake again. Any async work prevents the function from returning, but also prevents the function from suspending. We decided to force flush on every call, even though it carries a slight performance penalty, in order to guarantee every span was exported.

I suppose one of the things you're worried about is that spans may be created after the function completes? If that's the case you may want to create a custom pausable tracer. I would be hesitant to introduce a pausable tracer to this repo without spec approval, but I don't see any reason an alternative tracer couldn't be added to the contrib repo. This sounds like the type of thing the spec would be very interested in though.

@Flarna
Member

Flarna commented Feb 10, 2023

I think this is more of a call to the cloud providers. If they provide lifecycle hooks, tools like OTel can be tuned for them.
