TraceId, SpanId 0 in logs from exception handler for Exception class - FastAPI app #3477

@amit12297

Describe your environment
python = 3.10.12
fastapi = 0.92.0
uvicorn = 0.20.0
opentelemetry-distro = 0.41b0

Steps to reproduce
When we log from an exception handler in FastAPI, traceId and spanId come out as 0. Here is an example:

import logging

import uvicorn
from fastapi import FastAPI
from starlette.requests import Request
from starlette.responses import JSONResponse

logger = logging.getLogger("activity")


async def unhandled_exception_handler(request: Request, exc: Exception):
    logger.error("Log with 0 trace-id and span-id")
    return JSONResponse(content={"message": "Something went wrong"}, status_code=500)


app = FastAPI(exception_handlers={Exception: unhandled_exception_handler})


@app.get("/")
def read_root():
    logger.info("Log with valid trace-id and span-id")
    return {"Hello": "World"}


@app.get("/break")
def break_path():
    logger.info("Log with valid trace-id and span-id")
    1 / 0
    return {"Unreachable": "Code"}


if __name__ == "__main__":
    uvicorn.run(app=app, port=8080, host="127.0.0.1", reload=False)

I have created a GitHub repo to demonstrate the issue. Please check out https://github.com/amit12297/python-otel-log-issue and follow the README.

What is the expected behavior?
Actual (non-zero) traceId and spanId in logs from the exception handler

What is the actual behavior?
traceId=0 and spanId=0 in logs from exception handler

2023-10-17 00:45:22,334 INFO [activity] [main.py:29] [trace_id=8799f34465cd9905efcee0dce40633b4 span_id=32efef8c434b3a07 resource.service.name=log-issue-repro trace_sampled=True] - Log with valid trace-id and span-id
2023-10-17 00:45:22,339 ERROR [activity] [main.py:14] [trace_id=0 span_id=0 resource.service.name=log-issue-repro trace_sampled=False] - Log with 0 trace-id and span-id

Additional context
As you can see in the code above, I have registered an exception handler for the base Exception class:

exception_handlers={Exception: unhandled_exception_handler}

FastAPI checks whether a handler is registered for 500 or Exception; if so, it uses that handler for ServerErrorMiddleware.
Here is the relevant FastAPI code:

        for key, value in self.exception_handlers.items():
            if key in (500, Exception):
                error_handler = value
            else:
                exception_handlers[key] = value

        middleware = (
            [Middleware(ServerErrorMiddleware, handler=error_handler, debug=debug)]
            + self.user_middleware
            + [
                Middleware(
                    ExceptionMiddleware, handlers=exception_handlers, debug=debug
                ),
                # Add FastAPI-specific AsyncExitStackMiddleware for dependencies with
                # contextvars.
                # This needs to happen after user middlewares because those create a
                # new contextvars context copy by using a new AnyIO task group.
                # The initial part of dependencies with yield is executed in the
                # FastAPI code, inside all the middlewares, but the teardown part
                # (after yield) is executed in the AsyncExitStack in this middleware,
                # if the AsyncExitStack lived outside of the custom middlewares and
                # contextvars were set in a dependency with yield in that internal
                # contextvars context, the values would not be available in the
                # outside context of the AsyncExitStack.
                # By putting the middleware and the AsyncExitStack here, inside all
                # user middlewares, the code before and after yield in dependencies
                # with yield is executed in the same contextvars context, so all values
                # set in contextvars before yield is still available after yield as
                # would be expected.
                # Additionally, by having this AsyncExitStack here, after the
                # ExceptionMiddleware, now dependencies can catch handled exceptions,
                # e.g. HTTPException, to customize the teardown code (e.g. DB session
                # rollback).
                Middleware(AsyncExitStackMiddleware),
            ]
        )
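To make the effect of that loop concrete, here is a minimal standalone sketch of the same split. The names `split_handlers`, `on_any_error`, and `on_value_error` are hypothetical and exist only for illustration; they are not part of FastAPI.

```python
from typing import Any, Callable, Dict, Optional, Tuple, Union

Handler = Callable[..., Any]
HandlerKey = Union[int, type]


def split_handlers(
    handlers: Dict[HandlerKey, Handler],
) -> Tuple[Optional[Handler], Dict[HandlerKey, Handler]]:
    """Mimic the loop above: a handler keyed by 500 or Exception is pulled
    out for the error middleware; all other handlers stay together."""
    error_handler: Optional[Handler] = None
    inner: Dict[HandlerKey, Handler] = {}
    for key, value in handlers.items():
        if key in (500, Exception):
            error_handler = value
        else:
            inner[key] = value
    return error_handler, inner


# Hypothetical handlers, used only to show where each one ends up.
def on_any_error(request, exc):
    return "500 response"


def on_value_error(request, exc):
    return "400 response"


error_handler, inner = split_handlers(
    {Exception: on_any_error, ValueError: on_value_error}
)
```

In this sketch the handler registered for Exception is the only one that would be handed to ServerErrorMiddleware; the ValueError handler would go to ExceptionMiddleware.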

As far as I know, ServerErrorMiddleware is the outermost middleware.

When there is an error, the context is detached by OpenTelemetryMiddleware before control reaches the ServerErrorMiddleware exception handler that I registered. Hence, when the log is written, it does not get a traceId or spanId.
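The ordering problem can be modelled without OpenTelemetry at all. The sketch below is a simplified stand-in, assuming the tracing middleware sits inside the error middleware as described above: a plain `contextvars.ContextVar` plays the role of the active span, and all function names are hypothetical.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the active-span slot in the OpenTelemetry context.
current_trace_id = contextvars.ContextVar("current_trace_id", default="0")

seen = {}


async def app(scope):
    # Inside the traced region the value is visible.
    seen["route"] = current_trace_id.get()
    raise ZeroDivisionError("division by zero")


async def tracing_middleware(scope):
    token = current_trace_id.set("8799f344")
    try:
        await app(scope)
    finally:
        # Always detach, even when the app raised.
        current_trace_id.reset(token)


async def server_error_middleware(scope):
    try:
        await tracing_middleware(scope)
    except Exception:
        # The handler runs after the inner middleware has already detached.
        seen["handler"] = current_trace_id.get()


asyncio.run(server_error_middleware({"type": "http"}))
print(seen)  # {'route': '8799f344', 'handler': '0'}
```

The route sees the trace id, while the outer handler only sees the default value, which matches the two log lines shown earlier.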

I tested this out, and made changes in

opentelemetry/trace/__init__.py -> use_span function

@contextmanager
def use_span(
    span: Span,
    end_on_exit: bool = False,
    record_exception: bool = True,
    set_status_on_exception: bool = True,
) -> Iterator[Span]:
    """Takes a non-active span and activates it in the current context.

    Args:
        span: The span that should be activated in the current context.
        end_on_exit: Whether to end the span automatically when leaving the
            context manager scope.
        record_exception: Whether to record any exceptions raised within the
            context as error event on the span.
        set_status_on_exception: Only relevant if the returned span is used
            in a with/context manager. Defines whether the span status will
            be automatically set to ERROR when an uncaught exception is
            raised in the span with block. The span status won't be set by
            this mechanism if it was previously set manually.
    """
    try:
        excep = None
        token = context_api.attach(context_api.set_value(_SPAN_KEY, span))
        try:
            yield span
        except Exception as exc:
            excep = exc
            raise exc
        finally:
            # Local change: detach only when no exception was raised.
            if excep is None:
                context_api.detach(token)

    except Exception as exc:  # pylint: disable=broad-except
        if isinstance(span, Span) and span.is_recording():
            # Record the exception as an event
            if record_exception:
                span.record_exception(exc)

            # Set status in case exception was raised
            if set_status_on_exception:
                span.set_status(
                    Status(
                        status_code=StatusCode.ERROR,
                        description=f"{type(exc).__name__}: {exc}",
                    )
                )
        raise

    finally:
        if end_on_exit:
            span.end()

and
opentelemetry/instrumentation/asgi/__init__.py -> __call__ function

    async def __call__(self, scope, receive, send):
        """The ASGI application

        Args:
            scope: An ASGI environment.
            receive: An awaitable callable yielding dictionaries
            send: An awaitable callable taking a single dictionary as argument.
        """
        if scope["type"] not in ("http", "websocket"):
            return await self.app(scope, receive, send)

        _, _, url = get_host_port_url_tuple(scope)
        if self.excluded_urls and self.excluded_urls.url_disabled(url):
            return await self.app(scope, receive, send)

        span_name, additional_attributes = self.default_span_details(scope)

        span, token = _start_internal_or_server_span(
            tracer=self.tracer,
            span_name=span_name,
            start_time=None,
            context_carrier=scope,
            context_getter=asgi_getter,
        )
        attributes = collect_request_attributes(scope)
        attributes.update(additional_attributes)
        active_requests_count_attrs = _parse_active_request_count_attrs(
            attributes
        )
        duration_attrs = _parse_duration_attrs(attributes)

        if scope["type"] == "http":
            self.active_requests_counter.add(1, active_requests_count_attrs)
        try:
            excep = None
            with trace.use_span(span, end_on_exit=True) as current_span:
                if current_span.is_recording():
                    for key, value in attributes.items():
                        current_span.set_attribute(key, value)

                    if current_span.kind == trace.SpanKind.SERVER:
                        custom_attributes = (
                            collect_custom_request_headers_attributes(scope)
                        )
                        if len(custom_attributes) > 0:
                            current_span.set_attributes(custom_attributes)

                if callable(self.server_request_hook):
                    self.server_request_hook(current_span, scope)

                otel_receive = self._get_otel_receive(
                    span_name, scope, receive
                )

                otel_send = self._get_otel_send(
                    current_span,
                    span_name,
                    scope,
                    send,
                    duration_attrs,
                )
                start = default_timer()

                await self.app(scope, otel_receive, otel_send)
        except Exception as exc:
            excep = exc
            raise exc
        finally:
            if scope["type"] == "http":
                target = _collect_target_attribute(scope)
                if target:
                    duration_attrs[SpanAttributes.HTTP_TARGET] = target
                duration = max(round((default_timer() - start) * 1000), 0)
                self.duration_histogram.record(duration, duration_attrs)
                self.active_requests_counter.add(
                    -1, active_requests_count_attrs
                )
            if token:
                # Local change: detach only when no exception was raised.
                if excep is None:
                    context.detach(token)

on my local machine. Basically, I do not detach the context if there is an exception.
After these changes, I get the actual traceId and spanId.
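A minimal model of this "detach only on success" behaviour, using a plain `contextvars.ContextVar` as a hypothetical stand-in for the OpenTelemetry context (none of these names come from the library), might look like this:

```python
import asyncio
import contextvars

# Hypothetical stand-in for the active-span slot in the OpenTelemetry context.
current_trace_id = contextvars.ContextVar("current_trace_id", default="0")

seen = {}


async def failing_app(scope):
    raise ZeroDivisionError("division by zero")


async def tracing_middleware(scope):
    token = current_trace_id.set("8799f344")
    failed = False
    try:
        await failing_app(scope)
    except Exception:
        failed = True
        raise
    finally:
        # Detach only when the app did not raise.
        if not failed:
            current_trace_id.reset(token)


async def server_error_middleware(scope):
    try:
        await tracing_middleware(scope)
    except Exception:
        seen["handler"] = current_trace_id.get()
    # Nothing ever resets the value in this context after a failure.
    seen["after_request"] = current_trace_id.get()


asyncio.run(server_error_middleware({"type": "http"}))
print(seen)  # {'handler': '8799f344', 'after_request': '8799f344'}
```

In this model the handler now sees the trace id, but the value also stays attached for the rest of that context. Whether that matters in a real server depends on how contexts are scoped per request, which this sketch does not model.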

However, I am not sure whether the solution I have used is correct. I would appreciate input from the Python and OTel experts here.
