Skip to content

RFC: Support thread-local keys for logger #991

Closed
@Dunedan

Description

@Dunedan

Key information

  • RFC PR:
  • Related issue(s), if known:
  • Area: Logger
  • Meet tenets: Yes
  • Approved by:
  • Reviewed by:

Summary

Provide the ability to append thread-local keys to a logger which only get included in log records logged in the thread they got added to.

Motivation

As AWS Lambda provides multiple CPU cores it makes sense for certain workloads to distribute work across multiple threads. Currently it's not possible to append keys to a logger on a per-thread basis. That however would be very helpful for logging thread-specific information.

structlog, another popular Python logger offering structured logging, supports such a feature under the term "thread-local contexts".

Proposal

In addition to manage globally scoped keys, Logger needs an interface to manage thread-local keys. This interface must consist of the ability to append keys, remove keys and clear all thread-local keys of the current thread. The ability to clear all keys of the current thread is helpful when using thread pools where threads are being reused, but shouldn't retain log keys of previous usages.

I can imagine two ways how this could work (but I'd be happy with another way as well): Additional methods for Logger for managing thread-local keys or the ability to get a "thread-aware" child logger which can then be used to configure thread-local keys.

Proposal 1: Additional methods

Using additional methods for the existing Logger class would require to somehow indicate that they relate to the current thread. So names like append_thread_local_keys/remove_thread_local_keys/clear_thread_local_keys could be used. Here is an example how that could look like:

import concurrent.futures

from aws_lambda_powertools import Logger

logger = Logger(service="payment")


def process_payments(payment_id):
    logger.clear_thread_local_keys()
    logger.append_thread_local_keys(payment_id=payment_id)
    logger.info("Start processing payment")
    ...
    logger.info("End processing payment")


with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(process_payments, payment_id) for payment_id in range(10)]
    for future in concurrent.futures.as_completed(futures):
        future.result()

Proposal 2: Child logger

Offering the ability to get a thread-local child logger from an existing logger would offer usage like that:

import concurrent.futures

from aws_lambda_powertools import Logger

logger = Logger(service="payment")


def process_payments(payment_id):
    child_logger = logger.get_thread_local_logger()
    child_logger.append_keys(payment_id=payment_id)
    child_logger.info("Start processing payment")
    ...
    child_logger.info("End processing payment")


with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(process_payments, payment_id) for payment_id in range(10)]
    for future in concurrent.futures.as_completed(futures):
        future.result()

For performance reasons when re-using threads it might be worth to consider to not create a new child logger for every call to Logger.get_thread_local_logger(), but instead to just to the clear keys of a possibly already existing child logger for the same thread.

One drawback of this proposal might be that it'd be cumbersome to use when the logic running in a thread is more complex and spans multiple functions, as the logger instance would need to be passed between these functions.

If this feature should be available in other runtimes (e.g. Java, Typescript), how would this look like to ensure consistency?

IMO it shouldn't be a problem to port that to the other runtimes in the same way as well.

User Experience

How would customers use it?

Without the need for thread-local keys, customers would us AWS Lambda Powertools' Logger the same way as they do right now.
With the need for thread-local keys, customers would call the respective methods to enhance records logged in a thread with additional useful information. A code example for that is shown above.

Any configuration or corner cases you'd expect?

There might be different expectations regarding the behavior when globally scoped keys get appended while a thread is running and its logger configuration has been already done. Will such global keys be included in records logged by the thread immediately or only after the logger configuration for the thread got re-initialized? I personally would prefer global keys to get included immediately into records logged by threads as well.

Demonstration of before and after on how the experience will be better

Before:

import concurrent.futures

from aws_lambda_powertools import Logger

logger = Logger(service="payment")


def process_payments(payment_id):
    logger.info("Start processing payment", extra={"payment_id": payment_id})
    ...
    logger.info("End processing payment", extra={"payment_id": payment_id})


with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(process_payments, payment_id) for payment_id in range(10)]
    for future in concurrent.futures.as_completed(futures):
        future.result()

After:

import concurrent.futures

from aws_lambda_powertools import Logger

logger = Logger(service="payment")


def process_payments(payment_id):
    logger.append_thread_local_keys(payment_id=payment_id)
    logger.info("Start processing payment")
    ...
    logger.info("End processing payment")


with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(process_payments, payment_id) for payment_id in range(10)]
    for future in concurrent.futures.as_completed(futures):
        future.result()

Drawbacks

The only drawbacks I can imagine right now are a more complex Logger interface and a possible performance impact when logging records, depending on the resulting implementation.

Rationale and alternatives

  • What other designs have been considered? Why not them?

The only other option I did consider was manually creating one logger per thread, but that feels cumbersome, because aside from the thread-local keys, it would need to be configured identical to the logger for the main thread.

  • What is the impact of not doing this?

Customers who wish to log thread-local keys have no straight-forward way available to do so and either have to employ workarounds like using the extra keyword to add the desired keys for each record or to use another logger.

Unresolved questions

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    Status

    Shipped

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions