Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SIP-109] Global logs context #26019

Closed
eschutho opened this issue Nov 18, 2023 · 3 comments
Closed

[SIP-109] Global logs context #26019

eschutho opened this issue Nov 18, 2023 · 3 comments
Assignees
Labels
sip Superset Improvement Proposal

Comments

@eschutho
Copy link
Member

eschutho commented Nov 18, 2023

[SIP-109] Proposal for global logs context

Motivation

We have a few examples in the codebase where for logging purposes we need to pass data down to parts of the codebase that have no need to know about this data. It creates very coupled and bloated modules that often are just passing data down without needing or acting on it. This change proposes the use of a global context dictionary that is used for read-only purposes for logging.

There are three current examples where this solution would help to simplify the codebase. These examples show the existing method of passing data around through multiple modules and levels of the code. There is this PR that logs a report execution id to help tie together multiple logs to the same report execution.
Another example is this PR that sends data to an smtp event for logging in a mail server.
And the third use case is this PR that passes data into the sql mutator for logging. In all three cases, we have needed to pass extraneous information across many levels of the codebase to get it to where we need to do the logging.

Proposed Change

Using Flask's built in global class, create a context dictionary that holds references to items in that request.

To limit the scope of this dictionary, only resource ids should be stored in this context. When a module needs to log some data, it can check the context for any relevant resources and then look up the resource for more context if needed.

New or Changed Public Interfaces

A new decorator @context that when applied to a method, will store information that is passed in either from the function arguments or directly into the decorator. Some examples:

def logs_context(**ctx_kwargs: int | str | UUID | None) -> Callable[..., Any]:
    """
    Takes arguments and adds them to the global logs_context.
    This is for logging purposes only and values should not be relied on or mutated
    """

    def decorate(f: Callable[..., Any]) -> Callable[..., Any]:
        def wrapped(*args: Any, **kwargs: Any) -> Any:
            if not hasattr(g, "logs_context"):
                g.logs_context = {}
            available_logs_context_values = [
                "slice_id",
                "dashboard_id",
                "execution_id",
                "report_schedule_id",
            ]
            logs_context_data = {
                key: val
                for key, val in kwargs.items()
                if key in available_logs_context_values
            }

            # if values are passed in to decorator directly, add them to logs_context
            # by overriding values from kwargs
            for val in available_logs_context_values:
                if ctx_kwargs.get(val) is not None:
                    logs_context_data[val] = locals()[val]

            g.logs_context.update(logs_context_data)
            return f(*args, **kwargs)

        return wrapped

    return decorate

Usage:
Per the above example, only the values in the available_logs_context_values will be added to the global logs_context. For example:

@logs_context()
 def __init__(
        self,
        session: Session,
        report_schedule: ReportSchedule,
        scheduled_dttm: datetime,
        execution_id: UUID,
 ) 

will only add execution_id to the global logs_context.

A few rules:

  1. Only resource ids should be added to the global logs_context.
  2. None of these values should be relied upon and there is no guarantee that they exist. For example, there should always be a fallback when accessing a value such as g.logs_context["dashboard_id"] (don't do this).
  3. The values in the logs_context should not be augmented during the session. After they are added, they should be considered read-only attributes.
  4. These values should only be used for logging. This can be used by the logger, in the db comments, or sent to other systems to log, for example, but it should not be a way to send data that is required by another module for any other use. This logs_context data should also not be returned in any api response.

This first iteration of this change only takes arguments that are ids. If we wanted to expand this decorator to take entire objects like report_schedule in this example, and then extract and store the id, we could do that if needed, but for now we're able to get by with just the arguments that are already ids.

To add data that is in scope, but not an argument of the function, we can pass in items directly, such as:

@logs_context(slice_id=1, dashboard_id=2, execution_id=3)

If needed, the data could be added to the global logs_context without using the decorator, but I would advise against it. When extracting data from the global logs_context, one should be able to refer to the decorator's list of acceptable values available_logs_context_values as the de facto set of values that are available in the logs_context.

Here's an example PR that is a POC for this new logs_context.

New dependencies

None

Migration Plan and Compatibility

None

Rejected Alternatives

  1. The existing pattern which is to pass information down to where we need it for logging.
  2. The flask request object. This idea was rejected because there are cases where we need to log data from an async task such as Celery.
@eschutho eschutho added the sip Superset Improvement Proposal label Nov 18, 2023
@eschutho eschutho changed the title [SIP] Global logging context [SIP-102] Global logging context Nov 18, 2023
@eschutho eschutho changed the title [SIP-102] Global logging context [SIP-109] Global logging context Nov 18, 2023
@michael-s-molina
Copy link
Member

Thanks for the SIP @eschutho. I have some questions that may help improve its definition:

  1. Only resource ids should be added to the global context.

Can you detail the motivation behind this restriction? What problems do you perceive if this expands to full objects?

  1. The values in the context should not be augmented during the session. After they are added, they should be considered read-only attributes.

Can we programmatically enforce this restriction?

  1. These values should only be used for logging. This can be used by the logger, in the db comments, or sent to other systems to log, for example, but it should not be a way to send data that is required by another module for any other use. This context data should also not be returned in any api response.

Should we rename the decorator and Flask global class namespace to logging_context instead of context to make the intention more clear?

@eschutho
Copy link
Member Author

Thanks for the feedback @michael-s-molina, I was actually just going to update the SIP with a new name for the context to includes some more specifics about usage. (#4 comment)

My original thoughts on this:

Only resource ids should be added to the global context.
Can you detail the motivation behind this restriction? What problems do you perceive if this expands to full objects?

My intention is to clearly define the scope of the context and to keep the contents very limited in order to dissuade users from making it a catch-all that can be misused. Obviously storing data in a global variable is not standard practice, so if we decide to store data here, I think we should be very explicit about what can be stored in this variable so that not only it does not get bloated, but also so that others won't use this as a precedence to store other types of data in a global variable. To me, storing only the id and fetching further information from the metadata is the cleanest way to keep the scope of the data in this variable limited. Do you think that there are downsides to that approach? Having to fetch from metadata can be a downside, but in many cases, the data would have to be fetched before adding it to the context anyway.

The values in the context should not be augmented during the session. After they are added, they should be considered read-only attributes.
Can we programmatically enforce this restriction?

That's a great question. We could look to see if the data is already stored in the context and then not overwrite it. But it's not actually the practice of overwriting it that is the problem, so much as again the practice and intention of the context: it shouldn't be used as a way to store any kind of state that needs to be fetched or manipulated to be saved later in the metadata or returned or raised from an api. If we're only storing resource ids, though, that could likely also enforce that data isn't changed because it would be unlikely that the id of a resource would change in any state management.

@eschutho eschutho changed the title [SIP-109] Global logging context [SIP-109] Global logs context Nov 21, 2023
@rusackas rusackas moved this from Pre-discussion to [DISCUSS] thread opened in SIPs (Superset Improvement Proposals) Nov 21, 2023
@michael-s-molina
Copy link
Member

Thanks for the additional context @eschutho. If we implement the safe guards correctly, avoiding your concerns, then this a great improvement over the current prop drilling. My suggestions would be:

  • Renaming the decorator and Flask global class namespace to something more specific to help with setting the expectations for what can be stored in the context
  • Make the API for interacting with the context very restrictive / type safe.
  • Keep a well documented API to clearly instruct the developers about the context of the context 🥁 🤣

@eschutho eschutho moved this from [DISCUSS] thread opened to [VOTE] thread opened in SIPs (Superset Improvement Proposals) Dec 12, 2023
@eschutho eschutho moved this from [VOTE] thread opened to [RESULT][VOTE] Approved in SIPs (Superset Improvement Proposals) Dec 16, 2023
@eschutho eschutho self-assigned this Jan 3, 2024
@eschutho eschutho moved this from [RESULT][VOTE] Approved to In Development in SIPs (Superset Improvement Proposals) Jan 3, 2024
@eschutho eschutho moved this from In Development to Implemented / Done in SIPs (Superset Improvement Proposals) Mar 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sip Superset Improvement Proposal
Projects
Status: Implemented / Done
Development

No branches or pull requests

2 participants