Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement filtering mechanism in the Logging service #57547

Closed
mshustov opened this issue Feb 13, 2020 · 12 comments · Fixed by #91492
Closed

Implement filtering mechanism in the Logging service #57547

mshustov opened this issue Feb 13, 2020 · 12 comments · Fixed by #91492

Comments

@mshustov
Copy link
Contributor

Blocker for #13241
Once we move request/response logging to the new platform, we need to provide a way to censor sensitive data in the logs (e.g. authorization or cookie headers).
Evaluate how much work it would be to support a filtering mechanism compatible with elasticsearch logging settings https://logging.apache.org/log4j/2.x/manual/filters.html

@mshustov mshustov added blocker Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Feature:New Platform labels Feb 13, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-platform (Team:Platform)

@pgayvallet
Copy link
Contributor

pgayvallet commented Feb 14, 2020

  • In l4j, that kind of filter is usually based on informations present in the MDC. We would not be able to rely on such mechanism. Do we have anything better than applying regexp for that kind of filter then?

  • Do we want to filter (remove) the whole log message when it contains sensitive data, or do we just want to remove/obstruct the sensitive data in the log message but still log it.

@joshdover
Copy link
Contributor

Do we want to filter (remove) the whole log message when it contains sensitive data, or do we just want to remove/obstruct the sensitive data in the log message but still log it.

I strongly prefer we just filter the sensitive data out but still log the message.

@mshustov
Copy link
Contributor Author

mshustov commented Feb 20, 2020

In l4j, that kind of filter is usually based on informations present in the MDC. We would not be able to relies on such mechanism. Do we have anything better than applying regexp for that kind of filter then?

LP relies on Hapi log output for request/response and filters data in JSON before formatting them. We will control the output format for request/response logging when #13241 lands. We can log additional data as MetaData and apply the filter to metadata in JSON format.

Do we want to filter (remove) the whole log message when it contains sensitive data, or do we just want to remove/obstruct the sensitive data in the log message but still log it.

I strongly prefer we just filter the sensitive data out but still log the message.

The current implementation allows only 2 operations:

  • remove a property
  • censor a string property (the full string or via regexp)

Example https://github.com/elastic/kibana/blob/8e9a8a84dccfa7965ce8a22362885e6cdef8b51f/src/legacy/server/logging/apply_filters_to_keys.js

@mshustov mshustov removed the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label Feb 26, 2020
@joshdover joshdover mentioned this issue Mar 17, 2020
30 tasks
@lukeelmers lukeelmers self-assigned this Dec 16, 2020
@lukeelmers
Copy link
Member

lukeelmers commented Jan 11, 2021

A few initial thoughts:

The current implementation allows only 2 operations:

  • remove a property
  • censor a string property (the full string or via regexp)

As @restrry outlines above, legacy platform isn't providing too much for us here: with the current filters config, we can basically just remove a property, or censor via regex.

The concept of filters in log4j actually sounds a bit different from the way we've used "filters" in LP logging; they basically serve to determine whether a log entry should be included in its entirety, or thrown out:

  • each filter returns an ACCEPT, DENY or NEUTRAL value for a provided log record
  • it allows you to configure filters at multiple levels (context, loggers, appenders, appender references)
  • there are lots of different types of filters besides regex, allowing you to do everything from rate-limiting to executing custom scripts

However, AFAICT what you can't do with log4j filters alone is modify existing log messages. For that they provide a RewriteAppender, which is sort of a proxy appender that modifies a log entry based on configuration before passing it to a "destination" appender.

RewriteAppenders and Filters do have some overlap (you can optionally provide a Filter in a RewriteAppender configuration), but if our primary goal is feature parity with LP, I don't think we necessarily need to introduce the concept of Filters at all -- that feels like an entirely different feature. Rather, we'd need to create our own RewriteAppender.

This would let us continue to do things like "censoring" a log message by redacting particular headers, performing string replacements in a log message, adding new data to a message, or upgrading/downgrading a log's level.

I think the main question would be how to make the config for this as simple as possible. The logic for the actions that can be performed by a RewriteAppender reside in a RewritePolicy. In our case, it seems some type of "MetaRewritePolicy" may be all we need at first.

The config could potentially look something like this:

logging:
  appenders:
    file:
      kind: file
      path: ./kibana.log
      layout:
        kind: json
    censor:
      kind: rewrite
      appenders: [file] # the destination appender where this is sent after modification
      policy:
        kind: meta # name of the policy
        mode: update # or "add" or "remove" etc
        # Need to think about naming here. log4j calls this KeyValuePair,
        # but in some cases ("remove") you may not need a value.
        property:
          key: "headers.authorization" # path within the log meta object
          value: "[redacted]"

  root:
    appenders: [censor, default]
    level: debug

The legacy approach seems simpler from a configuration standpoint, but is also less powerful and of course not as aligned with log4j 2. Would be interested to get some feedback on what feels like a logical first step to take.

cc @restrry @joshdover @pgayvallet

@joshdover
Copy link
Contributor

In terms of keeping this in line with log4j, I think a rewrite appender makes sense 👍 To simplify things a bit, we could omit the kind: meta and mode: update keys from the rewrite policy and only introduce them if we need these features in the future. For the properties, maybe this shape makes more sense:

properties:
  - key: "headers.authorization"
    value: "[redacted]"

That way multiple properties could be on the same policy, and we could still easily support just key for the remove use case.

That said, I'm wondering if we need to support a configurable interface at all. We don't currently document the logging.filter configuration, so I don't believe we have to explicitly support it (?). We could possibly get away with hard-coding this logic in the BaseLogger class for the known keys (eg header.authorization) and only adding a true rewrite appender once we really need it. It would be great if we had some telemetry on the usage of this, but it doesn't appear that we do 🙁

@lukeelmers
Copy link
Member

We don't currently document the logging.filter configuration, so I don't believe we have to explicitly support it (?). We could possibly get away with hard-coding this logic in the BaseLogger class for the known keys (eg header.authorization) and only adding a true rewrite appender once we really need it.

Yeah this would certainly save some effort, although my takeaway from the discussion in #13241 was that we still want to provide the ability to do this.

FWIW I don't think it would be enormous task to create the simplest iteration of a rewrite appender as you describe above (with a policy that will only change properties in the meta). However if we don't need that functionality at all, we can easily just exclude the authorization/cookie headers in the course of #13241.

@mshustov
Copy link
Contributor Author

mshustov commented Jan 12, 2021

I don't think it would be enormous task to create the simplest iteration of a rewrite appender as you describe above (with a policy that will only change properties in the meta)

Probably we can implement this but keep it private as an escape hatch for debug purposes #13241 (comment)?

That way multiple properties could be on the same policy, and we could still easily support just key for the remove use case.

How would you add censored fields back? Add removal operation only if specified in the config?

@lukeelmers
Copy link
Member

How would you add censored fields back? Add removal operation only if specified in the config?

One option could be to remove properties if null is specified, otherwise do a replacement, which means you wouldn't necessarily need a "mode" initially (unless other policies were added later):

logging:
  appenders:
    censor:
      kind: rewrite
      appenders: [file] # the destination appender where this is sent after modification
      policy:
        kind: meta # name of the policy
        properties:
          - key: "headers.authorization" # path within the log meta object
            value: "[redacted]"
          - key: "headers.cookie"
            value: null # removes this property entirely

@lukeelmers
Copy link
Member

We don't currently document the logging.filter configuration, so I don't believe we have to explicitly support it (?)

To follow up on this: the cloud team has confirmed that the logging.filter config isn't exposed to users and also isn't used anywhere internally.

So if we decided not to address this at all and simply exclude Authorization and Cookie headers, the only people who would be affected are folks who have discovered this undocumented setting and have been relying on it. (And we could add a config deprecation to warn them it is going away)

Alternatively we could implement RewriteAppender for debugging purposes as @restrry suggests, and just leave it undocumented for the time being.

WDYT @restrry @joshdover? Have we seen any examples where having access to these headers would be useful from a debugging/support perspective?

@mshustov
Copy link
Contributor Author

@lukeelmers Security team might need it to debug problems with login. I'd suggest you timebox RewriteAppender implementation and move forward without it if you face any problems.

@lukeelmers
Copy link
Member

This will be closed by #91492 with one important caveat: All http response logs will redact authorization, cookie, and set-cookie headers by default, and this can not be overridden with the new RewriteAppender.

If we decide to pick up #92082 as a future enhancement, we'll be able to lift this restriction at that time and folks will be able to explicitly disable the censoring of those headers should they choose to do so.

With the exception of the 3 headers listed above, the new appender will allow users to remove & modify any path in the LogMeta from any logger.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants