Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Auto Mask Secrets #728

Open
jasonwadsworth opened this issue Apr 5, 2022 · 19 comments
Open

RFC: Auto Mask Secrets #728

jasonwadsworth opened this issue Apr 5, 2022 · 19 comments
Labels
need-customer-feedback Requires more customers feedback before making or revisiting a decision on-hold This item is on-hold and will be revisited in the future revisit-in-3-months Blocked issues/PRs that need to be revisited RFC Technical design documents related to a feature request

Comments

@jasonwadsworth
Copy link

Description of the feature request

Automatically mask secret or password values in extra data.

Problem statement

A best practice is to never log secrets or passwords. There are times when I have an object that might have one of these things in it. I'd like to be able to pass in the object without needing to pull out the secrets manually.

Summary of the feature

If turned on, any extra data passed into the logger would be looked at and secrets/passwords would be masked. Consider the following data.

{
  username: "my-username",
  password: "my-password"
}

The code would see a key that contains "password" (this could be a configurable regex with a default) and would replace it with something like "*****", resulting in the following extra data being logged.

{
  username: "mu-username",
  password: "*****"
}

No secrets in logs

Code examples

const logger = new Logger();

const objectWithASecret = {
  username: "my-username",
  password: "my-password"
};

logger.debug('Logging in', { objectWithASecret });  

Benefits for you and the wider AWS community

Safer logging.

Describe alternatives you've considered

Creating a function that does this for me. The downside to this is twofold. One, I have to include the function call everywhere, so if I forget it I risk writing something I shouldn't to the logs. Two, the devex is slightly worse because I have to do something like { extraData: cleanExtraData(extraData) } instead of just { extraData }

Additional context

Related issues, RFCs

@jasonwadsworth jasonwadsworth added the triage This item has not been triaged by a maintainer, please wait label Apr 5, 2022
@willfarrell
Copy link

willfarrell commented Apr 5, 2022

I agree that credentials should not be logged as per ASVS 7.1.1. There are other types of data that should not be logged as well based on who owns the data or where it is stored.

How would someone automate this?
a) Have a list of keys that should be hidden? English speaking developers might use password or token, but what about in other languages?
b) Detect secret like patterns? Secrets could look like other data?
c) Mix of both?

What are the performance implications need to be considered compared to other approaches?

In @middy/input-output-logger we opted for allowing a developer to list what paths to omit using omitPaths. In my experience secrets/PII is well known during development and not overwhelming to manually list per handler.

I believe GitHub Actions has this functionality built in, they might have a package that could be used. Git secrets scanner may also be a source of ideas for this (https://github.com/awslabs/git-secrets).

@dreamorosi
Copy link
Contributor

Hi @jasonwadsworth thanks a lot for the feature request.

We have had a similar feature on the radar for all the runtimes, although it seems this one has a more reduced scope. You can find more details in this other issue aws-powertools/powertools-lambda-python#1173

I'm 100% on board and in favor of the concept of not logging secrets and leaving the responsibility to the developer seems like a sensible choice.

At the moment though, we want to make sure to make the base feature set of the core utilities as stable and mature as possible before considering new features. After going GA and having collected enough feedback from customers we'll definitely consider this type of request.

@dreamorosi dreamorosi added RFC Technical design documents related to a feature request on-hold This item is on-hold and will be revisited in the future feature labels Apr 5, 2022
@jasonwadsworth
Copy link
Author

I searched for something like this and didn't find it. Thanks for the link to the roadmap. Also completely understand on timing. This was just something that came up today so I thought I'd add it.

This is one case where I'd like to make it easier for devs to not make mistakes, so it would be nice to not leave it to the devs. Having it at the configuration point of creating the logger would be great, since that can be done in a single place.

@damianesteban
Copy link

I like this feature. I suggest that the user can define which keys to be filtered instead of having defaults.

@RajivSah
Copy link

+1

@dreamorosi dreamorosi added need-customer-feedback Requires more customers feedback before making or revisiting a decision and removed feature triage This item has not been triaged by a maintainer, please wait labels Nov 13, 2022
@dreamorosi dreamorosi changed the title Feature (logger): Auto Mask Secrets RFC Auto Mask Secrets Nov 14, 2022
@dreamorosi dreamorosi changed the title RFC Auto Mask Secrets RFC: Auto Mask Secrets Nov 14, 2022
@sthulb sthulb moved this from Backlog to Ideas in Powertools for AWS Lambda (TypeScript) Jun 19, 2023
@wehnerjp
Copy link

Hey guys, I'm interested in contributing. I was thinking about allowing for a maskedValues array on logger creation and then under createAndPopulateLogItem add a function that cross checks key value pairs with the array and masks the values that correspond. Would that work? Should be non-intrusive and self contained.

@wehnerjp
Copy link

@sthulb

@dreamorosi
Copy link
Contributor

dreamorosi commented Oct 19, 2023

Hi @wehnerjp, thank you for the offering to contribute to the feature and project and sorry for the slight delay.

I saw your message yesterday and I wanted to have the time to formulate a proper response.

Adding a Data Masking utility is something that we are strongly considering doing, however we'd like to do so with a bigger scope than just obfuscating, which is what is currently being implemented in the Python version of Powertools.

Obfuscation in itself is not necessarily complex, however after talking with other customers and security experts we have found that the next natural step after obfuscation would be to allow for recovering the sensitive data that was masked. To do this however we can't simply find & replace all the sensitive values with ***. For the values to be rehydrated, we need to use a deterministic approach that can be reversed (aka encryption) - see more details in aws-powertools/powertools-lambda-python#1173.

Since this issue was created, work on the Python side of Powertools has continued and the feature has gone through a RFC (aws-powertools/powertools-lambda-python#1858), which eventually coalesced around an implementation that uses KMS and an Encryption SDK, to encrypt the sensitive values.

Later, the team has worked on an initial implementation (aws-powertools/powertools-lambda-python#2197) using the specification agreed in the RFC, which was merged 3 weeks ago.

As stated in our public roadmap, one of our main focuses as a project is to eventually reach feature parity with Powertools for AWS Lambda (Python), so an eventual implementation of a Data Masking utility in this project will look more or less the same to the one they just merged in terms of feature set and overall scope.

At the moment we are focusing on getting version 2 out (#1714) and two new utilities: Parser (#1334) & Validation (#508), however I'm more than open to start discussing and refining this RFC if you or anyone else in the thread/community is interested in contributing it.
The goal would be to study the RFC and implementation in Python and come up with a similar RFC here that incorporates the same scope & features, but following idioms and conventions from this project and the wider ecosystem (Node.js / TS).

If instead you're only interested in obfuscating values and are not interested in encrypting them and/or being able to retrieve them, then you'll be able to do this with the existing feature set of Logger real soon.

As mentioned, we are currently working on our next major version which will make the Logger and its LogFormatter more customizable by allowing you to fully customize the log output.

At the moment the LogFormatter in v1 only gives you access to the standard keys managed by the Logger (i.e. level, message, timestamp, etc.), however this is going to change soon.

With the new version, you'll be able to create a custom LogFormatter like the one below, and obfuscate any value you want:

import { Logger, LogFormatter, LogItem } from "@aws-lambda-powertools/logger";
import type {
  LogAttributes,
  UnformattedAttributes,
} from "@aws-lambda-powertools/logger/types";

class MyFormatter extends LogFormatter {
  #keysToObfuscate: string[];

  public constructor({ keysToObfuscate }: { keysToObfuscate: string[] }) {
    super();
    this.#keysToObfuscate =
      keysToObfuscate && keysToObfuscate.length > 0 ? keysToObfuscate : [];
  }

  public formatAttributes(
    attributes: UnformattedAttributes,
    additionalLogAttributes: LogAttributes
  ): LogItem {
    return new LogItem({
      attributes: {
        ...this.obfuscate(attributes),
        ...this.obfuscate(additionalLogAttributes),
        timestamp: this.formatTimestamp(attributes.timestamp),
      },
    });
  }

  /**
   * Obfuscate the keys in the object that match the keysToObfuscate array.
   * If the object is nested, recursively obfuscate the keys.
   * 
   * @note this is a sample & naïve implementation - it can probably be optimized
   */
  obfuscate(object: Record<string, unknown>) {
    const obfuscatedObject: Record<string, unknown> = {};
    Object.keys(object).forEach((key) => {
      if (this.#keysToObfuscate.includes(key)) {
        obfuscatedObject[key] = "****";
      } else if (typeof object[key] === "object") {
        obfuscatedObject[key] = this.obfuscate(
          object[key] as Record<string, unknown>
        );
      } else {
        obfuscatedObject[key] = object[key];
      }
    });
    return obfuscatedObject;
  }
}

Which you can then use in your Logger like this:

const logger = new Logger({
  logLevel: "DEBUG",
  logFormatter: new MyFormatter({
    keysToObfuscate: ["my_secret_field", "my_other_secret_field"],
  }),
});

I have tested the above on a build made from the code currently in the feat/v2 branch and the log output generated looks like this:

{
    "logLevel": "DEBUG",
    "timestamp": "2023-10-19T21:13:04.584Z",
    "message": "this log might contain secrets",
    "serviceName": "service_undefined",
    "my_secret_field": "****",
    "some_other_field": "some other value",
    "a_nested_object": {
        "nested_field": "nested value",
        "my_secret_field": "****",
        "another_nested_object": {
            "my_other_secret_field": "****"
        }
    }
}

We are planning on releasing the first alpha version (which will have this feature) early next week.

@wehnerjp
Copy link

Wow, that's fantastic. Really excited about the v2 updates. Thank you so much for your thoughtful response. Great info and very helpful. I ended up just creating a wrapper class around Logger that overrode the methods with a little search and replace subroutine similar to the one you illustrated above before passing it down to the super class invocation. I'll look into the python power tools functionality you're talking about. If it seems in my wheelhouse I'd be happy to contribute.

@wehnerjp
Copy link

Not sure if this is the right place to continue the conversation, but I got a chance to look into the python power tools implementation. It looks like the data masking is a separate utility from the logger rather than integrated into it. Do we want to just provide an encryption/decryption utility that can be imported, but has to be invoked separately from the Logger with the TS implementation as well?

@dreamorosi
Copy link
Contributor

Hi @wehnerjp, yes this is the right place.

Ideally yes, we want to match the implementation and features in the Python version so it'd be a separate utility. This way customers can use it regardless of logging.

With that said, I don't think a deeper integration between the two is out of the question after we have released the first version of the new utility. In the meantime, customers can create a custom log formatter and apply the masking.

@wehnerjp
Copy link

Cool, I'll plan on just using this RFC then

@wehnerjp
Copy link

wehnerjp commented Nov 13, 2023

Hi guys,
I've been chatting with Andrea and I wanted to take the next step to move forward on implementing this RFC in Typescript

Design proposal (Request for comments)

1. Summary

The goal of this document is to propose the scope and design of a Data Masking utility for Powertools for TypeScript. The basis for the implementation of this utility stems from the Python Powertools repo. We will use the current Python implementation as a starting point, and focus mainly on the differences in Typescript implementation.

As this is a step towards parity with the Python utility it is necessary to familiarize yourself with their implementation of Data Masking. The RFC for the Python Data Masking utility and the most substantative feature commit can provide helpful context if you lack prior knowledge.

2. Motivation

We want to create a utility, that offers developers a straightforward way to enhance security within their serverless applications.

Data Masking is an important capability that will ease the out of the box adoption of powertools for customers hoping to leverage our provided utilities in a comprehensive and secure way.

Prior to this, a wrapper class was necessary for consumers who wanted to scrub sensitive data. While this is a viable solution, it is not ideal as it requires additional code and is not as intuitive as a native utility (and potentially as a decorator for other native utilities like logger).

3. Utility interface

With this utility, customers only need to initialize a data masking instance and invoke it providing 3 parameters:

  • A value or object,
  • A list of fields to act on
  • Action to take (ex. encrypt, decrypt, mask)

Initially, there is only two instances of the data masking utility we'll aim to support:

  1. Data masking via decorator (at handler or function level). Can mask incoming event data and outgoing return data.

    • I want to note here that we will be using stage 2 decorators because stage 3 aren't supported by esbuild yet. You can find more information on this here. Stage 2 Documentation can be found here.
  2. Data masking via direct method invocation. (at any point within the handler function)

3.0.1. Mock Implementation of the Data Masking Class dataMaskFilter method:

export class DataMasking {

/**
 * Decorator function for applying data masking to a class or method.
 * It wraps the original method with data masking logic based on the provided options.
 *
 * @param options - Configuration options for data masking.
 * @returns A decorator function to be applied to a class or method.
 */
public dataMaskFilter = function (
  options: DataMaskingFunctionOptions<Parameters<AnyFunction>>
): (
  target: unknown,
  propertyKey: string,
  descriptor: PropertyDescriptor
) => PropertyDescriptor {
  return function (
    _target: unknown,
    _propertyKey: string,
    descriptor: PropertyDescriptor
  ) {
    const original = descriptor.value;
    // Wrap the original method with data masking logic
    descriptor.value = function (...args: any[]) {
      // Apply data masking to the arguments
      let maskedArgs = this.filterData(options, ...args);

      // Call the original method with masked arguments
      const result = original.call(this, ...maskedArgs);
      return result;
    };

    return descriptor;
  };
};


public mask(){
  /*...Masking Logic Here...*/
}
}

3.1.1. Data Masking handler (via Decorator)

In this section, we demonstrate the integration of the Data Masking utility into an AWS Lambda function handler using a decorator. Using decorators offers several benefits to end users, primarily enhancing code modularity, maintainability, and reducing boilerplate. The decorator encapsulates the cross-cutting concern of data masking, by abstracting away implementation details. This allows end users to focus on core functionality and maintain standardized approach to data masking.

class Lambda implements LambdaInterface {
  /**
   * Handler method for AWS Lambda functions, decorated with data masking.
   * It demonstrates the usage of the @masker.dataMaskFilter decorator to apply data masking to specific fields in the input data.
   *
   * @param _event - The input event object.
   * @param _context - The execution context.
   * @returns A Promise<void> representing the completion of the function.
   */
  @masker.dataMaskFilter({
    fieldsToMask: ['password', 'ssn'],
    defaultAction: 'mask',
  })
  public async handler(_event: any, _context: any): Promise<void> {
    // Original function logic here...
  }
}

3.2.1. Data Masking Direct Invocation

Here end users can reference our instance and manipulate the data within the handler directly. This approach offers flexibility for runtime customization and is particularly useful when the exact point of data masking application needs to be determined dynamically within the handler function. Users can easily store the output to other variables (like environment variables), or use some part of the output in a further AWS resource invocation. This allows end users to tailor their utilization of the package at any point in the runtime to a specific usecase.

// Create an instance of the DataMasking utility
const masker = new DataMasking();

class Lambda implements LambdaInterface {
  /**
   * Handler method for AWS Lambda functions with direct data masking invocation.
   *
   * @param _event - The input event object.
   * @param _context - The execution context.
   * @returns A Promise<void> representing the completion of the function.
   */
  public async handler(_event: any, _context: any): Promise<void> {
    // Original function logic here...

    // Sample data containing sensitive information
    const data = {
      project: 'powertools',
      sensitive: 'xxxxxxxxxx',
    };

    // Apply data masking to specific fields
    const masked = masker.mask(data, ['sensitive']);

    // Return the masked data or perform additional logic as needed
    return masked;
  }
}

4. Additional Features

These features may not be released as part of the first release, but should be considered for implementation soon thereafter.

1. Integration with other utilities

In the future, it's worth considering using the decorators implementation to allow one instantiated class instance (with a list of masked fields) to activate masking via an optional passed parameter in method props across different powertools utilities. For example:

import { LambdaInterface } from '@aws-lambda-powertools/interfaces';
import { Logger } from '@aws-lambda-powertools/logger';
import { DataMasking } from '@aws-lambda-powertools/data-masking';


class MyLambda implements LambdaInterface {
  public async handler(event: any, context: any) {
    /*...Function logic here...*/
    const log = new Logger({
    logFormatter: new MyLogFormatter(),
    serviceName: "mx-docs",
    // Anything passed into the info, warn, error methods would be parsed for fields and masked
    dataMasking: {
      fieldsToMask: ['password', 'ssn'],
      defaultAction: 'mask',
    },
  });
  }
}

This integration would streamline end users ability to leverage data masking in a variety of contexts within our powertools suite. Rather than a mandatory two step process for native powertools classes (either a wrapper class solution or executing a masking operation inline before passing data to another powertools utility), we could allow for a single step process where masking could be activated via one additonal param in the method invocation. This would mirror our users ability to leverage the decorator functionality in their own class methods.

2. Alternate ways to set fields to mask

We could potentially pull from a set process.env variable for a list of data masking fields too.

3. Add middy implementation

Similar to Logger

4. Design time Data Masking Decorator

Similar to 'idempotent' Decorator this will allow users to leverage the decorator to mask data in their own class methods. It won't require external instantiation

export const masked = function (
  options: DataMaskingFunctionOptions<Parameters<AnyFunction>>
): (
  target: unknown,
  propertyKey: string,
  descriptor: PropertyDescriptor
) => PropertyDescriptor {

  return function (
    _target: unknown,
    _propertyKey: string,
    descriptor: PropertyDescriptor
  ) {
    const childFunction = descriptor.value;
    descriptor.value = maskData(childFunction, options);

    return descriptor;
  };
};

5. Opened discussions and decisions

This is a list of important decision points from maintainer discussion. Subject to changes from comments from community.

Just a note, this data masking utility can be leveraged for logging in tandem with Cloudwatch Logs sensitive data protection. End users may find a combination more suitable than either individually and may not have awareness of one or both options.

1. References

Orignal RFC from from Python version
aws-powertools/powertools-lambda-python#1858

@venkatparkland
Copy link

venkatparkland commented May 23, 2024

When we can expect data masking utility (similar to the one created in Python aws-powertools/powertools-lambda-python#2197) in typescript?

@dreamorosi
Copy link
Contributor

Hi, for the time being we are focused on shipping the Event Handler utility as per our roadmap, which is our most requested feature by far.

This one is in the top 3 so we should be able to start working on it after that unless customer demand changes.

At the moment I don't have a concrete date to share, but I wouldn't expect work on this feature to start before end of the year / beginning of 2025.

@venkatparkland
Copy link

Thanks for the quick response. Where I can track/check the status/progress of this feature?

@dreamorosi
Copy link
Contributor

This issue is the best place as of now - once we have updates I'll post them here.

For added context: the resolution of this issue (Request for comments) is the pre-requisite for implementation work to start.

The next step here, besides prioritizing the feature and allocating resources, will be to complete the specification & DX and ideally have a discussion. Then, once we settle on a spec, we'll start the actual implementation.

@dreamorosi
Copy link
Contributor

Just wanted to provide an update.

First of all, I want to thank @wehnerjp for all the work put in refining the RFC. Second, I want to acknowledge that due to resource constraints we haven't been able to follow up on this new feature, nor give the RFC the attention needed.

Currently we are focused on delivering on other features as described in our roadmap.

This is however a feature that we are keeping in mind for next year. My ask to everyone in this thread, or who might find the issue after I wrote this comment would be to engage with the original post with a 👍 and consider leaving a comment with your use case. Having customer names attached to a feature helps us prioritize future backlog.

@wehnerjp
Copy link

wehnerjp commented Aug 5, 2024

Hey! No problem @dreamorosi , happy to help. I actually wrote a bunch of code for it, but the topic cooled so I have been focussing on some other personal projects in my free time. If this picks up traction again or you have anything other tickets in the roadmap you think I can help with feel free to ping me or tag me in the thread again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need-customer-feedback Requires more customers feedback before making or revisiting a decision on-hold This item is on-hold and will be revisited in the future revisit-in-3-months Blocked issues/PRs that need to be revisited RFC Technical design documents related to a feature request
Projects
Development

No branches or pull requests

7 participants