-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Auto Mask Secrets #728
Comments
I agree that credentials should not be logged as per ASVS 7.1.1. There are other types of data that should not be logged as well based on who owns the data or where it is stored. How would someone automate this? What are the performance implications need to be considered compared to other approaches? In I believe GitHub Actions has this functionality built in, they might have a package that could be used. Git secrets scanner may also be a source of ideas for this (https://github.com/awslabs/git-secrets). |
Hi @jasonwadsworth thanks a lot for the feature request. We have had a similar feature on the radar for all the runtimes, although it seems this one has a more reduced scope. You can find more details in this other issue aws-powertools/powertools-lambda-python#1173 I'm 100% on board and in favor of the concept of not logging secrets and leaving the responsibility to the developer seems like a sensible choice. At the moment though, we want to make sure to make the base feature set of the core utilities as stable and mature as possible before considering new features. After going GA and having collected enough feedback from customers we'll definitely consider this type of request. |
I searched for something like this and didn't find it. Thanks for the link to the roadmap. Also completely understand on timing. This was just something that came up today so I thought I'd add it. This is one case where I'd like to make it easier for devs to not make mistakes, so it would be nice to not leave it to the devs. Having it at the configuration point of creating the logger would be great, since that can be done in a single place. |
I like this feature. I suggest that the user can define which keys to be filtered instead of having defaults. |
+1 |
Hey guys, I'm interested in contributing. I was thinking about allowing for a maskedValues array on logger creation and then under |
Hi @wehnerjp, thank you for the offering to contribute to the feature and project and sorry for the slight delay. I saw your message yesterday and I wanted to have the time to formulate a proper response. Adding a Data Masking utility is something that we are strongly considering doing, however we'd like to do so with a bigger scope than just obfuscating, which is what is currently being implemented in the Python version of Powertools. Obfuscation in itself is not necessarily complex, however after talking with other customers and security experts we have found that the next natural step after obfuscation would be to allow for recovering the sensitive data that was masked. To do this however we can't simply find & replace all the sensitive values with Since this issue was created, work on the Python side of Powertools has continued and the feature has gone through a RFC (aws-powertools/powertools-lambda-python#1858), which eventually coalesced around an implementation that uses KMS and an Encryption SDK, to encrypt the sensitive values. Later, the team has worked on an initial implementation (aws-powertools/powertools-lambda-python#2197) using the specification agreed in the RFC, which was merged 3 weeks ago. As stated in our public roadmap, one of our main focuses as a project is to eventually reach feature parity with Powertools for AWS Lambda (Python), so an eventual implementation of a Data Masking utility in this project will look more or less the same to the one they just merged in terms of feature set and overall scope. At the moment we are focusing on getting version 2 out (#1714) and two new utilities: Parser (#1334) & Validation (#508), however I'm more than open to start discussing and refining this RFC if you or anyone else in the thread/community is interested in contributing it. If instead you're only interested in obfuscating values and are not interested in encrypting them and/or being able to retrieve them, then you'll be able to do this with the existing feature set of Logger real soon. As mentioned, we are currently working on our next major version which will make the Logger and its At the moment the With the new version, you'll be able to create a custom import { Logger, LogFormatter, LogItem } from "@aws-lambda-powertools/logger";
import type {
LogAttributes,
UnformattedAttributes,
} from "@aws-lambda-powertools/logger/types";
class MyFormatter extends LogFormatter {
#keysToObfuscate: string[];
public constructor({ keysToObfuscate }: { keysToObfuscate: string[] }) {
super();
this.#keysToObfuscate =
keysToObfuscate && keysToObfuscate.length > 0 ? keysToObfuscate : [];
}
public formatAttributes(
attributes: UnformattedAttributes,
additionalLogAttributes: LogAttributes
): LogItem {
return new LogItem({
attributes: {
...this.obfuscate(attributes),
...this.obfuscate(additionalLogAttributes),
timestamp: this.formatTimestamp(attributes.timestamp),
},
});
}
/**
* Obfuscate the keys in the object that match the keysToObfuscate array.
* If the object is nested, recursively obfuscate the keys.
*
* @note this is a sample & naïve implementation - it can probably be optimized
*/
obfuscate(object: Record<string, unknown>) {
const obfuscatedObject: Record<string, unknown> = {};
Object.keys(object).forEach((key) => {
if (this.#keysToObfuscate.includes(key)) {
obfuscatedObject[key] = "****";
} else if (typeof object[key] === "object") {
obfuscatedObject[key] = this.obfuscate(
object[key] as Record<string, unknown>
);
} else {
obfuscatedObject[key] = object[key];
}
});
return obfuscatedObject;
}
} Which you can then use in your Logger like this: const logger = new Logger({
logLevel: "DEBUG",
logFormatter: new MyFormatter({
keysToObfuscate: ["my_secret_field", "my_other_secret_field"],
}),
}); I have tested the above on a build made from the code currently in the {
"logLevel": "DEBUG",
"timestamp": "2023-10-19T21:13:04.584Z",
"message": "this log might contain secrets",
"serviceName": "service_undefined",
"my_secret_field": "****",
"some_other_field": "some other value",
"a_nested_object": {
"nested_field": "nested value",
"my_secret_field": "****",
"another_nested_object": {
"my_other_secret_field": "****"
}
}
} We are planning on releasing the first alpha version (which will have this feature) early next week. |
Wow, that's fantastic. Really excited about the v2 updates. Thank you so much for your thoughtful response. Great info and very helpful. I ended up just creating a wrapper class around Logger that overrode the methods with a little search and replace subroutine similar to the one you illustrated above before passing it down to the super class invocation. I'll look into the python power tools functionality you're talking about. If it seems in my wheelhouse I'd be happy to contribute. |
Not sure if this is the right place to continue the conversation, but I got a chance to look into the python power tools implementation. It looks like the data masking is a separate utility from the logger rather than integrated into it. Do we want to just provide an encryption/decryption utility that can be imported, but has to be invoked separately from the Logger with the TS implementation as well? |
Hi @wehnerjp, yes this is the right place. Ideally yes, we want to match the implementation and features in the Python version so it'd be a separate utility. This way customers can use it regardless of logging. With that said, I don't think a deeper integration between the two is out of the question after we have released the first version of the new utility. In the meantime, customers can create a custom log formatter and apply the masking. |
Cool, I'll plan on just using this RFC then |
Hi guys, Design proposal (Request for comments)1. SummaryThe goal of this document is to propose the scope and design of a Data Masking utility for Powertools for TypeScript. The basis for the implementation of this utility stems from the Python Powertools repo. We will use the current Python implementation as a starting point, and focus mainly on the differences in Typescript implementation. As this is a step towards parity with the Python utility it is necessary to familiarize yourself with their implementation of Data Masking. The RFC for the Python Data Masking utility and the most substantative feature commit can provide helpful context if you lack prior knowledge. 2. MotivationWe want to create a utility, that offers developers a straightforward way to enhance security within their serverless applications. Data Masking is an important capability that will ease the out of the box adoption of powertools for customers hoping to leverage our provided utilities in a comprehensive and secure way. Prior to this, a wrapper class was necessary for consumers who wanted to scrub sensitive data. While this is a viable solution, it is not ideal as it requires additional code and is not as intuitive as a native utility (and potentially as a decorator for other native utilities like logger). 3. Utility interfaceWith this utility, customers only need to initialize a data masking instance and invoke it providing 3 parameters:
Initially, there is only two instances of the data masking utility we'll aim to support:
3.0.1. Mock Implementation of the Data Masking Class dataMaskFilter method:export class DataMasking {
/**
* Decorator function for applying data masking to a class or method.
* It wraps the original method with data masking logic based on the provided options.
*
* @param options - Configuration options for data masking.
* @returns A decorator function to be applied to a class or method.
*/
public dataMaskFilter = function (
options: DataMaskingFunctionOptions<Parameters<AnyFunction>>
): (
target: unknown,
propertyKey: string,
descriptor: PropertyDescriptor
) => PropertyDescriptor {
return function (
_target: unknown,
_propertyKey: string,
descriptor: PropertyDescriptor
) {
const original = descriptor.value;
// Wrap the original method with data masking logic
descriptor.value = function (...args: any[]) {
// Apply data masking to the arguments
let maskedArgs = this.filterData(options, ...args);
// Call the original method with masked arguments
const result = original.call(this, ...maskedArgs);
return result;
};
return descriptor;
};
};
public mask(){
/*...Masking Logic Here...*/
}
} 3.1.1. Data Masking handler (via Decorator)In this section, we demonstrate the integration of the Data Masking utility into an AWS Lambda function handler using a decorator. Using decorators offers several benefits to end users, primarily enhancing code modularity, maintainability, and reducing boilerplate. The decorator encapsulates the cross-cutting concern of data masking, by abstracting away implementation details. This allows end users to focus on core functionality and maintain standardized approach to data masking. class Lambda implements LambdaInterface {
/**
* Handler method for AWS Lambda functions, decorated with data masking.
* It demonstrates the usage of the @masker.dataMaskFilter decorator to apply data masking to specific fields in the input data.
*
* @param _event - The input event object.
* @param _context - The execution context.
* @returns A Promise<void> representing the completion of the function.
*/
@masker.dataMaskFilter({
fieldsToMask: ['password', 'ssn'],
defaultAction: 'mask',
})
public async handler(_event: any, _context: any): Promise<void> {
// Original function logic here...
}
} 3.2.1. Data Masking Direct InvocationHere end users can reference our instance and manipulate the data within the handler directly. This approach offers flexibility for runtime customization and is particularly useful when the exact point of data masking application needs to be determined dynamically within the handler function. Users can easily store the output to other variables (like environment variables), or use some part of the output in a further AWS resource invocation. This allows end users to tailor their utilization of the package at any point in the runtime to a specific usecase. // Create an instance of the DataMasking utility
const masker = new DataMasking();
class Lambda implements LambdaInterface {
/**
* Handler method for AWS Lambda functions with direct data masking invocation.
*
* @param _event - The input event object.
* @param _context - The execution context.
* @returns A Promise<void> representing the completion of the function.
*/
public async handler(_event: any, _context: any): Promise<void> {
// Original function logic here...
// Sample data containing sensitive information
const data = {
project: 'powertools',
sensitive: 'xxxxxxxxxx',
};
// Apply data masking to specific fields
const masked = masker.mask(data, ['sensitive']);
// Return the masked data or perform additional logic as needed
return masked;
}
} 4. Additional FeaturesThese features may not be released as part of the first release, but should be considered for implementation soon thereafter. 1. Integration with other utilitiesIn the future, it's worth considering using the decorators implementation to allow one instantiated class instance (with a list of masked fields) to activate masking via an optional passed parameter in method props across different powertools utilities. For example: import { LambdaInterface } from '@aws-lambda-powertools/interfaces';
import { Logger } from '@aws-lambda-powertools/logger';
import { DataMasking } from '@aws-lambda-powertools/data-masking';
class MyLambda implements LambdaInterface {
public async handler(event: any, context: any) {
/*...Function logic here...*/
const log = new Logger({
logFormatter: new MyLogFormatter(),
serviceName: "mx-docs",
// Anything passed into the info, warn, error methods would be parsed for fields and masked
dataMasking: {
fieldsToMask: ['password', 'ssn'],
defaultAction: 'mask',
},
});
}
} This integration would streamline end users ability to leverage data masking in a variety of contexts within our powertools suite. Rather than a mandatory two step process for native powertools classes (either a wrapper class solution or executing a masking operation inline before passing data to another powertools utility), we could allow for a single step process where masking could be activated via one additonal param in the method invocation. This would mirror our users ability to leverage the decorator functionality in their own class methods. 2. Alternate ways to set fields to maskWe could potentially pull from a set process.env variable for a list of data masking fields too. 3. Add middy implementationSimilar to Logger 4. Design time Data Masking DecoratorSimilar to 'idempotent' Decorator this will allow users to leverage the decorator to mask data in their own class methods. It won't require external instantiation export const masked = function (
options: DataMaskingFunctionOptions<Parameters<AnyFunction>>
): (
target: unknown,
propertyKey: string,
descriptor: PropertyDescriptor
) => PropertyDescriptor {
return function (
_target: unknown,
_propertyKey: string,
descriptor: PropertyDescriptor
) {
const childFunction = descriptor.value;
descriptor.value = maskData(childFunction, options);
return descriptor;
};
}; 5. Opened discussions and decisionsThis is a list of important decision points from maintainer discussion. Subject to changes from comments from community. Just a note, this data masking utility can be leveraged for logging in tandem with Cloudwatch Logs sensitive data protection. End users may find a combination more suitable than either individually and may not have awareness of one or both options. 1. ReferencesOrignal RFC from from Python version |
When we can expect data masking utility (similar to the one created in Python aws-powertools/powertools-lambda-python#2197) in typescript? |
Hi, for the time being we are focused on shipping the Event Handler utility as per our roadmap, which is our most requested feature by far. This one is in the top 3 so we should be able to start working on it after that unless customer demand changes. At the moment I don't have a concrete date to share, but I wouldn't expect work on this feature to start before end of the year / beginning of 2025. |
Thanks for the quick response. Where I can track/check the status/progress of this feature? |
This issue is the best place as of now - once we have updates I'll post them here. For added context: the resolution of this issue (Request for comments) is the pre-requisite for implementation work to start. The next step here, besides prioritizing the feature and allocating resources, will be to complete the specification & DX and ideally have a discussion. Then, once we settle on a spec, we'll start the actual implementation. |
Just wanted to provide an update. First of all, I want to thank @wehnerjp for all the work put in refining the RFC. Second, I want to acknowledge that due to resource constraints we haven't been able to follow up on this new feature, nor give the RFC the attention needed. Currently we are focused on delivering on other features as described in our roadmap. This is however a feature that we are keeping in mind for next year. My ask to everyone in this thread, or who might find the issue after I wrote this comment would be to engage with the original post with a 👍 and consider leaving a comment with your use case. Having customer names attached to a feature helps us prioritize future backlog. |
Hey! No problem @dreamorosi , happy to help. I actually wrote a bunch of code for it, but the topic cooled so I have been focussing on some other personal projects in my free time. If this picks up traction again or you have anything other tickets in the roadmap you think I can help with feel free to ping me or tag me in the thread again. |
Description of the feature request
Automatically mask secret or password values in extra data.
Problem statement
A best practice is to never log secrets or passwords. There are times when I have an object that might have one of these things in it. I'd like to be able to pass in the object without needing to pull out the secrets manually.
Summary of the feature
If turned on, any extra data passed into the logger would be looked at and secrets/passwords would be masked. Consider the following data.
The code would see a key that contains "password" (this could be a configurable regex with a default) and would replace it with something like "*****", resulting in the following extra data being logged.
No secrets in logs
Code examples
Benefits for you and the wider AWS community
Safer logging.
Describe alternatives you've considered
Creating a function that does this for me. The downside to this is twofold. One, I have to include the function call everywhere, so if I forget it I risk writing something I shouldn't to the logs. Two, the devex is slightly worse because I have to do something like
{ extraData: cleanExtraData(extraData) }
instead of just{ extraData }
Additional context
Related issues, RFCs
The text was updated successfully, but these errors were encountered: