Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for secrets metadata feature #8520

Merged
merged 8 commits into from
May 4, 2023

Conversation

JanKrivanek
Copy link
Member

related to #8401

Context

Preliminary design proposal for the Sensitivity Metadata feature
This PR is meant to be a form of design review

@JanKrivanek JanKrivanek marked this pull request as draft March 2, 2023 12:52
@JanKrivanek
Copy link
Member Author

Tagging @michaelcfanning as he has been researching and prototyping in this area quite extensively lately

* concatentaing property with other values or flattening item values or transforming items and then passing via other property - should such be ignored or redacted as whole, or redacting just the part formed from the sensitive data?
* values created via [Property functions](https://learn.microsoft.com/en-us/visualstudio/msbuild/property-functions) or [Item functions](https://learn.microsoft.com/en-us/visualstudio/msbuild/item-functions).

## Out of scope

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope

In previous exploration, we explicitly excluded processing path information, as this is very copious in the processing stream. It is theoretically possible someone might encode a password in a source file or directory name but seems unlikely.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - this is a good point. I can imagine someone would want to redact e.g. their login (#8493) - we need to find proper balance with acceptable performance (e.g. opt-in levels of redacting)

@JanKrivanek JanKrivanek marked this pull request as ready for review March 24, 2023 13:52
@JanKrivanek JanKrivanek changed the title Initial proposal for secrets metadata feature Proposal for secrets metadata feature Mar 24, 2023
* and such is indicated by user as sensitive, the generated build event needs to be redacted.
* and such is not indicated by user as sensitive, but the R-value is indicated as sensitive - the data holder (property/item) is marked as holding sensitive data and treated accordingly.
* and such is not indicated by user as sensitive, the value is passed to sensitivity indicating regex or callback (in case any of those are configured by user) and if matched - the data holder (property/item) is marked as holding sensitive data and treated accordingly.
* No other redacting of log events will be performed. This is not a strong requirement - we can introduce another opt-in level of strict inspection of all log events. The gain is very questionable though, while the performance impact is severe (internal experiments by @michaelcfanning measured by @rokonec indicate 4-times slow-down on mid-size build). Additionally to being perf-expensive, it can possibly get easily confused - e.g.:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should double-check with @rokonec, I'm not sure this figure is correct. I thought it was more like a 40% slowdown (and this was only for the binary logging scenario. Logging to console and file was lower). We did not complete sufficient profiling in the binary logging case to understand whether we're looking at an intimidating floor as far as slowdown costs. Our prototype also uses the built-in .NET regex engine, by far the slowest option (between things like SRM, RE2, etc.). The .NET 7 platform (and its regex engine) looks like it will greatly improve time-to-execute for this scenario.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The measured slowdown ratios for protoype with 5 secret formats were 1.03 for binlog and 1.75 for the diagnostic level of text logger. the 4x possible slow down was estimated from the increase of the patterns #.

This can likely be tackled a lot, but still feels as candidate for extra opt-in (as compared to best-efforts quick solution scanning just most likely sources of input info).

@rainersigwald rainersigwald added the Documentation Issues about docs, including errors and areas we should extend (this repo and learn.microsoft.com) label Mar 28, 2023
Copy link
Member

@Forgind Forgind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! I generally like the idea. Few thoughts:
Should we also permit redacting property/item names (not just the values)
Is the idea that a Secrets would work a bit like a PropertyGroup but with preset things you can opt into redacting? (Looking at things like <GH_token /> and wondering how that works)
I'm wondering if it's ok to wait until events reach loggers before redacting. On the one hand, that's less secure. On the other hand, if we implement the redaction logic correctly in our loggers, and customers trust their own custom loggers, then everything will be reacted properly, and we can probably do that more efficiently than trying to look at every single message sent to a logger...which also makes it easy to miss things.

I'm thinking the custom plugin version might be the cleanest option but also the most difficult for a user to implement. I personally like the option you said was not recommended 🙂 A single global property doesn't have great support for differing types, but we ultimately want to redact strings in various forms, so I think we could make it work, and properties are absurdly easy to set.

@JanKrivanek
Copy link
Member Author

@Forgind great thanks for feedback and thoughts!

Should we also permit redacting property/item names (not just the values)

I'm strongly opposed to this :-)
This night look like being ok with having sensitive data within a source repository. The build logs are then not the main concern.

Is the idea that a Secrets would work a bit like a PropertyGroup but with preset things you can opt into redacting? (Looking at things like <GH_token /> and wondering how that works)

Exactly.
Though - as mentioned - this is just longer term thinking that's likely subject to many changes (or withdrawal)

I'm wondering if it's ok to wait until events reach loggers before redacting. On the one hand, that's less secure. On the other hand, if we implement the redaction logic correctly in our loggers, and customers trust their own custom loggers, then everything will be reacted properly, and we can probably do that more efficiently than trying to look at every single message sent to a logger...which also makes it easy to miss things.

Ideally we do not look on every single message - just the build events that we know were formed together using properties or items indicated as sensitive.
On the other hand within logger we already do not have this information - so we'd either need to change logging contract or inspect all events.

I'm thinking the custom plugin version might be the cleanest option but also the most difficult for a user to implement. I personally like the option you said was not recommended 🙂 A single global property doesn't have great support for differing types, but we ultimately want to redact strings in various forms, so I think we could make it work, and properties are absurdly easy to set.

This is actually an interesting point. This might partly alleviate the problem with need to parse all properties and items prior start of redacting. It might be good idea to allow simple setting mode (fast failing if secrets indication used via items).
I'm adding this thought into the doc

@Forgind
Copy link
Member

Forgind commented Apr 4, 2023

I'm strongly opposed to this :-)
This night look like being ok with having sensitive data within a source repository. The build logs are then not the main concern.

That's an interesting point. I certainly don't think you should have sensitive names within an OS repository. I'm less sure about how important it is within a closed source repo, especially as you might well have secret project names or function names, and those would also be exposed to anyone with access to the source. I'm currently ok with either solution 🙂

On the other hand within logger we already do not have this information - so we'd either need to change logging contract or inspect all events.

I think the worst case scenario is where we tell people that we have a new feature that hides secrets, but we also ensure that it isn't a breaking change. A customer logs some custom events, and we don't realize they contain secrets, and they end up un-redacted, but because the customer knows we hide secrets, they publish their log somewhere public. I don't think there's a perfect non-breaking solution here, so I think your proposal is reasonable; I'm just trying to think through places things could go wrong.

@JanKrivanek
Copy link
Member Author

I think the worst case scenario is where we tell people that we have a new feature that hides secrets, but we also ensure that it isn't a breaking change. A customer logs some custom events, and we don't realize they contain secrets, and they end up un-redacted, but because the customer knows we hide secrets, they publish their log somewhere public. I don't think there's a perfect non-breaking solution here, so I think your proposal is reasonable; I'm just trying to think through places things could go wrong.

Yes - this will allways be just best efforts (as frankly - ideally MSBuild doesn't handle secrets, there should be other means of handling auth/autz - valuts, roles, ...). We need to be clear about is, with clear messaging of what is not supported. Current proposal of not supported possibilities of spilling sensitive data: https://github.com/dotnet/msbuild/pull/8520/files#diff-fd2a314820444c86af18122afc82a49145ddbe37cd3730b7f3e5c01f86ff02cdR108-R113
That feels it covers the menstioned scenario - or do you think we should add something?

@JanKrivanek JanKrivanek added the merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now. label May 2, 2023
@Forgind Forgind merged commit 90f4271 into dotnet:main May 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Issues about docs, including errors and areas we should extend (this repo and learn.microsoft.com) merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants