Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement an utility for checking secrets leaks in logs, use it a CI stage at PR level #6191

Open
igor-sirotin opened this issue Dec 10, 2024 · 10 comments

Comments

@igor-sirotin
Copy link
Collaborator

igor-sirotin commented Dec 10, 2024

Functional tests provide some secrets to status-backend through API (e.g. #6144).

When tearing down the test, it could automatically scan the geth.log for any leaks of such data. As it knows the values that should not have leaked. This includes:

  • any provider tokens and other secrets
  • any privacy-discoulsing data (display name, images, keys etc)

We can also expose this approach to desktop and mobile tests.

@igor-sirotin
Copy link
Collaborator Author

cc @antdanchenko @anastasiyaig @churik

@antdanchenko antdanchenko self-assigned this Dec 11, 2024
@igor-sirotin
Copy link
Collaborator Author

We discussed that this should probably be done as a separate CI stage, depending on how long it takes.

Making it as a separate app adds some simplicity and some difficulties:

  • Keeping it external from tests, we can reuse the same utility for unit, functional and e2e tests, even with desktop and mobile
  • Yet we need to pass not just log files, but also list of secrets for each log. Because e.g. we also want to check for display names, which can be generated randomly in tests.

Though it would be nice to detect such issues locally as well, just while running tests.

@igor-sirotin
Copy link
Collaborator Author

Another thought: we can even generate some fake tokens in tests, we don't always need to use real secrets.

@igor-sirotin
Copy link
Collaborator Author

Also, this utility could be used by developers (or users) who wish to share their logs on Github.
Before uploading, they can run the utility locally to check for known display name, local paths and other things.

This is a bit of a rare case, but still a benefit 🙂

@igor-sirotin igor-sirotin changed the title Add automatic check for leaks in logs after each functional test case Implement an utility for checking secrets leaks in logs, use it a CI stage at PR level Dec 13, 2024
@ilmotta
Copy link
Contributor

ilmotta commented Dec 13, 2024

Slightly off topic, but we also need to eventually define what is considered private data circulating in logs. A wallet account address I consider private data, as I wouldn't want that to be exposed in plain text logs.

One idea is to leverage the Go type system as you and I talked about @igor-sirotin. For instance, the common.Address type is very general, but we could have a specialized address type which would always be redacted, unless the log level was debug. In the app, we can also request user approval/confirmation that some private data will leak when they enable debug level logs.

Once we identify what is considered private data we can analyse which data types in status-go should be wrapped/modified to be auto-redacted in logs (either always or above a certain log level like the previous example about common.Address).

@igor-sirotin
Copy link
Collaborator Author

igor-sirotin commented Dec 13, 2024

@ilmotta 💯
We just got the SensitiveString type, which I'm currently integrating:

// SensitiveString is a type for handling sensitive information securely.
// This helps to achieve the following goals:
// 1. Prevent accidental logging of sensitive information.
// 2. Provide controlled visibility (e.g., redacted output for String() or MarshalJSON()).
// 3. Enable controlled access to the sensitive value when needed.
type SensitiveString struct {
value string
}

And we can surely make more specialized types later as well 🙌

@iurimatias iurimatias added this to the 2.33.0 milestone Dec 16, 2024
@igor-sirotin
Copy link
Collaborator Author

Well, I just realised that the "utility" is:

cat geth.log | grep <my-secret>

😂

@igor-sirotin
Copy link
Collaborator Author

igor-sirotin commented Dec 20, 2024

I've been discovering this a bit. It is very simple to implement, but we need to agree on the requirements.
Will write my thoughts below.

Goal

  • Prevent our secrets leaks
  • Prevent user's secrets leaks

Use cases

  • Run on CI after tests
  • Run locally before submitting logs with bug report (can be regular users)

What to look for?

  • Secrets - discussed as part o the goal No critical leaks of data
    • Passwords
    • Private keys
    • Seed phrases
    • RPC provider tokens/credentials
    • Messages contents
    • ...
  • Privacy-exposing data - discussed as part o the goal No private data leaks
    • Profile info: display name, ens name, image, public key, etc.
    • Wallet info: addresses, transactions, etc.
    • Community info: community id,
    • IP Addresses
    • Local file paths
    • Timezones

What exactly to report?

Having as much info as possible is best for faster reaction. At the same time, if we run this on CI (which logs are public), we might want to be careful about what this utility will report. I see several levels of redacting data:

  • report only a boolean flag "something was/wasn't found";
  • report lines of logs which contain a leak, but don't say which secret;
  • report lines of logs and the leaked secrets;
  • make it optional, to print or to not.

I feel that I am overthinking this. Because if the secret was leaked, it doesn't really matter if we print it one more time with the utility, right? 😄 It will only make it a bit harder for potential attacker to find.

And of course the real solution is not to use any real secrets on CI.
But I see this is just the protection of potential future mistake.


A not about implementation. If we implement our own utility (not just shell script around grep), we can parse the logs and actually report which code line caused the leak. We have this info in the logs.

@igor-sirotin
Copy link
Collaborator Author

Idea: redact secrets before writing the logs

One thing we could do: intercept the zap logger and simply redact any known secrets before even writing them to the log.

This simplifies things, as at runtime we have access to all of the given secrets, we just need to register certain values as secret ones in the logger. If this is a separate utility, we'd need to provide

This would be very robust, but might badly affect the performance.
If this sounds interesting, I can explore and make some benchmarks.

wdyt? cc @status-im/status-go-guild @ilmotta

@ilmotta
Copy link
Contributor

ilmotta commented Dec 20, 2024

Idea: redact secrets before writing the logs

I lean more on this idea @igor-sirotin because it seems to be the most reliable one. Have we considered implementing the Stringer interface on types that should have some of their fields redacted? zap supports that out of the box, just checked now from a unit test in status-go. We could also use a custom zap encoder sometimes, but that's where I think performance may be a problem as you said already.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

No branches or pull requests

4 participants