Add a flag to control whether credentials are printed during bootstrapping #461

Open
wants to merge 18 commits into
base: main

Conversation

eric-maynard
Contributor

Description

This adds a new flag, BOOTSTRAP_PRINT_CREDENTIALS, that controls whether the bootstrap command prints root credentials to stdout.

If it's disabled, and environment variables were not provided to set the root credentials, bootstrapping will fail.

Fixes #450

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Documentation update
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Credentials are now printed during bootstrap when it's enabled:

realm: default-realm root principal credentials: 2b98107557bcce20:f74281319ac8519ef30cbced6563223b

@@ -181,6 +196,19 @@ private PrincipalSecretsResult bootstrapServiceAndCreatePolarisPrincipalForRealm
throw new IllegalArgumentException(overrideMessage);
}

// TODO rebase onto #422, call a method like PrincipalSecretsGenerator.hasEnvironmentVariables
Contributor

idea: maybe pass a flag down to PrincipalSecretsGenerator to not use random secrets if printCredentials is false? Then the PrincipalSecretsGenerator can simply throw if the specific realm/user combination is missing env. vars. WDYT?
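
A minimal sketch of that idea, assuming a lookup function for environment variables and a boolean that mirrors the print-credentials flag. `FailFastSecretsGenerator`, `SketchSecrets`, and the key layout `POLARIS_BOOTSTRAP_<realm>_<principal>_CLIENT_ID` are hypothetical names used for illustration, not the actual Polaris classes:

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.function.Function;

/** Hypothetical id/secret pair; a stand-in, not the Polaris PrincipalSecrets class. */
record SketchSecrets(String clientId, String clientSecret) {}

/** Hypothetical generator that refuses to invent secrets nobody would ever see. */
final class FailFastSecretsGenerator {
  private static final SecureRandom RANDOM = new SecureRandom();

  private final Function<String, String> env; // e.g. System::getenv
  private final boolean allowRandom; // assumed to mirror the print-credentials flag

  FailFastSecretsGenerator(Function<String, String> env, boolean allowRandom) {
    this.env = env;
    this.allowRandom = allowRandom;
  }

  SketchSecrets produce(String realm, String principal) {
    // Assumed key layout: POLARIS_BOOTSTRAP_<realm>_<principal>_CLIENT_ID / _CLIENT_SECRET.
    String prefix = "POLARIS_BOOTSTRAP_" + realm + "_" + principal;
    String id = env.apply(prefix + "_CLIENT_ID");
    String secret = env.apply(prefix + "_CLIENT_SECRET");
    if (id != null && secret != null) {
      return new SketchSecrets(id, secret); // user-provided credentials win
    }
    if (!allowRandom) {
      // Nothing was provided and nothing would be printed: fail instead of losing the secret.
      throw new IllegalStateException(
          "No credentials provided for " + realm + "/" + principal
              + " and random secrets are disabled");
    }
    return new SketchSecrets(randomHex(64), randomHex(128));
  }

  /** Returns a random hex string of up to bits/4 characters. */
  private static String randomHex(int bits) {
    return new BigInteger(bits, RANDOM).toString(16);
  }
}
```

With this shape the caller never has to ask where the secrets came from; it only decides whether random generation is allowed.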

Contributor Author

I like that idea. If there is a good pathway from the bootstrap command down to the PrincipalSecretsGenerator then I think that works as well. It should hopefully be more clear when #422 merges.

@eric-maynard
Contributor Author

eric-maynard commented Nov 25, 2024

Hey @dimas-b, do you mind taking a look now that #422 has merged?

I think the integration is easy enough with some slight refactoring to PrincipalSecretsGenerator.

I left the current behavior wrt. using env variables even when printing is enabled, since if that's what the user decides to explicitly configure we can respect it. In the worst case we are just echoing env variables.

Comment on lines +111 to +119
if (this.printCredentials(polarisContext)) {
String msg =
String.format(
"realm: %1s root principal credentials: %2s:%3s",
realmContext.getRealmIdentifier(),
secretsResult.getPrincipalSecrets().getPrincipalClientId(),
secretsResult.getPrincipalSecrets().getMainSecret());
System.out.println(msg);
}
Contributor

I think this logic belongs to the secrets generator. The MetaStoreManager doesn't need to know anything about whether the secrets generated are provided by the user or if they've been generated randomly. So why would it be concerned with printing the credentials? The secrets generator knows if the secrets were provided explicitly or if they were randomly generated.

I think the bootstrap command should take a print-credentials config flag and the constructed secrets generator can react accordingly.

Contributor Author

It seems like the PrincipalSecretsGenerator is doing exactly what the name suggests: generating secrets.

Whatever is done with those secrets -- persisting them, using them, printing them -- is outside the purview of the generator itself.

You are right that the MetaStoreManager doesn't need to know anything about printing either (it doesn't in this PR) and clearly this should be outside the purview of the metastore itself.

And so we landed on the factory. I would be happy to take this bootstrapping logic and excise it to somewhere more idiomatic if that is a concern. But right now the bootstrapping logic lives here (e.g. the purge check) and this seems like the most appropriate place that doesn't change the responsibility of either the metastore or generator classes.

Contributor Author

Looking again, is your objection specifically to the protected method printCredentials?

That only exists to support the legacy behavior of the in-memory metastore always printing credentials, and if possible I would very much be in favor of removing that.

However it feels like pushing that logic down into an existing method (whether secretsGenerator, createMetaStoreSession, or elsewhere) could be a bit hacky if it winds up somewhere it doesn't belong.

Comment on lines 214 to 224
boolean environmentVariableCredentials =
PrincipalSecretsGenerator.hasCredentialVariables(
realmContext.getRealmIdentifier(), PolarisEntityConstants.getRootPrincipalName());
if (!this.printCredentials(polarisContext) && !environmentVariableCredentials) {
String failureMessage =
String.format(
"It appears that environment variables were not provided for root credentials, and that printing "
+ "the root credentials is disabled via %s. If bootstrapping were to proceed, there would be no way "
+ "to recover the root credentials",
PolarisConfiguration.BOOTSTRAP_PRINT_CREDENTIALS.key);
LOGGER.error("\n\n {} \n\n", failureMessage);
Contributor

Similar here - why is the metastore aware of whether the secrets were provided by environment variables? What if there are other impls of secrets generators that don't rely on env variables? E.g., we could have one that calls AWS SecretsManager to dynamically generate and store the secrets without any env variables. Should this code throw an exception?


String clientId = config.apply(propId.toUpperCase(Locale.ROOT));
Contributor

Did we lose uppercasing in the new code?

Contributor Author

Yeah, I actually think this is wrong isn't it? It seems like you can have both a dimas and a DIMAS user, so how would you differentiate them in the env variables?

This is assuming we now allow the use of env variables for non-root users.
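
To make the collision concrete, here is a small sketch. The key format and the `EnvKeySketch` helper are assumptions for illustration; only the `toUpperCase(Locale.ROOT)` call comes from the line under review:

```java
import java.util.Locale;

/** Hypothetical helper that builds the environment variable name for a principal's client id. */
final class EnvKeySketch {
  static String clientIdKey(String realm, String principal, boolean upperCase) {
    String raw = String.format("POLARIS_BOOTSTRAP_%s_%s_CLIENT_ID", realm, principal);
    // Upper-casing maps "dimas" and "DIMAS" onto the same variable name.
    return upperCase ? raw.toUpperCase(Locale.ROOT) : raw;
  }
}
```

With upper-casing, `clientIdKey("r", "dimas", true)` and `clientIdKey("r", "DIMAS", true)` return the same string, so the two principals cannot be given different credentials; without it, the keys stay distinct but mixed-case.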

Contributor

Mixed-case env. variables look odd, but technically we can support them. I do not mind ;)

* @return A {@link PrincipalSecretsGenerator} that can generate secrets through `produceSecrets`
*/
public static PrincipalSecretsGenerator bootstrap(String realmName) {
return new DefaultPrincipalSecretsGenerator(realmName);
Contributor

nit: maybe rename DefaultPrincipalSecretsGenerator -> BootstrapPrincipalSecretsGenerator?.. it is not actually default in LocalPolarisMetaStoreManagerFactory

Contributor Author

Good point. It's just a wrapper, so I was not sure what to call it. Since it's only used during bootstrap, let me change to BootstrapPrincipalSecretsGenerator for now

String clientIdKey = "POLARIS_BOOTSTRAP_REALM_PRINCIPAL_CLIENT_ID";
String clientSecretKey = "POLARIS_BOOTSTRAP_REALM_PRINCIPAL_CLIENT_SECRET";

doReturn("test-id").when(psg).getEnvironmentVariable(clientIdKey);
Contributor

I think it would be nicer to allow EnvVariablePrincipalSecretsGenerator to take an explicit Map (or a Function<String, String>) as a constructor argument. Then the "environment" part can be just a static factory method like EnvVariablePrincipalSecretsGenerator.fromEnv().

Testing the static method would not really be necessary because it would be a one-line redirect to System.getenv(), but it would allow for a nicer design that is testable without Mockito.

Just my 2 cents :)
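
A rough sketch of what this suggestion could look like. `LookupSecretsGeneratorSketch` and its methods are hypothetical, and the key layout is inferred from the test constants quoted above; this is not the class in the PR:

```java
import java.util.function.Function;

/** Hypothetical generator whose source of values is injected instead of hard-wired. */
final class LookupSecretsGeneratorSketch {
  private final Function<String, String> lookup;

  /** Tests can pass a map's get method here; no mocking framework needed. */
  LookupSecretsGeneratorSketch(Function<String, String> lookup) {
    this.lookup = lookup;
  }

  /** Production entry point: a one-line redirect to the process environment. */
  static LookupSecretsGeneratorSketch fromEnv() {
    return new LookupSecretsGeneratorSketch(System::getenv);
  }

  /** Returns the configured client id, or null when the variable is not set. */
  String clientId(String realm, String principal) {
    return lookup.apply("POLARIS_BOOTSTRAP_" + realm + "_" + principal + "_CLIENT_ID");
  }

  /** Returns the configured client secret, or null when the variable is not set. */
  String clientSecret(String realm, String principal) {
    return lookup.apply("POLARIS_BOOTSTRAP_" + realm + "_" + principal + "_CLIENT_SECRET");
  }
}
```

A test would construct it with something like `Map.of(...)::get`, while the bootstrap command would call `fromEnv()`.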

Contributor Author

@eric-maynard eric-maynard Nov 27, 2024

I like this idea, but my concern is that EnvVariablePrincipalSecretsGenerator essentially becomes MapPrincipalSecretsGenerator with that approach. I would rather not place the burden on its callers to do the fromEnv / getEnv; this seems like the exact responsibility we would like EnvVariablePrincipalSecretsGenerator to take on.

For example: if we later add command-line options to bootstrap that allow you to provide credentials, would we use the EnvVariablePrincipalSecretsGenerator for that? If it takes a map, we could.

Taking this further, we could even pass in a map with random secrets to EnvVariablePrincipalSecretsGenerator!

And so its role, as well as that of RandomPrincipalSecretsGenerator, can quickly become quite unclear

Contributor

Fair enough. I'm fine with the current code in this PR.

}

@Override
protected PrincipalSecretsGenerator buildEnvVariablePrincipalSecretsGenerator(
Contributor

Same here: I think we could have a simple factory that binds to System.getenv() at runtime, but uses explicit constructor parameters in tests... This would remove the reliance on Mockito.

Contributor Author

In this case what I think we really want is DI 😔

@eric-maynard eric-maynard requested a review from flyrain December 6, 2024 21:33
@flyrain
Contributor

flyrain commented Dec 9, 2024

It seems odd that Polaris determines whether bootstrapping has failed based on a configuration controlling whether credentials are printed. IIUC, #438 removed plain text secrets from the metastore, meaning these secrets cannot be retrieved unless they are printed in the console. Would it be more reasonable to always print the credentials if they are generated by Polaris? This ensures the secrets remain accessible when needed without relying on an external configuration.

@eric-maynard
Contributor Author

eric-maynard commented Dec 9, 2024

> It seems odd that Polaris determines whether bootstrapping has failed based on a configuration controlling whether credentials are printed.

The issue at hand is that currently credentials are unrecoverable after bootstrapping, which needs to be fixed ASAP.

> IIUC, #438 removed plain text secrets from the metastore, meaning these secrets cannot be retrieved unless they are printed in the console. Would it be more reasonable to always print the credentials if they are generated by Polaris? This ensures the secrets remain accessible when needed without relying on an external configuration.

@collado-mike expressed concern about an approach like this some time ago. I think a configuration, or perhaps better a CLI argument to the bootstrap command, is a good compromise in that it allows a secure behavior by default (e.g. no secrets to stdout) but also gives people an "out" in case they want to use polaris-generated credentials with a metastore that doesn't support retrieving credentials.

This last point is also very important to consider: some metastore implementations could allow secrets to be retrieved, in which case it's okay to bootstrap without printing credentials. The problem is that after #438 EclipseLink does not allow this.

@dimas-b
Contributor

dimas-b commented Dec 9, 2024

I think it is generally not a good idea to store retrievable secrets in the metastore. If we want that functionality it would probably be preferable to integrate with well-known secret managers (e.g. k8s secrets, cloud-specific secret managers, Vault, etc.).

@snazy
Member

snazy commented Dec 9, 2024

> I think it is generally not a good idea to store retrievable secrets in the metastore.

Completely agree with this. I'd extend this even to not put any credentials into a server log at all, because that information is not just "ephemeral on a console window", but logs can easily go into 3rd party systems, which would then make those clear text credentials easily accessible.

@eric-maynard
Contributor Author

Agreed as well. But for the time being we need to make the EclipseLink metastore work again. This is significantly better than both the current state and where we were before #438.

I added a note to the relevant doc clarifying that we don't recommend using this in production, where users should provide secure credentials through environment variables.

@flyrain
Contributor

flyrain commented Dec 9, 2024

Agreed to not log the secrets, but I also feel the urgency of fixing EclipseLink. How about writing the secrets into a separate file? Here are the benefits:

  1. A file can be potentially integrated with third-party secret managers in the future.
  2. Avoid putting secrets in logs
  3. No configuration item needed; always persist the secrets file in case of auto-generation.
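
A minimal sketch of the file-based option, assuming a POSIX filesystem. `SecretsFileSketch` is a hypothetical helper and the line format simply mirrors the console output shown in the PR description:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

/** Hypothetical helper that persists generated root credentials to an owner-only file. */
final class SecretsFileSketch {
  static Path write(Path target, String realm, String clientId, String clientSecret)
      throws IOException {
    // Create the file with rw------- up front so it is never readable by other users.
    // createFile fails if the file already exists, which avoids silently overwriting secrets.
    Files.createFile(
        target,
        PosixFilePermissions.asFileAttribute(PosixFilePermissions.fromString("rw-------")));
    Files.writeString(
        target,
        "realm: " + realm + " root principal credentials: " + clientId + ":" + clientSecret + "\n");
    return target;
  }
}
```

On a non-POSIX filesystem `createFile` would reject the permission attribute, so a real implementation would need a fallback.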

@eric-maynard
Contributor Author

eric-maynard commented Dec 9, 2024 via email

@collado-mike
Contributor

My thoughts here are complex. On the one hand, I agree that we should never print credentials to the service log. On the other hand, users need a way of bootstrapping their Polaris service and storing their secrets.

This is one reason why we made the bootstrap command a separate command from the server command. Bootstrapping is always an explicit action taken by the user - presumably in an environment that is distinct from the actual runtime of the service (e.g., on a user desktop or in a terraform command or something). Thus, the output of the bootstrap command shouldn't be stored in the user's production log store.

My hope for the PolarisSecretsManager is to use it to front a real secrets manager, like Vault or K8s or something. That being the case, randomly generated secrets would be the norm, so I don't think we should print secrets by default just because they were randomly generated. But if the user passes a flag --printSecrets or something, then the user can explicitly tell the command that they need the secrets to be printed so they can write them down on a sticky note or whatever (I'm kidding).

I don't think the user should need a separate secrets store, like Vault or whatever, in order to be able to use Polaris at all. If the RDBMS persistence impl is enough for them to store hashed secrets and they can manage their principal secrets in some other way, we should support that. So for the user who has no separate secrets store and only stores hashes of the secret in Postgres, how do we get the user their secrets?

Either we can require the user to pass in secrets as an argument to bootstrapping or we can randomly generate secrets and print them out for the user. It seems that we can support randomly generated credentials when we do have a separate secrets store, but I don't really like the idea of the secrets manager having to declare that its secrets are retrievable by the end user. So if we require the user to pass in the secrets as an argument, then I think we should always require the secrets as an argument. And if we don't always take the secrets as an argument, then we need to pass randomly generated secrets back to the user in some way - printing seems obvious.

@dimas-b
Contributor

dimas-b commented Dec 10, 2024

So it looks like the approach proposed in this PR is not robust enough (though I keep my non-binding approval).

I'd like to propose moving the printing of generated credentials to the bootstrap CLI command and also adding an option to write them to a file on the local filesystem.

For that matter, I think even the generation of random secrets should be delegated to the bootstrap command and removed from core. Then, core services will receive secrets the same way whether the user provided self-managed secrets or asked the bootstrap command to generate them. Integration with secret managers is deferred. WDYT?

@snazy
Member

snazy commented Dec 10, 2024

I'm quite open (and probably brutal) here: logging or storing plain/clear text credentials is a severe security issue that justifies a CVE. The process of creating credentials must really be a command that only allows the user to grab the secrets - but only once - not stored anywhere - not explicitly or implicitly (or ephemerally), accessible by other tools (database, logging system/files, etc). If the user does not grab the generated secrets, bummer. If the bootstrap process cannot ensure this, then the bootstrap process has to be changed. Security is a very sensitive topic - and an absolute necessity for the production readiness of Apache Polaris!

Generally, I do not think that Apache Polaris should get into the business of handling identities or secrets, but rather interface w/ systems that are purely there for these kinds of things. The currently built-in secrets handling should IMHO entirely go away.

@flyrain
Contributor

flyrain commented Dec 10, 2024

> If printing is super controversial then we can always just require the env variables for now.

+1 to this approach. We could introduce an additional step before starting the Polaris instance to handle secret generation and environment variable setup. This step could take one of the following forms:

  1. Manual Process: Generate the secrets manually and set the environment variables accordingly.
  2. Integrated Bootstrap: Automate the secret generation and environment variable setup as part of the bootstrap process.

Additionally, users are responsible for ensuring that anything generated during this step, such as logs or extra files, is not leaked to unsecured locations.

@dimas-b
Contributor

dimas-b commented Dec 10, 2024

Conceptually, bootstrapping may not even be the right term here :) This is not about helping the server to start and run, but about defining some prerequisite persisted objects (the root principal, specifically).

How about removing "bootstrap" methods from MetaStoreManagerFactory and performing the associated initialization in a new initialize CLI command? The command will have options for printing generated secrets to the console (not log) or (preferably) to a local file.

For the in-memory case, we could probably execute initialize inside the main application method, so in-memory servers will have to provide initialization parameters to the server command.

@eric-maynard
Contributor Author

This is all valuable discussion, but I am worried we are going in circles a bit.

For now, does someone else want to take a crack at a minimal change to fix the experience of using EclipseLink?

@flyrain
Contributor

flyrain commented Dec 10, 2024

I wouldn't be too worried about the name (bootstrap) at this moment. A quick fix that resolves the issue without introducing security problems would be preferred.

@eric-maynard
Contributor Author

eric-maynard commented Dec 10, 2024

With respect to the apparent security issues, I can only echo @collado-mike's comment -- the stdout of the bootstrap command is not the Polaris log. As such, this is neither logging [n]or storing plain/clear text credentials in @snazy's words.

Moreover, I question whether doing such a thing iff the user sets a parameter called BOOTSTRAP_PRINT_CREDENTIALS really does qualify as a security "issue".

> If the user does not grab the generated secrets, bummer.

I agree with this point. But we must give them an opportunity to "grab" them from somewhere. If not stdout, then a file, or env variables. Anything is fine. But the current behavior is untenable.

@collado-mike
Contributor

> Generally, I do not think that Apache Polaris should get into the business of handling identities or secrets, but rather interface w/ systems that are purely there for these kinds of things. The currently built-in secrets handling should IMHO entirely go away.

While I appreciate the security focus, I think the reality is that we are going to have to be able to manage secrets in Polaris as a standalone application for a long time (forever?). External secrets management systems like Vault or external identity providers like Okta are fantastic and I definitely promote their usage, but the reality is that some installations will simply not want/need the overhead of managing an external service. I think our approach should be that core features work out of the box with nothing more than a database (even that might be a local file), but that we allow extension points to add more and more layers of security and functionality. My proposal for Federated Principals and Roles is an example of this - basic identity management works, but if you want to be really secure, delegate identity and group membership to an external service that's tailor made for that.

I 100% agree with the approach that we should expose secrets once and never again. This is a core tenet of the service and the reason why there are no secrets retrieval APIs. It's why @eric-maynard's changes to store only secret hashes work.

But there has to be once. That means that during bootstrapping, we need the ability to return the root credentials to the user who bootstrapped. Never again will those credentials be accessible, but we have to be able to return them somewhere.

I see a few approaches for this:

  1. Remove the bootstrap method from the PolarisMetaStoreManager and bootstrap directly from the BootstrapCommand class, as @dimas-b suggests. I actually think this is very elegant and allows the BootstrapCommand itself to determine what to do with the secrets (print or not depending on the argument)
  2. Keep the bootstrap method, but change the response to return a Map<String, PrincipalSecrets> so that the BootstrapCommand can determine what to do with the secrets
  3. Add a PrincipalSecretsConsumer of some kind that takes the generated secrets and does something with them. This can be configured on the CLI to print or to store in a file or whatever

Personally, I like 1. It's clean and simple and there's no opportunity to misuse it. The tests and the in-memory startup will need to change, but I think that's a cost worth paying.
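
For comparison, options 2 and 3 can be sketched together: a bootstrap step that returns the secrets per realm, and a command that hands them to a pluggable sink. All names here (`RootSecretsSketch`, `SecretsSinkSketch`, `BootstrapCommandSketch`) are hypothetical, and the secret generation is a stand-in for whatever the metastore manager would really do:

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical id/secret pair; a stand-in for PrincipalSecrets. */
record RootSecretsSketch(String clientId, String clientSecret) {}

/** Hypothetical consumer (option 3): print, write to a file, or discard. */
interface SecretsSinkSketch {
  void accept(String realm, RootSecretsSketch secrets);
}

final class BootstrapCommandSketch {
  private static final SecureRandom RANDOM = new SecureRandom();

  /** Option 2: bootstrap returns the secrets per realm and does nothing else with them. */
  static Map<String, RootSecretsSketch> bootstrap(Iterable<String> realms) {
    Map<String, RootSecretsSketch> result = new LinkedHashMap<>();
    for (String realm : realms) {
      result.put(
          realm,
          new RootSecretsSketch(
              new BigInteger(64, RANDOM).toString(16), new BigInteger(128, RANDOM).toString(16)));
    }
    return result;
  }

  /** The command, not the metastore, decides what happens to the secrets, exactly once. */
  static void run(Iterable<String> realms, SecretsSinkSketch sink) {
    bootstrap(realms).forEach(sink::accept);
  }
}
```

A print flag on the CLI would then just select a sink, for example one that writes `realm + ": " + secrets.clientId() + ":" + secrets.clientSecret()` to stdout, or a no-op sink when credentials came from the environment.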

Development

Successfully merging this pull request may close these issues.

[BUG] Potential eclipselink schema upgrade issue
5 participants