Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: propose new features for the Postgres matrix #4

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

gbartolini
Copy link
Contributor

  • Mutual TLS support for authentication of replicas
  • JSON logging in standard output
  • Audit logging
  • Read-only root file system
  • Transparent change of configuration for PostgreSQL parameters
  • Fencing

@@ -207,6 +207,11 @@ categories:
description: |
Operators may chose by default to generate self-signed SSL certificates.
They may also offer the option to specify the CA and certificates that users want Postgres clusters to use.
- id: mtlsrep
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While not crucially important, so far the convention of having four character ids for categories and five character ids for the features was used. There's no strong reason to keep it like this --simply there isn't-- but if you believe the IDs like this one and others as part of this commit can be fitted under this convention, it would be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can call it sslr

@@ -694,6 +716,16 @@ categories:
The operator must provide proper information to the user as to the status and final result of the operation.
The operator should provide ongoing status information, and perform the operation with the minimum downtime required.
main: true
- id: cfgchg
name: PostgreSQL configuration changes
type: boolean
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd propose to make this feature a bit more generic (broader scope). Changes for restart may be for many more reasons that not only hot-standby parameters change (e.g. a frequent case is adding an extension to shared_preload_libraries). For this reason, I'd vouch to remove any reference to particular parameters and just keep the description generic to "changes to configuration that may require reload or restart" (not a proposed wording, just to illustrate the idea).

Current description also raises some doubts (at least for me) on whether this implies that restarts may happen (or not) as soon as a configuration change is triggered --which is debatable whether that's a good thing or not, leaving more control for the user. For this reason, I'd argue that the description is reworded more in the terms of describing whether the user provides a fully automated way to proceed with a configuration reload or restart, providing the adequate information to the user, without fully qualifying whether that's automatically or user triggered --however a great place to add that information is the comments field on the vendor submission.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, if these thoughts are taking into account, this feature would be quite close to day2/crest, becoming possibly a duplicate. Maybe all these ideas here can be merged into a single one, potentially improving the actual day2/crest?

Copy link
Contributor Author

@gbartolini gbartolini May 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree here. Ensuring and coordinating the restart within a cluster is a feature, an important feature that removes the need for a human operator. For example, if you want to raise max_connections and you have replicas, you need to ensure that this operation is performed first on the standbys, and then - as last - on the primary. If you decrease the value, it is the opposite.

Ideally, a Kubernetes operator should simulate what a human being would do in this case, but do it in an automated and reliable way, without requiring human intervention - in order to prioritize self-healing and high availability. Then, obviously, you can configure and request human intervention, but I believe that these features should be highlighted, as configuration changes should be as transparent as they can possibly be to the end user and an operator should handle that as part of the resource lifecycle.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think we agree on the intent (to show an operator provides capabilities associated with controlling different aspects of the cluster's lifecycle, like a controlled restart operation) but maybe we're confused by the wording.

In particular my concern is not with the fact that automation should be used to perform a careful and correct restart; but rather than with the fact that it is triggered "automatically" (whenever a change is requested) rather than giving the operator of when running it (but then it runs automatically). In other words: I may want to keep a restart operation on hold until a better time (e.g. 3am, on a valley of traffic) rather than being launched as soon as I edit my max_connections parameter (which I may or may not realize may immediately trigger that event).

So I agree it's a good thing and this feature this reflect that the restart operation is fully automated; but I encourage wording to not assume that automated operation is triggered immediately, and that it is even offered as an option or that the default is the opposite.

But I'll leave the final decision to you, having my opinion here I'll just merge the latest patch that you send :) as your criteria should be well represented here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok to soften the wording and suggest that users have the choice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check

Copy link
Collaborator

@ahachete ahachete left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments, anything not explicitly commented means agreement from my side with the proposed changes.

@ahachete
Copy link
Collaborator

@gbartolini May you kindly review the above comment? :) It would be great to merge your improvements soon. Thanks!

- Mutual TLS support for authentication of replicas
- JSON logging in standard output
- Audit logging
- Read-only root file system
- Transparent change of configuration for PostgreSQL parameters
- Fencing

Signed-off-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
Signed-off-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
@gbartolini
Copy link
Contributor Author

@ahachete do you have any further comments? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants