feat: propose new features for the Postgres matrix #4
base: main
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -207,6 +207,11 @@ categories: | |
description: | | ||
Operators may choose by default to generate self-signed SSL certificates. | ||
They may also offer the option to specify the CA and certificates that users want Postgres clusters to use. | ||
- id: mtlsrep | ||
name: Mutual TLS support for PostgreSQL replicas | ||
type: boolean | ||
description: | | ||
Operators exclusively rely on TLS certificate authentication and authorization to connect the managed replicas in the cluster. | ||
- id: crtmg | ||
name: CertManager integration | ||
type: boolean | ||
|
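For illustration only, one way an operator could satisfy `mtlsrep` is to have replicas authenticate to the primary with client certificates. The sketch below is an assumption and is not taken from this PR: the ConfigMap name, the `streaming_replica` role, the host name, and the file paths are hypothetical. The `hostssl` record type, the `cert` authentication method, and the `sslmode`, `sslcert`, `sslkey`, and `sslrootcert` connection parameters are standard PostgreSQL.

```yaml
# Hypothetical ConfigMap; names, role, and paths are illustrative assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-pg-replication-mtls
data:
  # pg_hba.conf rule: replication connections must use TLS and present a
  # client certificate whose CN matches the replication role.
  pg_hba.conf: |
    hostssl replication streaming_replica all cert
  # Connection string a replica could use to present its own certificate
  # and to verify the certificate of the primary.
  primary_conninfo: >-
    host=example-primary user=streaming_replica
    sslmode=verify-full
    sslcert=/certs/replica/tls.crt
    sslkey=/certs/replica/tls.key
    sslrootcert=/certs/ca/ca.crt
```

With a rule like this there is no password fallback for replication: a replica that cannot present a valid certificate is rejected.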
@@ -630,6 +635,18 @@ categories: | |
The operator provides a mechanism to expose all the logs of the managed Postgres instances to a centralized logging tool. | ||
The logs must be decorated with extra metadata in order to provide semantic meaning, including the Pod name and namespace, the cluster name, the role of the Postgres instance (e.g. primary, replica, standby-leader, etc.) and the timestamp, all of which can be used to filter log entries. | ||
There is no need to configure the tool in order to obtain required extra metadata from the logs. | ||
- id: stdout | ||
gbartolini marked this conversation as resolved.
|
||
name: JSON logs in stdout | ||
type: boolean | ||
description: | | ||
Each container in a pod should directly export logs in JSON format to the standard output channel as recommended and expected by Kubernetes. | ||
- id: audit | ||
name: Audit logs | ||
type: boolean | ||
description: | | ||
The operator provides an integrated way to seamlessly export logs for auditing purposes. | ||
vendor_compliance: | | ||
Provide more information about the technology (e.g. pgaudit extension) used for this purpose. | ||
- id: explg | ||
name: Export logs | ||
type: boolean | ||
|
@@ -682,6 +699,11 @@ categories: | |
The container processes do not run as root. | ||
vendor_compliance: | | ||
Reasonable exceptions to this rule can be made for features that require or do not diminish the container's security, e.g. when using eBPF. | ||
- id: rofs | ||
name: Read-only file system for image containers | ||
type: boolean | ||
description: | | ||
The root file system of the image containers provided by the operator is read-only, enforcing immutability of the binaries. | ||
- id: day2 | ||
name: Day 2 Operations | ||
features: | ||
|
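As a rough sketch of what `rofs` could look like in practice, the fragment below sets `readOnlyRootFilesystem` in the container `securityContext` and mounts writable volumes only where PostgreSQL needs to write. The Pod name, image, claim name, and mount paths are hypothetical and not part of this PR. The `securityContext` fields themselves are standard Kubernetes.

```yaml
# Hypothetical Pod; name, image, claim, and mount paths are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: example-postgres
spec:
  containers:
    - name: postgres
      image: example.org/postgres:16
      securityContext:
        # The image's root file system cannot be modified at runtime.
        readOnlyRootFilesystem: true
        # Matches the neighbouring requirement that processes do not run as root.
        runAsNonRoot: true
        allowPrivilegeEscalation: false
      volumeMounts:
        # Writable locations are provided explicitly as volumes.
        - name: pgdata
          mountPath: /var/lib/postgresql/data
        - name: run
          mountPath: /var/run/postgresql
  volumes:
    - name: pgdata
      persistentVolumeClaim:
        claimName: example-pgdata
    - name: run
      emptyDir: {}
```

Any path the container writes to outside these volumes would fail at runtime, which is the point of the feature: binaries shipped in the image stay immutable.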
@@ -696,6 +718,16 @@ categories: | |
The operator should provide ongoing status information, and perform the operation with the minimum downtime required. | ||
Provide information about the update strategy (i.e. restart of the pods or rolling update followed by a switchover or a restart). | ||
main: true | ||
- id: cfgchg | ||
name: PostgreSQL configuration changes | ||
type: boolean | ||
I'd propose to make this feature a bit more generic (broader scope). A restart may be needed for many more reasons than just a change to hot-standby parameters (e.g. a frequent case is adding an extension to `shared_preload_libraries`). For this reason, I'd vouch to remove any reference to particular parameters and just keep the description generic: "changes to configuration that may require reload or restart" (not a proposed wording, just to illustrate the idea).

The current description also raises some doubts (at least for me) about whether it implies that restarts may happen (or not) as soon as a configuration change is triggered. Whether that's a good thing is debatable, compared with leaving more control to the user. For this reason, I'd argue that the description should be reworded to describe whether the operator provides a fully automated way to proceed with a configuration reload or restart, providing adequate information to the user, without fully qualifying whether that's triggered automatically or by the user. However, a great place to add that information is the [...]

Actually, if these thoughts are taken into account, this feature would be quite close to [...]

I don't agree here. Ensuring and coordinating the restart within a cluster is a feature, an important feature that removes the need for a human operator.

For example, if you want to raise [...] Ideally, a Kubernetes operator should simulate what a human being would do in this case, but do it in an automated and reliable way, without requiring human intervention, in order to prioritize self-healing and high availability. Then, obviously, you can configure and request human intervention, but I believe that these features should be highlighted, as configuration changes should be as transparent as they can possibly be to the end user, and an operator should handle that as part of the resource lifecycle.

Oh, I think we agree on the intent (to show that an operator provides capabilities associated with controlling different aspects of the cluster's lifecycle, like a controlled restart operation), but maybe we're confused by the wording. In particular, my concern is not with the fact that automation should be used to perform a careful and correct restart, but rather with the fact that it is triggered "automatically" (whenever a change is requested) rather than giving the operator the choice of when to run it (after which it runs automatically). In other words: I may want to keep a restart operation on hold until a better time (e.g. 3am, in a valley of traffic) rather than having it launched as soon as I edit my [...]

So I agree it's a good thing, and this feature should reflect that the restart operation is fully automated; but I encourage wording that does not assume the automated operation is triggered immediately, and that allows for it being offered as an option or for the default being the opposite. But I'll leave the final decision to you. Having given my opinion here, I'll just merge the latest patch that you send :) as your criteria should be well represented here.

Ok to soften the wording and suggest that users have the choice.

Please check |
||
description: | | ||
The operator automatically handles reloads and, where required, restarts of PostgreSQL following any change to the configuration requested by the user. | ||
This includes seamless coordination of restarts of the instances after changes to the [hot-standby sensitive parameters](https://www.postgresql.org/docs/current/hot-standby.html#HOT-STANDBY-ADMIN), namely `max_connections`, `max_prepared_transactions`, `max_locks_per_transaction`, `max_wal_senders`, and `max_worker_processes`. | ||
vendor_compliance: | | ||
The operator must provide proper information to the user as to the status and final result of the operation. | ||
Declare how changes of hot-standby sensitive parameters are handled by the operator. | ||
main: true | ||
- id: amaup | ||
name: Automated major upgrades | ||
type: boolean | ||
|
@@ -731,6 +763,13 @@ categories: | |
The user may specify SQL scripts that contain migrations (DDL changes, etc) to be deployed to a given database, having the operator apply them automatically. | ||
vendor_compliance: | | ||
The operator must report back to the user detailed information about the results of the execution(s) of the script(s) provided by the user. | ||
- id: fence | ||
name: Fencing of Postgres instances | ||
type: boolean | ||
description: | | ||
The operator provides a way to stop PostgreSQL instances while keeping the pods running in order to enable investigation of the content of the data directories. | ||
This can be very useful for production support and diagnostics, especially in cases of data corruption caused by storage issues. | ||
Fencing can be requested on a single instance, a set of them or the entire cluster, in a declarative way. The operator must provide a way to resume. | ||
- id: oday2 | ||
name: Other Day 2 Operations | ||
type: string_array | ||
|
While not crucially important, the convention so far has been four-character `id`s for categories and five-character `id`s for features. There's no strong reason to keep it like this, but if you believe the IDs in this commit, this one and the others, can be made to fit this convention, it would be better.
We can call it `sslr`.