Client_addr on pg_catalog.pg_stat_replication wrong ip address - istio enabled #1629
Comments
Yes, it is exactly due to the intermediate proxy between primary and replicas. As a result, the client_addr in the pg_stat_replication doesn't match with the actual IPs of replicas.
In theory, we could do the check based on pg_stat_replication.application_name, but it would be less strict: there is no way to guarantee that the connection really belongs to the streaming replica and not to something else that decided to use the same application_name. |
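For illustration, a minimal sketch of what such an application_name-based check could look like. This is not the spilo implementation: the function name find_unhealthy_members and the row layout (application_name, state, lag in bytes) are assumptions made up for this example. It relies on the replicas reporting their member name as application_name, as the pg_stat_replication output in this issue shows, and it keeps the weakness described above, since any client can pick the same application_name.

```python
MAX_LAG_BYTES = 16 * 1024 * 1024  # same threshold as in inplace_upgrade.py


def find_unhealthy_members(member_names, replication_rows, max_lag=MAX_LAG_BYTES):
    """Return {member_name: reason} for members that fail the check.

    replication_rows is a hypothetical list of
    (application_name, state, lag_bytes) tuples, e.g. fetched from
    pg_stat_replication. Matching is by application_name, not client_addr.
    """
    lags_by_name = {}
    for app_name, state, lag in replication_rows:
        if state == 'streaming':
            lags_by_name.setdefault(app_name, []).append(lag)

    problems = {}
    for name in member_names:
        lags = lags_by_name.get(name)
        if not lags:
            problems[name] = 'not streaming from the primary'
        elif min(lags) > max_lag:
            problems[name] = 'replication lag is too high'
    return problems


# Illustrative rows: both replicas stream with a small (made-up) lag.
rows = [('hco-pg-1-2', 'streaming', 0), ('hco-pg-1-1', 'streaming', 0)]
result = find_unhealthy_members(['hco-pg-1-1', 'hco-pg-1-2'], rows)
```

With these rows the check passes for both members even though client_addr is never consulted.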
Hello, is there any news on this topic? thanks! |
We are facing the same issue in the environment of one of our customers. |
I think we need to patch https://github.com/zalando/spilo/blob/master/postgres-appliance/major_upgrade/inplace_upgrade.py. @CyberDem0n, would that be OK? I can open an MR for it. |
I've just made a simple change in spilo, so let's see what the maintainers think about it. |
Sometimes using |
Please, answer some short questions which should help us to understand your problem / question better?
While trying to do an in-cluster upgrade from PGVERSION 12 to PGVERSION 13, I discovered that the members' IPs are not correctly reported in pg_catalog.pg_stat_replication.
While running python3 /scripts/inplace_upgrade.py 3 (three-node cluster), I get the following error message:
2021-09-27 14:58:37,457 inplace_upgrade INFO: No PostgreSQL configuration items changed, nothing to reload.
2021-09-27 14:58:37,500 inplace_upgrade WARNING: Kubernetes RBAC doesn't allow GET access to the 'kubernetes' endpoint in the 'default' namespace. Disabling 'bypass_api_service'.
2021-09-27 14:58:37,504 inplace_upgrade INFO: establishing a new patroni connection to the postgres cluster
2021-09-27 14:58:37,561 inplace_upgrade ERROR: Member hco-pg-1-1 is not streaming from the primary
After debugging, I discovered that in pg_catalog.pg_stat_replication, client_addr is 127.0.0.6 for both nodes that are replicating data from the master:
postgres=# SELECT * from pg_catalog.pg_stat_replication;
 pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
-----+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------------+-----------------+-----------------+---------------+------------+-------------------------------
 894 | 16637 | standby | hco-pg-1-2 | **127.0.0.6** | | 40175 | 2021-09-27 13:40:26.305049+00 | | streaming | 9/4E027518 | 9/4E027518 | 9/4E027518 | 9/4E027518 | 00:00:00.002132 | 00:00:00.002812 | 00:00:00.002913 | 0 | async | 2021-09-27 15:09:14.095679+00
 886 | 16637 | standby | hco-pg-1-1 | **127.0.0.6** | | 36155 | 2021-09-27 13:40:05.528001+00 | | streaming | 9/4E027518 | 9/4E027518 | 9/4E027518 | 9/4E027518 | 00:00:00.001441 | 00:00:00.002128 | 00:00:00.002146 | 0 | async | 2021-09-27 15:09:14.09543+00
(2 rows)
Cluster looks like this:
`root@hco-pg-1-0:/home/postgres# patronictl list
| Member | Host | Role | State | TL | Lag in MB |
+------------+------------+---------+---------+----+-----------+
| hco-pg-1-0 | 11.32.16.6 | Leader | running | 11 | |
| hco-pg-1-1 | 11.32.16.7 | Replica | running | 11 | 0 |
| hco-pg-1-2 | 11.32.16.9 | Replica | running | 11 | 0 |
+------------+------------+---------+---------+----+-----------+
`
While debugging /scripts/inplace_upgrade.py, I found that in the code below, ip = member.conn_kwargs().get('host') retrieves the correct replica IP, but the lookup lag = streaming.get(ip) then returns None. The IP cannot match, because pg_catalog.pg_stat_replication only has client_addr = 127.0.0.6 for both nodes.
def ensure_replicas_state(self, cluster):
    """
    This method checks the status of all replicas and also tries to open connections
    to all of them and puts into the self.replica_connections dict for a future usage.
    """
    self.replica_connections = {}
    # Map client_addr -> replay lag in bytes for every row in pg_stat_replication
    streaming = {a: l for a, l in self.postgresql.query(
        ("SELECT client_addr, pg_catalog.pg_{0}_{1}_diff(pg_catalog.pg_current_{0}_{1}(),"
         " COALESCE(replay_{1}, '0/0'))::bigint FROM pg_catalog.pg_stat_replication")
        .format(self.postgresql.wal_name, self.postgresql.lsn_name))}
    print("Streaming: ", streaming)  # debug output added while investigating

    def ensure_replica_state(member):
        # The member's real pod IP is looked up among the client_addr keys
        ip = member.conn_kwargs().get('host')
        lag = streaming.get(ip)
        if lag is None:
            return logger.error('Member %s is not streaming from the primary', member.name)
        if lag > 16*1024*1024:
            return logger.error('Replication lag %s on member %s is too high', lag, member.name)
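The failure can be reproduced without a database. The sketch below only mimics the dict lookup from the excerpt above: the member IPs and the 127.0.0.6 address come from the output in this issue, while the lag values of 0 are placeholders chosen for illustration.

```python
# (client_addr, lag) pairs as the primary sees them behind the proxy.
# The lag values are placeholders, not measured numbers.
rows = [("127.0.0.6", 0), ("127.0.0.6", 0)]

# Same shape as the dict built in ensure_replicas_state():
# both rows share one key, so the dict ends up with a single entry.
streaming = {addr: lag for addr, lag in rows}

# Member name -> real pod IP, as reported by patronictl list.
members = {"hco-pg-1-1": "11.32.16.7", "hco-pg-1-2": "11.32.16.9"}

# Neither real IP is a key in the dict, so every lookup returns None
# and each replica is reported as "not streaming from the primary".
not_streaming = [name for name, ip in members.items()
                 if streaming.get(ip) is None]

print("Streaming:", streaming)
print("Reported as not streaming:", not_streaming)
```

Besides the failed lookups, keying by client_addr also collapses the two replicas into one dict entry, so the per-replica lag is lost as well.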
My question is whether this happens because we are using Istio injection (Envoy proxy) for our Zalando Postgres clusters, or whether we have some other issue, and how we can solve it.
Thank you !
/Cristi Vlad