Cannot recover from k8s endpoint temporarily unavailable situation #420

Open
nobuto-m opened this issue May 29, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@nobuto-m

MySQL units can get stuck in the following state, even though all unit agents are idle.

Unit               Workload     Agent  Address      Ports  Message
keystone-mysql/0   maintenance  idle   10.1.58.103         joining the cluster
keystone-mysql/1*  active       idle   10.1.51.41          Primary
keystone-mysql/2   waiting      idle   10.1.48.106         waiting to get cluster primary from peers

Steps to reproduce

  1. Follow: https://microstack.run/docs/multi-node-maas

More details are in:
https://bugs.launchpad.net/snap-openstack/+bug/2067451

Expected behavior

The charm should recover once the temporarily unavailable k8s endpoint is reachable again.

Actual behavior

An unhandled exception is raised in the mysql-pebble-ready hook, and the charm cannot complete the cluster deployment.

unit-keystone-mysql-2: 06:07:58 DEBUG unit.keystone-mysql/2.juju-log ops 2.10.0 up and running.
unit-keystone-mysql-2: 06:07:58 DEBUG unit.keystone-mysql/2.juju-log Emitting Juju event mysql_pebble_ready.
unit-keystone-mysql-2: 06:08:01 ERROR unit.keystone-mysql/2.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/venv/ops/model.py", line 3019, in _run
    result = subprocess.run(args, **kwargs) # type: ignore
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('/var/lib/juju/tools/unit-keystone-mysql-2/secret-get', '--label', 'database-peers.keystone-mysql.app', '--format=json')' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/./src/charm.py", line 788, in <module>
    main(MySQLOperatorCharm)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/venv/ops/main.py", line 456, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/venv/ops/main.py", line 144, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/venv/ops/framework.py", line 351, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/venv/ops/framework.py", line 853, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/venv/ops/framework.py", line 943, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/./src/charm.py", line 572, in _on_mysql_pebble_ready
    if self._mysql_pebble_ready_checks(event):
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/./src/charm.py", line 555, in _mysql_pebble_ready_checks
    if not self._is_peer_data_set:
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/mysql/v0/mysql.py", line 632, in _is_peer_data_set
    and self.get_secret("app", ROOT_PASSWORD_KEY)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/mysql/v0/mysql.py", line 704, in get_secret
    if not (value := self.peer_relation_data(scope).fetch_my_relation_field(peers.id, key)):
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 1256, in fetch_my_relation_field
    if relation_data := self.fetch_my_relation_data([relation_id], [field], relation_name):
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 1245, in fetch_my_relation_data
    data[relation.id] = self._fetch_my_specific_relation_data(relation, fields)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 534, in wrapper
    return f(self, *args, **kwargs)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 2079, in _fetch_my_specific_relation_data
    self.component, self.secret_fields, relation, fields
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 1813, in secret_fields
    self.static_secret_fields if self.static_secret_fields else self.current_secret_fields
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 1832, in current_secret_fields
    if content := self._get_group_secret_contents(relation, group):
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 2066, in _get_group_secret_contents
    result = super()._get_group_secret_contents(relation, group, secret_fields)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 996, in _get_group_secret_contents
    if (secret := self._get_relation_secret(relation.id, group)) and (
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 506, in wrapper
    return f(self, *args, **kwargs)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 2055, in _get_relation_secret
    return self.secrets.get(label, secret_uri, legacy_labels=self._previous_labels())
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 702, in get
    if secret.meta:
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 605, in meta
    self._secret_meta = self._model.get_secret(label=label)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/venv/ops/model.py", line 281, in get_secret
    content = self._backend.secret_get(id=id, label=label)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/venv/ops/model.py", line 3375, in secret_get
    result = self._run('secret-get', *args, return_output=True, use_json=True)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/venv/ops/model.py", line 3021, in _run
    raise ModelError(e.stderr) from e
ops.model.ModelError: ERROR cannot ensure service account "unit-keystone-mysql-2": Post "https://192.168.151.102:16443/api/v1/namespaces/openstack/serviceaccounts": read tcp 192.168.151.101:33354->192.168.151.102:16443: read: connection reset by peer

unit-keystone-mysql-2: 06:08:02 ERROR juju.worker.uniter.operation hook "mysql-pebble-ready" (via hook dispatching script: dispatch) failed: exit status 1
unit-keystone-mysql-2: 06:08:02 ERROR juju.worker.uniter pebble poll failed for container "mysql": failed to send pebble-ready event: hook failed
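
One way the charm might tolerate a failure like the one in the traceback above is to catch the `ModelError` raised by the secret lookup and defer the event instead of letting the hook crash. The following is a minimal sketch under stated assumptions, not the charm's actual code. `ModelError` and `FakeEvent` are local stand-ins for `ops.model.ModelError` and an ops event, so the snippet runs without ops installed. `peer_data_ready` and `flaky_lookup` are hypothetical names introduced for illustration.

```python
class ModelError(Exception):
    """Stand-in for ops.model.ModelError so the sketch runs without ops."""


class FakeEvent:
    """Hypothetical stand-in for an ops event; only defer() is modelled."""

    def __init__(self):
        self.deferred = False

    def defer(self):
        self.deferred = True


def peer_data_ready(event, is_peer_data_set):
    """Return True if peer data is set; defer the event on a ModelError.

    `is_peer_data_set` is a zero-argument callable wrapping the secret lookup.
    """
    try:
        return bool(is_peer_data_set())
    except ModelError as e:
        # Assumption: the failure is transient (e.g. "connection reset by
        # peer"), so ask for the event to be re-run later rather than
        # letting the exception escape and fail the hook.
        print(f"secret lookup failed, deferring event: {e}")
        event.defer()
        return False


def flaky_lookup():
    # Simulates secret-get failing while the k8s endpoint is unavailable.
    raise ModelError("read: connection reset by peer")


event = FakeEvent()
ready = peer_data_ready(event, flaky_lookup)
```

In a real charm the callable would wrap the `_is_peer_data_set` check seen in the traceback. Whether deferring is the right recovery strategy for a pebble-ready event, as opposed to a bounded retry, is a design decision for the maintainers.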

Versions

Operating system: 22.04 LTS

Juju CLI: 3.4.2

Juju agent: 3.4.2

Charm revision: 8.0/edge: 138

microk8s: microk8s v1.28.7 6532 1.28-strict/stable

Log output

Juju debug log:

https://bugs.launchpad.net/snap-openstack/+bug/2067451/+attachment/5783832/+files/sunbeam-inspection-report-20240529_071507.tar.gz

Additional context

nobuto-m added the bug label May 29, 2024

@taurus-forever

The funny part is that we even have a test for keystone, and it looks stable.

This requires investigation.

BTW, I suspect keystone is still using the legacy shared_db interface, which is off our radar.
Is that true? If so, can we migrate to the modern interface?

Anyway, @nobuto-m thanks for reporting!

@gboutry

gboutry commented Jul 15, 2024

This report is using keystone-k8s, the charm used in sunbeam, which works very differently from the machine charm and is using data_interfaces 0.37.
