Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BB-466 Dealing with expired zookeeper sessions #2471

Merged
merged 2 commits into from
Nov 24, 2023

Conversation

nicolas2bert
Copy link
Contributor

Introducing the ZookeeperManager Class, which offers:

  • Connection Management: This class efficiently manages connections to the ZooKeeper server. It ensures continuity by handling reconnections whenever the session expires.


  • Error and State Management: The class handles errors and meticulously logs state changes. It is essential for debugging.


  • Abstraction Layer: It creates an abstraction over the database-specific code, simplifying the integration of application logic. This layer not only facilitates an easier transition to different databases in the future but also aids in simplifying the mocking and testing processes.

Why isn't this issue present in Artesca?
Artesca uses Kubernetes as its orchestrator, which uses liveness probes to check if containers are operating correctly. Zookeeper state is part of the liveness probe of the Backbeat services. If a liveness probe fails, Kubernetes will understand that the application in the container is not functioning properly. In response, Kubernetes typically restarts the problematic container to try to resolve the issue.
This logic is not present in S3C/Federation.

However, to ensure consistency between the two projects, this change will be forward-ported to Artesca.

TODO: while forward-porting, make sure all the Zookeeper client uses the new ZookeeperManager class.

@bert-e
Copy link
Contributor

bert-e commented Nov 15, 2023

Hello nicolas2bert,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Status report is not available.

@nicolas2bert nicolas2bert changed the base branch from development/8.6 to development/7.10 November 15, 2023 14:02
@bert-e
Copy link
Contributor

bert-e commented Nov 15, 2023

Incorrect fix version

The Fix Version/s in issue BB-466 contains:

  • None

Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:

  • 7.10.15

  • 7.70.10

  • 8.5.5

  • 8.6.30

  • 8.7.0

Please check the Fix Version/s of BB-466, or the target
branch of this pull request.

@scality scality deleted a comment from bert-e Nov 15, 2023
@nicolas2bert
Copy link
Contributor Author

ping

@bert-e
Copy link
Contributor

bert-e commented Nov 15, 2023

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

@nicolas2bert
Copy link
Contributor Author

/create_integration_branches

@bert-e
Copy link
Contributor

bert-e commented Nov 15, 2023

Conflict

A conflict has been raised during the creation of
integration branch w/7.70/bugfix/BB-466/zookeeper-expired-session with contents from bugfix/BB-466/zookeeper-expired-session
and development/7.70.

I have not created the integration branch.

Here are the steps to resolve this conflict:

 $ git fetch
 $ git checkout -B w/7.70/bugfix/BB-466/zookeeper-expired-session origin/development/7.70
 $ git merge origin/bugfix/BB-466/zookeeper-expired-session
 $ # <intense conflict resolution>
 $ git commit
 $ git push -u origin w/7.70/bugfix/BB-466/zookeeper-expired-session

The following options are set: create_integration_branches

@nicolas2bert nicolas2bert force-pushed the bugfix/BB-466/zookeeper-expired-session branch from 6b1ac86 to e6fa60f Compare November 15, 2023 14:50
const logger = new werelogs.Logger('NotificationConfigManager:test');
const zkConfigParentNode = 'config';
const concurrency = 10;
const bucketPrefix = 'bucket';
const timeoutMs = 100;

function mockZookeeperManager() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, feel free to ignore, but this helper could be factored out as it's used in more than 1 place.

Comment on lines 90 to 117
zkClient.close();
zkClient.on('disconnected', () => done());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how often this would happen, but I think there's a risk of test flakiness if attaching to an event after triggering it - depending on how async close is.

@nicolas2bert nicolas2bert force-pushed the bugfix/BB-466/zookeeper-expired-session branch 2 times, most recently from 86326e1 to e9ac8af Compare November 16, 2023 18:56
_connect() {
// clean up exists client before reconnect
if (this.client) {
this.client.removeAllListeners();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call this.client.close() before removing the listeners?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already close it here already: https://github.com/scality/backbeat/pull/2471/files#diff-ab8f069d2d46a96dbf066330d2185a0e4705edacbc2cb222cfc40abf4511b9d3R91

but moving this.client.close() here might make more sense to make it more generic.

}
// NO_NODE error and autoCreateNamespace is enabled
if (err) {
const nsIndex = this.connectionString.indexOf('/');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The connection string may not always contain a namespace. Maybe in our products we always have one, but for more genericity it could be better to handle cases where there's no namespace.

this.client.removeAllListeners();
}

this.client = zookeeper.createClient(this.connectionString, this.options);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Structurally I think we should turn the whole construct into an async series or waterfall to avoid nesting too many levels. Also we would then be able to treat success/error in a unique place at the end of the series and emit the appropriate ready or error event.

* @param {Function} callback The callback function.
* @return {undefined}
*/
create(path, data, acls, mode, callback) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too fond of having to reproduce the original API of the zookeeper client, for the maintenance burden notably if we have to use new functions later and for the extra code.

I am thinking, would it be better to extend the zookeeper client itself in such case, as if we're manipulating the zookeeper client but with extra logic inside to deal with session expiration etc., possibly it would cause some issues on its own, so not sure. Not blocking for me but just mentioning it doesn't look like an ideal solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too sure about extending the zk client itself; the client we use is pretty antiquated and not super well maintained so we may want to migrate at some point to a more modern client (async based?) and having ZookeeperManager be an abstraction layer can help make the transition smoother.

@nicolas2bert nicolas2bert force-pushed the bugfix/BB-466/zookeeper-expired-session branch from ab34b90 to ddb0d6c Compare November 17, 2023 14:36
if (this.options && this.options.autoCreateNamespace) {
nsIndex = this.connectionString.indexOf('/');
namespace = this.connectionString.slice(nsIndex);
if (nsIndex > -1 && namespace !== '/') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The connection string could look like 1.2.3.4:2181, I don't think we would even see a / in this case.

Copy link
Contributor Author

@nicolas2bert nicolas2bert Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes I missed the nsIndex > -1 check, we precompute the namespace though without checking if there's a namespace, but in this case calling slice(-1) should not cause issues except it will return "garbage" in namespace but we don't use it. Might be slightly cleaner to compute namespace only if nsIndex > -1 but it's minor and not blocking for me.

if (err && err.name !== 'NODE_EXISTS') {
return next({ event: 'error', err });
}
return next({ event: 'ready' });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to disconnect/close the rootZkClient once the path exists?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Thanks!

@nicolas2bert nicolas2bert force-pushed the bugfix/BB-466/zookeeper-expired-session branch 2 times, most recently from 1e60f42 to 7a41c33 Compare November 20, 2023 09:54
Introducing the ZookeeperManager Class, which offers:

* Connection Management: This class efficiently manages connections to the ZooKeeper server. It ensures continuity by handling reconnections whenever the session expires.


* Error and State Management: The class handles errors and meticulously logs state changes. It is essential for debugging.


* Abstraction Layer: It creates an abstraction over the database-specific code, simplifying the integration of application logic. This layer not only facilitates an easier transition to different databases in the future but also aids in simplifying the mocking and testing processes.
@nicolas2bert nicolas2bert force-pushed the bugfix/BB-466/zookeeper-expired-session branch from 7a41c33 to c9a634d Compare November 21, 2023 08:46
@bert-e
Copy link
Contributor

bert-e commented Nov 21, 2023

Conflict

A conflict has been raised during the creation of
integration branch w/8.5/bugfix/BB-466/zookeeper-expired-session with contents from w/7.70/bugfix/BB-466/zookeeper-expired-session
and development/8.5.

I have not created the integration branch.

Here are the steps to resolve this conflict:

 $ git fetch
 $ git checkout -B w/8.5/bugfix/BB-466/zookeeper-expired-session origin/development/8.5
 $ git merge origin/w/7.70/bugfix/BB-466/zookeeper-expired-session
 $ # <intense conflict resolution>
 $ git commit
 $ git push -u origin w/8.5/bugfix/BB-466/zookeeper-expired-session

The following options are set: create_integration_branches

@bert-e
Copy link
Contributor

bert-e commented Nov 22, 2023

Integration data created

I have created the integration data for the additional destination branches.

The following branches will NOT be impacted:

  • development/7.4

You can set option create_pull_requests if you need me to create
integration pull requests in addition to integration branches, with:

@bert-e create_pull_requests

The following options are set: create_integration_branches

@bert-e
Copy link
Contributor

bert-e commented Nov 22, 2023

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

The following options are set: create_integration_branches

@nicolas2bert
Copy link
Contributor Author

@bert-e approve

@bert-e
Copy link
Contributor

bert-e commented Nov 24, 2023

In the queue

The changeset has received all authorizations and has been added to the
relevant queue(s). The queue(s) will be merged in the target development
branch(es) as soon as builds have passed.

The changeset will be merged in:

  • ✔️ development/7.10

  • ✔️ development/7.70

  • ✔️ development/8.5

  • ✔️ development/8.6

  • ✔️ development/8.7

The following branches will NOT be impacted:

  • development/7.4

There is no action required on your side. You will be notified here once
the changeset has been merged. In the unlikely event that the changeset
fails permanently on the queue, a member of the admin team will
contact you to help resolve the matter.

IMPORTANT

Please do not attempt to modify this pull request.

  • Any commit you add on the source branch will trigger a new cycle after the
    current queue is merged.
  • Any commit you add on one of the integration branches will be lost.

If you need this pull request to be removed from the queue, please contact a
member of the admin team now.

The following options are set: approve, create_integration_branches

@bert-e
Copy link
Contributor

bert-e commented Nov 24, 2023

I have successfully merged the changeset of this pull request
into targetted development branches:

  • ✔️ development/7.10

  • ✔️ development/7.70

  • ✔️ development/8.5

  • ✔️ development/8.6

  • ✔️ development/8.7

The following branches have NOT changed:

  • development/7.4

Please check the status of the associated issue BB-466.

Goodbye nicolas2bert.

@bert-e bert-e merged commit 22408bd into development/7.10 Nov 24, 2023
3 checks passed
@bert-e bert-e deleted the bugfix/BB-466/zookeeper-expired-session branch November 24, 2023 06:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants