Overview dashboard's "Manager" billboard says "Online Online" when scylla_manager_servers.yml has the new port #1129

Closed
ziggythehamster opened this issue Nov 10, 2020 · 4 comments · Fixed by #1150
Labels
bug Something isn't working right

Comments

@ziggythehamster

In Scylla Monitoring 3.5, due to the change introduced in #1014, if you have the new port configured, you end up with two jobs, scylla_manager and scylla_manager1. They both report the same thing because they both point at :5090. The query doesn't do any aggregation or filtering to prevent this from displaying:

[screenshot: the "Manager" billboard on the Overview dashboard showing "Online Online"]

This is with the patch from #1118 applied, and is a reopening of #1121.

ziggythehamster added the bug label Nov 10, 2020
@amnonh
Collaborator

amnonh commented Nov 10, 2020

I'm not sure why this is happening. One of them should listen on the old port and one on the new.

@ziggythehamster
Author

But that's not what #1014 did. It added scylla_manager1, which replaces the port of whatever is in scylla_manager_servers.yml with :5090, and it left scylla_manager alone, using whatever is in scylla_manager_servers.yml as written. Contrast with manager_agent/manager_agent1, which replace the port number in both situations.

Since sysadmins of a complicated piece of database technology are ostensibly not stupid, they of course would have the correct IP and port listed for the manager server in the config file, and since this changed to :5090, they'd have the new port number.

If you deliberately set scylla_manager_servers.yml entries to the wrong port number (old, or otherwise), then a completely different set of charts breaks, because lots of charts just look for job=scylla_manager, ignoring job=scylla_manager1.

This shows "Online Online" because both scylla_manager and scylla_manager1 point at :5090, and you do no aggregation to handle this case.
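To make the collision concrete, here is a rough Python model of the behavior described above. It is not the actual Prometheus config from #1014. The function name resolve_targets and the address 172.17.0.1 are made up for illustration; only the job names and :5090 come from this thread.

```python
NEW_PORT = 5090  # the port scylla_manager1 forces, per the description above


def resolve_targets(configured, force_port=None):
    """Return the host:port strings a job would end up scraping.

    force_port=None models scylla_manager (address used as written).
    force_port=5090 models scylla_manager1 (port rewritten to :5090).
    """
    resolved = []
    for address in configured:
        # Strip the port if there is one; a bare host is kept unchanged.
        host = address.rsplit(":", 1)[0]
        resolved.append(f"{host}:{force_port}" if force_port else address)
    return resolved


# Hypothetical scylla_manager_servers.yml entry that already uses the new port.
configured = ["172.17.0.1:5090"]

jobs = {
    "scylla_manager": resolve_targets(configured),
    "scylla_manager1": resolve_targets(configured, force_port=NEW_PORT),
}

# Targets scraped by both jobs; each one yields a duplicate series.
duplicates = set(jobs["scylla_manager"]) & set(jobs["scylla_manager1"])
print(duplicates)
```

With the new port in the config file, both jobs resolve to the same target, so every metric from the manager shows up twice, once per job label.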

IMO, you should revert #1014 and assume that sysadmins will set the port correctly. Adequately document the change (this is not currently the case) and deal with a handful of support issues from people who didn't read anything prior to installing the latest version.

An alternative would be to make the shell script use netcat/socat to figure out which port is good, edit the templates to use the correct one, and boot, ignoring the port configured in the config file (so you could then update the docs to just say to enter the IP, which is usually the docker0 bridge IP). You could be more explicit instead: if the open port disagrees with the configured port, refuse to start. If you didn't have to support so many systems, the echo > /dev/tcp/IP/PORT Bash feature would be even simpler (it has a nonzero exit status if the port is closed and a zero exit status if the port is open).
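For illustration only, the "refuse to start if the ports disagree" variant could look roughly like the Python sketch below. The project's startup script is shell, so this just shows the logic. The function names port_is_open and check_configured_port are hypothetical, and the list of candidate ports is left to the caller since the thread only names :5090.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_configured_port(host, configured_port, candidate_ports):
    """Return configured_port if it is reachable, otherwise refuse to start.

    candidate_ports is the caller-supplied list of ports to probe
    (for example the old and new manager ports); it is only used to
    build a more helpful error message.
    """
    if port_is_open(host, configured_port):
        return configured_port
    open_ports = [p for p in candidate_ports if port_is_open(host, p)]
    raise RuntimeError(
        f"configured port {configured_port} on {host} is closed; "
        f"open candidate ports: {open_ports or 'none'}"
    )
```

A caller would run something like check_configured_port(ip, 5090, [5090, old_port]) before generating the Prometheus config and abort on the exception.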

@ziggythehamster
Author

I can confirm this still happens on Monitoring 3.5.1.

@ziggythehamster
Author

I can also confirm that changing (1+scylla_manager_server_current_version) to max(1+scylla_manager_server_current_version) resolves it, but I have no idea if that's the correct behavior (maybe it should be min? maybe there should be a more complicated expression?).
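A toy Python model of why wrapping the expression in max() helps, assuming the panel renders one "Online" per returned series (which is what the screenshot suggests). The sample values and the instance address are made up, and this imitates PromQL semantics rather than running a real query.

```python
# Two series for the same manager, one per job, as in this issue.
# The metric value 1 is a placeholder, not a real version number.
samples = [
    ({"job": "scylla_manager", "instance": "172.17.0.1:5090"}, 1),
    ({"job": "scylla_manager1", "instance": "172.17.0.1:5090"}, 1),
]


def one_plus(samples):
    """Model of (1+scylla_manager_server_current_version): one result per series."""
    return [(labels, 1 + value) for labels, value in samples]


def max_over_all(results):
    """Model of max(...) with no grouping clause: a single result, or none."""
    values = [value for _, value in results]
    return [max(values)] if values else []


unaggregated = one_plus(samples)
aggregated = max_over_all(unaggregated)

print(len(unaggregated))  # two results, so the billboard shows two values
print(len(aggregated))    # one result, so the billboard shows one value
```

In this model min() would collapse the duplicates just as well when both series carry the same value; the two only differ if the jobs ever report different values.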
