-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use the index of a seed node address in the list of nodes at the #4445
Use the index of a seed node address in the list of nodes at the #4445
Conversation
repository to determine the hour to restart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK
Looks ok, but I'll leave a but in case @chimp1984 wants to act on my comment
"\n%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%\n\n", target); | ||
shutDown(gracefulShutDownHandler); | ||
} | ||
}, TimeUnit.MINUTES.toSeconds(10)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use CHECK_SHUTDOWN_SEC
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CHECK_SHUTDOWN_SEC is 1 hour. That has risk that we miss the hour we are targeting. If one timer event is triggered just 1 sec. before our target hour and then due cpu load the next 1 hour scheduled call takes longer we would miss our target hour. E.g. once call at 13:59, next call at 15:01, 14:00 would be our target.
With 10 min. Its safe enough and does not cause performance costs. We could maybe go higher like 30 min as well...but prefer to keep it on the safe side.
I ran this on wizseed3 and it restarted once in a day during the 9th hour of the day, so the restart code tested fine. However, I have a few comments on the implementation used to decide when to restart. Are you sure it's a good idea to sort the list of hostnames before deriving the IDs? By doing it that way, whenever we add new seednodes to the list it will change all the IDs, and since not everyone will upgrade on the same day, this might cause 2 seednodes to have the same ID and restart at the same time, causing the exact issue we are trying to prevent from happening. Alternatively, why not use a deterministic function to derive a number from the digest of the onion hostname? For example, since there are 1440 minutes in a day, so if you take the the md5sum of the hostname Additionally, regardless of which implementation is used, the restart code should use UTC regardless of the timezone that the server is set to. Some servers have their timezone set to local times which would offset the deterministic restart time. |
@wiz it's using |
@sqrrm this page below confirmed my thoughts that both are equivalent, using UTC seems more the standard way of doing it though. @wiz agreed that the hash is more elegant, but by using the ID the chance of collision is 0% (up to 24 seednodes) which is even better |
@mrosseel thanks for the link, good info. The whole point of this exercise was to have a deterministic restart schedule to have 0 chance that all nodes restart at the same time. Two nodes restarting at the same time is probably fine, three nodes might cause trouble in extreme cases. Merging. |
Re timezone: Local timezone is ignored, so it does not matter what server has set. It is just important to set it same for all as otherwise hour 3 might be same as hour 5 if timeezone of 2 nodes is 2 hours different. Re sorting: Good point with adding a node. I could use the list as it is defined, but in the code its a HashSet, so no order guaranteed but I could change that. Hash with mod will result in a random value which will not avoid out birthday problem. It would be same as now with random restart. The birthday problem is particularily a case for hash collusions for hash use cases. |
There is one problem with using the order in the node list: If a node gets removed we run into the same problem. I keep the sorting for now until a "perfect" solution comes up. If a node gets added we have to update all nodes quickly and restart them. |
@chimp1984 do you not like my suggestion of using the deterministic hash function(onion) mod 1440 ? |
@wiz This does not solve the problem of potential collusions. The only benefit over a random selection is that you can run it first and see the results and can check if there are collusions. But if you add a node you have to run again and if it produces a collusion you are in bigger troubles as with the sorted list model. The you need to change the mod or hash and try to get no collusion again. But with any further additions it can be the same again.... Not saying it is likely but does not provide what we want. |
Use the index of a seed node address in the list of nodes at the repository to determine the hour to restart.
This should avoid the riks that multiple seed nodes restart at the same time which can lead to data loss.