-
Notifications
You must be signed in to change notification settings - Fork 445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate safekeeper unique IDs. #1228
Conversation
bc955cd
to
57e560d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks ok to me. Would be good to further unify NodeIds for safekeeper and pageserver. I can improve on that when I'll be working on pageserver side
c08b45d
to
4e98076
Compare
Updated patch makes ids monotonic with decimal representation. @hlinnaka note that this id completely replaces textual 'name' currently known by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still do not get it why we need different flavours of ids. Reposting here for visibility.
If we need to distinguish between entities that ids belong to, lets do that explicitly. We can reserve some of these id bits to some tag value and use it in Debug/Display impls. So id will look something like that: node/abcd12354 or tenant/1234abcd. IMO it is better than some heuristics. Also if our ids are too long, lets shrink them to minimal possible length. e g to have 8 bytes instead of 16.
As said in another PR, I'm for that variant mostly due to shortness/readability/being easy to remember. Right now tenant id + timeline id are taking about half of space of log line; imagine adding to that global installation id + shard id once we get to sharding + storage node id, perhaps several (e.g. pageserver and safekeepers). Most of our logs would be ids %). If you want another example of such "mixed" ids, consider CockroachDB: they have uuid for global cluster id and sequential decimal ids for node ids. For the same reasons, I would expect. Random 64 bit ids combine both disadvantages: they are not long enough to regard them unique without additional checks (unlike uuid) and still all values are quite long. Still, I don't care very much here, I just want to proceed. I wouldn't extremely mind the alternative, and it is not very hard to change later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If one uuid has enough uniqueness can we shrink tenant/timeline ids in half? So together they form this cluster uuid.
I'm not strongly opposed to sequential ids. My concern is having too many concepts to deal with. If one type of id is enough, then why bother and invent new ones? We can have one id with some tag in the first byte which describes what kind of id it is without guessing
Regarding logs I think it is inevitable that more and more variables are being added to message context. Modern logging follows machine readable path, e.g json. With that we can filter out the noise.
Feel free to commit this if you're in a hurry.
3b5b175
to
1481d51
Compare
In preparation for storage node messaging. IDs are supposed to be monotonically assigned by the console. In tests it is issued by ZenithEnv; at the zenith cli level and fixtures, string name is completely replaced by integer id. Example TOML configs are adjusted accordingly. Sequential ids are chosen over Zid mainly because they are compact and easy to type/remember.
Merged through #1310 |
In preparation for storage node messaging.