Reset / Overwrite invokerId for unique name in zookeeper manually #5024
core/invoker/src/main/scala/org/apache/openwhisk/core/invoker/InstanceIdAssigner.scala
    }
  } else {
    val newId = overwriteId.getOrElse("")
    zkClient.create().orSetData().forPath(myIdPath, BigInt(newId).toByteArray)
If I got this correctly, I think there would be still data for obsolete unique names.
Shouldn't we remove them?
For example, let's assume data looks like this:
/invokers/idAssignment/mapping/host1 0 // obsolete
/invokers/idAssignment/mapping/host2 1 // obsolete
/invokers/idAssignment/mapping/host3 2
/invokers/idAssignment/mapping/host4 3
You can update this like,
/invokers/idAssignment/mapping/host1 0 // obsolete
/invokers/idAssignment/mapping/host2 1 // obsolete
/invokers/idAssignment/mapping/host3 0
/invokers/idAssignment/mapping/host4 1
But there will still be data for host1 and host2.
If such names are used again by any chance, isn't it possible that id conflicts can happen?
Yes, that's correct. I added a comment saying that you should not use this overwrite unless you are sure that no other invoker shares the id.
I would like to make updating the invoker host mapping cleaner, but ids are currently tracked by a single atomic counter in ZooKeeper, which complicates things because the counter only goes up. And since invokers are stored as hostname -> id mappings, you would need to fetch all invokers and look at all of their ids to determine the lowest available id to assign (I'm not sure you can do that atomically the way you can with the counter; I'm not super familiar with ZooKeeper). This is just meant to be a manual operation for corrections when your fleet is out of sync. Do you have any better ideas? I do think it's a problem that you can never really go back and remap invoker hosts to a different instance id.
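To make the "lowest available id" idea concrete, here is a minimal sketch. It assumes the hostname -> id mapping has already been read into an in-memory Map; `lowestFreeId` is a hypothetical helper and is not part of InstanceIdAssigner. It does not address atomicity: against a real ZooKeeper ensemble the read and the later write would still need some form of coordination.

```scala
object LowestFreeId {
  // Hypothetical helper (not from the PR): given the hostname -> id mapping,
  // return the smallest non-negative id that no host currently holds.
  def lowestFreeId(mapping: Map[String, Int]): Int = {
    val used = mapping.values.toSet
    // The set of used ids is finite, so this search always terminates.
    Iterator.from(0).find(id => !used.contains(id)).get
  }
}
```

With only host3 -> 2 and host4 -> 3 present, this would pick 0, which is the gap left by the obsolete hosts in the example above.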
I feel like it's a bit picky.
How about removing the obsolete data from ZooKeeper when overwriting the ID?
Maybe we can iterate the mappings and figure out the invoker with the overwritten ID.
I am also fine with the proper comment as this feature would not be used without any manual intervention.
I just pushed a change that checks all uniqueNames to see whether the instance id already exists before overwriting it for this invoker. This isn't atomic the way the counter is when assigning ids, so it's not perfect, but I think it's more than good enough for something that should only be used as a corrective action on an out-of-sync fleet.
Next, I would like to find a way to improve the dynamic id assigner so that it can handle these gaps and backfill missing ids, with the check of which ids exist done atomically. But that should be saved for another PR. This is good enough for now for us to correct things.
The exception case should be covered in a unit test.
Not quite sure this would fix your problem.
It seems the path is not transient, so even if an obsolete invoker no longer exists, its path and data will still be in ZooKeeper.
So you would never be able to overwrite an existing ID.
Is that correct?
Yeah, that's right: the data would have to be gone from ZooKeeper before the overwrite attempt happens. I guess I could try to delete the existing mapping once I find it and then overwrite the id.
Done. It should now delete the existing mapping with that instance id before it overwrites.
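The delete-then-overwrite behavior discussed in this thread can be sketched roughly as follows. This is an illustration only: a mutable Map stands in for the ZooKeeper mapping nodes, and `overwriteId` is a hypothetical helper, not the code that was pushed. Like the change described above, it is not atomic.

```scala
import scala.collection.mutable

object OverwriteId {
  // In-memory stand-in for the mapping nodes (uniqueName -> instance id).
  // Hypothetical helper; the real code talks to ZooKeeper through zkClient.
  // Returns the unique name whose stale mapping was removed, if any.
  def overwriteId(mapping: mutable.Map[String, Int], uniqueName: String, newId: Int): Option[String] = {
    // Find another unique name that already holds the requested id.
    val stale = mapping.collectFirst {
      case (name, id) if id == newId && name != uniqueName => name
    }
    // Remove the stale entry first so two names never share one id.
    stale.foreach(name => mapping.remove(name))
    mapping.update(uniqueName, newId)
    stale
  }
}
```

Applied to the earlier example, overwriting host3 with id 0 would drop the obsolete host1 entry and leave host3 mapped to 0.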
221d762 to 45c175c
Generally looking good to me.
core/invoker/src/main/scala/org/apache/openwhisk/core/invoker/InstanceIdAssigner.scala
    if (idCounter.trySetCount(current, current.getValue() + 1)) {
      current.getValue()
    } else {
      assignId()
It's "possible" this can recurse forever. Should this give up after N trials?
Yeah, it appears that way, but I'd rather deal with that in a separate PR since it's pre-existing behavior.
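For reference, the "give up after N trials" suggestion could look something like the sketch below. It is an assumption-laden illustration: an in-process AtomicInteger stands in for the shared ZooKeeper counter, and `assignId` with its `maxAttempts` parameter is hypothetical rather than the existing method.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.annotation.tailrec

object BoundedAssign {
  // Hypothetical stand-in for the shared counter: an in-process AtomicInteger.
  // Returns Some(id) on success, or None once maxAttempts compare-and-set
  // attempts have failed, instead of recursing without bound.
  @tailrec
  def assignId(counter: AtomicInteger, maxAttempts: Int): Option[Int] = {
    if (maxAttempts <= 0) None
    else {
      val current = counter.get()
      // The swap succeeds only if nobody else changed the counter in between.
      if (counter.compareAndSet(current, current + 1)) Some(current)
      else assignId(counter, maxAttempts - 1)
    }
  }
}
```

Returning an Option leaves it to the caller to decide whether a failed assignment should abort startup or be retried later.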
…InstanceIdAssigner.scala Co-authored-by: rodric rabbah <rodric@gmail.com>
I made a couple of changes to improve this and make id assignment more robust.
fa5691b to 2c1c397
tests/src/test/scala/org/apache/openwhisk/core/invoker/test/InstanceIdAssignerTests.scala
LGTM with a minor nit.
2c1c397 to 1936fe0
Tested this out in one of our regions. Works great.
Description
This allows the operator to explicitly set the instanceId for a unique name in ZooKeeper from the command line when starting the invoker. I added a comment that this command option should not be used unless you are sure that the invokerId does not already exist for any unique name.
For context, we use the hostname as our invoker unique name. We have hosts that no longer exist, so the controller's invoker pool is backfilled with entries for invokers that will never be started again. This throws off the invoker hashing algorithm, since these invokers are always considered down. This change lets us easily move our lexicographically highest invokers into those slots without manually going into ZooKeeper to do so.