Rewrite Allocator #2516
Comments
Another example, which I happened to notice while going through the code: you can't safely keep a pointer to a loop variable. The same memory location is reused for each iteration, so the value pointed to changes every time. This was a landmine waiting to happen.
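For anyone who wants to see the loop-variable problem in isolation, here is a minimal sketch. The `task` type and both function names are made up for illustration and are not swarmkit code. The first function reuses a single variable explicitly, which mirrors what a `for _, t := range` loop did before Go 1.22 gave each iteration its own variable; the second takes the address of the slice element instead.

```go
package main

import "fmt"

// task is a hypothetical stand-in for a store object; it is not a swarmkit type.
type task struct {
	ID string
}

// sharedVarPointers reproduces the bug by reusing one variable explicitly,
// which is how a `for _, t := range` loop behaved before Go 1.22.
func sharedVarPointers(tasks []task) []*task {
	var out []*task
	var t task // a single memory location, overwritten on every iteration
	for i := range tasks {
		t = tasks[i]
		out = append(out, &t) // every element aliases the same variable
	}
	return out
}

// elementPointers takes the address of the slice element instead, so each
// pointer refers to a distinct object.
func elementPointers(tasks []task) []*task {
	out := make([]*task, 0, len(tasks))
	for i := range tasks {
		out = append(out, &tasks[i])
	}
	return out
}

func main() {
	tasks := []task{{ID: "a"}, {ID: "b"}, {ID: "c"}}
	for _, p := range sharedVarPointers(tasks) {
		fmt.Print(p.ID, " ") // prints "c" three times
	}
	fmt.Println()
	for _, p := range elementPointers(tasks) {
		fmt.Print(p.ID, " ") // prints "a", "b", "c"
	}
	fmt.Println()
}
```

Copying the value into a fresh variable inside the loop body (`t := t`) is the other common fix when the pointers must not alias the original slice.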
Can you please include code snippets for the last two points in the problem statement? I tried reading the code, but it wasn't obvious to me. @dperny
The network allocator, like, the IPAM stuff, doesn't live in the raft store. It has its own internal state. Before you can allocate new IP addresses, you have to "allocate" all of the old IP addresses that are assigned in the raft store. This involves iterating through every object (node, network, service, task) and "allocating" their assigned IP addresses. However, it's possible for an object to be committed only partially allocated, where it has some IP addresses assigned and some not. This can happen, for example, when a spec has been updated, but the allocator hasn't run on the object yet. In the

The difference between initialization and new allocation is whether addr is empty or an IP address. We have tons of bugs where we're passing an empty IP address to this method before we've fully iterated through every object and populated the IPAM with the existing IPs. There's nothing stopping this from happening, because this is the same code path for new and old allocations. If you have an object in the above-mentioned half-allocated state, it's really tricky to get this right. This is in
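One way to make that mistake impossible is to split the two cases into separate code paths. The sketch below is only an illustration under assumptions: `pool`, `Restore`, `FinishRestore`, and `Allocate` are hypothetical names, not the libnetwork IPAM API or the design of the rewrite. `Restore` rejects an empty address, and `Allocate` refuses to run until restoration has been declared finished.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// pool is a hypothetical in-memory IPAM used only for illustration.
type pool struct {
	prefix netip.Prefix
	used   map[netip.Addr]bool
	ready  bool // true once every address from the store has been restored
}

func newPool(cidr string) *pool {
	return &pool{
		prefix: netip.MustParsePrefix(cidr).Masked(),
		used:   make(map[netip.Addr]bool),
	}
}

// Restore marks an address that is already recorded in the store as in use.
// An empty string is rejected, so a half-allocated object cannot trigger a
// fresh allocation through this path.
func (p *pool) Restore(s string) error {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return fmt.Errorf("restore requires a concrete address: %w", err)
	}
	if p.ready {
		return errors.New("restore called after initialization finished")
	}
	if !p.prefix.Contains(addr) {
		return fmt.Errorf("address %s is outside %s", addr, p.prefix)
	}
	if p.used[addr] {
		return fmt.Errorf("address %s restored twice", addr)
	}
	p.used[addr] = true
	return nil
}

// FinishRestore signals that every existing object has been visited.
func (p *pool) FinishRestore() { p.ready = true }

// Allocate hands out the first free address. It skips only the network
// address, which is a simplification for this sketch.
func (p *pool) Allocate() (netip.Addr, error) {
	if !p.ready {
		return netip.Addr{}, errors.New("allocate called before restore finished")
	}
	for a := p.prefix.Addr().Next(); p.prefix.Contains(a); a = a.Next() {
		if !p.used[a] {
			p.used[a] = true
			return a, nil
		}
	}
	return netip.Addr{}, errors.New("pool exhausted")
}

func main() {
	p := newPool("10.0.0.0/29")
	// Addresses already committed in the store (made-up values).
	for _, s := range []string{"10.0.0.1", "10.0.0.3"} {
		if err := p.Restore(s); err != nil {
			panic(err)
		}
	}
	p.FinishRestore()
	a, err := p.Allocate()
	if err != nil {
		panic(err)
	}
	fmt.Println(a) // prints 10.0.0.2
}
```

With this shape, a half-allocated object restores the addresses it has during initialization and gets the missing ones only after `FinishRestore`, so a new allocation can never be handed an address that an unvisited object already owns.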
This is done as part of #2615, and work on it is tracked on the

To complete the new allocator rewrite, we need to merge #2686, which removes the old allocator. Additionally, there have been changes to the old allocator since the rewrite began, which must be "forward-ported" to the new one. These are:
The Allocator code is bad. I'd hazard to say it's irredeemably bad. It's a source of constant bugs and breakages. Things are allocated twice or not allocated at all. The issues don't seem to be present (usually) in the levels deeper than swarmkit (libnetwork). Instead, libnetwork tends to give garbage responses when given garbage input.
Some examples of problems:
We should rewrite the whole thing from scratch. It's not a small project, and there's a lot of risk in a rewrite versus a refactoring. However, a clean slate would let us escape the most ingrained design flaws.