AEP: Replace RabbitMQ #30
Open
muhrin wants to merge 3 commits into master from replace-rmq
# Remove the dependence on RabbitMQ

| AEP number | 000 |
|------------|--------------------------------------------------------------|
| Title      | Remove the dependence on RabbitMQ |
| Authors    | [Martin Uhrin](mailto:martin.uhrin@epfl.ch) (muhrin), [Chris J. Sewell](mailto:christopher.sewell@epfl.ch) (chrisjsewell) |
| Champions  | [Martin Uhrin](mailto:martin.uhrin@epfl.ch) (muhrin), [Chris J. Sewell](mailto:christopher.sewell@epfl.ch) (chrisjsewell) |
| Type       | S - Standard Track AEP |
| Created    | 09-Dec-2021 |
| Status     | submitted |

## Background

Currently AiiDA uses RabbitMQ as the communication backbone for the daemon. RabbitMQ is used to queue up tasks, to send action messages (asking a worker to pause/play/kill a process) and to send broadcasts informing any listeners of a state change that occurred in a process (e.g. a parent AiiDA process can listen for the termination of its child).

While providing some useful features and functionality, the use of RabbitMQ comes with a few drawbacks and problems:

1. Setting up AiiDA is made more complicated because an OS-level service needs to be installed.
2. There is no API to introspect RabbitMQ queues to determine what is and what isn't being worked on.
3. The database state sometimes becomes desynchronised from RabbitMQ (e.g. a process is still in a non-terminal state but there is no corresponding RabbitMQ message queued, and therefore the process will never continue).
4. RabbitMQ is [making changes](https://github.com/rabbitmq/rabbitmq-server/pull/2990) that will see tasks timed out after 30 minutes. This is not controllable from user space and, what's more, it has been [made clear](https://github.com/rabbitmq/rabbitmq-server/discussions/3345) that long-running tasks are not the intended use case for RabbitMQ.

## Proposed Enhancement

This AEP proposes to drop RabbitMQ, replacing it with a suitable stand-in that does not rely on an external service but retains the core properties (e.g. reliability, responsiveness, scalability).

## Detailed Explanation

For now this AEP does not propose a concrete alternative but rather fleshes out the requirements that will inform the choice/implementation of a RabbitMQ replacement.

### Original requirements

The current (RMQ-based) system was designed to meet the following requirements:

1. The messaging system should be able to handle three types of communication:
   1. Task queues (e.g. for a client to submit a process to be run by a worker)
   2. Remote Procedure Calls (e.g. asking a process to pause/play/kill)
   3. Broadcasts (e.g. a parent listening for the termination of a child)
2. An AiiDA process should only ever be run by one worker at a time.
3. An error can lead to an AiiDA process never running, but should not lead to the process being executed by multiple workers simultaneously ([elaboration](#requirement-3)).
4. The messaging system should be able to handle both long-running tasks (days) and many short-running tasks (seconds), some of which may be dependent, without unnecessary delays (at most seconds) ([elaboration](#requirement-4)).
5. The messaging system should not generate CPU load when nothing is happening (e.g. as a result of frequent polling).
6. The daemon running processes and the user submitting them should be able to be on separate machines, with communication handled by standard protocols (e.g. TCP/IP) ([elaboration](#requirement-6)).
7. The memory requirements of the system (both RAM and disk) should not grow over time beyond what is proportionate to the number of processes actively running (i.e. messages referencing terminated or deleted processes should be guaranteed to be removed in a timely manner).
8. It should be possible to dynamically scale the number of workers to accommodate an increase or decrease in the load on the daemon, and for the user to control the permitted load on their machine.

### Proposed replacement

This section will eventually describe an actionable solution but, for now, is used to flesh out and discuss candidate approaches.

#### Changes to requirements

It is not clear that requirement 6 is still needed. Currently, it is not possible to run a daemon on a separate machine from the AiiDA database, principally because the repository assumes that the files are stored on the machine where the AiiDA API calls are being made. Running the daemon remotely may become possible if an object store backend (e.g. S3) were to be implemented, but it is not clear that this will ever be done, as the current model of one AiiDA instance per user is prevalent.

#### Technical overview

If I've understood Chris' prototype correctly, the basic architecture would be as follows:

1. Messages (tasks, actions and broadcasts) would be stored in the database by extending the ORM.
2. A separate mechanism (e.g. communicating over sockets) would be used to 'poke' a worker to check for new messages and respond to events.

This would, in principle, give us the flexibility to more easily introspect and even mutate the state of queues (which is not possible in Python with RMQ) while maintaining a responsive, event-based system.

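To make point 1 a little more concrete, below is a minimal sketch of what such an ORM extension *might* look like, using SQLAlchemy; the table and column names are purely hypothetical and not part of any agreed design.

```python
# Hypothetical sketch only: a database-backed message table, as an alternative
# to RabbitMQ queues. All names and columns here are assumptions.
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, String, Text
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class DaemonMessage(Base):
    """A queued task, action (RPC) or broadcast addressed to a daemon worker."""

    __tablename__ = "db_daemon_message"

    id = Column(Integer, primary_key=True)
    kind = Column(String(16), nullable=False)     # 'task' | 'action' | 'broadcast'
    process_pk = Column(Integer, nullable=False)  # the AiiDA process it refers to
    body = Column(Text, nullable=False)           # serialised payload
    worker_pid = Column(Integer, nullable=True)   # PID of the worker that claimed it
    claimed_at = Column(DateTime, nullable=True)  # claim time / last heartbeat
    created_at = Column(DateTime, default=datetime.utcnow, nullable=False)
```
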
At the implementation level there are several mechanisms that are needed to allow this system to satisfy the above requirements:

1. An atomic select/update database operation that enables a mutating query of the following form to be issued: 'give me X outstanding tasks and write my PID and (optionally) the current timestamp to the relevant columns'; similarly for events, where the timestamp would not be optional (see the sketch after this list).
2. A mechanism to check whether a worker is still alive. This can be either:
   i. a call that checks whether a process with the given PID is still alive, or
   ii. a heartbeat by which the worker updates its timestamp in the DB to 'renew' its license to keep working on a task for another X number of heartbeats.
3. A communication protocol to 'poke' a worker.

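As an illustration of mechanism 1 (and the heartbeat variant of mechanism 2), the atomic claim could, for example, be expressed with PostgreSQL row-level locking (`SELECT ... FOR UPDATE SKIP LOCKED`). The sketch below builds on the hypothetical `DaemonMessage` model above and is not a concrete proposal.

```python
# Sketch only: atomically claim up to `limit` outstanding tasks and renew the
# claim via a heartbeat. Assumes the hypothetical DaemonMessage model above.
import os
from datetime import datetime

from sqlalchemy import update
from sqlalchemy.orm import Session


def claim_tasks(session: Session, limit: int = 10) -> list:
    """Select unclaimed tasks and stamp them with this worker's PID, atomically."""
    tasks = (
        session.query(DaemonMessage)
        .filter(DaemonMessage.kind == "task", DaemonMessage.worker_pid.is_(None))
        .with_for_update(skip_locked=True)  # concurrent workers skip locked rows
        .limit(limit)
        .all()
    )
    now = datetime.utcnow()
    for task in tasks:
        task.worker_pid = os.getpid()
        task.claimed_at = now
    session.commit()  # releases the row locks; the claim is now visible to everyone
    return tasks


def heartbeat(session: Session) -> None:
    """Renew this worker's claims so its tasks are not handed to another worker."""
    session.execute(
        update(DaemonMessage)
        .where(DaemonMessage.worker_pid == os.getpid())
        .values(claimed_at=datetime.utcnow())
    )
    session.commit()
```
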
## Pros and Cons

Pros and cons of dropping RMQ:

### Pros

* Installing AiiDA would become simpler, not requiring the installation of an OS service.
* Monitoring which processes are queued or actively running would become possible.
* Depending on the implementation, load-balancing could be greatly simplified (doing away with the need to tune the number of workers _and_ the number of slots per worker).

### Cons

* We may need to develop some in-house code to replace RMQ.
* Any new system will need to be thoroughly tested to ensure that the requirements are met. This will be challenging for rare failure modes and for situations that require large numbers of processes to be submitted/running.

## Supplemental Information

### Requirements elaborations

To keep the requirements succinct and quick to read for someone new to this AEP, this section gives more detail where needed.

#### Requirement 3

In order to deal with possible race conditions, any concurrent system must choose between one of two strategies; it can only guarantee one or the other of the following:

1. A task will run at least once, but possibly more times. This is the most appropriate choice for systems whose tasks are idempotent and where the resources consumed by a task are not a concern. Clearly, this choice is not a good fit for AiiDA.
2. A task will run zero times or exactly once. This is pretty much the only choice that makes sense for AiiDA.

While adopting strategy 2, AiiDA should make it as easy as possible for a user to see, and relaunch, tasks that failed to execute at all.

#### Requirement 4

The need to support both long- and short-duration jobs efficiently creates a tension, particularly with regard to requirement 5. Essentially, a polling-based solution would likely end up using a non-trivial amount of CPU if it were tuned to support very short jobs with many dependencies (e.g. Sebastiaan's [CIF cleaning workflows](https://github.com/chrisjsewell/aiida-process-coordinator/discussions/4#discussioncomment-1296748)). RabbitMQ and other message brokers get around this using kernel-level socket event hooks, effectively allowing them to eliminate polling and achieve real-time responsiveness.

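To illustrate how a replacement could stay event-based rather than polling-based, the 'poke' mechanism mentioned in the technical overview could, for instance, be built on a local socket with `asyncio`, so that an idle worker consumes no CPU at all. The socket path, message format and helper names below are assumptions made purely for illustration.

```python
# Illustrative only: a worker sleeps on a Unix domain socket and is 'poked' by a
# client after it writes new messages to the database, avoiding polling entirely.
import asyncio

SOCKET_PATH = "/tmp/aiida-worker.sock"  # hypothetical location


async def check_database_for_work() -> None:
    """Placeholder: claim and process any new messages (e.g. claim_tasks above)."""


async def handle_poke(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    await reader.readline()          # the content of the poke does not matter
    await check_database_for_work()  # react immediately to the new message
    writer.close()
    await writer.wait_closed()


async def worker_main() -> None:
    server = await asyncio.start_unix_server(handle_poke, path=SOCKET_PATH)
    async with server:
        await server.serve_forever()  # no CPU used while waiting to be poked


async def poke_worker() -> None:
    """What a client would do right after inserting a new task into the database."""
    _, writer = await asyncio.open_unix_connection(SOCKET_PATH)
    writer.write(b"new-task\n")
    await writer.drain()
    writer.close()
    await writer.wait_closed()
```
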
#### Requirement 6

At the time of developing the new workflow system it was decided to keep the door open to a use case where a single AiiDA instance could be used by multiple users. For example, an academic group could set up a server with AiiDA (using one or more profiles) that users could interact with directly using the Python AiiDA API. The database side of this is already supported (PostgreSQL communicates over TCP/IP anyway). By using RMQ we can also communicate over TCP/IP, thereby allowing the daemon to run on the central server. The file storage component has only recently been generalised to remove the explicit dependence on access to the local filesystem, and this (from the API point of view) was the main impediment to this use case being possible.
I think for a balanced and more complete discussion, we should explicitly consider alternative solutions in addition to Chris' pure-Python prototype. I understand that needing this service comes with a lot of issues in itself, as outlined in the introduction, but I would like to be able to say with confidence that the problems to be addressed by the removal of RabbitMQ could not also be addressed with:

1. using RabbitMQ in a lighter manner, or
2. an alternative technology such as Kafka.

Especially using something like Kafka would potentially give us some avenues for network-distributed and multi-user support. It is possible (maybe even probable) that those are not the right models, but I think we should make sure that we can say that with confidence before venturing on this major architecture change.

I think you raise a valid point @csadorf that this change should certainly not be taken too lightly, and so merits having other potential solutions investigated.
Just wanted to quickly give some feedback on your point 1. I think we currently already use RabbitMQ in as light a manner as possible. What happens is exactly as you describe: when you submit a process, a request to do so is sent to RabbitMQ, which will then send this "task" to a worker. As part of its guarantee that each task will at some point be executed, RabbitMQ waits for the worker to confirm the task is done; if the confirmation never comes and the heartbeat is missed (so it has to assume the worker died), it will at some point send the task to another worker. This is where the problem lies, though, as RabbitMQ was designed for short-running tasks (on the order of seconds/minutes) and not the tasks typically run with AiiDA (hours/days). This can be configured, but not out of the box, which is the reason we are considering replacing it. I don't think we can do without this feature of RabbitMQ though, as the whole reason we selected it in the first place was for its guarantee of eventually completing the tasks it receives and for its persistence.

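For readers less familiar with this mechanism: what is described above is RabbitMQ's standard manual-acknowledgement flow. A rough sketch with `pika` is shown below; the queue name and the `run_process` placeholder are illustrative only (AiiDA itself talks to RabbitMQ through kiwipy rather than directly like this).

```python
# Illustrative sketch of the acknowledgement flow described above, using pika.
# If the worker dies before basic_ack is sent, RabbitMQ re-queues the task and
# eventually delivers it to another worker.
import pika


def run_process(body: bytes) -> None:
    """Placeholder for the (potentially hours- or days-long) AiiDA work."""


def on_task(channel, method, properties, body):
    run_process(body)
    # Only now does RabbitMQ consider the task done and drop it from the queue.
    channel.basic_ack(delivery_tag=method.delivery_tag)


connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="process-tasks", durable=True)
channel.basic_consume(queue="process-tasks", on_message_callback=on_task)
channel.start_consuming()
```
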
So, what we really need is a robust queuing system, not a message broker, correct?
At least for that part, but we also use the message-broker part. As a process changes state, we emit a broadcast. This is very useful as it makes the system event-based instead of polling-based. For example, parent processes that are waiting for their child process to finish will be notified immediately when this happens and can continue running. With a polling system you always have a delay, and making this delay too short tends to overload the system unnecessarily.
I am not even remotely convinced that this is the right solution, but trying to understand: in principle we could push a request for a process to be executed onto a Kafka topic (event log), which could then be picked up by a worker. The same topic could then also be used to communicate back any state changes.
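For reference only, the topic-based flow sketched in this comment might look roughly as follows with `faust` (the `faust-streaming` fork keeps the same import name); the topic name and record schema are invented for illustration and are not a proposal.

```python
# Purely illustrative: submit requests and state changes flowing over one Kafka
# topic, as described above. All names and the schema are assumptions.
import faust

app = faust.App("aiida-daemon", broker="kafka://localhost:9092")


class ProcessEvent(faust.Record):
    process_pk: int
    event: str  # e.g. 'submit', 'running', 'finished'


process_events = app.topic("aiida-process-events", value_type=ProcessEvent)


@app.agent(process_events)
async def handle(events):
    async for event in events:
        if event.event == "submit":
            # A worker would pick the process up and run it here, then publish
            # its state changes back onto the same topic.
            await process_events.send(
                value=ProcessEvent(process_pk=event.process_pk, event="running")
            )


if __name__ == "__main__":
    app.main()
```
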
I do like the sound of the Kafka architecture, but conceptually, for this use case, we wouldn't really be using it in its most powerful mode (an event log). I did once, mostly jokingly, propose rewriting the AiiDA DAG in Kafka, which would be a good fit for an event store.
Anyway, to the matter at hand: I think Kafka could work, but I do think there is a pretty large advantage to not requiring the installation of a system-level service (which I think Kafka is). I don't know if faust-streaming would get around this.
Essentially, whatever solution we adopt, there will need to be at least two components: a persistent store for the messages themselves, and an event-based mechanism for notifying workers about them.
I played around last year with using `xmlrpc` (a standard Python module) for the event-based messaging part (it was pretty easy to create a kiwipy-compatible communicator), but this would still require the storage component, which in Chris' proposal comes in the form of additional database tables. Personally I'm not so keen for AiiDA to take responsibility for the storage of events, but then I haven't seen any persistent, non-OS-service-based solutions.
Yeh, I'm very keen to avoid swapping out one "user complication" for another 😬
I'll have a look at faust though.
Just to note here, I'm going to be delving into the internals of RabbitMQ a bit more in the coming month(s), to understand it more thoroughly.
Crudely speaking though, it's not magic; it is a daemon service that uses a database to persist data and a JSON-based communication protocol (with additional caching features etc.: https://blog.rabbitmq.com/posts/2015/10/new-credit-flow-settings-on-rabbitmq-3-5-5).
With the AiiDA daemon we already have a daemon service that uses a database to persist data, plus we also have direct control over which processes are running etc., which RabbitMQ does not, and I feel we are duplicating effort to some extent.
(I agree that a push model would be better than polling though)
Here's a nice summary: https://hvalls.dev/posts/polling-long-polling-push, which specifically references:
Unless I completely misunderstand the scope and mission of Kafka, that is its exact purpose. So it sounds to me like https://github.com/faust-streaming might indeed be important to evaluate then. Not only with respect to its fit as a technical solution, but also with respect to its sustainability (the project was abandoned by its original creator "Robinhood" and is currently maintained as a community fork).
Edit: Just wanted to mention that apparently my browser spaced out and didn't show @chrisjsewell's last two comments before mine, so I did not take those into account for this comment.