Hot reloading Python Operator #239
Conversation
That's a great feature!
There are two aspects of the current implementation that I'm not sure about:
- I'm not sure if the daemon is the right place for the file watching.
  - Right now the daemon only runs on the local machine, but we plan to support deployments to remote daemons too.
  - The file watching should probably always happen on the development machine.
- I don't think that we should enable live reloads by default.
  - It can be surprising that modifying a file affects a dataflow that was started earlier.
I think the typical approach to features like this is to provide some sort of explicit watch command. How about the following:
- We add a `--watch` flag to the `dora-cli start` command. If this flag is given, the command does not exit once the dataflow is started, but keeps watching the dataflow YAML file and the referenced nodes/operators.
- When a change is detected, the CLI prints an info message (or a warning if the dataflow is no longer valid) and sends a message to the coordinator.
- (Once we add remote deployments in the future, the message could also contain the new executable.)
- The coordinator forwards the reload message to the daemon(s) that the dataflow runs on.
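For illustration, the proposed watch loop could look roughly like the sketch below. This is a simplified polling version in Python with hypothetical names (`snapshot_mtimes`, `changed_paths`, `send_reload`); the actual CLI is written in Rust and would use the `notify` crate for event-based watching rather than polling.

```python
import os
import time

def snapshot_mtimes(paths):
    """Record the last-modified time of every watched file that exists."""
    return {p: os.path.getmtime(p) for p in paths if os.path.exists(p)}

def changed_paths(before, after):
    """Return the files whose modification time differs between two snapshots."""
    return [p for p, mtime in after.items() if before.get(p) != mtime]

def watch_loop(paths, send_reload, poll_interval=0.5, max_iterations=None):
    """Keep watching `paths` after the dataflow is started.

    `send_reload` stands in for the CLI -> coordinator reload message.
    `max_iterations` exists only so the loop can be bounded for testing.
    """
    before = snapshot_mtimes(paths)
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        time.sleep(poll_interval)
        after = snapshot_mtimes(paths)
        for path in changed_paths(before, after):
            print(f"info: {path} changed, requesting reload")
            send_reload(path)
        before = after
        iterations += 1
```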
```rust
IncomingEvent::Reload => {
    // TODO: Reload shared library.
    continue;
}
```
I don't think that we can safely reload a shared library. It's already quite difficult for Rust code and doing it across languages (C, C++, Rust, etc) is even more difficult. So I think we should never try to reload shared libraries and log a warning here.
That's an excellent article. I naively thought that we could just swap the context just like in Python. Thanks!
So, instead of reloading the shared library, what about re-spawning a Custom node or Runtime, copying the previous state into shared memory, and passing it to the newly spawned Custom node or Runtime? This should result in the same behavior as reloading. It might even add the benefit of a smaller reload interruption time.
The issue is that the state is controlled entirely by the operator. The operator gives us an opaque pointer on initialization and the runtime just passes this pointer on `on_event` calls. Only the operator itself knows what to do with it.
So we cannot just copy the memory referenced by the pointer since we don't know the size of the data. It might even contain pointers to other data internally that is part of the state too. Another challenge is that the new operator might use a different data format, so copying the old state in could result in invalid values and undefined behavior.
The only way to copy the state over is with cooperation between the operator, the runtime, and the new operator. We could for example introduce a `serialize_state` event that needs to be handled manually by the operator code. The operator needs to serialize all state into some common format (e.g. JSON) and the new operator needs to be able to deserialize and apply this state again. It also needs to be able to migrate the data from old structures in case the data format changed. Since this approach is quite complex and requires manual work, I think it's only suited for planned updates (e.g. updating a running operator from version 1 to version 1.1), but not for hot reloading...
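To make the cooperative-migration idea concrete, here is a small Python sketch of what such a `serialize_state` handshake could look like. All names (`OperatorV1`, `OperatorV1_1`, the version field and JSON layout) are hypothetical, not part of dora; the point is that the old operator chooses a common format and the new operator migrates old layouts explicitly.

```python
import json

class OperatorV1:
    """Old operator version: its only state is a counter."""
    def __init__(self):
        self.counter = 0

    def serialize_state(self):
        # Serialize all state into a common format (JSON), tagged with a version.
        return json.dumps({"version": 1, "counter": self.counter})

class OperatorV1_1:
    """New operator version: adds a `last_input` field to its state."""
    def __init__(self):
        self.counter = 0
        self.last_input = None

    def deserialize_state(self, raw):
        state = json.loads(raw)
        if state["version"] == 1:
            # Migrate the old layout: version 1 had no `last_input` field.
            state = {"version": "1.1", "counter": state["counter"], "last_input": None}
        self.counter = state["counter"]
        self.last_input = state["last_input"]

# Planned update: carry the state from the old operator into the new one.
old = OperatorV1()
old.counter = 42
new = OperatorV1_1()
new.deserialize_state(old.serialize_state())
```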
Agreed, thanks for the detailed explanation! Let's see if people use this feature for Python, how they use it, and what kind of state management is most beneficial for the user; that might guide us for the rest.
I like the idea, I'll check it out later.
So I added two flags to `dora start`.
Example usage: `dora start dataflow.yml --attach --hot-reload`
Note: I changed
Looks good now, thanks a lot for your work on this!
This PR adds the capability to hot-reload Python operators when their files are changed. Hot-reloading means that when the user changes the Python operator file, the change immediately takes effect within the dataflow without losing the current state of the operator.
Purpose
Sometimes reloading a robotic environment (both virtual and physical) can be really time-consuming.
Being able to hot-reload an operator makes iterating on changes a lot quicker.
Implementation
Schematic explanation of the implementation:
The CLI sends a `Reload` event if it notices file changes, which travels through:
`dora-cli -> dora-coordinator -> dora-daemon -> Custom Nodes API -> Runtime Nodes -> Operator -> Python Operator API`
The file watching is based on `notify`. The implementation is differentiation-based.
Fail-safe mechanism:
Not included in this Pull Request