Address performance issues related to large-object workflows #898
Comments
Thanks for posting the issue and giving detailed information about your suggestions! Here are our current thoughts:
1. Global Variable Manager - gvm.set_variable delay
Of course it is possible to enable this DEBUG-level check for this single line of code, but maybe it is too specific to actually patch it like this. If we decided to insert the line as you suggested, we would then need to do the same for every line where variables are printed like this. See RAFCON/source/rafcon/logging.conf, line 11 (commit 59987cd). If you set this to e.g. …
2. Output Port Delay and 3. Input Port Delay
We discussed including the functionality of disabling the deepcopy lines in the code that passes data between states. However, we noticed that this might be a little dangerous (see image below): if the data output from ExecutionState1 is changed in ExecutionState2, the data input into ExecutionState3 will also receive the altered data. This is usually not wanted, and it is also misleading in the GUI representation. So if we implement such a flag to disable the deepcopies, it would be some kind of expert-level flag, to be used only when you are really sure that the data is not used in any other way anywhere in your state machine. Maybe it is then also a little too specific, and dangerous if users toggle the flag accidentally.
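A minimal, RAFCON-independent sketch of the hazard described above. The three functions are hypothetical stand-ins for ExecutionState1-3, and a plain variable stands in for the data flow that feeds the same output to states 2 and 3. Without a copy, the in-place mutation in state 2 leaks into the input of state 3; `copy.deepcopy` isolates it.

```python
import copy


def execution_state_1():
    # Produces a mutable, nested output value (hypothetical trajectory-like data).
    return {"waypoints": [[0.0, 0.0], [1.0, 1.0]]}


def execution_state_2(data):
    # Mutates its input in place, e.g. to post-process the waypoints.
    data["waypoints"].append([2.0, 2.0])


def execution_state_3(data):
    # Reports how many waypoints it received.
    return len(data["waypoints"])


def run(pass_by_copy):
    output_1 = execution_state_1()
    # Hypothetical data flow: the same output is connected to states 2 and 3.
    input_2 = copy.deepcopy(output_1) if pass_by_copy else output_1
    input_3 = copy.deepcopy(output_1) if pass_by_copy else output_1
    execution_state_2(input_2)
    return execution_state_3(input_3)


print(run(pass_by_copy=True))   # 2: state 3 sees the original data
print(run(pass_by_copy=False))  # 3: state 3 sees the mutation made by state 2
```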
For 1: as you mentioned, Johannes, having all settings in one place (including the logging) is important for us. We want to ensure we can replicate performance across all development and deployment machines, including automated CI testing. This is only possible if we can enforce all settings and prevent user configuration. Being able to point to the settings files on launch is super handy, since RAFCON has many settings in its YAML files that can drastically alter behavior or performance.
For 2 and 3, I agree that a "Perform deepcopy" checkbox in the data port table would be good; that way, when there is a known large object, we can choose to prevent the copy on a per-port basis. If we are concerned about the case above (data being unknowingly mutated before use), it should be sufficient to add tooltip warnings or hide the feature under an advanced mode. If we want to make this visible in the GUI, maybe changing the connection colour to red or showing a red flag next to the name would work. This is a real problem, as it prevents common state patterns from running at a high rate. For example, the images we typically work with are quite large and can take seconds to copy, so the overall loop update time is crippled. As mentioned, we also have this problem when passing MoveIt trajectories between states.
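A rough sketch of what a per-port "Perform deepcopy" setting could mean at transfer time. `DataPort`, `perform_deepcopy` and `transfer` are hypothetical names for illustration only, not existing RAFCON API; the default keeps today's copy-on-transfer behavior, and only ports that explicitly opt out share the producer's object.

```python
import copy
from dataclasses import dataclass
from typing import Any


@dataclass
class DataPort:
    """Hypothetical stand-in for a data port with a per-port copy setting."""
    name: str
    perform_deepcopy: bool = True  # default: current copy-on-transfer semantics


def transfer(port: DataPort, value: Any) -> Any:
    """Pass a value through a port, copying it unless the port opts out."""
    if port.perform_deepcopy:
        return copy.deepcopy(value)
    # Opt-out: the receiver shares the producer's object, so any in-place
    # mutation downstream is visible to every other consumer of that value.
    return value


image = bytearray(1_000_000)  # stand-in for a large camera image
safe_port = DataPort("image")
fast_port = DataPort("image_by_reference", perform_deepcopy=False)

print(transfer(safe_port, image) is image)  # False: a fresh copy
print(transfer(fast_port, image) is image)  # True: same object, no copy cost
```

Warning the user (tooltip, red connection) would be a separate GUI concern; the flag itself only changes whether the copy happens.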
Regarding the first point, I have already set the level to INFO. However, the issue is that the argument to logger.debug() is an already-built string, so the .format() call is executed before the logger function is called. As a result, even if the message is never printed, time is still spent serializing the value before logger.debug() is entered.
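A small standard-library demonstration of this point, using a hypothetical `Expensive` class that counts how often it is converted to a string. With `.format()`, the conversion happens before `logger.debug()` runs; with `%s`-style arguments, `logger.debug()` checks the level first and returns early, so the argument is never formatted. Note that only the formatting is deferred: any expression passed as an argument is still evaluated before the call.

```python
import logging


class Expensive:
    """Stand-in for a large object whose string conversion is slow."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "<large object>"


logger = logging.getLogger("lazy_logging_demo")
logger.setLevel(logging.INFO)  # DEBUG records are discarded

obj = Expensive()

# Eager: "...".format(obj) calls obj.__str__() before logger.debug() runs,
# so the serialization cost is paid even though nothing is logged.
logger.debug("value: {}".format(obj))
print(obj.str_calls)  # 1

# Lazy: logger.debug() checks the level first and returns early; the %s
# argument would only be formatted if a handler actually emitted the record.
logger.debug("value: %s", obj)
print(obj.str_calls)  # still 1
```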
Hi together, I would like to address the … The problem that @JohannesErnst raised about unknowingly mutating shared data is, in my opinion, not to be underestimated, and therefore using the data flow for this is not ideal. One possibility would be to use the … Alternatively, this is how we do it on our robots: the perception loop runs outside of RAFCON, and in a RAFCON state machine we trigger perception actively with the parameterization defined in our behavior. For example, we use a ROS node that provides a service "detect_object(object_id)", which fetches the camera image, runs a perception method on it, and returns the object pose. This has multiple advantages:
So if you really want this behavior within RAFCON, you could either use the global variable space, fork the repository and integrate the behavior in your fork, or try to create a plugin that enables it (https://rafcon.readthedocs.io/en/latest/plugins.html).
Thank you from my side as well for proposing these adaptations.
After a quick consultation, we think that's a valid point for overcoming the performance bottleneck introduced by logging, especially at the lower levels. However, it would then be useful to have a holistic solution that addresses all logger calls at the two lowest levels (VERBOSE and DEBUG) in the code. In that case, it may be handy to implement the suggested changes for those log levels.
If you are willing to submit a pull request with the fix as proposed, we'd be happy to review it and merge it into the code.
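One way a holistic change could look, sketched under assumptions: a custom VERBOSE level below DEBUG (the numeric value 5 here is illustrative, not taken from RAFCON), plus a hypothetical `Lazy` wrapper that defers an arbitrary expensive computation until the message is actually formatted. This covers cases where the costly part is not just `str()` of an argument, and the same pattern applies to both VERBOSE and DEBUG calls.

```python
import logging
import pickle

# Assumption: a custom VERBOSE level below DEBUG; the value 5 is illustrative.
VERBOSE = 5
logging.addLevelName(VERBOSE, "VERBOSE")


class Lazy:
    """Defers an expensive computation until the log message is formatted."""

    def __init__(self, func, *args, **kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs

    def __str__(self):
        return str(self.func(*self.args, **self.kwargs))


calls = {"n": 0}


def summarize(value):
    # Stand-in for costly serialization of a large object.
    calls["n"] += 1
    return f"{type(value).__name__} with {len(pickle.dumps(value))} pickled bytes"


logger = logging.getLogger("verbose_demo")
logger.setLevel(logging.INFO)

big = list(range(100_000))
# Both calls return early at INFO, so summarize() is never executed.
logger.log(VERBOSE, "set variable to %s", Lazy(summarize, big))
logger.debug("set variable to %s", Lazy(summarize, big))
print(calls["n"])  # 0
```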
I would like to propose a couple of enhancements that address several performance issues: one in the Global Variable Manager (GVM) and two related to deepcopy when passing data through data flow ports between states. I understand that the RAFCON team has previously expressed a preference for retaining deepcopy for safety reasons. However, I propose introducing optional configuration flags that give users more flexibility to optimize performance while preserving the current behavior by default.
Below are the specific instances where significant performance bottlenecks have been observed:
1. Global Variable Manager - gvm.set_variable delay
File:
source/rafcon/core/global_variable_manager.py:112
Issue:
When setting a global variable, the following line serializes the value to generate a message string. For large objects, this serialization takes several seconds to complete:
Proposed Solution:
Serialize the message only if debug logging is enabled, as shown below:
This change avoids unnecessary serialization when debug logging is disabled, leading to significant performance improvements.
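A simplified sketch of the guard described above, applied to a hypothetical `GlobalVariableStore` (not RAFCON's actual `GlobalVariableManager`). The explicit `isEnabledFor` check matters when building the message requires an eager call such as `repr()`: deferred `%s` formatting alone would not skip that work, because `repr(value)` is evaluated as an argument before `logger.debug()` is entered.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("gvm_sketch")


class GlobalVariableStore:
    """Hypothetical, heavily simplified stand-in for a global variable manager."""

    def __init__(self):
        self._variables: Dict[str, Any] = {}
        self.messages_built = 0  # counts how often the log message was built

    def set_variable(self, key: str, value: Any) -> None:
        self._variables[key] = value
        # Only serialize the value when DEBUG output is actually enabled.
        if logger.isEnabledFor(logging.DEBUG):
            self.messages_built += 1
            logger.debug("Global variable '%s' was set to '%s'", key, repr(value))

    def get_variable(self, key: str) -> Any:
        return self._variables[key]


logger.setLevel(logging.INFO)
store = GlobalVariableStore()
store.set_variable("trajectory", list(range(1_000_000)))
print(store.messages_built)  # 0: the guard skipped serialization entirely
```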
2. Output Port Delay
File:
source/rafcon/core/execution/execution_history_items.py:176
Issue:
When generating a ScopedDataItem for use in the data flow, a deepcopy is performed, which takes around 500-800 ms per trajectory (using MoveIt2 trajectories as a large, complex object).
Proposed Solution:
Introduce an option to skip deepcopy for users who do not require it.
This change would maintain backward compatibility, as use_deepcopy would default to True, but it would allow users to set the flag to False for significant performance gains.
3. Input Port Delay
File:
source/rafcon/core/states/container_state.py:1474
Issue:
When retrieving input data for state execution, a deepcopy is performed on ScopedDataItem data from the data flow, adding another 500-800 ms of delay in our case when sending a MoveIt2 trajectory.
Proposed Solution:
Optionally disable deepcopy using a similar approach to the output port delay fix:
Again, this approach maintains current behavior by default but gives users the flexibility to avoid the deepcopy overhead.
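A sketch of how a single, globally configured switch could cover both the output side and the input side. `CONFIG`, `USE_DEEPCOPY_FOR_DATA_FLOW`, `store_output` and `fetch_input` are hypothetical names; RAFCON's configuration has no such key today, and the real code paths in execution_history_items.py and container_state.py differ from these toy functions.

```python
import copy
from typing import Any, Dict

# Hypothetical global setting; defaults to True to keep today's behavior.
CONFIG = {"USE_DEEPCOPY_FOR_DATA_FLOW": True}


def maybe_copy(value: Any) -> Any:
    """Deep-copy data-flow values unless the expert-level flag disables it."""
    if CONFIG["USE_DEEPCOPY_FOR_DATA_FLOW"]:
        return copy.deepcopy(value)
    return value


def store_output(scoped_data: Dict[str, Any], key: str, value: Any) -> None:
    # Output side: a state writes its output port value into the scoped data.
    scoped_data[key] = maybe_copy(value)


def fetch_input(scoped_data: Dict[str, Any], key: str) -> Any:
    # Input side: the next state reads its input port value from the scoped data.
    return maybe_copy(scoped_data[key])


trajectory = {"points": [[0.0] * 7 for _ in range(1000)]}
scoped: Dict[str, Any] = {}

store_output(scoped, "trajectory", trajectory)
print(fetch_input(scoped, "trajectory") is trajectory)  # False: copied twice

CONFIG["USE_DEEPCOPY_FOR_DATA_FLOW"] = False
store_output(scoped, "trajectory", trajectory)
print(fetch_input(scoped, "trajectory") is trajectory)  # True: no copies
```

Compared with a per-port setting, a global switch is simpler to enforce from a shared configuration file, but it removes the copy everywhere at once, which is exactly the risk the maintainers raised.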
These proposed changes offer a significant reduction in execution time, especially for workflows with large data objects or high-frequency state transitions.
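To get a feel for where the time goes, here is a rough timing sketch on a synthetic nested object (10,000 trajectory-like points built from dicts and lists). It is not a MoveIt2 message, and absolute numbers depend on the machine; the point is that `copy.deepcopy` must traverse every contained object, while passing a reference costs essentially nothing regardless of size.

```python
import copy
import time

# Synthetic stand-in for a large nested message such as a trajectory:
# 10,000 points, each with positions and velocities for 7 joints.
trajectory = {
    "points": [
        {"positions": [float(i)] * 7, "velocities": [0.0] * 7, "t": i * 0.01}
        for i in range(10_000)
    ]
}

start = time.perf_counter()
copied = copy.deepcopy(trajectory)  # walks every dict, list and element
deepcopy_s = time.perf_counter() - start

start = time.perf_counter()
shared = trajectory  # passing by reference: no traversal at all
reference_s = time.perf_counter() - start

print(f"deepcopy: {deepcopy_s * 1000:.1f} ms, reference: {reference_s * 1e6:.2f} us")
```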
Benefits:
I would like to know if there is a feasible way to introduce this flag in the mentioned cases. Additionally, I propose applying the fix for point 1 directly, as it provides immediate performance benefits. Please let me know if this approach aligns with RAFCON's design principles, and I would be happy to assist in implementing or testing these changes.