-
Notifications
You must be signed in to change notification settings - Fork 340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable image_tools tests with synthetic image #83
Conversation
update: different reliability QoS settings now tested. todo: tests not running properly on windows http://ci.ros2.org/job/ci_windows/1740/testReport/(root)/test_showimage_cam2image__rmw_fastrtps_cpp_Release/test_reliable_qos/ |
871f2dd
to
048020a
Compare
Update:
it takes a long time for an image to be received when using best effort publishing and subscribing with FastRTPS. see this test output. This does not happen for reliable pub+sub. Interestingly, this also does not happen for a reliable publisher and a best effort subscriber, where the resulting connection should be best effort. it seems that the best effort publisher might be what's causing the performance problems. there might be something in the reliable publisher that makes it easier for the subscriber (best effort or reliable) to get the message. I am not sure why at the moment. If others agree I will focus on getting this merged just to confirm that these demos are runnable, and we can investigate the behaviour issues as a separate task. I'll reduce the publishing rate of the images so that the timeout doesn't have to be so high to accommodate fastrtps (it's 60s). |
another update: only reliable is working for os x. best effort publisher and/or subscriber does not show that any message is received. I don't remember this being an issue when I first started working on this PR. The changes referenced in ros2/rmw_fastrtps#81 (comment) might be responsible (I don't know that we have any other tests for best effort communication + fragmentation, so that might be why we didn't notice this before.) |
That sounds good to me. +1 for getting this merged without fixing the fast-RTPS throughput issue in best effort Rationale:
+1 I think this is the only demo actually testing best effort on big messages. To my knowledge the only other tests with big messages are in |
So, the reason it wasn't working on osx with best effort is because of packet collisions. enabling the throughput controller for fastrtps means that the fragments of the image aren't all sent at the same time and so the collisions are less likely. With a fastrtps throughput controller of 30,000 bytes/10ms and a heartbeat period of 10ms, all platforms and all reliability combinations can receive an image using the default settings of 320x240 @ 30fps in less than a second: I have opened ros2/rmw_fastrtps#92 to track the discussion on what to do with this moving forward. For now, now that the reason these tests were having different behaviour on different platforms is clearer, I will only enable the reliable tests for this demo. |
As @mikaelarguedas pointed out this isn't 100% accurate - it's still not clear why the behaviour is dependent on the operating system, just that the throughput controller is a fix. To be explicit, without any changes to fastrtps (ie not throughput controller, default heartbeat period), if the best effort tests are enabled: CI for just this branch, with best effort tests disabled so as not to be dependent on any changes to fastrtps: |
How are you measuring that? Just the time the tests take to complete? |
yes, with the test completion time. the tests will pass once any image has been received. |
Yeah still wondering how there can be collision with a single sender o_O. maybe a magic ability to send simultaneously unicast and multicast packets 🦄 that end up colliding... I think you'll want to rebase this before merging (or use the "rebase and merge" option) otherwise the tests marked as skipped won't be skipped anymore. |
I have edited one of my previous comments to clarify that I was linking to two different sets of CI jobs: one with best effort tests enabled (top), and one with them not enabled (bottom), the current state of this PR.
Yeah I was just speaking with @wjwwood about that. I do see that, at least on my linux machine, I can see unicast and multicast packets being sent.
I am not sure that I understand. I do not want the "showimage_cam2image" tests to be skipped anymore. Are you speaking about another? |
It could also be that the kernel's udp buffer is getting overwhelmed by the bursty nature without the flow controller, but that's just conjecture. What's the evidence for the package collision as described above? I didn't see anything in @dhood's post where she mentions that. |
I don't have evidence that it's packet collision - I shouldn't have jumped to conclusions on that one. As an additional data point, I observed that running the demo on OS X at higher frequency (e.g. 100 Hz instead of 30 Hz) allows messages to be received by the best effort subscriber, even without any throughput controller.... |
Sorry I should have phrased it differently, I just wanted to be sure you were aware that this was reenabling all of them. Haven't tried any |
lol wut |
I think that the decision we made once upon a time was that we only disable tests for "on pause" rmw implementations e.g. https://github.com/ros2/system_tests/blob/3005b6df5261098bc6a51567b57f1022f7ef8830/test_communication/CMakeLists.txt#L60 if they are being skipped because of known implementation-specific bugs. I don't think we skip tests for them otherwise. so enabling for all rmw implementations is OK as I see it |
OK you're referring to the "COMMENDSrWriterService_write:!write. Reliable large data requires asynchronous writer." comment that's in the code. We use asynchronous writer now in the shared connext functions, so it should be fine to re-enable this even for connext dynamic. |
sounds good to me 👍 thanks for clarifying |
I've checked on the three OS X buildfarm machines and they all had the same thing.. wouldn't receive images at 30 Hz, would receive images at 100 Hz (eyeballing it it seems to actually be at 100 Hz, too). It evens improves further if you increase the frequency more....
Implementing the changes suggested here to only use multicast for subscribers seems to improve performance a bit (it can receive an image every so often, varies with runs), but there still seems to be something preventing images from being received in whole. I've opened ros2/rmw_fastrtps#93 to ask the eProsima folks if they have any ideas. |
I didn't realise that this was labelled as in progress and not in review. Could I get an explicit approval from someone? As we discussed above, this PR is just to check that the executables are runnable and while it was useful for evaluating fastrtps' performance, its scope has been narrowed to not depend on any changes to rmw_fastrtps. |
image_tools/CMakeLists.txt
Outdated
get_available_rmw_implementations(rmw_implementations) | ||
foreach(rmw_implementation ${rmw_implementations}) | ||
find_package("${rmw_implementation}" REQUIRED) | ||
endforeach() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason that I thought it was necessary is because it seems from your comment here that these lines were not removed from the tests mentioned for a reason that also applies here (we are making test executables for each rmw implementation).
Will the call_for_each_rmw_implementation
macro work fine without doing find_package
on each of the rmw implementations? I thought it must be necessary but since that macro calls get_all_rmw_implementations
itself perhaps this block can be removed again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see where this is true (if we consider "executable" to be a compiled executable rather than an executable Pythonscript):
we are making test executables for each rmw implementation
As far as I can see the testing_targets
macros doesn't contain any compiled targets. It only expands a Python template and registers that Python file as a nosetest. So there is nothing which uses the find_package
information. For the referenced tests in system_tests
that is different - the function there actually builds executables for each rmw impl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I see now that the only info that's needed is the list of RMW implementations from get_all_rmw_implementations
, not the packages themselves. I will remove this block thanks.
Without flow controllers there is no guarantee to receive best effort images
0e42da7
to
86c3d36
Compare
Addressed @dirk-thomas' comment in 11cd2e3 |
@@ -0,0 +1,2 @@ | |||
Publishing data on topic 'image' | |||
Publishing image #1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give more context about why renaming this file was necessary ? Asking because I thought that launch testing was explicitely looking for .regex
files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry missing a pointer here: based on this block I would expect this cam2image.txt
file to never be taken into account during the tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
launch testing can work with either .txt
or .regex
. .txt
has to match explicitly. cam2image.txt
will be matched as a literal file because of the particular block that you referenced
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops messed up my search for .txt
😕 sorry for the noise
connects to #43
todo: test different quality of service configurations