
Enable image_tools tests with synthetic image #83

Merged (5 commits, Mar 24, 2017)

Conversation


@dhood dhood commented Oct 26, 2016

Connects to #43.
TODO: test different quality of service configurations.

@dhood dhood added the in progress Actively being worked on (Kanban column) label Oct 26, 2016
@dhood dhood self-assigned this Oct 26, 2016

dhood commented Oct 29, 2016

Update: different reliability QoS settings are now tested.

TODO: tests are not running properly on Windows: http://ci.ros2.org/job/ci_windows/1740/testReport/(root)/test_showimage_cam2image__rmw_fastrtps_cpp_Release/test_reliable_qos/


dhood commented Feb 21, 2017

Update:

  1. Runs on Windows now. All that was needed was to add the OpenCV bin directory to the PATH of the buildfarm machines.
  2. There's still some bad performance with FastRTPS.

It takes a long time for an image to be received when using best effort publishing and subscribing with FastRTPS; see this test output. This does not happen for reliable pub+sub. Interestingly, it also does not happen for a reliable publisher and a best effort subscriber, where the resulting connection should be best effort.

It seems that the best effort publisher might be what's causing the performance problems. There might be something in the reliable publisher that makes it easier for the subscriber (best effort or reliable) to get the message; I am not sure why at the moment.
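The request-offered matching behind these reliability combinations can be sketched in a few lines (a simplification of the DDS reliability rules, not the rmw API):

```python
# Simplified sketch of DDS request-offered reliability matching (not the
# rmw API): a publisher's offered reliability must be at least what the
# subscriber requests, and delivery is only as strong as the weaker side.
BEST_EFFORT, RELIABLE = 0, 1

def match(pub_reliability, sub_reliability):
    """Return the effective delivery reliability, or None if incompatible."""
    if pub_reliability < sub_reliability:
        return None  # best effort publisher cannot satisfy a reliable subscriber
    return min(pub_reliability, sub_reliability)

# The combination discussed above: reliable publisher + best effort
# subscriber matches, and the resulting delivery is best effort.
assert match(RELIABLE, BEST_EFFORT) == BEST_EFFORT
assert match(RELIABLE, RELIABLE) == RELIABLE
assert match(BEST_EFFORT, BEST_EFFORT) == BEST_EFFORT
assert match(BEST_EFFORT, RELIABLE) is None
```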

If others agree, I will focus on getting this merged just to confirm that these demos are runnable, and we can investigate the behaviour issues as a separate task. I'll reduce the publishing rate of the images so that the timeout doesn't have to be so high to accommodate FastRTPS (it's currently 60s).


dhood commented Feb 21, 2017

Another update: only reliable is working on OS X. With a best effort publisher and/or subscriber, no message is shown as received. I don't remember this being an issue when I first started working on this PR.

The changes referenced in ros2/rmw_fastrtps#81 (comment) might be responsible (I don't know that we have any other tests for best effort communication + fragmentation, so that might be why we didn't notice this before.)

@mikaelarguedas

That sounds good to me. +1 for getting this merged without fixing the FastRTPS throughput issue in best effort.

Rationale:
The best effort throughput issue on FastRTPS may be out of the scope of this PR, and I think several issues and discussions are reporting the same problem. Edit: like the one referenced in the previous comment.

(I don't know that we have any other tests for best effort communication + fragmentation, so that might be why we didn't notice this before.)

+1. I think this is the only demo actually testing best effort on big messages. To my knowledge, the only other tests with big messages are in test_communication, where we established only reliable communication.


dhood commented Feb 28, 2017

So, the reason it wasn't working on OS X with best effort is packet collisions. Enabling the throughput controller for FastRTPS means that the fragments of the image aren't all sent at the same time, so the collisions are less likely.

With a FastRTPS throughput controller of 30,000 bytes/10ms and a heartbeat period of 10ms, all platforms and all reliability combinations can receive an image using the default settings of 320x240 @ 30fps in less than a second:
http://ci.ros2.org/job/ci_osx/1740/testReport/(root)/test_showimage_cam2image__rmw_fastrtps_cpp_/
http://ci.ros2.org/job/ci_windows/2266/testReport/(root)/test_showimage_cam2image__rmw_fastrtps_cpp_Release/
http://ci.ros2.org/job/ci_linux/2267/testReport/(root)/test_showimage_cam2image__rmw_fastrtps_cpp_/
(these jobs were run with a branch of rmw_fastrtps enabling the throughput controller and a 10ms heartbeat period)
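As a rough sanity check on those numbers, assuming a raw 320x240 BGR8 frame (230,400 bytes, an assumption) and ignoring RTPS/UDP overhead:

```python
import math

image_bytes = 320 * 240 * 3        # raw BGR8 frame: 230,400 bytes (assumption)
window_bytes = 30_000              # throughput controller budget...
window_ms = 10                     # ...per 10 ms window

windows = math.ceil(image_bytes / window_bytes)  # send windows per frame
drain_ms = windows * window_ms                   # time to pace out one frame
print(windows, drain_ms)  # 8 80

# The controller caps throughput at 3 MB/s, below the ~6.9 MB/s a raw
# 30 fps stream would need. That is fine for these tests, which only need
# one image to arrive, and it can do so well inside a second.
budget_Bps = window_bytes * 1000 // window_ms    # 3,000,000 B/s
demand_Bps = image_bytes * 30                    # 6,912,000 B/s
```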

I have opened ros2/rmw_fastrtps#92 to track the discussion on what to do with this moving forward. For now, now that the reason these tests were having different behaviour on different platforms is clearer, I will only enable the reliable tests for this demo.


dhood commented Feb 28, 2017

now that the reason these tests were having different behaviour on different platforms is clearer

As @mikaelarguedas pointed out, this isn't 100% accurate: it's still not clear why the behaviour depends on the operating system, just that the throughput controller is a fix.

To be explicit, without any changes to FastRTPS (i.e. no throughput controller, default heartbeat period), if the best effort tests are enabled:
Performance is fine on Linux: 0.6s to receive a best effort image.
Performance is poor on Windows: 28s to receive a best effort image.
Performance is bad on OS X: no best effort image received in 60s.


CI for just this branch, with best effort tests disabled so as not to be dependent on any changes to FastRTPS:
Windows: Build Status
OS X: Build Status (existing test failure)
Linux: Build Status


wjwwood commented Feb 28, 2017

How are you measuring that? Just the time the tests take to complete?


dhood commented Feb 28, 2017

Yes, with the test completion time. The tests will pass once any image has been received.

@mikaelarguedas

Yeah still wondering how there can be collision with a single sender o_O. maybe a magic ability to send simultaneously unicast and multicast packets 🦄 that end up colliding...

I think you'll want to rebase this before merging (or use the "rebase and merge" option) otherwise the tests marked as skipped won't be skipped anymore.


dhood commented Feb 28, 2017

I have edited one of my previous comments to clarify that I was linking to two different sets of CI jobs: one with the best effort tests enabled (top), and one with them disabled (bottom), which is the current state of this PR.

maybe a magic ability to send simultaneously unicast and multicast packets 🦄 that end up colliding...

Yeah, I was just speaking with @wjwwood about that. At least on my Linux machine, I can see both unicast and multicast packets being sent.

I think you'll want to rebase this before merging (or use the "rebase and merge" option) otherwise the tests marked as skipped won't be skipped anymore.

I am not sure that I understand. I do not want the "showimage_cam2image" tests to be skipped anymore. Are you speaking about another?


wjwwood commented Feb 28, 2017

It could also be that the kernel's udp buffer is getting overwhelmed by the bursty nature without the flow controller, but that's just conjecture.
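A toy illustration of that conjecture (the numbers are made up; this is not a measurement of FastRTPS or the kernel):

```python
# Toy model of the buffer-overwhelm conjecture above: fragments of one image
# arrive at a fixed-size receive buffer. Sent as one burst, the buffer
# overflows before the reader drains anything; paced by a flow controller,
# the reader keeps up between send windows. All sizes are invented.
def deliver(num_fragments, buffer_capacity, paced):
    buffer = []
    received = 0
    for _ in range(num_fragments):
        if len(buffer) >= buffer_capacity:
            continue  # buffer full: the datagram is silently dropped
        buffer.append(1)
        if paced and len(buffer) == buffer_capacity:
            received += len(buffer)  # reader drains between send windows
            buffer.clear()
    return received + len(buffer)

print(deliver(64, 16, paced=False))  # 16: the rest of the burst is dropped
print(deliver(64, 16, paced=True))   # 64: pacing lets the reader keep up
```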

What's the evidence for the packet collision as described above? I didn't see anything in @dhood's post where she mentions that.


dhood commented Mar 1, 2017

What's the evidence for the packet collision as described above? I didn't see anything in @dhood's post where she mentions that.

I don't have evidence that it's packet collision; I shouldn't have jumped to conclusions on that one.

As an additional data point, I observed that running the demo on OS X at higher frequency (e.g. 100 Hz instead of 30 Hz) allows messages to be received by the best effort subscriber, even without any throughput controller....

@mikaelarguedas

I think you'll want to rebase this before merging (or use the "rebase and merge" option) otherwise the tests marked as skipped won't be skipped anymore.

I am not sure that I understand. I do not want the "showimage_cam2image" tests to be skipped anymore. Are you speaking about another?

Sorry, I should have phrased it differently; I just wanted to be sure you were aware that this was re-enabling all of them. Haven't tried any connext_dynamic code for a while so I just assumed it was still broken.


wjwwood commented Mar 1, 2017

As an additional data point, I observed that running the demo on OS X at higher frequency (e.g. 100 Hz instead of 30 Hz) allows messages to be received by the best effort subscriber, even without any throughput controller....

lol wut


dhood commented Mar 1, 2017

Haven't tried any connext_dynamic code for a while so I just assumed it was still broken.

I think that the decision we made once upon a time was that we only disable tests for "on pause" rmw implementations (e.g. https://github.com/ros2/system_tests/blob/3005b6df5261098bc6a51567b57f1022f7ef8830/test_communication/CMakeLists.txt#L60) if they are being skipped because of known implementation-specific bugs. I don't think we skip tests for them otherwise, so enabling for all rmw implementations is OK as I see it.


dhood commented Mar 1, 2017

Haven't tried any connext_dynamic code for a while so I just assumed it was still broken.

OK, you're referring to the "COMMENDSrWriterService_write:!write. Reliable large data requires asynchronous writer." comment that's in the code. We use an asynchronous writer now in the shared Connext functions, so it should be fine to re-enable this even for connext_dynamic.

@mikaelarguedas

Sounds good to me 👍 thanks for clarifying.


dhood commented Mar 1, 2017

As an additional data point, I observed that running the demo on OS X at higher frequency (e.g. 100 Hz instead of 30 Hz) allows messages to be received by the best effort subscriber, even without any throughput controller....

lol wut

I've checked on the three OS X buildfarm machines and they all behave the same way: they won't receive images at 30 Hz, but will receive images at 100 Hz (eyeballing it, it seems to actually be at 100 Hz, too). It even improves further if you increase the frequency more...

Yeah still wondering how there can be collision with a single sender o_O. maybe a magic ability to send simultaneously unicast and multicast packets 🦄 that end up colliding...

Implementing the changes suggested here to only use multicast for subscribers seems to improve performance a bit (an image can be received every so often; it varies between runs), but there still seems to be something preventing images from being received in whole.

I've opened ros2/rmw_fastrtps#93 to ask the eProsima folks if they have any ideas.

@dhood dhood added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Mar 7, 2017

dhood commented Mar 7, 2017

I didn't realise that this was labelled as in progress and not in review. Could I get an explicit approval from someone? As discussed above, this PR is just to check that the executables are runnable; while it was useful for evaluating FastRTPS performance, its scope has been narrowed so as not to depend on any changes to rmw_fastrtps.

get_available_rmw_implementations(rmw_implementations)
foreach(rmw_implementation ${rmw_implementations})
  find_package("${rmw_implementation}" REQUIRED)
endforeach()
Member:

Is this really necessary?

Member Author:

The reason I thought it was necessary is that, based on your comment here, these lines were not removed from the tests mentioned for a reason that also applies here (we are making test executables for each rmw implementation).

Will the call_for_each_rmw_implementation macro work fine without doing find_package on each of the rmw implementations? I thought it must be necessary, but since that macro calls get_all_rmw_implementations itself, perhaps this block can be removed again.

Member:

I don't see where this is true (if we consider "executable" to be a compiled executable rather than an executable Python script):

we are making test executables for each rmw implementation

As far as I can see, the testing_targets macro doesn't contain any compiled targets. It only expands a Python template and registers that Python file as a nosetest, so there is nothing that uses the find_package information. For the referenced tests in system_tests that is different: the function there actually builds executables for each rmw implementation.
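The expand-and-register behaviour described here can be sketched roughly in Python (the template contents and the `launch_nodes` name are illustrative, not the actual demo code):

```python
from string import Template

# Illustrative stand-in for the CMake macro described above: it expands a
# Python test template once per rmw implementation and registers the result
# as a nosetest; no compiled target is involved, so the find_package data
# is never used. Template contents and names here are made up.
TEST_TEMPLATE = Template(
    "def test_showimage_cam2image():\n"
    "    launch_nodes(rmw_implementation='$rmw_implementation')\n"
)

def expand_tests(rmw_implementations):
    """Return {generated file name: expanded test source}."""
    return {
        "test_showimage_cam2image__%s.py" % rmw:
            TEST_TEMPLATE.substitute(rmw_implementation=rmw)
        for rmw in rmw_implementations
    }

tests = expand_tests(["rmw_fastrtps_cpp", "rmw_connext_cpp"])
print(sorted(tests))
```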

Member Author:

OK, I see now that the only info that's needed is the list of rmw implementations from get_all_rmw_implementations, not the packages themselves. I will remove this block, thanks.

@dhood dhood force-pushed the enable-burger-tests branch from 0e42da7 to 86c3d36 Compare March 8, 2017 16:49
@dhood
Copy link
Member Author

dhood commented Mar 8, 2017

Addressed @dirk-thomas' comment in 11cd2e3

Rebased and adapted for #117 in 86c3d36

Linux: Build Status
OS X: Build Status (standard test failure these days)
Windows: Build Status

@@ -0,0 +1,2 @@
Publishing data on topic 'image'
Publishing image #1
Member:

Can you give more context about why renaming this file was necessary? Asking because I thought that launch testing was explicitly looking for .regex files.

Member:

Sorry, missing a pointer here: based on this block, I would expect this cam2image.txt file never to be taken into account during the tests.

Member Author:

Launch testing can work with either .txt or .regex files; a .txt file has to match literally. cam2image.txt will be matched as a literal file because of the particular block that you referenced.
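The distinction can be illustrated with a small sketch (the function name is made up; this is not the launch_testing API):

```python
import re

# Sketch of the convention discussed above (function name invented, not the
# launch_testing API): an expected-output line from a .txt file must match
# the actual output exactly, while a line from a .regex file is matched as
# a regular expression.
def line_matches(expected_line, actual_line, suffix):
    if suffix == ".txt":
        return expected_line == actual_line  # literal comparison
    if suffix == ".regex":
        return re.match(expected_line, actual_line) is not None
    raise ValueError("unknown expected-output suffix: %s" % suffix)

print(line_matches("Publishing image #1", "Publishing image #1", ".txt"))       # True
print(line_matches(r"Publishing image #\d+", "Publishing image #7", ".regex"))  # True
print(line_matches(r"Publishing image #\d+", "Publishing image #1", ".txt"))    # False
```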

Member:

Oops, messed up my search for .txt 😕 sorry for the noise.


dhood commented Mar 24, 2017

I've run CI again on this since it's been a few weeks, and I'm going to merge this.
Linux: Build Status (linter failures on another package)
Windows: Build Status
OS X: Build Status

@dhood dhood merged commit 5e84afc into master Mar 24, 2017
@dhood dhood deleted the enable-burger-tests branch March 24, 2017 17:21
@dhood dhood removed the in review Waiting for review (Kanban column) label Mar 24, 2017