👨🌾 test_security -> test_processes_finished_gracefully and test_subscriber_terminates_in_a_finite_amount_of_time tests failing on nightlies linux + linux-rhel and windows repeated
#498 · Open · Blast545 opened this issue on Mar 24, 2022 · 2 comments
This one is tricky. The error on Jammy has been fixed by various changes we made to the infrastructure.
We've had a hard time getting reliable RHEL builds lately, but one of the latest ones that did run to completion still failed: https://ci.ros2.org/view/nightly/job/nightly_linux-rhel_release/1100/#showFailuresLink . @cottsay has dug into it a bit, and figured out that it had something to do with the versions of OpenSSL linked in, but I don't know the status of that investigation.
cottsay has dug into it a bit, and figured out that it had something to do with the versions of OpenSSL linked in, but I don't know the status of that investigation.
There are a couple of issues related to RTI's bundled OpenSSL.
For RHEL: The default linker flags don't include -Wl,--as-needed, and because we're linking against message packages by linking in all of the generator outputs, every binary that links a message package gets linked against the Python libraries. On RHEL, the Python libraries are linked against libcrypto.so, which gets loaded from the system. When Connext tries to load libssl.so from the bundled OpenSSL, it encounters the unexpected system version of libcrypto.so instead of the bundled one. This one will be hard to fix - we need to change the way we link against message packages so that we can target only the generators we actually want (c/cpp/py).
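One way to confirm this kind of mismatch on a running process is to look at which directories libssl and libcrypto were actually mapped from. The following is a minimal sketch, not something taken from the investigation: the function names and the sample paths are hypothetical, and it only assumes the standard Linux /proc/<pid>/maps line format.

```python
import os


def openssl_dirs(maps_text):
    """Map 'libssl'/'libcrypto' to the set of directories they were loaded from."""
    found = {}
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        # The sixth field, when present, is the pathname of the mapped file.
        if len(fields) < 6 or not fields[5].startswith("/"):
            continue
        directory, name = os.path.split(fields[5].strip())
        for lib in ("libssl", "libcrypto"):
            if name.startswith(lib + ".so"):
                found.setdefault(lib, set()).add(directory)
    return found


def is_mixed(found):
    """True when both libraries are loaded but not from the same directories."""
    ssl_dirs = found.get("libssl", set())
    crypto_dirs = found.get("libcrypto", set())
    return bool(ssl_dirs) and bool(crypto_dirs) and ssl_dirs != crypto_dirs


# Hypothetical excerpt in the format of /proc/<pid>/maps; the paths are made up.
SAMPLE_MAPS = """\
7f10a0000000-7f10a0200000 r-xp 00000000 fd:00 1001 /usr/lib64/libcrypto.so.1.1
7f10a0400000-7f10a0480000 r-xp 00000000 fd:00 2002 /opt/rti/openssl/lib/libssl.so.1.1
7f10a0600000-7f10a0700000 r-xp 00000000 fd:00 3003 /usr/lib64/libpython3.so
"""

if __name__ == "__main__":
    print(is_mixed(openssl_dirs(SAMPLE_MAPS)))
```

On a real system the input would be the contents of /proc/<pid>/maps for the test process. A mixed result would be consistent with the system libcrypto.so having been loaded before the bundled libssl.so.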
For Jammy, we were missing the magic environment variable that tells test_security to update LD_LIBRARY_PATH with the bundled OpenSSL location. That should have been resolved by Set RTI_OPENSSL_LIBS for Connext 6.0.1 ci#648.
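As a rough illustration of what "update LD_LIBRARY_PATH with the bundled OpenSSL location" could look like, here is a small sketch. It is not the actual test_security code: the helper name is hypothetical, and it assumes RTI_OPENSSL_LIBS holds a single directory that should be searched before the system library paths.

```python
import os


def env_with_bundled_openssl(env):
    """Return a copy of env with RTI_OPENSSL_LIBS added to LD_LIBRARY_PATH."""
    new_env = dict(env)
    bundled = new_env.get("RTI_OPENSSL_LIBS")
    if not bundled:
        # Variable missing or empty: leave the environment unchanged.
        return new_env
    current = new_env.get("LD_LIBRARY_PATH", "")
    parts = [p for p in current.split(os.pathsep) if p]
    # Put the bundled directory first unless it is already listed somewhere.
    if bundled not in parts:
        parts.insert(0, bundled)
    new_env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return new_env
```

When the variable is not set, the sketch leaves the environment unchanged, so the loader falls back to the system OpenSSL.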
That's as far as my investigations got. I'd consider any test_security failures for Connext on RHEL to be low-priority for now.
Bug report
Required Info:
jammy + Linux-Rhel + Windows flaky
Steps to reproduce issue
Run a buildfarm job on Linux RHEL or on Ubuntu Jammy.
Expected behavior
All tests pass.
Actual behavior
Some security tests fail consistently:
test_security.TestSecurePublisherSubscriberAfterShutdown.test_processes_finished_gracefully
test_security.TestSecurePublisherSubscriber.test_subscriber_terminates_in_a_finite_amount_of_time
Additional information
This started failing consistently on the Ubuntu Jammy jobs after the transition from Python 3.9 to 3.10.
https://ci.ros2.org/view/nightly/job/nightly_linux_debug/2246/
These are links to the failing sections of the test:
system_tests/test_security/test/test_secure_publisher_subscriber.py.in
Line 76 in 4fe90b6
and
system_tests/test_security/test/test_secure_publisher_subscriber.py.in
Line 84 in 4fe90b6
Possible clues
It has been failing consistently on RHEL for about two months.
Starting with clues from RHEL: the failure first appeared there two months ago, between these two jobs:
https://ci.ros2.org/view/nightly/job/nightly_linux-rhel_debug/1012/
https://ci.ros2.org/view/nightly/job/nightly_linux-rhel_debug/1013/
The changes between the two jobs are these PRs:
ros2/rosidl_python#149
ros2/geometry2#496
eProsima/Fast-DDS#2399
ros2/rclcpp#1862
ament/ament_index#83