
👨‍🌾 test_security -> test_processes_finished_gracefully and test_subscriber_terminates_in_a_finite_amount_of_time tests failing on nightlies linux + linux-rhel and windows repeated #498

Blast545 opened this issue Mar 24, 2022 · 2 comments


@Blast545

Bug report

Required Info:

  • Operating System:
    • Jammy + RHEL + Windows (flaky)
  • Installation type:
    • Source, buildfarm
  • Version or commit hash:
    • master
  • DDS implementation:
    • Default rmw, Fast-RTPS

Steps to reproduce issue

Run a buildfarm job on Linux RHEL or on Ubuntu Jammy.

Expected behavior

All tests pass.

Actual behavior

Some security tests fail consistently:

test_security.TestSecurePublisherSubscriberAfterShutdown.test_processes_finished_gracefully
test_security.TestSecurePublisherSubscriber.test_subscriber_terminates_in_a_finite_amount_of_time

Additional information

This started failing consistently on the Ubuntu Jammy jobs after the transition from Python 3.9 to 3.10.
https://ci.ros2.org/view/nightly/job/nightly_linux_debug/2246/

These are links to the failing sections of the tests:

def test_subscriber_terminates_in_a_finite_amount_of_time(self, proc_info, subscriber_process):

and

def test_processes_finished_gracefully(self, proc_info):
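
For context, a minimal sketch of what launch_testing checks like these typically look like. This is not the actual test_security source; the class and fixture names follow the signatures above, and the 10-second timeout is an assumed value:

import unittest

import launch_testing
import launch_testing.asserts


class TestSecurePublisherSubscriber(unittest.TestCase):

    def test_subscriber_terminates_in_a_finite_amount_of_time(
            self, proc_info, subscriber_process):
        # Fails if the subscriber process has not shut down within the
        # timeout (10 seconds here is an assumed value).
        proc_info.assertWaitForShutdown(process=subscriber_process, timeout=10)


@launch_testing.post_shutdown_test()
class TestSecurePublisherSubscriberAfterShutdown(unittest.TestCase):

    def test_processes_finished_gracefully(self, proc_info):
        # Fails if any launched process exited with a non-zero exit code.
        launch_testing.asserts.assertExitCodes(proc_info)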

Possible clues

It has been failing consistently on RHEL for the past two months.
Looking at RHEL first for clues: the failures started appearing there between these two jobs:
https://ci.ros2.org/view/nightly/job/nightly_linux-rhel_debug/1012/
https://ci.ros2.org/view/nightly/job/nightly_linux-rhel_debug/1013/

The changes between those two jobs are these PRs:
ros2/rosidl_python#149
ros2/geometry2#496
eProsima/Fast-DDS#2399
ros2/rclcpp#1862
ament/ament_index#83

@clalancette (Contributor)

This one is tricky. The error on Jammy has been fixed by various changes we made to the infrastructure.

We've had a hard time getting reliable RHEL builds lately, but on one of the latest ones that did run to completion, it did fail: https://ci.ros2.org/view/nightly/job/nightly_linux-rhel_release/1100/#showFailuresLink . @cottsay has dug into it a bit, and figured out that it had something to do with the versions of OpenSSL linked in, but I don't know the status of that investigation.

@cottsay (Member) commented Apr 8, 2022

cottsay has dug into it a bit, and figured out that it had something to do with the versions of OpenSSL linked in, but I don't know the status of that investigation.

There are a couple of issues related to RTI's bundled OpenSSL.

  1. For RHEL: The default linker flags don't include -Wl,--as-needed, and because we're linking against message packages by linking in all of the generator outputs, every binary that links a message package gets linked against the Python libraries. On RHEL, the Python libraries are linked against libcrypto.so, which gets loaded from the system. When Connext tries to load libssl.so from the bundled OpenSSL, it encounters the unexpected system version of libcrypto.so instead of the bundled one. This one will be hard to fix - we need to change the way we link against message packages so that we can target only the generators we actually want (c/cpp/py).
  2. For Jammy, we were missing the magic environment variable that tells test_security to update LD_LIBRARY_PATH with the bundled OpenSSL location. That should have been resolved by Set RTI_OPENSSL_LIBS for Connext 6.0.1 ci#648 (see the sketch after this list).
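
A minimal sketch of how the environment tweak in (2) works, plus a quick way to check which library the loader resolves for the problem in (1). It assumes RTI_OPENSSL_LIBS points at the directory holding the bundled libssl.so/libcrypto.so, and the binary path is a placeholder, so this is illustrative rather than the actual test_security logic:

import os
import subprocess

# Assumed: RTI_OPENSSL_LIBS holds the bundled OpenSSL library directory.
bundled_openssl_dir = os.environ.get('RTI_OPENSSL_LIBS', '')

env = dict(os.environ)
if bundled_openssl_dir:
    # Prepend the bundle so the loader resolves libssl/libcrypto from it
    # before falling back to the system copies.
    env['LD_LIBRARY_PATH'] = os.pathsep.join(
        p for p in (bundled_openssl_dir, env.get('LD_LIBRARY_PATH', '')) if p
    )

# ldd honors LD_LIBRARY_PATH, so this shows which libcrypto.so the loader
# would actually pick for the binary (placeholder path below).
result = subprocess.run(
    ['ldd', '/path/to/secure_listener'],
    env=env, capture_output=True, text=True,
)
print(result.stdout)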

That's as far as my investigations got. I'd consider any test_security failures for Connext on RHEL to be low-priority for now.
