
Message latency depends on the workload of the process owning the data writer #5504

Open
MMarcus95 opened this issue Dec 17, 2024 · 2 comments
Labels
need more info Issue that requires more info from contributor

Comments

@MMarcus95

Is there an already existing issue for this?

  • I have searched the existing issues

Expected behavior

The time between message publication and message reception does not depend on the workload of the process owning the data writer

Current behavior

The time between message publication and message reception depends on the workload of the process owning the data writer

Steps to reproduce

I'm testing the message latency between a data writer and a data reader. They belong to separate processes, and I'm using the discovery server, which I launch in a third process. To publish the messages I use the following while loop:

RoundTripTimeMsg msg;
while (true)
{
    msg.id() += 1;
    msg.timestamp() = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now().time_since_epoch()).count();
    writer->write(&msg);
    
    for (int i = 0; i < 10000000; i++) {
        // Simulating workload
    }
}

The for loop simulates a workload. RoundTripTimeMsg is a custom message type defined as follows:

struct RoundTripTimeMsg{
    unsigned long long id;
    long double timestamp;
};

On the other side, the data reader receives the message and prints the elapsed time between the current time and the timestamp stored in the received message. Its callback essentially does the following:

auto received_msg = (RoundTripTimeMsg *)msg;
auto now = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now().time_since_epoch()).count();
std::cout << "Received message with id: " << received_msg->id() << " and delta time: " << (now - received_msg->timestamp())/1000000.0 << " ms" << std::endl;

Both the data writer and the data reader use their default QoS.
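
For reference, this is roughly how the endpoints are created (a minimal sketch; the RoundTripTimeMsgPubSubType class, the generated header name, and the topic name are assumptions based on the fastddsgen-generated code, and the participant's discovery server configuration is omitted):

#include <fastdds/dds/domain/DomainParticipantFactory.hpp>
#include <fastdds/dds/domain/DomainParticipant.hpp>
#include <fastdds/dds/publisher/Publisher.hpp>
#include <fastdds/dds/publisher/DataWriter.hpp>
#include <fastdds/dds/topic/Topic.hpp>
#include <fastdds/dds/topic/TypeSupport.hpp>
#include "RoundTripTimeMsgPubSubTypes.hpp"  // assumed name of the fastddsgen-generated header

using namespace eprosima::fastdds::dds;

// Participant (its QoS additionally carries the discovery server settings, not shown here)
DomainParticipant* participant =
    DomainParticipantFactory::get_instance()->create_participant(0, PARTICIPANT_QOS_DEFAULT);

// Register the generated type and create the topic (topic name assumed)
TypeSupport type(new RoundTripTimeMsgPubSubType());
type.register_type(participant);
Topic* topic = participant->create_topic("RoundTripTimeTopic", type.get_type_name(), TOPIC_QOS_DEFAULT);

// Writer with default QoS (the reader side is analogous, using DATAREADER_QOS_DEFAULT)
Publisher* publisher = participant->create_publisher(PUBLISHER_QOS_DEFAULT);
DataWriter* writer = publisher->create_datawriter(topic, DATAWRITER_QOS_DEFAULT);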

I have noticed that if I reduce the workload inside the while loop (using a smaller bound in the for-loop condition), the data reader receives the message earlier. If I increase the workload, the message is received later. What is happening?

Fast DDS version/commit

v3.1.0

Platform/Architecture

Other. Please specify in Additional context section.

Transport layer

UDPv4

Additional context

Platform: Ubuntu Jammy Jellyfish 22.04 amd64

XML configuration file

No response

Relevant log output

No response

Network traffic capture

No response

MMarcus95 added the triage (Issue pending classification) label on Dec 17, 2024
@EugenioCollado
Contributor

Hi @MMarcus95 ,

We have replicated the scenario you describe and have not observed any latency. Based on your description, the behavior you see is likely due to CPU resource contention and not an actual DDS-related issue. The workload simulation inside the DataWriter's loop can monopolize the CPU core it is running on, potentially delaying the other Fast DDS middleware threads if they are running on the same core.
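
One quick way to check this hypothesis (a rough sketch, assuming the busy loop is the only workload in that process) is to yield the core between writes instead of busy-waiting; pinning the publisher process to a dedicated core with taskset is another option:

#include <chrono>
#include <thread>

// Same publishing loop, but sleeping between writes so that the Fast DDS
// threads running in the same process can be scheduled on that core.
while (true)
{
    msg.id() += 1;
    msg.timestamp() = std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::high_resolution_clock::now().time_since_epoch()).count();
    writer->write(&msg);

    std::this_thread::sleep_for(std::chrono::milliseconds(10));  // yields the CPU instead of spinning
}

If the latency stays low with the sleep but grows again with the busy loop, the delay comes from scheduling rather than from Fast DDS itself.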

To better understand the cause of the issue, please provide additional details about your system’s configuration, including the number of CPU cores, and observe the CPU core usage during your tests.

Thank you!

EugenioCollado added the need more info (Issue that requires more info from contributor) label and removed the triage (Issue pending classification) label on Dec 19, 2024
@MMarcus95
Author

Hi @EugenioCollado,

Thanks for the feedback.

I'm running the test inside a Docker image with Ubuntu 22.04. The CPU is a 13th Gen Intel Core i7-13700H. Below is a screenshot of the lscpu output:
[Screenshot: lscpu output]

This is the CPU core usage with the same workload as before (a for-loop bound of 10000000):
[Screenshot: CPU core usage with for-loop bound 10000000]

In this other example I reduce the workload, using a for-loop bound of 100000:
[Screenshot: CPU core usage with for-loop bound 100000]

In both cases, the sender thread is indeed using around 100% of CPU.

However, the message latency is much lower in the second case, as you can see in this plot:
[Screenshot: latency plot comparing the two workloads]

Please let me know if you need any other information.

Thanks!
