Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix warnings in tcp/udp UT #2744

Merged
merged 13 commits into from
Oct 7, 2024

Conversation

JohanBertrand
Copy link
Contributor

@JohanBertrand JohanBertrand commented May 23, 2024

Related Issue(s) #2706
Has Unit Tests (y/n)
Documentation Included (y/n)

Change Description

I am not sure we want to have those changes in the codebase. That could be used as a baseline for a better update.

Most of the warnings in the UT with TCP/UDP module should be fixed with those changes. A mutex is making sure that the receive/open/startup functions are called only if the stop/close function are not going to be called in an other thread.

I added the changes in the unit tests instead of the class itself (in open/close), because I did not find a way to make it work properly.

I also needed to make recv to stop iterating if errno == EAGAIN to avoid a timeout. I'm not sure if that's the proper way to fix this issue.

The only warning still happening for me, after 10.000 tests, is the following (happened only once):

[ RUN      ] Reconnect.ReceiveThreadReconnect
Listening for single client at 127.0.0.1:32769
Connected to 127.0.0.1:32769 as a tcp client
Accepted client at 127.0.0.1:32769
Connected to 127.0.0.1:32769 as a tcp client
Accepted client at 127.0.0.1:32769
Connected to 127.0.0.1:32769 as a tcp client
Accepted client at 127.0.0.1:32769
Accepted client at 127.0.0.1:32769
[WARNING] Failed to recv from port with status -8 and errno 0

I feel like the warnings in the test should make the UT to fail. However, I am not sure how to catch the warning streamed through Fw::Logger::logMsg.

Also fix an error in StringUtils.cpp when compiled on ubuntu20.04 with a FW_ASSERT:

/home/fprime/Fw/Types/StringUtils.cpp:51:40: error: comparison of integer expressions of different signedness: ‘FwSizeType’ {aka ‘long unsigned int’} and ‘long int’ [-Werror=sign-compare]
   51 |     FW_ASSERT((source_size - sub_size) <= std::numeric_limits<FwSignedSizeType>::max());

Fw/Types/StringUtils.cpp Outdated Show resolved Hide resolved
Drv/Ip/IpSocket.cpp Show resolved Hide resolved
@@ -72,13 +72,15 @@ void SocketReadTask::readTask(void* pointer) {
SocketIpStatus status = SOCK_SUCCESS;
SocketReadTask* self = reinterpret_cast<SocketReadTask*>(pointer);
do {
self->m_task_lock.lock();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense!

if (recv_thread) {
//this->component.close();
if ((1 + i) == iterations) {
this->component.m_task_lock.lock();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These locks should be done in the shutdown and close methods, not just the UT.

Copy link
Contributor Author

@JohanBertrand JohanBertrand Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the mutex in the close/shutdown functions. I had to update the TcpServerTester.cpp a bit for it to work: the client needs to be closed after the server is closed. I do not have the explanation for this behavior, but otherwise I was getting the following warning:

Connected to 127.0.0.1:57755 as a tcp client
Accepted client at 127.0.0.1:57755
[WARNING] Failed to recv from port with status -8 and errno 0

There was a comment stating: // Client must be closed first or the server risks binding to an existing address. Let me know if it is a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added the muted on the stop function. Can you confirm that it is something that is expected?

@JohanBertrand
Copy link
Contributor Author

@LeStarch is there something we can do in the UT to check that no warnings are logged using Fw::Logger::logMsg, and cause the UT to fail if it contains warnings?

During my tests, I used that warning messages to realize that something was wrong.

I also used a timeout to make sure that the tests were not taking too long. It was especially useful for checking that the TcpServer tests were working properly. The test could take a few seconds on failure, where it's supposed to be less than 500ms in nominal conditions on my hardware. However, I don't think that's something that can/should be implemented, since it's hardware dependent.

// Properly stop the client on the last iteration
if ((1 + i) == iterations && recv_thread) {
this->component.shutdown();
Copy link
Contributor Author

@JohanBertrand JohanBertrand Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LeStarch Can you confirm that we do not need this shutdown, since it's already part of the stop function?

Keeping it is causing a timeout on my setup (when the UT is tasking more than 1s)

@JohanBertrand
Copy link
Contributor Author

@LeStarch @thomas-bc Let me know if there is something else to change in this PR

@JohanBertrand
Copy link
Contributor Author

@LeStarch @thomas-bc I don't really understand where the problem is coming from on the Mac OS side and I can not reproduce the problem on my setup. Is it possible for you to have a look at this?

@LeStarch
Copy link
Collaborator

@JohanBertrand yes we can!

@JohanBertrand
Copy link
Contributor Author

@LeStarch I fixed the merge conflict and merged devel, it should be good to be merged once the issue on Mac OS is resolved

@LeStarch
Copy link
Collaborator

We have been working to get #2683 merged and are about to do so. In that PR we substantially rearchitect the mutex handling. I will note which changes need to be kept and which are likely superseded.

@@ -219,8 +219,13 @@ SocketIpStatus IpSocket::recv(U8* data, U32& req_read) {
for (U32 i = 0; (i < SOCKET_MAX_ITERATIONS) && (size <= 0); i++) {
// Attempt to recv out data
size = this->recvProtocol(data, req_read);

if ((size == -1) && (errno == EAGAIN)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JohanBertrand @csmith608 this file's changes should be kept, as they improve error handling (
EAGAIN is differentiated from EINTR).

@@ -51,34 +51,42 @@ SocketIpStatus SocketReadTask::open() {
}

void SocketReadTask::shutdown() {
this->m_task_lock.lock();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JohanBertrand @csmith608 These likely need to be discarded as they affect locking.

@@ -171,7 +171,7 @@ class SocketReadTask {
Os::Task m_task;
bool m_reconnect; //!< Force reconnection
bool m_stop; //!< Stops the task when set to true

Os::Mutex m_task_lock;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JohanBertrand @csmith608 These likely need to be discarded as they affect locking.

Drv/TcpClient/test/ut/TcpClientTester.cpp Show resolved Hide resolved
@@ -106,7 +106,7 @@ void UdpTester::test_with_loop(U32 iterations, bool recv_thread) {
}
}
// Properly stop the client on the last iteration
if ((1 + i) == iterations && recv_thread) {
if (((1 + i) == iterations) && recv_thread) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JohanBertrand @csmith608 keep, code clean-up

// Properly stop the client on the last iteration
if ((1 + i) == iterations && recv_thread) {
this->component.shutdown();
if (((1 + i) == iterations) && recv_thread) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JohanBertrand @csmith608 keep, code clean-up

@@ -88,15 +88,16 @@ void TcpServerTester ::test_with_loop(U32 iterations, bool recv_thread) {
while (not m_spinner) {}
}
}
client.close(); // Client must be closed first or the server risks binding to an existing address

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JohanBertrand @csmith608 Review with upcoming change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears I already changed where this occurs in my latest changes, although I can't remember exactly why. https://github.com/nasa/fprime/blob/devel/Drv/TcpServer/test/ut/TcpServerTester.cpp#L91-L103

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the position of client.close() in the last commit, after updating the branch with the changes from devel.

From what I have seen in my different tests, if client.close() is called before this->component.close();, there are some cases where ::accept(serverFd, nullptr, nullptr); in Drv/Ip/TcpServerSocket.cpp (See: https://github.com/nasa/fprime/blob/devel/Drv/Ip/TcpServerSocket.cpp#L110) will hang forever.

It usually happened about once every 1600 CI runs.

I just ran the UT for 100.000 times on my machine with the changes in this PR, without failures on the CI, timeouts or hang.

this->component.stop();
this->component.join();
} else {
this->component.close();
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JohanBertrand @csmith608 Review with upcoming change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about this placement already being changed, so I'm going to leave as is unless someone has thoughts

@JohanBertrand
Copy link
Contributor Author

JohanBertrand commented Sep 28, 2024

@LeStarch The branch has been updated with devel.

I can see that the messages logged through Fw::Logger::log do not seem to appear in the test log anymore. Is there a way to print it again?

I used it in this PR for checking that no [WARNING] messages were logged in the test log.

Edit: After manually enabling print in the UT, I am still getting a lot of [WARNING] Failed to recv from port with status -8 and errno 0 messages in TCP client, TCP server and UDP UTs.

Out of curiosity, did you @csmith608 investigate this in your previous PR?

@csmith608
Copy link
Collaborator

Hey @JohanBertrand, these unit tests fail on Mac due to the first recv after the send sometimes getting an EAGAIN rather than the actual data. This issue does not appear on Linux.
I added a retry mechanism and a clearer status enum for the EAGAIN situation as the recv man page still classifies this as an error rather than a success.

Drv/TcpClient/TcpClientComponentImpl.cpp Dismissed Show dismissed Hide dismissed
Drv/TcpClient/TcpClientComponentImpl.cpp Dismissed Show dismissed Hide dismissed
Drv/TcpClient/TcpClientComponentImpl.cpp Dismissed Show dismissed Hide dismissed
Drv/TcpServer/TcpServerComponentImpl.cpp Dismissed Show dismissed Hide dismissed
Drv/TcpServer/TcpServerComponentImpl.cpp Dismissed Show dismissed Hide dismissed
Drv/TcpServer/TcpServerComponentImpl.cpp Dismissed Show dismissed Hide dismissed
Drv/Udp/UdpComponentImpl.cpp Dismissed Show dismissed Hide dismissed
Drv/Udp/UdpComponentImpl.cpp Dismissed Show dismissed Hide dismissed
Drv/Udp/UdpComponentImpl.cpp Dismissed Show dismissed Hide dismissed
@LeStarch
Copy link
Collaborator

LeStarch commented Oct 7, 2024

I am going to merge this, as it seems to correct the UT issues, is what I expect. Any other issues should become a new issue, or PR.

@LeStarch LeStarch merged commit 652c670 into nasa:devel Oct 7, 2024
33 of 34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants