Pushoff keep alive on ack received #314

Merged
merged 7 commits into main from keep_alive_improve
Aug 10, 2023
Conversation

@xiazhvera (Contributor)

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@bretambrose (Contributor) left a comment

The write timestamp can't be shared/singleton.

If I send 5 publishes at times 1, 2, 3, 4, 5, and then an ack comes back for the first one, the push-out needs to be based on the first one's write timestamp (1).

@xiazhvera (Contributor Author)

> The write timestamp can't be shared/singleton.
>
> If I send 5 publishes at times 1, 2, 3, 4, 5, and then an ack comes back for the first one, the push-out needs to be based on the first one's write timestamp (1).

I don't think it matters; we only need to push off based on the "last" write time, since that's when our "last transmission" happened.

@bretambrose (Contributor) commented Aug 8, 2023

> > The write timestamp can't be shared/singleton.
> > If I send 5 publishes at times 1, 2, 3, 4, 5, and then an ack comes back for the first one, the push-out needs to be based on the first one's write timestamp (1).
>
> I don't think it matters; we only need to push off based on the "last" write time, since that's when our "last transmission" happened.

It absolutely matters. If the connection is dropped at time 1.5, then the ping cannot be pushed out any farther than that. Connection drops aren't instantaneous.

What we're trying to express is the following:

"I know the connection was valid at time T because the request I sent at time T received a response from the server."

Edit: adding the follow-up discussion for completeness/documentation:

Consider this situation:
Keep alive is 60 time units.
The connection path goes from a very slow network (the device in the field) before eventually reaching a fast network (internet backbone). It takes 20 time units total for a packet to go from one endpoint to the other.
So a publish sent at time 1 arrives at the other end at time 21.
A response sent at time 21 arrives at time 41.
Now simulate what happens when the client sends a steady stream of publishes, one every time unit: publish(1), publish(2), publish(3), etc.
The client receives the ack for publish(1) at time 41. The connection could potentially have been dropped at any point after time 21 and the ack would still make it back to the client. A connection drop on the fast part of the network after the ack squeaks through is easily possible, and using 41 as the push-out time is not what we want to do.
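
A minimal sketch of the bookkeeping being argued for here (hypothetical names; not the actual aws-c-mqtt structures): each in-flight request records its own write timestamp, and when its ack arrives the next ping is pushed out relative to that request's write time, never a shared "last write" time.

#include <stdint.h>

/* Hypothetical types for illustration; the real client keeps this state in
 * its own channel handler structures. */
struct outstanding_request {
    uint16_t packet_id;
    uint64_t write_timestamp_ns; /* when this request was written to the socket */
};

struct keep_alive_state {
    uint64_t keep_alive_interval_ns;
    uint64_t next_ping_timestamp_ns;
};

/* Invoked when the ack for `request` comes back from the server. */
static void s_on_ack_received(struct keep_alive_state *state, const struct outstanding_request *request) {
    /* The connection is only proven healthy up to the moment this particular
     * request hit the wire, so the next ping may be pushed out to that write
     * time plus the keep alive interval, and no further. */
    uint64_t pushed_out_ns = request->write_timestamp_ns + state->keep_alive_interval_ns;
    if (pushed_out_ns > state->next_ping_timestamp_ns) {
        state->next_ping_timestamp_ns = pushed_out_ns;
    }
}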

@codecov-commenter commented Aug 8, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.02% ⚠️

Comparison is base (b777be4) 82.22% compared to head (d962b9c) 82.20%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #314      +/-   ##
==========================================
- Coverage   82.22%   82.20%   -0.02%     
==========================================
  Files          20       20              
  Lines        8670     8678       +8     
==========================================
+ Hits         7129     7134       +5     
- Misses       1541     1544       +3     
Files Changed                      Coverage Δ
source/client_channel_handler.c    74.60% <100.00%> (+0.93%) ⬆️

... and 1 file with indirect coverage changes


@@ -3366,8 +3366,8 @@ AWS_TEST_CASE_FIXTURE(
&test_data)

/**
* Makes a CONNECT, with 1 second keep alive ping interval, send a publish roughly every second, and then ensure NO
* pings were sent
* Makes a CONNECT, with 1 second keep alive ping interval, send a different operation every second, and then ensure NO
Review comment (Contributor):

If this comment is accurate then I feel like this test is likely overly sensitive to timing conditions and containerization.

/**
* Makes a CONNECT, with 1 second keep alive ping interval, disable the server auto ack so that we never received ack
* back. Send a qos1 publish roughly every second for 4 seconds. As we never received the ACK, we should send a total of
* 4 ping
Review comment (Contributor):

This is alarming. Given a keep alive of 1 second and a ping timeout of 100ms, we should be seeing multiple disconnections, which are going to disrupt the logical simplicity of the test.
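
For reference, the arithmetic behind the expected count in the quoted test comment (a small sketch under the assumption that, with this change, only received acks push the ping off, so un-acked publishes leave the ping schedule untouched):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t keep_alive_secs = 1;    /* keep alive interval used by the test */
    uint64_t test_duration_secs = 4; /* the test publishes for roughly 4 seconds */

    /* No acks ever arrive, so nothing pushes the ping out and pings fire at
     * the normal keep-alive cadence. */
    uint64_t expected_pings = test_duration_secs / keep_alive_secs;
    printf("expected pings: %" PRIu64 "\n", expected_pings); /* prints 4 */
    return 0;
}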

* We would like to wait for a total of ~4.5 seconds to account for slight drift/jitter.
* We have been waiting for 0.8*4=3.2 sec. Wait for another 1 sec here.
*/
aws_thread_current_sleep(1000000000);
Review comment (Contributor):

I don't see any asserts in this test actually checking anything meaningful.
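
As an aside on the raw literal in the quoted sleep, a possible readability tweak (assuming aws-c-common's clock and thread helpers, which the test already pulls in for aws_thread_current_sleep) is to derive the nanosecond count from an explicit unit conversion:

#include <aws/common/clock.h>
#include <aws/common/thread.h>

static void s_wait_one_second(void) {
    /* Same 1-second wait as the quoted line, with the unit spelled out
     * instead of the 1000000000 nanosecond literal. */
    aws_thread_current_sleep(aws_timestamp_convert(1, AWS_TIMESTAMP_SECS, AWS_TIMESTAMP_NANOS, NULL));
}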

@xiazhvera (Contributor Author) commented Aug 9, 2023

Updated the unit tests.
Also check out s_test_mqtt_connection_ping_basic_scenario_fn for the existing keep alive tests.

@bretambrose (Contributor) left a comment

Fix comment and ship


// Make sure we publish for 4 seconds;
while (elapsed_time < test_duration) {
/* Publish qos1*/
Review comment (Contributor):

qos0

xiazhvera merged commit b7a323d into main on Aug 10, 2023
31 checks passed
xiazhvera deleted the keep_alive_improve branch on August 10, 2023 17:16