[4.0] P2P: Use magnitude safe time types #1316

heifner · 2023-06-17T20:43:23Z

Use std::chrono types instead of tstamp so that magnitude is maintained. Fixes issue with heartbeat calculation being off on some machines causing pre-mature close of what it thought was a idle connection.

Resolves #1315

greg7mdp

I don't quite understand what was the issue with tstamp. I felt that standardizing on the number of nanoseconds since epoch was nice and simple, but I'm clearly missing what the issue was here.

heifner · 2023-06-19T12:47:27Z

plugins/net_plugin/net_plugin.cpp

- void connection::check_heartbeat( tstamp current_time ) {
- if( latest_msg_time > 0 ) {
+ void connection::check_heartbeat( std::chrono::system_clock::time_point current_time ) {
+ if( latest_msg_time > std::chrono::system_clock::time_point::min() ) {
 if( current_time > latest_msg_time + hb_timeout ) {


The issue was here. current_time was in nanoseconds. lastest_msg_time was also in nanoseconds, but hb_timeout was in milliseconds. So when the timer ran the latest_msg_time had to be within 10000 nanoseconds (10 microseconds) or it would close the connection.

So the issue was that hb_timeout was in milliseconds. It seems incorrect to me, as it is defined as std::chrono::system_clock::duration::rep, so I feel we should consistently use system_clock duration in tstamp (on my linux system, system_clock's duration is defined as chrono::nanoseconds.

I assume that by using std::chrono types, we will benefit from an automatic conversion in if( current_time > latest_msg_time + hb_timeout ) which will prevent the error. But tstamp was already a std::chrono duration, so I wonder if most of the changes, including this one, are really needed.

My feeling is that the change at line 610 (hb_timeout = msec;) would have been enough for the fix, as the problem was that we used count() which just returned a tick count, so we didn't benefit from the conversion from milliseconds to nanoseconds.

Could be. Either way, this is much cleaner/safer code.

I don't see how this makes the code cleaner or safer. I think if you changed line 610 to hb_timeout = msec.count() the issue would come back exactly as before.

I'm confused at what you are suggesting should be done. time_point, milliseconds, nanoseconds types are well defined and arithmetic operations are handled correctly. tstamp (std::chrono::system_clock::duration::rep) on the other hand is defined as rep - signed arithmetic type representing the number of ticks in the clock's duration which is system dependent.

Is nanoseconds too sensitive in P2P?

Yes, nanoseconds is kind of ridiculous to use. However, that is what we were already using for time_message and time of handshake_message. Internally now with this change we just use whatever std::chrono::system_clock::now() returns.

I think tstamp should be either defined as:

std::chrono::nanoseconds (in which case we would benefit from safe type checking and conversions),

or as int64_t (in which case it would be clear it is simply a number and up to the programmer to ensure we always store nanoseconds into it).

As far as I can see, the bug was that we are in case #2 (although it is somewhat obfuscated and unreliable because we use a std::chrono internal type) and that we stored milliseconds in a tstamp (line #611).

Your change fixes the bug at line 611, and in addition replaces the use of tstamp with std::chrono::system_clock::time_point. I think it makes the code less readable.

My preference would be to implement #1 above, i.e. redefine tstamp as std::chrono::nanoseconds, which I think would provide the additional safety in a cleaner way.

Completely removed tstamp type and backported the better time_message handling from main.

Thanks Kevin for the change. I'm not quite sure is the initialization with std::chrono::system_clock::time_point::min() has a specific benefit, I kinda feel that 0 is more appropriate than a negative value. And I also thing that defining tstamp as std::chrono::nanoseconds and using it for our connection time tracking members would be better. But maybe we can talk about this further for 5.0.

…'t trip heartbeat timer

linh2931

Is it possible to add some tests for the original problem and/or those changes?

GH-1315 Use magnitude safe time types

5dbcdc5

heifner requested review from greg7mdp and linh2931 June 17, 2023 20:43

heifner added the OCI Work exclusive to OCI team label Jun 17, 2023

vladtr approved these changes Jun 17, 2023

View reviewed changes

greg7mdp approved these changes Jun 18, 2023

View reviewed changes

heifner commented Jun 19, 2023

View reviewed changes

GH-1315 Reset latest_msg_time and latest_blk_time so re-connect doesn…

b221c12

…'t trip heartbeat timer

linh2931 approved these changes Jun 19, 2023

View reviewed changes

GH-1315 Completely remove problematic tstamp type.

0b3b3b6

greg7mdp approved these changes Jun 19, 2023

View reviewed changes

linh2931 reviewed Jun 19, 2023

View reviewed changes

heifner linked an issue Jun 19, 2023 that may be closed by this pull request

4.0.2: p2p constantly disconnects #1315

Closed

heifner merged commit e9016b0 into release/4.0 Jun 20, 2023

heifner deleted the GH-1315-time-4.0 branch June 20, 2023 18:43

heifner mentioned this pull request Jun 20, 2023

[4.0 -> main] P2P: Use magnitude safe time types #1321

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[4.0] P2P: Use magnitude safe time types #1316

[4.0] P2P: Use magnitude safe time types #1316

heifner commented Jun 17, 2023 •

edited

Loading

greg7mdp left a comment

heifner Jun 19, 2023

greg7mdp Jun 19, 2023 •

edited

Loading

heifner Jun 19, 2023

greg7mdp Jun 19, 2023

heifner Jun 19, 2023

linh2931 Jun 19, 2023

heifner Jun 19, 2023

greg7mdp Jun 19, 2023 •

edited

Loading

heifner Jun 19, 2023

greg7mdp Jun 19, 2023

linh2931 left a comment

[4.0] P2P: Use magnitude safe time types #1316

[4.0] P2P: Use magnitude safe time types #1316

Conversation

heifner commented Jun 17, 2023 • edited Loading

greg7mdp left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

greg7mdp Jun 19, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

greg7mdp Jun 19, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

linh2931 left a comment

Choose a reason for hiding this comment

heifner commented Jun 17, 2023 •

edited

Loading

greg7mdp Jun 19, 2023 •

edited

Loading

greg7mdp Jun 19, 2023 •

edited

Loading