Reset client reset cycle detection after upload/download complete #6196

jbreams · 2023-01-13T16:58:32Z

What, How & Why?

When we start a client reset with recovery we insert an object into the realm to track whether we've already started a client reset with recovery so that if the client reset fails (say because we recover local changes that cannot be applied to the server) we don't end up in an endless loop. Previously we cleared this tracking tombstone when we received a download message that advanced the upload progress - the idea being that if the server acknowledged our client reset with a download ack then the recovered changes must have been valid.

It turns out that if client reset with recovery doesn't actually recover anything, then there's nothing for the server to acknowledge and we never clear that tracking tombstone, even though everything succeeded. If there is another client reset, then we'll enter into a cycle where the sync client thinks there's a pending client reset.

This change moves where we remove the tombstone. Instead of removing it whenever we receive a download with an advancing upload cursor, we now remove it after we've received an upload/download ACK (i.e. a download message with an advancing upload cursor and/or a MARK message indicating the server has nothing else to send us).
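To make the difference concrete, here is a minimal, self-contained sketch of the two clearing strategies. It is a toy model, not the sync client code: `ToySession`, `PendingReset`, and every method name below are hypothetical stand-ins, and the real tracker lives in the realm file rather than in memory.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the tracking object written at the start of a reset.
struct PendingReset {
    std::string type;
    long time;
};

// Toy model of a session; names are illustrative, not the real sync client API.
class ToySession {
public:
    void start_recovery(PendingReset reset) { m_pending = std::move(reset); }

    // Old behaviour: clear the tracker only when a DOWNLOAD message
    // advances the upload cursor. If recovery produced nothing to upload,
    // the cursor never advances and the tracker is never cleared.
    void on_download_old(bool upload_cursor_advanced)
    {
        if (upload_cursor_advanced)
            m_pending.reset();
    }

    // New behaviour: register a handler that runs once uploads are
    // acknowledged and the server has nothing more to send.
    void wait_for_upload_download_completion(std::function<void()> handler)
    {
        m_waiters.push_back(std::move(handler));
    }

    // Called by the toy "network layer" when upload and download are complete.
    void on_upload_download_complete()
    {
        std::vector<std::function<void()>> waiters;
        waiters.swap(m_waiters);
        for (auto& waiter : waiters)
            waiter();
    }

    void clear_pending() { m_pending.reset(); }
    bool has_pending_reset() const { return m_pending.has_value(); }

private:
    std::optional<PendingReset> m_pending;
    std::vector<std::function<void()>> m_waiters;
};
```

With the old strategy, a recovery that uploads nothing leaves `has_pending_reset()` true forever. With the new one, the completion handler fires even when there was nothing to upload, so the tracker is cleared.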

I also fixed the connection-level error handling so it respects the retry-timeout info sent from the server rather than a hardcoded 5-minute timeout, and unified the try-again backoff logic into a helper class.
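As a rough illustration of what a unified backoff helper could look like, here is a small sketch. Everything in it is an assumption made for illustration: `BackoffParams`, `RetryBackoff`, and `choose_params` are hypothetical names, and the actual helper class in this PR may be shaped quite differently.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <optional>

using std::chrono::milliseconds;

// Hypothetical retry parameters, e.g. as they might arrive with a server error.
struct BackoffParams {
    milliseconds initial_delay;
    milliseconds max_delay;
    int multiplier;
};

// Hypothetical helper: yields a growing delay on each call, capped at max_delay.
class RetryBackoff {
public:
    explicit RetryBackoff(BackoffParams params)
        : m_params(params)
    {
    }

    milliseconds next_delay()
    {
        if (!m_current) {
            // First retry: start from the initial delay (never above the cap).
            m_current = std::min(m_params.initial_delay, m_params.max_delay);
        }
        else {
            // Later retries: multiply the previous delay, then apply the cap.
            m_current = std::min(*m_current * m_params.multiplier, m_params.max_delay);
        }
        return *m_current;
    }

    // Call after a successful connection so the next failure starts over.
    void reset() { m_current.reset(); }

private:
    BackoffParams m_params;
    std::optional<milliseconds> m_current;
};

// Prefer server-provided parameters; fall back to a fixed 5-minute delay.
inline BackoffParams choose_params(const std::optional<BackoffParams>& from_server)
{
    const BackoffParams fallback{std::chrono::minutes(5), std::chrono::minutes(5), 1};
    return from_server.value_or(fallback);
}
```

Keeping the delay state in one object means every retry path can share the same growth and reset rules instead of each carrying its own timeout constant.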

☑️ ToDos

📝 Changelog update
🚦 Tests (or not relevant)
* [ ] C-API, if public C++ API changed.

jbreams · 2023-01-17T21:55:59Z

@ironage / @michael-wb , I think this is ready for a look now.

ironage · 2023-01-17T23:39:42Z

src/realm/sync/noinst/client_impl_base.cpp

@@ -2181,6 +2221,10 @@ std::error_code Session::receive_ident_message(SaltedFileIdent client_file_ident
 REALM_ASSERT_EX(m_last_version_selected_for_upload == 0, m_last_version_selected_for_upload);

 get_transact_reporter()->report_sync_transact(client_reset_old_version, client_reset_new_version);
+
+ if (has_pending_client_reset) {


I don't think it is correct to clear the pending reset state here because the reset has just finished locally and it hasn't received any acknowledgement from the server yet.

This doesn't necessarily clear anything. It schedules the async wait for the server to ack all uploads and tell the client when there are no more downloads, which will then clear the reset state.

Btw, the reason we check if there's a pending client reset and then only call this if there is one is so that we don't need to open a separate read transaction after calling get_status() if there are no client resets to do this dance with.

Ah, got it, thanks! I hadn't understood that async_wait_for is initiating an upload/download completion handler.

ironage · 2023-01-17T23:54:53Z

src/realm/sync/client.cpp

+ auto ft = m_db->start_frozen();
+ return _impl::client_reset::has_pending_reset(ft);
+ }();
+ REALM_ASSERT(pending_reset);


Isn't it safer to just return early if there is no pending reset?

This flow feels racy because we check for pending reset state in a read transaction and then clear it in a separate write transaction. Are we allowed to assume this here because of the single-threaded nature of the sync client?

I kinda think it's equally safe to check pre-conditions vs returning early if pre-conditions aren't met?

Hmm ok. Can we be certain that no other writer can clear the pending reset between when it gets written and when this frozen read transaction starts here?

Yes, in the places we call this function we can guarantee it because we always call it after checking pre-conditions.

ironage

Thanks for fixing this!
Just needs a bugfix entry in the changelog, otherwise LGTM! 👍

michael-wb · 2023-01-18T19:43:05Z

src/realm/sync/client.cpp

+ m_sess->logger.info("Tracking pending client reset of type \"%1\" from %2", pending_reset->type,
+ pending_reset->time);
+ util::bind_ptr<SessionWrapper> self(this);
+ async_wait_for(true, true, [self = std::move(self), pending_reset = *pending_reset](std::error_code ec) {


Do you think it's a problem if more than one async_wait_for() call is waiting at a time to clear the client reset? For example, if process_pending_flx_bootstrap() throws an exception and is run again, or if a connection drops after receiving the ident message and then Session::receive_ident_message() is called again when the client reconnects.
It probably won't be a problem, since the write transaction will serialize the calls to remove_pending_client_resets() and this function is a no-op if there is nothing to do.

I think at worst it will be a no-op. We used to always remove the pending client reset whenever we got a download that advanced the upload cursor, so this should be no worse than that? I've added some extra guards and logging so that if we end up with multiple callbacks from async_wait_for, only the one that matches the pending reset info that kicked off the wait actually triggers removing the pending resets.
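The guard described above can be sketched as follows. This is an illustrative toy, not the code from the PR: `ToyTracker` and `remove_if_matches` are hypothetical names, and the in-memory `std::optional` stands in for the tracker that the real code reads inside a write transaction.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical copy of the tracker info captured when the wait was scheduled.
struct PendingReset {
    std::string type;
    long time;

    bool operator==(const PendingReset& other) const
    {
        return type == other.type && time == other.time;
    }
};

// Toy in-memory tracker; illustrative only.
class ToyTracker {
public:
    void track(PendingReset reset) { m_current = std::move(reset); }
    bool has_pending() const { return m_current.has_value(); }

    // Remove the tracker only if it is still the one the callback was
    // scheduled for. Returns true if something was removed; `log` receives
    // a message describing what happened.
    bool remove_if_matches(const PendingReset& expected, std::string& log)
    {
        const std::string what =
            "type \"" + expected.type + "\" from " + std::to_string(expected.time);
        if (!m_current) {
            log = "Tracker for " + what + " was already removed";
            return false;
        }
        if (!(*m_current == expected)) {
            log = "Tracker for " + what + " does not match the current one; leaving it";
            return false;
        }
        m_current.reset();
        log = "Removed tracker for " + what;
        return true;
    }

private:
    std::optional<PendingReset> m_current;
};
```

If two completion callbacks end up queued, the second one finds the tracker already gone, or finds a different one, and does nothing, so duplicate waits stay harmless.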

michael-wb

Walked through the flow and didn't see any issues. LGTM

michael-wb · 2023-01-18T20:52:42Z

src/realm/sync/client.cpp

+ auto cur_pending_reset = _impl::client_reset::has_pending_reset(wt);
+ if (!cur_pending_reset) {
+ logger.debug(
+ "Was going to remove client reset tracker for type \"%1\" from %2, but it was already removed");


Missing values for %1 and %2 - assuming this is supposed to be pending_reset.type, pending_reset.time

Reset client reset cycle detection after upload/download complete

24b7197

cla-bot bot added the cla: yes label Jan 13, 2023

jbreams linked an issue Jan 13, 2023 that may be closed by this pull request

Client reset cycle detection can get locked in a cycle if recovery does not result in any local commits #6195

Closed

jbreams added 3 commits January 13, 2023 12:02

fix compile on old xcode

cad1ab3

fix test failures and unify/simplify try_again backoff logic

3f50ac1

remove unneeded includes

e5df539

jbreams marked this pull request as ready for review January 13, 2023 19:52

jbreams requested review from ironage and michael-wb January 13, 2023 19:52

give long timeout

c50e8d2

jbreams marked this pull request as draft January 13, 2023 21:57

test failures

341818e

jbreams marked this pull request as ready for review January 17, 2023 21:55

ironage reviewed Jan 17, 2023


jbreams requested a review from danieltabacaru January 18, 2023 15:02

ironage approved these changes Jan 18, 2023


jbreams added 2 commits January 18, 2023 13:43

Merge remote-tracking branch 'origin/master' into jbr/client_reset_cycle

1a79611

changelog

7afd573

michael-wb reviewed Jan 18, 2023


michael-wb approved these changes Jan 18, 2023


jbreams added 2 commits January 18, 2023 15:38

add more guards/logging

ee6c518

fix guard

ede279f

michael-wb reviewed Jan 18, 2023


fix logging

5a97fc6

jbreams merged commit 2f08973 into master Jan 18, 2023

jbreams deleted the jbr/client_reset_cycle branch January 18, 2023 23:32

github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
