Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix syncing dirty pages on IOS #5993

Merged
merged 21 commits into from
Nov 9, 2022
Merged

Fix syncing dirty pages on IOS #5993

merged 21 commits into from
Nov 9, 2022

Conversation

nicola-cab
Copy link
Member

@nicola-cab nicola-cab commented Nov 3, 2022

What, How & Why?

On IOS, we have been seeing a huge increase of database corruption issues.

I went down bisecting a little bit our code base, and it seems like the problem is related to the fact that as per the current implementation we never sync dirty pages up until this barrier is hit:

void GroupWriter::commit(ref_type new_top_ref) {
....
....
flush_all_mappings();
if (!disable_sync)
    m_alloc.get_file().barrier();
.....
....
}

The barrier implementation itself is platform specific, and on Linux for example this does not cause any issues, because msync() is basically equal to fsync(). On IOS, this does not look like the normal behaviour. Thus, there is the chance of losing some data if the dirty page is not msynced before we hit the barrier. Moreover, on IOS, even the barrier does not guarantee that the page will actually reach disk. We need to explicitly call msync for each page.

The easiest way to reproduce this is to start a write transaction, that writes several bytes of data (my test consisted of 2 reasonably long strings), and crash the platform in the middle of the commit itself. Most likely, some of the dirty pages won't be written, causing havoc in the database file.

The same behaviour could happen during a normal commit, thus some insertion or deletion could fail apparently spuriously because of some data that got corrupted during a previous commit.

Calling msync for each page proved to be effective, and the database was never corrupted.

We need to add a specific tests for this case, in which we write to disk, and we crash the platform.

Corruption:
Fixes: #5972
Fixes: #5718
Fixes: #5859
Fixes: #5976
FIxes: #5975
Fixes: #5970
Fixes: #5758
Fixes: #5761
Fixes: #5298
Fixes: #5299
Fixes: #5941

Encryption
Fixes: #5810
Fixes: #5811

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)
  • C-API, if public C++ API changed.

src/realm/group_writer.cpp Outdated Show resolved Hide resolved
src/realm/group_writer.cpp Outdated Show resolved Hide resolved
@nicola-cab nicola-cab merged commit 495dfb4 into master Nov 9, 2022
@nicola-cab nicola-cab deleted the nc/fix_corruptions_ios branch November 9, 2022 18:27
@bmunkholm bmunkholm linked an issue Nov 11, 2022 that may be closed by this pull request
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
5 participants