-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WaitForFlushMemTables finishing before result is committed to Manifest #10089
Comments
@daggarwal387 Can you try with the latest RocksDB version. I couldn't find the exact PR with the fix but it shouldn't occur in the lastest version. |
Upgrading the whole rocksdb version is not feasible for me as it'll take long to qualify and deploy. If you can point me to the PR that fixed it then it will be of immense help as I'll patch with that. I tried searching the latest repo commits with WaitForFlushMemTables but could not find the relevant PR. In case you were talking about #8046 then this is not the case because the logs show that the flush did commit to the Manifest a little later. |
@daggarwal387 can you take a look at 5f2b661 and see if this fixes the issue? |
@riversand963 It will not because if you check the logs above, the flush actually did not see any issue and committed just a few milliseconds after coming out of WaitForFlushMemTables. 5f2b661 is same as #8046 that I mentioned in the previous comment. |
@daggarwal387 Can you share a unit test. I can check it with latest release to see if it has been fixed or not and can take from there. |
@akankshamahajan15 @ajkr parallel_flush_UT.patch.txt and Pull request(#10334) Suspected Race:
Sample logs when failure happens: 2022/07/10-22:23:55.424822 7f9bafde5240 [lumn_family.cc:481] --------------- Options for column family [default]: 2022/07/10-22:23:55.426174 7f9bafde5240 [rsion_set.cc:4421] Column family [default] (ID 0), log number is 0 2022/07/10-22:23:55.448325 7f9bafde5240 [ties/checkpoint/checkpoint_impl.cc:357] [default] export column family onto export directory /tmp/checkpoint1_1657517035448 2022/07/10-22:23:55.448421 7f9bafde5240 [_impl/db_impl_compaction_flush.cc:1320] [default] Manual flush start. 2022/07/10-22:23:55.448568 7f9bafde5240 [_impl/db_impl_write.cc:1614] [default] New memtable created with log file: #9. Immutable memtables: 0. 2022/07/10-22:23:55.448760 7f9b967fc700 (Original Log Time 2022/07/10-22:23:55.448679) [_impl/db_impl_compaction_flush.cc:2192] Calling FlushMemTableToOutputFile with column family [default], flush slots available 3, compaction slots available 1, flush slots scheduled 3, compaction slots scheduled 0 2022/07/10-22:23:55.448785 7f9b967fc700 [ush_job.cc:325] [default] [JOB 4] Flushing memtable with next log file: 9 2022/07/10-22:23:55.448818 7f9b967fc700 [ush_job.cc:354] [default] [JOB 4] Level-0 flush table #10: started 2022/07/10-22:23:55.449092 7f9b967fc700 [_impl/db_impl_compaction_flush.cc:1320] [default] Manual flush start. 2022/07/10-22:23:55.449270 7f9b967fc700 [_impl/db_impl_write.cc:1614] [default] New memtable created with log file: #11. Immutable memtables: 1. 2022/07/10-22:23:55.449406 7f9b967fc700 [_impl/db_impl_compaction_flush.cc:1331] [default] Manual flush finished, status: OK 2022/07/10-22:23:55.449422 7f9b95ffb700 (Original Log Time 2022/07/10-22:23:55.449386) [_impl/db_impl_compaction_flush.cc:2192] Calling FlushMemTableToOutputFile with column family [default], flush slots available 3, compaction slots available 1, flush slots scheduled 3, compaction slots scheduled 0 2022/07/10-22:23:55.449431 7f9b95ffb700 [ush_job.cc:325] [default] [JOB 6] Flushing memtable with next log file: 11 2022/07/10-22:23:55.449450 7f9b95ffb700 [ush_job.cc:354] [default] [JOB 6] Level-0 flush table #12: started 2022/07/10-22:23:55.455355 7f9b95ffb700 EVENT_LOG_v1 {"time_micros": 1657517035455330, "cf_name": "default", "job": 6, "event": "table_file_creation", "file_number": 12, "file_size": 812, "table_properties": {"data_size": 28, "index_size": 29, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 0, "index_value_is_delta_encoded": 0, "filter_size": 0, "raw_key_size": 11, "raw_average_key_size": 11, "raw_value_size": 1, "raw_average_value_size": 1, "num_data_blocks": 1, "num_entries": 1, "num_deletions": 0, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0, "filter_policy": "", "column_family_name": "default", "column_family_id": 0, "comparator": "leveldb.BytewiseComparator", "merge_operator": "nullptr", "prefix_extractor_name": "nullptr", "property_collectors": "[]", "compression": "NoCompression", "compression_options": "window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1657517035, "oldest_key_time": 1657517035, "file_creation_time": 1657517035}} 2022/07/10-22:23:55.455398 7f9b95ffb700 [ush_job.cc:397] [default] [JOB 6] Level-0 flush table #12: 812 bytes OK 2022/07/10-22:23:55.455372 7f9b967fc700 EVENT_LOG_v1 {"time_micros": 1657517035455334, "cf_name": "default", "job": 4, "event": "table_file_creation", "file_number": 10, "file_size": 812, "table_properties": {"data_size": 28, "index_size": 29, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 0, "index_value_is_delta_encoded": 0, "filter_size": 0, "raw_key_size": 11, "raw_average_key_size": 11, "raw_value_size": 1, "raw_average_value_size": 1, "num_data_blocks": 1, "num_entries": 1, "num_deletions": 0, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0, "filter_policy": "", "column_family_name": "default", "column_family_id": 0, "comparator": "leveldb.BytewiseComparator", "merge_operator": "nullptr", "prefix_extractor_name": "nullptr", "property_collectors": "[]", "compression": "NoCompression", "compression_options": "window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1657517035, "oldest_key_time": 1657517035, "file_creation_time": 1657517035}} 2022/07/10-22:23:55.455445 7f9b967fc700 [ush_job.cc:397] [default] [JOB 4] Level-0 flush table #10: 812 bytes OK 2022/07/10-22:23:55.627947 7f9b95ffb700 (Original Log Time 2022/07/10-22:23:55.627865) [_impl/db_impl_compaction_flush.cc:205] [default] Level summary: files[0 0 0 0 0 0 0] max score 0.00 2022/07/10-22:23:55.628103 7f9bafde5240 [_impl/db_impl_compaction_flush.cc:1331] [default] Manual flush finished, status: OK 2022/07/10-22:23:55.632167 7f9bafde5240 [ties/checkpoint/checkpoint_impl.cc:434] [default] Export succeeded. 2022/07/10-22:23:55.632225 7f9b967fc700 (Original Log Time 2022/07/10-22:23:55.541595) [mtable_list.cc:382] [default] Level-0 commit table #10 started 2022/07/10-22:23:55.632230 7f9b967fc700 (Original Log Time 2022/07/10-22:23:55.628007) [mtable_list.cc:430] [default] Level-0 commit table #10: memtable #1 done 2022/07/10-22:23:55.632233 7f9b967fc700 (Original Log Time 2022/07/10-22:23:55.628012) [mtable_list.cc:382] [default] Level-0 commit table #12 started 2022/07/10-22:23:55.632236 7f9b967fc700 (Original Log Time 2022/07/10-22:23:55.632161) [mtable_list.cc:430] [default] Level-0 commit table #12: memtable #1 done 2022/07/10-22:23:55.632241 7f9b967fc700 (Original Log Time 2022/07/10-22:23:55.632195) [_impl/db_impl_compaction_flush.cc:205] [default] Level summary: files[2 0 0 0 0 0 0] max score 0.50` |
[sst_transfer_fix.patch](https://github.com/facebook/rocksdb/files/10772967/sst_transfer_fix.patch)
I used the following commit to overcome this, please check if this works
for you too.
…On Sat, Feb 18, 2023 at 4:22 AM He Liu ***@***.***> wrote:
I am not sure yet if it is the same issue, I am also seeing missing
commits in an exported column family:
1. ColumnFamily A is exported via ExportColumnFamily();
2. CreateColumnFamilyWithImport()is called on the above export_dir
with the same ExportImportFilesMetaData to create a new ColumnFamily
in the same DB, say ColumnFamily B;
3. Iterators are created on both Column Families, to compare all the
key-value pairs, and they are not identical, the new some commits are
missing in the new column family.
—
Reply to this email directly, view it on GitHub
<#10089 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AITUSL554RG4IFMVZIW56WTWX76LJANCNFSM5XQF53BQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
While using the ExportColumnFamily API I observed that the result is missing the newly flushed memtable. Upon further investigation, I found that WaitForFlushMemTables is finishing before the flush has actually committed the result to the Manifest. Following are the logs for the column family that experienced it(there are some additional logs added for debugging).
2022/05/31-16:28:39.080669 4f8a700 [ties/checkpoint/checkpoint_impl.cc:357] [0_0] export column family onto export directory /tmp/bringup/0/disks/0/scribe/newscribe/bucket_0/db_sender_sst_file 2022/05/31-16:28:39.080729 4f8a700 [_impl/db_impl_compaction_flush.cc:1310] [0_0] Manual flush start. 2022/05/31-16:28:39.095206 4f8a700 [_impl/db_impl_write.cc:1614] [0_0] New memtable created with log file: #113. Immutable memtables: 0. 2022/05/31-16:28:39.095227 4f8a700 [_impl/db_impl_compaction_flush.cc:1596] [0_0] Flush request queued. 2022/05/31-16:28:39.112425 7f5694e41700 (Original Log Time 2022/05/31-16:28:39.095402) [_impl/db_impl_compaction_flush.cc:2208] Calling FlushMemTableToOutputFile with column family [0_0], flush slots available 10, compaction slots available 1, flush slots scheduled 10, compaction slots scheduled 0 2022/05/31-16:28:39.112432 7f5694e41700 [ush_job.cc:322] [0_0] [JOB 320] Flushing memtable with next log file: 113 and mem_table_id: 10 2022/05/31-16:28:39.112450 7f5694e41700 [ush_job.cc:351] [0_0] [JOB 320] Level-0 flush table #114: started 2022/05/31-16:28:39.116592 4d70700 [_impl/db_impl_write.cc:1614] [0_0] New memtable created with log file: #115. Immutable memtables: 1. 2022/05/31-16:28:39.116620 4d70700 [_impl/db_impl_compaction_flush.cc:1596] [0_0] Flush request queued. 2022/05/31-16:28:39.127211 7f5696e45700 (Original Log Time 2022/05/31-16:28:39.116846) [_impl/db_impl_compaction_flush.cc:2208] Calling FlushMemTableToOutputFile with column family [0_0], flush slots available 10, compaction slots available 1, flush slots scheduled 10, compaction slots scheduled 0 2022/05/31-16:28:39.127218 7f5696e45700 [ush_job.cc:322] [0_0] [JOB 331] Flushing memtable with next log file: 115 and mem_table_id: 11 2022/05/31-16:28:39.127238 7f5696e45700 [ush_job.cc:351] [0_0] [JOB 331] Level-0 flush table #116: started 2022/05/31-16:28:39.130535 7f5694e41700 [ush_job.cc:392] [0_0] [JOB 320] Level-0 flush table #114: 235965 bytes OK 2022/05/31-16:28:39.140412 7f5696e45700 [ush_job.cc:392] [0_0] [JOB 331] Level-0 flush table #116: 2489 bytes OK 2022/05/31-16:28:39.154308 7f5696e45700 (Original Log Time 2022/05/31-16:28:39.150687) [_impl/db_impl_compaction_flush.cc:204] [0_0] Level summary: base level 4 level multiplier 8.00 max bytes base 4096000 files[1 0 0 0 1] max score 0.12 2022/05/31-16:28:39.154415 4f8a700 [_impl/db_impl_compaction_flush.cc:1866] [0_0]Flush finished. Num not flushed: 1, earliest_memtable_id_now: 11, mem_table_we_flushed: 10 2022/05/31-16:28:39.154423 4f8a700 [_impl/db_impl_compaction_flush.cc:1320] [0_0] Manual flush finished, status: OK 2022/05/31-16:28:39.154474 4f8a700 [ties/checkpoint/checkpoint_impl.cc:380] [0_0] HardLinking /000100.sst 2022/05/31-16:28:39.154551 4f8a700 [ties/checkpoint/checkpoint_impl.cc:380] [0_0] HardLinking /000090.sst 2022/05/31-16:28:39.166255 7f5694e41700 (Original Log Time 2022/05/31-16:28:39.137073) [mtable_list.cc:383] [0_0] Level-0 commit table #114 has started, mem_id: 10 2022/05/31-16:28:39.166260 7f5694e41700 (Original Log Time 2022/05/31-16:28:39.154210) [mtable_list.cc:439] [0_0] Level-0 commit table #114: memtable #1memtable_id: 10 done 2022/05/31-16:28:39.166263 7f5694e41700 (Original Log Time 2022/05/31-16:28:39.154215) [mtable_list.cc:383] [0_0] Level-0 commit table #116 has started, mem_id: 11 2022/05/31-16:28:39.166265 7f5694e41700 (Original Log Time 2022/05/31-16:28:39.166175) [mtable_list.cc:439] [0_0] Level-0 commit table #116: memtable #1memtable_id: 11 done 2022/05/31-16:28:39.166270 7f5694e41700 (Original Log Time 2022/05/31-16:28:39.166208) [_impl/db_impl_compaction_flush.cc:204] [0_0] Level summary: base level 4 level multiplier 8.00 max bytes base 4096000 files[3 0 0 0 1] max score 0.38 2022/05/31-16:28:39.166273 4f8a700 [ties/checkpoint/checkpoint_impl.cc:435] [0_0] Export succeeded. 2022/05/31-16:28:39.166309 4d70700 [_impl/db_impl_compaction_flush.cc:1866] [0_0]Flush finished. Num not flushed: 0, earliest_memtable_id_now: 18446744073709551615, mem_table_we_flushed: 11 2022/05/31-16:28:39.166321 4d70700 [_impl/db_impl_compaction_flush.cc:1443] [0_0] Manual compaction starting
So if you look here, the memtable with GetID() as 10 is the one that was selected for manual flush from ExportColumnFamily. During committing the logline "Level-0 commit table #114 has started, mem_id: 10" the memtable is at the back of the memlist but a few lines before it "Flush finished. Num not flushed: 1, earliest_memtable_id_now: 11, mem_table_we_flushed: 10" the GetEarliestMemTableID gave us 11.
Expected behavior
WaitForFlushMemTables should not finish before committing to manifest file.
Actual behavior
WaitForFlushMemTables is finishing before committing to manifest file.
Steps to reproduce the behavior
Trigger parallel flush on the same column family.
I am using 6.4.6 rocksdb version with some additional logging and attaching the complete LOG file also in case that helps.
LOG.zip
The text was updated successfully, but these errors were encountered: