
Export Import sst files #5495

Closed
wants to merge 5 commits into from


vpallipadi

Refresh of the earlier change in #5135.

This is a review request for the code change needed for #3469:
"Add support for taking snapshot of a column family and creating column family from a given CF snapshot"

We have an implementation of this that we have been testing internally. It adds two new APIs that together provide this functionality.

(1) ExportColumnFamily() - This API is modeled after CreateCheckpoint():
// Exports all live SST files of a specified Column Family onto export_dir,
// returning SST files information in metadata.
// - SST files will be created as hard links when the directory specified
//   is in the same partition as the db directory, copied otherwise.
// - export_dir should not already exist and will be created by this API.
// - Always triggers a flush.
virtual Status ExportColumnFamily(ColumnFamilyHandle* handle,
                                  const std::string& export_dir,
                                  ExportImportFilesMetaData** metadata);

Internally, the API calls DisableFileDeletions() and GetColumnFamilyMetaData(), parses the metadata to create hard links or copies of all the SST files, calls EnableFileDeletions(), and completes the call by returning the list of file metadata.

(2) CreateColumnFamilyWithImport() - This API is modeled after IngestExternalFile(), but is invoked only when creating a CF:
// CreateColumnFamilyWithImport() will create a new column family with
// column_family_name and import external SST files specified in metadata into
// this column family.
// (1) External SST files can be created using SstFileWriter.
// (2) External SST files can be exported from a particular column family in
//     an existing DB.
// Option in import_options specifies whether the external files are copied or
// moved (default is copy). When option specifies copy, managing files at
// external_file_path is caller's responsibility. When option specifies a
// move, the call ensures that the specified files at external_file_path are
// deleted on successful return and files are not modified on any error
// return.
// On error return, column family handle returned will be nullptr.
// ColumnFamily will be present on successful return and will not be present
// on error return. ColumnFamily may be present on any crash during this call.
virtual Status CreateColumnFamilyWithImport(
    const ColumnFamilyOptions& options, const std::string& column_family_name,
    const ImportColumnFamilyOptions& import_options,
    const ExportImportFilesMetaData& metadata,
    ColumnFamilyHandle** handle);

Internally, this API creates a new CF, parses all the SST files, and adds them to the new column family at the same level and with the same sequence numbers as recorded in the metadata. It also performs safety checks for overlaps between the SST files being imported.
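The description does not spell out which overlap checks are performed. One plausible check can be sketched from the usual leveled-LSM invariant: files on the same non-zero level must not overlap in key range, while L0 files may. This is a minimal sketch under that assumption; `ImportedFile` and `NoOverlapWithinLevels` are hypothetical names, not RocksDB's `LiveFileMetaData` or its actual validation code, and keys are compared as plain `std::string` values.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for per-file metadata (not RocksDB's type).
struct ImportedFile {
  std::string name;
  int level;
  std::string smallest_key;
  std::string largest_key;
};

// Returns true if no two files on the same non-zero level have overlapping
// [smallest_key, largest_key] ranges. Level-0 files are skipped because in
// a leveled LSM tree they are allowed to overlap one another.
bool NoOverlapWithinLevels(const std::vector<ImportedFile>& files) {
  std::map<int, std::vector<const ImportedFile*>> by_level;
  for (const ImportedFile& f : files) {
    if (f.level > 0) by_level[f.level].push_back(&f);
  }
  for (auto& entry : by_level) {
    std::vector<const ImportedFile*>& group = entry.second;
    std::sort(group.begin(), group.end(),
              [](const ImportedFile* a, const ImportedFile* b) {
                return a->smallest_key < b->smallest_key;
              });
    // After sorting by start key, each file must end before the next begins.
    for (std::size_t i = 1; i < group.size(); ++i) {
      if (group[i - 1]->largest_key >= group[i]->smallest_key) return false;
    }
  }
  return true;
}
```

Sorting by start key reduces the per-level check to comparing neighbors, so it runs in O(n log n) rather than comparing every pair.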

If an incoming sequence number is higher than the current local sequence number, the local sequence number is updated to match it.
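Advancing the local sequence number matters because later writes to the destination DB must receive sequence numbers above every imported entry; otherwise imported data could look newer than those writes and shadow them. A minimal sketch of the rule, where `FileSeqno` and `AdvanceLastSequence` are hypothetical names; `SequenceNumber` mirrors RocksDB's 64-bit unsigned alias of the same name.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// RocksDB also defines SequenceNumber as a 64-bit unsigned integer.
using SequenceNumber = std::uint64_t;

// Hypothetical per-file record; only the largest sequence number matters here.
struct FileSeqno {
  SequenceNumber largest_seqno;
};

// Returns the new local last sequence number: unchanged unless some imported
// file carries a higher sequence number, in which case it advances to that.
SequenceNumber AdvanceLastSequence(SequenceNumber local_last,
                                   const std::vector<FileSeqno>& files) {
  for (const FileSeqno& f : files) {
    local_last = std::max(local_last, f.largest_seqno);
  }
  return local_last;
}
```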

Note that, because the SST files are being moved across column families, the column family name stored in each SST file will no longer match the actual column family in the destination DB. The API does not modify the column family name or ID in the SST files being imported.

Venki Pallipadi added 3 commits June 21, 2019 01:56
Introduce a new API, ExportColumnFamily, to export all the live SST
files of a particular column family along with the metadata needed
to import these SST files.

This is part 1 of the change for
#3469
"Add support for taking snapshot of a CF and creating a CF from a given
CF snapshot"

Partially Solves #3469

This change adds a new API to create a column family and import SST
files into it. The imported files can come from the
ExportColumnFamily API or can be created with SstFileWriter.

This is part 2 of the change for
#3469
"Add support for taking snapshot of a CF and creating a CF from a given
CF snapshot"

Partially Solves #3469

Add new unit tests for the ExportColumnFamily() and
CreateColumnFamilyWithImport() APIs.

This is part 3 of the change for
#3469
"Add support for taking snapshot of a CF and creating a CF from a given
CF snapshot"

Partially Solves #3469
Contributor

@facebook-github-bot facebook-github-bot left a comment

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@siying
Contributor

siying commented Jul 2, 2019

@vpallipadi the sanitizer fails:

==2786007==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 56 byte(s) in 1 object(s) allocated from:
    #0 checkpoint_test_bin+0x337480             operator new(unsigned long)
    #1 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_impl.cc:416 rocksdb::CheckpointImpl::ExportColumnFamily(rocksdb::ColumnFamilyHandle*, std::__cxx11::basic_string<...> const&, rocksdb::ExportImportFilesMetaData**)
    #2 0x2af1c5 in rocksdb::CheckpointTest_ExportColumnFamilyWithLinks_Test::TestBody() internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:374
    #16 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:811 main
    #17 libc.so.6+0x211a5                        __libc_start_main

Direct leak of 56 byte(s) in 1 object(s) allocated from:
    #0 checkpoint_test_bin+0x337480             operator new(unsigned long)
    #1 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_impl.cc:416 rocksdb::CheckpointImpl::ExportColumnFamily(rocksdb::ColumnFamilyHandle*, std::__cxx11::basic_string<...> const&, rocksdb::ExportImportFilesMetaData**)
    #2 0x2adcc9 in rocksdb::CheckpointTest_ExportColumnFamilyWithLinks_Test::TestBody() internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:352
    #16 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:811 main
    #17 libc.so.6+0x211a5                        __libc_start_main

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 checkpoint_test_bin+0x337480             operator new(unsigned long)
    #1 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_impl.cc:33 rocksdb::Checkpoint::Create(rocksdb::DB*, rocksdb::Checkpoint**)
    #2 0x2acbad in rocksdb::CheckpointTest_ExportColumnFamilyWithLinks_Test::TestBody() internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:337
    #16 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:811 main
    #17 libc.so.6+0x211a5                        __libc_start_main

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 checkpoint_test_bin+0x337480             operator new(unsigned long)
    #1 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_impl.cc:33 rocksdb::Checkpoint::Create(rocksdb::DB*, rocksdb::Checkpoint**)
    #2 0x2aee02 in rocksdb::CheckpointTest_ExportColumnFamilyWithLinks_Test::TestBody() internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:370
    #16 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:811 main
    #17 libc.so.6+0x211a5                        __libc_start_main

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 checkpoint_test_bin+0x337480             operator new(unsigned long)
    #1 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_impl.cc:33 rocksdb::Checkpoint::Create(rocksdb::DB*, rocksdb::Checkpoint**)
    #2 0x2b1ff7 in rocksdb::CheckpointTest_ExportColumnFamilyNegativeTest_Test::TestBody() internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:393
    #16 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:811 main
    #17 libc.so.6+0x211a5                        __libc_start_main

Indirect leak of 448 byte(s) in 1 object(s) allocated from:
    #0 checkpoint_test_bin+0x337480             operator new(unsigned long)
    #6 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_impl.cc:429 rocksdb::CheckpointImpl::ExportColumnFamily(rocksdb::ColumnFamilyHandle*, std::__cxx11::basic_string<...> const&, rocksdb::ExportImportFilesMetaData**)
    #7 0x2adcc9 in rocksdb::CheckpointTest_ExportColumnFamilyWithLinks_Test::TestBody() internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:352
    #21 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:811 main
    #22 libc.so.6+0x211a5                        __libc_start_main

Indirect leak of 224 byte(s) in 1 object(s) allocated from:
    #0 checkpoint_test_bin+0x337480             operator new(unsigned long)
    #6 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_impl.cc:429 rocksdb::CheckpointImpl::ExportColumnFamily(rocksdb::ColumnFamilyHandle*, std::__cxx11::basic_string<...> const&, rocksdb::ExportImportFilesMetaData**)
    #7 0x2af1c5 in rocksdb::CheckpointTest_ExportColumnFamilyWithLinks_Test::TestBody() internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:374
    #21 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:811 main
    #22 libc.so.6+0x211a5                        __libc_start_main

Indirect leak of 78 byte(s) in 2 object(s) allocated from:
    #0 checkpoint_test_bin+0x337480             operator new(unsigned long)
    #8 internal_repo_rocksdb/repo/include/rocksdb/metadata.h:55 rocksdb::SstFileMetaData::SstFileMetaData(rocksdb::SstFileMetaData const&)
    #9 internal_repo_rocksdb/repo/include/rocksdb/metadata.h:106 rocksdb::LiveFileMetaData::LiveFileMetaData(rocksdb::LiveFileMetaData const&)
    #14 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_impl.cc:429 rocksdb::CheckpointImpl::ExportColumnFamily(rocksdb::ColumnFamilyHandle*, std::__cxx11::basic_string<...> const&, rocksdb::ExportImportFilesMetaData**)
    #15 0x2adcc9 in rocksdb::CheckpointTest_ExportColumnFamilyWithLinks_Test::TestBody() internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:352
    #29 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:811 main
    #30 libc.so.6+0x211a5                        __libc_start_main

Indirect leak of 39 byte(s) in 1 object(s) allocated from:
    #0 checkpoint_test_bin+0x337480             operator new(unsigned long)
    #8 internal_repo_rocksdb/repo/include/rocksdb/metadata.h:55 rocksdb::SstFileMetaData::SstFileMetaData(rocksdb::SstFileMetaData const&)
    #9 internal_repo_rocksdb/repo/include/rocksdb/metadata.h:106 rocksdb::LiveFileMetaData::LiveFileMetaData(rocksdb::LiveFileMetaData const&)
    #14 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_impl.cc:429 rocksdb::CheckpointImpl::ExportColumnFamily(rocksdb::ColumnFamilyHandle*, std::__cxx11::basic_string<...> const&, rocksdb::ExportImportFilesMetaData**)
    #15 0x2af1c5 in rocksdb::CheckpointTest_ExportColumnFamilyWithLinks_Test::TestBody() internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:374
    #29 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:811 main
    #30 libc.so.6+0x211a5                        __libc_start_main

Indirect leak of 34 byte(s) in 1 object(s) allocated from:
    #0 checkpoint_test_bin+0x337480             operator new(unsigned long)
    #8 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_impl.cc:417 rocksdb::CheckpointImpl::ExportColumnFamily(rocksdb::ColumnFamilyHandle*, std::__cxx11::basic_string<...> const&, rocksdb::ExportImportFilesMetaData**)
    #9 0x2af1c5 in rocksdb::CheckpointTest_ExportColumnFamilyWithLinks_Test::TestBody() internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:374
    #23 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:811 main
    #24 libc.so.6+0x211a5                        __libc_start_main

Indirect leak of 31 byte(s) in 1 object(s) allocated from:
    #0 checkpoint_test_bin+0x337480             operator new(unsigned long)
    #8 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_impl.cc:417 rocksdb::CheckpointImpl::ExportColumnFamily(rocksdb::ColumnFamilyHandle*, std::__cxx11::basic_string<...> const&, rocksdb::ExportImportFilesMetaData**)
    #9 0x2adcc9 in rocksdb::CheckpointTest_ExportColumnFamilyWithLinks_Test::TestBody() internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:352
    #23 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:811 main
    #24 libc.so.6+0x211a5                        __libc_start_main

Can you take a look?
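The report points at objects allocated with `new` inside `Checkpoint::Create()` and `ExportColumnFamily()` that the tests never free, which suggests the caller owns whatever comes back through these `T**` out-parameters. One common way to avoid this class of leak is to hand the raw pointer to a `std::unique_ptr` immediately. A minimal sketch: `Widget`, `CreateWidget`, and `MakeOwnedWidget` are hypothetical stand-ins, not RocksDB types or functions.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for an object returned through a T** out-parameter,
// like the Checkpoint or ExportImportFilesMetaData objects in the report.
struct Widget {
  std::string name;
};

// Hypothetical factory in the same style: heap-allocates and hands ownership
// to the caller through the out-parameter. Returns false on failure.
bool CreateWidget(const std::string& name, Widget** out) {
  if (name.empty()) {
    *out = nullptr;
    return false;
  }
  *out = new Widget{name};
  return true;
}

// Caller side: wrap the raw pointer right away so it is freed on every
// path out of the owning scope, with no explicit delete needed.
std::unique_ptr<Widget> MakeOwnedWidget(const std::string& name) {
  Widget* raw = nullptr;
  if (!CreateWidget(name, &raw)) return nullptr;
  return std::unique_ptr<Widget>(raw);
}
```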

@vpallipadi
Author

vpallipadi commented Jul 2, 2019 via email

@vpallipadi
Author

vpallipadi commented Jul 4, 2019 via email

@siying
Contributor

siying commented Jul 5, 2019

@vpallipadi how about running ASan with clang? Does it catch the issue?

Contributor

@facebook-github-bot facebook-github-bot left a comment

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@vpallipadi has updated the pull request. Re-import the pull request

@vpallipadi
Author

vpallipadi commented Jul 8, 2019 via email

Contributor

@facebook-github-bot facebook-github-bot left a comment

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@siying
Contributor

siying commented Jul 9, 2019

@vpallipadi still some failures:

=================================================================
==2787156==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 checkpoint_test_bin+0x337620             operator new(unsigned long)
    #1 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_impl.cc:33 rocksdb::Checkpoint::Create(rocksdb::DB*, rocksdb::Checkpoint**)
    #2 0x2aeeaf in rocksdb::CheckpointTest_ExportColumnFamilyWithLinks_Test::TestBody() internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:377
    #16 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:816 main
    #17 libc.so.6+0x211a5                        __libc_start_main

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 checkpoint_test_bin+0x337620             operator new(unsigned long)
    #1 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_impl.cc:33 rocksdb::Checkpoint::Create(rocksdb::DB*, rocksdb::Checkpoint**)
    #2 0x2b20cc in rocksdb::CheckpointTest_ExportColumnFamilyNegativeTest_Test::TestBody() internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:399
    #16 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:816 main
    #17 libc.so.6+0x211a5                        __libc_start_main

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 checkpoint_test_bin+0x337620             operator new(unsigned long)
    #1 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_impl.cc:33 rocksdb::Checkpoint::Create(rocksdb::DB*, rocksdb::Checkpoint**)
    #2 0x2acb7f in rocksdb::CheckpointTest_ExportColumnFamilyWithLinks_Test::TestBody() internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:343
    #16 internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_test.cc:816 main
    #17 libc.so.6+0x211a5                        __libc_start_main

SUMMARY: AddressSanitizer: 48 byte(s) leaked in 3 allocation(s).

@facebook-github-bot
Contributor

@vpallipadi has updated the pull request. Re-import the pull request

@vpallipadi
Author

vpallipadi commented Jul 11, 2019 via email

Contributor

@facebook-github-bot facebook-github-bot left a comment

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@siying
Contributor

siying commented Jul 17, 2019

I already started the merging process, but I realized that HISTORY.md has not been updated for this new feature. Can you open a new pull request to update HISTORY.md?

@facebook-github-bot
Contributor

This pull request has been merged in 22ce462.

@vpallipadi
Author

vpallipadi commented Jul 18, 2019 via email

This was referenced Sep 6, 2019
merryChris pushed a commit to merryChris/rocksdb that referenced this pull request Nov 18, 2019
Pull Request resolved: facebook#5495

Differential Revision: D16018881

fbshipit-source-id: 9ae2251025d5916d35a9fc4ea4d6707f6be16ff9
@cscetbon

@siying any news on this? I'm looking for a similar feature.

@LIBA-S
Contributor

LIBA-S commented Mar 25, 2024

@vpallipadi @siying Hi, for cross-machine import, does the ExportImportFilesMetaData data structure need its own serialization/deserialization functions?
