Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New API to get all merge operands for a Key #5604

Closed
wants to merge 24 commits into from

Conversation

vjnadimpalli
Copy link
Contributor

@vjnadimpalli vjnadimpalli commented Jul 22, 2019

This is a new API added to db.h to allow for fetching all merge operands associated with a Key. The main motivation for this API is to support use cases where doing a full online merge is not necessary as it is performance sensitive. Example use-cases:

  1. Update subset of columns and read subset of columns -
    Imagine a SQL Table, a row is encoded as a K/V pair (as it is done in MyRocks). If there are many columns and users only updated one of them, we can use merge operator to reduce write amplification. While users only read one or two columns in the read query, this feature can avoid a full merging of the whole row, and save some CPU.
  2. Updating very few attributes in a value which is a JSON-like document -
    Updating one attribute can be done efficiently using merge operator, while reading back one attribute can be done more efficiently if we don't need to do a full merge.

API :
Status GetMergeOperands(
const ReadOptions& options, ColumnFamilyHandle* column_family,
const Slice& key, PinnableSlice* merge_operands,
GetMergeOperandsOptions* get_merge_operands_options,
int* number_of_operands)

Example usage :
int size = 100;
int number_of_operands = 0;
std::vector values(size);
GetMergeOperandsOptions merge_operands_info;
db_->GetMergeOperands(ReadOptions(), db_->DefaultColumnFamily(), "k1", values.data(), merge_operands_info, &number_of_operands);

Description :
Returns all the merge operands corresponding to the key. If the number of merge operands in DB is greater than merge_operands_options.expected_max_number_of_operands no merge operands are returned and status is Incomplete. Merge operands returned are in the order of insertion.
merge_operands-> Points to an array of at-least merge_operands_options.expected_max_number_of_operands and the caller is responsible for allocating it. If the status returned is Incomplete then number_of_operands will contain the total number of merge operands found in DB for key.

Test Plan:
Added unit test and perf test in db_bench that can be run using the command:
./db_bench -benchmarks=getmergeoperands --merge_operator=sortlist

@vjnadimpalli vjnadimpalli requested a review from siying July 22, 2019 18:28
@vjnadimpalli vjnadimpalli requested review from sagar0 and removed request for sagar0 and siying July 22, 2019 18:28
@vjnadimpalli
Copy link
Contributor Author

vjnadimpalli commented Jul 22, 2019

Some open questions:

  1. Based on the current API users have to use trial and error to get the num_records to be exactly or more than the number of merge_records for a key in DB which may not be a pleasant experience to use the API. Ideally users shouldn't need to give num_records and just fetch all the merge operands. The reason its needed is because PinnableSlice doesn't support copy constructor hence dynamically sized std::vector can't be used to add new PinnableSlice's to the vector. To get around this a fixed size array is used so all the PinnableSlice's are pre-initialized in user code and a pointer to that is passed in the API along with its size to prevent overflow. (Resolved by returning the actual number of merge operands found)

  2. Its possible that the Merge Operator is coded in a way that doesn't need to explore all the merge operands in which case the question is should the API still return all the merge operands (till a PUT/DELETE record is encountered) or respect the Merge Operator and check when to stop looking for more merge operands.

@vjnadimpalli vjnadimpalli requested review from siying and sagar0 July 22, 2019 21:39
include/rocksdb/db.h Outdated Show resolved Hide resolved
include/rocksdb/db.h Show resolved Hide resolved
include/rocksdb/db.h Outdated Show resolved Hide resolved
include/rocksdb/db.h Outdated Show resolved Hide resolved
db/memtable.cc Outdated Show resolved Hide resolved
table/get_context.cc Outdated Show resolved Hide resolved
table/get_context.cc Show resolved Hide resolved
table/get_context.h Outdated Show resolved Hide resolved
table/get_context.h Outdated Show resolved Hide resolved
db/version_set.cc Outdated Show resolved Hide resolved
db/version_set.cc Outdated Show resolved Hide resolved
db/version_set.cc Outdated Show resolved Hide resolved
db/version_set.cc Outdated Show resolved Hide resolved
db/version_set.h Outdated Show resolved Hide resolved
db/memtable.cc Outdated Show resolved Hide resolved
db/version_set.cc Outdated Show resolved Hide resolved
table/get_context.cc Outdated Show resolved Hide resolved
try {
operand.push_back(std::stoi(std::string(begin, slice.data_)));
} catch (...) {
// std::cout << "Malformed string: " <<
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to add this because sometimes the data is garbage, not sure why. I also see compilation failed in one of the jobs with error: utilities/merge_operators/sortlist.cc:72:14: error: exception handling disabled, use -fexceptions to enable. Any suggestions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it feel like a bug in test code or bug in code?
By putting the merge operator here, the code is compiled and built into production code (though not all users will be able to use it). So I hope we fix the code, and/or move it to the test file using it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bug is fixed and does this file still need to be moved to a different location?

Copy link
Contributor

@siying siying left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad the progress we've made to the PR and we are very closed.
I left some comments. Except the question about re-pinning merge operands, they are mostly code style comments.

include/rocksdb/db.h Outdated Show resolved Hide resolved
db/db_impl/db_impl.h Outdated Show resolved Hide resolved
db/db_impl/db_impl.cc Outdated Show resolved Hide resolved
db/db_impl/db_impl.cc Outdated Show resolved Hide resolved
db/db_merge_operand_test.cc Outdated Show resolved Hide resolved
utilities/merge_operators/sortlist.cc Show resolved Hide resolved
utilities/merge_operators/sortlist.cc Outdated Show resolved Hide resolved
utilities/merge_operators/sortlist.h Outdated Show resolved Hide resolved
utilities/merge_operators/sortlist.h Outdated Show resolved Hide resolved
db/db_merge_operand_test.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@siying siying left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job!
Some extra comments are mostly coding style comments, and it's still nice to be addressed.

db/db_blob_index_test.cc Outdated Show resolved Hide resolved
utilities/merge_operators/sortlist.h Outdated Show resolved Hide resolved
utilities/merge_operators/sortlist.h Outdated Show resolved Hide resolved

void MakeVector(std::vector<int>& operand, Slice slice) const;

std::vector<int> Merge(std::vector<int>& left, std::vector<int>& right) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can MakeVector() and Merge() private?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't make MakeVector private because I am using it in db_dench_tool.

do {
const char* begin = slice.data_;
while (*slice.data_ != ',' && *slice.data_) slice.data_++;
operand.push_back(std::stoi(std::string(begin, slice.data_)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal since it's just a test code, but it makes more sense to have the test case to manage fixed 64 uint. You can find encoding of decoding fixed 64 uint in coding.h.

tools/db_bench_tool.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vjnadimpalli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vjnadimpalli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@vjnadimpalli has updated the pull request. Re-import the pull request

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vjnadimpalli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@vjnadimpalli has updated the pull request. Re-import the pull request

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vjnadimpalli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@vjnadimpalli has updated the pull request. Re-import the pull request

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vjnadimpalli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@vjnadimpalli merged this pull request in d150e01.

@vjnadimpalli vjnadimpalli deleted the getmergeops branch August 7, 2019 20:33
merryChris pushed a commit to merryChris/rocksdb that referenced this pull request Nov 18, 2019
Summary:
This is a new API added to db.h to allow for fetching all merge operands associated with a Key. The main motivation for this API is to support use cases where doing a full online merge is not necessary as it is performance sensitive. Example use-cases:
1. Update subset of columns and read subset of columns -
Imagine a SQL Table, a row is encoded as a K/V pair (as it is done in MyRocks). If there are many columns and users only updated one of them, we can use merge operator to reduce write amplification. While users only read one or two columns in the read query, this feature can avoid a full merging of the whole row, and save some CPU.
2. Updating very few attributes in a value which is a JSON-like document -
Updating one attribute can be done efficiently using merge operator, while reading back one attribute can be done more efficiently if we don't need to do a full merge.
----------------------------------------------------------------------------------------------------
API :
Status GetMergeOperands(
      const ReadOptions& options, ColumnFamilyHandle* column_family,
      const Slice& key, PinnableSlice* merge_operands,
      GetMergeOperandsOptions* get_merge_operands_options,
      int* number_of_operands)

Example usage :
int size = 100;
int number_of_operands = 0;
std::vector<PinnableSlice> values(size);
GetMergeOperandsOptions merge_operands_info;
db_->GetMergeOperands(ReadOptions(), db_->DefaultColumnFamily(), "k1", values.data(), merge_operands_info, &number_of_operands);

Description :
Returns all the merge operands corresponding to the key. If the number of merge operands in DB is greater than merge_operands_options.expected_max_number_of_operands no merge operands are returned and status is Incomplete. Merge operands returned are in the order of insertion.
merge_operands-> Points to an array of at-least merge_operands_options.expected_max_number_of_operands and the caller is responsible for allocating it. If the status returned is Incomplete then number_of_operands will contain the total number of merge operands found in DB for key.
Pull Request resolved: facebook#5604

Test Plan:
Added unit test and perf test in db_bench that can be run using the command:
./db_bench -benchmarks=getmergeoperands --merge_operator=sortlist

Differential Revision: D16657366

Pulled By: vjnadimpalli

fbshipit-source-id: 0faadd752351745224ee12d4ae9ef3cb529951bf
levichen94 pushed a commit to bytedance/terarkdb that referenced this pull request Jun 18, 2021
Summary:
This is a new API added to db.h to allow for fetching all merge operands associated with a Key. The main motivation for this API is to support use cases where doing a full online merge is not necessary as it is performance sensitive. Example use-cases:
1. Update subset of columns and read subset of columns -
Imagine a SQL Table, a row is encoded as a K/V pair (as it is done in MyRocks). If there are many columns and users only updated one of them, we can use merge operator to reduce write amplification. While users only read one or two columns in the read query, this feature can avoid a full merging of the whole row, and save some CPU.
2. Updating very few attributes in a value which is a JSON-like document -
Updating one attribute can be done efficiently using merge operator, while reading back one attribute can be done more efficiently if we don't need to do a full merge.
----------------------------------------------------------------------------------------------------
API :
Status GetMergeOperands(
      const ReadOptions& options, ColumnFamilyHandle* column_family,
      const Slice& key, PinnableSlice* merge_operands,
      GetMergeOperandsOptions* get_merge_operands_options,
      int* number_of_operands)

Example usage :
int size = 100;
int number_of_operands = 0;
std::vector<PinnableSlice> values(size);
GetMergeOperandsOptions merge_operands_info;
db_->GetMergeOperands(ReadOptions(), db_->DefaultColumnFamily(), "k1", values.data(), merge_operands_info, &number_of_operands);

Description :
Returns all the merge operands corresponding to the key. If the number of merge operands in DB is greater than merge_operands_options.expected_max_number_of_operands no merge operands are returned and status is Incomplete. Merge operands returned are in the order of insertion.
merge_operands-> Points to an array of at-least merge_operands_options.expected_max_number_of_operands and the caller is responsible for allocating it. If the status returned is Incomplete then number_of_operands will contain the total number of merge operands found in DB for key.
Pull Request resolved: facebook/rocksdb#5604

Test Plan:
Added unit test and perf test in db_bench that can be run using the command:
./db_bench -benchmarks=getmergeoperands --merge_operator=sortlist

Differential Revision: D16657366

Pulled By: vjnadimpalli

fbshipit-source-id: 0faadd752351745224ee12d4ae9ef3cb529951bf
Signed-off-by: Changlong Chen <levisonchen@live.cn>
mm304321141 pushed a commit to bytedance/terarkdb that referenced this pull request Jun 23, 2021
Summary:
This is a new API added to db.h to allow for fetching all merge operands associated with a Key. The main motivation for this API is to support use cases where doing a full online merge is not necessary as it is performance sensitive. Example use-cases:
1. Update subset of columns and read subset of columns -
Imagine a SQL Table, a row is encoded as a K/V pair (as it is done in MyRocks). If there are many columns and users only updated one of them, we can use merge operator to reduce write amplification. While users only read one or two columns in the read query, this feature can avoid a full merging of the whole row, and save some CPU.
2. Updating very few attributes in a value which is a JSON-like document -
Updating one attribute can be done efficiently using merge operator, while reading back one attribute can be done more efficiently if we don't need to do a full merge.
----------------------------------------------------------------------------------------------------
API :
Status GetMergeOperands(
      const ReadOptions& options, ColumnFamilyHandle* column_family,
      const Slice& key, PinnableSlice* merge_operands,
      GetMergeOperandsOptions* get_merge_operands_options,
      int* number_of_operands)

Example usage :
int size = 100;
int number_of_operands = 0;
std::vector<PinnableSlice> values(size);
GetMergeOperandsOptions merge_operands_info;
db_->GetMergeOperands(ReadOptions(), db_->DefaultColumnFamily(), "k1", values.data(), merge_operands_info, &number_of_operands);

Description :
Returns all the merge operands corresponding to the key. If the number of merge operands in DB is greater than merge_operands_options.expected_max_number_of_operands no merge operands are returned and status is Incomplete. Merge operands returned are in the order of insertion.
merge_operands-> Points to an array of at-least merge_operands_options.expected_max_number_of_operands and the caller is responsible for allocating it. If the status returned is Incomplete then number_of_operands will contain the total number of merge operands found in DB for key.
Pull Request resolved: facebook/rocksdb#5604

Test Plan:
Added unit test and perf test in db_bench that can be run using the command:
./db_bench -benchmarks=getmergeoperands --merge_operator=sortlist

Differential Revision: D16657366

Pulled By: vjnadimpalli

fbshipit-source-id: 0faadd752351745224ee12d4ae9ef3cb529951bf
Signed-off-by: Changlong Chen <levisonchen@live.cn>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants