PBTree: Implement dual-buffer container for MNode management #12048

Merged: 12 commits, Feb 23, 2024


@linxt20 linxt20 commented Feb 19, 2024

This PR is part of PBTree's work on collaborative concurrency control across internal and external memory. It replaces the single-buffer processing of the original ChildrenContainer with two buffers that are processed alternately, to make concurrency control more efficient.

The main idea at this stage is to introduce an abstract base class, MNodeChildBuffer, which inherits from IMNodeContainer. It holds two buffers: FlushingBuffer stores old nodes, and ReceivingBuffer stores newly created or modified nodes. Together, the two buffers expose the basic Map operations to callers. Both NewChildBuffer and UpdateChildBuffer in CachedMNodeContainer can then be changed to the MNodeChildBuffer type.
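As a rough illustration of the two-buffer idea described above, here is a minimal sketch. It is not the code in this PR: the class name `DualBufferSketch`, the `handOver` method, and the use of plain `HashMap` with `synchronized` methods are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical two-buffer container; names and structure are illustrative only. */
class DualBufferSketch<V> {
  // Older entries that have been handed over and are waiting to be flushed.
  private final Map<String, V> flushingBuffer = new HashMap<>();
  // Entries created or modified since the last handover.
  private final Map<String, V> receivingBuffer = new HashMap<>();

  /** Writes always go to the receiving buffer. */
  public synchronized V put(String name, V node) {
    return receivingBuffer.put(name, node);
  }

  /** Reads check the receiving buffer first, since it holds the newer copy. */
  public synchronized V get(String name) {
    V node = receivingBuffer.get(name);
    return node != null ? node : flushingBuffer.get(name);
  }

  public synchronized boolean containsKey(String name) {
    return receivingBuffer.containsKey(name) || flushingBuffer.containsKey(name);
  }

  /** Removes the entry from both buffers and returns the newest copy, if any. */
  public synchronized V remove(String name) {
    V old = flushingBuffer.remove(name);
    V recent = receivingBuffer.remove(name);
    return recent != null ? recent : old;
  }

  /** Assumed handover step: received entries move to the flushing buffer. */
  public synchronized void handOver() {
    flushingBuffer.putAll(receivingBuffer);
    receivingBuffer.clear();
  }

  public static void main(String[] args) {
    DualBufferSketch<String> buffer = new DualBufferSketch<>();
    buffer.put("s1", "v1");
    buffer.handOver(); // "s1" now sits in the flushing buffer
    buffer.put("s1", "v2"); // a later update lands in the receiving buffer
    System.out.println(buffer.get("s1"));
  }
}
```

Callers see one Map-like object; which buffer an entry currently lives in stays an internal detail.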

Compared with the original single-buffer design, however, double buffering has to handle several more complex cases:

  • First, FlushingBuffer and ReceivingBuffer serve different purposes: write operations such as put go directly to ReceivingBuffer, while flushing works directly on FlushingBuffer. Data moves between the two before each flush, when the nodes in ReceivingBuffer are handed over to FlushingBuffer.

  • Second, all data was previously stored in a single buffer. Because that buffer is a Map, its entries could never be duplicated. With two buffers, FlushingBuffer and ReceivingBuffer each guarantee that their own nodes are unique, but a node in FlushingBuffer may also appear in ReceivingBuffer, so this overlap must be taken into account when computing statistics.

  • Third, NewChildBuffer receives newly created nodes, so its FlushingBuffer and ReceivingBuffer never overlap, because the names of new nodes are never repeated. UpdateChildBuffer is different: it accepts modified nodes, so the same node can be processed multiple times, and its FlushingBuffer and ReceivingBuffer may overlap. For this reason, the two kinds of buffer are managed by two separate subclasses, both inheriting from the abstract base class MNodeChildBuffer.

  • Finally, to pave the way for the later merging and sorting of internal and external memory, iterating over an MNodeChildBuffer must return its nodes deduplicated and in sorted order. The plan is to use the merge step of merge sort together with the trygetnext mechanism, so that the merge advances one step at a time as each next element is requested.
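The overlap-aware counting and the lazy, deduplicating merge described in the list above can be sketched as follows. This is only an illustration under stated assumptions: the class `DualBufferMergeSketch`, its methods, the use of `TreeMap` to keep names sorted, and `String` values standing in for MNodes are all hypothetical and not taken from the PR.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.TreeMap;

/** Hypothetical sketch of overlap-aware counting and merged iteration. */
class DualBufferMergeSketch {
  // Assumption: sorted maps keyed by node name; String values stand in for MNodes.
  private final TreeMap<String, String> flushing = new TreeMap<>();
  private final TreeMap<String, String> receiving = new TreeMap<>();

  public void put(String name, String node) {
    receiving.put(name, node);
  }

  /** Before a flush: hand every received entry over to the flushing buffer. */
  public void handOver() {
    flushing.putAll(receiving);
    receiving.clear();
  }

  /** Distinct entry count: a name present in both buffers is counted once. */
  public int size() {
    int overlap = 0;
    for (String name : receiving.keySet()) {
      if (flushing.containsKey(name)) {
        overlap++;
      }
    }
    return flushing.size() + receiving.size() - overlap;
  }

  /** Two-way merge over both sorted key sets, emitting each name once. */
  public Iterator<String> sortedNames() {
    final Iterator<String> a = flushing.keySet().iterator();
    final Iterator<String> b = receiving.keySet().iterator();
    return new Iterator<String>() {
      private String headA = a.hasNext() ? a.next() : null;
      private String headB = b.hasNext() ? b.next() : null;
      private String next = null;

      // Performs a single merge step, and only when an element is requested.
      private void tryGetNext() {
        if (next != null || (headA == null && headB == null)) {
          return;
        }
        int cmp = headA == null ? 1 : headB == null ? -1 : headA.compareTo(headB);
        next = cmp <= 0 ? headA : headB;
        // Equal names advance both sides, so a duplicate is emitted once.
        if (cmp <= 0) headA = a.hasNext() ? a.next() : null;
        if (cmp >= 0) headB = b.hasNext() ? b.next() : null;
      }

      @Override
      public boolean hasNext() {
        tryGetNext();
        return next != null;
      }

      @Override
      public String next() {
        tryGetNext();
        if (next == null) {
          throw new NoSuchElementException();
        }
        String result = next;
        next = null;
        return result;
      }
    };
  }
}
```

Because each call does at most one comparison, no merged list is ever materialized; the same pattern could in principle be chained with another sorted source. The sketch does not guard against the buffers being modified during iteration.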

To verify the new double-buffer alternating mechanism, MNodeChildBufferTest was added to check that its basic functions behave correctly. The existing unit tests that relied on single-buffer calls were also updated accordingly, and the corresponding test cases were improved.

@MarcosZyk changed the title from "P btree container divide to two buffer" to "PBTree: Implement dual-buffer container for MNode management" on Feb 21, 2024

@MarcosZyk MarcosZyk left a comment

PTAL

…the BufferIterator into CachedMNodeContainerIterator

@MarcosZyk MarcosZyk left a comment

Bravo work! LGTM~

@MarcosZyk MarcosZyk merged commit bc8d866 into apache:master Feb 23, 2024
36 checks passed
2 participants