
perf(storagenode): introduce accumulator #415

Open
ijsong opened this issue Apr 11, 2023 · 0 comments

Motivation

A client writes log entries by calling the Append RPC. A single Append request can contain a batch of log entries, writing many logs at once and reducing network round trips. Batching log entries also makes the log-appending pipeline in a storage node efficient, since the storage layer stores a set of log entries as a single batch rather than one by one.

                                            +-----------+
                                            |           |
                                       +--->| committer |
                                       |    |           |
                                       |    +-----------+
+---------------+                      |                 
| AppendRequest |      +-----------+   |    +-----------+
|   +-+-+-+-+   |      |           |   |    |           |
|   |1|2|3|4|---+----->| sequencer |---+--->|  writer   |
|   +-+-+-+-+   |      |           |   |    |           |
+---------------+      +-----------+   |    +-----------+
                                       |                 
                                       |    +-----------+
                                       |    |           |
                                       +--->|replicator |
                                            |           |
                                            +-----------+

This works well when a client writes many log entries in a single request. For example, the append request shown in the figure above writes four logs at once.

However, when an append request carries only a few log entries, for instance, a single log entry per request, the storage node cannot benefit from the batch write of the storage layer. In the current implementation, the storage node feeds only the log entries of a single request into the log-appending pipeline.

We introduce the accumulator to collect log entries in front of the sequencer and to form a batch.

+---------------+                                                           
| AppendRequest |                                                           
|   +-+         |                                                           
|   |1|         |--+                                           +-----------+
|   +-+         |  |                                           |           |
+---------------+  |                                        +->| committer |
+---------------+  |                                        |  |           |
| AppendRequest |  |                                        |  +-----------+
|   +-+         |  |                +-+-+-+-+               |               
|   |2|         |--+  +-----------+ |1|2|3|4| +-----------+ |  +-----------+
|   +-+         |  |  |           | +-+-+-+-+ |           | |  |           |
+---------------+  +->|accumulator|---------->| sequencer |-+->|  writer   |
+---------------+  |  |           |           |           | |  |           |
| AppendRequest |  |  +-----------+           +-----------+ |  +-----------+
|   +-+         |  |                                        |               
|   |3|         |--+                                        |  +-----------+
|   +-+         |  |                                        |  |           |
+---------------+  |                                        +->|replicator |
+---------------+  |                                           |           |
| AppendRequest |  |                                           +-----------+
|   +-+         |  |                                                        
|   |4|         |--+                                                        
|   +-+         |                                                           
+---------------+                                                           

Design

A dedicated goroutine executes the accumulator. It has a queue through which it receives log entries from the Append RPC handlers.

The back-of-the-envelope design of the accumulator looks like this:

type Accumulator struct {
    queue     chan *accumulateTask
    sequencer *sequencer
}

func (a *Accumulator) Send(*accumulateTask) error {
}

func (a *Accumulator) Close() error {
}

func (a *Accumulator) loop() error {
    ticker := time.NewTicker(MinAccumulateInterval)
    defer ticker.Stop()
    buffer := newAccumulateBuffer(MaxAccumulateSize)
    position := 0
    for {
        select {
        case at := <-a.queue:
            buffer.insert(position, at)
            position++
            if position < MaxAccumulateSize {
                continue
            }
        case <-ticker.C:
            if position == 0 {
                continue
            }
        }
        a.sequencer.Send(buffer)
        ticker.Reset(MinAccumulateInterval)
        buffer = newAccumulateBuffer(MaxAccumulateSize)
        position = 0
    }
}

It has two tunable parameters:

  • MinAccumulateInterval: The minimum interval for sending accumulated log entries to the sequencer. The accumulator keeps log entries from clients until this interval expires, unless the number of retained log entries reaches MaxAccumulateSize first.
  • MaxAccumulateSize: The maximum number of log entries kept by the accumulator. When the number of retained log entries reaches MaxAccumulateSize, the accumulator immediately sends them to the sequencer and resets the timer for MinAccumulateInterval.

Buffers that retain log entries can be pooled. Since the maximum number of log entries to keep is fixed, pooling is straightforward.

Currently, the Append RPC handler builds the write batch stored in the storage layer. Since handlers run concurrently, write batches are also built concurrently.
With the accumulator, however, the accumulator must build the write batch instead of the Append RPC handler. If this causes head-of-line blocking, we can apply an optimization such as double buffering.

Challenges

  • Performance testing
  • Finding good parameters
@ijsong ijsong self-assigned this Apr 13, 2023