OlapTableSink::send is inefficient? #2780

@vagetablechicken

Description

Problem
When we use broker load, OlapTableSink::send() takes the longest time, accounting for almost all of the plan fragment's active time.
An example profile from one BE:

Fragment f59d832368a84c94-be109f903cf4698d:(Active: 3h36m, % non-child: 0.00%)
   - AverageThreadTokens: 1.00
   - PeakReservation: 0
   - PeakUsedReservation: 0
   - RowsProduced: 168.61M
   - SizeProduced: 30.25 GB
  BlockMgr:
     - BlockWritesOutstanding: 0
     - BlocksCreated: 0
     - BlocksRecycled: 0
     - BufferedPins: 0
     - BytesWritten: 0
     - MaxBlockSize: 8.00 MB
     - MemoryLimit: 2.00 GB
     - TotalBufferWaitTime: 0.000ns
     - TotalEncryptionTime: 0.000ns
     - TotalIntegrityCheckTime: 0.000ns
     - TotalReadBlockTime: 0.000ns
  OlapTableSink:(Active: 3h35m, % non-child: 0.00%)
     - CloseTime: 102.932ms
     - ConvertBatchTime: 0.000ns
     - OpenTime: 247.194ms
     - RowsFiltered: 0
     - RowsRead: 168.61M
     - RowsReturned: 168.61M
     - SendDataTime: 3h34m
     - SerializeBatchTime: 8m26s
     - ValidateDataTime: 19s554ms
     - WaitInFlightPacketTime: 3h23m
  BROKER_SCAN_NODE (id=0):(Active: 1m8s, % non-child: 0.00%)
     - BytesRead: 0
     - MemoryUsed: 0
     - NumThread: 0
     - PerReadThreadRawHdfsThroughput: 0.00 /sec
     - RowsRead: 168.61M
     - RowsReturned: 168.61M
     - RowsReturnedRate: 2.48 M/sec
     - ScanRangesComplete: 0
     - ScannerThreadsInvoluntaryContextSwitches: 0
     - ScannerThreadsTotalWallClockTime: 0.000ns
       - MaterializeTupleTime(*): 5m37s
       - ScannerThreadsSysTime: 0.000ns
       - ScannerThreadsUserTime: 0.000ns
     - ScannerThreadsVoluntaryContextSwitches: 0
     - TotalRawReadTime(*): 38m58s
     - TotalReadThroughput: 0.00 /sec
     - WaitScannerTime: 1m7s

As can be seen above, WaitInFlightPacketTime (3h23m of the 3h34m SendDataTime) is by far the most time-consuming portion.

Analysis
I describe the whole process here.

PlanFragmentExecutor pseudocode:

while (true) {
    batch = get_one_batch();
    OlapTableSink::send(batch);
}

Then, OlapTableSink::send() pseudocode:

for (row in batch) {
    channel = get_corresponding_channel(row);

    // channel::add_row() expands to:
    ok = channel::add_row_in_cur_batch(row);
    if (!ok) {  // the current batch is full
        if (channel::has_in_flight_packet()) {
            channel::wait_in_flight_packet();  // (*) blocks the whole sender thread
        }
        channel::send_add_batch_req();
        channel::add_row_in_cur_batch(row);
    }
    // channel::add_row() end
}

So whenever channel::wait_in_flight_packet() is triggered, it blocks the whole send process, even though there is no need to block add_row() on the other channels.
For example, while channel0 is waiting for its in-flight packet, we could still be adding rows to the other channels.

Better solutions (preliminary thoughts)

  • Make channel::add_row() fully non-blocking. This might be a massive change.
  • Make channel::add_row() less blocking, e.g. avoid adding rows to channel0 immediately after channel0 sends an add_batch request.

Labels

area/load: Issues or PRs related to all kinds of load
area/sql/execution: Issues or PRs related to the execution engine
kind/feature: Categorizes issue or PR as related to a new feature.
