OlapTableSink::send is inefficient? #2780

@vagetablechicken

Description

Problem
When we use broker load, OlapTableSink::send() takes the longest time, accounting for almost all of the plan fragment's active time.
An example profile from one BE:

Fragment f59d832368a84c94-be109f903cf4698d:(Active: 3h36m, % non-child: 0.00%)
   - AverageThreadTokens: 1.00
   - PeakReservation: 0
   - PeakUsedReservation: 0
   - RowsProduced: 168.61M
   - SizeProduced: 30.25 GB
  BlockMgr:
     - BlockWritesOutstanding: 0
     - BlocksCreated: 0
     - BlocksRecycled: 0
     - BufferedPins: 0
     - BytesWritten: 0
     - MaxBlockSize: 8.00 MB
     - MemoryLimit: 2.00 GB
     - TotalBufferWaitTime: 0.000ns
     - TotalEncryptionTime: 0.000ns
     - TotalIntegrityCheckTime: 0.000ns
     - TotalReadBlockTime: 0.000ns
  OlapTableSink:(Active: 3h35m, % non-child: 0.00%)
     - CloseTime: 102.932ms
     - ConvertBatchTime: 0.000ns
     - OpenTime: 247.194ms
     - RowsFiltered: 0
     - RowsRead: 168.61M
     - RowsReturned: 168.61M
     - SendDataTime: 3h34m
     - SerializeBatchTime: 8m26s
     - ValidateDataTime: 19s554ms
     - WaitInFlightPacketTime: 3h23m
  BROKER_SCAN_NODE (id=0):(Active: 1m8s, % non-child: 0.00%)
     - BytesRead: 0
     - MemoryUsed: 0
     - NumThread: 0
     - PerReadThreadRawHdfsThroughput: 0.00 /sec
     - RowsRead: 168.61M
     - RowsReturned: 168.61M
     - RowsReturnedRate: 2.48 M/sec
     - ScanRangesComplete: 0
     - ScannerThreadsInvoluntaryContextSwitches: 0
     - ScannerThreadsTotalWallClockTime: 0.000ns
       - MaterializeTupleTime(*): 5m37s
       - ScannerThreadsSysTime: 0.000ns
       - ScannerThreadsUserTime: 0.000ns
     - ScannerThreadsVoluntaryContextSwitches: 0
     - TotalRawReadTime(*): 38m58s
     - TotalReadThroughput: 0.00 /sec
     - WaitScannerTime: 1m7s

As can be seen above, WaitInFlightPacketTime (3h23m of the 3h34m SendDataTime) is by far the most time-consuming portion.

Analysis
I describe the whole process here.

PlanFragmentExecutor pseudocode:

while (true) {
    batch = get_one_batch();
    OlapTableSink::send(batch);
}

Then, OlapTableSink::send() pseudocode:

for (row in batch) {
    channel = get_corresponding_channel(row);

    // channel::add_row() expands to:
    ok = channel::add_row_in_cur_batch(row);
    if (!ok) {  // the current batch is full
        if (channel::has_in_flight_packet()) {
            channel::wait_in_flight_packet();  // (*) blocks the whole sender thread
        }
        channel::send_add_batch_req();
        channel::add_row_in_cur_batch(row);
    }
    // channel::add_row() end
}

So whenever channel::wait_in_flight_packet() is triggered, it blocks the whole send process, even though there is no need to block add_row() on the other channels.
For example, while channel0 is waiting for its in-flight packet, we could still be adding rows to the other channels.

Better solutions (preliminary thoughts)

  • Make channel::add_row() fully non-blocking. This might be a massive change.
  • Make channel::add_row() less blocking, e.g. avoid adding rows to channel0 immediately after channel0 sends an add_batch request.

Labels

area/load: Issues or PRs related to all kinds of load
area/sql/execution: Issues or PRs related to the execution engine
kind/feature: Categorizes issue or PR as related to a new feature.
