Closed
Labels: area/load, area/sql/execution, kind/feature
Description
Problem
When we use broker load, OlapTableSink::send() dominates the run time: it accounts for almost all of the plan fragment's active time.
Example profile from one BE:
Fragment f59d832368a84c94-be109f903cf4698d:(Active: 3h36m, % non-child: 0.00%)
- AverageThreadTokens: 1.00
- PeakReservation: 0
- PeakUsedReservation: 0
- RowsProduced: 168.61M
- SizeProduced: 30.25 GB
BlockMgr:
- BlockWritesOutstanding: 0
- BlocksCreated: 0
- BlocksRecycled: 0
- BufferedPins: 0
- BytesWritten: 0
- MaxBlockSize: 8.00 MB
- MemoryLimit: 2.00 GB
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalIntegrityCheckTime: 0.000ns
- TotalReadBlockTime: 0.000ns
OlapTableSink:(Active: 3h35m, % non-child: 0.00%)
- CloseTime: 102.932ms
- ConvertBatchTime: 0.000ns
- OpenTime: 247.194ms
- RowsFiltered: 0
- RowsRead: 168.61M
- RowsReturned: 168.61M
- SendDataTime: 3h34m
- SerializeBatchTime: 8m26s
- ValidateDataTime: 19s554ms
- WaitInFlightPacketTime: 3h23m
BROKER_SCAN_NODE (id=0):(Active: 1m8s, % non-child: 0.00%)
- BytesRead: 0
- MemoryUsed: 0
- NumThread: 0
- PerReadThreadRawHdfsThroughput: 0.00 /sec
- RowsRead: 168.61M
- RowsReturned: 168.61M
- RowsReturnedRate: 2.48 M/sec
- ScanRangesComplete: 0
- ScannerThreadsInvoluntaryContextSwitches: 0
- ScannerThreadsTotalWallClockTime: 0.000ns
- MaterializeTupleTime(*): 5m37s
- ScannerThreadsSysTime: 0.000ns
- ScannerThreadsUserTime: 0.000ns
- ScannerThreadsVoluntaryContextSwitches: 0
- TotalRawReadTime(*): 38m58s
- TotalReadThroughput: 0.00 /sec
- WaitScannerTime: 1m7s
As can be seen above, WaitInFlightPacketTime is the most time-consuming portion: 3h23m out of the fragment's 3h36m of active time, roughly 94%.
Analysis
The whole process is described below.
PlanFragmentExecutor pseudo code:
while (1) {
    batch = get_one_batch();
    OlapTableSink::send(batch);
}
Then, OlapTableSink::send() pseudo code:
for (row in batch) {
    channel = get_corresponding_channel(row);
    // channel::add_row() explanation:
    ok = channel::add_row_in_cur_batch(row);
    if (!ok) {
        if (channel::has_in_flight_packet) {
            channel::wait_in_flight_packet(); // (*)
        }
        channel::send_add_batch_req();
        channel::add_row_in_cur_batch(row);
    }
    // channel::add_row() end
}
So whenever channel::wait_in_flight_packet() is triggered (the line marked (*)), it blocks the whole process. But there is no need to block add_row() on the other channels.
For example, while channel0 is waiting on its in-flight packet, we can still add rows to the other channels.
Better solutions (preliminary thoughts)
- Make channel::add_row() non-blocking. This might be a massive change.
- Make channel::add_row() block less, e.g. avoid adding rows to channel0 immediately after channel0 sends an add_batch request.
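The first idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Doris code: SketchChannel, try_send_pending() and on_packet_acked() are hypothetical names, rows are plain ints, and the RPC is replaced by a counter. Instead of waiting when a packet is in flight, a full batch is parked in a per-channel queue, so send() can keep adding rows to this and other channels.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical stand-in for one channel; not the real Doris class.
struct SketchChannel {
    std::size_t batch_capacity;
    std::vector<int> cur_batch;            // rows collected for the next request
    std::deque<std::vector<int>> pending;  // full batches waiting to be sent
    bool has_in_flight_packet = false;
    std::size_t sent_batches = 0;          // stands in for issued add_batch RPCs

    explicit SketchChannel(std::size_t cap) : batch_capacity(cap) {}

    // Never blocks: a full batch is queued instead of waiting for the ack.
    void add_row(int row) {
        if (cur_batch.size() == batch_capacity) {
            pending.push_back(std::move(cur_batch));
            cur_batch.clear();  // reset the moved-from vector to a known state
            try_send_pending();
        }
        cur_batch.push_back(row);
    }

    // Sends at most one queued batch, and only if nothing is in flight.
    bool try_send_pending() {
        if (has_in_flight_packet || pending.empty()) return false;
        pending.pop_front();  // real code would issue the async request here
        has_in_flight_packet = true;
        ++sent_batches;
        return true;
    }

    // Assumed to be called when the add_batch response arrives.
    void on_packet_acked() {
        assert(has_in_flight_packet);
        has_in_flight_packet = false;
        try_send_pending();
    }
};
```

Note that the pending queue in this sketch is unbounded. A real implementation would presumably need a memory limit, and would fall back to waiting once that limit is reached.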