Closed
Description
The main purpose of this proposal is to make Doris stream load more robust and to improve its performance under high concurrency.
In our production environment, we mainly use stream load to load business data into Doris.
We have found the following problems:
- Transaction processing in GlobalTransactionMgr performs poorly, because the transaction lock is global with no isolation, and some operations need to iterate over the whole map.
- The FE thrift server uses Executors.newFixedThreadPool and Executors.newCachedThreadPool to construct its thread pools, which may cause OOM or create too many threads and then crash.
- The number of cached RPC clients in BE is not limited. When an RPC client times out, the request task is still blocked in FE; the client then hits rpc_timeout and retries, which can sometimes be disastrous for FE.
- Opening the tablet writers takes too much time, because when the user doesn't specify partitions, all tablet writers are opened.
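To make the thread pool problem concrete, here is a small sketch of what the two `Executors` factory methods build according to the JDK documentation. The class and method names (`ExecutorsDefaults`, `fixedLike`, `cachedLike`) are hypothetical and exist only for illustration; they are not Doris code.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, for illustration only.
class ExecutorsDefaults {
    // Same configuration as Executors.newFixedThreadPool(n):
    // a fixed number of threads, but an unbounded task queue,
    // so pending tasks can accumulate until memory runs out.
    static ThreadPoolExecutor fixedLike(int n) {
        return new ThreadPoolExecutor(n, n, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    // Same configuration as Executors.newCachedThreadPool():
    // no queueing at all, and up to Integer.MAX_VALUE threads,
    // so a burst of requests can create a very large number of threads.
    static ThreadPoolExecutor cachedLike() {
        return new ThreadPoolExecutor(0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());
    }
}
```

Neither configuration bounds both the queue and the thread count, which is why a burst of slow requests can end in OOM or thread exhaustion.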
Implementation
To address these problems, we need to implement the following:
- Support db-level isolation for transaction processing and use ArrayDeque to store the finished transactions.
- Remove all code that uses newFixedThreadPool or newCachedThreadPool, construct the thread pools explicitly, and support thread pool metric monitoring.
- Use a client pool to manage RPC clients, set the core and max number of cached clients, and use try-lock instead of lock so that a server-side task can be canceled promptly when it times out.
- Open tablet writers only when needed, rather than opening all of them when the table's partitions are not specified.
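As a rough illustration of the first item, the sketch below gives each database its own transaction manager with its own lock, and keeps finished transactions in an `ArrayDeque` so the oldest can be dropped from the front without scanning a map. It also uses `tryLock` with a timeout, in the spirit of the third item, so a caller gives up instead of blocking indefinitely. All names here (`DbTxnManager`, `TxnManagerRegistry`, and their methods) are assumptions made for this example and do not reflect the actual Doris implementation.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical per-database transaction manager (not a Doris class).
class DbTxnManager {
    private final ReentrantLock lock = new ReentrantLock();
    private final Map<Long, String> runningTxns = new HashMap<>();
    // Finished transaction ids in completion order, oldest first.
    private final ArrayDeque<Long> finishedTxns = new ArrayDeque<>();

    // Waits at most timeoutMs for this database's lock; false means "gave up".
    private boolean acquire(long timeoutMs) {
        try {
            return lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    boolean begin(long txnId, String label, long timeoutMs) {
        if (!acquire(timeoutMs)) {
            return false;
        }
        try {
            runningTxns.put(txnId, label);
            return true;
        } finally {
            lock.unlock();
        }
    }

    boolean finish(long txnId, long timeoutMs) {
        if (!acquire(timeoutMs)) {
            return false;
        }
        try {
            if (runningTxns.remove(txnId) == null) {
                return false; // unknown or already finished
            }
            finishedTxns.addLast(txnId);
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Drops the oldest finished transactions until at most `keep` remain.
    int expire(int keep) {
        lock.lock();
        try {
            int removed = 0;
            while (finishedTxns.size() > keep) {
                finishedTxns.pollFirst();
                removed++;
            }
            return removed;
        } finally {
            lock.unlock();
        }
    }

    int finishedCount() {
        lock.lock();
        try {
            return finishedTxns.size();
        } finally {
            lock.unlock();
        }
    }
}

// Hypothetical registry: one manager (and therefore one lock) per database id.
class TxnManagerRegistry {
    private final ConcurrentHashMap<Long, DbTxnManager> perDb = new ConcurrentHashMap<>();

    DbTxnManager forDb(long dbId) {
        return perDb.computeIfAbsent(dbId, id -> new DbTxnManager());
    }
}
```

With this layout, a load into one database contends only with other transactions on the same database, and expiring old transactions touches only the head of the deque.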