[Optimize][Delete] Simplify the delete process to make it fast #3191
Conversation
Could you add a description of the performance optimization results?
                partition.getId(), partitionName,
                -1, 0, deleteConditions);
        deleteJob = new DeleteJob(transactionId, deleteInfo);
        idToDeleteJob.put(deleteJob.getTransactionId(), deleteJob);
If you put deleteInfo in idToDeleteJob, you need to make sure that the deleteInfo is cleaned up eventually, even if an exception is thrown.
So I think you should clear the deleteInfo in a finally block.
Added a new try-finally block surrounding this logic.
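A minimal sketch of that cleanup pattern, using a trimmed-down DeleteJob and a hypothetical runDelete wrapper rather than the real DeleteHandler code; whether the actual code removes or keeps finished jobs may differ, this only shows the guaranteed-cleanup shape the reviewer asks for:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DeleteJobRegistrySketch {
    // trimmed-down stand-in for the real DeleteJob
    static class DeleteJob {
        private final long transactionId;
        DeleteJob(long transactionId) { this.transactionId = transactionId; }
        long getTransactionId() { return transactionId; }
    }

    private final Map<Long, DeleteJob> idToDeleteJob = new ConcurrentHashMap<>();

    // Register the job, run the delete, and always remove the entry again,
    // even if sending push tasks or committing the transaction throws.
    public void runDelete(long transactionId) throws Exception {
        DeleteJob deleteJob = new DeleteJob(transactionId);
        idToDeleteJob.put(deleteJob.getTransactionId(), deleteJob);
        try {
            executeDelete(deleteJob); // placeholder for the real push/commit logic
        } finally {
            // guaranteed cleanup: a failed job never leaks in the map
            idToDeleteJob.remove(deleteJob.getTransactionId());
        }
    }

    private void executeDelete(DeleteJob job) throws Exception {
        // omitted
    }
}
```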
    DeleteInfo info = (DeleteInfo) journal.getData();
    Load load = catalog.getLoadInstance();
    load.replayDelete(info, catalog);
    DeleteHandler deleteHandler = catalog.getDeleteHandler();
You cannot reuse the original OP_FINISH_SYNC_DELETE, because they are different operations.
Added a new operation OP_FINISH_DELETE.
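A rough sketch of how a separate op code fits into the replay path; the constant values and method signatures below are assumptions for illustration, not the ones actually added:

```java
// Sketch of dispatching the two delete-related journal operations during replay.
// The op-code values and the replayDelete signature are assumptions, not copied from the PR.
public class ReplayDispatchSketch {
    static final short OP_FINISH_SYNC_DELETE = 40; // old Load-based delete (value hypothetical)
    static final short OP_FINISH_DELETE = 82;      // new DeleteHandler-based delete (value hypothetical)

    interface Load { void replayDelete(Object deleteInfo); }
    interface DeleteHandler { void replayDelete(Object deleteInfo); }

    private final Load load;
    private final DeleteHandler deleteHandler;

    public ReplayDispatchSketch(Load load, DeleteHandler deleteHandler) {
        this.load = load;
        this.deleteHandler = deleteHandler;
    }

    public void replay(short opCode, Object deleteInfo) {
        switch (opCode) {
            case OP_FINISH_SYNC_DELETE:
                // old operation, still replayed by Load for metadata written before the upgrade
                load.replayDelete(deleteInfo);
                break;
            case OP_FINISH_DELETE:
                // new operation introduced for the synchronous delete path
                deleteHandler.replayDelete(deleteInfo);
                break;
            default:
                break;
        }
    }
}
```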
    Load load = catalog.getLoadInstance();
    List<List<Comparable>> deleteInfos = load.getDeleteInfosByDb(dbId, true);
    DeleteHandler deleteHandler = catalog.getDeleteHandler();
You should also show the delete info from Load; otherwise, after we upgrade Doris, the old delete info cannot be seen.
done
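A sketch of what that merged view can look like; the getDeleteInfosByDb signatures and class shapes here are simplified assumptions rather than the actual code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of building the SHOW DELETE result from both sources, so that delete
// jobs recorded by Load before the upgrade stay visible next to the new ones
// recorded by DeleteHandler.
public class ShowDeleteSketch {
    interface Load { List<List<Comparable>> getDeleteInfosByDb(long dbId); }
    interface DeleteHandler { List<List<Comparable>> getDeleteInfosByDb(long dbId); }

    public static List<List<Comparable>> buildRows(long dbId, Load load, DeleteHandler deleteHandler) {
        List<List<Comparable>> rows = new ArrayList<>();
        rows.addAll(load.getDeleteInfosByDb(dbId));          // old delete jobs (pre-upgrade)
        rows.addAll(deleteHandler.getDeleteInfosByDb(dbId)); // new synchronous delete jobs
        return rows;
    }
}
```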
    }
    public boolean addFinishedReplica(long tabletId, Replica replica) {
        TabletDeleteInfo tDeleteInfo = tabletDeleteInfoMap.get(tabletId);
After changing tabletDeleteInfoMap to a ConcurrentHashMap, you should use putIfAbsent to make the check-and-insert atomic here.
done
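For reference, a minimal sketch of the atomic pattern being asked for, with a trimmed-down TabletDeleteInfo standing in for the real class:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the atomic insert-or-get pattern on a ConcurrentHashMap.
public class PutIfAbsentSketch {
    static class TabletDeleteInfo {
        private final long tabletId;
        TabletDeleteInfo(long tabletId) { this.tabletId = tabletId; }
    }

    private final ConcurrentMap<Long, TabletDeleteInfo> tabletDeleteInfoMap = new ConcurrentHashMap<>();

    public TabletDeleteInfo getOrCreate(long tabletId) {
        // putIfAbsent makes the check-then-insert a single atomic step, so two
        // backends reporting the same tablet concurrently end up sharing one
        // TabletDeleteInfo instance instead of racing to create two.
        tabletDeleteInfoMap.putIfAbsent(tabletId, new TabletDeleteInfo(tabletId));
        return tabletDeleteInfoMap.get(tabletId);
    }
}
```

computeIfAbsent would achieve the same thing in a single call and avoids constructing a throw-away instance when the key already exists.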
        }
    }
    public boolean cancelJob(DeleteJob job, CancelType cancelType, String reason) {
This method returns a boolean, but you never use the return value.
I think returning true should mean the cancel succeeded (the txn failed), and returning false should mean the cancel failed (the txn succeeded).
The caller should use this return value to decide whether to report success or failure to the user.
And if the transaction is COMMITTED but not yet VISIBLE, you should return a transaction id to the user, so that the user can use it to check the transaction's state.
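One way the return value and the COMMITTED-but-not-VISIBLE case could fit together is sketched below; the transaction-manager interface and the message strings are assumptions for illustration, not the actual implementation:

```java
// Cancel contract sketch: true = the job was cancelled (transaction aborted),
// false = cancellation failed because the transaction already committed, in
// which case the transaction id is handed back to the user.
public class CancelContractSketch {
    enum TxnStatus { PREPARE, COMMITTED, VISIBLE, ABORTED }

    interface TxnManager {
        boolean abortTransaction(long txnId, String reason); // false if no longer abortable
        TxnStatus getStatus(long txnId);
    }

    public boolean cancelJob(long txnId, String reason, TxnManager txnManager) {
        // if the abort fails, the delete has (or will have) taken effect
        return txnManager.abortTransaction(txnId, reason);
    }

    public String respondToUser(long txnId, String reason, TxnManager txnManager) {
        if (cancelJob(txnId, reason, txnManager)) {
            return "delete cancelled: " + reason;
        }
        if (txnManager.getStatus(txnId) == TxnStatus.COMMITTED) {
            // committed but not yet visible: give the user the txn id so they
            // can check its state later
            return "delete committed but not yet visible, txnId=" + txnId;
        }
        return "delete finished successfully";
    }
}
```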
        DdlExecutor.execute(context.getCatalog(), (DdlStmt) parsedStmt, originStmt);
        context.getState().setOk();
    } catch (QueryStateException e) {
        context.getState().setOk(0L, 0, e.getMessage());
- QueryStateException should be derived from UserException.
- Better to just create a QueryState inside the QueryStateException; then here you can just call context.setState(e.getQueryState()). If other people use this exception, they will know how to use it.
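A minimal sketch of the suggested shape, using stand-in UserException, QueryState, and ConnectContext types rather than the real Doris classes:

```java
// QueryStateException derives from UserException and carries a ready-made
// QueryState, so the catch block only needs context.setState(e.getQueryState()).
public class QueryStateExceptionSketch {
    static class UserException extends Exception {
        UserException(String msg) { super(msg); }
    }

    static class QueryState {
        private final String message;
        QueryState(String message) { this.message = message; }
        String getMessage() { return message; }
    }

    static class QueryStateException extends UserException {
        private final QueryState queryState;
        QueryStateException(QueryState queryState) {
            super(queryState.getMessage());
            this.queryState = queryState;
        }
        QueryState getQueryState() { return queryState; }
    }

    interface ConnectContext { void setState(QueryState state); }
    interface DdlAction { void run() throws UserException; }

    // simplified DDL execution path
    public static void execute(ConnectContext context, DdlAction ddl) {
        try {
            ddl.run();
            context.setState(new QueryState("OK"));
        } catch (QueryStateException e) {
            // the exception already knows the state to report
            context.setState(e.getQueryState());
        } catch (UserException e) {
            context.setState(new QueryState("ERROR: " + e.getMessage()));
        }
    }
}
```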
    public boolean addFinishedReplica(long tabletId, Replica replica) {
        tabletDeleteInfoMap.putIfAbsent(tabletId, new TabletDeleteInfo(tabletId));
        TabletDeleteInfo tDeleteInfo = tabletDeleteInfoMap.get(tabletId);
        synchronized (tDeleteInfo) {
No need to use synchronized here; I think you can just use a ConcurrentSet in TabletDeleteInfo.
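A sketch of the lock-free variant this suggests, assuming a simplified TabletDeleteInfo and a placeholder Replica type:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// The replica set inside TabletDeleteInfo is a concurrent set, so
// addFinishedReplica needs no synchronized block.
public class TabletDeleteInfoSketch {
    static class Replica {
        final long id;
        Replica(long id) { this.id = id; }
    }

    static class TabletDeleteInfo {
        private final long tabletId;
        // ConcurrentHashMap.newKeySet() backs a thread-safe Set;
        // Guava's Sets.newConcurrentHashSet() would work the same way.
        private final Set<Replica> finishedReplicas = ConcurrentHashMap.newKeySet();

        TabletDeleteInfo(long tabletId) { this.tabletId = tabletId; }

        boolean addFinishedReplica(Replica replica) {
            return finishedReplicas.add(replica);
        }

        int getFinishedReplicaNum() {
            return finishedReplicas.size();
        }
    }
}
```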
morningman left a comment
LGTM
NOTICE: this CL upgrades the FE meta version to 82.
#3190
Our current DELETE strategy reuses the LoadChecker framework. LoadChecker advances jobs through their stages by polling them every 5 seconds.
A load job has four stages, Pending/ETL/Loading/Quorum_finish, and each stage is handled by its own LoadChecker. For example, when a load job is submitted it is initialized to the Pending state and waits for the Pending LoadChecker to run it; after the pending step has run, the job moves to the ETL stage and waits for the next LoadChecker (ETL). Because the LoadChecker interval is 5s, in the worst case a job waits up to 20s over its life cycle.
DELETE jobs in particular do not need to wait for polling; they can call pushTask() directly to perform the delete. In this commit, I add a delete handler that processes delete tasks concurrently. All delete tasks are pushed to the BEs immediately, with no need to wait for the LoadChecker; since a delete job used to start in the LOADING state and pass through 2 LoadChecker rounds, this saves up to 10s (5s per LoadChecker). The delete process is now synchronous, and users get a response only after the delete has finished or been canceled. If a delete runs longer than a certain period of time, it is cancelled with a timeout exception.
Doris's current delete strategy goes through the LoadChecker path: a Delete Job is treated as an ordinary Load Job, and an ordinary Load Job must wait for a LoadChecker polling round before it can change its state.
A Load Job's states change in the order PENDING -> ETL -> LOADING -> QUORUM_FINISHED. Since the LoadChecker polling interval is 5 seconds, in the worst case a Load Job has to wait 20s.
However, a Delete Job is already in the LOADING state as soon as it is initialized; in fact it does not need states at all, it can simply submit its Push Tasks and wait. This commit therefore separates Delete Jobs from the LoadChecker system to avoid wasting that time.
In the best case, letting a Delete Job push tasks directly saves 5~10s. After submitting the tasks, the wait timeout is set according to the number of tablets involved in the submission. With this synchronous delete, the user is notified immediately when a Delete Job is canceled or finished. Finally, we only need to persist completed Delete Jobs so that users can look them up with the SHOW DELETE command.
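To make the synchronous flow above concrete, here is a self-contained sketch; the transaction-manager and task-sender interfaces, the latch-based wait, and the timeout formula are assumptions for illustration, not the merged implementation.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Begin a transaction, push all delete tasks to the BEs at once, wait with a
// timeout derived from the number of tablets, then commit or abort.
public class SyncDeleteFlowSketch {
    interface TxnManager {
        long beginTransaction(String label);
        void commitTransaction(long txnId);
        void abortTransaction(long txnId, String reason);
    }

    interface PushTaskSender {
        // sends one delete push task per tablet and counts the latch down as
        // each tablet reaches a replica quorum
        void sendDeleteTasks(long txnId, List<Long> tabletIds, CountDownLatch latch);
    }

    public static void runDelete(String label, List<Long> tabletIds,
                                 TxnManager txnManager, PushTaskSender sender) throws Exception {
        long txnId = txnManager.beginTransaction(label);
        CountDownLatch latch = new CountDownLatch(tabletIds.size());
        sender.sendDeleteTasks(txnId, tabletIds, latch);

        // timeout scales with the number of tablets involved, with a lower bound
        long timeoutMs = Math.max(30_000L, tabletIds.size() * 100L);
        if (!latch.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            txnManager.abortTransaction(txnId, "delete timeout");
            throw new Exception("delete cancelled: timeout after " + timeoutMs + " ms");
        }
        // every tablet reached quorum: make the delete visible and reply to the user
        txnManager.commitTransaction(txnId);
    }
}
```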