(fix)[deleteJob] It will not dispatch task when backend has been drop… #59569
base: master
…ped or not alive on delete from command.
run buildall
st = status;
public void addMark(K key, V value) {
    synchronized (lock) {
        marks.put(key, value);
If downLatch != null here, this should raise an error: we cannot support changing the number of marks after await has been called.
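OK, a check can be added here. This behavior was left in place mainly for compatibility with the earlier write paths. We could instead require callers to know the target count in advance and call the constructor that takes a count, so that downLatch is initialized when the MarkedCountDownLatch is built. If that compatibility is not needed, then this behavior should indeed be restricted here.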
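The guard discussed above could look roughly like the following. This is a minimal sketch under stated assumptions, not the Doris `MarkedCountDownLatch`: the class name `GuardedMarkedLatch` is hypothetical, it keeps one value per key in a plain `HashMap`, and it creates `downLatch` lazily on the first `await`. Once the latch exists, `addMark` fails fast instead of silently changing the expected count.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and structure are assumptions, not the Doris class.
class GuardedMarkedLatch<K, V> {
    private final Object lock = new Object();
    private final Map<K, V> marks = new HashMap<>();
    private CountDownLatch downLatch; // created lazily by the first await()

    public void addMark(K key, V value) {
        synchronized (lock) {
            // Reject new marks once a waiter may already depend on the count.
            if (downLatch != null) {
                throw new IllegalStateException("cannot add marks after await has started");
            }
            marks.put(key, value);
        }
    }

    public boolean markedCountDown(K key, V value) {
        synchronized (lock) {
            // Only count down for a mark that is still pending.
            if (!marks.remove(key, value)) {
                return false;
            }
            if (downLatch != null) {
                downLatch.countDown();
            }
            return true;
        }
    }

    public boolean await(long timeout, TimeUnit unit) throws InterruptedException {
        CountDownLatch latch;
        synchronized (lock) {
            if (downLatch == null) {
                // The count is fixed to the number of marks still pending.
                downLatch = new CountDownLatch(marks.size());
            }
            latch = downLatch;
        }
        // Block outside the lock so markedCountDown can still make progress.
        return latch.await(timeout, unit);
    }
}
```

With this shape the latch count always equals the number of pending marks at the time of the first `await`, which is the invariant the reviewer is asking for.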
public synchronized void addMark(K key, V value) {
    marks.put(key, value);
public long getCount() {
    synchronized (lock) {
How is this lock different from simply adding the synchronized modifier to the method?
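The main consideration is that some methods, such as wait, lock only a code block rather than the whole method. To keep the style consistent across the class, the other methods also use a synchronized block inside the method body.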
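For reference, the two locking styles being compared can be sketched side by side. The classes `MethodLocked` and `BlockLocked` are hypothetical and exist only for illustration. A `synchronized` instance method uses `this` as its monitor, while a block on a private lock object uses a monitor that code outside the class cannot acquire; when the block wraps the whole method body, the mutual exclusion within the class is otherwise equivalent.

```java
// Hypothetical illustration of the two styles; not code from the PR.
class MethodLocked {
    private long count;

    // The monitor is `this`, so outside code that synchronizes on the
    // instance contends with these methods.
    public synchronized void increment() {
        count++;
    }

    public synchronized long getCount() {
        return count;
    }
}

class BlockLocked {
    // A private monitor: only this class can lock on it.
    private final Object lock = new Object();
    private long count;

    public void increment() {
        synchronized (lock) {
            count++;
        }
    }

    public long getCount() {
        synchronized (lock) {
            return count;
        }
    }
}
```

The block form also lets a method hold the lock for only part of its body, which is the situation the author describes for the waiting method.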
public MarkedCountDownLatch(int count) {
    super(count);
    this.markCount = count;
    this.downLatch = new CountDownLatch(count);
What is the purpose of keeping this constructor?
If this constructor cannot be removed, it means that in some cases our count may not equal the number of marks, which could be a problem.
Like the case above, this constructor is kept to reduce the size of the change and stay compatible with some existing usages.
When a backend node is abnormal or has been dropped, the delete job does not filter out the replicas on that node, so task dispatch fails and the Delete operation cannot execute properly. In addition, a failed task keeps being retried without accounting for scenarios in which a retry cannot resolve the problem; such a task ends only when the overall timeout of the outer task expires, which can leave it stuck for a long time.
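The filtering step described above can be sketched as follows. This is an illustrative sketch only: `BackendInfo`, `ReplicaInfo`, and `DispatchFilter` are hypothetical stand-ins, not the Doris classes, and a backend that is missing from the map is assumed to mean it has been dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a backend node's state.
class BackendInfo {
    final long id;
    final boolean alive;

    BackendInfo(long id, boolean alive) {
        this.id = id;
        this.alive = alive;
    }
}

// Hypothetical stand-in for a replica and the backend that hosts it.
class ReplicaInfo {
    final long replicaId;
    final long backendId;

    ReplicaInfo(long replicaId, long backendId) {
        this.replicaId = replicaId;
        this.backendId = backendId;
    }
}

class DispatchFilter {
    // Keep only replicas whose backend is still registered and alive,
    // so no task is dispatched to a node that cannot execute it.
    static List<ReplicaInfo> dispatchable(List<ReplicaInfo> replicas,
                                          Map<Long, BackendInfo> backends) {
        List<ReplicaInfo> result = new ArrayList<>();
        for (ReplicaInfo replica : replicas) {
            BackendInfo backend = backends.get(replica.backendId);
            if (backend == null || !backend.alive) {
                continue; // assumed: null means the backend was dropped
            }
            result.add(replica);
        }
        return result;
    }
}
```

Skipping these replicas up front means the job only waits on tasks that were actually sent, rather than retrying a dispatch that cannot succeed until the outer timeout fires.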