Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix-16382] Fix the bug of async master task casthread pool invocations ramp-up #16461

Merged
merged 2 commits into from
Aug 15, 2024

Conversation

Dyqer
Copy link
Contributor

@Dyqer Dyqer commented Aug 14, 2024

Purpose of the pull request

fix #16460

Brief change log

Verify this pull request

This pull request is code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(or)

Pull Request Notice

Pull Request Notice

If your pull request contain incompatible change, you should also add it to docs/docs/en/guide/upgrede/incompatible.md

@SbloodyS SbloodyS added the bug Something isn't working label Aug 14, 2024
@SbloodyS SbloodyS added this to the 3.3.0 milestone Aug 14, 2024
}

@Override
public long getDelay(TimeUnit unit) {
long nextExecuteTimeDelay = Math.min(currentStartTime + executeInterval, timeout) - System.currentTimeMillis();
return unit.convert(nextExecuteTimeDelay, TimeUnit.MILLISECONDS);
return unit.convert(this.executeInterval, TimeUnit.MILLISECONDS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you test this? Once the element changed in the delay queue, the delay time will recalculate, in this pr the delay time will be a fixed value, this will cause some function will never be triggered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for that, I changed it.

Comment on lines +62 to +57
public synchronized void refreshStartTime() {
if (executeTimes != 0) {
// The first time doesn't have delay
executeTimes++;
} else {
currentStartTime = System.currentTimeMillis();
}
executeTimes++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The executeTimes is only a mark flag, it is not used to represent execute the real times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can record the execute times for monitoring

Copy link
Member

@ruanwenjun ruanwenjun Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it's great, we can add this in task metrics.

@ruanwenjun
Copy link
Member

The key problem related to your issue is the async task doesn't support timeout, the async should stop once it's timeout.

@Dyqer
Copy link
Contributor Author

Dyqer commented Aug 14, 2024

The key problem related to your issue is the async task doesn't support timeout, the async should stop once it's timeout.

but the timeout observer is in master standby list, it may have 1~10 minutes later than real timeout, the functions will be invoked in a dead loop during those 10 minutes.

@Dyqer
Copy link
Contributor Author

Dyqer commented Aug 14, 2024

The key problem related to your issue is the async task doesn't support timeout, the async should stop once it's timeout.

@ruanwenjun
Here are two critical issues about AsyncMasterTask:

  1. if the task is set to just warn and not fail on timeout, it will immediately enter an infinite loop after reaching the timeout period.
  2. there's a mistake in the timeout unit, should be minute not second, it exacerbates the problem.

I think the best solution is to ignore the timeout logic here and only focus on the execution interval. The timeout event should still be handled by the stateEventhandler.

@ruanwenjun
Copy link
Member

The key problem related to your issue is the async task doesn't support timeout, the async should stop once it's timeout.

@ruanwenjun Here are two critical issues about AsyncMasterTask:

  1. if the task is set to just warn and not fail on timeout, it will immediately enter an infinite loop after reaching the timeout period.
  2. there's a mistake in the timeout unit, should be minute not second, it exacerbates the problem.

I think the best solution is to ignore the timeout logic here and only focus on the execution interval. The timeout event should still be handled by the stateEventhandler.

Yes, there exist a bug as you mention, and you are right, the task executor don't need to care about the timeout, all timeout event should be controlled by master event engine.

ruanwenjun
ruanwenjun previously approved these changes Aug 15, 2024
Copy link
Member

@ruanwenjun ruanwenjun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, and we didn't have integration test for the logic task, after #16327, we can easily add this kind of case.

SbloodyS
SbloodyS previously approved these changes Aug 15, 2024
Copy link
Member

@SbloodyS SbloodyS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@Dyqer
Copy link
Contributor Author

Dyqer commented Aug 15, 2024

@ruanwenjun @SbloodyS anyone can help merge? Seems github actions stuck

@SbloodyS
Copy link
Member

@ruanwenjun @SbloodyS anyone can help merge? Seems github actions stuck

CI failed. You should run mvn spotless:apply to format code.

@Dyqer Dyqer dismissed stale reviews from ruanwenjun and SbloodyS via aebfea6 August 15, 2024 08:12
@Dyqer
Copy link
Contributor Author

Dyqer commented Aug 15, 2024

@ruanwenjun @SbloodyS anyone can help merge? Seems github actions stuck

CI failed. You should run mvn spotless:apply to format code.

Oh, Thank you!, please help to review again @ruanwenjun @SbloodyS

Copy link

sonarcloud bot commented Aug 15, 2024

@ruanwenjun ruanwenjun merged commit 57c80f2 into apache:dev Aug 15, 2024
65 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend bug Something isn't working ready-to-merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] [dolphinscheduler-master] AsyncMasterTask may causing a ramp-up in thread pool invocations
3 participants