Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KYUUBI #2686] Fix lock bug if engine initialization timeout #2687

Closed
wants to merge 4 commits into from

Conversation

ulysses-you
Copy link
Contributor

@ulysses-you ulysses-you commented May 18, 2022

Why are the changes needed?

closes #2686

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before make a pull request

@codecov-commenter
Copy link

codecov-commenter commented May 18, 2022

Codecov Report

Merging #2687 (11fad96) into master (8de2f5f) will decrease coverage by 0.19%.
The diff coverage is 100.00%.

❗ Current head 11fad96 differs from pull request most recent head 1ba6538. Consider uploading reports for the commit 1ba6538 to get more accurate results

@@             Coverage Diff              @@
##             master    #2687      +/-   ##
============================================
- Coverage     64.32%   64.12%   -0.20%     
- Complexity       80      275     +195     
============================================
  Files           385      407      +22     
  Lines         18628    19105     +477     
  Branches       2523     2569      +46     
============================================
+ Hits          11983    12252     +269     
- Misses         5506     5703     +197     
- Partials       1139     1150      +11     
Impacted Files Coverage Δ
.../org/apache/kyuubi/ha/client/DiscoveryClient.scala 56.52% <ø> (ø)
...ha/client/zookeeper/ZookeeperDiscoveryClient.scala 70.65% <100.00%> (+0.28%) ⬆️
...cala/org/apache/kyuubi/metrics/MetricsSystem.scala 84.09% <100.00%> (+0.36%) ⬆️
...ain/scala/org/apache/kyuubi/engine/EngineRef.scala 78.09% <100.00%> (+0.73%) ⬆️
...n/spark/authz/ranger/FilteredShowObjectsExec.scala 42.10% <0.00%> (-37.90%) ⬇️
...park/authz/ranger/FilterDataSourceV2Strategy.scala 57.14% <0.00%> (-22.86%) ⬇️
...ache/kyuubi/plugin/spark/authz/OperationType.scala 57.64% <0.00%> (-12.36%) ⬇️
.../kyuubi/plugin/spark/authz/PrivilegesBuilder.scala 68.84% <0.00%> (-11.09%) ⬇️
...he/kyuubi/spark/connector/tpcds/TPCDSCatalog.scala 35.29% <0.00%> (-6.82%) ⬇️
...he/kyuubi/plugin/spark/authz/util/AuthZUtils.scala 46.80% <0.00%> (-4.36%) ⬇️
... and 51 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8de2f5f...1ba6538. Read the comment docs.

@ulysses-you
Copy link
Contributor Author

unbelievable my local works well with the test code... but the ga failed sometime

@ulysses-you
Copy link
Contributor Author

Seems the test is stable, cc @yaooqinn @turboFei @pan3793 @cfmcgrady

@yaooqinn
Copy link
Member

what is the current behavior?

@ulysses-you
Copy link
Contributor Author

what is the current behavior?

if three(more than two) request coming, if the first request initializes engine timeout, we will get two engines from the last two request.

@yaooqinn
Copy link
Member

if A hangs with a big timeout, considering B/C are sequentially retries of A, so they are going to make server fe threads be exhausted?

@yaooqinn
Copy link
Member

what is the current behavior?

if three(more than two) request coming, if the first request initializes engine timeout, we will get two engines from the last two request.

hmm? so the timeout here means time to acquire the lock,not time to hold the lock?

@ulysses-you
Copy link
Contributor Author

what is the current behavior?

if three(more than two) request coming, if the first request initializes engine timeout, we will get two engines from the last two request.

hmm? so the timeout here means time to acquire the lock,not time to hold the lock?

yeah.. that's right

@yaooqinn
Copy link
Member

so,shall we throw exception if timeout?

@ulysses-you
Copy link
Contributor Author

For now, the timeout of ENGINE_INIT_TIMEOUT only considers the elapsed time of creating engine. That said, it does not consider the time of acquiring lock. I'm not sure if we should include the time when we are acquiring the lock. And if we want that, we should do some changes.

The current code path:

tryWithLock(discoveryClient) {
  val startTime ..
   if (started + timeout <= System.currentTimeMillis()) {
     throw .. 
  }
}

The expected code path:

val startTime ..
tryWithLockAndThrowWhenTimeout(startTime, discoveryClient) {
   if (started + timeout <= System.currentTimeMillis()) {
     throw .. 
  }
}

// to avoid client too long to waiting in concurrent.

// Return false means we are timeout
val acquired = lock.acquire(timeout, unit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use TimeUnit.MILLISECONDS directly, the timeout is always millis

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep the unit here? ETCD lease timeout is seconds, and cannot change unit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simply timeout / 1000 for etcd?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's OK to


// we should only submit two engines, the last request should timeout and fail
assert(MetricsSystem.counterValue(ENGINE_TOTAL).get - beforeEngines == 2)
executor.shutdown()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we put it in the finally block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, added it

@ulysses-you
Copy link
Contributor Author

ulysses-you commented May 19, 2022

thanks for review ! merging to master

@ulysses-you
Copy link
Contributor Author

seems a lot of conflicts.. I will do backport manually

ulysses-you added a commit that referenced this pull request May 19, 2022
backport #2687 for branch-1.5

### _Why are the changes needed?_

closes #2686

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #2705 from ulysses-you/KYUUBI-2686-1.5.

Closes #2686

8db5ee3 [ulysses-you] fix
a8b17db [ulysses-you] engine type
f76bfb7 [ulysses-you] Fix lock bug if engine initialization timeout

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: ulysses-you <ulyssesyou@apache.org>
beryllw pushed a commit to beryllw/incubator-kyuubi that referenced this pull request May 29, 2024
backport apache#2687 for branch-1.5

### _Why are the changes needed?_

closes apache#2686

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request

Closes apache#2705 from ulysses-you/KYUUBI-2686-1.5.

Closes apache#2686

8db5ee3 [ulysses-you] fix
a8b17db [ulysses-you] engine type
f76bfb7 [ulysses-you] Fix lock bug if engine initialization timeout

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: ulysses-you <ulyssesyou@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Fix lock bug if engine initialization timeout
5 participants