
[New Scheduler] Implement KeepAliveService #5067

Merged
merged 4 commits into from
Mar 11, 2021

KeonHee (Member) commented Feb 15, 2021

Description

  • In the invoker and scheduler components, each host holds a lease. This service keeps that lease alive so that it does not expire due to timeout.
  • Each host writes its health data to etcd under this lease. When the lease expires, the host's health data is deleted along with it, so the scheduler can identify the host as unhealthy.
  • Because this service has the greatest influence on a component's health state, it is allocated a dedicated dispatcher.
  • https://cwiki.apache.org/confluence/display/OPENWHISK/LeaseKeepAliveService
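The lease mechanism described above can be illustrated with a small, self-contained simulation. This is a sketch under stated assumptions, not the OpenWhisk or etcd API: `FakeLeaseStore` is a hypothetical in-memory stand-in for etcd leases, time is a manual millisecond counter, and refreshing at half the TTL is an assumed interval. It shows that a key written under a lease survives as long as keep-alives arrive before the TTL elapses, and disappears together with the lease once they stop.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for etcd leases; this is NOT the etcd or OpenWhisk API.
final class FakeLeaseStore {
  private final class LeaseEntry(val ttlMillis: Long, var expiresAt: Long)

  private val leases = mutable.Map.empty[Long, LeaseEntry]
  // key -> (value, id of the lease the key is attached to)
  private val store = mutable.Map.empty[String, (String, Long)]
  private var nextId = 1L

  /** Grants a new lease that expires `ttlMillis` after `now`. */
  def grant(ttlMillis: Long, now: Long): Long = {
    val id = nextId
    nextId += 1
    leases(id) = new LeaseEntry(ttlMillis, now + ttlMillis)
    id
  }

  /** Extends a live lease by its TTL; returns false if the lease has already expired. */
  def keepAlive(id: Long, now: Long): Boolean =
    leases.get(id) match {
      case Some(entry) if entry.expiresAt > now =>
        entry.expiresAt = now + entry.ttlMillis
        true
      case _ => false
    }

  def put(key: String, value: String, leaseId: Long): Unit = store(key) = (value, leaseId)

  def get(key: String): Option[String] = store.get(key).map(_._1)

  /** Drops every expired lease together with all keys attached to it. */
  def expire(now: Long): Unit = {
    val dead = leases.collect { case (id, entry) if entry.expiresAt <= now => id }.toSet
    leases --= dead
    store --= store.collect { case (key, (_, leaseId)) if dead(leaseId) => key }
  }
}

object LeaseKeepAliveSketch {
  def main(args: Array[String]): Unit = {
    val leaseStore = new FakeLeaseStore
    val ttl = 10000L         // assumed 10 s lease TTL
    val interval = ttl / 2   // assumption: refresh at half the TTL
    val lease = leaseStore.grant(ttl, now = 0L)
    leaseStore.put("health/invoker0", "up", lease)

    // Healthy host: a keep-alive every `interval` ms keeps the lease, and the key, alive.
    var t = 0L
    while (t < 30000L) {
      t += interval
      leaseStore.keepAlive(lease, t)
      leaseStore.expire(t)
    }
    println(leaseStore.get("health/invoker0")) // Some(up)

    // Host stops refreshing: once the TTL passes, the key disappears with the lease.
    leaseStore.expire(t + ttl)
    println(leaseStore.get("health/invoker0")) // None
  }
}
```

Refreshing well before the TTL leaves headroom for a delayed keep-alive, which presumably motivates the dedicated dispatcher: on a starved thread pool, refreshes could slip past the TTL and a healthy host would be reported as unhealthy.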

Related issue and scope

  • I opened an issue to propose and discuss this change (#????)

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Invoker
  • Scheduler
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

codecov-io commented Feb 15, 2021

Codecov Report

Merging #5067 (dd4e051) into master (ed58b23) will decrease coverage by 46.07%.
The diff coverage is 78.78%.


@@             Coverage Diff             @@
##           master    #5067       +/-   ##
===========================================
- Coverage   81.61%   35.54%   -46.08%     
===========================================
  Files         204      205        +1     
  Lines        9950    10011       +61     
  Branches      447      457       +10     
===========================================
- Hits         8121     3558     -4563     
- Misses       1829     6453     +4624     
Impacted Files Coverage Δ
...rg/apache/openwhisk/core/scheduler/Scheduler.scala 0.00% <0.00%> (ø)
...openwhisk/core/service/LeaseKeepAliveService.scala 84.21% <84.21%> (ø)
...in/scala/org/apache/openwhisk/common/Logging.scala 71.85% <100.00%> (-13.93%) ⬇️
.../scala/org/apache/openwhisk/core/WhiskConfig.scala 95.45% <100.00%> (+0.05%) ⬆️
...a/org/apache/openwhisk/common/ConfigMapValue.scala 0.00% <0.00%> (-100.00%) ⬇️
.../apache/openwhisk/core/controller/Namespaces.scala 0.00% <0.00%> (-100.00%) ⬇️
...pache/openwhisk/core/controller/CorsSettings.scala 0.00% <0.00%> (-100.00%) ⬇️
...che/openwhisk/core/entitlement/RateThrottler.scala 0.00% <0.00%> (-100.00%) ⬇️
...he/openwhisk/core/entitlement/KindRestrictor.scala 0.00% <0.00%> (-100.00%) ⬇️
...penwhisk/core/database/cosmosdb/CosmosDBUtil.scala 0.00% <0.00%> (-100.00%) ⬇️
... and 136 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ed58b23...dd4e051.

@KeonHee KeonHee closed this Feb 16, 2021
@KeonHee KeonHee reopened this Feb 16, 2021
@KeonHee KeonHee closed this Feb 16, 2021
@KeonHee KeonHee reopened this Feb 16, 2021
case class Lease(id: Long, ttl: Long) extends KeepAliveServiceData

// Events received by the actor
case object ReGrantLease
Contributor (review comment)

Suggested change:
- case object ReGrantLease
+ case object RegrantLease

Member Author (reply)

I updated it to RegrantLease.
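To show where a message like `RegrantLease` might fit, here is a hypothetical pure transition function. Only `Lease(id, ttl)`, `KeepAliveServiceData`, and `RegrantLease` come from the diff; `NoData`, `GrantLease`, `KeepAliveEvent`, `KeepAliveTransitions`, and the `grant` callback (standing in for an etcd lease-grant call) are illustrative assumptions, as is the idea that `RegrantLease` is sent once the old lease has been lost.

```scala
// Lease, KeepAliveServiceData and RegrantLease mirror the diff; everything else is hypothetical.
sealed trait KeepAliveServiceData
case object NoData extends KeepAliveServiceData
final case class Lease(id: Long, ttl: Long) extends KeepAliveServiceData

// Events received by the actor
sealed trait KeepAliveEvent
case object GrantLease extends KeepAliveEvent   // hypothetical: first grant at startup
case object RegrantLease extends KeepAliveEvent // assumed: old lease is lost, so obtain a fresh one

object KeepAliveTransitions {
  /** Returns the next state data; `grant` stands in for an etcd lease-grant call. */
  def next(data: KeepAliveServiceData, event: KeepAliveEvent, ttl: Long, grant: Long => Long): KeepAliveServiceData =
    (data, event) match {
      case (NoData, GrantLease)     => Lease(grant(ttl), ttl)
      case (_: Lease, RegrantLease) => Lease(grant(ttl), ttl) // a new lease id replaces the lost one
      case (current, _)             => current                 // events that do not apply are ignored
    }
}
```

Keeping the transition pure means it can be unit-tested without a running etcd; the actor would only perform the side effects (granting, rewriting health data) around it.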

implicit val ec: ExecutionContextExecutor = context.dispatcher

private val leaseTimeout = loadConfigOrThrow[Int](ConfigKeys.etcdLeaseTimeout).seconds
private var worker: Option[Cancellable] = None
Contributor (review comment)

I'm not sure this var is thread-safe, since it is set inside a Future.

Member Author (reply)

I removed the var by moving the worker into the state data.
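The concern above is that a `var` assigned inside a `Future` callback runs on another thread, outside the actor's single-threaded message processing. A common Akka remedy, and one way to read the fix, is to send the asynchronous result back to the actor as a message (for example with `pipeTo` from `akka.pattern.pipe`) and keep the worker in the state data, so that only the message handler ever replaces it. The sketch below models that idea without Akka: `Cancellable` is a local stand-in for `akka.actor.Cancellable`, and `FakeTimer`, `ActiveLease`, `LeaseGranted`, `Stop`, and `StateDataSketch.handle` are hypothetical names.

```scala
// Local stand-in for akka.actor.Cancellable so the sketch runs without Akka.
trait Cancellable {
  def cancel(): Boolean
  def isCancelled: Boolean
}

// Hypothetical timer handle used in place of a real scheduled keep-alive task.
final class FakeTimer extends Cancellable {
  private var cancelled = false
  def cancel(): Boolean = { val wasActive = !cancelled; cancelled = true; wasActive }
  def isCancelled: Boolean = cancelled
}

// Hypothetical state data: the worker travels with the lease instead of living in a `var`.
final case class ActiveLease(id: Long, ttl: Long, worker: Option[Cancellable])

sealed trait Msg
final case class LeaseGranted(id: Long, ttl: Long) extends Msg // async grant result, delivered as a message
case object Stop extends Msg

object StateDataSketch {
  /** Runs on the actor's own thread: the only place where state is read or replaced. */
  def handle(state: Option[ActiveLease], msg: Msg, startWorker: () => Cancellable): Option[ActiveLease] =
    msg match {
      case LeaseGranted(id, ttl) =>
        state.flatMap(_.worker).foreach(_.cancel()) // stop the old keep-alive before replacing it
        Some(ActiveLease(id, ttl, Some(startWorker())))
      case Stop =>
        state.flatMap(_.worker).foreach(_.cancel())
        None
    }
}
```

Because the Future never touches the state directly, there is no shared mutable field to race on; each message yields a new immutable `ActiveLease`.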

@KeonHee KeonHee changed the title [New Scheduler] Implement KeepAliveService [WIP] [New Scheduler] Implement KeepAliveService Feb 24, 2021
@KeonHee KeonHee changed the title [WIP] [New Scheduler] Implement KeepAliveService [New Scheduler] Implement KeepAliveService Feb 28, 2021
@KeonHee KeonHee changed the title [New Scheduler] Implement KeepAliveService [WIP] [New Scheduler] Implement KeepAliveService Feb 28, 2021
@KeonHee KeonHee changed the title [WIP] [New Scheduler] Implement KeepAliveService [New Scheduler] Implement KeepAliveService Feb 28, 2021
ningyougang (Contributor)

LGTM

KeonHee (Member Author) commented Mar 4, 2021

@bdoyle0182

I've addressed all of your review comments. Please review again :)

style95 (Member) commented Mar 9, 2021

I plan to merge this within 24 hours so that subsequent PRs can build on it.

bdoyle0182 (Contributor)

> I would merge this in 24 hours for subsequent PRs.

LGTM

@style95 style95 merged commit e05aa44 into apache:master Mar 11, 2021

5 participants