
[New Scheduler] Implement FPCInvokerReactive #5125

Merged: 3 commits merged into apache:master on Jan 13, 2022

Conversation

@ningyougang (Contributor) commented Jun 8, 2021

Description

This component initializes the invoker-side components when the invoker starts up. FPCInvokerReactive can be used
instead of InvokerReactive via the SPI mechanism here: https://github.com/apache/openwhisk/blob/master/common/scala/src/main/resources/reference.conf#L29
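
For illustration, switching implementations via SPI amounts to overriding the provider key in the deployment config; a sketch, assuming the InvokerProvider key from reference.conf (treat the exact key name as an assumption):

    whisk.spi {
      InvokerProvider = org.apache.openwhisk.core.invoker.FPCInvokerReactive
    }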

Related issue and scope

  • I opened an issue to propose and discuss this change (#????)

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Scheduler
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

val warmedContainerKeepingTimeout = Try {
  identity.limits.warmedContainerKeepingTimeout.map(Duration(_).toSeconds.seconds).get
}.getOrElse(containerProxyTimeoutConfig.keepingDuration)
(warmedContainerKeepingCount, warmedContainerKeepingTimeout)
@ningyougang (Contributor, Author) commented Jun 8, 2021

warmedContainerKeepingCount and warmedContainerKeepingTimeout exist to avoid cold starts: when a FunctionPullingContainerProxy timeout happens, we don't remove all of a namespace's warmed containers; we keep the right number of warmed containers running for warmedContainerKeepingTimeout.

If an invocation arrives after a long idle period, there is no need to create a new container; the retained warmed container is used instead.

For a given namespace, warmedContainerKeepingCount and warmedContainerKeepingTimeout can be configured in the subject's limits document, e.g.
(screenshot: a limits document containing the two fields)

If they are not configured in CouchDB, the default configuration is used (warmedContainerKeepingCount: 1, warmedContainerKeepingTimeout: "60 minutes").
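
As a minimal, self-contained sketch of the fallback behavior described above (the KeepLimits stand-in and the effectiveKeepLimits helper are illustrative names, not the PR's code):

    import scala.concurrent.duration._
    import scala.util.Try

    // Illustrative stand-in for the relevant slice of UserLimits.
    final case class KeepLimits(warmedContainerKeepingCount: Option[Int] = None,
                                warmedContainerKeepingTimeout: Option[String] = None)

    // Resolve the effective per-namespace values, falling back to the defaults
    // (count 1, timeout 60 minutes) when nothing is configured in CouchDB.
    def effectiveKeepLimits(limits: KeepLimits,
                            defaultCount: Int = 1,
                            defaultTimeout: FiniteDuration = 60.minutes): (Int, FiniteDuration) = {
      val count = limits.warmedContainerKeepingCount.getOrElse(defaultCount)
      val timeout = Try {
        limits.warmedContainerKeepingTimeout.map(Duration(_).toSeconds.seconds).get
      }.getOrElse(defaultTimeout)
      (count, timeout)
    }

    // effectiveKeepLimits(KeepLimits())                          == (1, 60.minutes)
    // effectiveKeepLimits(KeepLimits(Some(2), Some("24 hours"))) == (2, 24.hours)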

A contributor commented:

How does this work for individual functions? If it's by namespace, you can get into situations where one function starves all of the warm containers kept for that namespace, right?

@ningyougang (Contributor, Author) replied:

Regarding "get into situations where one function starves all of the warm containers kept for that namespace":
Let's assume warmedContainerKeepingCount is 2 and the same namespace has 4 actions (e.g. actionA, actionB, actionC, actionD).
Normally, these 4 actions compete for the warmedContainerKeepingCount of 2:

  • If 1 or 2 actions are invoked frequently at medium TPS,
    those actions' 2 warmed containers are kept for warmedContainerKeepingTimeout before removal, even if no more activations arrive.
  • If more than 2 actions are invoked frequently at medium TPS,
    only 2 actions get retained containers and the other 2 incur cold starts. For this situation, the user can configure a somewhat larger warmedContainerKeepingCount; it is the user's decision.
  • If actionA has high TPS,
    actionA will likely occupy all 2 kept warmed containers, and the other actions (actionB, actionC, actionD) will incur cold starts. There seems to be no way to solve this: even with a larger value, actionA would claim all the slots because its TPS is high.

Currently we configure this per namespace, not per individual function.

@ningyougang closed this on Jun 9, 2021
@ningyougang reopened this on Jun 9, 2021
@codecov-commenter commented Jun 9, 2021

Codecov Report

Attention: Patch coverage is 1.31579% with 225 lines in your changes missing coverage. Please review.

Project coverage is 73.67%. Comparing base (9633043) to head (a5acd88).
Report is 166 commits behind head on master.

Files with missing lines Patch % Lines
...he/openwhisk/core/invoker/FPCInvokerReactive.scala 0.00% 218 Missing ⚠️
...rg/apache/openwhisk/core/ack/HealthActionAck.scala 0.00% 5 Missing ⚠️
...in/scala/org/apache/openwhisk/common/Logging.scala 0.00% 1 Missing ⚠️
...pache/openwhisk/core/invoker/InvokerReactive.scala 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #5125       +/-   ##
===========================================
+ Coverage   45.64%   73.67%   +28.02%     
===========================================
  Files         234      236        +2     
  Lines       13389    13616      +227     
  Branches      551      571       +20     
===========================================
+ Hits         6112    10031     +3919     
+ Misses       7277     3585     -3692     


- storeActivations: Option[Boolean] = None)
+ storeActivations: Option[Boolean] = None,
+ warmedContainerKeepingCount: Option[Int] = None,
+ warmedContainerKeepingTimeout: Option[String] = None)
A contributor commented:

Maybe this affects non-scheduler code?

@ningyougang (Contributor, Author) commented Jun 10, 2021

Yes, it affects the code below:

case class UserLimits(invocationsPerMinute: Option[Int] = None,
                      concurrentInvocations: Option[Int] = None,
                      firesPerMinute: Option[Int] = None,
                      allowedKinds: Option[Set[String]] = None,
                      storeActivations: Option[Boolean] = None,
                      warmedContainerKeepingCount: Option[Int] = None,        // this is the new field
                      warmedContainerKeepingTimeout: Option[String] = None)   // this is the new field

If we don't want to touch the code above, one option is to make keepingCount and keepingTimeout fixed values this time rather than reading them from the db, e.g.

  • keepingCount: 1
  • keepingTimeout: 60.minutes

But then there is a problem: every frequently invoked action would keep at least 1 container alive for 60 minutes, which is obviously a waste of resources.

If we add the above 2 fields, the benefit is that both values can be configured in the db. Because the values are per namespace, all of a namespace's actions compete for the keepingCount.

So my opinion is that, even though it touches non-scheduler code, it is better to add the above 2 fields to case class UserLimits;
the non-scheduler code still runs fine despite the change.

@ningyougang (Contributor, Author) replied:

@bdoyle0182 @style95, since the above 2 fields (warmedContainerKeepingCount, warmedContainerKeepingTimeout) are added to case class UserLimits, do you guys have any opinion?

A contributor commented:

I'm not opposed to infecting non-scheduler code where absolutely necessary. I think it's unreasonable to suggest we can make such a large architectural change without touching existing code at all. So long as we're avoiding a breaking change I see no issue with this

A contributor commented:

@ningyougang @style95 also sorry, I haven't had time to dedicate to the scheduler recently, but I will review the remaining existing PRs this week.

@ningyougang (Contributor, Author) replied:

@bdoyle0182, thanks for your review. This just adds 2 fields to case class UserLimits; it is not a large architectural change.

A contributor commented:

Yea I just meant the entire new scheduler is a large architectural change so it's unreasonable to say we can't ever touch other components while implementing. We should avoid where possible, but it will be necessary in a couple situations like this one.

@ningyougang (Contributor, Author): Any comments?

acknowledegment: AcknowledegmentMessage): Future[Any] = {
  implicit val transid: TransactionId = tid

  logging.debug(this, s"health action is successfully invoked")
A contributor commented:

This debug shouldn't happen if the health action wasn't successfully invoked, right? Should check the response value before logging.

nit: health action was successfully invoked
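
A minimal sketch of the suggested guard, assuming the caller can tell whether the activation response succeeded (how that flag is extracted from the acknowledgement message is an assumption, not the FPCInvokerReactive API):

    import org.apache.openwhisk.common.{Logging, TransactionId}

    // Hypothetical sketch (not the PR's exact code): gate the log level on the
    // activation outcome instead of logging success unconditionally.
    class HealthAckLogger(logging: Logging) {
      def logHealthAck(responseIsSuccess: Boolean)(implicit transid: TransactionId): Unit =
        if (responseIsSuccess)
          logging.debug(this, "health action was successfully invoked")
        else
          logging.warn(this, "health action invocation failed")
    }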

@ningyougang (Contributor, Author) replied:

Updated accordingly.

@@ -155,6 +155,7 @@ whisk {
   #aka 'How long should a container sit idle until we kill it?'
   idle-container = 10 minutes
   pause-grace = 50 milliseconds
+  keeping-duration = 60 minutes
A contributor commented:

Is this the same thing as the idle-container config? Or is the warm container removed even if it has received an activation in the last 60 minutes?

@ningyougang (Contributor, Author) commented Aug 30, 2021

They are different configurations.

  • idle-container

    If a FunctionPullingContainerProxy actor has already executed some activations but hasn't received one for a while (the pause-grace period, e.g. 50 milliseconds), it goes to Paused and starts a timer:

    case _ -> Paused   => startSingleTimer(IdleTimeoutName, StateTimeout, idleTimeout)

    The timer fires StateTimeout after idleTimeout (the idle-container setting). StateTimeout is then handled by the code below, which checks whether this FunctionPullingContainerProxy is within warmedContainerKeepingCount:

    when(Paused) {
    ...
      case Event(StateTimeout, data: WarmData) =>
        (for {
          count <- getLiveContainerCount(data.invocationNamespace, data.action.fullyQualifiedName(false), data.revision)
          (warmedContainerKeepingCount, warmedContainerKeepingTimeout) <- getWarmedContainerLimit(
            data.invocationNamespace)
        } yield {
          logging.info(
            this,
            s"Live container count: ${count}, warmed container keeping count configuration: ${warmedContainerKeepingCount} in namespace: ${data.invocationNamespace}")
          if (count <= warmedContainerKeepingCount) {
            Keep(warmedContainerKeepingTimeout)
          } else {
            Remove
          }
        }).pipeTo(self)
        stay
    ...
    }

  • keeping-duration

    If the FunctionPullingContainerProxy is not within warmedContainerKeepingCount, it is removed; otherwise it is kept alive for an additional keeping-duration.
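
Distilled from the handler above, the keep-or-remove decision is a simple threshold check; this standalone sketch strips away the actor machinery (the names mirror the PR's code, but the free-standing function is illustrative):

    import scala.concurrent.duration._

    // Outcomes of StateTimeout handling in the Paused state.
    sealed trait Decision
    final case class Keep(timeout: FiniteDuration) extends Decision
    case object Remove extends Decision

    // If this container is within the namespace's keep quota, keep it warm for
    // warmedContainerKeepingTimeout more; otherwise remove it.
    def onIdleTimeout(liveContainerCount: Int,
                      warmedContainerKeepingCount: Int,
                      warmedContainerKeepingTimeout: FiniteDuration): Decision =
      if (liveContainerCount <= warmedContainerKeepingCount) Keep(warmedContainerKeepingTimeout)
      else Remove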

@bdoyle0182 (Contributor):
Few comments but mostly looks good to me

@bdoyle0182 (Contributor):
LGTM

@ningyougang merged commit e172168 into apache:master on Jan 13, 2022