
[New Scheduler] Implement FPCInvokerReactive #5125

Merged: 3 commits merged into apache:master on Jan 13, 2022

Conversation

@ningyougang (Contributor) commented Jun 8, 2021

Description

This component initializes the invoker-side components when the invoker starts up. FPCInvokerReactive can be used
instead of InvokerReactive via the SPI mechanism here: https://github.com/apache/openwhisk/blob/master/common/scala/src/main/resources/reference.conf#L29
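
For illustration, switching implementations via SPI amounts to overriding the provider key in the deployment config; a sketch, assuming the InvokerProvider key from reference.conf (treat the exact key name as an assumption):

    whisk.spi {
      InvokerProvider = org.apache.openwhisk.core.invoker.FPCInvokerReactive
    }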

Related issue and scope

  • I opened an issue to propose and discuss this change (#????)

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Scheduler
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

val warmedContainerKeepingTimeout = Try {
  identity.limits.warmedContainerKeepingTimeout.map(Duration(_).toSeconds.seconds).get
}.getOrElse(containerProxyTimeoutConfig.keepingDuration)
(warmedContainerKeepingCount, warmedContainerKeepingTimeout)
@ningyougang (Contributor, Author) commented Jun 8, 2021

warmedContainerKeepingCount and warmedContainerKeepingTimeout exist to avoid cold starts: when a FunctionPullingContainerProxy timeout happens, we don't remove all of a namespace's warmed containers; we keep the right number of warmed containers running for warmedContainerKeepingTimeout.

If an invocation arrives after a long idle period, there is no need to create a new container; the retained warmed container is used instead.

For a given namespace, warmedContainerKeepingCount and warmedContainerKeepingTimeout can be configured in the subject's limits document, e.g.
(screenshot: a limits document containing the two fields)

If they are not configured in CouchDB, the default configuration is used (warmedContainerKeepingCount: 1, warmedContainerKeepingTimeout: "60 minutes").
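
As a minimal, self-contained sketch of the fallback behavior described above (the KeepLimits stand-in and the effectiveKeepLimits helper are illustrative names, not the PR's code):

    import scala.concurrent.duration._
    import scala.util.Try

    // Illustrative stand-in for the relevant slice of UserLimits.
    final case class KeepLimits(warmedContainerKeepingCount: Option[Int] = None,
                                warmedContainerKeepingTimeout: Option[String] = None)

    // Resolve the effective per-namespace values, falling back to the defaults
    // (count 1, timeout 60 minutes) when nothing is configured in CouchDB.
    def effectiveKeepLimits(limits: KeepLimits,
                            defaultCount: Int = 1,
                            defaultTimeout: FiniteDuration = 60.minutes): (Int, FiniteDuration) = {
      val count = limits.warmedContainerKeepingCount.getOrElse(defaultCount)
      val timeout = Try {
        limits.warmedContainerKeepingTimeout.map(Duration(_).toSeconds.seconds).get
      }.getOrElse(defaultTimeout)
      (count, timeout)
    }

    // effectiveKeepLimits(KeepLimits())                          == (1, 60.minutes)
    // effectiveKeepLimits(KeepLimits(Some(2), Some("24 hours"))) == (2, 24.hours)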

A contributor commented:

How does this work for individual functions? If it's by namespace, you can get into situations where one function starves all of the warm containers kept for that namespace, right?

@ningyougang (Contributor, Author) replied:

Regarding "get into situations where one function starves all of the warm containers kept for that namespace":
Let's assume warmedContainerKeepingCount is 2 and the same namespace has 4 actions (e.g. actionA, actionB, actionC, actionD).
Normally, these 4 actions compete for the warmedContainerKeepingCount of 2:

  • If 1 or 2 actions are invoked frequently at medium TPS,
    those actions' 2 warmed containers are kept for warmedContainerKeepingTimeout before removal, even if no more activations arrive.
  • If more than 2 actions are invoked frequently at medium TPS,
    only 2 actions get retained containers and the other 2 incur cold starts. For this situation, the user can configure a somewhat larger warmedContainerKeepingCount; it is the user's decision.
  • If actionA has high TPS,
    actionA will likely occupy all 2 kept warmed containers, and the other actions (actionB, actionC, actionD) will incur cold starts. There seems to be no way to solve this: even with a larger value, actionA would claim all the slots because its TPS is high.

Currently we configure this per namespace, not per individual function.

@ningyougang closed this on Jun 9, 2021
@ningyougang reopened this on Jun 9, 2021
@codecov-commenter commented Jun 9, 2021

Codecov Report

Attention: Patch coverage is 1.31579% with 225 lines in your changes missing coverage. Please review.

Project coverage is 73.67%. Comparing base (9633043) to head (a5acd88).
Report is 166 commits behind head on master.

Files with missing lines Patch % Lines
...he/openwhisk/core/invoker/FPCInvokerReactive.scala 0.00% 218 Missing ⚠️
...rg/apache/openwhisk/core/ack/HealthActionAck.scala 0.00% 5 Missing ⚠️
...in/scala/org/apache/openwhisk/common/Logging.scala 0.00% 1 Missing ⚠️
...pache/openwhisk/core/invoker/InvokerReactive.scala 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #5125       +/-   ##
===========================================
+ Coverage   45.64%   73.67%   +28.02%     
===========================================
  Files         234      236        +2     
  Lines       13389    13616      +227     
  Branches      551      571       +20     
===========================================
+ Hits         6112    10031     +3919     
+ Misses       7277     3585     -3692     


- storeActivations: Option[Boolean] = None)
+ storeActivations: Option[Boolean] = None,
+ warmedContainerKeepingCount: Option[Int] = None,
+ warmedContainerKeepingTimeout: Option[String] = None)
A contributor commented:

Maybe this affects non-scheduler code?

@ningyougang (Contributor, Author) commented Jun 10, 2021

Yes, it affects the code below:

case class UserLimits(invocationsPerMinute: Option[Int] = None,
                      concurrentInvocations: Option[Int] = None,
                      firesPerMinute: Option[Int] = None,
                      allowedKinds: Option[Set[String]] = None,
                      storeActivations: Option[Boolean] = None,
                      warmedContainerKeepingCount: Option[Int] = None,        // this is the new field
                      warmedContainerKeepingTimeout: Option[String] = None)   // this is the new field

If we don't want to touch the code above, one option is to make keepingCount and keepingTimeout fixed values this time rather than reading them from the db, e.g.

  • keepingCount: 1
  • keepingTimeout: 60.minutes

But then there is a problem: every frequently invoked action would keep at least 1 container alive for 60 minutes, which is obviously a waste of resources.

If we add the above 2 fields, the benefit is that both values can be configured in the db. Because the values are per namespace, all of a namespace's actions compete for the keepingCount.

So my opinion is that, even though it touches non-scheduler code, it is better to add the above 2 fields to case class UserLimits;
the non-scheduler code still runs fine despite the change.

@ningyougang (Contributor, Author) replied:

@bdoyle0182 @style95, since the above 2 fields (warmedContainerKeepingCount, warmedContainerKeepingTimeout) are added to case class UserLimits, do you guys have any opinion?

A contributor commented:

I'm not opposed to infecting non-scheduler code where absolutely necessary. I think it's unreasonable to suggest we can make such a large architectural change without touching existing code at all. So long as we're avoiding a breaking change I see no issue with this

A contributor commented:

@ningyougang @style95 also sorry, I haven't had time to dedicate to the scheduler recently, but I will review the remaining existing PRs this week.

@ningyougang (Contributor, Author) replied:

@bdoyle0182, thanks for your review. This just adds 2 fields to case class UserLimits; it is not a large architectural change.

A contributor commented:

Yea I just meant the entire new scheduler is a large architectural change so it's unreasonable to say we can't ever touch other components while implementing. We should avoid where possible, but it will be necessary in a couple situations like this one.

@ningyougang (Contributor, Author): Any comments?

acknowledegment: AcknowledegmentMessage): Future[Any] = {
  implicit val transid: TransactionId = tid

  logging.debug(this, s"health action is successfully invoked")
A contributor commented:

This debug shouldn't happen if the health action wasn't successfully invoked, right? Should check the response value before logging.

nit: health action was successfully invoked
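
A minimal sketch of the suggested guard, assuming the caller can tell whether the activation response succeeded (how that flag is extracted from the acknowledgement message is an assumption, not the FPCInvokerReactive API):

    import org.apache.openwhisk.common.{Logging, TransactionId}

    // Hypothetical sketch (not the PR's exact code): gate the log level on the
    // activation outcome instead of logging success unconditionally.
    class HealthAckLogger(logging: Logging) {
      def logHealthAck(responseIsSuccess: Boolean)(implicit transid: TransactionId): Unit =
        if (responseIsSuccess)
          logging.debug(this, "health action was successfully invoked")
        else
          logging.warn(this, "health action invocation failed")
    }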

@ningyougang (Contributor, Author) replied:

Updated accordingly.

@@ -155,6 +155,7 @@ whisk {
   #aka 'How long should a container sit idle until we kill it?'
   idle-container = 10 minutes
   pause-grace = 50 milliseconds
+  keeping-duration = 60 minutes
A contributor commented:

Is this the same thing as the idle-container config? Or is the warm container removed even if it has received an activation in the last 60 minutes?

@ningyougang (Contributor, Author) commented Aug 30, 2021

They are different configurations.

  • idle-container

    If a FunctionPullingContainerProxy actor has already executed some activations but hasn't received one for a while (the pause-grace period, e.g. 50 milliseconds), it goes to Paused and starts a timer:

    case _ -> Paused   => startSingleTimer(IdleTimeoutName, StateTimeout, idleTimeout)

    The timer fires StateTimeout after idleTimeout (the idle-container setting). StateTimeout is then handled by the code below, which checks whether this FunctionPullingContainerProxy is within warmedContainerKeepingCount:

    when(Paused) {
    ...
      case Event(StateTimeout, data: WarmData) =>
        (for {
          count <- getLiveContainerCount(data.invocationNamespace, data.action.fullyQualifiedName(false), data.revision)
          (warmedContainerKeepingCount, warmedContainerKeepingTimeout) <- getWarmedContainerLimit(
            data.invocationNamespace)
        } yield {
          logging.info(
            this,
            s"Live container count: ${count}, warmed container keeping count configuration: ${warmedContainerKeepingCount} in namespace: ${data.invocationNamespace}")
          if (count <= warmedContainerKeepingCount) {
            Keep(warmedContainerKeepingTimeout)
          } else {
            Remove
          }
        }).pipeTo(self)
        stay
    ...
    }

  • keeping-duration

    If the FunctionPullingContainerProxy is not within warmedContainerKeepingCount, it is removed; otherwise it is kept alive for an additional keeping-duration.
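
Distilled from the handler above, the keep-or-remove decision is a simple threshold check; this standalone sketch strips away the actor machinery (the names mirror the PR's code, but the free-standing function is illustrative):

    import scala.concurrent.duration._

    // Outcomes of StateTimeout handling in the Paused state.
    sealed trait Decision
    final case class Keep(timeout: FiniteDuration) extends Decision
    case object Remove extends Decision

    // If this container is within the namespace's keep quota, keep it warm for
    // warmedContainerKeepingTimeout more; otherwise remove it.
    def onIdleTimeout(liveContainerCount: Int,
                      warmedContainerKeepingCount: Int,
                      warmedContainerKeepingTimeout: FiniteDuration): Decision =
      if (liveContainerCount <= warmedContainerKeepingCount) Keep(warmedContainerKeepingTimeout)
      else Remove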

@bdoyle0182 (Contributor):
Few comments but mostly looks good to me

@bdoyle0182 (Contributor):
LGTM

@ningyougang merged commit e172168 into apache:master on Jan 13, 2022