[New Scheduler] Implement FPCInvokerReactive #5125
Conversation
```scala
val warmedContainerKeepingTimeout = Try {
  identity.limits.warmedContainerKeepingTimeout.map(Duration(_).toSeconds.seconds).get
}.getOrElse(containerProxyTimeoutConfig.keepingDuration)
(warmedContainerKeepingCount, warmedContainerKeepingTimeout)
```
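For context, here is the fallback above in a self-contained form (a sketch; the values are illustrative, and `Duration(_)` parses strings like "60 minutes"):

```scala
import scala.concurrent.duration._
import scala.util.Try

object KeepingTimeoutFallback extends App {
  // What the limits document may (or may not) provide for this namespace:
  val configured: Option[String] = Some("120 minutes")
  // Static default, as in reference.conf (keeping-duration = 60 minutes):
  val defaultKeepingDuration: FiniteDuration = 60.minutes

  // Parse the per-namespace value if present and valid, else fall back.
  val warmedContainerKeepingTimeout: FiniteDuration = Try {
    configured.map(Duration(_).toSeconds.seconds).get
  }.getOrElse(defaultKeepingDuration)

  println(warmedContainerKeepingTimeout) // 7200 seconds
}
```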
`warmedContainerKeepingCount` and `warmedContainerKeepingTimeout` exist to avoid cold starts: when a `FunctionPullingContainerProxy` timeout happens, we do not remove all of a namespace's warmed containers; we keep the right number of warmed containers running for the `warmedContainerKeepingTimeout` period. If an invocation arrives after a long wait, there is no need to create a container; the kept warmed container is simply reused.
For a specific namespace, `warmedContainerKeepingCount` and `warmedContainerKeepingTimeout` can be configured in the subject's limits document, as in the example below. If they are not configured in CouchDB, the default configuration is used (`warmedContainerKeepingCount: 1`, `warmedContainerKeepingTimeout: "60 minutes"`).
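For example, a limits document in the subjects database might look like the following. This is a hypothetical sketch: the namespace name, the document id convention, and the extra `invocationsPerMinute` field are illustrative, not taken from this PR.

```json
{
  "_id": "someNamespace/limits",
  "invocationsPerMinute": 120,
  "warmedContainerKeepingCount": 2,
  "warmedContainerKeepingTimeout": "24 hours"
}
```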
How does this work for individual functions? If it's by namespace, you can get into situations where one function starves all of the warm containers kept for that namespace, right?
Regarding "get into situations where one function starves all of the warm containers kept for that namespace": let's assume `warmedContainerKeepingCount` is 2 and the same namespace has 4 actions (e.g. actionA, actionB, actionC, actionD). These 4 actions compete for the keep count of 2:

- If 1 or 2 actions are invoked frequently at medium TPS, those actions' 2 warmed containers are kept for `warmedContainerKeepingTimeout` before being removed, even if no more activations arrive.
- If more than 2 actions are invoked frequently at medium TPS, only 2 actions can hold non-removed containers; the other 2 actions will take cold starts. For this situation the user can configure a somewhat larger `warmedContainerKeepingCount`.
- If actionA has a very high TPS, actionA will occupy all 2 kept warmed containers, and the other actions (actionB, actionC, actionD) will take cold starts. There seems to be no way to solve this: even if we configure a larger value, actionA will consume it all because its TPS is high.

Currently we only configure this per namespace, not per individual function; a simplified sketch of the per-namespace keep-or-remove decision follows.
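A minimal, self-contained sketch of that decision (simplified from the proxy code quoted later in this thread; `Decision`, `Keep`, and `Remove` here are local stand-ins, not the actual OpenWhisk types):

```scala
import scala.concurrent.duration._

object KeepDecision extends App {
  sealed trait Decision
  final case class Keep(timeout: FiniteDuration) extends Decision
  case object Remove extends Decision

  // Keep a paused container warm only while the namespace is still within
  // its configured keep count; otherwise remove it.
  def decide(liveContainerCount: Int,
             keepingCount: Int,
             keepingTimeout: FiniteDuration): Decision =
    if (liveContainerCount <= keepingCount) Keep(keepingTimeout) else Remove

  println(decide(2, 2, 60.minutes)) // Keep(60 minutes): within the keep count
  println(decide(3, 2, 60.minutes)) // Remove: a third container is not kept
}
```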
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           master    #5125      +/-   ##
===========================================
+ Coverage   45.64%   73.67%   +28.02%
===========================================
  Files         234      236        +2
  Lines       13389    13616      +227
  Branches      551      571       +20
===========================================
+ Hits         6112    10031     +3919
+ Misses       7277     3585     -3692
```

☔ View full report in Codecov by Sentry.
Force-pushed from e5eb296 to 919acfc
```diff
-                      storeActivations: Option[Boolean] = None)
+                      storeActivations: Option[Boolean] = None,
+                      warmedContainerKeepingCount: Option[Int] = None,
+                      warmedContainerKeepingTimeout: Option[String] = None)
```
Maybe this affects non-scheduler code?
Yes, it affects the code below:

```scala
case class UserLimits(invocationsPerMinute: Option[Int] = None,
                      concurrentInvocations: Option[Int] = None,
                      firesPerMinute: Option[Int] = None,
                      allowedKinds: Option[Set[String]] = None,
                      storeActivations: Option[Boolean] = None,
                      warmedContainerKeepingCount: Option[Int] = None, // this is the new field
                      warmedContainerKeepingTimeout: Option[String] = None) // this is the new field
```

If we don't want to touch the code above, one option is to make the keeping count and keeping timeout fixed values for now, rather than reading them from the DB, e.g.:

- keepingCount: 1
- keepingTimeout: 60.minutes

But then every frequently invoked action would keep at least 1 container alive for 60 minutes, which is obviously a waste of resources. If we add the 2 fields, the benefit is that these values can be configured from the DB per namespace, so all of a namespace's actions compete for that keeping count.

So my opinion is that, even though it touches non-scheduler code, it is better to add the 2 fields to `case class UserLimits`; the non-scheduler code still runs fine with these fields added (a compatibility sketch follows).
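To illustrate why the non-scheduler code keeps working, here is a self-contained sketch. It assumes spray-json serialization with field names mirroring the case class, which is how OpenWhisk entities are typically serialized; the point is that an old limits document without the new fields still deserializes, with the new fields simply becoming `None`.

```scala
import spray.json._
import DefaultJsonProtocol._

object UserLimitsCompat extends App {
  case class UserLimits(invocationsPerMinute: Option[Int] = None,
                        concurrentInvocations: Option[Int] = None,
                        firesPerMinute: Option[Int] = None,
                        allowedKinds: Option[Set[String]] = None,
                        storeActivations: Option[Boolean] = None,
                        warmedContainerKeepingCount: Option[Int] = None,
                        warmedContainerKeepingTimeout: Option[String] = None)

  implicit val userLimitsFormat: RootJsonFormat[UserLimits] =
    jsonFormat7(UserLimits.apply)

  // A pre-existing document that knows nothing about the new fields:
  val oldDoc = """{ "invocationsPerMinute": 120 }""".parseJson.convertTo[UserLimits]

  // The new Option fields deserialize as None, so nothing breaks.
  assert(oldDoc.warmedContainerKeepingCount.isEmpty)
  assert(oldDoc.warmedContainerKeepingTimeout.isEmpty)
  println(oldDoc)
}
```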
@bdoyle0182 @style95, since the above 2 fields (`warmedContainerKeepingCount`, `warmedContainerKeepingTimeout`) are added to `case class UserLimits`, do you guys have any opinion?
I'm not opposed to infecting non-scheduler code where absolutely necessary. I think it's unreasonable to suggest we can make such a large architectural change without touching existing code at all. So long as we're avoiding a breaking change, I see no issue with this.
@ningyougang @style95 Also, sorry I haven't had time to dedicate to the scheduler recently, but I will review the remaining existing PRs this week.
@bdoyle0182, thanks for your review. This just adds 2 fields to `case class UserLimits`; it is not a large architectural change.
Yeah, I just meant the entire new scheduler is a large architectural change, so it's unreasonable to say we can't ever touch other components while implementing it. We should avoid it where possible, but it will be necessary in a couple of situations like this one.
Force-pushed from 919acfc to 854b676
Any comments?
```scala
acknowledegment: AcknowledegmentMessage): Future[Any] = {
  implicit val transid: TransactionId = tid
  ...
  logging.debug(this, s"health action is successfully invoked")
```
This debug shouldn't happen if the health action wasn't successfully invoked, right? We should check the response value before logging.
nit: "health action was successfully invoked"
Updated accordingly.
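For illustration, the suggested check might look roughly like this. This is a toy sketch: the `Ack` type and its `isSuccess` field are assumptions standing in for the real `AcknowledegmentMessage` API, and `println` stands in for the logger.

```scala
object HealthAckLogging extends App {
  // Stand-in for the acknowledgement; the real message type differs.
  final case class Ack(isSuccess: Boolean, detail: String = "")

  // Gate the success log on the actual outcome instead of logging it
  // unconditionally, as the review comment suggested.
  def logHealthAck(ack: Ack): Unit =
    if (ack.isSuccess) println("DEBUG: health action was successfully invoked")
    else println(s"WARN: health action invocation failed: ${ack.detail}")

  logHealthAck(Ack(isSuccess = true))                    // debug line
  logHealthAck(Ack(isSuccess = false, detail = "error")) // warn line
}
```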
```diff
@@ -155,6 +155,7 @@ whisk {
     #aka 'How long should a container sit idle until we kill it?'
     idle-container = 10 minutes
     pause-grace = 50 milliseconds
+    keeping-duration = 60 minutes
```
Is this the same thing as the `idle-container` config? Or is the warm container removed even if it has received an activation in the last 60 minutes?
They are different configurations.

- `idle-container`: if a `FunctionPullingContainerProxy` actor has already executed some activations but then receives none for some time (e.g. 50 milliseconds), it goes to `Paused` and creates a timer:

  ```scala
  case _ -> Paused => startSingleTimer(IdleTimeoutName, StateTimeout, idleTimeout)
  ```

  As we can see, it sends itself `StateTimeout` after `idleTimeout`. `StateTimeout` is then handled by the code below, which checks whether this `FunctionPullingContainerProxy` still falls within `warmedContainerKeepingCount`:

  ```scala
  when(Paused) {
    ...
    case Event(StateTimeout, data: WarmData) =>
      (for {
        count <- getLiveContainerCount(data.invocationNamespace,
                                       data.action.fullyQualifiedName(false),
                                       data.revision)
        (warmedContainerKeepingCount, warmedContainerKeepingTimeout) <-
          getWarmedContainerLimit(data.invocationNamespace)
      } yield {
        logging.info(
          this,
          s"Live container count: ${count}, warmed container keeping count configuration: ${warmedContainerKeepingCount} in namespace: ${data.invocationNamespace}")
        if (count <= warmedContainerKeepingCount) {
          Keep(warmedContainerKeepingTimeout)
        } else {
          Remove
        }
      }).pipeTo(self)
      stay
    ...
  }
  ```

- `keeping-duration`: if this `FunctionPullingContainerProxy` is not within `warmedContainerKeepingCount`, it is removed; otherwise it is kept alive for an additional `keeping-duration`. So the warm container is not removed just because 60 minutes have passed since its last activation: the keeping timer only starts after the idle timeout fires. A rough combined timeline is sketched below.
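Putting the two settings together (a simplified sketch using the defaults from the config diff above; the exact state-machine timing may differ slightly):

```scala
import scala.concurrent.duration._

object WarmContainerTimeline extends App {
  val pauseGrace      = 50.milliseconds // Running -> Paused after this idle gap
  val idleContainer   = 10.minutes      // Paused this long -> keep-or-remove check
  val keepingDuration = 60.minutes      // extra lifetime if within the keep count

  // A container that stays within warmedContainerKeepingCount survives
  // roughly this long after its last activation:
  val kept = pauseGrace + idleContainer + keepingDuration
  println(kept.toMinutes + " minutes") // ~70 minutes
}
```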
A few comments, but mostly looks good to me.
Force-pushed from 854b676 to f75b616
Force-pushed from f75b616 to a5acd88
LGTM
Description
This component initializes the invoker-side components when the invoker starts up. `FPCInvokerReactive` can be used instead of `InvokerReactive` via the SPI mechanism here: https://github.com/apache/openwhisk/blob/master/common/scala/src/main/resources/reference.conf#L29 (see the sketch below).

Related issue and scope
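The SPI override referenced in the description might look like this in a deployment's application.conf. The exact key name `whisk.spi.InvokerProvider` is an assumption based on the linked reference.conf, not confirmed by this PR.

```hocon
# Hypothetical override: select the FPC invoker via the SPI mechanism.
whisk.spi {
  InvokerProvider = org.apache.openwhisk.core.invoker.FPCInvokerReactive
}
```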
My changes affect the following components
Types of changes
Checklist: