[New Scheduler] Initial commit for the scheduler component #4983
private val etcdWorkerFactory = "" // TODO: TBD
val dataManagementService = "" // TODO: TBD
This component is in charge of storing data to ETCD.
val creationJobManagerFactory = "" // TODO: TBD
val containerManager = "" // TODO: TBD
This component is responsible for creating containers for a given action.
It relies on the creationJobManager to manage the container creation job.
val containerManager = "" // TODO: TBD
val memoryQueueFactory = "" // TODO: TBD
This is a factory to create memory queues.
In the new architecture, each action is given its own dedicated queue.
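As a rough illustration of "one dedicated queue per action", the sketch below keeps a map from an action's fully qualified name to its own queue and creates the queue lazily on first use. `SimpleQueue`, `PerActionQueues`, and `PerActionQueuesExample` are hypothetical names used only for this sketch; they are not the scheduler's actual MemoryQueue or its factory, and the action names are made up.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a memory queue (not the scheduler's real MemoryQueue).
final class SimpleQueue(val fqn: String) {
  private val pending = mutable.Queue.empty[String]
  def enqueue(activationId: String): Unit = pending.enqueue(activationId)
  def dequeueOption(): Option[String] =
    if (pending.isEmpty) None else Some(pending.dequeue())
  def size: Int = pending.size
}

// Hypothetical registry: the factory is called once per action, on first use.
final class PerActionQueues(factory: String => SimpleQueue) {
  private val queues = mutable.Map.empty[String, SimpleQueue]
  def queueFor(fqn: String): SimpleQueue = queues.getOrElseUpdate(fqn, factory(fqn))
  def queueCount: Int = queues.size
}

object PerActionQueuesExample {
  // Returns (number of queues, backlog of actionA, backlog of actionB).
  def run(): (Int, Int, Int) = {
    val registry = new PerActionQueues(fqn => new SimpleQueue(fqn))
    registry.queueFor("ns/pkg/actionA").enqueue("activation-1")
    registry.queueFor("ns/pkg/actionA").enqueue("activation-2")
    registry.queueFor("ns/pkg/actionB").enqueue("activation-3")
    (registry.queueCount,
     registry.queueFor("ns/pkg/actionA").size,
     registry.queueFor("ns/pkg/actionB").size)
  }
}
```

Because each action has its own queue, a backlog on one action does not sit in front of activations for another action.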
implicit val trasnid = TransactionId.containerCreation
val queueManager = "" // TODO: TBD
This is one of the major components. It is in charge of managing queues and coordinating requests among the scheduler, controllers, and invokers.
Map(
  servicePort -> 8080.toString,
  schedulerHost -> null,
  schedulerAkkaPort -> null,
The scheduler has two ports, one for akka-remote and the other for akka-grpc.
Seq(
  ("scheduler" + instanceId.asString, "actions", Some(ActivationEntityLimit.MAX_ACTIVATION_LIMIT)),
  ("creationAck" + instanceId.asString, "creationAck", Some(ActivationEntityLimit.MAX_ACTIVATION_LIMIT)))
The final goal is to remove Kafka from the critical path, but the scheduler still relies on Kafka for now.
Currently, activation messages are sent to the scheduler via the schedulerN topic, and container creation messages are sent to the invoker via the invokerN topic.
The controller posts to schedulerN and the scheduler posts to invokerN, correct?
Correct.
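The topic naming confirmed above can be sketched as follows. This is only an illustration: `TopicNames` and its method names are hypothetical, and it assumes instance ids are plain integers, as the "N" suffix suggests. Only the `"scheduler" + instanceId.asString` and `"creationAck" + instanceId.asString` patterns and the schedulerN / invokerN names come from the diff and the discussion.

```scala
// Hypothetical helper; not part of the PR.
object TopicNames {
  // Topic a controller posts activation messages to (schedulerN).
  def schedulerTopic(schedulerId: Int): String = s"scheduler$schedulerId"

  // Topic a scheduler posts container creation messages to (invokerN).
  def invokerTopic(invokerId: Int): String = s"invoker$invokerId"

  // Mirrors "creationAck" + instanceId.asString from the diff.
  def creationAckTopic(schedulerId: Int): String = s"creationAck$schedulerId"
}
```

For example, under these assumptions a controller would post activations for scheduler 0 to `TopicNames.schedulerTopic(0)`, and that scheduler would post container creation messages for invoker 3 to `TopicNames.invokerTopic(3)`.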
From what I understand so far, LGTM for an initial commit. Can always come back and add comments if I learn something later.
LGTM. Thank you @bdoyle0182 for the review as well.
@style95 a lot of the comments you're adding in the PR would make good comments in the code.
@@ -426,3 +426,16 @@ object EventMessage extends DefaultJsonProtocol {

  def parse(msg: String) = Try(format.read(msg.parseJson))
}

object StatusQuery
case class StatusData(invocationNamespace: String, fqn: String, waitingActivation: Int, status: String, data: String)
what is waiting activation?
what is status / why is it a string?
what is data?
are these basic (string & int) types for performance reasons?
Some comments or examples would be helpful here.
This is to query the queue status from the scheduler.
The following is an example.
While a MemoryQueue (an Akka FSM) is running, it can be in different combinations of status and data.
This is useful for figuring out the state of a queue when an issue occurs in a queue or the scheduler.
[
  ...
  {
    "data": "RunningData",
    "fqn": "whisk.system/elasticsearch/status-alarm@0.0.2",
    "invocationNamespace": "style95",
    "status": "Running",
    "waitingActivation": 1
  }
  ...
]
The waitingActivation field is the number of activations waiting in a queue.
Generally, this value is small or zero, as most activation messages are handled in time.
But if there is an issue (for example, containers are not provisioned in time, or the connection to other components is lost), many activations can be left waiting for processing.
Regarding the basic types, we just used strings because they are simple.
If we need to change them to specific types with proper serde, how about adding those when we add a feature that uses this data class?
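To make the role of waitingActivation concrete, here is a minimal sketch of how a caller might use the queue status to spot backlogged queues. The `StatusData` fields match the case class in the diff; `QueueStatusCheck`, `backlogged`, and the threshold are hypothetical and not part of the PR.

```scala
// Same fields as the StatusData case class in the diff.
final case class StatusData(
  invocationNamespace: String,
  fqn: String,
  waitingActivation: Int,
  status: String,
  data: String)

// Hypothetical helper: returns the queues whose backlog exceeds `threshold`.
object QueueStatusCheck {
  def backlogged(statuses: Seq[StatusData], threshold: Int): Seq[StatusData] =
    statuses.filter(_.waitingActivation > threshold)
}
```

With a threshold of a few activations, a healthy scheduler would be expected to return an empty list most of the time, since the value is normally small or zero.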
It seems there are some issues in the Travis job.
Codecov Report
@@ Coverage Diff @@
## master #4983 +/- ##
==========================================
- Coverage 83.70% 76.85% -6.86%
==========================================
Files 202 202
Lines 9808 9823 +15
Branches 413 420 +7
==========================================
- Hits 8210 7549 -661
- Misses 1598 2274 +676
Let's postpone merging this PR until we release OpenWhisk core 1.0.
I merged this now that core 1.0.0 has been released.
Description
This is the first change to add the new scheduler component.
A series of subsequent PRs will be opened by me and my team members.
We will start with modules that have fewer dependencies and move on to modules that depend heavily on the others.
We will add the "scheduler" label and the [New scheduler] prefix to the titles of these PRs.
Please refer to the issue for more information and discussion history.
You may find this useful as well: new scheduler design
JFYI, this scheduler is running in production at Naver.
There are many "TODO"s in this PR.
As we agreed to merge PRs incrementally with temporary commits, I left many parts of the original code as "TODO".
I wanted to give reviewers some hints about how it will work.