Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scheduler Integration with Mantis Master #183

Merged

Conversation

sundargates
Copy link
Collaborator

Context

This integrates the new federated scheduler factory that can schedule jobs on both mesos and the new task executors with the mantis master.

Checklist

  • ./gradlew build compiles code correctly
  • Added new tests where applicable
  • ./gradlew test passes all tests
  • Extended README or added javadocs where applicable
  • Added copyright headers for new files from CONTRIBUTING.md

@@ -537,7 +538,15 @@ public void onWorkerEvent(WorkerEvent workerEvent) {
if(!JobHelper.isTerminalWorkerEvent(workerEvent)) {
logger.warn("Event from Worker {} for a cluster {} that no longer exists. Terminate worker", workerEvent, workerEvent.getWorkerId().getJobCluster());
Optional<String> host = JobHelper.getWorkerHostFromWorkerEvent(workerEvent);
mantisScheduler.unscheduleAndTerminateWorker(workerEvent.getWorkerId(), host);
Optional<JobDefinition> archivedJobDefinition =
jobClusterInfoManager.getArchivedJobDefinition(workerEvent.getWorkerId().getJobId());
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Andyz26 can you check if this is valid?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@github-actions
Copy link

github-actions bot commented Apr 19, 2022

Unit Test Results

111 files  +2  111 suites  +2   5m 34s ⏱️ +8s
481 tests +4  462 ✔️ +4  19 💤 ±0  0 ±0 

Results for commit 3f7e7e0. ± Comparison against base commit 85d9687.

♻️ This comment has been updated with latest results.

@@ -143,6 +155,20 @@ public MasterMain(ConfigurationFactory configFactory, AuditEventSubscriber audit
final MantisJobStore mantisJobStore = new MantisJobStore(storageProvider);
final ActorRef jobClusterManagerActor = system.actorOf(JobClustersManagerActor.props(mantisJobStore, lifecycleEventPublisher), "JobClustersManager");

// Beginning of new stuff
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: improve the comment?

Optional<IMantisJobMetadata> completedJobOptional =
jobManager.getJobDataForCompletedJob(r.getWorkerId().getJobId());
if (completedJobOptional.isPresent()) {
JobDefinition jobDefinition =
Copy link
Collaborator

@Andyz26 Andyz26 Apr 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add log here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also shall we consider saving this state here into 'jobManager' in case some in-transit node keeps triggering this to create a new scheduler and attempt to unscheduled?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I wanted to take a stab at this. But, unfortunately, that class is super dense - so I thought I'll clean it up later.

import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import org.apache.flink.configuration.Configuration;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to be a short-term plan? I feel like this could cause some pain in the future upgrade if things changed on the flink side (I don't feel like this is something they will consider/maintain back-compat carefully though).

@sundargates sundargates merged commit 60d4194 into Netflix:master Apr 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants