Conversation

@philipphoffmann (Contributor)

What changes were proposed in this pull request?

Mesos agents by default will not pull Docker images that are already cached locally. In order to run Spark executors from mutable tags like `:latest`, this commit introduces a Spark setting, `spark.mesos.executor.docker.forcePullImage`. Setting this flag to `true` tells the Mesos agent to force pull the Docker image (the default is `false`, which is consistent with the previous implementation and Mesos' default behaviour).
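
The new flag would be passed at submit time like any other Spark configuration property. A hedged sketch, in which the master URL, image name, application class, and jar are placeholders rather than values from this PR:

```shell
# Hypothetical spark-submit invocation; only the forcePullImage property
# comes from this PR, everything else is illustrative.
spark-submit \
  --master mesos://zk://master:2181/mesos \
  --conf spark.mesos.executor.docker.image=myorg/spark:latest \
  --conf spark.mesos.executor.docker.forcePullImage=true \
  --class org.example.MyApp \
  my-app.jar
```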

How was this patch tested?

I ran a sample application including this change on a Mesos cluster and verified the correct behaviour both with and without force pulling the executor image. As expected, the image is force pulled when the flag is set.

@philipphoffmann (Contributor Author)

Please note that I had to upgrade the Mesos library version to implement this feature! Unfortunately, it's only available in Mesos 0.22 and above.


Docker image support, generally, will still work in 0.20.1. It's just the forcePullImage feature that will require 0.22.2.

@philipphoffmann (Contributor Author)


True, rewrote that part.

@philipphoffmann (Contributor Author)

I rebased again, against master. Anything else I can do here to get this merged?

Contributor


We don't really spell out the parameter name in other places in this call. I recommend just putting the value here.

@tnachen (Contributor)

tnachen commented Jun 2, 2016

Other than the style nit, I think it LGTM. Once you've fixed it, we need to ask a Spark committer to review it.

@tnachen (Contributor)

tnachen commented Jun 2, 2016

ok to test

@SparkQA

SparkQA commented Jun 2, 2016

Test build #59858 has finished for PR 13051 at commit 1b295e8.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@philipphoffmann (Contributor Author)

@tnachen I addressed your comment and also fixed the Scala style error from the build.

@SparkQA

SparkQA commented Jun 2, 2016

Test build #59864 has finished for PR 13051 at commit 80d1e3e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tnachen (Contributor)

tnachen commented Jun 2, 2016

@andrewor14 PTAL, this PR LGTM

@philipphoffmann (Contributor Author)

Rebased to master.

@SparkQA

SparkQA commented Jul 4, 2016

Test build #61719 has finished for PR 13051 at commit 7768589.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgummelt

mgummelt commented Jul 5, 2016

Can you add this to the CoarseMesosSchedulerBackend as well? This solution adds the force-pull feature to drivers, but not to executors for Spark jobs.



I don't like the redundancy in having both methods. Can we simplify here?

@philipphoffmann (Contributor Author)


fixed

@SparkQA

SparkQA commented Jul 14, 2016

Test build #62317 has finished for PR 13051 at commit c2d4b7f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


This seems like a gratuitous conversion. I'd rather retain the semantics of SparkConf, and just pass that in, unless there's a really good reason not to.

@philipphoffmann (Contributor Author)


I agree, and I have provided a better approach. Note that this method is also called from MesosClusterScheduler, which maintains the driver settings as a raw Map[String, String], so passing the SparkConf here is not an option, IMHO.
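
The constraint described here can be sketched as a helper that reads the flag from a raw settings map rather than a SparkConf. This is a hypothetical illustration, not the actual PR code; the method name is invented:

```scala
// Hypothetical sketch: MesosClusterScheduler keeps driver settings as a
// raw Map[String, String], so the lookup works on plain strings.
def forcePullImage(settings: Map[String, String]): Boolean =
  settings
    .get("spark.mesos.executor.docker.forcePullImage")
    .map(_.toBoolean)
    .getOrElse(false) // default matches the previous behaviour

// Flag present and set to "true" → true; absent → false.
val pulled = forcePullImage(
  Map("spark.mesos.executor.docker.forcePullImage" -> "true"))
val defaulted = forcePullImage(Map.empty)
```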

@philipphoffmann (Contributor Author)


Working on this, I noticed that the network parameter is not used anywhere. Just wanted to point this out; I'm not sure if there is WIP for adding this feature.


Yea, I'm not sure what happened here. Thanks for pointing it out.

@SparkQA

SparkQA commented Jul 18, 2016

Test build #62465 has finished for PR 13051 at commit 36b3258.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgummelt

@andrewor14 This LGTM. Can you take a look and merge?

@tnachen (Contributor)

tnachen commented Jul 25, 2016

@srowen Or if you could help :)

@srowen (Member)

srowen commented Jul 25, 2016

Merged to master

@JoshRosen (Contributor)

This patch just broke the build:

```
[error] /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.2/core/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala:268: not enough arguments for method verifyTaskLaunched: (driver: org.apache.mesos.SchedulerDriver, offerId: String)List[org.apache.mesos.Protos.TaskInfo].
[error] Unspecified value parameter offerId.
[error]     val launchedTasks = verifyTaskLaunched("o1").asScala
[error]                                           ^
[error] /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.2/core/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala:306: not enough arguments for method verifyTaskLaunched: (driver: org.apache.mesos.SchedulerDriver, offerId: String)List[org.apache.mesos.Protos.TaskInfo].
[error] Unspecified value parameter offerId.
[error]     val launchedTasks = verifyTaskLaunched("o1").asScala
[error]                                           ^
[warn] one warning found
[error] two errors found
[error] Compile failed at Jul 25, 2016 12:34:51 PM [37.534s]
```

Going to revert now.
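
The failure above is a Scala arity mismatch: a helper gained a required parameter, so existing single-argument call sites no longer compile. A minimal, hypothetical sketch of the general pattern (the method body and signature here are invented for illustration, not the actual suite code): giving the new parameter a default value keeps old call sites compiling.

```scala
// Hypothetical helper: adding a second parameter WITHOUT a default would
// break every existing call of the form verifyTaskLaunched("o1").
// A default value keeps the old single-argument calls valid.
def verifyTaskLaunched(offerId: String, count: Int = 1): String =
  s"verified $count task(s) for offer $offerId"

// Old call site still compiles thanks to the default:
val old = verifyTaskLaunched("o1")     // → "verified 1 task(s) for offer o1"
// New call site can override it:
val updated = verifyTaskLaunched("o2", 3)
```

Whether a default is appropriate here depends on the suite's design; the alternative is updating every call site in the same commit, which is what the follow-up PR does.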

@srowen (Member)

srowen commented Jul 25, 2016

Shoot, yeah, the PR builder tests passed, but that was 7 days ago. Sorry @philipphoffmann, can you try this one again after figuring out whatever caused this?

@philipphoffmann (Contributor Author)

philipphoffmann commented Jul 25, 2016

@srowen fixed the above issue and sent another request here: #14348
