@dragos
Contributor

@dragos dragos commented Mar 11, 2015

This is largely based on extracting the dynamic allocation parts from @tnachen's #3861.

@dragos
Contributor Author

dragos commented Mar 12, 2015

See #4990 for the second part of the original PR (external shuffle service)

@dragos dragos changed the title from "[MESOS][SPARK-6287] Add dynamic allocation to the coarse-grained Mesos scheduler" to "[SPARK-6287][MESOS] Add dynamic allocation to the coarse-grained Mesos scheduler" Mar 12, 2015
@tnachen
Contributor

tnachen commented Mar 23, 2015

@pwendell @andrewor14

@sryza
Contributor

sryza commented Mar 24, 2015

One thing to watch out for here is that ExecutorAllocationManager calculates the number of cores per executor based on the YARN default. If this differs from the Mesos default, something probably needs to change there.
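
To make the concern concrete, here is a minimal Java sketch of how an allocation manager might turn a task backlog into an executor target. The class name, method names and numbers are hypothetical and are not Spark's actual code. The point is only that the result depends on the assumed cores per executor, so a default that suits one cluster manager can skew the target on another.

```java
// Hypothetical sketch; names and numbers are illustrative, not Spark's implementation.
final class ExecutorTargetSketch {

    /** How many tasks one executor can run at the same time. */
    static int tasksPerExecutor(int coresPerExecutor, int cpusPerTask) {
        if (cpusPerTask < 1 || coresPerExecutor < cpusPerTask) {
            throw new IllegalArgumentException("an executor must fit at least one task");
        }
        return coresPerExecutor / cpusPerTask;
    }

    /** Executors needed to run every pending task concurrently (ceiling division). */
    static int executorsNeeded(int pendingTasks, int coresPerExecutor, int cpusPerTask) {
        int slots = tasksPerExecutor(coresPerExecutor, cpusPerTask);
        return (pendingTasks + slots - 1) / slots;
    }

    public static void main(String[] args) {
        int pendingTasks = 100;
        // Same backlog, two different assumptions about executor size.
        System.out.println(executorsNeeded(pendingTasks, 1, 1)); // 100
        System.out.println(executorsNeeded(pendingTasks, 8, 1)); // 13
    }
}
```

With the same 100 pending tasks, assuming 1 core per executor asks for 100 executors while assuming 8 cores asks for 13, which is why the assumed default matters.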

@andrewor14
Contributor

@dragos is this ready for review? Can you rebase to master first? I'm assuming this depends logically on your other patch #4990.

@dragos dragos force-pushed the issue/mesos-coarse-dynamicAllocation branch from 93b4fcc to 88f2b35 on April 16, 2015 11:52
@andrewor14
Contributor

@dragos is this still targeted for 1.4?

@dragos
Contributor Author

dragos commented Apr 29, 2015

Yes, I'd love to get this into 1.4. I'll push a rebased version.

@andrewor14
Contributor

Thanks, but it still has conflicts.

@dragos
Contributor Author

dragos commented Apr 29, 2015

Sorry, I didn't get to it.

@dragos dragos force-pushed the issue/mesos-coarse-dynamicAllocation branch from 88f2b35 to b2b51ad on May 4, 2015 16:16
@dragos
Contributor Author

dragos commented May 4, 2015

Unfortunately, the conflicts were really non-trivial and it took me quite a bit of time to bring this up to date. I guess it's too late for 1.4, but I'd like to push this forward as soon as possible. I don't want to go through another round of fixing conflicts. :) /cc @tnachen

@tnachen
Contributor

tnachen commented May 4, 2015

@dragos is this rebased now? It seems the 1.4 branch has been cut at this point; let me ask @andrewor14.

@dragos dragos force-pushed the issue/mesos-coarse-dynamicAllocation branch from b2b51ad to a68c3bf on May 5, 2015 09:44
@dragos
Contributor Author

dragos commented May 6, 2015

It's rebased. It can (still) be merged without conflicts :)

@andrewor14
Contributor

ok to test

@SparkQA

SparkQA commented May 7, 2015

Test build #32052 has finished for PR 4984 at commit a68c3bf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

This should go after the Scala imports.

@dragos
Contributor Author

dragos commented May 10, 2015

Thanks @tnachen for your comments. I'll look into them ASAP.

Contributor

Would this make the limit negative now?

Contributor Author

Oh, I forgot to push my latest changes. Yes, it's max now, as you suggested

@dragos dragos force-pushed the issue/mesos-coarse-dynamicAllocation branch from 4053462 to 4e831e2 on May 11, 2015 16:31
@SparkQA

SparkQA commented May 11, 2015

Test build #32397 has finished for PR 4984 at commit 4053462.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dragos
Contributor Author

dragos commented May 11, 2015

Hm, the test fails because it can't find the Mesos native library on java.library.path. Note that the call is triggered when trying to mock MesosSchedulerDriver, which loads the native library in its static initializer. I don't know whether this changed recently, but if it did, I'm not sure how to fix it.

sbt.ForkMain$ForkError: no mesos in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
    at java.lang.Runtime.loadLibrary0(Runtime.java:849)
    at java.lang.System.loadLibrary(System.java:1088)
    at org.apache.mesos.MesosNativeLibrary.load(MesosNativeLibrary.java:54)
    at org.apache.mesos.MesosNativeLibrary.load(MesosNativeLibrary.java:79)
    at org.apache.mesos.MesosSchedulerDriver.<clinit>(MesosSchedulerDriver.java:61)
    at sun.reflect.GeneratedSerializationConstructorAccessor1261.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.objenesis.instantiator.sun.SunReflectionFactoryInstantiator.newInstance(SunReflectionFactoryInstantiator.java:56)
    at org.objenesis.ObjenesisBase.newInstance(ObjenesisBase.java:73)
    at org.mockito.internal.creation.jmock.ClassImposterizer.createProxy(ClassImposterizer.java:111)
    at org.mockito.internal.creation.jmock.ClassImposterizer.imposterise(ClassImposterizer.java:51)
    at org.mockito.internal.util.MockUtil.createMock(MockUtil.java:52)
    at org.mockito.internal.MockitoCore.mock(MockitoCore.java:41)
    at org.mockito.Mockito.mock(Mockito.java:1014)
    at org.mockito.Mockito.mock(Mockito.java:909)
    at org.scalatest.mock.MockitoSugar$class.mock(MockitoSugar.scala:74)
    at org.apache.spark.scheduler.mesos.CoarseMesosSchedulerBackendSuite.mock(CoarseMesosSchedulerBackendSuite.scala:45)
    at org.apache.spark.scheduler.mesos.CoarseMesosSchedulerBackendSuite$$anonfun$2.apply$mcV$sp(CoarseMesosSchedulerBackendSuite.scala:92)
    at org.apache.spark.scheduler.mesos.CoarseMesosSchedulerBackendSuite$$anonfun$2.apply(CoarseMesosSchedulerBackendSuite.scala:91)
    at org.apache.spark.scheduler.mesos.CoarseMesosSchedulerBackendSuite$$anonfun$2.apply(CoarseMesosSchedulerBackendSuite.scala:91)
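
The trace shows the class initializer (`<clinit>`) running while the mock is being created. A minimal Java sketch of that effect follows, using hypothetical stand-in classes rather than Mesos or Mockito: instantiating any subclass initializes its superclass first, so a generated mock subclass still triggers the static block. Depending on a narrow interface and using a hand-written fake avoids touching the class at all. Whether that refactoring suits this test suite is an open question, not something decided in this thread.

```java
// Hypothetical stand-ins; no Mesos or Mockito classes are used here.
final class StaticInitSketch {

    /** Counts how often the simulated native library load has been attempted. */
    static int nativeLoadAttempts = 0;

    /** Stand-in for a driver whose static initializer loads a native library. */
    static class NativeBackedDriver {
        static {
            // A real class would call System.loadLibrary(...) here and could fail.
            nativeLoadAttempts++;
        }

        void start() { }
    }

    /** Roughly what a class-mocking library generates: a subclass of the target. */
    static class GeneratedMock extends NativeBackedDriver {
        @Override
        void start() { }
    }

    /** A narrow interface the code under test could depend on instead. */
    interface Driver {
        void start();
    }

    /** Test double that never references NativeBackedDriver. */
    static class FakeDriver implements Driver {
        @Override
        public void start() { }
    }

    public static void main(String[] args) {
        new FakeDriver().start();
        System.out.println(nativeLoadAttempts); // 0: the static block has not run

        new GeneratedMock(); // the superclass is initialized before the subclass
        System.out.println(nativeLoadAttempts); // 1: the static block ran once
    }
}
```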

@tnachen
Contributor

tnachen commented May 11, 2015

Interesting. We shouldn't actually try to load the library, since it might not be present on everyone's machine. I wonder why no one reported this before, though; I doubt everyone has the Mesos library installed already. Let me take a deeper look to see if we can do something about it.

@tnachen
Contributor

tnachen commented May 11, 2015

Oh, and nothing should have changed either.

@SparkQA

SparkQA commented May 11, 2015

Test build #32400 has finished for PR 4984 at commit 4e831e2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

@dragos Thanks for all your work and patience! I took a pass over it and I think the high level approach is reasonable. Most of my comments are relatively minor, but I have two bigger comments which I will describe here:

  • How does dynamic allocation work with max cores? It won't ever scale up beyond max cores, which may not be what the user expects. There are two things we can do about this: (1) Nothing - maybe this isn't a problem after all; if the user sets max cores we should trust the setting and respect it as a hard cap, or (2) Make dynamic allocation and max cores mutually exclusive. What are your thoughts, @dragos @tnachen?
  • We should do the clean up through the cluster manager instead. Currently, we rely on the driver exiting cleanly for the shuffle files to be cleaned up, but it is very common that the driver just terminates without stopping the SparkContext. In YARN, for instance, we rely on the NodeManager to tell the shuffle services when an application exits. I would assume that we can do something similar in Mesos too.
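
The two choices in the first bullet can be sketched in a few lines of Java. This is a minimal illustration with hypothetical names, and it assumes fixed-size executors, which is a simplification. Option (1) clamps whatever the allocation manager requests to what fits under max cores; option (2) rejects the combination up front.

```java
// Hypothetical sketch of the two options; not code from this patch.
final class MaxCoresCapSketch {

    /** Largest executor count that fits under the core cap. */
    static int executorLimit(int maxCores, int coresPerExecutor) {
        if (maxCores < 0 || coresPerExecutor < 1) {
            throw new IllegalArgumentException("invalid core settings");
        }
        return maxCores / coresPerExecutor;
    }

    /** Option (1): treat max cores as a hard cap and clamp the request to [0, limit]. */
    static int cappedTarget(int requestedExecutors, int maxCores, int coresPerExecutor) {
        int limit = executorLimit(maxCores, coresPerExecutor);
        return Math.max(0, Math.min(requestedExecutors, limit));
    }

    /** Option (2): refuse to run with both settings at once. */
    static void requireMutuallyExclusive(boolean dynamicAllocation, boolean maxCoresSet) {
        if (dynamicAllocation && maxCoresSet) {
            throw new IllegalArgumentException(
                "dynamic allocation and max cores cannot both be set");
        }
    }

    public static void main(String[] args) {
        System.out.println(cappedTarget(50, 16, 4)); // 4: the request is cut to the cap
        System.out.println(cappedTarget(2, 16, 4));  // 2: the request is under the cap
    }
}
```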

By the way, if you prefer, I'm completely open to splitting the shuffle file clean-up logic into a separate patch. This might allow us to merge the dynamic allocation part, which I think is mostly ready, sooner. :)

@dragos
Contributor Author

dragos commented Jul 1, 2015

@andrewor14 thanks for your thorough review. I'll address your comments tomorrow (my timezone).

@dragos
Contributor Author

dragos commented Jul 2, 2015

I rebased on master and addressed the latest round of review comments. Please let me know what you think, @andrewor14 /cc @tnachen.

I'm not sure how much more time I can pour into this, so it'd be great if we can get to some sort of closure. I'm diving into SPARK-7398 and my time in the following weeks will be limited. I can probably spend a couple of hours more on this PR, and I'm happy to jump on a call to iron out the latest issues (sometimes it's so much faster to look at the code together!).

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36411 has finished for PR 4984 at commit 1673c8b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class RemoveApplication extends BlockTransferMessage

@andrewor14
Contributor

@dragos I think this is pretty close. The one outstanding issue is still who cleans up the shuffle files.

This patch proposes to have the driver send clean up requests to all shuffle services when the application exits. As I mentioned earlier, there are no guarantees here even if we install shutdown hooks.
We cannot expect the driver to manage its own exit; this is really the responsibility of the cluster manager. My main objection is that for other modes like YARN and standalone mode (coming soon) we'll have two separate code paths to clean up the shuffle files, which leads to additional complexity.

I understand that the external shuffle service is started independently of Mesos. I believe the difficulty you are referring to is the following: even though the Mesos master knows when the framework exits, we still have to pass this information on to the shuffle service. There are two potential ways to make this work:

  • (1) Have the shuffle service query the master for framework status periodically. When the framework has exited, we clean up the corresponding application's shuffle files. @tnachen is there enough support on the Mesos side to make this happen?
  • (2) Initiate a long-running connection between the shuffle service and each driver. When the connection is closed on the other end, the shuffle service knows the corresponding application has exited.
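
Option (2) can be sketched as a small registry inside the shuffle service. The Java below is only an illustration under stated assumptions: every name is hypothetical, no real networking is involved, and the transport layer is assumed to call `onDriverDisconnected` whenever a driver's connection closes, whether the driver exited cleanly or crashed.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of option (2); not the shuffle service's real API.
final class ShuffleCleanupSketch {

    /** appId -> shuffle file paths registered for that application. */
    private final Map<String, Set<String>> filesByApp = new ConcurrentHashMap<>();

    /** Applications whose driver currently holds an open connection. */
    private final Set<String> connectedApps = ConcurrentHashMap.newKeySet();

    void onDriverConnected(String appId) {
        connectedApps.add(appId);
    }

    void registerShuffleFile(String appId, String path) {
        filesByApp.computeIfAbsent(appId, k -> ConcurrentHashMap.newKeySet()).add(path);
    }

    /**
     * Assumed to be invoked by the transport layer when the driver's connection
     * closes. Returns the paths that should now be deleted, in sorted order.
     */
    Set<String> onDriverDisconnected(String appId) {
        connectedApps.remove(appId);
        Set<String> removed = filesByApp.remove(appId);
        if (removed == null) {
            return Collections.emptySet();
        }
        return new TreeSet<>(removed);
    }

    boolean isConnected(String appId) {
        return connectedApps.contains(appId);
    }

    int trackedApps() {
        return filesByApp.size();
    }

    public static void main(String[] args) {
        ShuffleCleanupSketch service = new ShuffleCleanupSketch();
        service.onDriverConnected("app-1");
        service.registerShuffleFile("app-1", "/tmp/app-1/shuffle_0_0");
        service.registerShuffleFile("app-1", "/tmp/app-1/shuffle_0_1");

        // The driver goes away; its files become eligible for deletion.
        System.out.println(service.onDriverDisconnected("app-1"));
        System.out.println(service.trackedApps()); // 0
    }
}
```

Option (1) would replace the disconnect callback with a periodic poll of the master, which depends on the Mesos-side support asked about above.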

@andrewor14
Contributor

Lastly, I suggested this before but I would actually recommend that we split this patch into two separate issues. The first, which is Mesos dynamic allocation (SPARK-6287), is ready for merge. The second, which is cleaning up shuffle files in Mesos when the shuffle service is used (no JIRA yet), is still being discussed. For the latter, since you have other obligations and we still can't agree on a solution yet, it's worthwhile to address it separately without blocking the first one.

@dragos How does that sound?

@dragos
Contributor Author

dragos commented Jul 3, 2015

@andrewor14 the part about cleaning up files is indeed the part that deserves some discussion. I'm very open to alternative solutions, and you summarised the challenges very well. I agree that ideally, Mesos should perform the cleanup when the framework exits, but let's wait for @tnachen's opinion.

@SparkQA

SparkQA commented Jul 3, 2015

Test build #36496 has finished for PR 4984 at commit 9d88756.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class RemoveApplication extends BlockTransferMessage

Contributor Author

@tnachen the discussion is collapsed now by Github, probably because I rebased. Here is the relevant discussion.

@dragos dragos force-pushed the issue/mesos-coarse-dynamicAllocation branch from 9d88756 to cf97666 on July 8, 2015 15:56
@dragos
Contributor Author

dragos commented Jul 8, 2015

I made the changes we agreed on. I still have to rebase, which will take some time. The conflicts are not trivial, but I hope to have it done tomorrow.

@dragos
Contributor Author

dragos commented Jul 8, 2015

But you can still have a look and give it a go in the meantime.

@SparkQA

SparkQA commented Jul 8, 2015

Test build #36802 has finished for PR 4984 at commit cf97666.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

In case others are following, @dragos, @tnachen and I discussed a way forward offline. To merge this feature sooner, we decided to split this patch into two separate issues: (1) Mesos dynamic allocation, and (2) ensuring shuffle files are cleaned up in Mesos over time. This patch only tackles (1), and @tnachen will file a new JIRA for (2) and maybe fix it.

Once this is rebased and passes tests, I'll merge it into master.

@dragos dragos force-pushed the issue/mesos-coarse-dynamicAllocation branch from cf97666 to 4f30803 on July 9, 2015 08:13
@tnachen
Contributor

tnachen commented Jul 9, 2015

SPARK-8873 created for cleaning up shuffle files.

@dragos dragos force-pushed the issue/mesos-coarse-dynamicAllocation branch from 4f30803 to 39df8cd on July 9, 2015 09:27
@SparkQA

SparkQA commented Jul 9, 2015

Test build #36914 has finished for PR 4984 at commit 4f30803.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@dragos
Contributor Author

dragos commented Jul 9, 2015

I've tested this on a small Mesos cluster, and things work as before.

I uploaded a pre-packaged binary here to make it easier to test.

@SparkQA

SparkQA commented Jul 9, 2015

Test build #36921 has finished for PR 4984 at commit 39df8cd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

As discussed, this implementation currently doesn't clean up shuffle files, but this will be addressed separately by @tnachen in SPARK-8873. The dynamic allocation side LGTM so I'm merging this into master. Congratulations @dragos!

@asfgit asfgit closed this in c483059 Jul 9, 2015
@dragos
Contributor Author

dragos commented Jul 10, 2015

✌️
