Conversation


@mccheah mccheah commented Nov 11, 2016

Includes the following initial feature set:

  • Cluster mode with only Scala/Java jobs
  • Spark-submit support
  • Dynamic allocation

Does not include, most notably:

  • Client mode support
  • Proper testing on both the unit and integration level; integration tests are flaky



Nit: invert if/else


Name threads?


Can there be more than 1 status?

Author:

Don't think so.

Author:

Actually we have to look up the container status for the container hosting the driver... will need to think about this. Alternatively we can just wait for all container statuses to be ready.


Should probably check if it's a file (which implicitly checks existence) to ensure it's not a directory
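A minimal sketch of that check, assuming the path arrives as a plain string (helper name hypothetical). `Files.isRegularFile` returns true only for an existing regular file, so it rejects both missing paths and directories in one call:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileCheck {
    // isRegularFile implies existence, so no separate exists() call is needed,
    // and a directory at the same path is rejected.
    static boolean isUsableFile(String path) {
        return Files.isRegularFile(Paths.get(path));
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("cert", ".pem");
        System.out.println(isUsableFile(f.toString()));             // existing file -> true
        System.out.println(isUsableFile(f.getParent().toString())); // directory -> false
        Files.delete(f);
    }
}
```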


Name threads?

Executors.newCachedThreadPool(
    new ThreadFactoryBuilder()
      .setDaemon(true)
      .setNameFormat("kubernetes-executor-requests")
      .build())


I think we want name format of "kubernetes-executor-requests-%d"
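For reference, a plain-JDK sketch of the suggested naming scheme. Guava's `ThreadFactoryBuilder` with `setNameFormat("kubernetes-executor-requests-%d")` achieves the same effect; the helper name here is hypothetical:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedThreads {
    // Each new thread gets an incrementing suffix, e.g.
    // kubernetes-executor-requests-0, kubernetes-executor-requests-1, ...
    static ThreadFactory named(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.getAndIncrement());
            t.setDaemon(true);
            return t;
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool =
            Executors.newCachedThreadPool(named("kubernetes-executor-requests"));
        // Prints the numbered thread name, e.g. kubernetes-executor-requests-0
        pool.submit(() -> System.out.println(Thread.currentThread().getName())).get();
        pool.shutdown();
    }
}
```

Numbered names make thread dumps much easier to read when the pool holds many threads.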

WORKDIR /opt/spark

# TODO support spark.executor.extraClassPath
CMD ${JAVA_HOME}/bin/java -Dspark.executor.port=$SPARK_EXECUTOR_PORT -Xmx$SPARK_EXECUTOR_MEMORY -cp ${SPARK_HOME}/jars/\* org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url $SPARK_DRIVER_URL --executor-id $(dbus-uuidgen) --cores $SPARK_EXECUTOR_CORES --app-id $SPARK_APPLICATION_ID --hostname $HOSTNAME


Should set min heap as well: -Xms$SPARK_EXECUTOR_MEMORY to avoid a bunch of Full GC due to ergonomics as heap size dynamically adjusts
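A minimal sketch of the suggested change, reusing the env vars from the CMD above (with min heap pinned to max heap the JVM never resizes the heap, avoiding ergonomics-driven Full GCs):

```dockerfile
# Sketch: -Xms equal to -Xmx so the heap is allocated once at startup
CMD ${JAVA_HOME}/bin/java -Dspark.executor.port=$SPARK_EXECUTOR_PORT \
  -Xms$SPARK_EXECUTOR_MEMORY -Xmx$SPARK_EXECUTOR_MEMORY \
  -cp ${SPARK_HOME}/jars/\* org.apache.spark.executor.CoarseGrainedExecutorBackend \
  --driver-url $SPARK_DRIVER_URL --executor-id $(dbus-uuidgen) \
  --cores $SPARK_EXECUTOR_CORES --app-id $SPARK_APPLICATION_ID --hostname $HOSTNAME
```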


WORKDIR /opt/spark

CMD ${JAVA_HOME}/bin/java -Dspark.shuffle.service.port=$SPARK_SHUFFLE_SERVICE_PORT -Xmx1g -cp ${SPARK_HOME}/jars/\* org.apache.spark.deploy.ExternalShuffleService


Configurable heap? Also set min heap = max?
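A hedged sketch of both suggestions; `SPARK_SHUFFLE_SERVICE_MEMORY` is a hypothetical env var, not one this PR defines:

```dockerfile
# Sketch: heap size configurable at docker-run time, defaulting to the current 1g,
# with min heap pinned to max heap
ENV SPARK_SHUFFLE_SERVICE_MEMORY 1g
CMD ${JAVA_HOME}/bin/java -Dspark.shuffle.service.port=$SPARK_SHUFFLE_SERVICE_PORT \
  -Xms$SPARK_SHUFFLE_SERVICE_MEMORY -Xmx$SPARK_SHUFFLE_SERVICE_MEMORY \
  -cp ${SPARK_HOME}/jars/\* org.apache.spark.deploy.ExternalShuffleService
```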


@ash211 ash211 left a comment


Geez Maven is verbose with xml


case KUBERNETES_EXPOSE_DRIVER_PORT =>
  value.split("=", 2).toSeq match {
    case Seq(k, v) => exposeDriverPorts(k) = v.toInt

This will throw if v isn't toIntable?

Author:

Yes. Probably should have a try-catch for a more favorable message than what I presume the default will be though.
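A sketch of that try-catch, transposed to Java for illustration; the helper name and message wording are hypothetical:

```java
public class PortArg {
    // Wrap the parse so a bad value produces a descriptive error naming the
    // offending key, instead of a bare NumberFormatException.
    static int parsePort(String key, String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Port for " + key + " must be an integer, got: " + raw, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePort("spark-ui", "4040")); // prints 4040
    }
}
```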

protected final String KUBERNETES_APP_NAMESPACE = "--kubernetes-app-namespace";
protected final String KUBERNETES_CLIENT_CERT_FILE = "--kubernetes-client-cert-file";
protected final String KUBERNETES_CLIENT_KEY_FILE = "--kubernetes-client-key-file";
protected final String KUBERNETES_CA_CERT_FILE = "--kubernetes-ca-cert-file";

Alphabetize within this k8s group.

<artifactId>netty</artifactId>
<version>3.8.0.Final</version>
</dependency>


nit: space

* the locally-generated ID from the superclass.
* @return The application ID
*
* @return The application ID

nit: spacing changed unnecessarily

- Run spark shuffle service independently of a Spark job
- Run executor pods instead of replication controllers

foxish commented Nov 21, 2016

@ash211 @mccheah Shall we move these changes to a branch in https://github.com/foxish/spark/ and continue the discussions there on the various points we discussed last week?


ash211 commented Nov 21, 2016

@foxish yep! Just got legal signoff this morning so we'll be sending it over shortly


foxish commented Nov 21, 2016

Thanks!

HostPath volume will be used both on the shuffle service and the
executors that connect to it. The shuffle service picks up the files
written by the executors via the shared host volume.
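The shared-volume arrangement described above could be sketched as a pod-spec fragment like the following; all names and the mount path are hypothetical, not taken from the PR:

```yaml
# Both the shuffle-service pod and each executor pod mount the same hostPath,
# so shuffle files written by executors are visible to the shuffle service.
volumes:
  - name: spark-shuffle-dir
    hostPath:
      path: /tmp/spark-shuffle
containers:
  - name: executor          # the shuffle-service pod spec carries the same mount
    volumeMounts:
      - name: spark-shuffle-dir
        mountPath: /tmp/spark-shuffle
```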

mccheah commented Nov 22, 2016

@foxish can I get permission to push to your fork?


ash211 commented Nov 27, 2016

Code transferred to foxish#7 with discussion now happening there instead

@ash211 ash211 closed this Nov 27, 2016
@robert3005 robert3005 deleted the kubernetes branch December 7, 2016 16:04
ash211 added a commit that referenced this pull request Feb 16, 2017
* Create README to better describe project purpose

* Add links to usage guide and dev docs

* Minor changes
mccheah pushed a commit that referenced this pull request Apr 27, 2017
* Create README to better describe project purpose

* Add links to usage guide and dev docs

* Minor changes