This repository has been archived by the owner on Jan 9, 2020. It is now read-only.

[SPARK-18278][NOSUBMIT] Ongoing diff for Spark on Kubernetes (branch-2.2) #450

Open
wants to merge 231 commits into
base: branch-2.2
from



@ash211 commented on Aug 22, 2017

This pull request serves as a live diff of our work so far.

Upstream ticket: https://issues.apache.org/jira/browse/SPARK-18278

mccheah and others added 30 commits July 24, 2017 10:03
- Don't hold the raw secret bytes
- Add CPU limits and requests
The build process fails ScalaStyle checks otherwise.
* Use tar and gzip to archive shipped jars.

* Address comments

* Move files to resolve merge
* Use alpine and java 8 for docker images.

* Remove installation of vim and redundant comment
* Error messages when the driver container fails to start.

* Fix messages a bit

* Use timeout constant

* Delete the pod if it fails for any reason (not just timeout)

* Actually set submit succeeded

* Fix typo
* Documentation for the current state of the world.

* Adding navigation links from other pages

* Address comments, add TODO for things that should be fixed

* Address comments, mostly making images section clearer

* Virtual runtime -> container runtime
#20)

* Development workflow documentation for the current state of the world.

* Address comments.

* Clarified code change and added ticket link
* Prefixed executor pod names with the service name so they can be told apart in kubectl output

* Addressed comments
* Add kubernetes profile to travis yml file

* Fix long lines in CompressionUtils.scala
* Improved the example commands in running-on-k8s document.

* Fixed more example commands.

* Fixed typo.
* Support custom labels on the driver pod.

* Add integration test and fix logic.

* Fix tests

* Fix minor formatting mistake

* Reduce unnecessary diff
* A number of small tweaks to the MVP.

- Master protocol defaults to https if not specified
- Removed upload driver extra classpath functionality
- Added ability to specify main app resource with container:// URI
- Updated docs to reflect all of the above
- Add examples to Docker images, mostly for integration testing, but
they could also be useful for getting started without shipping anything

* Add example to documentation.
* Support setting the driver pod launching timeout.

Also increase the default value from 30s to 60s. The current value of
30s is too short to cover pulling the image from a public Docker registry
plus the container and JVM startup time.

* Use a better name for the default timeout.
* Use "extraTestArgLine" to pass extra options to scalatest.

This is needed because the "argLine" option of scalatest is set in pom.xml and
can't be overridden from the command line.

Ref #37

* Added a default value for extraTestArgLine

* Use a better name.

* Added a tip for this in the dev docs.
varunkatta and others added 30 commits September 14, 2017 21:46
…with a concurrent map. (#392)

* Replaced explicit synchronized access to hashmap with a concurrent map

* Removed usages of scala.collection.concurrent.Map
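Commit #392 replaces explicit `synchronized` access to a hash map with a concurrent map, and the follow-up drops `scala.collection.concurrent.Map`. A minimal sketch of that pattern, assuming a `java.util.concurrent.ConcurrentHashMap` keyed by executor ID; `ExecutorPodRegistry`, `register` and `remove` are hypothetical names for illustration, not code from this diff.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical registry: a ConcurrentHashMap stands in for a mutable HashMap
// that was previously guarded by explicit `synchronized` blocks.
class ExecutorPodRegistry {
  // Executor ID -> pod name (value type chosen only for illustration).
  private val pods = new ConcurrentHashMap[String, String]()

  // putIfAbsent is atomic, so no external lock is needed.
  // Returns true only if this call inserted the mapping.
  def register(executorId: String, podName: String): Boolean =
    pods.putIfAbsent(executorId, podName) == null

  // remove(key) is atomic; wraps the Java null result in an Option.
  def remove(executorId: String): Option[String] =
    Option(pods.remove(executorId))

  def size: Int = pods.size()
}
```

Each operation is a single atomic call on the map, so callers no longer need a shared lock around check-then-act sequences.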
#447)

* Fail submission if submitter-local files are provided without resource staging server URI

* Modified logic to validate only submitted jars; added orchestrator tests

* Incorporated feedback

* Fix failing test case
* Rename package to k8s

* Rename string constants
* Update POMs

* Update extensions/v1beta1.Deployment to apps

* Modified defaults on rss and ss
* Unit test for executorpodfactory

* Fix test

* Indentation fix

* Fix isEmpty and split between lines

* Address issues with multi-line code fragments

* Replace == with ===

* mock shuffleManager

* .kubernetes. => .k8s.

* move to k8s subdir

* fix package clause to k8s

* mock nodeAffinityExecutorPodModifier

* remove commented code

* move when clause to before{} block

* mock initContainerBootstrap, smallFiles

* insert actual logic into smallFiles mock

* verify application of nodeAffinityExecutorPodModifier

* avoid cumulative invocation

* Fixed env-var check to include values, removed mock for small files
…ic allocation mode (rebased) (#522)

* Use emptyDir volume mounts for executor local directories.

* Mount local dirs in the driver. Remove shuffle dir configuration.

* Arrange imports

* Fix style and integration tests.

* Add TODO note for volume types to change.

* Add unit test and extra documentation.

* Fix existing unit tests and add tests for empty dir volumes

* Remove extraneous constant
* initial R support without integration tests

* finished sparkR integration

* case-sensitive file names on Unix

* revert to the previous lowercase in the Dockerfile

* add to build-push-docker-images
…#528)

* Use the new initContainers field in Kubernetes 1.8

* Fixed the integration tests
* Use the driver pod IP address for spark.driver.bindAddress

* Addressed comments

* Addressed more comments

* Fixed broken DriverServiceBootstrapStepSuite
The quotes around $SPARK_CLASSPATH in the Dockerfiles prevent
the shell from expanding wildcard paths in cases where the
classpath is a single value like /opt/spark/jars/*
* Spark Submit Unit tests

* Improvements

* Add missing options

* Added check for jar
…ntainer (#564)

* Allow setting user-specified environments in the init-container

* Use driver/executor env keys for the init-container

* Mount user-specified driver/executor secrets

* Addressed comments
* first stage of PR #514 of just logic

* fixing build and unit test issues

* fixed integration tests

* fixed issue with executorPodFactory unit tests

* first series of PR comments

* handle most PR comments

* third round of PR comments

* initial round of comments and initial unit tests for deploy

* handled most of the comments and added test cases for pods

* resolve conflicts

* merge conflicts

* adding thread sleeping for RSS issues as a test

* resolving comments and unit testing

* regarding comments on PR
…597)

* Avoids adding duplicated secret volumes when init-container is used

Cherry-picked from apache#20148.

* Added the missing commit from upstream
* Create ISSUE_TEMPLATE.md

* add dev mailing list and jira links
* Add message to redirect PRs upstream if possible

We want to redirect community development upstream as much as possible. However, some contributions affect components (e.g. the shuffle server) that do not yet exist upstream. To handle this, we decided to add this message and leave the decision to developers, while encouraging them to submit upstream unless that isn't feasible.

* Add dev mailing list and jira links
The names are currently used when HadoopKerberosKeytabResolverStep
tries to save the Kerberos delegation token into a Kubernetes secret.

However, the current camel case values will cause a
io.fabric8.kubernetes.client.KubernetesClientException
stating the following:

a DNS-1123 subdomain must consist of lower case alphanumeric characters,
'-' or '.', and must start and end with an alphanumeric character (e.g.
'example.com', regex used for validation is
'[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
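The rejection above comes from Kubernetes' DNS-1123 subdomain validation. A minimal sketch of checking a candidate secret name against the regex quoted in the error, plus one possible way to derive a compliant name from a camel-case one. `Dns1123`, `isValidSubdomain` and `toDns1123` are hypothetical helpers, not code from this PR, and hyphenating plus lowercasing is an assumed strategy rather than the fix actually applied.

```scala
// Hypothetical helpers for DNS-1123 subdomain names (illustration only).
object Dns1123 {
  // Regex copied from the Kubernetes error message above.
  private val SubdomainRegex =
    "[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*".r

  // Kubernetes also caps DNS-1123 subdomains at 253 characters.
  private val MaxLength = 253

  // Full-string match (matches(), not find()), mirroring the API server check.
  def isValidSubdomain(name: String): Boolean =
    name.length <= MaxLength && SubdomainRegex.pattern.matcher(name).matches()

  // Insert '-' at camel-case boundaries, lowercase everything, then replace
  // any remaining disallowed character with '-'. Inputs that start or end
  // with a disallowed character may still need trimming afterwards.
  def toDns1123(name: String): String =
    name
      .replaceAll("([a-z0-9])([A-Z])", "$1-$2")
      .toLowerCase
      .replaceAll("[^-a-z0-9.]", "-")
}
```

For example, a camel-case name such as `kerberosKeytab` fails validation, while `Dns1123.toDns1123("kerberosKeytab")` yields `kerberos-keytab`, which passes.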