Skip to content

Conversation

@EnricoMi
Copy link
Contributor

What changes were proposed in this pull request?

Adds code to spin up an example Uniffle/Spark docker cluster using docker compose. This is used by the CI to test the example cluster setup.

Why are the changes needed?

This setup has a smaller footprint than the existing kubernetes example in deploy/kubernetes/integration-test/e2e/README.md, which is not trivial to setup. The new example only requires Docker to be installed, and can be spun up via

./deploy/docker-compose/build.sh
docker compose -f deploy/docker-compose/docker-compose.yml up

The Uniffle and Spark cluster can be used to interactively test Spark with Uniffle.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually and CI tested.

@codecov-commenter
Copy link

codecov-commenter commented Feb 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 54.19%. Comparing base (866f5ff) to head (573d101).
Report is 16 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1532      +/-   ##
============================================
- Coverage     54.20%   54.19%   -0.01%     
- Complexity     2808     2835      +27     
============================================
  Files           430      436       +6     
  Lines         24410    24524     +114     
  Branches       2082     2075       -7     
============================================
+ Hits          13231    13292      +61     
- Misses        10349    10404      +55     
+ Partials        830      828       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions
Copy link

github-actions bot commented Feb 20, 2024

Test Results

2 452 files  ±0  2 452 suites  ±0   4h 44m 29s ⏱️ -7s
  823 tests ±0    822 ✅ ±0   1 💤 ±0  0 ❌ ±0 
9 743 runs  ±0  9 729 ✅ ±0  14 💤 ±0  0 ❌ ±0 

Results for commit 573d101. ± Comparison against base commit 9142516.

♻️ This comment has been updated with latest results.

Copy link
Member

@zuston zuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for @EnricoMi proposing this, docker-compose deploy is easier than K8s.

But I'm think can we extend this scope from only CI test to provide one-stop uniffle cluster in a single machine? That means user can run external spark app(maybe on yarn) on this uniffle cluster. maybe some steps need to implement

  1. allow network open to everyone for uniffle on docker
  2. add example case to describe this

shell: bash
- name: Prepare example Spark app
run: |
cat << EOL > example.scala
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code here looks messy, how about putting into the spark cluster dockerfile directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way the CI is self-contained, it does not depend on a file somewhere outside .github, and you can directly see and edit the code that it executes. This code is different to what is given in the README.md because it is a bit more complex.

✔ Network rss_default Removed 0.4s
```

docker exec -it rss-spark-master-1 /opt/spark/bin/spark-shell \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot adding a new title? This paragraph looks strange.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@EnricoMi EnricoMi force-pushed the docker-compose-spark branch from b9204d5 to ef7a916 Compare February 21, 2024 06:54
@EnricoMi EnricoMi force-pushed the docker-compose-spark branch from 0b71763 to d63699d Compare February 21, 2024 08:34
@EnricoMi EnricoMi force-pushed the docker-compose-spark branch from 1c32be8 to 7147a06 Compare February 21, 2024 13:09
@EnricoMi EnricoMi force-pushed the docker-compose-spark branch from a20d26a to fd35ec2 Compare February 21, 2024 19:21
@EnricoMi EnricoMi force-pushed the docker-compose-spark branch from e54eea6 to 1bf9aa0 Compare February 21, 2024 19:41
@zuston
Copy link
Member

zuston commented Feb 22, 2024

CI is blocking @EnricoMi

@EnricoMi EnricoMi force-pushed the docker-compose-spark branch from f2b9075 to f0a6539 Compare February 22, 2024 10:54
@EnricoMi
Copy link
Contributor Author

@zuston CI fixed, not pretty but stable.

run: |
docker compose -f deploy/docker/docker-compose.yml up --detach --scale shuffle-server=4 --scale spark-worker=5

# for some reason, the uniffle containers terminate after first creation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know why this will happen?

Copy link
Contributor Author

@EnricoMi EnricoMi Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have found the issue. The start.sh script gave the process 10 seconds to start. Starting 5 containers at once on the hardware spec used by GHA turns out to take more than those 10 seconds, in fact it takes over 20 seconds whereas on my local machine it takes less than 6.

I have reworked the start.sh script and the way the CI waits for the containers to come up.

@EnricoMi EnricoMi force-pushed the docker-compose-spark branch 2 times, most recently from cd07b78 to 911610a Compare February 23, 2024 15:37
@EnricoMi EnricoMi force-pushed the docker-compose-spark branch from 911610a to 573d101 Compare February 23, 2024 16:04
Copy link
Member

@zuston zuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @EnricoMi

@zuston zuston merged commit a73dcb9 into apache:master Feb 27, 2024
@jerqi
Copy link
Contributor

jerqi commented Feb 27, 2024

@zuston Rust ci is so flaky. Could you fix it?

@zuston
Copy link
Member

zuston commented Feb 27, 2024

@zuston Rust ci is so flaky. Could you fix it?

Yes. I will

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants