
Support rolling spark.kubernetes.file.upload.path #6876

Open · pan3793 wants to merge 6 commits into master
Conversation

@pan3793 (Member) commented Dec 30, 2024

Why are the changes needed?

Vanilla Spark supports neither a rolling nor an expiration mechanism for spark.kubernetes.file.upload.path. If you use a file system that does not support TTL, e.g. HDFS, additional cleanup mechanisms are needed to prevent the files in this directory from growing indefinitely.

This PR proposes to let spark.kubernetes.file.upload.path support the placeholders {{YEAR}}, {{MONTH}} and {{DAY}}, and introduces a switch, kyuubi.kubernetes.spark.autoCreateFileUploadPath.enabled, to let the Kyuubi server create the directory with 777 permission automatically before submitting the Spark application.

For example, a user can set the following in kyuubi-defaults.conf to enable monthly rolling of spark.kubernetes.file.upload.path:

kyuubi.kubernetes.spark.autoCreateFileUploadPath.enabled=true
spark.kubernetes.file.upload.path=hdfs://hadoop-cluster/spark-upload-{{YEAR}}{{MONTH}}

Note that Spark creates a subdirectory s"spark-upload-${UUID.randomUUID()}" under spark.kubernetes.file.upload.path for each upload, so the administrator still needs to clean up the staging directory periodically.

For example:

hdfs://hadoop-cluster/spark-upload-202412/spark-upload-f2b71340-dc1d-4940-89e2-c5fc31614eb4
hdfs://hadoop-cluster/spark-upload-202412/spark-upload-173a8653-4d3e-48c0-b8ab-b7f92ae582d6
hdfs://hadoop-cluster/spark-upload-202501/spark-upload-3b22710f-a4a0-40bb-a3a8-16e481038a63

The administrator can safely delete hdfs://hadoop-cluster/spark-upload-202412 after 2025-01-01.
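
To make the mechanism concrete, here is a minimal Scala sketch of how placeholder resolution and directory pre-creation could work. UploadPathHelper and its methods are hypothetical names for illustration, not the PR's actual code:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.permission.FsPermission

// Hypothetical helper, for illustration only.
object UploadPathHelper {

  // Substitute {{YEAR}}, {{MONTH}} and {{DAY}} with the given date.
  def resolve(template: String, date: LocalDate = LocalDate.now()): String =
    template
      .replace("{{YEAR}}", date.format(DateTimeFormatter.ofPattern("yyyy")))
      .replace("{{MONTH}}", date.format(DateTimeFormatter.ofPattern("MM")))
      .replace("{{DAY}}", date.format(DateTimeFormatter.ofPattern("dd")))

  // Pre-create the resolved directory with 777 permission before spark-submit.
  def ensureDir(resolved: String, hadoopConf: Configuration): Unit = {
    val path = new Path(resolved)
    val fs = path.getFileSystem(hadoopConf)
    if (!fs.exists(path)) {
      fs.mkdirs(path)
      // mkdirs is subject to the configured umask, so set 777 explicitly.
      fs.setPermission(path, new FsPermission("777"))
    }
  }
}
```

With spark.kubernetes.file.upload.path=hdfs://hadoop-cluster/spark-upload-{{YEAR}}{{MONTH}}, resolve would return hdfs://hadoop-cluster/spark-upload-202412 for any date in December 2024.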

How was this patch tested?

New UTs are added.

Was this patch authored or co-authored using generative AI tooling?

No.

@codecov-commenter commented Dec 30, 2024

Codecov Report

Attention: Patch coverage is 0% with 40 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (e8cbff3) to head (6614bf2).
Report is 4 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ache/kyuubi/engine/spark/SparkProcessBuilder.scala | 0.00% | 33 Missing ⚠️ |
| ...in/scala/org/apache/kyuubi/config/KyuubiConf.scala | 0.00% | 5 Missing ⚠️ |
| ...kyuubi/engine/spark/SparkBatchProcessBuilder.scala | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@          Coverage Diff           @@
##           master   #6876   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         688     688           
  Lines       42545   42589   +44     
  Branches     5800    5805    +5     
======================================
- Misses      42545   42589   +44     


@turboFei (Member) commented:

will check it today.

@turboFei (Member) commented Dec 31, 2024

Note that:

kyuubi.kubernetes.spark.autoCreateFileUploadPath.enabled=true
spark.kubernetes.file.upload.path=hdfs://hadoop-cluster/spark-upload-{{YEAR}}{{MONTH}}

Spark creates a subdirectory s"spark-upload-${UUID.randomUUID()}" under spark.kubernetes.file.upload.path for each upload, so the administrator still needs to clean up the staging directory periodically.

For example:

hdfs://hadoop-cluster/spark-upload-202412/spark-upload-f2b71340-dc1d-4940-89e2-c5fc31614eb4
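
For illustration, the full staging path is the configured (resolved) upload path plus a random per-upload subdirectory, roughly like this sketch (not Spark's actual code):

```scala
import java.util.UUID

// Roughly how the per-upload staging directory is derived (illustrative).
val uploadPath = "hdfs://hadoop-cluster/spark-upload-202412"
val stagingDir = s"$uploadPath/spark-upload-${UUID.randomUUID()}"
```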

Comment on lines +51 to +56
Vanilla Spark supports neither a rolling nor an expiration mechanism for `spark.kubernetes.file.upload.path`. If you use
a file system that does not support TTL, e.g. HDFS, additional cleanup mechanisms are needed to prevent the files in this
directory from growing indefinitely. Since Kyuubi v1.11.0, you can configure `spark.kubernetes.file.upload.path` with the
placeholders `{{YEAR}}`, `{{MONTH}}` and `{{DAY}}`, and enable `kyuubi.kubernetes.spark.autoCreateFileUploadPath.enabled`
to let the Kyuubi server create the directory with 777 permission automatically before submitting the Spark application.

Contributor commented:

It seems that our current implementation does not solve the problem of file growth. Will this issue be solved in subsequent PRs?

@pan3793 (Member Author) replied:

It adds rolling support for spark.kubernetes.file.upload.path, for example:

spark.kubernetes.file.upload.path=hdfs://hadoop-testing/spark-upload-{{YEAR}}{{MONTH}}
hdfs://hadoop-testing/spark-upload-202412
hdfs://hadoop-testing/spark-upload-202501

The admin can safely delete hdfs://hadoop-testing/spark-upload-202412 after 2025-01-01.

@pan3793 (Member Author) commented Dec 31, 2024

Spark creates a subdirectory s"spark-upload-${UUID.randomUUID()}" under spark.kubernetes.file.upload.path for each upload, so the administrator still needs to clean up the staging directory periodically.

Yes, exactly. Previously, it was unsafe to delete the whole spark.kubernetes.file.upload.path directory because in-flight Spark app submissions might still be using it.
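
With monthly rolling in place, periodic cleanup reduces to deleting fully-elapsed month directories. A sketch of such an admin job in Scala, assuming the spark-upload-yyyyMM layout described in this PR (UploadPathCleaner is a hypothetical name, not part of Kyuubi):

```scala
import java.time.YearMonth
import java.time.format.DateTimeFormatter

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical admin cleanup job, for illustration only.
object UploadPathCleaner {
  private val MonthFmt = DateTimeFormatter.ofPattern("yyyyMM")

  // Delete spark-upload-yyyyMM directories for months that have fully elapsed.
  def clean(baseDir: String, hadoopConf: Configuration): Unit = {
    val base = new Path(baseDir) // e.g. the parent dir of spark-upload-202412
    val fs = base.getFileSystem(hadoopConf)
    val current = YearMonth.now()
    fs.listStatus(base).filter(_.isDirectory).map(_.getPath).foreach { dir =>
      val suffix = dir.getName.stripPrefix("spark-upload-")
      if (dir.getName.startsWith("spark-upload-") && suffix.matches("\\d{6}") &&
        YearMonth.parse(suffix, MonthFmt).isBefore(current)) {
        fs.delete(dir, true) // recursive: removes all UUID subdirectories too
      }
    }
  }
}
```

Keeping the current month's directory untouched is what makes the deletion safe: only directories that no new submission can write into are removed.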

@turboFei (Member) left a comment:

LGTM, thanks

@zwangsheng (Contributor) left a comment:

Thanks, LGTM
