Support rolling spark.kubernetes.file.upload.path
#6876
base: master
Conversation
Codecov Report — Attention: Patch coverage is 0.00%. Additional details and impacted files:

```
@@           Coverage Diff            @@
##           master    #6876    +/-  ##
========================================
  Coverage    0.00%    0.00%
========================================
  Files         688      688
  Lines       42545    42589     +44
  Branches     5800     5805      +5
========================================
- Misses      42545    42589     +44
```

View full report in Codecov by Sentry.
Will check it today.
kyuubi-server/src/main/scala/org/apache/kyuubi/engine/spark/SparkProcessBuilder.scala
The vanilla Spark supports neither a rolling nor an expiration mechanism for `spark.kubernetes.file.upload.path`. If you use a file system that does not support TTL, e.g. HDFS, additional cleanup mechanisms are needed to prevent the files in this directory from growing indefinitely. Since Kyuubi v1.11.0, you can configure `spark.kubernetes.file.upload.path` with the placeholders `{{YEAR}}`, `{{MONTH}}`, and `{{DAY}}`, and enable `kyuubi.kubernetes.spark.autoCreateFileUploadPath.enabled` to let the Kyuubi server automatically create the directory with 777 permission before submitting the Spark application.
It seems that our current implementation does not solve the problem of file growth. Will this issue be solved in subsequent PRs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It adds rolling support for `spark.kubernetes.file.upload.path`. For example, with

```
spark.kubernetes.file.upload.path=hdfs://hadoop-testing/spark-upload-{{YEAR}}{{MONTH}}
```

the resolved upload directory rolls monthly:

```
hdfs://hadoop-testing/spark-upload-202412
hdfs://hadoop-testing/spark-upload-202501
```

The admin can safely delete `hdfs://hadoop-testing/spark-upload-202412` after 20250101.
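The cleanup itself remains a manual (or cron-driven) task for the admin. A hedged sketch of what a monthly cleanup job could compute, assuming the monthly pattern above; the cluster name is illustrative, and the `hdfs` command is only echoed here rather than executed:

```shell
# Compute the previous month's suffix with GNU date,
# e.g. "202412" when run any time during January 2025.
last_month=$(date -d "$(date +%Y-%m-01) -1 month" +%Y%m)

# The admin would then delete the rolled-over staging directory
# (printed here for illustration, not executed):
echo "hdfs dfs -rm -r -skipTrash hdfs://hadoop-testing/spark-upload-${last_month}"
```

Anchoring on the first day of the current month before subtracting one month avoids the end-of-month drift that `date -d "-1 month"` alone can produce.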
kyuubi-server/src/main/scala/org/apache/kyuubi/engine/spark/SparkProcessBuilder.scala
Yes, exactly. Previously, it was unsafe to delete the whole upload directory.
LGTM, thanks
Thanks, LGTM
Why are the changes needed?

The vanilla Spark supports neither a rolling nor an expiration mechanism for `spark.kubernetes.file.upload.path`. If you use a file system that does not support TTL, e.g. HDFS, additional cleanup mechanisms are needed to prevent the files in this directory from growing indefinitely.

This PR proposes to let `spark.kubernetes.file.upload.path` support the placeholders `{{YEAR}}`, `{{MONTH}}`, and `{{DAY}}`, and introduces a switch `kyuubi.kubernetes.spark.autoCreateFileUploadPath.enabled` to let the Kyuubi server automatically create the directory with 777 permission before submitting the Spark application.

For example, the user can configure the below configurations in `kyuubi-defaults.conf` to enable monthly rolling support for `spark.kubernetes.file.upload.path`.
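A minimal sketch of such a `kyuubi-defaults.conf` fragment, assuming an illustrative HDFS namespace `hadoop-cluster` (the exact example block was not preserved in this page):

```
kyuubi.kubernetes.spark.autoCreateFileUploadPath.enabled=true
spark.kubernetes.file.upload.path=hdfs://hadoop-cluster/spark-upload-{{YEAR}}{{MONTH}}
```

With this, each month's submissions land under a distinct `spark-upload-YYYYMM` directory that Kyuubi pre-creates before submission.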
Note that Spark creates a sub-directory `s"spark-upload-${UUID.randomUUID()}"` under the `spark.kubernetes.file.upload.path` for each upload, so the administrator still needs to clean up the staging directories periodically. For example, the administrator can safely delete `hdfs://hadoop-cluster/spark-upload-202412` after 20250101.

How was this patch tested?
New UTs are added.
Was this patch authored or co-authored using generative AI tooling?
No.