Description
When running Spark Submit in cluster mode, users need to bundle their Python packages and the xai_component library. This PR provides the `ZipDirectory` component, which can do this in Python. It creates a new zip if `zip_fn` doesn't exist (if `zip_fn` is not provided, it uses xircuits_name as the filename) and adds to the zip if the file already exists. It also has an option to include or exclude the root dir. For example, if you'd like to zip your venv site-packages, you could zip it like:
or you can zip it excluding site-packages:
I've also added a xircuits file to the examples folder for you to try (since people would need to zip their packages before running Spark Submit anyway).
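The behaviour described above (create the zip if it is missing, append if it exists, optionally keep the root dir in the archive paths) can be sketched with the standard library alone. This is a minimal illustration, not the component's actual code: the function name `zip_directory` and the parameter `dir_name` are hypothetical, and progress reporting is omitted.

```python
import os
import zipfile


def zip_directory(dir_name, zip_fn, include_dir=True):
    """Hypothetical helper: add every file under dir_name to zip_fn."""
    top = os.path.abspath(dir_name)
    # With include_dir=True, archive paths start at the parent of dir_name,
    # so the root dir itself appears in the zip; otherwise they start inside it.
    base = os.path.dirname(top) if include_dir else top
    # Mode "a" appends to an existing archive and creates it when missing.
    with zipfile.ZipFile(zip_fn, "a", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(top):
            for name in files:
                path = os.path.join(root, name)
                zf.write(path, arcname=os.path.relpath(path, base))
    return zip_fn
```

With a directory `site-packages/foo.py`, `include_dir=True` would store the entry as `site-packages/foo.py`, while `include_dir=False` would store it as `foo.py`. The sketch assumes `zip_fn` is written outside `dir_name`; otherwise the archive would try to include itself.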
Pull Request Type
Type of Change
Tests
You can either start from a new .xircuits or use `UtilsZipDirectory.xircuits`. Set the `zip_fn` string or leave it blank. Run it.
Check Exclude - Include Root Dir
Set the `zip_fn` string or leave it blank. If you use `UtilsZipDirectory.xircuits`, the output should be:
include_dir = True:
include_dir = False:
Tested on?
Notes
The component uses `tqdm` + `os.walk`, which I've read might consume unlimited RAM if there's a circular symlink path... but that's unlikely to happen, right?
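On the symlink concern: `os.walk` defaults to `followlinks=False`, so it does not descend into symlinked directories unless asked to, and the Python docs warn about infinite recursion only when `followlinks=True`. If following links were ever needed, one possible guard is to track the resolved path of each visited directory. The helper below is a hypothetical sketch, not part of this PR.

```python
import os


def walk_files(top, followlinks=False):
    """Hypothetical helper: yield file paths under top, visiting each real directory once."""
    seen = set()
    for root, dirs, files in os.walk(top, followlinks=followlinks):
        real = os.path.realpath(root)
        if real in seen:
            # Already visited via another path (e.g. a circular symlink):
            # clear dirs in place so os.walk does not descend further.
            dirs[:] = []
            continue
        seen.add(real)
        for name in files:
            yield os.path.join(root, name)
```

With the default `followlinks=False` the guard never triggers; it only matters when links are followed.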