Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature implementation for zip & unzip functionality #316

Conversation

chaitanyawaikar
Copy link
Contributor

@chaitanyawaikar chaitanyawaikar commented Sep 29, 2024

This PR addresses the issue and adds the following functionalities

  1. Zipping of files and folders
  2. Unzip files
  3. Unit test cases covering all the zip and unzip functionality
  4. Documentation support for zip and unzip functions

Summary of Changes:

Zip Functionality:

  • Added options to include (-i), exclude (-x), and delete (-d) files from the zip archive using file name patterns (provided as List[String]).
  • Converted exclusion and inclusion patterns from String to Regex for pattern matching.
  • Enhanced the zipping logic to support:
    - Excluding files/folders that match specific patterns.
    - Including only specified files.
    - Deleting files that match patterns from an existing zip.
  • Added support for appending files to an existing zip archive.

Unzip Functionality

  • Added support for excluding specific files or folders during extraction using patterns (-x option).
  • Implemented a listing feature (-l option) that lists the contents of the zip archive without extracting files.

… file. The zipIn functionality will create a new file `archive.zip` if the name of the zip file is absent
…irectory if the target directory is not present.
- Rename `zipIn` to `zip`
- Remove `options` parameter and replace with individual flags i.e `appendToExisting`
- Include only given files during zipping (similar to -i option in zip)
- Exclude given files during zipping (similar to -x option in zip)
- Delete given files from the zip archieve (similar to -d option in zip)

Tests added along with the implementation
- listing files without unzipping
- excluding certain files/file regex during unzipping
@lihaoyi
Copy link
Member

lihaoyi commented Sep 30, 2024

@chaitanyawaikar the docs and impl look great overall. Some compilation issues in CI, and we need to have Scaladoc on zip.apply and unzip.apply (mostly duplicating the stuff in readme.adoc is fine)

@lihaoyi
Copy link
Member

lihaoyi commented Sep 30, 2024

Can we add an API zip.stream to zip a folder and return a geny.Writable/geny.Readable, and unzip.stream to unzip a folder read from a geny.Writable/geny.Readable? That would help this zip/unzip API interop with the other com-lihaoyi libraries which generally are standardized on that interface

@lihaoyi
Copy link
Member

lihaoyi commented Sep 30, 2024

Can we also add a test that filesystem permissions are preserved through a zip/unzip round trip. Is that even possible?

We should also add a test demonstrating the semantics of mtimes: Do we preserve them on creation? Zero them out? Do they get reset to "now" on unzipping? I think we probably need to add flags with all these options, since they are useful in different scenarios (e.g. build tools typically want them zeroed out for reproducibility, but other scenarios don't)

@chaitanyawaikar
Copy link
Contributor Author

@lihaoyi thanks for the detailed review comments. I have incorporated the following review comments

  1. Scaladocs on zip.apply and unzip.apply Comment
  2. zip.stream and unzip.stream functionality Comment

However, could you please elaborate more on the comment. I was a bit unsure as to what the tests should cover.

@lihaoyi
Copy link
Member

lihaoyi commented Oct 4, 2024

@chaitanyawaikar we need to expose/test/document an API to allow the user to configure how mtimes are managed as part of the zipping process, as well as how filesystem permissions are managed as part of the zipping process (if possible)

@chaitanyawaikar
Copy link
Contributor Author

chaitanyawaikar commented Oct 5, 2024

@chaitanyawaikar we need to expose/test/document an API to allow the user to configure how mtimes are managed as part of the zipping process, as well as how filesystem permissions are managed as part of the zipping process (if possible)

Hey @lihaoyi

I tried to implement the mtimes functionality in the code and it works fine during the zip process. However, preserving it during unzipping process is not possible with the existing libraries. I would need to add the Apache Commons Compress library as java.util.zip.ZipEntry does not support Unix file permissions out of the box. To get the file permissions, I would need to use ZipArchiveEntry with Unix permission support from Apache Commons Compress library.

I checked the project's dependencies and found that the we rely less on apache commons libraries and hence had this question if we are open to adding this dependency

"org.apache.commons" % "commons-compress" % "1.21"

@lihaoyi
Copy link
Member

lihaoyi commented Oct 5, 2024

@chaitanyawaikar got it. For now, I think let's just provide an option to zero out mtimes during zipping, since that's what's necessary for most build tools as it affects the byte-contents of the zip file. Properly setting mtimes during unzipping I think we can skip, since in general most build tools already ignore those when computing content hashes.

@chaitanyawaikar
Copy link
Contributor Author

chaitanyawaikar commented Oct 5, 2024

@lihaoyi The preserveMTimes and preservePermissions works fine for all platforms except os.native[2.12.17] . This is due to the fact that PosixFileAttributeView and related methods like readAttributes are not supported by Scala Native. Scala Native doesn't provide the full set of Java's NIO (New I/O) classes, especially the ones related to file attributes and POSIX file permissions, which are specific to Unix-like systems.

I get the following errors when I cross compile to scala native os

[error] Found 1 unreachable symbols!
[error] Unknown method java.nio.file.attribute.PosixFileAttributeView.readAttributes(): java.nio.file.attribute.PosixFileAttributes, referenced from:
 private method os.zip$.zipFile(java.io.File, java.util.zip.ZipOutputStream, bool, bool): scala.runtime.BoxedUnit at ZipOps.scala:226
 private method os.zip$.$anonfun$createNewZip$1(scala.collection.immutable.List, scala.collection.immutable.List, java.util.zip.ZipOutputStream, bool, bool, java.nio.file.Path): scala.runtime.BoxedUnit at ZipOps.scala:104
         method os.zip$$$Lambda$5.apply(java.lang.Object): java.lang.Object at ZipOps.scala:90
 private method os.zip$.createNewZip(java.nio.file.Path, scala.collection.immutable.List, scala.collection.immutable.List, scala.collection.immutable.List, bool, bool): scala.runtime.BoxedUnit at ZipOps.scala:90
         method os.zip$.apply(os.Path, scala.collection.immutable.List, bool, scala.collection.immutable.List, scala.collection.immutable.List, scala.collection.immutable.List, bool, bool): os.Path at ZipOps.scala:66
 private method test.os.ZipOpTests$.$anonfun$tests$50(os.Path): scala.runtime.BoxedUnit at ZipOpTests.scala:292
         method test.os.ZipOpTests$$$Lambda$42.apply(java.lang.Object): java.lang.Object at ZipOpTests.scala:280
 private method test.os.ZipOpTests$.$anonfun$tests$49(): scala.util.Left at ZipOpTests.scala:280
         method test.os.ZipOpTests$$$Lambda$43.apply(): java.lang.Object at ZipOpTests.scala:18
 private method test.os.ZipOpTests$.$anonfun$tests$48(): scala.util.Right at ZipOpTests.scala:18
         method test.os.ZipOpTests$$$Lambda$52.apply(): java.lang.Object at ZipOpTests.scala:18
 private method test.os.ZipOpTests$.$anonfun$tests$1(): scala.util.Right at ZipOpTests.scala:18
         method test.os.ZipOpTests$$$Lambda$1.apply(): java.lang.Object at ZipOpTests.scala:18
         method test.os.ZipOpTests$.tests(): utest.Tests at ZipOpTests.scala:18
         method utest.runner.BaseRunner.runSuite(scala.collection.Seq, java.lang.String, sbt.testing.EventHandler, sbt.testing.TaskDef, scala.collection.Seq): scala.concurrent.Future at BaseRunner.scala:174
 private method utest.runner.BaseRunner.$anonfun$makeTask$1(sbt.testing.TaskDef, scala.collection.Seq, scala.collection.Seq, sbt.testing.EventHandler): scala.concurrent.Future at BaseRunner.scala:208
         method utest.runner.BaseRunner$$Lambda$16.apply(java.lang.Object, java.lang.Object): java.lang.Object at BaseRunner.scala:208
 private method utest.runner.BaseRunner.makeTask(sbt.testing.TaskDef, scala.collection.Seq): sbt.testing.Task at BaseRunner.scala:208
 private method utest.runner.BaseRunner.$anonfun$tasks$6(scala.Tuple3): sbt.testing.Task at BaseRunner.scala:80
         method utest.runner.BaseRunner$$Lambda$8.apply(java.lang.Object): java.lang.Object at BaseRunner.scala:76
         method utest.runner.BaseRunner.tasks(sbt.testing.TaskDef[]): sbt.testing.Task[] at BaseRunner.scala:76
         method utest.runner.Framework.slaveRunner(java.lang.String[], java.lang.String[], java.lang.ClassLoader, scala.Function1): utest.runner.ScalaJsSlaveRunner at Framework.scala:83
         method utest.runner.Framework.slaveRunner(java.lang.String[], java.lang.String[], java.lang.ClassLoader, scala.Function1): sbt.testing.Runner at Framework.scala:17
 private method scala.scalanative.testinterface.TestAdapterBridge.$anonfun$createRunnerFun$1(bool, scala.scalanative.testinterface.common.RunnerArgs): scala.runtime.BoxedUnit at TestAdapterBridge.scala:41
         method scala.scalanative.testinterface.TestAdapterBridge$$Lambda$2.apply(java.lang.Object): java.lang.Object at TestAdapterBridge.scala:33
 private method scala.scalanative.testinterface.TestAdapterBridge.createRunnerFun(bool): scala.Function1 at TestAdapterBridge.scala:33
         method scala.scalanative.testinterface.TestAdapterBridge.start(): scala.runtime.BoxedUnit at TestAdapterBridge.scala:16
         method scala.scalanative.testinterface.TestMain$.main(java.lang.String[]): scala.runtime.BoxedUnit at TestMain.scala:94
  static method scala.scalanative.testinterface.TestMain.main(java.lang.String[]): scala.runtime.BoxedUnit at TestMain.scala:61


[info] Total (2676 ms)
1 targets failed
os.native[2.12.17].test.nativeLink scala.scalanative.linker.LinkingException: Unreachable symbols found after classloading run. It can happen when using dependencies not cross-compiled for Scala Native or not yet ported JDK definitions.

We have two options in front of us -

  1. Remove POSIX File Attribute Handling completely
  2. Conditional Compilation for Cross-Platform Support (we will skip it on scala native)

For the second option, I would need some help to enable this feature only for certain platforms as I have never done it before

@lihaoyi
Copy link
Member

lihaoyi commented Oct 6, 2024

@chaitanyawaikar let's do conditional compilation. Go ahead and push you're changes to the PR with the pull request validation for scala-native failing, I'll set up the conditional compilation and merge it

…ods.

Add scaladoc for `unzip.apply` method
@lihaoyi lihaoyi mentioned this pull request Oct 7, 2024
@lihaoyi
Copy link
Member

lihaoyi commented Oct 7, 2024

Continuing this work in #317. @chaitanyawaikar thanks for your work, send me your international wire/bank transfer details over email and I will close out the bounty. Also do take a look at the updated PR to see what the code looks like now that it's been made more idiomatic

@lihaoyi lihaoyi closed this Oct 7, 2024
lihaoyi added a commit that referenced this pull request Oct 7, 2024
Pulls in changes from #316 and
#310 and cleans it up

Mostly documented in the readme.adoc. 

Major APIs added:

- `os.zip`: create or append to an existing zip file on disk
- `os.zip.stream`: create a new zip file but write it to an
`java.io.OutputStream` rather than a file on disk
- `os.unzip`: unzip a zip file on disk into a folder on disk
- `os.unzip.stream`: unzip a zip file from an `java.io.InputStream` into
a folder on disk
- `os.unzip.list`: list the contents of a zip file
- `os.unzip.streamRaw`: low-level API used by `os.unzip.stream` and
`os.unzip.list`, exposed in case users need it
- `os.zip.open`: Opens a zip file as `java.nio.file.FileSystem` and
gives you an `os.Path` you can use to work with it

Hopefully these are APIs we can start using in Mill rather than `"zip"`
subprocesses or ad-hoc helpers like `IO.unpackZip`

Limitations:

* Use of `java.nio.file.FileSystem` is only supported on JVM and not on
Scala-Native, and so using `os.zip` to append to existing jar files or
`os.zip.open` does not work on Scala-Native.
* Also `os.zip` doesn't support creating/unpacking symlinks or
preserving filesystem permissions in Zip files, because the underlying
`java.util.zip.Zip*Stream` doesn't support them. Apache Commons Compress
can work with them
(https://commons.apache.org/proper/commons-compress/zip.html), but if
we're sticking with std lib we don't have that
* Bumps the version requirement to Java 11 and above, matching the
direction of the rest of com-lihaoyi. Probably not strictly necessary,
but we have to do it eventually and now is as good a time as ever with
requests already bumped and Mill bumping soon in 0.12.0

---------

Co-authored-by: Chaitanya Waikar <chaitanyawaikar1993@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants