Change .gz compression to .zip for submissions? #2289

Open
huertanix opened this issue Sep 12, 2017 · 3 comments

Comments

@huertanix
Member

Feature request

Change .gz compression to .zip for submissions (unless there's a security reason not to).

Description

Journalists analyzing submissions on the SVS are confused when they encounter a .gz file and don't know what to do with it until someone explains that it's a fancy zip file from UNIX land. It's one more thing they have to jot down and remember when they use the system.

Note: I'm assuming that journalists will be more familiar with .zip files. That might not be the case, so the actual solution might have to be something else. It would be great to discuss other possible solutions and fact-check familiarity with zip files.

User Stories

As a journalist, I'd like to know when I need to decompress a file and when I don't.

@huertanix added the UX label Sep 12, 2017
@huertanix
Member Author

A quick update on this: .gz is still being used as of SecureDrop 0.8.0 when decrypting on the SVS. Ideally, zip would be used consistently across the process.

@eloquence
Member

Confusingly, SecureDrop currently uses both ZIP and gzip:

  • If you're downloading a single file by clicking its name, it will be an encrypted gzip file
  • If you're downloading a single file by clicking "Download Selected", it will be an encrypted gzip file wrapped in a ZIP file
  • If you're downloading more than one file, it will always be encrypted gzip files wrapped in a ZIP file

This makes sense insofar as gzip is a single-file compression format (it is commonly used together with tools like tar). However, gzip has one especially unfortunate property: filename handling during extraction is inconsistently implemented. If you have a file apple.gz that contains orange.jpg, you will get different results, e.g.:

  • gzip -d apple.gz just results in a file called apple
  • gzip -d -N apple.gz results in orange.jpg
  • Tails context menu extraction via "Extract to" or "Extract here" results in apple
  • Tails extraction via bundled file-roller application results in orange.jpg
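The difference comes from where each tool takes the output name: the gzip header has an optional original-filename field, and a tool can either use it or just strip the .gz suffix from the archive name. Here is a minimal Python sketch of both strategies. The helper names (make_gzip, stored_name, name_from_suffix) are made up for illustration and are not SecureDrop code; the two functions only roughly correspond to the "-N" and default behaviours listed above.

```python
import gzip
import io

# Flag bits in byte 3 of the gzip header (RFC 1952).
FEXTRA = 0x04
FNAME = 0x08


def make_gzip(inner_name, payload):
    """Build a gzip stream in memory that records inner_name in its header."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename=inner_name, mode="wb", fileobj=buf) as gz:
        gz.write(payload)
    return buf.getvalue()


def stored_name(gz_bytes):
    """Return the original filename stored in the gzip header, or None."""
    if gz_bytes[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    flags = gz_bytes[3]
    pos = 10  # fixed part: magic, method, flags, mtime, extra flags, OS
    if flags & FEXTRA:
        # Skip the optional extra field: 2-byte little-endian length + data.
        xlen = int.from_bytes(gz_bytes[pos:pos + 2], "little")
        pos += 2 + xlen
    if not flags & FNAME:
        return None
    end = gz_bytes.index(b"\x00", pos)  # the name is zero-terminated
    return gz_bytes[pos:end].decode("latin-1")


def name_from_suffix(outer_name):
    """Name a tool would use if it ignores the header and drops '.gz'."""
    return outer_name[:-3] if outer_name.endswith(".gz") else outer_name


if __name__ == "__main__":
    data = make_gzip("orange.jpg", b"not really a JPEG")
    print(name_from_suffix("apple.gz"))
    print(stored_name(data))
```

Note that the stored name is optional, so a tool that prefers it still needs a fallback for streams that don't carry one.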

This can cause many follow-up problems. For example, Tails will treat Office Open XML files (.docx etc.) without a file extension as archives, because that's what they are underneath. Obviously this is an extremely common file type journalists may encounter. The filename itself is useful information we should treat as valuable, not disposable.
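To illustrate the "archives underneath" point: once the extension is gone, anything that sniffs file contents will see a ZIP container. The sketch below uses Python's zipfile module with a stand-in container. It is not a valid .docx, the helper names are hypothetical, and it does not reproduce how Tails actually detects file types.

```python
import io
import zipfile


def fake_ooxml_container():
    """Build a stand-in for an Office Open XML file (NOT a valid .docx)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Real OOXML packages carry a part with this name; content is a dummy.
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


def looks_like_zip(data):
    """Content-based check that ignores the filename entirely."""
    return zipfile.is_zipfile(io.BytesIO(data))


if __name__ == "__main__":
    print(looks_like_zip(fake_ooxml_container()))
    print(looks_like_zip(b"just some plain text, definitely not an archive"))
```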

All of this, together with David's observation about unfamiliarity with the file format, IMO argues strongly for standardizing on ZIP, as this issue suggests, since ZIP does not have this filename preservation problem.
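For comparison, a ZIP archive records each member's name in the archive itself, so the extracted name does not depend on what the archive is called. A small sketch using Python's zipfile module (the helper names are hypothetical, and this is not how SecureDrop builds its downloads):

```python
import io
import zipfile


def zip_single(member_name, payload):
    """Wrap one file in an in-memory ZIP archive under member_name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member_name, payload)
    return buf.getvalue()


def member_names(zip_bytes):
    """List the filenames recorded inside the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.namelist()


if __name__ == "__main__":
    # Whatever the archive is saved as (say, apple.zip), the member keeps
    # its own name.
    archive = zip_single("orange.jpg", b"not really a JPEG")
    print(member_names(archive))
```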

@eloquence
Member

ZIP file compression was originally used for single files, and was replaced with gzip in #862. The motivation appears to have been a combination of security and scalability considerations. If we want to pick this up again, we'll need to revisit the questions that motivated the switch. Contrary to what was assumed in the PR, there are clearly major usability tradeoffs in using gzip, so this may very well be worth investigating.


2 participants