Unzip and save compressed files #26

Closed
HideBa opened this issue Mar 24, 2022 · 0 comments

HideBa commented Mar 24, 2022

Leader

@HideBa

User story

A user wants to upload a large file as a Zip archive. After uploading, the user also wants to extract the archive and keep its contents as unzipped files. The CMS lets users upload Zip files, then decompress and save them as needed.

Requirements

  • User can
    • upload Zip file
      • Zip file can be unzipped to multiple files and saved separately
      • Zip file can be saved as is
      • System should be able to handle a maximum of 10GB
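
A minimal sketch of the "unzip to multiple files and save separately" requirement, including the 10 GB cap. The function name `extract_zip`, the constant `MAX_TOTAL_BYTES`, and the choice to interpret 10 GB as 10 × 1024³ bytes are assumptions made for illustration; this is not the project's implementation.

```python
import shutil
import zipfile
from pathlib import Path

# Assumption: "10GB" is read as 10 GiB of decompressed data.
MAX_TOTAL_BYTES = 10 * 1024**3


def extract_zip(source, dest_dir, max_total_bytes=MAX_TOTAL_BYTES):
    """Extract every file of a zip archive into dest_dir and return the saved paths.

    `source` may be a path or a binary file-like object (hypothetical helper).
    """
    dest = Path(dest_dir).resolve()
    saved = []
    with zipfile.ZipFile(source) as zf:
        members = [m for m in zf.infolist() if not m.is_dir()]
        # Check the declared decompressed size before writing anything.
        total = sum(m.file_size for m in members)
        if total > max_total_bytes:
            raise ValueError(f"archive expands to {total} bytes, limit is {max_total_bytes}")
        for member in members:
            target = (dest / member.filename).resolve()
            # Reject entries such as "../x" that would land outside dest_dir.
            if dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.filename}")
            target.parent.mkdir(parents=True, exist_ok=True)
            # Stream each entry to disk instead of loading it into memory.
            with zf.open(member) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            saved.append(target)
    return saved
```

The size check relies on the sizes declared in the archive headers, so a production version would probably also count bytes while streaming.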

Things to consider

  • Abstraction of file upload so that it can be implemented for GCS, S3, and the local file system
  • Abstraction of queuing so that it can be backed by a local queue (no external apps), Kafka, or Cloud Tasks (we want to run various async tasks)
  • Abstraction of compressed file format: zip, tar.gz
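
One possible shape for the compressed-format abstraction, sketched with Python's standard `zipfile` and `tarfile` modules. The names `Archive`, `ZipArchive`, `TarGzArchive`, and `archive_for` are hypothetical, and each entry is read fully into memory only to keep the example short; a real implementation for multi-gigabyte archives would stream instead.

```python
import tarfile
import zipfile
from typing import BinaryIO, Iterator, Protocol, Tuple


class Archive(Protocol):
    """Common interface: yield (name, content) for each regular file."""

    def entries(self, stream: BinaryIO) -> Iterator[Tuple[str, bytes]]: ...


class ZipArchive:
    def entries(self, stream: BinaryIO) -> Iterator[Tuple[str, bytes]]:
        with zipfile.ZipFile(stream) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield info.filename, zf.read(info)


class TarGzArchive:
    def entries(self, stream: BinaryIO) -> Iterator[Tuple[str, bytes]]:
        with tarfile.open(fileobj=stream, mode="r:gz") as tf:
            for member in tf:
                if member.isfile():
                    extracted = tf.extractfile(member)
                    if extracted is not None:
                        yield member.name, extracted.read()


# Assumption: the format is chosen from the file name suffix.
ARCHIVES = {".zip": ZipArchive(), ".tar.gz": TarGzArchive()}


def archive_for(filename: str) -> Archive:
    """Pick the archive implementation that matches the file name."""
    lowered = filename.lower()
    for suffix, archive in ARCHIVES.items():
        if lowered.endswith(suffix):
            return archive
    raise ValueError(f"unsupported archive format: {filename}")
```

Callers only depend on `entries`, so supporting another format would mean adding one class and one dictionary entry.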

Implementation policy

Must

  1. feat(server): add task runner interface #109
  2. feat(server): init worker app with multi-module workspaces #112
  3. [BE]: (server)send message from application server to cloud tasks #55
  4. [BE]: (worker) download zip file from gcs and upload folder to gcs #57
  5. [BE]: (worker) notify that compressed file is successfully decompressed to App server #113
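
The flow implied by these items (task runner interface, worker downloads the archive and uploads the extracted files, worker notifies the app server) could be sketched roughly as follows. Everything here is an assumption for illustration: `Storage`, `InMemoryStorage`, `DecompressTask`, `TaskRunner`, `LocalTaskRunner`, and `make_worker` are hypothetical names, the in-memory storage stands in for GCS, and the local runner stands in for Cloud Tasks.

```python
import io
import zipfile
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol


class Storage(Protocol):
    def download(self, key: str) -> bytes: ...
    def upload(self, key: str, data: bytes) -> None: ...


class InMemoryStorage:
    """Stand-in for GCS/S3/local disk, used only for this sketch."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def download(self, key: str) -> bytes:
        return self.objects[key]

    def upload(self, key: str, data: bytes) -> None:
        self.objects[key] = data


@dataclass
class DecompressTask:
    archive_key: str   # where the uploaded zip lives
    dest_prefix: str   # folder to write the extracted files into


class TaskRunner(Protocol):
    def run(self, task: DecompressTask) -> None: ...


class LocalTaskRunner:
    """Runs the handler in-process (the 'local, no external apps' option)."""

    def __init__(self, handler: Callable[[DecompressTask], None]) -> None:
        self.handler = handler

    def run(self, task: DecompressTask) -> None:
        self.handler(task)


def make_worker(storage: Storage, notify: Callable[[DecompressTask, List[str]], None]):
    """Build a handler: download the zip, upload each file, then notify."""

    def handle(task: DecompressTask) -> None:
        uploaded: List[str] = []
        data = io.BytesIO(storage.download(task.archive_key))
        with zipfile.ZipFile(data) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                key = f"{task.dest_prefix}/{info.filename}"
                storage.upload(key, zf.read(info))
                uploaded.append(key)
        # Report completion back to the application server.
        notify(task, uploaded)

    return handle
```

Because the application server only sees `TaskRunner.run`, swapping the local runner for one that enqueues a Cloud Tasks request would not change the calling code.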

Nice to have

  1. [BE]: (zip file decompress task ) add Unit test with mocking GCP CloudTasks client #149

Technical decision

Consider compressed file upload function

Request

  • I want to upload a compressed file whose decompressed size is up to 10 GB, then decompress it and save the result.

Format

Before

  • zip
  • tar.gz
  • Other

After

Technology Selection

Queuing and Asynchronous Processing Execution Platform

**Conclusion:** Cloud Tasks

Reasons for selection

  • Cloud Tasks lets the publisher control execution on the subscriber side in detail and allows a long maximum execution time. Taken together, we judged that these best match our requirement to "perform time-consuming processing asynchronously".

| | Cloud Tasks | Cloud PubSub | Memo |
| --- | --- | --- | --- |
| Calling method | Explicit | Implicit | |
| Purpose | Allow publishers to have full control over execution | Separate event publishers and subscribers to achieve loose coupling | This time, rather than aiming for loose coupling, the purpose is to achieve an asynchronous configuration to handle time-consuming processing, so Cloud Tasks is the better choice in this case. |
| Scheduling of delivery time | | | |
| Delivery rate management | | | |
| Configurable retries | | | |
| Individual access and management in queue | | | |
| De-duplication of tasks/message creation | | | |
| Batch insertion | | | |
| Multiple handlers/subscribers per message (e.g., one-to-many configurations) | | | |
| Task / message retention | 30 days | Up to 7 days | |
| Maximum processing time for push handlers / subscribers | 30 minutes (HTTP) | 10 minutes for a push operation | I don't think it takes more than 10 minutes, but Cloud Tasks seems better in this case in terms of maximum processing time. |
| Number of queues/subscriptions per project | 1,000/project (can be increased with an allocation increase request) | 10,000/project | PubSub |
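
To make the "explicit" calling method concrete, here is a hedged sketch of how the application server might describe a decompression task for Cloud Tasks. It only builds a dictionary shaped like the task resource of the Cloud Tasks REST API as I understand it (`httpRequest`, `httpMethod`, `url`, `headers`, base64 `body`, `dispatchDeadline`); verify the field names against the official reference before relying on them. The function name, the worker URL, and the `archiveKey` payload field are hypothetical, and no request is actually sent.

```python
import base64
import json

# 30-minute limit for HTTP handlers, as cited in the comparison table.
MAX_DISPATCH_DEADLINE_SECONDS = 30 * 60


def build_decompress_task(worker_url, archive_key,
                          deadline_seconds=MAX_DISPATCH_DEADLINE_SECONDS):
    """Return a task body that asks the worker to decompress one archive."""
    if not 0 < deadline_seconds <= MAX_DISPATCH_DEADLINE_SECONDS:
        raise ValueError("deadline must be positive and at most 30 minutes")
    # Hypothetical payload: the worker only needs to know which object to unpack.
    payload = json.dumps({"archiveKey": archive_key}).encode("utf-8")
    return {
        "httpRequest": {
            "httpMethod": "POST",
            "url": worker_url,
            "headers": {"Content-Type": "application/json"},
            # Binary fields are base64-encoded in the JSON representation.
            "body": base64.b64encode(payload).decode("ascii"),
        },
        # Durations are written as seconds with an "s" suffix.
        "dispatchDeadline": f"{deadline_seconds}s",
    }
```

The publisher names the handler URL and the deadline for each task, which is the kind of per-task control the comparison attributes to Cloud Tasks.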

HideBa added the PBL label Mar 24, 2022
HideBa changed the title from "Unzip and save Zip files" to "Unzip and save compressed files" Apr 7, 2022
HideBa closed this as completed Nov 22, 2022