This repository has been archived by the owner on Aug 21, 2024. It is now read-only.

Draft: Proposal to enable ZST encoded archives loaded by the transformer #574

Draft
wants to merge 1 commit into
base: main
from

F-X64
Member

@F-X64 F-X64 commented Jun 2, 2023

The retriever has been switched to provide zstandard-encoded raw data instead of plain JSON.
Because this is a compressed archive format, it cannot be read like plain JSON data.
To enable the transformer to work with the new data, we need to add the proper file conversion.
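
For illustration only, the conversion could look roughly like the sketch below. It is not the code from this PR: the function names `is_zstd` and `load_records` are hypothetical, and it assumes the third-party `zstandard` Python package is available to the transformer. The check relies on the four magic bytes that start every zstandard frame, so plain JSON input still works.

```python
import json

# Magic number that starts every zstandard frame (0xFD2FB528, little-endian).
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def is_zstd(data: bytes) -> bool:
    """Return True if the payload starts with the zstandard frame magic."""
    return data[:4] == ZSTD_MAGIC


def load_records(data: bytes):
    """Parse a payload that is either plain JSON or zstandard-compressed JSON.

    Hypothetical helper, not part of the transformer's actual code.
    """
    if is_zstd(data):
        # Imported lazily so plain JSON input works without the dependency.
        import zstandard  # third-party package (assumption: pip install zstandard)

        # decompressobj() does not require the content size in the frame header.
        data = zstandard.ZstdDecompressor().decompressobj().decompress(data)
    return json.loads(data)
```

Accepting both encodings would let the transformer keep reading older plain JSON files while the retriever output changes, though whether that is wanted here is an open question.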

These changes are up for debate. However, we need to fix this today, as tomorrow morning's
scheduled transformer run will otherwise break the data again.

I've also removed the --delete-removed option from our S3 upload, as it deletes files that are present in the S3 bucket but not in the upload folder.

@F-X64 F-X64 requested review from major, poncovka and miyunari June 2, 2023 07:03
@F-X64 F-X64 force-pushed the hotfix-zst-issue branch 3 times, most recently from 78910b9 to be59a00 Compare June 2, 2023 07:13
@F-X64 F-X64 force-pushed the hotfix-zst-issue branch from 2d71f08 to 1f32a4f Compare June 2, 2023 07:36
@F-X64 F-X64 marked this pull request as draft June 2, 2023 07:56
@F-X64 F-X64 changed the title Hotfix to enable ZST encoded archives loaded by the transformer Draft: Hotfix to enable ZST encoded archives loaded by the transformer Jun 2, 2023
@F-X64 F-X64 changed the title Draft: Hotfix to enable ZST encoded archives loaded by the transformer Draft: Proposal to enable ZST encoded archives loaded by the transformer Jun 2, 2023
@@ -62,5 +62,5 @@ jobs:

- name: Upload to S3
run: |
-          s3cmd sync --acl-public --delete-removed --mime-type application/json \
+          s3cmd sync --acl-public --mime-type application/json \
Member

I see why you think this could be a good idea. But we actually need to remove the files that are not uploaded, otherwise we can't filter things in the transformer, since they will still be on S3 :) WDYT?

Member Author

Ah I see, I had no idea. Yes that makes sense to me.
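
To make the trade-off discussed above concrete, here is a simplified model of what the bucket contains after a sync, with and without --delete-removed. It is only a sketch: `plan_sync` and the file names are hypothetical, and it ignores that s3cmd skips unchanged files.

```python
def plan_sync(local_files, remote_keys, delete_removed):
    """Model a one-way sync from a local folder to a bucket.

    Returns (uploads, deletes, remaining), each as a sorted list of names.
    Simplification: every local file counts as an upload.
    """
    local = set(local_files)
    remote = set(remote_keys)
    uploads = sorted(local)
    # With delete_removed, remote files missing from the local folder are dropped.
    deletes = sorted(remote - local) if delete_removed else []
    remaining = sorted((remote | local) - set(deletes))
    return uploads, deletes, remaining


local = ["a.json", "b.json"]      # files produced by the current run
remote = ["a.json", "old.json"]   # files already in the bucket

# With the flag, the stale file is removed from the bucket.
print(plan_sync(local, remote, delete_removed=True))
# Without it, the stale file stays in the bucket after the sync.
print(plan_sync(local, remote, delete_removed=False))
```

In this model, anything the transformer filters out only disappears from S3 when the flag is set, which matches the concern raised in the review.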

2 participants