Flattener #235

yolile · 2022-09-20T04:04:16Z

No description provided.

yolile · 2022-09-20T13:07:05Z

@jpmckinney, this is what I have so far. Some additional things I needed to do this time:

- As I added the flatten step as an additional job, I had to edit the Exporter class to use different lock files depending on whether the export generates the JSON files or the CSV/Excel files.
- As this step is now separate from the JSON export, I have to decompress the jsonl.gz files before flattening them, and then delete the jsonl files.
- I'm creating CSV files for all the existing jsonl files. Do we always want that? Or should I check for size limits? Especially for the "all time" files.
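
The decompression step and the size-limit question above could be sketched roughly as follows. This is only an illustration under stated assumptions: `decompress_jsonl` and `MAX_BYTES` are hypothetical names, and the 1 GB threshold is an assumed value, not a decision recorded in this thread.

```python
import gzip
import shutil
from pathlib import Path
from typing import Optional

# Hypothetical threshold (assumption): skip files larger than 1 GB once decompressed.
MAX_BYTES = 1024 ** 3


def decompress_jsonl(gz_path: Path, out_dir: Path, max_bytes: int = MAX_BYTES) -> Optional[Path]:
    """Decompress gz_path (e.g. full.jsonl.gz) into out_dir.

    Return the path of the decompressed .jsonl file, or None if it is
    larger than max_bytes (in which case the decompressed copy is deleted).
    """
    # "full.jsonl.gz" -> "full.jsonl"
    out_path = out_dir / gz_path.with_suffix("").name
    # Stream the data so the whole file is never held in memory.
    with gzip.open(gz_path, "rb") as src, out_path.open("wb") as dst:
        shutil.copyfileobj(src, dst)
    if out_path.stat().st_size > max_bytes:
        out_path.unlink()  # too large to flatten; remove the decompressed copy
        return None
    return out_path
```

A caller would flatten the returned file and delete it afterwards, and skip the export when `None` is returned.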

views:
- Guard against path traversal by malicious input
- Set export_format query string parameter to full suffix to avoid extra view logic
- Abbreviate export_format to format in query string, and export_formats to formats in template

template:
- Sort years in consistent order (as before)
- Add condition for all full files
- Add {% empty %} if files not yet available

util:
- Ensure the template always receives expected keys
- Don't process unknown suffixes
- Fix "jsonl" vs "json" typo for export_format
- Fix "csv.gz" -> "csv.tar.gz" typo

worker:
- Do not compress Excel file (an Excel file is already a ZIP file)
- Take advantage of Export().directory returning a pathlib.Path (temp file had ".jsonl.jsonl" suffix)
- Do "var = condition" not "if condition: var = True; else: var = False"
- Merge flatten_and_package_file() into callback()

perf:
- Use tempfile module to clean up files (temp files are written to the SSD, which is faster)
- Use scandir() (faster) instead of listdir()
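
One common way to guard a download view against path traversal is to resolve the requested path and check that it still lies inside the export directory. The sketch below illustrates that general technique; it is not necessarily what this commit does, and `resolve_inside` is a hypothetical helper name.

```python
from pathlib import Path


def resolve_inside(base_dir: Path, *parts: str) -> Path:
    """Join parts onto base_dir and return the resolved path.

    Raise ValueError if the result falls outside base_dir, for example
    because a part contains ".." or is an absolute path.
    """
    base = base_dir.resolve()
    candidate = base.joinpath(*parts).resolve()
    if not candidate.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"{candidate} is outside {base}")
    return candidate
```

A view could call this with user-supplied values (say, a job ID and a file suffix) and turn the `ValueError` into a 404 response.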

jpmckinney · 2022-09-21T02:51:23Z

I made some edits and fixes. Have you tested it manually?

After we deploy, to process existing data, we can run publish({"job_id": self.job.id}, "flattener_init") for every relevant job.
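
The first option could be sketched as a small loop. The `publish(message, routing_key)` call shape is taken from the comment above; how the relevant job IDs are selected is not specified in this thread, so the helper below (a hypothetical name, `requeue_flattener`) simply accepts them as an argument, along with the `publish` callable.

```python
from typing import Callable, Iterable


def requeue_flattener(job_ids: Iterable[int], publish: Callable[[dict, str], None]) -> int:
    """Publish one "flattener_init" message per job ID and return how many were sent."""
    count = 0
    for job_id in job_ids:
        # Same message shape as in the comment above: {"job_id": ...}
        publish({"job_id": job_id}, "flattener_init")
        count += 1
    return count
```

Passing `publish` in as a parameter keeps the sketch independent of the project's own module layout, which is not shown here.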

The other option is to create a Task (in the database) for every Job, and then change the Job status accordingly, but this would be a lot more effort. I think the first option will work – we just won't have any job management features.

jpmckinney · 2022-09-21T02:52:25Z

Also, we need to translate "Files not yet available."

yolile · 2022-09-21T02:59:58Z

> I made some edits and fixes. Have you tested it manually?

Yes, I've tested it, and I can test it again tomorrow before merging. I tried using a temporary directory instead of creating and removing the directory and file myself, and I had some issues, so I want to test this before merging.
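
For reference, this is a minimal sketch of how `tempfile.TemporaryDirectory` behaves. It does not try to reproduce the issues mentioned above, which are not described in this thread; `process_in_tempdir` and the file name are hypothetical.

```python
import tempfile
from pathlib import Path


def process_in_tempdir(data: bytes) -> tuple[int, Path]:
    """Write data to a scratch file and return its size and the directory path used."""
    with tempfile.TemporaryDirectory() as tmpdirname:
        tmpdir = Path(tmpdirname)
        scratch = tmpdir / "full.jsonl"
        scratch.write_bytes(data)
        size = scratch.stat().st_size
    # On leaving the with-block, the directory and everything in it is deleted,
    # so the returned path no longer exists.
    return size, tmpdir
```

Any output that must survive, such as a packaged archive, has to be moved out of the temporary directory before the `with` block exits.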

yolile and others added 3 commits September 20, 2022 00:02

exporter: add flattener command and task

01ac63b

frontend: add flattener features

1277cdb

[pre-commit.ci] auto fixes from pre-commit.com hooks

dfbea50

for more information, see https://pre-commit.ci

yolile marked this pull request as draft September 20, 2022 04:04

yolile added 3 commits September 20, 2022 08:33

flattener: fix hardcoded job id and flattener call

61fa6ba

Merge branch 'flattener' of github.com:open-contracting/data-registry into flattener

5ea368a

task:flattener: Export with export_type

354cdac

lint: remove unused import

f4cf15f

jpmckinney mentioned this pull request Sep 20, 2022

direct flatten files versions export and download #208

Closed

yolile added 2 commits September 20, 2022 13:18

Merge branch 'main' of github.com:open-contracting/data-registry into flattener

d825f9a

fix: flattener, rename function parameter name

25a5c04

yolile marked this pull request as ready for review September 20, 2022 18:42

yolile requested a review from jpmckinney September 20, 2022 18:42

jpmckinney added 7 commits September 20, 2022 20:40

chore: Don't call a destructive method in a list comprehension

21d5a47

fix: Fix typo (flatenner -> flattener)

6aae5bb

docs: Document what 1073741824 equals (1 GB)

16f5259

chore: Fix HTML indentation

6c16ccf

chore: Add missing {% endif %} from f1b5f77

6c92d5d

html: Extract includes/files.html. Rename formats -> files, format -> suffix.

bb914c6

jpmckinney approved these changes Sep 21, 2022

jpmckinney added 3 commits September 20, 2022 22:38

Merge branch 'main' into flattener

dc3ae43

chore: Delete BaseTask as it provides no functionality

e9b12bf

chore: Remove unused key from Flattener message

cf852da

i18n: Run makemessages and translate English

3f636a2

chore: Remove unused loggers

cfdff26

chore: Remove unused logger

39e5694

yolile commented Sep 21, 2022

exporter/util.py (resolved)

yolile commented Sep 21, 2022

exporter/views.py (outdated, resolved)

yolile commented Sep 21, 2022

data_registry/templates/includes/files.html (resolved)

jpmckinney and others added 3 commits September 21, 2022 12:12

fix: Apply changes from code review

6c9a21a

fix: flattener, fix path for flatterer

05be850

i18ln: add missing ru and es translation

c5ebaf8

jpmckinney reviewed Sep 21, 2022

exporter/management/commands/flattener.py (outdated, resolved)

yolile added 2 commits September 21, 2022 14:59

fix: use str(path) instead of as_posix()

91bd4ba

Merge branch 'main' of github.com:open-contracting/data-registry into flattener

d3a34de

yolile merged commit 88c2b0e into main Sep 21, 2022

yolile deleted the flattener branch September 21, 2022 21:23

jpmckinney mentioned this pull request Sep 23, 2022

Monthly buckets of data #215

Open
