fix: gzip all JSON OCRs when saving OCR file on disk #8320

raphael0202 · 2023-04-14T08:24:31Z

To save space, all existing OCR files were gzipped, but newly generated files are still saved as plain-text JSON files.
This PR also adds a `created_at` field to the OCR JSON file, containing the timestamp at which the OCR file was generated. This is useful for deciding whether old OCR files should be regenerated.
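For illustration, here is a minimal Python sketch of the save path described above; the actual change lives in the Perl modules. The helper name `save_ocr_json`, the `.json.gz` naming, and storing `created_at` as a Unix timestamp in seconds are all assumptions made for this example.

```python
import gzip
import json
import time
from pathlib import Path


def save_ocr_json(ocr_response: dict, json_path: Path) -> Path:
    """Hypothetical helper: write an OCR response as gzipped JSON with a created_at field."""
    # Shallow copy so the caller's top-level dict is not mutated.
    data = dict(ocr_response)
    # Assumption: created_at is a Unix timestamp in seconds.
    data["created_at"] = int(time.time())
    # Assumption: the gzipped file keeps the .json name and gains a .gz suffix.
    gz_path = json_path.with_name(json_path.name + ".gz")
    # "wt" opens the gzip stream in text mode so json.dump can write str.
    with gzip.open(gz_path, "wt", encoding="utf-8") as f:
        json.dump(data, f)
    return gz_path
```

Writing through a text-mode gzip stream avoids ever creating the uncompressed file on disk.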

codecov-commenter · 2023-04-14T08:54:49Z

Codecov Report

Merging #8320 (2bfb407) into main (a0cfac9) will increase coverage by 0.03%.
The diff coverage is 82.60%.

@@            Coverage Diff             @@
##             main    #8320      +/-   ##
==========================================
+ Coverage   48.46%   48.50%   +0.03%     
==========================================
  Files         114      114              
  Lines       21268    21295      +27     
  Branches     4768     4773       +5     
==========================================
+ Hits        10308    10329      +21     
- Misses       9677     9679       +2     
- Partials     1283     1287       +4

| Impacted Files | Coverage Δ | |
|---|---|---|
| lib/ProductOpener/Import.pm | `30.73% <0.00%> (-0.04%)` | ⬇️ |
| lib/ProductOpener/Images.pm | `10.37% <66.66%> (+0.22%)` | ⬆️ |
| lib/ProductOpener/Packaging.pm | `75.00% <75.00%> (ø)` | |
| lib/ProductOpener/Test.pm | `40.75% <85.71%> (+3.19%)` | ⬆️ |
| tests/unit/send_image_to_cloud_vision.t | `100.00% <100.00%> (ø)` | |


alexgarel · 2023-04-14T17:30:17Z

@raphael0202 broken test: `tests/unit/send_image_to_cloud_vision.t`

To run it locally: `make unit-test test=send_image_to_cloud_vision.t`

2023-04-14T08:49:53.5274088Z malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "\x{1f}\x{fffd}\b\x{0}...") at tests/unit/send_image_to_cloud_vision.t line 50.
2023-04-14T08:49:53.5286080Z # Tests were run but no plan was declared and done_testing() was not seen.
2023-04-14T08:49:53.5292832Z # Looks like your test exited with 255 just after 3.
2023-04-14T08:50:02.4969181Z tests/unit/send_image_to_cloud_vision.t .......

raphael0202 · 2023-04-16T04:57:14Z

@alexgarel I haven't set up Product Opener locally, will give it a try ;)

- gzip all JSON OCRs when saving OCR file on disk
- add new `created_at` field to save the timestamp of OCR generation
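The `created_at` field is meant to help decide whether old OCR files should be regenerated. The sketch below shows one way such a check could look. It is written in Python for illustration, and the function name `needs_regeneration`, the caller-supplied cutoff, and the Unix-seconds format of `created_at` are assumptions. Files produced before this change have no `created_at`, so the sketch treats them as stale.

```python
import gzip
import json
from typing import Optional


def needs_regeneration(gz_path: str, cutoff_ts: int) -> bool:
    """Hypothetical check: return True if a gzipped OCR file should be regenerated."""
    with gzip.open(gz_path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: created_at is a Unix timestamp in seconds.
    created_at: Optional[int] = data.get("created_at")
    # Older files predate the field, so their generation time is unknown.
    if created_at is None:
        return True
    # Regenerate anything produced before the chosen cutoff.
    return created_at < cutoff_ts
```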

sonarqubecloud · 2023-05-12T12:25:18Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
No Duplication information

raphael0202 · 2023-05-12T13:34:45Z

@alexgarel it should be good now!

raphael0202 · 2023-05-15T12:52:25Z

I checked locally: the generated gzipped JSON file is saved correctly.

alexgarel

Great, @raphael0202!

raphael0202 requested a review from a team as a code owner April 14, 2023 08:24

github-actions bot assigned raphael0202 Apr 14, 2023

github-actions bot added 🖼️ Images OCR 🧪 tests labels Apr 14, 2023

raphael0202 force-pushed the gzip-ocr-file branch from 5aa1d75 to 638852b May 12, 2023 11:47

fix: improve Google Cloud OCR processing

2bfb407


raphael0202 force-pushed the gzip-ocr-file branch from 638852b to 2bfb407 May 12, 2023 12:21

raphael0202 requested a review from stephanegigandet May 15, 2023 12:52

alexgarel approved these changes May 17, 2023


alexgarel merged commit 45df380 into main May 17, 2023

alexgarel deleted the gzip-ocr-file branch May 17, 2023 08:33

openfoodfacts-bot mentioned this pull request May 17, 2023

chore(main): release 2.13.0 #8424

Merged

raphael0202 mentioned this pull request Jul 31, 2023

Store Cloud Vision OCR json files compressed in gzip format #6273

Closed
