[FIX] base_import: don't initialise openpyxl as read_only · acsone/odoo@b19ef6c

Most openpyxl resources recommend opening workbooks with
`read_only=True` (or creating them with `write_only=True`) in order to
leverage lazy modes, which do not need to eagerly load the entire file
into memory.

However, this causes memory exhaustion in Odoo workers, with file
previews and imports failing: apparently we managed to generate the
`res_partner.xlsx` sample file in such a way that
`load_workbook(read_only=True)` cannot find the end of the sheet, so
it assumes the sheet extends to the format's limit of 1048576 rows:

    $ python -c 'import openpyxl; w = openpyxl.load_workbook("odoo/addons/base/static/xls/res_partner.xlsx", data_only=True, read_only=True); print(sum(1 for _ in w.worksheets[0]))'
    1048576

even though the file only has 4 rows (including the header).
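The commit does not say what in `res_partner.xlsx` misleads read-only mode. As a hypothetical, stdlib-only diagnostic (not part of Odoo or openpyxl), a sketch like the following opens the .xlsx as the zip archive it is and compares the worksheet's declared `<dimension>` with its `<row>` elements and with the last row whose cells actually hold a value. The name `inspect_sheet` is illustrative, and the part name `xl/worksheets/sheet1.xml` is an assumption (the conventional location of the first sheet); a robust tool would resolve it through `xl/workbook.xml` and its relationships.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet parts inside .xlsx archives
NS = {"s": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def inspect_sheet(data: bytes, part: str = "xl/worksheets/sheet1.xml") -> dict:
    """Compare a worksheet's declared <dimension> with the rows it really holds.

    Hypothetical helper: `part` assumes the conventional path of the first sheet.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read(part))

    dimension = root.find("s:dimension", NS)
    rows = root.findall("s:sheetData/s:row", NS)

    last_data_row = 0
    index = 0
    for row in rows:
        # The r attribute is optional; when absent, the row follows the previous one
        index = int(row.get("r", index + 1))
        # A cell holds data if it has a value (<v>) or an inline string (<is>)
        if any(
            cell.find("s:v", NS) is not None or cell.find("s:is", NS) is not None
            for cell in row.findall("s:c", NS)
        ):
            last_data_row = index

    return {
        "dimension": dimension.get("ref") if dimension is not None else None,
        "row_elements": len(rows),
        "last_data_row": last_data_row,
    }
```

For example, `inspect_sheet(open(path, "rb").read())` on a suspicious export: a dimension or row count far beyond `last_data_row` would hint, without proving, that the file advertises more rows than it holds.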

This means that any client who uses this file as the basis for their
own export (or who uses a tool generating similarly odd or corrupted
files) forces the importer to parse on the order of 20 to 30 million
cells just to attempt the import, even though the file may really
contain only a few dozen or a few hundred rows.
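The 20 to 30 million figure follows from multiplying the 1048576 phantom rows by the sheet's width. The commit does not state that width, so the sketch below (with illustrative helper names) works backwards from the quoted totals, assuming the importer visits every column of every phantom row.

```python
MAX_ROWS = 1_048_576  # xlsx row limit quoted in the commit


def phantom_cells(columns: int, rows: int = MAX_ROWS) -> int:
    """Cells visited if a reader believes all `rows` rows exist across `columns` columns."""
    return rows * columns


def implied_width(total_cells: int, rows: int = MAX_ROWS) -> float:
    """Sheet width (in columns) implied by a total cell count spread over `rows` rows."""
    return total_cells / rows


# 20 to 30 million cells corresponds to a sheet roughly 19 to 29 columns wide
print(round(implied_width(20_000_000), 2), round(implied_width(30_000_000), 2))  # 19.07 28.61
# A 20-column sheet would already cost about 21 million cells
print(phantom_cells(20))  # 20971520
# Compared with the 4 rows really present, that is 262144 times as many rows
print(MAX_ROWS // 4)  # 262144
```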

As a result, the import *attempt* takes several minutes (4 to 5
locally, perhaps a bit less on a beefy server) and about 1.7 GB of
memory. It therefore routinely fails whenever the worker is already
under some memory pressure (e.g. well-filled caches), as it hits the
hard memory limit (2.5 GB by default).

Using the "less efficient" standard mode, ingestion takes about 1.4 s
and 14.5 MB of memory. That is still far less efficient than
`xlrd.xlsx` was (~0.00 s and 120 kB), but at least somewhat
reasonable.
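A small sketch putting the figures quoted in this message side by side. Decimal units (1 GB = 1000 MB) are an assumption, since the message does not say whether it means GB or GiB, and the headroom figure ignores how Odoo actually measures its hard memory limit.

```python
# Figures quoted in the commit message (decimal units assumed)
READ_ONLY_SECONDS = (4 * 60, 5 * 60)  # "4 to 5 [minutes] locally"
READ_ONLY_MB = 1700                    # "about 1.7 GB of memory"
STANDARD_SECONDS = 1.4                 # "about 1.4 s"
STANDARD_MB = 14.5                     # "14.5 MB of memory"
HARD_LIMIT_MB = 2500                   # "2.5 GB by default"


def summarize() -> dict:
    slow_min, slow_max = READ_ONLY_SECONDS
    return {
        # How many times faster standard mode is on this file
        "speedup": (round(slow_min / STANDARD_SECONDS), round(slow_max / STANDARD_SECONDS)),
        # How many times less memory standard mode uses
        "memory_ratio": round(READ_ONLY_MB / STANDARD_MB),
        # Memory a worker may already hold before the read-only import trips the limit
        "headroom_mb": HARD_LIMIT_MB - READ_ONLY_MB,
    }


print(summarize())  # {'speedup': (171, 214), 'memory_ratio': 117, 'headroom_mb': 800}
```

In other words, under these assumptions a worker already holding more than roughly 800 MB cannot finish the read-only attempt at all.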

The issue has been reported upstream at
https://foss.heptapod.net/openpyxl/openpyxl/-/issues/2221

closes odoo#177586

Signed-off-by: Xavier Morel (xmo) <xmo@odoo.com>

xmo-odoo committed Aug 22, 2024

1 parent 1b6240b commit b19ef6c

addons/base_import/models/base_import.py

    @@ -452,7 +452,7 @@ def _read_xlsx(self, options):
                 return self._read_xls(options)
     
             import openpyxl.cell.cell as types
    -        book = load_workbook(io.BytesIO(self.file or b''), read_only=True, data_only=True)
    +        book = load_workbook(io.BytesIO(self.file or b''), data_only=True)
             sheets = options['sheets'] = book.sheetnames
             sheet_name = options['sheet'] = options.get('sheet') or sheets[0]
             sheet = book[sheet_name]
