[FIX] base_import: don't initialise openpyxl as read_only
Most openpyxl resources recommend instantiating it with
`read_only=True` or `write_only=True` in order to leverage lazy modes,
which do not need to eagerly load the entire file in memory.

However this causes memory exhaustion issues in Odoo workers, with
file preview / imports failing: apparently we have managed to generate
the `res_partner.xlsx` sample file in such a way that
`load_workbook(read_only=True)` is incapable of finding the end of the
file, so it thinks the file goes up to the limit for the format of
1048576 rows:

    $ python -c 'import openpyxl; w = openpyxl.load_workbook("odoo/addons/base/static/xls/res_partner.xlsx", data_only=True, read_only=True); print(sum(1 for _ in w.worksheets[0]))'
    1048576

even though the file only has 4 rows (including the header).
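
The one-liner above can be expanded into a small comparison of both modes. This is only a sketch: `count_rows` is a hypothetical helper, not part of Odoo or openpyxl, and it assumes openpyxl is installed.

```python
import io


def count_rows(source, read_only):
    """Count the rows openpyxl yields for the first worksheet of `source`.

    `source` is a path or a binary file-like object.
    """
    # Imported lazily so the helper can be defined where openpyxl is absent.
    from openpyxl import load_workbook

    book = load_workbook(source, data_only=True, read_only=read_only)
    try:
        return sum(1 for _ in book.worksheets[0])
    finally:
        # close() only matters for the lazy modes, it is harmless otherwise.
        book.close()


# Possible usage against the sample file named above:
#     path = "odoo/addons/base/static/xls/res_partner.xlsx"
#     print(count_rows(path, read_only=True))
#     print(count_rows(path, read_only=False))
```

Running both calls against the same file shows whether the lazy reader and the standard reader agree on where the sheet ends.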

This means any client which uses this file as a basis to create their
own export (or a method generating similarly odd / corrupted files)
requires parsing on the order of 20~30 million cells to try and import
the file, even though in reality they might only have a few dozen or
a few hundred.
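
The order of magnitude follows from the row limit multiplied by the number of columns. A rough sketch of that arithmetic, where the column counts are illustrative assumptions and not taken from the sample file:

```python
# Row limit of the xlsx format, as quoted in the message above.
XLSX_MAX_ROWS = 1_048_576


def cells_to_parse(columns, rows=XLSX_MAX_ROWS):
    """Cells a reader visits if it walks every row up to `rows`."""
    return rows * columns


# Assumed column counts, chosen only to bracket the 20~30 million estimate.
for columns in (20, 25, 30):
    print(columns, cells_to_parse(columns))
```

With the same column counts and the 4 real rows, `cells_to_parse(columns, rows=4)` stays in the low hundreds.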

As a result the import *attempt* takes several minutes (4~5 locally,
maybe a bit less on a beefy server) and ~1.7GB of memory, and thus
routinely fails if the worker is under any sort of existing memory
pressure (e.g. well filled caches) as it hits the hard memory
limit (2.5G by default).

Using the "less efficient" standard mode, the ingestion takes ~1.4s
and 14.5MB of memory. This is still far less efficient than
xlrd.xlsx was (~0.00 seconds and 120k) but at least somewhat
reasonable...
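
Figures of this kind can be approximated with the standard library. The helper below is a hypothetical sketch, not the method used to obtain the numbers above: `tracemalloc` only tracks allocations made through Python's allocator, so its peak will not match a worker's resident memory exactly, and tracing itself slows the call down.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func and return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


# Possible usage, assuming openpyxl is installed and `path` points at a file:
#     from openpyxl import load_workbook
#     book, seconds, peak = measure(load_workbook, path, data_only=True)
#     print(f"{seconds:.2f}s, peak {peak / 1e6:.1f}MB")
```

Note that in read-only mode most of the cost is paid while iterating rows rather than in `load_workbook` itself, so the measured callable should include the iteration.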

The issue has been reported upstream at
https://foss.heptapod.net/openpyxl/openpyxl/-/issues/2221

closes odoo#177586

Signed-off-by: Xavier Morel (xmo) <xmo@odoo.com>
xmo-odoo committed Aug 22, 2024
1 parent 1b6240b commit b19ef6c
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion addons/base_import/models/base_import.py
@@ -452,7 +452,7 @@ def _read_xlsx(self, options):
 return self._read_xls(options)

 import openpyxl.cell.cell as types
-book = load_workbook(io.BytesIO(self.file or b''), read_only=True, data_only=True)
+book = load_workbook(io.BytesIO(self.file or b''), data_only=True)
 sheets = options['sheets'] = book.sheetnames
 sheet_name = options['sheet'] = options.get('sheet') or sheets[0]
 sheet = book[sheet_name]