Commit
[FIX] base_import: don't initialise openpyxl as read_only
Most openpyxl resources recommend instantiating it with `read_only=True` or `write_only=True` in order to leverage lazy modes which do not need to eagerly load the entire file in memory.

However this causes memory exhaustion issues in Odoo workers, with file preview / imports failing: apparently we have managed to generate the `res_partner.xlsx` sample file in such a way that `load_workbook(read_only=True)` is incapable of finding the end of the file, so it thinks the file goes up to the format's limit of 1048576 rows:

    $ python -c 'import openpyxl; w = openpyxl.load_workbook("odoo/addons/base/static/xls/res_partner.xlsx", data_only=True, read_only=True); print(sum(1 for _ in w.worksheets[0]))'
    1048576

even though the file only has 4 rows (including the header).

This means any client which uses this file as the basis for their own export (or a method generating similarly odd / corrupted files) requires parsing on the order of 20~30 million cells to try and import the file, even though in reality they might only have a few dozen or a few hundred.

As a result the import *attempt* takes several minutes (4~5 locally, maybe a bit less on a beefy server) and ~1.7GB memory, and thus routinely fails if the worker is under any sort of existing pressure (e.g. well filled caches) as it hits the hard memory limit (2.5G by default).

Using the "less efficient" standard mode, the ingestion takes ~1.4s and 14.5MB memory. That is still far less efficient than xlrd.xlsx was (~0.00 seconds and 120k) but at least somewhat reasonable...

The issue has been reported upstream at https://foss.heptapod.net/openpyxl/openpyxl/-/issues/2221

closes odoo#177586

Signed-off-by: Xavier Morel (xmo) <xmo@odoo.com>
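The comparison between the two loading modes could be reproduced with something like the sketch below. It is not how the figures in this message were obtained: `measure`, `count_rows` and `compare_modes` are hypothetical helper names, and `tracemalloc` only tracks allocations made through Python's allocator, so its peak will not match process-level memory numbers exactly. The openpyxl calls mirror the one-liner above.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds, peak_traced_bytes)."""
    tracemalloc.start()
    started = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - started
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def count_rows(path, read_only):
    """Count the rows openpyxl yields for the first worksheet of path."""
    # Imported lazily so that measure() stays usable without openpyxl.
    import openpyxl

    workbook = openpyxl.load_workbook(path, data_only=True, read_only=read_only)
    try:
        return sum(1 for _ in workbook.worksheets[0])
    finally:
        workbook.close()


def compare_modes(path):
    """Print row count, wall time and peak traced memory for both modes."""
    for read_only in (True, False):
        rows, elapsed, peak = measure(count_rows, path, read_only)
        print(f"read_only={read_only}: {rows} rows, "
              f"{elapsed:.2f}s, {peak / 1e6:.1f}MB peak")
```

Calling `compare_modes("odoo/addons/base/static/xls/res_partner.xlsx")` from an Odoo checkout with openpyxl installed would print one line per mode; the exact numbers depend on the machine and the openpyxl version.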