You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In pandas.read_excel it's not possible to access information about the cell object before converting to dataframe. In some cases as in #47269,#46895, and #49770 it's necessary to acces the cell, especially number_format to change the values
Feature Description
Add a new parameter to read_excel, something like value_of_cell, that receive a cell and return a value. In my case i change _convert_cell in _openpyxl to:
def _convert_cell(self, cell, convert_float: bool, value_of_cell:Union[None, Callable]) -> Scalar:
from openpyxl.cell.cell import (
TYPE_ERROR,
TYPE_NUMERIC,
)
if cell.value is None:
return "" # compat with xlrd
elif cell.data_type == TYPE_ERROR:
return np.nan
elif value_of_cell is not None: # this is new
return value_of_cell(cell) # this is new
elif cell.data_type == TYPE_NUMERIC:
# GH5394, GH46988
if convert_float:
val = int(cell.value)
if val == cell.value:
return val
else:
return float(cell.value)
return cell.value
of course is neccesary to edit all the calls from read_excel to _convert_cell ( i think is something like 10 functions), but i will create a pull request if can pass the test. I hope this is useful for the community
Alternative Solutions
Another option is to use openpyxl directly as, but loss the extra functions of read_excel and more code is needed:
document = load_workbook(file)
sheet = document.active
for row in sheet:
for cell in row:
cell.value = value_of_cell(cell)
df = pd.DataFrame(sheet.values)
df = df.fillna('')
Additional Context
No response
The text was updated successfully, but these errors were encountered:
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
In pandas.read_excel it's not possible to access information about the cell object before converting to dataframe. In some cases as in #47269,#46895, and #49770 it's necessary to acces the cell, especially number_format to change the values
Feature Description
Add a new parameter to read_excel, something like value_of_cell, that receive a cell and return a value. In my case i change _convert_cell in _openpyxl to:
so in my code i call
of course is neccesary to edit all the calls from read_excel to _convert_cell ( i think is something like 10 functions), but i will create a pull request if can pass the test. I hope this is useful for the community
Alternative Solutions
Another option is to use openpyxl directly as, but loss the extra functions of read_excel and more code is needed:
Additional Context
No response
The text was updated successfully, but these errors were encountered: