fix: Fix excel merge cells header #2265

shaohuzhang1 · 2025-02-13T07:45:52Z

The Python function fill_merged_cells, which fills merged cells in an Excel spreadsheet, has a few areas that could be improved:

Handling Merged Cells: The loop over rows and columns does not explicitly handle merged cells. Depending on the requirements, each cell that belongs to a merged group may need to resolve to the group's value instead of being read on its own.
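A minimal sketch of that lookup, written against plain Python lists so it runs without a workbook. The `grid` and `merged_ranges` arguments are hypothetical stand-ins for this example: in openpyxl the ranges would come from `sheet.merged_cells.ranges`, and only the top-left cell of each range holds a value.

```python
def fill_merged_values(grid, merged_ranges):
    """Return a copy of grid where every cell in a merged range
    carries the value of that range's top-left cell.

    grid: list of rows (lists of cell values), 0-indexed.
    merged_ranges: iterable of (min_row, min_col, max_row, max_col),
        0-indexed and inclusive (a hypothetical format for this sketch).
    """
    filled = [list(row) for row in grid]  # copy so the input is not mutated
    for min_row, min_col, max_row, max_col in merged_ranges:
        top_left = grid[min_row][min_col]
        for r in range(min_row, max_row + 1):
            for c in range(min_col, max_col + 1):
                filled[r][c] = top_left
    return filled


grid = [
    ["Region", None, "Total"],
    ["North", 10, 30],
    [None, 20, None],
]
# "Region" spans two columns; "North" spans two rows.
merged = [(0, 0, 0, 1), (1, 0, 2, 0)]
filled = fill_merged_values(grid, merged)
```

Cells outside any merged range are left untouched, so genuinely empty cells stay empty.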

Empty Header Handling: The header initialization should populate every header even when some cells are empty. The patch does this by substituting idx + 1 spaces for each empty header, which keeps the names unique but leaves them visually indistinguishable and easy to confuse.
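The two options can be compared side by side. This is a sketch only: `whitespace_headers` mirrors the logic in the diff, while `named_headers` and its `column_` prefix are hypothetical names chosen for illustration, not part of the patch.

```python
def whitespace_headers(values):
    """Mirror the patch: an empty header at index idx becomes idx + 1 spaces."""
    return [" " * (idx + 1) if value is None else value
            for idx, value in enumerate(values)]


def named_headers(values, prefix="column_"):
    """Alternative: give each empty header a visible, position-based name.

    The prefix is an arbitrary choice for this sketch and could still
    collide with a real header that happens to use the same name.
    """
    return [f"{prefix}{idx + 1}" if value is None else value
            for idx, value in enumerate(values)]


row = ["Name", None, "Price", None]
spaces = whitespace_headers(row)
named = named_headers(row)
```

Both variants produce distinct keys for the empty positions; the difference is only whether a reader can tell them apart when the headers are printed or logged.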

Optimization:

Consider loading the data into a pandas DataFrame instead of working with openpyxl sheets directly, since pandas has built-in operations for filling and reshaping tabular data.

To handle multiple sheets or tables at once, consider refactoring the function to accept several sheets as input rather than a single sheet object.
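One way the multi-sheet refactor could look, as a sketch under assumptions: `fill_sheets` and its `fill_one` callback are hypothetical names, and the sheets are plain lists here so the example runs on its own.

```python
def fill_sheets(sheets, fill_one):
    """Apply fill_one to each sheet and collect the results by sheet name.

    sheets: dict mapping sheet name -> sheet object (any type fill_one accepts).
    fill_one: callable that processes a single sheet.
    Returns (results, errors), both keyed by sheet name.
    """
    results = {}
    errors = {}
    for name, sheet in sheets.items():
        try:
            results[name] = fill_one(sheet)
        except Exception as exc:  # keep going so one bad sheet does not stop the rest
            errors[name] = exc
    return results, errors


sheets = {"prices": [["a", None]], "broken": None}
results, errors = fill_sheets(sheets, lambda rows: [list(r) for r in rows])
```

Keeping the per-sheet logic in a separate callable leaves the existing single-sheet function unchanged and makes failures attributable to a specific sheet.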

Validation: Check that all inputs are valid before proceeding, particularly the worksheet and dictionary arguments.
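A sketch of such a check, assuming the `sheet` and `image_dict` parameters shown in the diff. The helper name `validate_inputs` is hypothetical, and the worksheet test is duck-typed (it only looks for an `iter_rows` method) so the example does not depend on openpyxl being installed.

```python
def validate_inputs(sheet, image_dict):
    """Raise TypeError early if the arguments cannot be processed."""
    if sheet is None or not callable(getattr(sheet, "iter_rows", None)):
        raise TypeError("sheet must provide an iter_rows() method")
    if not isinstance(image_dict, dict):
        raise TypeError("image_dict must be a dict")


class FakeSheet:
    """Stand-in for a worksheet, used only in this example."""

    def iter_rows(self, min_row=1, values_only=False):
        return iter(())


validate_inputs(FakeSheet(), {})  # valid inputs pass silently

try:
    validate_inputs(FakeSheet(), [])  # a list is not a dict
    rejected = False
except TypeError:
    rejected = True
```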

Here’s an enhanced version of the function incorporating these considerations:

import pandas as pd


def fill_merged_cells(sheets_data):
    """
    Fill empty headers and merged-cell gaps in one or more sheets.

    :param sheets_data: A dictionary where keys are sheet names and values are
        DataFrames read with header=None, so that row 0 holds the header row.
    :return: A dictionary mapping sheet names to the cleaned DataFrames.
    """
    if not isinstance(sheets_data, dict):
        raise TypeError("sheets_data must be a dict of DataFrames")

    cleaned = {}
    for sheet_name, df in sheets_data.items():
        # Skip anything that is not a non-empty DataFrame
        if not isinstance(df, pd.DataFrame) or df.empty:
            continue
        try:
            # Replace empty headers with unique whitespace placeholders,
            # matching the behaviour of the patch
            headers = [
                ' ' * (idx + 1) if pd.isna(value) else value
                for idx, value in enumerate(df.iloc[0])
            ]
            body = df.iloc[1:].reset_index(drop=True)
            body.columns = headers
            # pandas does not see merge ranges: a merged block arrives as one
            # value followed by NaN. Forward fill approximates the merged
            # value, but it also fills cells that were genuinely empty.
            cleaned[sheet_name] = body.ffill()
        except Exception as e:
            print(f"Error processing {sheet_name}: {e}")
    return cleaned

This version processes each DataFrame in turn, fills in empty headers, and adjusts for empty cells left by merged ranges. It is a general starting point for handling several sheets at once, and will likely need adjusting for the specific use case.

@@ -21,7 +21,12 @@ def fill_merged_cells(self, sheet, image_dict):
             data = []
             # Use the first row as the header row
-            headers = [cell.value for cell in sheet[1]]
+            headers = []
+            for idx, cell in enumerate(sheet[1]):
+                if cell.value is None:
+                    headers.append(' ' * (idx + 1))
+                else:
+                    headers.append(cell.value)
             # Iterate over every row, starting from the second row
             for row in sheet.iter_rows(min_row=2, values_only=False):