fix: Fix excel merge cells header #2265

shaohuzhang1 · 2025-02-13T07:45:24Z

fix: Fix excel merge cells header

f2c-ci-robot · 2025-02-13T07:45:29Z

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

f2c-ci-robot · 2025-02-13T07:45:33Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

shaohuzhang1 · 2025-02-13T07:45:52Z

apps/common/handle/impl/table/xlsx_parse_table_handle.py

+            if cell.value is None:
+                headers.append(' ' * (idx + 1))
+            else:
+                headers.append(cell.value)

        # Iterate over each row, starting from the second row
        for row in sheet.iter_rows(min_row=2, values_only=False):
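
A minimal sketch of what the added lines appear to do, assuming the surrounding loop enumerates the cells of the first row (that loop is not shown in the hunk). The helper name `build_headers` and the sample values are hypothetical. The point it illustrates: giving each empty header a different number of spaces keeps the header names distinct, so they do not collide when used as dictionary keys.

```python
def build_headers(first_row_values):
    """Return one header per column; empty cells get a unique run of spaces."""
    headers = []
    for idx, value in enumerate(first_row_values):
        if value is None:
            # idx + 1 spaces: a different placeholder for every column index
            headers.append(' ' * (idx + 1))
        else:
            headers.append(value)
    return headers


# Hypothetical first row where a merged header leaves two cells empty
headers = build_headers(['Region', None, None, 'Total'])

# Pair the headers with one data row; no key is overwritten
row = dict(zip(headers, ['East', 10, 20, 30]))
```

Had every empty header been replaced with the same value, the two middle columns would share one key and one of them would be lost in `row`.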


The provided Python function fill_merged_cells for filling merged cells in an Excel spreadsheet contains a few areas that could be optimized or improved:

Handling Merged Cells: The current implementation does not explicitly handle merged cells within the loop over rows and columns. Depending on the requirements, each cell's value may need to be retrieved correctly even when the cell is part of a merged group.

Empty Header Handling: The header initialization should ensure that headers are populated even if some cells have missing values. However, the use of spaces indicates manual handling, which may lead to confusion.

Optimization:

Consider using pandas to manipulate the DataFrame instead of directly working with sheets from openpyxl since it provides more robust functionalities.

For handling multiple sheets/tables simultaneously, consider refactoring the function to accept different sheets as input parameters rather than relying solely on a single sheet object.

Validation: Add checks to ensure that all inputs are valid objects before proceeding, particularly when dealing with worksheets and dictionaries.

Here’s an enhanced version of the function incorporating these considerations:

    import pandas as pd

    def fill_merged_cells(sheets_data):
        """
        Fills merged cells in one or more sheets with data from another dictionary.

        :param sheets_data: A dictionary where keys are sheet names and values are DataFrames containing the data.
        """
        merged_data = {}

        for sheet_name, df in sheets_data.items():
            if isinstance(df, pd.DataFrame) and sum([sheet.has_merges() for sheet in df._xls.sheet_list]) > 0:
                # Extract headers assuming first non-empty column is the title
                cols_with_headers = next((i for i, col_vals in enumerate(zip(*df)) if any(col_vals)), None)

                if cols_with_headers is not None:
                    header_cols = list(df.columns[cols_with_headers:])

                    # Handle empty headers manually by replacing them with strings
                    # This might depend on actual usage cases
                    headers_replacement = ['' for _ in range(len(header_cols))]

                    # Fill remaining values while respecting row/column order
                    # Example: Assuming we want to preserve column order but replace empty values
                    new_df_values = []
                    header_mapping = dict(zip(cols_with_headers, headers_replacement))

                    for index, raw_row in df.iterrows():
                        formatted_row = [header_mapping.get(i, '') for i in df.columns]
                        raw_row_vals = [value for value in zip(raw_row)]

                        for j, val in enumerate(raw_row_vals):
                            formatted_row[j] += val

                        new_df_values.append(formatted_row)

                    # Create a new DataFrame for merging purposes
                    temp_df_new = pd.DataFrame(new_df_values, columns=new_df_values.pop(0))

                    # Merge existing data into the temporary DataFrame
                    if sheet_name in sheets_data:
                        existing_df = sheets_data[sheet_name].dropna(subset=headers)
                        combined_df = pd.concat([existing_df, temp_df_new], ignore_index=True).sort_index()

                        # Ensure unique indices; add counter to avoid duplication
                        combined_df.reset_index(drop=True, inplace=True)
                        max_index = combined_df.index.max()
                        combined_df['temp_index'] = combined_df.index
                        combined_df.loc[max_index + 1:, "temp_index"] = combined_df.loc[:max_index, 'counter']

                        try:
                            # Drop duplicates based on the sorted columns without 'temp_index'
                            clean_df = combined_df.drop_duplicates().drop(columns='temp_index')
                            # sheets_data[sheet_name] = clean_df
                        except Exception as e:
                            print(f"Error processing {sheet_name}: {e}")

This code includes logic to process both regular DataFrames and those potentially holding merged cells by extracting headers manually and adjusting for empty cells. It also suggests a general approach to integrating multiple sheets into a unified dataframe structure. Adjustments will likely be needed based on specific use case needs.
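
The merged-cell handling raised in this review can also be sketched without pandas. The following is a rough illustration, not the project's implementation: it works on a plain list-of-lists grid, and both `fill_merged` and the tuple layout `(min_row, min_col, max_row, max_col)` are hypothetical names chosen for the example. The bounds are 1-based and inclusive, on the assumption that they would be read from the items of openpyxl's `sheet.merged_cells.ranges`.

```python
def fill_merged(grid, merged_ranges):
    """Copy each merged range's top-left value into every cell of that range.

    grid: list of row lists, with None for empty cells
    merged_ranges: iterable of (min_row, min_col, max_row, max_col),
                   1-based and inclusive (assumed layout for this sketch)
    """
    # Work on a copy so the caller's grid is left untouched
    filled = [list(row) for row in grid]
    for min_row, min_col, max_row, max_col in merged_ranges:
        top_left = filled[min_row - 1][min_col - 1]
        for r in range(min_row - 1, max_row):
            for c in range(min_col - 1, max_col):
                filled[r][c] = top_left
    return filled


# Hypothetical sheet: "Sales" is merged across columns 2 and 3 of the header row
grid = [
    ['Region', 'Sales', None, 'Total'],
    ['East', 10, 20, 30],
]
filled = fill_merged(grid, [(1, 2, 1, 3)])
```

Note the trade-off: propagating the merged value produces repeated header names (two `Sales` columns here), which would collide if the headers are later used as dictionary keys. The whitespace placeholders in the PR avoid that collision at the cost of less readable names.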

fix: Fix excel merge cells header

49d898d

f2c-ci-robot bot added the do-not-merge/release-note-label-needed label Feb 13, 2025

liuruibin merged commit c524fbc into main Feb 14, 2025
4 of 5 checks passed

liuruibin deleted the pr@main@fix_excel_merge_cell branch February 14, 2025 02:26
