Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing column names bug fix #37

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

adrianlut
Copy link
Contributor

@adrianlut adrianlut commented May 14, 2021

Edit: removed changes are strikethrough

Bug description

The inspections I implemented need the correct column names to work correctly. However, mlinspect replaces the column names with "array" in some cases, making it impossible to detect which input and output columns belong to each other inside of the inspection.

This bug occurs in projections to single columns (Series in the pandas backend) and in the concatenation operation.

Fix description

  • Projections
    • Extended the get_series_row_iterator with the option to get the column name from the Series, if the Series has a name.
    • Added missing column parameter to the get_iterator_for_type call in iter_input_annotation_output_map
  • Concatenation
    • Added column paramater to iter_input_annotation_output_nary_op
    • Extended iter_input_annotation_output_nary_op to call get_iterator_for_type with the column names
    • Supplied the column names from the sklearn backend to iter_input_annotation_output_nary_op

Alternative fix

No ideas.

@stefan-grafberger
Copy link
Owner

Hi @adrianlut,

That's interesting! I think there are rare cases where it's hard or impossible to trace the column names throughout pipelines, especially when certain sklearn feature selection transformers are used. That's why I used array as a column name in certain cases when we can't guarantee to always track column names successfully. But I agree, we should still try to do it on a best effort basis.

The change itself looks good, but I would expect it to break some existing tests. Would it be possible to merge the latest changes on the master branch into your branches or rebase them onto the newest master version so that the CI works again? I think we might see some tests failing then, would it be possible to fix them if there are any?

@adrianlut adrianlut force-pushed the missing-column-names-bugfix branch from 23fe9c6 to 9964c2a Compare June 9, 2021 18:33
@adrianlut adrianlut changed the base branch from demo to master June 9, 2021 18:34
@adrianlut
Copy link
Contributor Author

I have now changed this PR to the master and removed all changes to concatenations because they caused several problems.

Since one of the fixes was already done in the rework, this PR is now very small.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants