erroranalysis: Fix the scenario when samples in dataset and string_indexed_data don't match #1915

Merged: 3 commits, Jan 19, 2023

gaugup (Contributor) commented Jan 18, 2023

Description

In the large data scenario, if the test set is provided at run time (after cohort filtering), it can have a different number of samples than the test data used at RAI computation time. In such cases the error analysis tree computation can fail with the following error:

index 100 is out of bounds for axis 0 with size 100
Traceback (most recent call last):
  File "c:\users\gaugup\documents\github\responsible-ai-toolbox\raiwidgets\raiwidgets\responsibleai_dashboard_input.py", line 174, in debug_ml
    max_depth, num_leaves, min_child_samples)
  File "c:\users\gaugup\documents\github\responsible-ai-toolbox\erroranalysis\erroranalysis\analyzer\error_analyzer.py", line 367, in compute_error_tree_on_dataset
    min_child_samples=min_child_samples)
  File "c:\users\gaugup\documents\github\responsible-ai-toolbox\erroranalysis\erroranalysis\_internal\surrogate_error_tree.py", line 150, in compute_error_tree_on_dataset
    dataset_sub_names, max_depth, num_leaves, min_child_samples)
  File "c:\users\gaugup\documents\github\responsible-ai-toolbox\erroranalysis\erroranalysis\_internal\surrogate_error_tree.py", line 318, in get_surrogate_booster_local
    input_data[:, c_i] = analyzer.string_indexed_data[row_index, idx]
IndexError: index 100 is out of bounds for axis 0 with size 100

The fix is to recompute string_indexed_data when the number of samples in the dataset differs from the number of rows in string_indexed_data.
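
For reviewers who want to see the idea in isolation, here is a minimal sketch of the row-count guard. It is not the code from this PR: `string_index` and `replace_categoricals` are hypothetical helpers, the integer encoding is a simplified stand-in for whatever string_indexed_data actually holds, and only NumPy is assumed.

```python
import numpy as np


def string_index(data, categorical_indexes, categories):
    """Hypothetical stand-in encoder: map each categorical column to integer codes."""
    encoded = np.empty((len(data), len(categorical_indexes)), dtype=np.int64)
    for out_col, (c_i, cats) in enumerate(zip(categorical_indexes, categories)):
        lookup = {value: code for code, value in enumerate(cats)}
        encoded[:, out_col] = [lookup[value] for value in data[:, c_i]]
    return encoded


def replace_categoricals(data, cached, categorical_indexes, categories):
    """Return a copy of data with categorical columns replaced by integer codes."""
    # The guard: a cached encoding with a different row count is treated as stale
    # and recomputed from the data that was actually passed in.
    if len(cached) != len(data):
        cached = string_index(data, categorical_indexes, categories)
    row_index = np.arange(len(data))
    out = data.copy()
    for idx, c_i in enumerate(categorical_indexes):
        out[:, c_i] = cached[row_index, idx]
    return out


# The encoding was cached for 2 rows, but 3 rows arrive at run time.
data = np.array([["a", 1.0], ["b", 2.0], ["a", 3.0]], dtype=object)
stale = np.zeros((2, 1), dtype=np.int64)
result = replace_categoricals(data, stale, [0], [["a", "b"]])
```

Without the length check, indexing the 2-row cache with 3 row positions raises the same kind of IndexError shown in the traceback above. Recomputing costs one extra pass over the categorical columns, which seems acceptable given that it only happens on a mismatch.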

Checklist

  • I have added screenshots above for all UI changes.
  • I have added e2e tests for all UI changes.
  • Documentation was updated if it was needed.

…dexed_data don't match

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

codecov-commenter commented Jan 19, 2023

Codecov Report

Merging #1915 (7e5a47f) into main (2351fae) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1915   +/-   ##
=======================================
  Coverage   93.32%   93.33%           
=======================================
  Files          93       93           
  Lines        4601     4605    +4     
=======================================
+ Hits         4294     4298    +4     
  Misses        307      307           
Flag Coverage Δ
unittests 93.33% <100.00%> (+<0.01%) ⬆️

Impacted Files Coverage Δ
...is/erroranalysis/_internal/surrogate_error_tree.py 88.32% <100.00%> (+0.14%) ⬆️

@gaugup gaugup enabled auto-merge (squash) January 19, 2023 19:30
@gaugup gaugup disabled auto-merge January 19, 2023 22:29
@gaugup gaugup merged commit 7e2f497 into main Jan 19, 2023
@gaugup gaugup deleted the gaugup/FixStringIndMismatchError branch January 19, 2023 22:30
@imatiach-msft imatiach-msft mentioned this pull request Feb 13, 2023
3 tasks