erroranalysis: Fix the scenario when samples in dataset and string_indexed_data don't match #1915

gaugup · 2023-01-18T23:56:54Z

Description

In large-data scenarios, if the test set is provided at run time (after cohort filtering), it can have a different number of samples than the test data used when the RAI insights were computed. In such cases, the error analysis tree computation can fail with the following error:

index 100 is out of bounds for axis 0 with size 100
Traceback (most recent call last):
  File "c:\users\gaugup\documents\github\responsible-ai-toolbox\raiwidgets\raiwidgets\responsibleai_dashboard_input.py", line 174, in debug_ml
    max_depth, num_leaves, min_child_samples)
  File "c:\users\gaugup\documents\github\responsible-ai-toolbox\erroranalysis\erroranalysis\analyzer\error_analyzer.py", line 367, in compute_error_tree_on_dataset
    min_child_samples=min_child_samples)
  File "c:\users\gaugup\documents\github\responsible-ai-toolbox\erroranalysis\erroranalysis\_internal\surrogate_error_tree.py", line 150, in compute_error_tree_on_dataset
    dataset_sub_names, max_depth, num_leaves, min_child_samples)
  File "c:\users\gaugup\documents\github\responsible-ai-toolbox\erroranalysis\erroranalysis\_internal\surrogate_error_tree.py", line 318, in get_surrogate_booster_local
    input_data[:, c_i] = analyzer.string_indexed_data[row_index, idx]
IndexError: index 100 is out of bounds for axis 0 with size 100
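
The failure mode can be reproduced in isolation with plain NumPy. This is an illustrative sketch only: the array shape, the 101-row run-time dataset, and the helper lookup_row are assumptions standing in for the real erroranalysis internals, not code from the library. It shows that reading row 100 from a 100-row precomputed array raises the same IndexError seen in the traceback.

```python
import numpy as np

# Assumed stand-in for string_indexed_data: encoded once, at RAI computation
# time, for a test set of 100 rows (the column count of 3 is arbitrary).
string_indexed_data = np.zeros((100, 3), dtype=int)

# Assumption: the dataset passed in at run time has 101 rows.
runtime_row_count = 101


def lookup_row(row_index, col_index):
    """Hypothetical helper mirroring string_indexed_data[row_index, idx]."""
    return string_indexed_data[row_index, col_index]


message = None
try:
    for row_index in range(runtime_row_count):
        lookup_row(row_index, 0)
except IndexError as err:
    # Row 100 does not exist in a 100-row array, so the lookup fails.
    message = str(err)

print(message)  # index 100 is out of bounds for axis 0 with size 100
```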

The fix is to recompute string_indexed_data when the number of samples in the dataset differs from the number of rows in string_indexed_data.
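
A minimal sketch of that guard, assuming a pandas DataFrame input and ordinal encoding via pd.factorize. The helpers string_index and get_string_indexed_data are hypothetical names for illustration; they are not the erroranalysis implementation, which this page does not show.

```python
import numpy as np
import pandas as pd


def string_index(data, categorical_features):
    """Hypothetical helper: replace each categorical column with integer codes."""
    indexed = data.copy()
    for col in categorical_features:
        # pd.factorize assigns codes in order of first appearance
        # (missing values become -1).
        codes, _ = pd.factorize(indexed[col])
        indexed[col] = codes
    return indexed.to_numpy()


def get_string_indexed_data(cached, dataset, categorical_features):
    """Hypothetical guard: reuse the cache only if its row count matches."""
    if cached is None or cached.shape[0] != dataset.shape[0]:
        # Row counts differ (e.g. a cohort-filtered test set supplied at
        # run time), so row indices would not line up: recompute.
        return string_index(dataset, categorical_features)
    return cached


# Usage: the cache was built for 100 rows, but 150 rows arrive at run time.
rng = np.random.default_rng(0)
full = pd.DataFrame({
    "color": rng.choice(["red", "green", "blue"], size=150),
    "size": rng.normal(size=150),
})
cached = string_index(full.iloc[:100], ["color"])
refreshed = get_string_indexed_data(cached, full, ["color"])
print(cached.shape, refreshed.shape)  # (100, 2) (150, 2)
```

Comparing row counts is a cheap check. Equal counts do not by themselves prove that the cached rows are the same rows, so in this sketch it acts as a heuristic guard rather than a full consistency check.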

Checklist

I have added screenshots above for all UI changes.
I have added e2e tests for all UI changes.
Documentation was updated if it was needed.


codecov-commenter · 2023-01-19T00:02:43Z

Codecov Report

Merging #1915 (7e5a47f) into main (2351fae) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1915   +/-   ##
=======================================
  Coverage   93.32%   93.33%           
=======================================
  Files          93       93           
  Lines        4601     4605    +4     
=======================================
+ Hits         4294     4298    +4     
  Misses        307      307
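
The headline percentages are hits divided by lines. Rounding 4294/4601 to the nearest hundredth would give 93.33%, so the reported 93.32% is consistent with truncating to two decimals. That truncation is an assumption inferred from these numbers, not a documented Codecov setting; the sketch below simply checks the arithmetic under that assumption.

```python
import math


def coverage_pct(hits, lines):
    """Percentage of lines hit, truncated (not rounded) to two decimals."""
    return math.floor(hits / lines * 10000) / 100


before = coverage_pct(4294, 4601)  # main
after = coverage_pct(4298, 4605)   # this PR: +4 lines, all 4 hit
misses_before = 4601 - 4294
misses_after = 4605 - 4298
print(before, after, misses_before, misses_after)  # 93.32 93.33 307 307
```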

Flag	Coverage Δ
unittests	`93.33% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
...is/erroranalysis/_internal/surrogate_error_tree.py	`88.32% <100.00%> (+0.14%)`	⬆️

github-actions · 2023-01-19T00:39:15Z

https://responsibleai.blob.core.windows.net/pullrequest/microsoft/responsible-ai-toolbox/gaugup/FixStringIndMismatchError/dashboard/index.html

github-actions · 2023-01-19T00:39:22Z

https://responsibleai.blob.core.windows.net/pullrequest/microsoft/responsible-ai-toolbox/gaugup/FixStringIndMismatchError/dashboard/index.html

github-actions · 2023-01-19T20:10:09Z

https://responsibleai.blob.core.windows.net/pullrequest/microsoft/responsible-ai-toolbox/gaugup/FixStringIndMismatchError/dashboard/index.html

erroranalysis: Fix the scenario when samples in dataset and string_indexed_data don't match

2b848b6

Signed-off-by: Gaurav Gupta <gaugup@microsoft.com>

gaugup requested review from imatiach-msft, RubyZ10, vinuthakaranth, tongyu-microsoft, xuke444 and hawestra as code owners January 18, 2023 23:56

Merge branch 'main' into gaugup/FixStringIndMismatchError

7e5a47f

imatiach-msft approved these changes Jan 19, 2023

Merge branch 'main' into gaugup/FixStringIndMismatchError

795c29d

gaugup enabled auto-merge (squash) January 19, 2023 19:30

gaugup disabled auto-merge January 19, 2023 22:29

gaugup merged commit 7e2f497 into main Jan 19, 2023

gaugup deleted the gaugup/FixStringIndMismatchError branch January 19, 2023 22:30

imatiach-msft mentioned this pull request Feb 13, 2023

release erroranalysis v0.4.1 #1966

Merged

imatiach-msft mentioned this pull request Mar 1, 2023

release raiwidgets and responsibleai 0.25.0 #1989

Merged
