
Saving is_training_set_available in sys_info during get_overall_statistics() #453

Open
OscarWang114 opened this issue Sep 6, 2022 · 2 comments


@OscarWang114
Collaborator

OscarWang114 commented Sep 6, 2022

Although this issue occurs in the web interface, I'm writing it here as it's mainly SDK-related.

Problem

In the web interface:

[0]   File "/Users/oscar/opt/anaconda3/envs/exb/lib/python3.9/site-packages/explainaboard/processors/processor.py", line 252, in perform_analyses
[0]     my_analysis.perform(
[0]   File "/Users/oscar/opt/anaconda3/envs/exb/lib/python3.9/site-packages/explainaboard/analysis/analyses.py", line 191, in perform
[0]     raise RuntimeError(f"bucket analysis: feature {self.feature} not found.")

In SDK:

The function _gen_cases_and_stats() in conditional_generation.py (called by get_overall_statistics() in processor.py) skips saving example-level features with require_training_set=True when no training set is supplied. However, the names of these skipped features are still saved in sys_info.analysis_levels[0].

As a result, BucketAnalysis.perform() in analyses.py tries to look up these features and throws the error above, because they cannot be found in the generated cases (they were skipped).
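To make the mismatch concrete, here is a minimal, self-contained sketch of the failure mode. All names (declared_features, num_oov, bucket_perform) are simplified stand-ins, not the real explainaboard classes; only the error message mirrors the actual one.

```python
# Hypothetical repro sketch: features declared at the analysis level include
# a require_training_set feature, but the generated cases omit it.

declared_features = {
    "text_length": {"require_training_set": False},
    "num_oov": {"require_training_set": True},  # needs the training set
}

# Mimics _gen_cases_and_stats() with no training set supplied: the
# require_training_set features are skipped, so cases lack them.
case_features = {
    name: 0.0
    for name, spec in declared_features.items()
    if not spec["require_training_set"]
}

def bucket_perform(feature: str, cases: dict) -> None:
    """Mimics BucketAnalysis.perform(): look the feature up in the cases."""
    if feature not in cases:
        raise RuntimeError(f"bucket analysis: feature {feature} not found.")

bucket_perform("text_length", case_features)  # fine
try:
    bucket_perform("num_oov", case_features)  # reproduces the error
except RuntimeError as e:
    print(e)  # bucket analysis: feature num_oov not found.
```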

Quick fix

Set skip_failed_analyses=True in Processor.process.
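A toy illustration of what the flag is assumed to do (this is not the real explainaboard code; run_analyses and its signature are invented for the sketch): with skip_failed_analyses=True, an analysis that raises is logged and skipped instead of aborting the whole run.

```python
def run_analyses(analyses, cases, skip_failed_analyses=False):
    """Run (name, fn) analysis pairs over cases, optionally tolerating failures."""
    results = []
    for name, fn in analyses:
        try:
            results.append((name, fn(cases)))
        except RuntimeError as e:
            if not skip_failed_analyses:
                raise  # old behavior: one missing feature kills everything
            print(f"skipped analysis {name}: {e}")
    return results

def oov_analysis(cases):
    # Stands in for a bucket analysis over a skipped training-set feature.
    raise RuntimeError("bucket analysis: feature num_oov not found.")

# Only the failing analysis is dropped; the rest still produce results.
results = run_analyses([("count", len), ("oov", oov_analysis)],
                       [1, 2, 3], skip_failed_analyses=True)
```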

Long-term solution

Following up on #410, we should save a flag like is_training_set_available in sys_info. If set to false, we should skip the require_training_set=True features during bucket analysis.
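The proposed behavior could be sketched as follows. The field names (FeatureSpec, SysInfo, features_for_bucket_analysis) are hypothetical simplifications of the SDK's actual structures; only is_training_set_available and require_training_set come from the issue itself.

```python
from dataclasses import dataclass, field

@dataclass
class FeatureSpec:
    name: str
    require_training_set: bool = False

@dataclass
class SysInfo:
    # Proposed flag from this issue; actual name/placement TBD.
    is_training_set_available: bool = False
    features: list = field(default_factory=list)

def features_for_bucket_analysis(sys_info: SysInfo) -> list:
    """Skip require_training_set features when no training set was given."""
    return [
        f.name
        for f in sys_info.features
        if sys_info.is_training_set_available or not f.require_training_set
    ]

info = SysInfo(features=[
    FeatureSpec("text_length"),
    FeatureSpec("num_oov", require_training_set=True),
])
print(features_for_bucket_analysis(info))  # ['text_length']
info.is_training_set_available = True
print(features_for_bucket_analysis(info))  # ['text_length', 'num_oov']
```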

@odashi
Contributor

odashi commented Sep 6, 2022

@OscarWang114 Thanks for reporting the issue!

First, could skip_failed_analyses=True in Processor.process be a quick fix, or does it not satisfy the use case?

I also agree with having more specific control around feature groups (in this case, train-only or not). Is the flag name just is_trainint_set rather than is_training_set_available?

@OscarWang114
Copy link
Collaborator Author

@odashi Thanks! Yes, skip_failed_analyses=True is a valid quick fix; I've updated the issue description. And thanks for catching the typo (also fixed).
