Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add get_size() #208

Merged
merged 3 commits into from
Dec 15, 2021
Merged

Add get_size() #208

merged 3 commits into from
Dec 15, 2021

Conversation

mfeurer
Copy link
Contributor

@mfeurer mfeurer commented Dec 14, 2021

Fixes #31

@codecov
Copy link

codecov bot commented Dec 14, 2021

Codecov Report

Merging #208 (952f58e) into master (f71508c) will increase coverage by 1.83%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #208      +/-   ##
==========================================
+ Coverage   66.33%   68.17%   +1.83%     
==========================================
  Files          17       17              
  Lines        1613     1681      +68     
==========================================
+ Hits         1070     1146      +76     
+ Misses        543      535       -8     
Impacted Files Coverage Δ
ConfigSpace/read_and_write/pcs_new.py 92.11% <0.00%> (+4.11%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f71508c...952f58e. Read the comment docs.

@eddiebergman
Copy link
Contributor

eddiebergman commented Dec 15, 2021

Unfortunately, I think this is slightly harder than calculating the product of possible values. To be accurate it would also have to take into account conditionals and forbidden clauses.

We can seperate them into two cases:

  • Case 1: If the configspace has a float hyperparameter, this is always still np.inf.

    • One exception is the case in which a float hyperparameter is never possible. This turns the problem into case 2.
    • The second is a conditional that enforces a float takes on a specific value. I'm not familiar with the possible conditionals but any that restrict its values to a discrete set or a constant value are the only ones I can think to consider. This also turns the problem into case 2.
  • Case 2: The second case is no float parameters and every hyperparameter takes discrete values.

    • We would have to account for how the conditional impacts the size of the space. The problem is worse when you consider the fact different conditionals/forbiddings could overlap and so no double counting should occur. If it's cheap, the conditionals could return a dict containing the forbidden subset of the configuration. From this you could do len(reduce(lambda a, b: set(a).union(set(b)), illegal_configs, set())) to get the total number of illegal configurations and subtract from the total possible.

It's worth pointing out that this complicated procedure is only required if not in case 1 or we realise that case 1 is actually a case 2 problem.

Edit: I realized the above only works for forbidden clauses and some more thinking would have to be done for conditionals, for example, consider that the entire region of the config space for a random forest is only active when 'classifier' == 'random_forest'.

@mfeurer
Copy link
Contributor Author

mfeurer commented Dec 15, 2021

Yes, I agree, these are all not covered. However, I really don't want to enumerate all possible hyperparameter combinations. Maybe this should be called estimate_size() to get a rough estimate that is actually correct if there are no conditions and no forbiddens and direct a user to generate_grid to obtain all legal configurations (which they can then count)?

@eddiebergman
Copy link
Contributor

I would be okay with the method name change and some documentation on it, just as long as it doesn't give false impressions, I can imagine a research paper incorrectly using it to give the size of their config space.

@mfeurer
Copy link
Contributor Author

mfeurer commented Dec 15, 2021

What do you think about the improved API description?

@eddiebergman
Copy link
Contributor

Seems good to me :) Approved

@mfeurer mfeurer merged commit 9856e62 into master Dec 15, 2021
@mfeurer mfeurer deleted the add_31 branch December 15, 2021 15:25
github-actions bot pushed a commit that referenced this pull request Dec 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add functionality to get size of the configuration space
2 participants