
Description
- Some parameters, like `x` and `y` in `fit`/`predict`, are always required.
- For all other parameters, compute the entropy of the distribution of their observed values and compare it to a uniform distribution.
Example:
- A parameter has the values 1 (four times), 2 (twice), 3 (once), and 4 (once). Then it has the distribution P = {4/8, 2/8, 1/8, 1/8}, and
H(P) = 1/2*log2(2) + 1/4*log2(4) + 2 * 1/8*log2(8)
= 1/2 + 1/2 + 3/4
= 1.75
- The uniform distribution over n values has entropy
H(U_n) = log2(n), so here H(U_4) = log2(4) = 2.
- We can now check how similar the distribution of values is to the uniform distribution.
- Option 1 - Kullback-Leibler divergence to the uniform distribution:
D_KL(P || U_n) = H(U_n) - H(P) = 2 - 1.75 = 0.25. Lower values mean the distribution is closer to uniform.
- Option 2 - Normalized entropy:
H(P) / H(U_n) = 1.75/2 = 0.875. Higher values mean the distribution is closer to uniform. Both options are computed in the sketch below.
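
As a quick check of the arithmetic, here is a minimal Python sketch computing both options for the example above (the helper name `entropy_bits` is illustrative, not an existing function in this repo):

```python
import math
from collections import Counter

def entropy_bits(values):
    """Shannon entropy (base 2) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Example from above: 1 four times, 2 twice, 3 once, 4 once.
values = [1, 1, 1, 1, 2, 2, 3, 4]
h_p = entropy_bits(values)            # H(P)   = 1.75
h_u = math.log2(len(set(values)))     # H(U_4) = 2.0

print(h_u - h_p)   # Option 1, KL divergence to uniform: 0.25
print(h_p / h_u)   # Option 2, normalized entropy:       0.875
```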
- We define thresholds to decide whether a parameter becomes optional or required. For Option 1, parameters below the threshold are made required, while parameters at or above the threshold are made optional. For Option 2, parameters above the threshold are made required, while parameters at or below the threshold are made optional. A sketch of this rule follows the note below.
- If a parameter is made optional, use its most commonly used value as the default (this may differ from the previous default).
Note: We need to check whether we can find a threshold that works regardless of the number of values, or whether the threshold has to be a function of the number of values the parameter takes.
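
A minimal sketch of the decision rule under Option 2, with a placeholder threshold of 0.3 (both the function name `classify_parameter` and the threshold value are illustrative assumptions, not fixed by this issue):

```python
import math
from collections import Counter

def classify_parameter(values, threshold=0.3):
    """Decide whether a parameter is required or optional via Option 2.

    Normalized entropy above `threshold` -> required; at or below ->
    optional, with the most commonly used value as the default.
    Returns ("required", None) or ("optional", default_value).
    """
    counts = Counter(values)
    total = len(values)
    n = len(counts)
    if n <= 1:
        # Only one value ever observed: H(U_1) = 0, so normalized
        # entropy is undefined; a clear candidate for optional.
        return "optional", counts.most_common(1)[0][0]
    h_p = -sum((c / total) * math.log2(c / total) for c in counts.values())
    normed = h_p / math.log2(n)   # Option 2, in [0, 1]
    if normed > threshold:
        return "required", None   # usage is close to uniform
    # One value dominates: make the parameter optional and use the most
    # commonly used value as the default (may differ from the old default).
    return "optional", counts.most_common(1)[0][0]

# Example from above: normalized entropy 0.875 > 0.3 -> ("required", None).
print(classify_parameter([1, 1, 1, 1, 2, 2, 3, 4]))
```

The `n <= 1` guard is needed because H(U_1) = 0 makes the normalized entropy undefined; it is also where the question in the note shows up, since a fixed threshold may not behave the same for small and large n.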