Better support of binary variables in conditional sampling #126
Comments
hidimstat/src/hidimstat/cpi.py Line 180 in def7341
I suggest that at this line ⬆️ we branch on the type of the imputation model ⬇️

    if is_classifier(imputation_model):
        X_j_hat_proba = imputation_model.predict_proba(X_minus_j)
        X_perm_j[:, :, group_ids] = rng.bernouilli(X_j_hat_proba)
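For reference, here is a minimal runnable sketch of the sampling step suggested above. It rests on several assumptions: NumPy's `Generator` has no `bernouilli`/`bernoulli` method as far as I know, so the draw is written as `binomial` with `n=1`; `proba_one` is a hand-written stand-in for the positive-class column of `imputation_model.predict_proba(X_minus_j)`; and `sample_binary_column` is a hypothetical helper, not part of hidimstat.

```python
import numpy as np


def sample_binary_column(proba_one, n_permutations, rng):
    """Draw binary samples from P(X_j = 1 | X_minus_j), one row per repetition.

    Hypothetical helper for illustration only.
    """
    proba_one = np.asarray(proba_one, dtype=float)
    # A binomial draw with n=1 is a Bernoulli draw. The per-sample
    # probabilities are broadcast along the repetition axis.
    return rng.binomial(1, proba_one, size=(n_permutations, proba_one.shape[0]))


rng = np.random.default_rng(0)
# Stand-in for imputation_model.predict_proba(X_minus_j)[:, 1]
proba_one = np.array([0.0, 1.0, 0.5, 0.9])
X_perm_j = sample_binary_column(proba_one, n_permutations=3, rng=rng)
```

The result has shape `(n_permutations, n_samples)`, which would still need to be written into the matching slice of the perturbed array.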
In this case, it would be better to move the computation of the residual (line 170) into the `if` branch and to start the `if` at line 176, so that classification and regression problems are handled separately. However, your proposal changes the algorithm from a permutation algorithm to a generative one for classification problems. I am not in favour of that.
Also ping @AngelReyero
I agree with Joseph on the conditional sampling. The important part of doing the permutation in the CPI is just that under some assumptions (the residuals have the same distribution, therefore independent of
Normally, CPI should also handle classification, as long as we can compute a meaningful loss (cross-entropy).
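To make the loss-based view concrete, here is a small sketch of a cross-entropy comparison. It assumes the importance score is the loss with the perturbed feature minus the loss with the original feature. The function names and the probability arrays are made up for illustration and are not taken from hidimstat.

```python
import numpy as np


def cross_entropy(y_true, proba, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    y_true = np.asarray(y_true)
    # Clip to avoid log(0) when a class is predicted with probability 0.
    proba = np.clip(np.asarray(proba, dtype=float), eps, 1.0)
    return float(-np.mean(np.log(proba[np.arange(len(y_true)), y_true])))


def loss_difference(y_true, proba_full, proba_perturbed):
    """Hypothetical importance score: perturbed loss minus original loss."""
    return cross_entropy(y_true, proba_perturbed) - cross_entropy(y_true, proba_full)


y = np.array([0, 1])
proba_full = np.array([[0.9, 0.1], [0.2, 0.8]])       # predictions on the original data
proba_perturbed = np.array([[0.5, 0.5], [0.5, 0.5]])  # predictions after perturbing X_j
score = loss_difference(y, proba_full, proba_perturbed)
```

A positive `score` means that the predictions got worse once the feature was perturbed.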
Hello everyone!
Yes, that's correct. I found something similar in your code: https://github.com/achamma723/Variable_Importance/blob/3f007d75a851acba17a2ae1d067857c3e3fffa6f/BBI_package/src/BBI/compute_importance.py#L400-L442
If you want to stay close to Ahmad's code, you should add 2 parameters:
If you can, I think it is better to use NumPy's `choice` function rather than `bernouilli` to generate the new sample from the conditional distribution.
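A sketch of what the `choice`-based variant could look like, under these assumptions: `proba` stands in for the output of `predict_proba`, `classes` stands in for the model's `classes_` attribute, and `sample_categories` is a hypothetical helper. Since `Generator.choice` takes a single probability vector, the sketch loops over the rows.

```python
import numpy as np


def sample_categories(classes, proba, rng):
    """Draw one category per row from the row-wise conditional distribution.

    Hypothetical helper for illustration only.
    """
    classes = np.asarray(classes)
    proba = np.asarray(proba, dtype=float)
    # One draw per sample, each with its own probability vector.
    return np.array([rng.choice(classes, p=row) for row in proba])


rng = np.random.default_rng(0)
classes = np.array([0, 1, 2])
# Degenerate (one-hot) probabilities make the expected draw obvious.
proba = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
x_j_sampled = sample_categories(classes, proba, rng)
```

Unlike a Bernoulli draw, this also covers variables with more than two levels, which may matter for the ordinal case mentioned in the issue.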
The conditional permutation step of CPI is designed for continuous variables, where the residual is intuitive to compute and shuffle. However, it is not adapted to binary and ordinal variables. Using the `predict_proba` method of the imputation model would make more sense in that case. I wonder if this also applies to knockoffs.
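For context, the residual-shuffling step described here can be sketched for a continuous variable as follows. This is an illustration under assumptions, not the hidimstat implementation: the data are synthetic and ordinary least squares (via `np.linalg.lstsq`) stands in for the imputation model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X_minus_j = rng.normal(size=(n, 2))
# Continuous x_j that depends linearly on the other features plus noise.
x_j = X_minus_j @ np.array([1.0, -0.5]) + 0.1 * rng.normal(size=n)

# Stand-in imputation model: least-squares regression of x_j on X_minus_j.
A = np.column_stack([np.ones(n), X_minus_j])
coef, *_ = np.linalg.lstsq(A, x_j, rcond=None)
x_j_hat = A @ coef

# Conditional permutation: keep the predictable part, shuffle the residuals.
residual = x_j - x_j_hat
x_j_perm = x_j_hat + rng.permutation(residual)
```

If `x_j` were binary, `x_j_hat + rng.permutation(residual)` would generally no longer take values in {0, 1}, which is the limitation raised in this issue.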