direct_input_pertubation_strategy= isn't passed down #3

Open
afiodorov opened this issue Mar 12, 2017 · 1 comment

afiodorov commented Mar 12, 2017

Nice method.

I am examining the code and the thesis more closely as it appears to be very useful.

I don't fully understand the point of the perturbation strategy, and the thesis doesn't fully expand on it.

I started reading the code and I spotted some bugs.

Firstly

https://github.com/adebayoj/fairml/blob/master/fairml/orthogonal_projection.py#L120

takes the strategy but ignores it, see:

https://github.com/adebayoj/fairml/blob/master/fairml/orthogonal_projection.py#L217

Also, I think that with constant_zero and median perturbation strategies this loop is redundant:

https://github.com/adebayoj/fairml/blob/master/fairml/orthogonal_projection.py#L205

Each run ignores random_sample_selected anyway, so every run should produce the same output_difference_col and total_difference (because data_col_ptb and total_ptb_data are identical in each run).
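To illustrate the redundancy argument, here is a minimal sketch in plain Python. It is not fairml's code: `perturb_column`, `distinct_perturbations`, and the strategy names (including the `random_sample` variant) are hypothetical stand-ins. It only shows that a deterministic strategy yields the same perturbed column on every iteration, so repeating it adds no information.

```python
import random
import statistics


def perturb_column(column, strategy, rng):
    """Return a perturbed copy of `column` (hypothetical helper, not fairml's API)."""
    if strategy == "constant_zero":
        return [0.0] * len(column)
    if strategy == "median":
        med = statistics.median(column)
        return [med] * len(column)
    if strategy == "random_sample":
        # Assumed variant: the only strategy here that consumes randomness.
        value = rng.choice(column)
        return [value] * len(column)
    raise ValueError(f"unknown strategy: {strategy}")


def distinct_perturbations(column, strategy, num_runs=5, seed=0):
    """Count how many different perturbed columns `num_runs` iterations produce."""
    rng = random.Random(seed)
    seen = {tuple(perturb_column(column, strategy, rng)) for _ in range(num_runs)}
    return len(seen)


column = [1.0, 4.0, 2.0, 8.0, 5.0]
zero_count = distinct_perturbations(column, "constant_zero")
median_count = distinct_perturbations(column, "median")
print(zero_count, median_count)
```

Both counts come out as 1: with these two strategies every iteration sees identical perturbed data, so a single pass would be enough.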

Finally, it would be great if you could explain more in the documentation the purpose of direct_input_pertubation_strategy. Is it necessary at all to "zero-out" a column? Why?

It appears to me that just by orthogonalising the other columns you already take away the effect of the subject column. It's not clear to me why zeroing out is required on top. Is it to be certain the effect of the column is not present?
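To make the question concrete, here is a minimal sketch of what orthogonalising one column against the subject column means. It uses plain Python with hypothetical helper names (`dot`, `orthogonalize`) and is not taken from fairml's implementation.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(column, reference):
    """Remove from `column` its linear component along `reference`."""
    denom = dot(reference, reference)
    if denom == 0:
        return list(column)  # nothing to project out of an all-zero reference
    scale = dot(column, reference) / denom
    return [c - scale * r for c, r in zip(column, reference)]


subject = [1.0, 2.0, 3.0, 4.0]  # the column whose effect is being removed
other = [2.0, 3.0, 5.0, 9.0]    # another input column
residual = orthogonalize(other, subject)
print(dot(residual, subject))   # numerically close to zero
```

Note that this only changes the other column. The subject column itself is still fed to the model unchanged unless it is perturbed separately.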

Many thanks for the code by the way!

@adebayoj
Owner

@afiodorov thanks for taking a look at the code and for your feedback.

  1. You are right: with the constant-zero and median strategies, the loop is redundant. I plan to separate the direct perturbation code from the overall method so that the run is independent. I am testing a local branch atm that handles this, and will push a fix later this week.

  2. The purpose of direct perturbation. This is also a good question, and you are right that it is not explained in the thesis. We are working on suitable documentation to fully explain the overall issue.

For now, here is the justification for including direct perturbation. Suppose you have a function f(x_1, x_2, x_3). What fairml does, at a high level, is give you the dependence of f on each of the x_i. The dependence is calculated as direct influence + indirect influence. For direct influence, we generate a data transformation using any of the direct perturbation strategies and then look at the impact of that perturbation on the black-box function's output. For the indirect influence, we use orthogonal transformation to generate those transformations.

Certainly, we could just use orthogonal transformation on all variables, including the one being examined, but we wanted to give people the flexibility to pick whatever function they are interested in using for this task. Hope this helps explain the direct-perturbation strategy requirement.
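The direct + indirect split described above can be sketched roughly as follows. This is an illustration under stated assumptions, not fairml's actual code: `black_box` is a made-up linear model, `mean_abs_change` is a hypothetical summary measure, the direct step uses a constant-zero perturbation, and the indirect step projects the subject column out of every other column.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_out(column, reference):
    """Remove from `column` its linear component along `reference`."""
    denom = dot(reference, reference)
    scale = dot(column, reference) / denom if denom else 0.0
    return [c - scale * r for c, r in zip(column, reference)]


def black_box(row):
    # Stand-in model (an assumption): any callable mapping a row to a number.
    return 2.0 * row[0] + 1.0 * row[1] - 0.5 * row[2]


def predict(columns):
    """Apply the black box to each row of column-major data."""
    return [black_box(row) for row in zip(*columns)]


def mean_abs_change(columns, subject, orthogonalize_others):
    """Mean absolute change in output after perturbing the data."""
    baseline = predict(columns)
    perturbed = []
    for j, col in enumerate(columns):
        if j == subject:
            perturbed.append([0.0] * len(col))  # direct: constant-zero
        elif orthogonalize_others:
            perturbed.append(project_out(col, columns[subject]))  # indirect
        else:
            perturbed.append(list(col))
    changed = predict(perturbed)
    return sum(abs(b - c) for b, c in zip(baseline, changed)) / len(baseline)


columns = [
    [1.0, 2.0, 3.0, 4.0],    # x_1, the subject column
    [2.0, 4.0, 6.0, 8.0],    # x_2, perfectly correlated with x_1
    [1.0, -1.0, 1.0, -1.0],  # x_3
]
direct_only = mean_abs_change(columns, subject=0, orthogonalize_others=False)
direct_plus_indirect = mean_abs_change(columns, subject=0, orthogonalize_others=True)
print(direct_only, direct_plus_indirect)
```

In this toy data x_2 is a multiple of x_1, so zeroing x_1 alone understates its influence. Projecting x_1 out of the other columns as well picks up the part of the output that reaches the model through x_2.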

adebayoj self-assigned this Mar 13, 2017