Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cyclical features when variables are negative - does it make sense? #578

Open
solegalli opened this issue Dec 13, 2022 · 6 comments
Open

Comments

@solegalli
Copy link
Collaborator

Not sure we can make a variable with negative values cyclical with the sine and cosine. We need to investigate more and if it does not make sense, we need to put a safeguard.

@VascoSch92
Copy link
Contributor

I think the main problem is: what happen if the max value is 0? An error is thrown?

In this case what we could also do is taking the max of the absolute value of the column and still having an encoding which make sense.

@solegalli
Copy link
Collaborator Author

Good point.

The absolute maximum would work OK only when the entire variable is negative. If it has positive and negative numbers it will mess up with the cyclicity.

@VascoSch92
Copy link
Contributor

Yes yes you are right. I was just speaking about the case where the max of the column was 0.

From what i understand: you would like to find a solution that doesn't differentiate cases. Right?

I'm thinking that you could always translate the column by the max of the absolute value of the columns. This should not change the cyclical embedding as sin and cos are periodic functions.

But i have to check this last sentence ;-)

@VascoSch92
Copy link
Contributor

VascoSch92 commented Aug 28, 2023

think about the problem, and this is my conclusion:

There is a good and intelligent way to eliminate the problem of negative values. It suffices to take
max_value = | min(column) - max(column) |

Because we are searching the length of the domain of the cyclic feature such that we can embed it in the 2-dim circle using the periodic function (sin(...), cos(...)). The length of the domain is necessary to embed the column bijectively.

In fact, this Is what we were already doing, taking the max of the column. We were supposing that the minimum was 0.

With this approach, we can elegantly tackle all the cases :-)
What do you think?

Let me know if my explanation was clear or if you need more details.

@solegalli
Copy link
Collaborator Author

Hey @VascoSch92

I am not sure I understand the solution.

We basically want to squeze the variable between 0 and 1, so that when we multiply by 2pi it is between 0 and 2pi.

So far, the transformer assumes that the minimum is always 0 and the maximum is always positive, so we divide by the max, that squeezes the variable between 0 and 1, and then we multiply by 2Pi.

If the variable has negative and positive values, like say it ranges from -5 to 10, then we want -5 to be taken to 0 and 10 to be taken to 1, in a way that does not alter the distribution.

One option could be to add the minimum value to the entire distribution and then find the maximum and use the transformer as it is used now.

Another option could be to scale to the min and max like the MinMaxScaler:

(X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0)

Whatever, we do, ideally, we do not want to break backwards compatibility.

Does your solution return the results I expect / describe here as well?

@VascoSch92
Copy link
Contributor

Hey @solegalli

We basically want to squeze the variable between 0 and 1, so that when we multiply by 2pi it is between 0 and 2pi.

So far, the transformer assumes that the minimum is always 0 and the maximum is always positive, so we divide by the max, that squeezes the variable between 0 and 1, and then we multiply by 2Pi.

Exactly, we want to do that because the interval [0, 2Pi] is the period of sin and cos.

If the variable has negative and positive values, like say it ranges from -5 to 10, then we want -5 to be taken to 0 and 10 to be taken to 1, in a way that does not alter the distribution.

If you look at my suggestion we have that: | min(column) - max(column) | = - 5 - 10 = 5. Therefore, -5 goes to 0 and 10 goes to 2. Therefore, 10 will be mapped to (sin(4Pi), cos(4Pi)) = (sin(2Pi), cos(2Pi)) because of periodicity (and the same for other values)

Whatever, we do, ideally, we do not want to break backwards compatibility.

Does your solution return the results I expect / describe here as well?

I understand your point and, of course, the idea is to have a user-friendly (not overcomplicated) transformer.

Therefore, my suggestion is just to check if the columns is positive and raise an error otherwise

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants