-
-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cyclical features when variables are negative - does it make sense? #578
Comments
I think the main problem is: what happen if the max value is 0? An error is thrown? In this case what we could also do is taking the max of the absolute value of the column and still having an encoding which make sense. |
Good point. The absolute maximum would work OK only when the entire variable is negative. If it has positive and negative numbers it will mess up with the cyclicity. |
Yes yes you are right. I was just speaking about the case where the max of the column was 0. From what i understand: you would like to find a solution that doesn't differentiate cases. Right? I'm thinking that you could always translate the column by the max of the absolute value of the columns. This should not change the cyclical embedding as sin and cos are periodic functions. But i have to check this last sentence ;-) |
think about the problem, and this is my conclusion: There is a good and intelligent way to eliminate the problem of negative values. It suffices to take Because we are searching the length of the domain of the cyclic feature such that we can embed it in the 2-dim circle using the periodic function (sin(...), cos(...)). The length of the domain is necessary to embed the column bijectively. In fact, this Is what we were already doing, taking the max of the column. We were supposing that the minimum was 0. With this approach, we can elegantly tackle all the cases :-) Let me know if my explanation was clear or if you need more details. |
Hey @VascoSch92 I am not sure I understand the solution. We basically want to squeze the variable between 0 and 1, so that when we multiply by 2pi it is between 0 and 2pi. So far, the transformer assumes that the minimum is always 0 and the maximum is always positive, so we divide by the max, that squeezes the variable between 0 and 1, and then we multiply by 2Pi. If the variable has negative and positive values, like say it ranges from -5 to 10, then we want -5 to be taken to 0 and 10 to be taken to 1, in a way that does not alter the distribution. One option could be to add the minimum value to the entire distribution and then find the maximum and use the transformer as it is used now. Another option could be to scale to the min and max like the MinMaxScaler:
Whatever, we do, ideally, we do not want to break backwards compatibility. Does your solution return the results I expect / describe here as well? |
Hey @solegalli
Exactly, we want to do that because the interval [0, 2Pi] is the period of
If you look at my suggestion we have that:
I understand your point and, of course, the idea is to have a user-friendly (not overcomplicated) transformer. Therefore, my suggestion is just to check if the columns is positive and raise an error otherwise |
Not sure we can make a variable with negative values cyclical with the sine and cosine. We need to investigate more and if it does not make sense, we need to put a safeguard.
The text was updated successfully, but these errors were encountered: