feat(inverse_transform): enable fit and transform with horizontal_matrix #139
Summary of Dimensionality Limitation in PCA, MCA, and FAMD
The maximum number of dimensions in methods like PCA (ACP), MCA (ACM), and FAMD (AFDM) is constrained by the rank of the data matrix. This rank is equal to the minimum of the number of rows (n, individuals) and columns (p, variables).
Example: for a dataset with 100 individuals and 50 variables, the maximum number of dimensions after projection will be 50, as there are only 50 variables to define the variability. This constraint is a fundamental mathematical property of matrix rank, which determines the number of independent linear combinations available from the data. Technically, further components could be added, but they would explain no additional variance, as they would necessarily be orthogonal to the existing ones. Please note that 100% of the total variance is still captured in this limited number of components.
Regarding the previous explanation, adding use_approximate_inverse and raising an error when p > n is unnecessary, as the resulting matrix will be of size (n, min(n, p)) no matter what.
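To make the shape argument concrete, here is a minimal sketch using plain NumPy rather than saiph itself. The `pca_coordinates` helper is hypothetical and only mimics a centered PCA through a thin SVD; it is not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


def pca_coordinates(X):
    """Hypothetical helper: centered PCA coordinates via a thin SVD."""
    Xc = X - X.mean(axis=0)
    # full_matrices=False keeps only min(n, p) singular vectors.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = U * s  # shape (n, min(n, p))
    explained = s**2 / np.sum(s**2)  # share of variance per component
    return coords, explained


tall = rng.normal(size=(100, 50))  # n > p
wide = rng.normal(size=(10, 50))   # n < p, a "horizontal" matrix

tall_coords, tall_var = pca_coordinates(tall)
wide_coords, wide_var = pca_coordinates(wide)

print(tall_coords.shape)  # (100, 50)
print(wide_coords.shape)  # (10, 10)
```

In both cases the explained-variance shares sum to 1, so no variance is lost by stopping at min(n, p) components.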
saiph/projection_test.py
Outdated
) -> None:
    """Verify that the coordinates are a squared matrix even if the input is horizontal."""
    coord, __ = fit_transform(df_to_fit_transform)
    assert coord.shape[0] == coord.shape[0]
glad you added this test
LGTM!
I added some theoretical arguments in the Conversation of the PR.
I also wonder what we should do when the user purposely specifies nf > min(n, p) in the fit function: error, warning, or nothing?
Personally, I would like to avoid raising too many errors when not necessary.
Also, maybe we could update the "Returns" section of the docstrings of the transform and fit_transform functions to explain that the coords df will always be of size (n, min(n, p)), or (n, nf) if nf is specified and < min(n, p).
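As one possible answer to the nf question, the "warning" option could look like the sketch below. `resolve_nf` is a hypothetical helper written for illustration only; it is not part of saiph, and the actual fit function may handle nf differently.

```python
import warnings


def resolve_nf(nf, n_rows, n_cols):
    """Hypothetical helper: clamp nf to min(n, p) and warn instead of raising."""
    max_nf = min(n_rows, n_cols)
    if nf is None:
        # No value requested: keep every available dimension.
        return max_nf
    if nf > max_nf:
        warnings.warn(
            f"nf ({nf}) is greater than min(n, p) = {max_nf}; using {max_nf}.",
            UserWarning,
            stacklevel=2,
        )
        return max_nf
    return nf


print(resolve_nf(5, 10, 50))     # 5
print(resolve_nf(None, 100, 50)) # 50
```

With this choice the returned coordinates keep the documented size, (n, min(n, p)) or (n, nf), and the user is still told that the requested value was reduced.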
LGTM except the mistake in the test
saiph/projection_test.py
Outdated
) -> None:
    """Verify that the coordinates are a squared matrix even if the input is horizontal."""
    coord, __ = fit_transform(df_to_fit_transform)
    assert coord.shape[0] == coord.shape[0]
-    assert coord.shape[0] == coord.shape[0]
+    assert coord.shape[0] == coord.shape[1]
The rank is NOT equal to the minimum dimension; it is bounded by the minimum dimension. If rank(matrix) = min(n, p), the matrix is said to be full rank.
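A small NumPy check illustrates the difference between "equal to" and "bounded by". The matrices below are made-up data: the second one has a column that is the sum of two others, so its rank falls below min(n, p).

```python
import numpy as np

rng = np.random.default_rng(1)

# A random 6 x 4 Gaussian matrix is full rank (with probability 1).
full = rng.normal(size=(6, 4))

# Make the last column a linear combination of the first two.
deficient = full.copy()
deficient[:, 3] = deficient[:, 0] + deficient[:, 1]

rank_full = np.linalg.matrix_rank(full)
rank_deficient = np.linalg.matrix_rank(deficient)

print(rank_full)       # 4, equal to min(n, p)
print(rank_deficient)  # 3, strictly below min(n, p)
```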
The changes look good to me. I wonder what the approximate inverse is, or how the inverse transformation works when the data frame is horizontal.
if not use_approximate_inverse and n_records < n_dimensions:
    raise InvalidParameterException(
        f"n_dimensions ({n_dimensions}) is greater than n_records ({n_records})."
    )
# Get back scaled_values from coord with inverse matrix operation
I don't understand what the approximate inverse is if this is the only place where we use it.
The inverse transform can't receive a horizontal matrix, as the coordinates are at worst square.
We use https://numpy.org/doc/stable/reference/generated/numpy.linalg.pinv.html to compute the inverse, because https://numpy.org/doc/stable/reference/generated/numpy.linalg.inv.html only works on square matrices.
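The difference between the two NumPy functions can be checked on toy data. This is only an illustration with random matrices, not the code path used in saiph.

```python
import numpy as np

rng = np.random.default_rng(2)

# Horizontal (3 x 5) matrix: inv refuses it, pinv accepts it.
A = rng.normal(size=(3, 5))
try:
    np.linalg.inv(A)
    inv_failed = False
except np.linalg.LinAlgError:
    inv_failed = True

A_pinv = np.linalg.pinv(A)     # shape (5, 3)
right_identity = A @ A_pinv    # identity of size 3 when A has full row rank

# On a square invertible matrix, pinv and inv agree.
S = rng.normal(size=(4, 4))
same_on_square = np.allclose(np.linalg.pinv(S), np.linalg.inv(S))

print(inv_failed)      # True
print(A_pinv.shape)    # (5, 3)
print(same_on_square)  # True
```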
After experiments, we conclude that allowing avatarization with a horizontal matrix is not a problem.