
feat(inverse_transform): enable fit and transform with horizontal_matrix #139

Merged: 2 commits into main on Oct 16, 2024

Conversation

@raimbaultL (Contributor) commented on Oct 16, 2024

After running experiments, we conclude that allowing avatarization with a horizontal matrix (more variables than records) is not a problem.

@mguillaudeux (Contributor) commented:

Summary of Dimensionality Limitation in PCA, MCA, and FAMD

The maximum number of dimensions in methods like PCA (ACP), MCA (ACM), and FAMD (AFDM) is constrained by the rank of the data matrix. This rank is equal to the minimum of the number of rows (n, individuals) and columns (p, variables).

- In PCA, the covariance matrix's rank cannot exceed the smaller of n and p, meaning the number of principal components is limited by this minimum.
- In MCA, which works with qualitative data, the number of factor axes is similarly limited by the degrees of freedom in the matrix.
- In FAMD, which handles mixed data, the same logic applies: the number of dimensions is restricted by the matrix rank, reflecting the minimum between the number of individuals and variables.

Example: For a dataset with 100 individuals and 50 variables, the maximum number of dimensions after projection will be 50, as there are only 50 variables to define the variability.

This constraint is a fundamental mathematical property of matrix rank, which determines the number of independent linear combinations available from the data.

Technically, further components could be added, but they would explain zero additional variance, as they would necessarily be orthogonal to the existing ones.

Please note that 100% of the total variance is still captured by this limited number of components.
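
To make the bound concrete, here is a minimal numpy sketch (illustrative only, not code from this PR) showing that a 100 x 50 data matrix yields at most 50 components:

import numpy as np

rng = np.random.default_rng(0)

# 100 individuals, 50 variables: the rank is bounded by min(n, p) = 50.
X = rng.normal(size=(100, 50))
X -= X.mean(axis=0)  # center the data, as PCA does

# The SVD of the centered data gives the principal axes; at most
# min(n, p) singular values can be non-zero.
singular_values = np.linalg.svd(X, compute_uv=False)
print(singular_values.shape)                     # (50,)
print(np.linalg.matrix_rank(X) <= min(X.shape))  # True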

@mguillaudeux (Contributor) commented:

Regarding the previous explanation: adding use_approximate_inverse and raising an error when p > n is pointless, as the resulting matrix will be of size (n, min(n, p)) no matter what.
However, a warning should still be considered if the user purposely sets the nf argument of the fit function above min(n, p).
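
For illustration, such a warning could look like the sketch below (check_nf is a hypothetical helper, not code from this PR):

import warnings

def check_nf(nf: int, n: int, p: int) -> int:
    """Warn and clamp when nf exceeds min(n, p), instead of raising an error."""
    max_nf = min(n, p)
    if nf > max_nf:
        warnings.warn(
            f"nf={nf} is greater than min(n, p)={max_nf}; "
            f"only {max_nf} components carry variance.",
            stacklevel=2,
        )
        return max_nf
    return nf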

) -> None:
    """Verify that the coordinates form a square matrix even if the input is horizontal."""
    coord, __ = fit_transform(df_to_fit_transform)
    assert coord.shape[0] == coord.shape[0]

Glad you added this test.

@mguillaudeux (Contributor) left a comment:

LGTM!
I added some theoretical arguments in the Conversation tab of the PR.
I also wonder what we should do when the user purposely specifies nf > min(n, p) in the fit function: error, warning, or nothing?
Personally, I would like to avoid raising too many errors when not necessary.

Also, maybe we could update the docstrings for the return values of the transform and fit_transform functions, explaining that the coords DataFrame will always be of size (n, min(n, p)), or (n, nf) if nf is specified and nf < min(n, p).
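
A hypothetical wording for the returns section (names and signature are placeholders, to be adapted to the real functions):

import pandas as pd

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Project the records onto the fitted axes.

    Returns
    -------
    pd.DataFrame
        Coordinates of shape (n, min(n, p)), or (n, nf) when nf is
        specified and nf < min(n, p).
    """
    ...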

@albanfelix (Contributor) left a comment:

LGTM, except for the mistake in the test.

) -> None:
    """Verify that the coordinates form a square matrix even if the input is horizontal."""
    coord, __ = fit_transform(df_to_fit_transform)
    assert coord.shape[0] == coord.shape[0]
Suggested change:
- assert coord.shape[0] == coord.shape[0]
+ assert coord.shape[0] == coord.shape[1]

@albanfelix (Contributor) commented on Oct 16, 2024:

Replying to the dimensionality summary above:

> This rank is equal to the minimum of the number of rows (n, individuals) and columns (p, variables).

The rank is NOT equal to the minimum dimension; it is only bounded by the minimum dimension. If rank(matrix) = min(n, p), the matrix is said to be of full rank.
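
A quick numpy illustration of the distinction:

import numpy as np

# The second row is twice the first, so the matrix is rank-deficient.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])
print(min(X.shape))              # 2, the upper bound on the rank
print(np.linalg.matrix_rank(X))  # 1: bounded by, not equal to, min(n, p)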

@jpetot (Contributor) left a comment:

The changes look good to me. I wonder what the approximate inverse is, or how the inverse transformation works when the data frame is horizontal.

if not use_approximate_inverse and n_records < n_dimensions:
    raise InvalidParameterException(
        f"n_dimensions ({n_dimensions}) is greater than n_records ({n_records})."
    )
# Get back scaled_values from coord with inverse matrix operation

I don't understand what the approximate inverse is if this is the only place where we use it.

@raimbaultL (Author) replied:

The inverse transform can never receive a horizontal matrix, since the coordinates are at worst square. We use numpy.linalg.pinv (https://numpy.org/doc/stable/reference/generated/numpy.linalg.pinv.html) to compute the inverse, because numpy.linalg.inv (https://numpy.org/doc/stable/reference/generated/numpy.linalg.inv.html) only works on square matrices.
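
Here is a minimal numpy sketch of the reasoning (assumed shapes, not the library's actual code):

import numpy as np

rng = np.random.default_rng(0)
n, p, nf = 10, 7, 5                  # hypothetical sizes, with nf < min(n, p)
scaled = rng.normal(size=(n, p))

# Fit-like step: take the first nf right-singular vectors as axes (p x nf).
V = np.linalg.svd(scaled, full_matrices=False)[2].T[:, :nf]
coord = scaled @ V                   # coordinates, shape (n, nf): never horizontal

# V is not square, so np.linalg.inv(V) would raise LinAlgError; the
# Moore-Penrose pseudo-inverse gives a least-squares reconstruction instead.
recovered = coord @ np.linalg.pinv(V)  # shape (n, p)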

@raimbaultL merged commit 844e95e into main on Oct 16, 2024
2 checks passed