
Dawid-Sebastiani Score #75

Draft: wants to merge 1 commit into base: main

Conversation

simon-hirsch (Contributor)

Hi @frazane as discussed, please have a look.
Compared to #32, I've calculated the batched covariance here as a matrix multiplication, which I feel is the cleaner way, although it is a bit trickier to figure out the correct axes to multiply along.

To-Do:

  • Some kind of axis / rowvar parameter for the covariance in the batched case
  • Other backends
  • Documentation
  • Docstring
  • Test
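The matrix-multiplication approach described above can be sketched in plain numpy. This is a minimal illustration, not the PR's actual implementation; `batched_cov` is a hypothetical name:

```python
# Minimal numpy sketch (illustration only, not the PR's code) of the
# batched covariance computed as a matrix multiplication.
import numpy as np

def batched_cov(x):
    """Sample covariance over axis -2 for a (batch, n_samples, n_vars) array."""
    centered = x - x.mean(axis=-2, keepdims=True)
    # (batch, d, n) @ (batch, n, d) -> (batch, d, d), with ddof=1 like np.cov
    return (centered.transpose(0, 2, 1) @ centered) / (x.shape[-2] - 1)
```

Each batch slice then agrees with `np.cov(x[i], rowvar=False)`.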

@simon-hirsch simon-hirsch marked this pull request as draft September 18, 2024 12:12
Owner

I'd be curious to see the performance benefit here compared to plain numpy. Usually numba doesn't optimize much when using a lot of numpy operations.

Contributor Author

Yes, and the DSS is also far less computationally intensive than e.g. the ES or VS, so even if we get a good relative speed improvement, the user might not notice it.

Owner

I suppose this would look more or less the same for the other backends?

Contributor Author

Pretty much, I guess. I don't have much experience with tf/torch though, so I'm not sure yet about the exact implementation.

Collaborator

The det and inv functions should be easy/similar to define in the other backends.

For cov:

  • jax should be almost exactly the same as numpy.
  • torch would need "transpose" replaced with "permute".
  • tensorflow also uses "transpose" but does not have a "cov" function, so this would need to be adapted.

Although, as mentioned in a separate comment, the transpose function currently assumes we only have 3 dimensions.
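For reference, numpy's `det` and `inv` already broadcast over leading batch dimensions, treating the last two axes as the matrix; a small illustration (not the PR's code):

```python
# Illustration (not the PR's code): numpy's det and inv broadcast over
# leading batch dimensions, treating the last two axes as the matrix.
import numpy as np

batch = np.stack([np.eye(3) * 2.0, np.eye(3) * 3.0])  # shape (2, 3, 3)
dets = np.linalg.det(batch)  # shape (2,): det of each 3x3 matrix
invs = np.linalg.inv(batch)  # shape (2, 3, 3): per-matrix inverses
```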

```python
    return np.cov(x)
else:
    centered = x - x.mean(axis=-2, keepdims=True)
    return (centered.transpose(0, 2, 1) @ centered) / (x.shape[-2] - 1)
```
Collaborator

centered.transpose(0, 2, 1) assumes the array is 3D. Is there a way to generalise this to calculate the covariance matrix for higher dimensions?

Contributor Author
@simon-hirsch Dec 25, 2024

Merry Christmas :)
If we assume that we always have the square matrix in the last two dimensions (like numpy does on the batched version of e.g. np.linalg.inv()), something like centered.swapaxes(-1, -2) should do the trick. Is that convention fine for you @sallen12?
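The suggested `swapaxes(-1, -2)` variant can be sketched as follows (an illustration under that convention, not the PR's code; `batched_cov_nd` is a hypothetical name). It works for any number of leading batch dimensions:

```python
# Sketch (assumption, not the PR's code) of the swapaxes(-1, -2) variant:
# batches over all leading dimensions, matching numpy's linalg convention
# of treating the last two axes as the matrix.
import numpy as np

def batched_cov_nd(x):
    """Covariance over axis -2; x has shape (..., n_samples, n_vars)."""
    centered = x - x.mean(axis=-2, keepdims=True)
    return (centered.swapaxes(-1, -2) @ centered) / (x.shape[-2] - 1)
```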
