Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More missing numpy and matlab #649

Merged
merged 7 commits into from
May 12, 2024

Conversation

AngelEzquerra
Copy link
Contributor

@AngelEzquerra AngelEzquerra commented Apr 25, 2024

This PR adds multiple functions that are available both in numpy and Matlab:

  • union: returns the unique, unsorted Tensor of values that are found in either of the two input Tensors.
  • intersection: returns the "intersection" of 2 Tensors as an unsorted rank-1 Tensor.
  • setDiff: returns the (symmetric or non symmetric) "difference" between 2 Tensors as an unsorted rank-1 Tensor.
  • toHashSet: convert a Tensor into HashSet.
  • contains (and thus in and notin).
  • sub2ind and ind2sub: convert "subscript" sequences into linear tensor indexes and vice-versa.
  • almostEqual: missing from nim's std/math.

`union` returns the unique, unsorted Tensor of values that are found in either of the two input Tensors.

Note that an equivalent function exists both in `numpy` (where it is called `union1d`) and in `Matlab` (where it is called `union1d`). However, those functions always sort the output, while Arraymancer's version does not. To replicate the same behavior, simply apply `sort` to the output of this function.
@AngelEzquerra AngelEzquerra force-pushed the more_missing_numpy_and_matlab branch 4 times, most recently from 02be765 to 28c8ec1 Compare April 29, 2024 12:27
Comment on lines +269 to +270
let data = toSeq(a)
result = toTensor(data, shape)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we could also do:

result = newTensorUninit[T](a.card)
for i, x in a:
  result[i] = x

to avoid the double copy (set -> seq -> tensor)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I'll make that change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly a performance measurement using timeit showed that the original version was faster on Windows. We will leave this as is for now while we investigate what is going on (and if we find out why we'll update this in a separate PR).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's investigate it further after this PR.

@Vindaar
Copy link
Collaborator

Vindaar commented May 7, 2024

sub2ind is essentially getIndex https://github.com/mratsim/Arraymancer/blob/master/src/arraymancer/tensor/private/p_accessors.nim#L60, no? (just with seq input)

Maybe we can reuse the internal logic? In general sub2ind and ind2sub seem very Matlab focused concepts / names? I can see how it can be useful sometimes though. In Julia apparently they have since deprecated these for a CartesianIndex / LinearIndex object helper.

@AngelEzquerra
Copy link
Contributor Author

sub2ind is essentially getIndex https://github.com/mratsim/Arraymancer/blob/master/src/arraymancer/tensor/private/p_accessors.nim#L60, no? (just with seq input)

Maybe we can reuse the internal logic? In general sub2ind and ind2sub seem very Matlab focused concepts / names? I can see how it can be useful sometimes though. In Julia apparently they have since deprecated these for a CartesianIndex / LinearIndex object helper.

You are right. Let me remove those from this PR. We can discuss offline and I can create another separate PR later.

This will let us avoid having to convert HashSets into seqs before converting them into tensors.

Note that this also improves a little the docstrings of a couple of the existing `toTensor` procedures.
`intersection` returns the "intersection" of 2 Tensors as an unsorted rank-1 Tensor.

Note that an equivalent function exists both in `numpy` (where it is called `intersect1d`) and in `Matlab` (where it is called `intersect`). However, those functions always sort the output, while Arraymancer's version does not. To replicate the same behavior, simply apply `sort` to the output of this function.

Also note that to implement this feature we moved (and made public) the existing, private toHashSet procedure from spatial/distances.nim into tensor/initialization.nim.
`setDiff` returns the (symmetric or non symmetric) "difference" between 2 Tensors as an unsorted rank-1 Tensor.

Note that an equivalent function exists both in `numpy` (where it is called `setdiff1d`) and in `Matlab` (where it is called `setdiff`). However, those functions always sort the output, while Arraymancer's version does not. To replicate the same behavior, simply apply `sort` to the output of this function.
`find` (which is used to implement `contains`) was already supported (since `system.find` is generic and works with Tensors) but was untested, so this also adds a test for it.
This was a useful std/math function that we did not support yet.
@AngelEzquerra AngelEzquerra force-pushed the more_missing_numpy_and_matlab branch from 28c8ec1 to 381c5ba Compare May 7, 2024 21:10
@AngelEzquerra
Copy link
Contributor Author

sub2ind is essentially getIndex https://github.com/mratsim/Arraymancer/blob/master/src/arraymancer/tensor/private/p_accessors.nim#L60, no? (just with seq input)
Maybe we can reuse the internal logic? In general sub2ind and ind2sub seem very Matlab focused concepts / names? I can see how it can be useful sometimes though. In Julia apparently they have since deprecated these for a CartesianIndex / LinearIndex object helper.

You are right. Let me remove those from this PR. We can discuss offline and I can create another separate PR later.

I've updated the PR as discussed here.

@AngelEzquerra AngelEzquerra force-pushed the more_missing_numpy_and_matlab branch from 381c5ba to 2c7752b Compare May 8, 2024 16:23
@AngelEzquerra
Copy link
Contributor Author

AngelEzquerra commented May 12, 2024

I just pushed a fix to the original (non gemm) version which solves a problem with convolving a slice (was using unsafe_raw_data where I should have used unsafe_raw_offset). The change also rebases the branch on the latest main branch.

Comment on lines 18 to 21
# The folling export is needed to avoid an compilation error in
# algorithms.nim/intersection() when running the test_algorithms test:
# `Error: type mismatch - Expression: items(s1)`
export sets
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we could also get away with adding a bind sets.items inside of intersection and setDiff. However, given that we now depend on sets, I don't see an issue exporting it too.

Fix typo in export comment & add alternative for reader
@Vindaar
Copy link
Collaborator

Vindaar commented May 12, 2024

Merging, we can look into the toTensor conversion from a HashSet performance later.

@Vindaar Vindaar merged commit 2c4f2cd into mratsim:master May 12, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants