Add support for doing a masked fill from a tensor #612

Merged (4 commits, Feb 15, 2024)

Conversation

AngelEzquerra (Contributor)
This is supported by NumPy but was not yet supported by Arraymancer.

I hope I've implemented it in a sufficiently performant way. If there is a better way to do it, please let me know.
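Roughly, the new capability looks like this (a sketch pieced together from the examples later in this thread and the NumPy equivalent, not code taken from the PR itself):

import arraymancer

var t = [[1, 50],
         [60, 2]].toTensor
# Fill the positions selected by a boolean mask from a tensor of values,
# rather than with a single scalar (the NumPy counterpart would be
# something like a[a > 27] = values).
t.masked_fill(t >. 27, [-1, -2].toTensor)
# or, using the bracket syntax from the tutorial example discussed below:
# t[t >. 27] = [-1, -2].toTensor
echo t   # the 50 becomes -1 and the 60 becomes -2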

          inc n
    except IndexDefect:
      raise newException(IndexDefect, "The size of the value tensor (" & $value.size &
        ") is smaller than the number of true elements in the mask (" & $mask.size & ")")
mratsim (Owner)

Have you tried this on Nim 1.6 and Nim 2.0?

This for sure crashes on Nim 1.6, likely with an incomprehensible error message, because exceptions are allocated in a thread-local manner there.

In general, OpenMP sections should not allocate.
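For context, one allocation-free way to surface the error is to signal it with a plain bool inside the parallel section and build the string and exception afterwards, on the calling thread. This is only a sketch along the lines of what the PR later adopts (the flag name mirrors the later diff; the loop body is illustrative, not the final code):

var too_few_values = false
omp_parallel_blocks(block_offset, block_size, t.size):
  var n = block_offset
  for tElem, maskElem in mzip(t, mask, block_offset, block_size):
    if maskElem:
      if n >= value.size:
        too_few_values = true   # concurrent writes of `true` are harmless for a flag
        break
      tElem = value[n]
      inc n
if too_few_values:
  raise newException(IndexDefect, "The size of the value tensor (" & $value.size &
    ") is smaller than the number of true elements in the mask")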

AngelEzquerra (Contributor, PR author)

You are right. I had tried the "happy path" on both Nim 2.0 and 1.6, but I had only tried the unhappy path on Nim 2.0 (where it works fine, at least with the small tensors I tried it with). On 1.6 it does cause a crash.
What is the solution? I don't see any error handling in any of the existing OpenMP sections. Even the following (admittedly naive) code, in which I create the message and the exception beforehand, crashes on 1.6:

  let error_msg = "The size of the value tensor (" & $value.size &
        ") is smaller than the number of true elements in the mask (" & $mask.size & ")"
  let indexDefectException = newException(IndexDefect, error_msg)

  omp_parallel_blocks(block_offset, block_size, t.size):
    var n = block_offset
    try:
      for tElem, maskElem in mzip(t, mask, block_offset, block_size):
        if maskElem:
          tElem = value[n]
          inc n
    except IndexDefect:
      raise indexDefectException

I guess I could have two versions of the function: one for 2.0 with error checking and one for 1.6 without it. Is that the way to go?

mratsim (Owner)

The way to go is to have no exceptions in parallel sections.

And in general, no seq or string allocations; use only preallocated buffers. Otherwise the parallel section needs this: https://github.com/mratsim/laser/blob/e23b5d6/laser/openmp.nim#L88-L110.

template attachGC*(): untyped =
  ## If you are allocating reference types, sequences or strings
  ## in a parallel section, you need to attach and detach
  ## a GC for each thread. Those should be thread-local temporaries.
  ##
  ## This attaches the GC.
  ##
  ## Note: this creates too strange error messages
  ## when --threads is not on: https://github.com/nim-lang/Nim/issues/9489
  if(omp_get_thread_num()!=0):
    setupForeignThreadGc()

template detachGC*(): untyped =
  ## If you are allocating reference types, sequences or strings
  ## in a parallel section, you need to attach and detach
  ## a GC for each thread. Those should be thread-local temporaries.
  ##
  ## This detaches the GC.
  ##
  ## Note: this creates too strange error messages
  ## when --threads is not on: https://github.com/nim-lang/Nim/issues/9489
  if(omp_get_thread_num()!=0):
    teardownForeignThreadGc()

I'm not sure what the overhead of attaching/detaching the GC is.

This is the main motivation behind Nim switching to arc/orc GCs.

In the future it will be fine to require Nim 2.0, but I need a better CUDA story (it only works on Nim v1.2 at the moment unless you rebuild a nim.cfg from scratch that replaces --std:gnu++14 by --std:stdc++14).
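For illustration, a minimal sketch of how those templates might be wrapped around a parallel block (assuming Arraymancer's omp_parallel_blocks; the allocation shown is purely hypothetical):

omp_parallel_blocks(block_offset, block_size, t.size):
  attachGC()   # non-main threads get a GC before any string/seq allocation
  var scratch = newSeq[int](block_size)   # hypothetical thread-local temporary
  # ... per-block work using `scratch` ...
  detachGC()   # tear the foreign-thread GC down before leaving the section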

AngelEzquerra (Contributor, PR author)

I have made a change that ensures the exception error message is shown even when not using ARC/ORC memory management. I have also added a test for the exception code when using ARC or ORC.
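For reference, such a test might look roughly like this (a hypothetical sketch, not the exact test added in this PR; it assumes compilation with --mm:arc or --mm:orc so that the error is actually raised):

import arraymancer
import std/unittest

test "masked_fill from a too-small value tensor raises IndexDefect":
  var t = arange(6).reshape(2, 3)
  let mask = t >. 2                          # three true elements
  expect IndexDefect:
    t.masked_fill(mask, [10, 20].toTensor)   # only two fill values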

if too_few_values:
  let error_msg = "masked_fill error: the size of the value tensor (" & $value.size &
    ") is smaller than the number of true elements in the mask"
  when not(compileOption("mm", "arc") or compileOption("mm", "orc")):
Vindaar (Collaborator), Feb 11, 2024

An alternative that could work for older Nim would be:

Suggested change:

-when not(compileOption("mm", "arc") or compileOption("mm", "orc")):
+when not defined(gcDestructors):

but I don't mind just dropping 1.4.
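In other words, the suggestion relies on the fact that `gcDestructors` is defined under both ARC and ORC, so the following sketch (assumed to mirror the PR's print-then-raise structure, not the exact code) behaves like the `compileOption` check while also compiling on Nim versions that predate the "mm" option:

when not defined(gcDestructors):
  # classic GCs (refc, etc.): also print the message, as the PR does for
  # non-ARC/ORC builds, so that it stays visible even if the raise misbehaves
  echo error_msg
raise newException(IndexDefect, error_msg)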

AngelEzquerra (Contributor, PR author)

I can try this change. In any case, I think dropping 1.4 makes sense.

Vindaar (Collaborator) commented on Feb 11, 2024

It looks fine to me now. One could clarify a bit more in the docstrings how different ranks are handled and specify that the mask and fill value tensors must have the same shape. The next 'missing' thing would then be a single-element fill, I guess (which could be convenient, e.g. precisely for that kind of NaN test case).

Anyway, I think this is fine to merge. Please just drop Nim 1.4 from the CI. I don't see a purpose in continuing to support it officially. Most of it will work regardless.

AngelEzquerra (Contributor, PR author)

> It looks fine to me now. One could clarify a bit more in the docstrings how different ranks are handled and specify that the mask and fill value tensors must have the same shape. The next 'missing' thing would then be a single-element fill, I guess (which could be convenient, e.g. precisely for that kind of NaN test case).
>
> Anyway, I think this is fine to merge. Please just drop Nim 1.4 from the CI. I don't see a purpose in continuing to support it officially. Most of it will work regardless.

How would I drop Nim 1.4 from the CI? Is that something that I can do myself?

Vindaar (Collaborator) commented on Feb 12, 2024

>> It looks fine to me now. One could clarify a bit more in the docstrings how different ranks are handled and specify that the mask and fill value tensors must have the same shape. The next 'missing' thing would then be a single-element fill, I guess (which could be convenient, e.g. precisely for that kind of NaN test case).
>>
>> Anyway, I think this is fine to merge. Please just drop Nim 1.4 from the CI. I don't see a purpose in continuing to support it officially. Most of it will work regardless.
>
> How would I drop Nim 1.4 from the CI? Is that something that I can do myself?

Ah, sorry. I wanted to add that to my response, but I forgot. Yes, just remove it here in this line:

https://github.com/mratsim/Arraymancer/blob/master/.github/workflows/ci.yml#L17

AngelEzquerra force-pushed the multi_value_masked_fill branch from 8d9cdec to 61fb5f0 on February 12, 2024 at 20:43
AngelEzquerra (Contributor, PR author)

> It looks fine to me now. One could clarify a bit more in the docstrings how different ranks are handled and specify that the mask and fill value tensors must have the same shape. The next 'missing' thing would then be a single-element fill, I guess (which could be convenient, e.g. precisely for that kind of NaN test case).

I've pushed a new commit that improves the documentation of all the masked_fill procedures (including the ones that fill with a scalar value).

> Anyway, I think this is fine to merge. Please just drop Nim 1.4 from the CI. I don't see a purpose in continuing to support it officially. Most of it will work regardless.

Nim 1.4 has been dropped in a separate PR.

Document that boolean masks can be used to access and change the elements of a tensor.
Comment on lines 287 to 294
foo[foo >. 27] = -arange(9)

# Tensor[system.int] of shape "[5, 5]" on backend "Cpu"
# |1 1 1 1 1|
# |2 4 8 16 -1|
# |3 9 27 -1 -1|
# |4 16 -1 -1 -1|
# |5 25 -1 -1 -1|
Vindaar (Collaborator)

Why is the output all -1? Shouldn't the numbers decrease? Or is it picking the wrong overload and only using the first element for some reason?

Vindaar (Collaborator), Feb 15, 2024

I think the latter might be happening, because this (on current master):

import arraymancer

var t = arange(25).reshape([5, 5])
t[t >=. 8] = 111
echo t

yields

Tensor[system.int] of shape "[5, 5]" on backend "Cpu"
|0        1      2      3      4|
|5        6      7    111    111|
|111    111    111    111    111|
|111    111    111    111    111|
|111    111    111    111    111|

(But note that the code in your new example of course does not even compile on master, so I'm not exactly sure why this behavior is what it is.)

Vindaar (Collaborator)

Ah, I just checked out your branch. It seems to work fine on my end. I assume you maybe just copied the wrong output tensor?

echo error_msg
raise newException(IndexDefect, error_msg)

proc masked_fill*[T](t: var Tensor[T], mask: openArray, value: Tensor[T]) =
Vindaar (Collaborator)

Is it openArray and not openArray[bool] for a reason?

Vindaar (Collaborator) commented on Feb 15, 2024

Really nice work! Could you update the tutorial's Vandermonde boolean mask example? I'll merge it after. 🥳

AngelEzquerra force-pushed the multi_value_masked_fill branch from 48eaab3 to 7f8eb76 on February 15, 2024 at 11:13
Vindaar merged commit 40ceb53 into mratsim:master on Feb 15, 2024
6 checks passed
AngelEzquerra deleted the multi_value_masked_fill branch on February 15, 2024 at 12:00