Dynamic @exclusive sizes #121

Closed
dmed256 opened this issue Jun 3, 2018 · 14 comments · Fixed by #625
Labels: bug, parser

Comments

dmed256 (Member) commented Jun 3, 2018

@exclusive array sizes for CPU modes are hard-coded to 256.

We should instead allocate them based on the full @inner loop size.
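
To make the failure mode concrete, here is a minimal hypothetical OKL kernel (all names made up): the @exclusive variable needs one slot per @inner iteration, but the CPU backends currently emit a fixed 256-entry backing array.

```c
// Hypothetical kernel: each @inner loop runs 512 iterations, so `tmp`
// needs 512 per-iteration slots, but CPU modes allocate only 256
@kernel void addVectors(const int N, const float *a, const float *b, float *ab) {
  for (int block = 0; block < N; block += 512; @outer) {
    @exclusive float tmp;
    for (int i = block; i < block + 512; ++i; @inner) {
      tmp = a[i] + b[i];  // writes per-iteration slots 0..511; slots 256+ overflow
    }
    for (int i = block; i < block + 512; ++i; @inner) {
      ab[i] = tmp;        // reads the same iteration's slot back
    }
  }
}
```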

dmed256 added the parser and bug labels Jun 3, 2018
dmed256 changed the title from "Dynamic @-exclusive sizes" to "Dynamic @exclusive sizes" Jun 3, 2018
pdhahn (Contributor) commented Jul 25, 2018

What do you do when there is more than one @inner loop sharing the same @exclusive variable, but those @inner loops have different sizes? Do all @inner loops within the same @outer loop have to have the same size? If so, I suppose there is no issue, except you really ought to verify that, either statically at compile time if possible or dynamically at runtime. But if they can have different sizes, then it brings into question what @exclusive means logically in terms of usage.

I am wondering if @exclusive is an idea best associated with per-thread use cases rather than per-loop-iteration use cases. The former can be accommodated readily, since the max number of threads can be determined and used to size an @exclusive shared array variable.
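
For concreteness, a contrived sketch of the case being asked about (hypothetical kernel; whether this is even legal is exactly the question):

```c
// Hypothetical kernel: two @inner loops of different sizes share one @exclusive
@kernel void mismatched(float *out) {
  for (int b = 0; b < 1; ++b; @outer) {
    @exclusive float x;
    for (int i = 0; i < 128; ++i; @inner) {
      x = 2.0f * i;   // 128 per-iteration values of x
    }
    for (int i = 0; i < 64; ++i; @inner) {
      out[i] = x;     // which of the 128 values does iteration i see?
    }
  }
}
```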

dmed256 (Member, Author) commented Jul 25, 2018

Right now, @inner loops need to share the same number of iterations. However, @exclusive arrays are hard-coded to 256 entries, which creates hard-to-debug bugs.

I agree there should be a runtime check before launching kernels to verify the sizes.

There isn't any thread-local-storage-like memory in OCCA; not sure how that would map to GPU memory, but maybe something to keep in mind :)
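
A rough sketch of the kind of guard that could be emitted into the generated CPU source (all names here are hypothetical, not OCCA's actual codegen):

```c
// Hypothetical guard at the top of a generated CPU kernel body;
// OCCA_EXCLUSIVE_SIZE and the innerDim* variables are made-up names
#define OCCA_EXCLUSIVE_SIZE 256

const int innerSize = innerDim0 * innerDim1 * innerDim2;
if (innerSize > OCCA_EXCLUSIVE_SIZE) {
  fprintf(stderr,
          "@exclusive arrays hold %d entries but @inner loops run %d iterations\n",
          OCCA_EXCLUSIVE_SIZE, innerSize);
  abort();  // refuse to launch rather than corrupt memory
}
```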

pdhahn (Contributor) commented Jul 25, 2018

Maybe only require (and emit verification code for) all @inner loops having the same size when @exclusive is present. If @exclusive is not present within the @outer loop, then allow @inner loops to have different sizes.
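
The proposed rule, sketched as plain C over a toy loop model (none of these types or names match OCCA's actual parser API):

```c
// Hypothetical check: with @exclusive present, every @inner loop in the
// @outer loop must share one iteration count; otherwise any sizes are fine
typedef struct {
  int iterationCount;  // static trip count of one @inner loop
} InnerLoop;

int exclusiveSizesAreValid(int hasExclusive,
                           const InnerLoop *loops,
                           int loopCount) {
  if (!hasExclusive || loopCount == 0) {
    return 1;  // no @exclusive -> differing @inner sizes stay legal
  }
  for (int i = 1; i < loopCount; ++i) {
    if (loops[i].iterationCount != loops[0].iterationCount) {
      return 0;  // mismatch -> report a parser error
    }
  }
  return 1;
}
```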

dmed256 (Member, Author) commented Jul 25, 2018

@pdhahn Make sure to use backticks (`) to escape @attributes, to prevent emailing random people

pdhahn (Contributor) commented Jul 25, 2018

oops :-)

dmed256 (Member, Author) commented Jul 25, 2018

np, I was accidentally emailing @dim a lot and he kindly replied with that info

dmed256 (Member, Author) commented Jul 25, 2018

The @inner loop size restriction is due to GPU blocks/work groups needing to be the same size :(

pdhahn (Contributor) commented Jul 25, 2018

Yes, that is partly the motivation for my earlier allusion to a thread-oriented meaning for @exclusive vs. an iteration-oriented meaning. But the loop variable's lower and upper bounds, at least as specified by the OKL programmer, are arbitrary, and they are what is proposed to determine the size of the exclusive variable array, correct?

dmed256 (Member, Author) commented Jul 25, 2018

Yeah, based on the number of iterations. And like the docs say:

> The concept of exclusive memory is similar to thread-local storage, where a single variable actually has one value per thread.

so it's more like TLS than iterations

pdhahn (Contributor) commented Jul 25, 2018

OK. I think I misinterpreted your first comment about allocating the exclusive memory array based on the "full @inner loop size": I thought you meant the latter was defined by the loop index variable bounds at the logical OKL program level, as specified by the OKL programmer, so there would always be one array element per iteration (unrelated to threads). But one element per thread (TLS) makes total sense, at least when the @inner loop index does not exceed the max number of threads per block. Like you said, the latter can be readily determined, e.g. as the device work group size.

BTW, it would be ideal if the OKL programmer did not have to consider any issues related to physical device constraints on the granularity of parallelization in the outer/inner loops (i.e., how, for the ubiquitous block-oriented topology assumed by OCCA, the hardware dimensions map to logical dimensions), such as max threads per work group. Ideally, that is all abstracted away completely, and the programmer is free to specify outer/inner dimensions based on the raw, ungrouped extent of the data to be processed (e.g., like we can do using OpenMP parallel for). Or, practically, abstract away at least as much as possible. OCCA/OKL goes a really long way in this regard, but may not be at the ideal point quite yet.

dmed256 (Member, Author) commented Jul 25, 2018

I think your first interpretation was right; I meant the concept was similar (sketched below):

  • TLS: 1 thread - 1 value
  • @exclusive: 1 iteration - 1 value
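
A tiny sketch of that per-iteration behavior (hypothetical kernel):

```c
// Hypothetical kernel: each @inner iteration keeps its own copy of `x`,
// and that copy survives from one @inner loop to the next
@kernel void perIteration(int *out) {
  for (int b = 0; b < 1; ++b; @outer) {
    @exclusive int x;
    for (int i = 0; i < 4; ++i; @inner) {
      x = i;       // iteration i writes its private slot
    }
    for (int i = 0; i < 4; ++i; @inner) {
      out[i] = x;  // iteration i still sees its own value: out = {0, 1, 2, 3}
    }
  }
}
```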

> it would be ideal if the OKL programmer did not have to consider any issues related to physical device constraints on granularity of the parallelization in the outer/inner loops

👍 I agree

It might mean OKL auto-tiles @outer and @inner loops when they go beyond device bounds (e.g., too many threads, or too many iterations for @exclusive arrays)
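
For reference, OKL's @tile attribute already does the explicit version of that split; the automatic part would be the new bit. A sketch of the existing explicit form:

```c
// Existing OKL: @tile(256, @outer, @inner) splits one loop into an
// @outer/@inner pair (handling the trailing partial tile);
// auto-tiling would pick the 256 based on device limits instead
@kernel void scale(const int N, const float alpha, float *x) {
  for (int i = 0; i < N; ++i; @tile(256, @outer, @inner)) {
    x[i] *= alpha;
  }
}
```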

jlchan commented Feb 14, 2019

A note: I've run into memory errors related to this limitation when the size of inner(0) > 256.

dmed256 (Member, Author) commented Feb 14, 2019

@jlchan sorry about that! Maybe we should increase the number as a temporary fix

jlchan commented Feb 14, 2019

no worries. I don't need it at the moment, but would it be useful to just add a warning flag during the OKL build?

dmed256 added this to the v1.1.0 milestone Feb 16, 2019
dmed256 modified the milestones: v1.1.0 → v1.2.0 Jun 7, 2020
dmed256 removed this from the v1.2.0 milestone Jan 19, 2021
noelchalmers linked a pull request Sep 30, 2022 that will close this issue