Dynamic @exclusive sizes #121

dmed256 · 2018-06-03T14:27:22Z

@exclusive array sizes for CPU modes are hard-coded to 256

We should allocate the size based on the full @inner loop size
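The failure mode and the proposed fix can be modelled in a few lines. This is a minimal Python sketch, assuming a serial backend lowers an @exclusive variable into a flat array with one slot per @inner iteration; the function names are hypothetical and this is not OCCA's generated code. In C the fixed-size version would silently write past the array, while Python raises `IndexError`, which makes the overflow visible.

```python
# Hypothetical model of lowering one @exclusive variable in a serial backend:
# one slot per @inner iteration, stored in a flat array.

HARDCODED_EXCLUSIVE_SIZE = 256  # the fixed size described in the issue


def run_block_fixed(inner_size):
    """Emulate one @outer iteration with a fixed-size exclusive array."""
    exclusive = [0.0] * HARDCODED_EXCLUSIVE_SIZE
    for i in range(inner_size):      # first @inner loop writes its slot
        exclusive[i] = 2.0 * i       # IndexError once i >= 256
    # second @inner loop reads the slot written by the same iteration
    return [exclusive[i] + 1.0 for i in range(inner_size)]


def run_block_dynamic(inner_size):
    """Same kernel, but the exclusive array is sized from the @inner extent."""
    exclusive = [0.0] * inner_size   # allocated from the actual loop size
    for i in range(inner_size):
        exclusive[i] = 2.0 * i
    return [exclusive[i] + 1.0 for i in range(inner_size)]
```

Both versions agree up to 256 iterations; only the dynamically sized one keeps working beyond that.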


pdhahn · 2018-07-25T14:19:23Z

What do you do when there is more than one @inner loop sharing the same @exclusive variable, but those @inner loops have different sizes? Do all @inner loops within the same @outer loop have to have the same size? If so, I suppose there is no issue, except you really ought to verify that, either statically at compile time if possible or dynamically at runtime. But if they can have different sizes, then it calls into question what @exclusive means logically in terms of usage.

I am wondering whether @exclusive is an idea best associated with per-thread use cases rather than per-loop-iteration use cases. The former can be accommodated readily, since the maximum number of threads can be determined and used to size an @exclusive shared array variable.

dmed256 · 2018-07-25T14:32:54Z

Right now, @inner loops need to share the same number of iterations. However, @exclusive arrays are hard-coded to 256 entries, which creates hard-to-debug bugs

I agree there should be a runtime check before launching kernels to verify the sizes
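A pre-launch check of this kind only needs the iteration count of each @inner loop once its bounds are known at runtime. The sketch below is hypothetical (OCCA does not necessarily expose such a function): it verifies that all @inner loops in one @outer loop share a size and returns that size, which could then be used to allocate @exclusive arrays.

```python
def check_inner_sizes(inner_sizes):
    """Hypothetical pre-launch check for the @inner loops of one @outer loop.

    inner_sizes: iteration count of each @inner loop, computed from its
    runtime bounds. Returns the common size, which could be used to
    allocate @exclusive arrays instead of a hard-coded 256.
    """
    if not inner_sizes:
        raise ValueError("expected at least one @inner loop")
    size = inner_sizes[0]
    if any(n != size for n in inner_sizes):
        raise ValueError(
            f"@inner loops must share the same number of iterations, got {inner_sizes}"
        )
    return size
```

Raising before launch turns a silent out-of-bounds write into an immediate, descriptive error.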

There isn't any memory like thread-local storage in OCCA. I'm not sure how that would map to GPU memory, but it's maybe something to keep in mind :)

pdhahn · 2018-07-25T15:09:28Z

Maybe only require (and emit verification code for) all @inner loops having the same size when @exclusive is present. If @exclusive is not present within the @outer loop, then allow @inner loops to have different sizes.

dmed256 · 2018-07-25T15:12:18Z

@pdhahn Make sure to use backticks (inline code)

to escape @attributes to prevent emailing random people

pdhahn · 2018-07-25T15:13:21Z

oops :-)

dmed256 · 2018-07-25T15:14:02Z

np, I was accidentally emailing @dim a lot and he kindly replied with that info

dmed256 · 2018-07-25T15:14:38Z

The @inner loop size restriction is due to GPU blocks/work groups needing to be the same size :(

pdhahn · 2018-07-25T15:22:19Z

Yes, that is partly the motivation for my earlier allusion to a thread-oriented meaning for exclusive vs. an iteration-oriented meaning. But the loop variable's lower and upper bounds, at least as specified by the OKL programmer, are arbitrary, and they are what is proposed to determine the size of the exclusive variable array, correct?

dmed256 · 2018-07-25T20:15:18Z

Yeah, based on the number of iterations, and like the docs say:

"The concept of exclusive memory is similar to thread-local storage, where a single variable actually has one value per thread."

so it's more like TLS than iterations

pdhahn · 2018-07-25T21:53:13Z

OK. I think I misinterpreted your first comment about allocating the exclusive memory array based on the "full inner loop size". I thought you meant that size was defined by the loop index variable bounds at the logical OKL code level, as specified by the OKL programmer, so there would always be one array element per iteration (unrelated to threads). But one element per thread (TLS) makes total sense, at least when the inner loop index variable does not exceed the maximum number of threads per block. Like you said, that maximum can be readily determined, e.g. as the device work group size.

BTW, it would be ideal if the OKL programmer did not have to consider any issues related to physical device constraints on granularity of the parallelization in the outer/inner loops, such as the maximum number of threads per work group (i.e., how the device hardware dimensions map to logical dimensions in the ubiquitous block-oriented topology assumed by OCCA). Ideally, all of that is abstracted away from the programmer completely, and they are free to specify outer/inner dimensions based on the raw, ungrouped extent of the data to be processed (as we can do with an OpenMP parallel for). Or, practically, it is abstracted away as much as possible. OCCA/OKL goes a really long way in this regard, but may not be quite at the ideal point yet.

dmed256 · 2018-07-25T22:03:47Z

I think your first interpretation was right; I only meant that the concept was similar:

TLS: 1 thread - 1 value
@exclusive: 1 iteration - 1 value
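The difference between the two mappings matters most in serial mode, where one thread runs every @inner iteration. The Python sketch below is a hypothetical model (not OCCA output) of an @exclusive variable written in one @inner loop and read in a later one: a single per-thread value keeps only the last write, while a per-iteration array preserves each iteration's own value.

```python
def serial_block_with_tls(inner_size):
    """Per-thread lowering: one scalar stands in for the @exclusive variable.

    A single thread runs every @inner iteration, so "one value per thread"
    collapses to one value total; the second loop sees only the last write.
    """
    value = 0
    for i in range(inner_size):      # first @inner loop
        value = i * i
    return [value for _ in range(inner_size)]      # second @inner loop reads


def serial_block_with_exclusive(inner_size):
    """Per-iteration lowering: slot i keeps iteration i's value across loops."""
    value = [0] * inner_size
    for i in range(inner_size):      # first @inner loop
        value[i] = i * i
    return [value[i] for i in range(inner_size)]   # second @inner loop reads
```

With four iterations, the per-iteration version returns `[0, 1, 4, 9]`, while the per-thread version returns `[9, 9, 9, 9]`.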

"it would be ideal if the OKL programmer did not have to consider any issues related to physical device constraints on granularity of the parallelization in the outer/inner loops"

👍 I agree

It might mean OKL auto-tiles outer and inner loops if the loops go out of the device bounds (like too many threads or too many iterations for exclusives)
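Auto-tiling of this kind reduces to splitting a logical range into fixed-size tiles plus a guard for the partial last tile. The sketch below is hypothetical: `max_inner` stands in for whatever device limit applies (maximum work-group size or @exclusive capacity), and the helper names are illustrative rather than part of OKL.

```python
def auto_tile(n_iterations, max_inner):
    """Split a logical loop of n_iterations into (outer_count, inner_size).

    max_inner is a hypothetical device limit, e.g. the maximum work-group
    size or the number of @exclusive slots available.
    """
    if n_iterations <= 0 or max_inner <= 0:
        raise ValueError("sizes must be positive")
    inner_size = min(n_iterations, max_inner)
    outer_count = (n_iterations + inner_size - 1) // inner_size  # ceiling division
    return outer_count, inner_size


def tiled_indices(n_iterations, max_inner):
    """Yield the original loop index for each (outer, inner) pair."""
    outer_count, inner_size = auto_tile(n_iterations, max_inner)
    for o in range(outer_count):          # would become the @outer loop
        for i in range(inner_size):       # would become the @inner loop
            idx = o * inner_size + i
            if idx < n_iterations:        # skip padding in the last tile
                yield idx
```

For 1000 iterations and a limit of 256, this gives 4 tiles of 256, and the guard drops the 24 padded slots, so every original index is visited exactly once.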

jlchan · 2019-02-14T20:55:44Z

A note: I've run into memory errors related to this limitation when the size of inner(0) > 256.

dmed256 · 2019-02-14T23:20:46Z

@jlchan sorry about that! Maybe we should increase the number as a temporary fix

jlchan · 2019-02-14T23:24:21Z

no worries. I don't need it at the moment, but would it be useful to just add a warning flag during the OKL build?
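A build-time warning can only cover @inner loops whose bounds are compile-time constants. Anything else still needs a runtime check. This is a hypothetical Python sketch of that split. The `(start, end)` bound pairs and the function name are assumptions, and `None` marks bounds that are unknown until launch.

```python
import warnings

EXCLUSIVE_CAPACITY = 256  # current hard-coded @exclusive size, per the issue


def warn_if_exclusive_overflow(inner_bounds, capacity=EXCLUSIVE_CAPACITY):
    """Warn when a statically known @inner extent exceeds the @exclusive size.

    inner_bounds: (start, end) pairs for @inner loops with constant bounds,
    or None where the bounds are only known at runtime (skipped here).
    Returns True if any warning was emitted.
    """
    warned = False
    for bounds in inner_bounds:
        if bounds is None:
            continue  # dynamic bounds: can only be checked at launch time
        start, end = bounds
        extent = end - start
        if extent > capacity:
            warnings.warn(
                f"@inner loop has {extent} iterations but @exclusive arrays hold {capacity}"
            )
            warned = True
    return warned
```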

dmed256 added the parser and bug labels Jun 3, 2018

dmed256 changed the title from Dynamic @-exclusive sizes to Dynamic @exclusive sizes Jun 3, 2018

dmed256 mentioned this issue Jul 15, 2018

exclusive ints not preserved past barrier #158

Closed

dmed256 added this to the v1.1.0 milestone Feb 16, 2019

dmed256 modified the milestones: v1.1.0, v1.2.0 Jun 7, 2020

dmed256 removed this from the v1.2.0 milestone Jan 19, 2021

dmed256 mentioned this issue Mar 15, 2021

Hardwired exclusive index in Serial mode #488

Closed

noelchalmers mentioned this issue Sep 30, 2022

[Serial] Add dynamic exclusive variable sizes #625

Merged

noelchalmers linked a pull request Sep 30, 2022 that will close this issue

[Serial] Add dynamic exclusive variable sizes #625

Merged

kris-rowe closed this as completed Dec 19, 2022
