Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Refactor] Create a fresh new Context for args+earg+rets per-launch #1739

Closed
archibate opened this issue Aug 20, 2020 · 4 comments
Closed

[Refactor] Create a fresh new Context for args+earg+rets per-launch #1739

archibate opened this issue Aug 20, 2020 · 4 comments
Assignees
Labels
c++ C++ engineering related enhancement Make existing things or codebases better

Comments

@archibate
Copy link
Collaborator

archibate commented Aug 20, 2020

(Cont. #1722)

Concisely describe the proposed feature
It seems @k-ye's plan is:
ctx->args for passing arguments.
ctx->extra_args for passing extra arguments.
program->result_buffer for passing return values.

And my plan is:
program->ctx_buffer for passing arguments.
program->ctx_buffer + 8 for passing extra arguments.
program->ctx_buffer for passing return values.

This make the logic simpler and less nested.
And hopefully we can unify Args and Earg buffer for further better compatibility on low-end OpenGL 'implementations'.

Describe the solution you'd like (if any)
As described above.

Additional comments
@k-ye Could you let people know your complete plan? Am I guessing right above?

What make me hate Context most is:
It is placed in taichi/runtime/llvm/context.h, however every backend use it 🤣 We can make Context to be LLVM backend specific after this.

@archibate archibate added enhancement Make existing things or codebases better c++ C++ engineering related labels Aug 20, 2020
@archibate archibate changed the title [Refactor] Deprecate struct Context, make arg+ret buffer read-and-write like gtmp does [Refactor] Deprecate struct Context, add program->ctx_buffer read-and-write like gtmp does, unify args and earg Aug 20, 2020
@archibate archibate changed the title [Refactor] Deprecate struct Context, add program->ctx_buffer read-and-write like gtmp does, unify args and earg [Refactor] Deprecate struct Context, add stationary program->ctx_buffer read-and-write like gtmp does, unify args and earg Aug 20, 2020
@archibate archibate changed the title [Refactor] Deprecate struct Context, add stationary program->ctx_buffer read-and-write like gtmp does, unify args and earg [Refactor] Deprecate struct Context, add stationary program->ctx_buffer like gtmp does, unify args and earg Aug 20, 2020
@k-ye
Copy link
Member

k-ye commented Aug 20, 2020

Could you let people know your complete plan? Am I guessing right above?

Yes, you've listed out my plan, so I don't have too much to supplement. There won't be program->context later (or program->ctx_buffer -- program no longer holds Context). Instead, every time we launch a kernel, we create a fresh new Context (see #1723). The fact that there is a single Context throughout the program has created lots of couplings and hacks, especially for the CUDA backend and the async kernel launching works. We're kind of in an awkward situation of having a global variable again.

And hopefully we can unify Args and Earg buffer for further better compatibility on low-end OpenGL 'implementations'.

I hear your concerns. As I have explained in #1724 (review), this will not affect how OpenGL implements the input args/return value SSBOs. That is, it is perfectly fine for OpenGL to use a single SSBO to pass in the input args, and have the OpenGL shader to write back the return value to that SSBO. (Actually I believe Metal is doing the same thing here). All that a backend needs to change is, instead of copying the data from SSBO to context.args[i], we copy it back to program.result_buffer[i], which you've helped fix in #1724 .

To conclude, this shouldn't affect how a backend runtime implements its input/output with that backend's shaders, but does affect how a backend runtime should pass that output back to the Taichi program runtime.

It is placed in taichi/runtime/llvm/context.h, however every backend use it 🤣

+1, this is a good point. I think we should move it to a more generic place, e.g. taichi/program?

We can make Context to be LLVM backend specific after this.

Not really. Context is a generic struct holding the input args for all the backends (therefore i think the title "Deprecate struct Context" is off). If we think about the function produced by each codegen, you'll notice that they all have the fixed signature of void(Context &). Like you've pointed out, what needs to be fixed is that Context probably shouldn't have been placed in the LLVM backends folder.

@archibate
Copy link
Collaborator Author

Instead, every time we launch a kernel, we create a fresh new Context (see #1723).

So, after the WeCreateAFreshNewContext, will we deprecate result_buffer and use WeCreateAFreshNewContext for return values?
If you want 'async', then we'd better deprecate result_buffer too, since it's embed in Program.

@archibate archibate changed the title [Refactor] Deprecate struct Context, add stationary program->ctx_buffer like gtmp does, unify args and earg [Refactor] Create a fresh new Context for args+rets per-launch Aug 21, 2020
@archibate archibate changed the title [Refactor] Create a fresh new Context for args+rets per-launch [Refactor] Create a fresh new Context for args+earg+rets per-launch Aug 21, 2020
@k-ye
Copy link
Member

k-ye commented Aug 21, 2020

So, after the WeCreateAFreshNewContext, will we deprecate result_buffer and use WeCreateAFreshNewContext for return values?

I think we could, but it might take much more effort to decouple result_buffer. And I would worry less about result buffer, because a kernel who returns value is a natural synchronizing point, so there is no async execution issue.

IMHO it might be better to get all the backends aligned on how result_buffer is used. Like @yuanming-hu suggested in #1709 (comment), when syncing back the return value, the print buffer, error code, etc, all the backends should follow the same convention, so that we don't need to duplicate ourselves. WDYT?

I'd also like to confirm on my understanding that switching to result_buffer shouldn't really create SSBO exceeded issue on OpenGL?

@archibate
Copy link
Collaborator Author

So, after the WeCreateAFreshNewContext, will we deprecate result_buffer and use WeCreateAFreshNewContext for return values?

I think we could, but it might take much more effort to decouple result_buffer. And I would worry less about result buffer, because a kernel who returns value is a natural synchronizing point, so there is no async execution issue.

What's the downside and difficulty of decoupling result_buffer?

IMHO it might be better to get all the backends aligned on how result_buffer is used. Like @yuanming-hu suggested in #1709 (comment), when syncing back the return value, the print buffer, error code, etc, all the backends should follow the same convention, so that we don't need to duplicate ourselves. WDYT?

OOK, so the is CUDA and CPU using print buffer like we did in Metal and OpenGL? I.e. string table?
Also note that the C backend actually use native printf instead..

I'd also like to confirm on my understanding that switching to result_buffer shouldn't really create SSBO exceeded issue on OpenGL?

Yes it doesn't create (with some mocks). I don't feel like these dirty mocks.
Using result_buffer make OpenGL abd CC sick. Using args[0] make OpenGL and CC happy.
If CUDA have a different sync infrastructure, why not make result_buffer a CUDA only variable?
And the CUDA backend should args[0] = *result_buffer in kernels that have return.

@yuanming-hu yuanming-hu mentioned this issue Oct 22, 2020
21 tasks
@k-ye k-ye closed this as completed Jul 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ C++ engineering related enhancement Make existing things or codebases better
Projects
None yet
Development

No branches or pull requests

2 participants