-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[tir][testing] Enable intra-IRModule primfunc call #13569
base: main
Are you sure you want to change the base?
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.
Generated by tvm-bot |
dd4e852
to
22fe7b6
Compare
22fe7b6
to
03433e4
Compare
7545f1d
to
650f930
Compare
7f3fb2a
to
5edefbe
Compare
@Lunderberg @areusch : Mind giving this a look? @JosephTheOctonaut : FYI |
5edefbe
to
ea72ff3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks good, just a couple of changes that would be useful, mostly with regard to numpy.
* Indicates that various signature transformations (e.g. those provided by the | ||
* MakePackedAPI pass) are not desired. | ||
*/ | ||
kIntraModule = 3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the specification of the intent here, rather than just stating "disable packed API" or something equivalent. I think I'd expand on the statement that all callers of this function may be assumed to be within the current module, specifically that this function provides no guarantees as to function signature to end-users.
That way, it makes it clear that this isn't just a statement of the current behavior, but also what sort of changes may be made in the future. (e.g. If a parameter of an internal function is unused, it may be stripped from the internal function and removed from all callsites, but this wouldn't be legal for a user-facing function)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great to specify that this is a convention that will be lowered (to another low-level convention), or if it is supposed to a final C convention (with restrictions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great to specify that this is a convention that will be lowered (to another low-level convention), or if it is supposed to a final C convention (with restrictions)
Actually I was planning to leave that unstated, because I wanted the freedom to tweak the details without breaking any TCMScript that uses this.
Do you feel that there's a benefit in being explicit that outweighs that flexibility?
min_value = np.iinfo(np_dtype).min | ||
max_value = np.iinfo(np_dtype).max | ||
|
||
next_value = min_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Numpy has a couple functions that may be useful here. The array could be generated as np.arange(shape[0]*shape[1]).reshape(shape).astype(np_dtype)
. For integer types, this would start at zero and wrap at the maximum value of each int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Turns out it's a bit moot, because I found an existing utility (tests/python/contrib/test_hexagon/pytest_util.py
) that mostly does this already. I'm going to extend it a little and use it in an updated version of this PR.
ea72ff3
to
bb879bb
Compare
@Lunderberg : Ready for re-review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you on the changes, and LGTM!
would be great to get @vinx13 's input on this as well |
* - Supported use cases, and their corresponding unit tests, are all expressed as | ||
* TVMScript. | ||
* | ||
* - The callsite must use `T.call_extern`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor note, we might want to think a bit about possible intrinsics here, given call_extern already have semantics that normally assumes(C semantics). It might be preferrable to have a different intrinsic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, perhaps we can clarify it as C calling conv, effectively pass things as they are, which seems to match the behavior of call_extern and current behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback!
Could you clarify what you mean by "C calling convention" in this context?
I've only heard that term used to describe the mapping of C-language types to (e.g.) x86-64 ABI -compliant function calls.
But in this PR's context, the question (AFAICT) is how we map TVMScript callsite arguments to TVMScript formal parameters. So I'm not sure how this would relate to a C calling convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to get some clarity here. That boils down how we map this cross function call in codegen phase.
- If the call maps into call_extern => caller side LLVM call, callee side just present the function as LLVM function. Then it is the common calling convention(either known as default per defined by platform)
- The reason why it is related to "C" calling convention is that if the callee side say was implemented as a C function with the same signature(and compile that say through hexagon-llvm, or platform-c), and we compile caller side as it is, it will still work
If the cross function call will be further lowered(and the behavior), then it might be useful to give it a different name. IntraModule might be slightly too generic(since we can also use Packed call intra module), also the caller side might need a different intrinsic(as call_extern implies call into a function with C calling convention). So if we want to state that this is a convention that will be further lowered, just a strawman: (e.g. InternalModuleCall, call_internal)
So the main comment to clarify this is to relate the calling convention as much as possible to existing known ones, or clarify it so it is differentiated from existing known ones(by different intrinsics and name).
After reading the context, seems we lean towards later. In that case, it might still be useful to specify what would be the default lowering of InternalModuleCall and call_internal. Since they have to be lowered together(if we map call_internal say to packed convention and function with internal InternalModuleCall without packed), then we will get an error. By reading the spec, the most likely outcome now seems to be lower to C calling convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as call_extern implies call into a function with C calling convention
To double-check, is this the case for functions that exist within the same module? That is, because the name resolution in the LLVM codegen preferentially resolves to names within the existing module, my mental model was that the calling convention was only fixed for calls into external functions, or for callees that may be called externally. Therefore, the internal-only function would be sufficient to modify any matching call_extern
instances.
So if we want to state that this is a convention that will be further lowered, just a strawman: (e.g. InternalModuleCall, call_internal)
From the earlier design discussions, this was brought up as a potential later step, for when using call_extern
becomes too unwieldy, or unable to express the degrees of freedom required by internal calls. Since it seemed that the changes to call_extern
were legal with the current semantics, we hadn't moved to make a new builtin for it.
IntraModule might be slightly too generic(since we can also use Packed call intra module),
I'm not quite seeing the issue with the name here. The semantics we were picturing was that while any calling convention may be used for internal calls, only the kIntraModule
calling convention That is, kIntraModule
implies an internal function, but not being kIntraMoudule
does not imply the existence of external calls. The difference would be that a pass could modify the signature of a kIntraModule
function (e.g. removing an unused argument from the signature, simultaneously updating all callsites), while a PackedFunc could not be modified without breaking the potential external callsites.
After reading the context, seems we lean towards later. In that case, it might still be useful to specify what would be the default lowering of InternalModuleCall and call_internal.
The concern was that by specifying that behavior, it would enable assumptions of the final calling convention before it had been selected, mixing the different layers of abstraction. That is, somebody could erroneously generate a callsite that follows the PackedFunc API, on the assumption that the callee will also be lowered to the PackedFunc API. By leaving it unspecified, it makes that misunderstanding less likely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the discussion.
The difference would be that a pass could modify the signature of a kIntraModule function (e.g. removing an unused argument from the signature, simultaneously updating all callsites), while a PackedFunc could not be modified without breaking the potential external callsites.
This have to do with linkage which is somewhat parallel to the calling convention. We should be able to modify a signature of a function if it is not being referred from outside (in our current case, no global_symbol
attribute indicates a private linkage), and we can do such setups independent from the calling convention.
it would enable assumptions of the final calling convention before it had been selected, mixing the different layers of abstraction
Indeed this is what we aim to avoid here. That is why normally we try to have pairs of(callsite intrinsic, and callee side calling convention annotations). For example,
InternalModuleCall
(callee side conv) andcall_internal
(caller side intrinsic) would make such pairing clear.
In the current proposal, we are trying to pair (kIntraModule
and call_extern
). Where previously in our convention call_extern
was normally paired with C calling conv(mainly because we also can generate such calls).
BTW, I agree that it is OK to leave default lowering behavior unspecified, as long as the pairing is clear. But in this case, the possible confusion is that we already have lowering behavior for call_extern
defined by our backend. So only leaving the callee site unspecified could leads to possible point of confusion.
So the main comment here is let us make sure our pairing is clear. Ideally we have a unique set of pairings, e.g. call_extern
only maps to one callee-site calling conv. Introducing a extra intrinsic to pair with this one, or clarify that this is the only thing that call_extern pairs(in which case perhaps C calling conv is more proper) might be more clear.
The internal property have things to do with linkage and can be handled separately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introducing a extra intrinsic
Could we leverage the fact that call->op
can be GlobalVar
to explicitly pair with kIntraModule
function?
For call_extern
, I believe call_extern
an intra-module function by name should work in general settings. While except the calling convention issue, it seems also assume the presence of symbol lookup machanism in runtime (like dlsym in glibc), thus may not perfectly match so-called "intra-module" semantic? For example, one may want to compile both __global__ function and __device__ function (maybe kIntraModule) into single cuda module and invoke the __device__ function directly from __global__ function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wrongtest-intellif : Thanks very much for the thoughtful suggestion!
I don't have a ready answer to your suggestion, which means I have some more homework to do on my side before proceeding with the PR. Much appreciated!
bb879bb
to
2f25578
Compare
Changes / fixes to allow `PrimFunc`-`PrimFunc` calls within the same `IRModule`: - Fix bug in `BufferAllocationLocator` constructor in which `BufferDecl` nodes could trigger duplicate buffer allocations. - Add `tvm::CallingConv::kIntraModule` calling convetion, to allow a PrimFunc's signature to remain unchanged by the `MakePackedAPI` pass during lowering. - Add `tests/python/contrib/test_hexagon/test_call_tir.py` unit tests to demonstrate one viable approach for intra-module `PrimFunc` calls. - Add a new tensor-content generator, `TensorContentFullRangeCOrder`, to `tests/python/contrib/test_hexagon/pytest_util.py`.
2f25578
to
dd44766
Compare
Changes / fixes to allow
PrimFunc
-PrimFunc
callswithin the same
IRModule
:Fix bug in
BufferAllocationLocator
constructor in whichBufferDecl
nodes could trigger duplicate buffer allocations.Add
tvm::CallingConv::kIntraModule
calling convetion, toallow a PrimFunc's signature to remain unchanged by the
MakePackedAPI
pass during lowering.Add
tests/python/contrib/test_hexagon/test_call_tir.py
unit teststo demonstrate one viable approach for intra-module
PrimFunc
calls.
Add a new tensor-content generator,
TensorContentFullRangeCOrder
,to
tests/python/contrib/test_hexagon/pytest_util.py
.