Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tir][testing] Enable intra-IRModule primfunc call #13569

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

cconvey
Copy link
Contributor

@cconvey cconvey commented Dec 7, 2022

Changes / fixes to allow PrimFunc-PrimFunc calls
within the same IRModule:

  • Fix bug in BufferAllocationLocator constructor in which
    BufferDecl nodes could trigger duplicate buffer allocations.

  • Add tvm::CallingConv::kIntraModule calling convetion, to
    allow a PrimFunc's signature to remain unchanged by the
    MakePackedAPI pass during lowering.

  • Add tests/python/contrib/test_hexagon/test_call_tir.py unit tests
    to demonstrate one viable approach for intra-module PrimFunc
    calls.

  • Add a new tensor-content generator, TensorContentFullRangeCOrder,
    to tests/python/contrib/test_hexagon/pytest_util.py.

@tvm-bot
Copy link
Collaborator

tvm-bot commented Dec 7, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@cconvey cconvey force-pushed the call-extern-ptr branch 2 times, most recently from dd4e852 to 22fe7b6 Compare December 7, 2022 17:24
@cconvey cconvey force-pushed the call-extern-ptr branch 5 times, most recently from 7545f1d to 650f930 Compare January 9, 2023 20:29
@cconvey cconvey changed the title WIP [hexagon][tir][testing] same-module call tests [tir][testing] same-module call tests Jan 9, 2023
@cconvey cconvey changed the title [tir][testing] same-module call tests [tir][test] Enable intra-IRModule primfunc call Jan 9, 2023
@cconvey cconvey force-pushed the call-extern-ptr branch 2 times, most recently from 7f3fb2a to 5edefbe Compare January 11, 2023 17:12
@cconvey cconvey changed the title [tir][test] Enable intra-IRModule primfunc call [tir][testing] Enable intra-IRModule primfunc call Jan 11, 2023
@cconvey cconvey marked this pull request as ready for review January 11, 2023 19:28
@cconvey
Copy link
Contributor Author

cconvey commented Jan 11, 2023

@Lunderberg @areusch : Mind giving this a look?

@JosephTheOctonaut : FYI

Copy link
Contributor

@Lunderberg Lunderberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good, just a couple of changes that would be useful, mostly with regard to numpy.

* Indicates that various signature transformations (e.g. those provided by the
* MakePackedAPI pass) are not desired.
*/
kIntraModule = 3,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the specification of the intent here, rather than just stating "disable packed API" or something equivalent. I think I'd expand on the statement that all callers of this function may be assumed to be within the current module, specifically that this function provides no guarantees as to function signature to end-users.

That way, it makes it clear that this isn't just a statement of the current behavior, but also what sort of changes may be made in the future. (e.g. If a parameter of an internal function is unused, it may be stripped from the internal function and removed from all callsites, but this wouldn't be legal for a user-facing function)

Copy link
Member

@tqchen tqchen Jan 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to specify that this is a convention that will be lowered (to another low-level convention), or if it is supposed to a final C convention (with restrictions)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to specify that this is a convention that will be lowered (to another low-level convention), or if it is supposed to a final C convention (with restrictions)

Actually I was planning to leave that unstated, because I wanted the freedom to tweak the details without breaking any TCMScript that uses this.

Do you feel that there's a benefit in being explicit that outweighs that flexibility?

min_value = np.iinfo(np_dtype).min
max_value = np.iinfo(np_dtype).max

next_value = min_value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numpy has a couple functions that may be useful here. The array could be generated as np.arange(shape[0]*shape[1]).reshape(shape).astype(np_dtype). For integer types, this would start at zero and wrap at the maximum value of each int.

Copy link
Contributor Author

@cconvey cconvey Jan 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Turns out it's a bit moot, because I found an existing utility (tests/python/contrib/test_hexagon/pytest_util.py) that mostly does this already. I'm going to extend it a little and use it in an updated version of this PR.

tests/python/contrib/test_hexagon/test_call_tir.py Outdated Show resolved Hide resolved
@cconvey
Copy link
Contributor Author

cconvey commented Jan 21, 2023

@Lunderberg : Ready for re-review.

@cconvey cconvey requested a review from Lunderberg January 21, 2023 19:12
Copy link
Contributor

@Lunderberg Lunderberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you on the changes, and LGTM!

@areusch
Copy link
Contributor

areusch commented Jan 23, 2023

@mbaret @tqchen should also have a look at this

@tqchen
Copy link
Member

tqchen commented Jan 24, 2023

would be great to get @vinx13 's input on this as well

* - Supported use cases, and their corresponding unit tests, are all expressed as
* TVMScript.
*
* - The callsite must use `T.call_extern`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor note, we might want to think a bit about possible intrinsics here, given call_extern already have semantics that normally assumes(C semantics). It might be preferrable to have a different intrinsic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, perhaps we can clarify it as C calling conv, effectively pass things as they are, which seems to match the behavior of call_extern and current behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback!

Could you clarify what you mean by "C calling convention" in this context?

I've only heard that term used to describe the mapping of C-language types to (e.g.) x86-64 ABI -compliant function calls.

But in this PR's context, the question (AFAICT) is how we map TVMScript callsite arguments to TVMScript formal parameters. So I'm not sure how this would relate to a C calling convention.

Copy link
Member

@tqchen tqchen Jan 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to get some clarity here. That boils down how we map this cross function call in codegen phase.

  • If the call maps into call_extern => caller side LLVM call, callee side just present the function as LLVM function. Then it is the common calling convention(either known as default per defined by platform)
  • The reason why it is related to "C" calling convention is that if the callee side say was implemented as a C function with the same signature(and compile that say through hexagon-llvm, or platform-c), and we compile caller side as it is, it will still work

If the cross function call will be further lowered(and the behavior), then it might be useful to give it a different name. IntraModule might be slightly too generic(since we can also use Packed call intra module), also the caller side might need a different intrinsic(as call_extern implies call into a function with C calling convention). So if we want to state that this is a convention that will be further lowered, just a strawman: (e.g. InternalModuleCall, call_internal)

So the main comment to clarify this is to relate the calling convention as much as possible to existing known ones, or clarify it so it is differentiated from existing known ones(by different intrinsics and name).

After reading the context, seems we lean towards later. In that case, it might still be useful to specify what would be the default lowering of InternalModuleCall and call_internal. Since they have to be lowered together(if we map call_internal say to packed convention and function with internal InternalModuleCall without packed), then we will get an error. By reading the spec, the most likely outcome now seems to be lower to C calling convention.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as call_extern implies call into a function with C calling convention

To double-check, is this the case for functions that exist within the same module? That is, because the name resolution in the LLVM codegen preferentially resolves to names within the existing module, my mental model was that the calling convention was only fixed for calls into external functions, or for callees that may be called externally. Therefore, the internal-only function would be sufficient to modify any matching call_extern instances.

So if we want to state that this is a convention that will be further lowered, just a strawman: (e.g. InternalModuleCall, call_internal)

From the earlier design discussions, this was brought up as a potential later step, for when using call_extern becomes too unwieldy, or unable to express the degrees of freedom required by internal calls. Since it seemed that the changes to call_extern were legal with the current semantics, we hadn't moved to make a new builtin for it.

IntraModule might be slightly too generic(since we can also use Packed call intra module),

I'm not quite seeing the issue with the name here. The semantics we were picturing was that while any calling convention may be used for internal calls, only the kIntraModule calling convention That is, kIntraModule implies an internal function, but not being kIntraMoudule does not imply the existence of external calls. The difference would be that a pass could modify the signature of a kIntraModule function (e.g. removing an unused argument from the signature, simultaneously updating all callsites), while a PackedFunc could not be modified without breaking the potential external callsites.

After reading the context, seems we lean towards later. In that case, it might still be useful to specify what would be the default lowering of InternalModuleCall and call_internal.

The concern was that by specifying that behavior, it would enable assumptions of the final calling convention before it had been selected, mixing the different layers of abstraction. That is, somebody could erroneously generate a callsite that follows the PackedFunc API, on the assumption that the callee will also be lowered to the PackedFunc API. By leaving it unspecified, it makes that misunderstanding less likely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the discussion.

The difference would be that a pass could modify the signature of a kIntraModule function (e.g. removing an unused argument from the signature, simultaneously updating all callsites), while a PackedFunc could not be modified without breaking the potential external callsites.

This have to do with linkage which is somewhat parallel to the calling convention. We should be able to modify a signature of a function if it is not being referred from outside (in our current case, no global_symbol attribute indicates a private linkage), and we can do such setups independent from the calling convention.

it would enable assumptions of the final calling convention before it had been selected, mixing the different layers of abstraction

Indeed this is what we aim to avoid here. That is why normally we try to have pairs of(callsite intrinsic, and callee side calling convention annotations). For example,

  • InternalModuleCall (callee side conv) and call_internal(caller side intrinsic) would make such pairing clear.

In the current proposal, we are trying to pair (kIntraModule and call_extern). Where previously in our convention call_extern was normally paired with C calling conv(mainly because we also can generate such calls).

BTW, I agree that it is OK to leave default lowering behavior unspecified, as long as the pairing is clear. But in this case, the possible confusion is that we already have lowering behavior for call_extern defined by our backend. So only leaving the callee site unspecified could leads to possible point of confusion.

So the main comment here is let us make sure our pairing is clear. Ideally we have a unique set of pairings, e.g. call_extern only maps to one callee-site calling conv. Introducing a extra intrinsic to pair with this one, or clarify that this is the only thing that call_extern pairs(in which case perhaps C calling conv is more proper) might be more clear.

The internal property have things to do with linkage and can be handled separately

Copy link
Contributor

@wrongtest-intellif wrongtest-intellif Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introducing a extra intrinsic

Could we leverage the fact that call->op can be GlobalVar to explicitly pair with kIntraModule function?

For call_extern , I believe call_extern an intra-module function by name should work in general settings. While except the calling convention issue, it seems also assume the presence of symbol lookup machanism in runtime (like dlsym in glibc), thus may not perfectly match so-called "intra-module" semantic? For example, one may want to compile both __global__ function and __device__ function (maybe kIntraModule) into single cuda module and invoke the __device__ function directly from __global__ function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wrongtest-intellif : Thanks very much for the thoughtful suggestion!

I don't have a ready answer to your suggestion, which means I have some more homework to do on my side before proceeding with the PR. Much appreciated!

Changes / fixes to allow `PrimFunc`-`PrimFunc` calls
within the same `IRModule`:

- Fix bug in `BufferAllocationLocator` constructor in which
    `BufferDecl` nodes could trigger duplicate buffer allocations.

- Add `tvm::CallingConv::kIntraModule` calling convetion, to
  allow a PrimFunc's signature to remain unchanged by the
  `MakePackedAPI` pass during lowering.

- Add `tests/python/contrib/test_hexagon/test_call_tir.py` unit tests
  to demonstrate one viable approach for intra-module `PrimFunc`
  calls.

- Add a new tensor-content generator, `TensorContentFullRangeCOrder`,
  to `tests/python/contrib/test_hexagon/pytest_util.py`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants