Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for compiling to SoA ABI for CUDA #1103

Open
wants to merge 14 commits into
base: branch-24.03
Choose a base branch
from

Conversation

gmarkall
Copy link

@gmarkall gmarkall commented Dec 6, 2023

This adds a new function, compile_ptx_soa(), which works similarly to Numba's compile_ptx() function, but the compiled code's writes the elements of tuple return types into SoA data structures passed in as pointers.

Example

The Python function:

def addsub(x, y):
    return x + y, x - y

compiled with the compile_ptx_soa() function where:

  • Arguments are of int32 type
  • The return type is a tuple of (int32, int32)

compiles to PTX equivalent to th C function:

void (int32_t *r1, int32_t *r2, int32_t x, int32_t y)
{
  *r1 = x + y;
  *r2 = x - y;
}

Or, as the actual PTX produced:

.visible .func addsub(
	.param .b64 addsub_param_0,
	.param .b64 addsub_param_1,
	.param .b32 addsub_param_2,
	.param .b32 addsub_param_3
)
{
	.reg .b32 	%r<5>;
	.reg .b64 	%rd<3>;


	ld.param.u64 	%rd1, [addsub_param_0];
	ld.param.u64 	%rd2, [addsub_param_1];
	ld.param.u32 	%r1, [addsub_param_2];
	ld.param.u32 	%r2, [addsub_param_3];
	add.s32 	%r3, %r2, %r1;
	sub.s32 	%r4, %r1, %r2;
	st.u32 	[%rd1], %r3;
	st.u32 	[%rd2], %r4;
	ret;

}

Returning a heterogeneous tuple is also possible, For example where the return
type is specified as a tuple of (int32, float32) we get:

.visible .func addsub(
	.param .b64 addsub_param_0,
	.param .b64 addsub_param_1,
	.param .b32 addsub_param_2,
	.param .b32 addsub_param_3
)
{
	.reg .f32 	%f<2>;
	.reg .b32 	%r<4>;
	.reg .b64 	%rd<6>;


	ld.param.u64 	%rd1, [addsub_param_0];
	ld.param.u64 	%rd2, [addsub_param_1];
	ld.param.u32 	%r1, [addsub_param_2];
	ld.param.u32 	%r2, [addsub_param_3];
	add.s32 	%r3, %r2, %r1;
	cvt.s64.s32 	%rd3, %r1;
	cvt.s64.s32 	%rd4, %r2;
	sub.s64 	%rd5, %rd3, %rd4;
	cvt.rn.f32.s64 	%f1, %rd5;
	st.u32 	[%rd1], %r3;
	st.f32 	[%rd2], %f1;
	ret;
}

(Note the st.u32 for the first return value vs. st.f32 for the second).

Copy link

copy-pr-bot bot commented Dec 6, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@magnatelee
Copy link
Contributor

/ok to test

@magnatelee magnatelee added the category:new-feature PR introduces a new feature and will be classified as such in release notes label Dec 6, 2023
self,
agg: Value,
idx: Union[Iterable[int], int],
name: Optional[str] = "",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Frustratingly, Optional[str] is not a reference to the fact that the parameter has a default, it's just an alias for Union[str, None]. So if the function doesn't actually accept None (which would typically be the case if it's set to "" if not provided), it would be more appropriate to type it simply as str.

@@ -0,0 +1 @@
from llvmlite import ir
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these typings files should also get copyright headers?

# values as SoA and some simplifications to keep it short
if not device:
raise NotImplementedError(
"Only device functions can be compiled for " "the SoA ABI"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Only device functions can be compiled for " "the SoA ABI"
"Only device functions can be compiled for the SoA ABI"

@gmarkall
Copy link
Author

@manopapad Many thanks for having a look over this - I'll resolve the issues you noted shortly.

It looks like tests also failed because Numba is not a dependency of cuNumeric, but it will need to be (at least for numba_utils to work) - I have looked through the various actions scripts and bits and pieces but I can't see what the correct way to ensure Numba is installed is - could you suggest how I can make sure Numba gets installed as required please?

@manopapad
Copy link
Contributor

could you suggest how I can make sure Numba gets installed as required please?

Can you try adding numba around here? https://github.com/nv-legate/cunumeric/blob/branch-24.01/conda/conda-build/meta.yaml#L148

@gmarkall
Copy link
Author

Thanks, just giving that a try now.

@gmarkall
Copy link
Author

/ok to test

@RAMitchell
Copy link

What are the python dependencies being introduced here? Just a note that we may want to make some features optional if they significantly increase packaging complexity.

@gmarkall
Copy link
Author

The dependency being added is Numba, which also depends on llvmlite. These are both available on PyPI as wheels and on conda-forge for Linux x86_64, ppcle64, AArch64, macOS x86_64 and arm64, and Windows on x86_64.

In general I think projects adding Numba as a dependency don't tend to have issues with increased packaging complexity - are there any special cuNumeric packaging requirements I should think about / elaborate on here?

@manopapad
Copy link
Contributor

As @gmarkall noted, numba is widely available, so I'm not very worried about making it a hard dependency. That said, if there's pushback we can certainly make it optional (only necessary if the user wants to use np.vectorize or similar UDF-accepting functions).

@marcinz marcinz changed the base branch from branch-24.01 to branch-24.03 February 22, 2024 01:03
@manopapad
Copy link
Contributor

@bryevdv we've gotten a bit bogged down with mypy on this, would you have some bandwidth to help resolve them?

@bryevdv
Copy link
Contributor

bryevdv commented Feb 26, 2024

@manopapad These errors are because the stubs under typings/numba are not sufficiently fleshed out. For instance, this change:

diff --git a/typings/numba/core/codegen.pyi b/typings/numba/core/codegen.pyi
index a4288cce..409c94c0 100644
--- a/typings/numba/core/codegen.pyi
+++ b/typings/numba/core/codegen.pyi
@@ -1,4 +1,7 @@
 class CodeLibrary:
     codegen: "Codegen"
 
+    @property
+    def name(self) -> str: ...
+
 class Codegen: ...

fixes this error:

cunumeric/numba_utils.py:153:9: error: "CodeLibrary" has no attribute "name"  [attr-defined]
            f"{lib.name}_function_",
            ^~~~~~~~~~~

Either the stubs need to be expanded to cover the usage in this file, or else this file added to the override exclusions in pyproject.toml

@manopapad
Copy link
Contributor

@gmarkall Maybe for now it would be best to add the numba modules that are causing issues to the [[tool.mypy.overrides]] > module section of pyproject.toml, and you can later decide if you want to continue the effort of providing type hints for numba modules?

@gmarkall
Copy link
Author

gmarkall commented May 1, 2024

In the end I couldn't figure out how to make the overrides work so I ended up fixing up the typing. I've merged branch 24.03 and got CI green, so I think this is now ready for consideration / feedback.

@gmarkall gmarkall changed the title [WIP] Add support for compiling to SoA ABI for CUDA Add support for compiling to SoA ABI for CUDA May 1, 2024
@gmarkall gmarkall marked this pull request as ready for review May 1, 2024 15:41
@bryevdv
Copy link
Contributor

bryevdv commented May 1, 2024

@manopapad This is fairly self-contained, so it should be straightforward to port to internal without much conflict, but at this point, should it just land in internal to begin with?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category:new-feature PR introduces a new feature and will be classified as such in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants