-
Notifications
You must be signed in to change notification settings - Fork 637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add build and test workflow for PJRT plugin in pkgci #19222
Conversation
Signed-off-by: PragmaTwice <twice@apache.org>
# Customize defaults. | ||
option(IREE_BUILD_COMPILER "Disable compiler for runtime-library build" OFF) | ||
# IREE_BUILD_COMPILER should be enabled to make target IREELLVMIncludeSetup available, | ||
# which is required by PJRT dylib targets | ||
option(IREE_BUILD_COMPILER "Enable compiler for runtime-library build" ON) | ||
option(IREE_BUILD_SAMPLES "Disable samples for runtime-library build" OFF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A full compiler build shouldn't be required but the logs are reassuring: https://github.com/iree-org/iree/actions/runs/11927773223/job/33275201028?pr=19222#step:9:17454 1 minute for that build, and only 463 build targets. Much better than a full compiler build with 7500 targets (upwards of 20 minutes on large CPU build machines, 4 hours+ on smaller runners).
Let's see where the PJRT dylib targets depend on IREELLVMIncludeSetup
or other compiler details...
These deps point at the compiler headers and loader
iree/integrations/pjrt/src/iree_pjrt/common/CMakeLists.txt
Lines 44 to 59 in b5b8059
iree_cc_library( | |
NAME | |
compiler | |
HDRS | |
"compiler.h" | |
SRCS | |
"hlo_partitioner.cc" | |
"iree_compiler.cc" | |
DEPS | |
::debugging | |
iree::compiler::bindings::c::headers | |
iree::compiler::bindings::c::loader | |
iree_pjrt::partitioner_api | |
iree_pjrt::partitioner_api::loader | |
PUBLIC | |
) |
Those shouldn't need a full source build, but they do actually need this option defined to work (today). See also the notes in this template repository that uses the compiler API in a similar way: https://github.com/iree-org/iree-template-compiler-cmake/blob/10e7a3a3e7a08d0956aa3b8ea878b38f17ca3651/CMakeLists.txt#L20-L31
Ah... actually, they should be defined regardless of the option?
Lines 24 to 26 in b5b8059
# Always build the C bindings, since the API is available apart from | |
# actually building the compiler. | |
add_subdirectory(bindings/c) |
So maybe your comment is getting closer to the actual problem:
iree/compiler/bindings/c/CMakeLists.txt
Lines 31 to 32 in b5b8059
DEPS | |
IREELLVMIncludeSetup |
Lines 816 to 829 in b5b8059
# Also add a library that can be depended on to get LLVM includes setup | |
# properly. bazel_to_cmake targets this for some header only pseudo deps. | |
add_library(IREELLVMIncludeSetup INTERFACE) | |
foreach(_d ${LLVM_INCLUDE_DIRS} ${MLIR_INCLUDE_DIRS} ${LLD_INCLUDE_DIRS}) | |
# BUILD_INTERFACE only works one at a time. | |
target_include_directories(IREELLVMIncludeSetup INTERFACE | |
$<BUILD_INTERFACE:${_d}> | |
) | |
endforeach() | |
iree_install_targets( | |
TARGETS IREELLVMIncludeSetup | |
COMPONENT IREEPublicLibraries-Compiler | |
EXPORT_SET Compiler | |
) |
Maybe we should refactor that somehow.
Anyways, considering the current support status for the integrations/pjrt code, I'm sorta comfortable with just flipping -DIREE_BUILD_COMPILER=ON
for now and leaving refactoring so that it isn't required later. As long as the build is actually fast and doesn't build LLVM/MLIR targets (just the IREE compiler C API) then we're in a good spot. Keeping the build option off is a good way to enforce that things stay that way though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, without IREELLVMIncludeSetup the linker will get abort with missing libs while building the PJRT dylibs. And seems that the IREE compiler C binding headers requires some mlir-c headers to be available (something like MlirAttribute
, MlirContext
appear in func sigs in the binding). I think maybe we can make IREELLVMIncludeSetup available regardless of whether IREE_BUILD_COMPILER is ON? since it looks only a interface target composed of header dirs. After all maybe we can do this later : )
Signed-off-by: PragmaTwice <twice@apache.org>
78930a8
to
6f59229
Compare
Hi @ScottTodd , thanks for your kindly review. Learn a lot. I've refactored it to a separate PJRT build & test workflow now, feel free to give suggestions : ) Currently only the CPU plugin is enabled in CI (but the workflow is extensible itself, can just uncomment some matrix item) since:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes! The structure of this looks good to me now. Just a few small comments and then getting the test to pass.
build_tools/cmake/run_jax_tests.sh
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this script isn't doing anything with CMake, we could move it to https://github.com/iree-org/iree/tree/main/build_tools/testing, or put it under integrations/pjrt
.
Should probably do the same for https://github.com/iree-org/iree/blob/main/build_tools/cmake/run_tf_tests.sh (that started next to a build_tools/cmake/build_tf_binaries.sh
)
I'd actually prefer if the tests could just be run via workflow code like
- name: Run tests
env:
JAX_PLATFORMS: iree_${{ matrix.pjrt_platform }}
run: |
source ${VENV_DIR}/bin/activate
pytest integrations/pjrt
Starting with a script that runs something very specific is fine though. I just wouldn't want the script to grow too much in complexity. Aim for workflows to match user docs (e.g. https://github.com/iree-org/iree/tree/main/integrations/pjrt#running-the-jax-test-suite), and for both to be simple
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree that it can at least be pure-python, without shell scripts.
I did try to run the JAX test suites (e.g. the nn_test.py
, also core_test.py
array_test.py
inside jax/tests
), but I found that some test cases inside these tests will fail in iree PJRT mode. Added some notes here:
iree/build_tools/cmake/run_jax_tests.sh
Lines 49 to 52 in 6a4b5f5
# FIXME: we can also utilize the native test cases from JAX, | |
# e.g. `tests/nn_test.py` from the JAX repo, as below, | |
# but currently some test cases in this file will fail. | |
# NOTE that `absl-py` is required to run these tests. |
So I then try to switch to some more simple tests with a differential testing manner (here). And I'm currently not sure if it's easy to do that in pytest. Do you have suggestions for this?
I'll first move the script to the right place.
Update the copyright year Co-authored-by: Scott Todd <scott.todd0@gmail.com> Remove IREE_VMVX_DISABLE Co-authored-by: Scott Todd <scott.todd0@gmail.com> Install numpy<2 Signed-off-by: PragmaTwice <twice@apache.org>
54fc60c
to
6a4b5f5
Compare
I can also send you an invite to iree-org so the checks will run automatically for your PRs if you want. |
Thank you for your quick actions!
Ahh thank you. That would be great. I'll accept the invitation : ) |
No problem. Caught me just before making dinner 😀 . Invite sent. Docs for access levels are at https://iree.dev/developers/general/contributing/#obtaining-commit-access |
Signed-off-by: PragmaTwice <twice@apache.org>
89ff876
to
90d3022
Compare
Signed-off-by: PragmaTwice <twice@apache.org>
Signed-off-by: PragmaTwice <twice@apache.org>
Signed-off-by: PragmaTwice <twice@apache.org>
Signed-off-by: PragmaTwice <twice.mliu@gmail.com>
# FIXME: due to #19223, we need to use jax no higher than 0.4.20, | ||
# but in such version of jax, 'stablehlo.broadcast_in_dim' op | ||
# will be emitted without attribute 'broadcast_dimensions', | ||
# which leads to an error in IREE PJRT plugin. | ||
# So currently any program with broadcast will fail, | ||
# e.g. test/test_simple.py. | ||
# After #19223 is fixed, we can uncomment the line below. | ||
|
||
# diff_jax_test test/test_simple.py | ||
|
||
diff_jax_test test/test_add.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the FIXME here, in JAX 0.4.20, 'stablehlo.broadcast_in_dim' op will be emitted without attribute 'broadcast_dimensions', which will make the IREE fail to compile. So currently some existing tests like integerations/pjrt/test/test_simple.py
cannot pass.
I add a new (and simpler) test test_add.py
instead of running test_simple.py
. And I observed that after #19223 is fixed (e.g. via the patch #19241) and JAX is updated to the latest version, this error will disappear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Good to know that rolling the versions forward will enable more tests to pass.
StableHLO does have a mechanism to actually be "stable" within a support window - using VHLO ("Versioned StableHLO"):
- https://github.com/openxla/stablehlo/blob/main/docs/compatibility.md
- https://github.com/openxla/stablehlo/blob/main/docs/vhlo.md
It's been a while since I've looked at the specifics across projects, but if JAX can emit VHLO instead of StableHLO then usage like this should be more robust. Since 4b9755f, IREE's import pipeline has a "deserialize VHLO to StableHLO" pass that runs close to the start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Thank you!
# FIXME: due to #19223, we need to use jax no higher than 0.4.20, | ||
# but in such version of jax, 'stablehlo.broadcast_in_dim' op | ||
# will be emitted without attribute 'broadcast_dimensions', | ||
# which leads to an error in IREE PJRT plugin. | ||
# So currently any program with broadcast will fail, | ||
# e.g. test/test_simple.py. | ||
# After #19223 is fixed, we can uncomment the line below. | ||
|
||
# diff_jax_test test/test_simple.py | ||
|
||
diff_jax_test test/test_add.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Good to know that rolling the versions forward will enable more tests to pass.
StableHLO does have a mechanism to actually be "stable" within a support window - using VHLO ("Versioned StableHLO"):
- https://github.com/openxla/stablehlo/blob/main/docs/compatibility.md
- https://github.com/openxla/stablehlo/blob/main/docs/vhlo.md
It's been a while since I've looked at the specifics across projects, but if JAX can emit VHLO instead of StableHLO then usage like this should be more robust. Since 4b9755f, IREE's import pipeline has a "deserialize VHLO to StableHLO" pass that runs close to the start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha! Our new warning for CPU performance (#18561) showed up here too. FYI @bjacob
Fri, 22 Nov 2024 05:48:05 GMT
/home/runner/work/iree/iree/integrations/pjrt/test/test_add.py:9:6: warning: while creating CPU target:
Fri, 22 Nov 2024 05:48:05 GMT
Defaulting to targeting a generic CPU for the target architecture will result in poor performance. Please specify a target CPU and/or a target CPU feature set. If it is intended to target a generic CPU, specify "generic" as the CPU.
As PJRT is acting as a JIT, compiling on the same machine that the code will run on, we likely want to include the --iree-llvmcpu-target-cpu=host
flag here:
iree/integrations/pjrt/src/iree_pjrt/cpu/client.cc
Lines 97 to 99 in 12476d9
bool CPUClientInstance::SetDefaultCompilerFlags(CompilerJob* compiler_job) { | |
return compiler_job->SetFlag("--iree-hal-target-backends=llvm-cpu"); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I also notice that but forget to mention here. Also I'm available to submit a patch for it : )
This changes are for adding the build workflow for `iree-pjrt-plugin-*` packages in `build_tools/pkgci` scripts. Also, it enables the build of package `iree-pjrt-plugin-cpu` in Github Actions (pkgci.yaml). Some other changes are for fixing some build problems: - ninja is missing from pyproject.toml - IREE_BUILD_COMPILER need to be enabled to make IREELLVMIncludeSetup available It closes iree-org#19221. ci-exactly: build_packages, test_pjrt --------- Signed-off-by: PragmaTwice <twice@apache.org> Signed-off-by: PragmaTwice <twice.mliu@gmail.com>
This changes are for adding the build workflow for `iree-pjrt-plugin-*` packages in `build_tools/pkgci` scripts. Also, it enables the build of package `iree-pjrt-plugin-cpu` in Github Actions (pkgci.yaml). Some other changes are for fixing some build problems: - ninja is missing from pyproject.toml - IREE_BUILD_COMPILER need to be enabled to make IREELLVMIncludeSetup available It closes iree-org#19221. ci-exactly: build_packages, test_pjrt --------- Signed-off-by: PragmaTwice <twice@apache.org> Signed-off-by: PragmaTwice <twice.mliu@gmail.com> Signed-off-by: Giacomo Serafini <179146510+giacs-epic@users.noreply.github.com>
As mentioned in #19222 (comment), since PJRT is acting as a JIT compiler, we can always expect that the compilation and execution is in the same machine. So we can add `target-cpu=host` to it for performance. ci-exactly: build_packages, test_pjrt Signed-off-by: PragmaTwice <twice@apache.org>
This changes are for adding the build workflow for
iree-pjrt-plugin-*
packages inbuild_tools/pkgci
scripts.Also, it enables the build of package
iree-pjrt-plugin-cpu
in Github Actions (pkgci.yaml).Some other changes are for fixing some build problems:
It closes #19221.
ci-exactly: build_packages, test_pjrt