Add build and test workflow for PJRT plugin in pkgci #19222

Merged 8 commits on Nov 22, 2024

Conversation

PragmaTwice
Member

@PragmaTwice PragmaTwice commented Nov 20, 2024

These changes add the build workflow for iree-pjrt-plugin-* packages to the build_tools/pkgci scripts.

They also enable building the iree-pjrt-plugin-cpu package in GitHub Actions (pkgci.yaml).

Some other changes fix build problems:

  • ninja is missing from pyproject.toml
  • IREE_BUILD_COMPILER needs to be enabled to make IREELLVMIncludeSetup available

It closes #19221.

ci-exactly: build_packages, test_pjrt

Signed-off-by: PragmaTwice <twice@apache.org>
Comment on lines 40 to 44
# Customize defaults.
option(IREE_BUILD_COMPILER "Disable compiler for runtime-library build" OFF)
# IREE_BUILD_COMPILER should be enabled to make target IREELLVMIncludeSetup available,
# which is required by PJRT dylib targets
option(IREE_BUILD_COMPILER "Enable compiler for runtime-library build" ON)
option(IREE_BUILD_SAMPLES "Disable samples for runtime-library build" OFF)
Member

A full compiler build shouldn't be required, but the logs are reassuring: https://github.com/iree-org/iree/actions/runs/11927773223/job/33275201028?pr=19222#step:9:17454. That build took 1 minute and had only 463 build targets. Much better than a full compiler build with 7500 targets (upwards of 20 minutes on large CPU build machines, 4 hours+ on smaller runners).

Let's see where the PJRT dylib targets depend on IREELLVMIncludeSetup or other compiler details...

These deps point at the compiler headers and loader

iree_cc_library(
NAME
compiler
HDRS
"compiler.h"
SRCS
"hlo_partitioner.cc"
"iree_compiler.cc"
DEPS
::debugging
iree::compiler::bindings::c::headers
iree::compiler::bindings::c::loader
iree_pjrt::partitioner_api
iree_pjrt::partitioner_api::loader
PUBLIC
)

Those shouldn't need a full source build, but they do actually need this option defined to work (today). See also the notes in this template repository that uses the compiler API in a similar way: https://github.com/iree-org/iree-template-compiler-cmake/blob/10e7a3a3e7a08d0956aa3b8ea878b38f17ca3651/CMakeLists.txt#L20-L31

Ah... actually, they should be defined regardless of the option?

# Always build the C bindings, since the API is available apart from
# actually building the compiler.
add_subdirectory(bindings/c)

So maybe your comment is getting closer to the actual problem:

DEPS
IREELLVMIncludeSetup

iree/CMakeLists.txt

Lines 816 to 829 in b5b8059

# Also add a library that can be depended on to get LLVM includes setup
# properly. bazel_to_cmake targets this for some header only pseudo deps.
add_library(IREELLVMIncludeSetup INTERFACE)
foreach(_d ${LLVM_INCLUDE_DIRS} ${MLIR_INCLUDE_DIRS} ${LLD_INCLUDE_DIRS})
# BUILD_INTERFACE only works one at a time.
target_include_directories(IREELLVMIncludeSetup INTERFACE
$<BUILD_INTERFACE:${_d}>
)
endforeach()
iree_install_targets(
TARGETS IREELLVMIncludeSetup
COMPONENT IREEPublicLibraries-Compiler
EXPORT_SET Compiler
)

Maybe we should refactor that somehow.

Anyways, considering the current support status for the integrations/pjrt code, I'm sorta comfortable with just flipping -DIREE_BUILD_COMPILER=ON for now and leaving refactoring so that it isn't required later. As long as the build is actually fast and doesn't build LLVM/MLIR targets (just the IREE compiler C API) then we're in a good spot. Keeping the build option off is a good way to enforce that things stay that way though.

Member Author

@PragmaTwice PragmaTwice Nov 21, 2024

Yeah, without IREELLVMIncludeSetup the linker aborts with missing libs while building the PJRT dylibs. It also seems that the IREE compiler C binding headers require some mlir-c headers to be available (types like MlirAttribute and MlirContext appear in function signatures in the bindings). Maybe we can make IREELLVMIncludeSetup available regardless of whether IREE_BUILD_COMPILER is ON, since it looks like just an interface target composed of header directories. Either way, we can do this later : )

@ScottTodd ScottTodd added the infrastructure (Relating to build systems, CI, or testing) and integrations/pjrt (OpenXLA PJRT Integration Work) labels Nov 20, 2024
Signed-off-by: PragmaTwice <twice@apache.org>
@PragmaTwice PragmaTwice changed the title from "Build packages for PJRT plugin in pkgci" to "Add build and test workflow for PJRT plugin in pkgci" Nov 21, 2024
@PragmaTwice
Member Author

Hi @ScottTodd, thanks for your kind review. I learned a lot.

I've refactored it into a separate PJRT build & test workflow now; feel free to give suggestions : )

Currently only the CPU plugin is enabled in CI (the workflow itself is extensible: just uncomment some matrix items) since:

  • for CUDA, the runner does not appear to be available (maybe WIP?)
  • for AMD, sadly I don't have a device to verify on locally, so I made it a TODO.

@ScottTodd ScottTodd self-requested a review November 21, 2024 13:41
Member

@ScottTodd ScottTodd left a comment

Thanks for the changes! The structure of this looks good to me now. Just a few small comments and then getting the test to pass.

Member

Since this script isn't doing anything with CMake, we could move it to https://github.com/iree-org/iree/tree/main/build_tools/testing, or put it under integrations/pjrt.

Should probably do the same for https://github.com/iree-org/iree/blob/main/build_tools/cmake/run_tf_tests.sh (that started next to a build_tools/cmake/build_tf_binaries.sh)

I'd actually prefer if the tests could just be run via workflow code like

      - name: Run tests
        env:
          JAX_PLATFORMS: iree_${{ matrix.pjrt_platform }}
        run: |
          source ${VENV_DIR}/bin/activate
          pytest integrations/pjrt

Starting with a script that runs something very specific is fine though. I just wouldn't want the script to grow too much in complexity. Aim for workflows to match user docs (e.g. https://github.com/iree-org/iree/tree/main/integrations/pjrt#running-the-jax-test-suite), and for both to be simple

Member Author

@PragmaTwice PragmaTwice Nov 22, 2024

Yeah I agree that it can at least be pure-python, without shell scripts.

I did try to run the JAX test suites (e.g. nn_test.py, and also core_test.py and array_test.py inside jax/tests), but I found that some test cases in them fail in IREE PJRT mode. I added some notes here:

# FIXME: we can also utilize the native test cases from JAX,
# e.g. `tests/nn_test.py` from the JAX repo, as below,
# but currently some test cases in this file will fail.
# NOTE that `absl-py` is required to run these tests.

So I then switched to simpler tests run in a differential-testing manner (here). I'm currently not sure whether that's easy to do in pytest. Do you have suggestions for this?

I'll first move the script to the right place.
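
The differential approach described in this comment can be sketched in pure Python, which would also make it callable from pytest. This is only a minimal sketch under stated assumptions: the helper names `run_on_platform` and `outputs_match` are hypothetical (the PR itself uses a shell helper called `diff_jax_test`), the test script is assumed to print its results to stdout, and `cpu` is assumed to be the reference platform that `iree_cpu` is compared against.

```python
import os
import subprocess
import sys


def run_on_platform(script, platform):
    """Run `script` in a fresh interpreter with JAX_PLATFORMS set; return its stdout."""
    # Copy the current environment and override only JAX_PLATFORMS.
    env = dict(os.environ, JAX_PLATFORMS=platform)
    result = subprocess.run(
        [sys.executable, script],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    return result.stdout


def outputs_match(script, reference="cpu", candidate="iree_cpu"):
    """Differential check: run the same script on two platforms and compare stdout."""
    return run_on_platform(script, reference) == run_on_platform(script, candidate)
```

A pytest case could then be as small as `assert outputs_match("test/test_add.py")`. Comparing raw stdout is strict: any difference in floating-point printing between backends would count as a failure, so a tolerance-based comparison may be preferable for numeric tests.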

Update the copyright year

Co-authored-by: Scott Todd <scott.todd0@gmail.com>

Remove IREE_VMVX_DISABLE

Co-authored-by: Scott Todd <scott.todd0@gmail.com>

Install numpy<2

Signed-off-by: PragmaTwice <twice@apache.org>
@ScottTodd
Member

  • Edited the tags in the PR description to just ci-exactly: build_packages, test_pjrt since other workflows aren't affected by these changes.
  • Approved the workflow runs

I can also send you an invite to iree-org so the checks will run automatically for your PRs if you want.

@PragmaTwice
Member Author

Thank you for your quick actions!

> I can also send you an invite to iree-org so the checks will run automatically for your PRs if you want.

Ahh thank you. That would be great. I'll accept the invitation : )

@ScottTodd
Member

No problem. Caught me just before making dinner 😀 . Invite sent. Docs for access levels are at https://iree.dev/developers/general/contributing/#obtaining-commit-access

Signed-off-by: PragmaTwice <twice@apache.org>
PragmaTwice and others added 4 commits November 22, 2024 11:10
Signed-off-by: PragmaTwice <twice@apache.org>
Signed-off-by: PragmaTwice <twice@apache.org>
Signed-off-by: PragmaTwice <twice@apache.org>
Signed-off-by: PragmaTwice <twice.mliu@gmail.com>
Comment on lines +47 to +57
# FIXME: due to #19223, we need to use jax no higher than 0.4.20,
# but in such version of jax, 'stablehlo.broadcast_in_dim' op
# will be emitted without attribute 'broadcast_dimensions',
# which leads to an error in IREE PJRT plugin.
# So currently any program with broadcast will fail,
# e.g. test/test_simple.py.
# After #19223 is fixed, we can uncomment the line below.

# diff_jax_test test/test_simple.py

diff_jax_test test/test_add.py
Member Author

@PragmaTwice PragmaTwice Nov 22, 2024

As the FIXME here says, in JAX 0.4.20 the 'stablehlo.broadcast_in_dim' op is emitted without the 'broadcast_dimensions' attribute, which makes IREE fail to compile. So currently some existing tests like integrations/pjrt/test/test_simple.py cannot pass.

I added a new (and simpler) test, test_add.py, instead of running test_simple.py. I also observed that after #19223 is fixed (e.g. via the patch #19241) and JAX is updated to the latest version, this error disappears.

Member

Makes sense. Good to know that rolling the versions forward will enable more tests to pass.

StableHLO does have a mechanism to actually be "stable" within a support window - using VHLO ("Versioned StableHLO"):

It's been a while since I've looked at the specifics across projects, but if JAX can emit VHLO instead of StableHLO then usage like this should be more robust. Since 4b9755f, IREE's import pipeline has a "deserialize VHLO to StableHLO" pass that runs close to the start.

Member

@ScottTodd ScottTodd left a comment

This is great! Thank you!

Member

Aha! Our new warning for CPU performance (#18561) showed up here too. FYI @bjacob

https://github.com/iree-org/iree/actions/runs/11967082438/job/33364009615?pr=19222#step:7:34

Fri, 22 Nov 2024 05:48:05 GMT
/home/runner/work/iree/iree/integrations/pjrt/test/test_add.py:9:6: warning: while creating CPU target: 
Fri, 22 Nov 2024 05:48:05 GMT
Defaulting to targeting a generic CPU for the target architecture will result in poor performance. Please specify a target CPU and/or a target CPU feature set. If it is intended to target a generic CPU, specify "generic" as the CPU.

As PJRT is acting as a JIT, compiling on the same machine that the code will run on, we likely want to include the --iree-llvmcpu-target-cpu=host flag here:

bool CPUClientInstance::SetDefaultCompilerFlags(CompilerJob* compiler_job) {
return compiler_job->SetFlag("--iree-hal-target-backends=llvm-cpu");
}
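
To make the suggestion concrete, here is a small Python sketch of the flag selection being proposed. It is illustrative only: the real change would live in the C++ `SetDefaultCompilerFlags` shown above, and the function name `default_cpu_compiler_flags` and its `target_host_cpu` parameter are hypothetical. Only the two flag strings come from this discussion.

```python
def default_cpu_compiler_flags(target_host_cpu=True):
    """Return default compiler flags for a CPU JIT client (illustrative only)."""
    # Always select the LLVM CPU backend, as the existing C++ code does.
    flags = ["--iree-hal-target-backends=llvm-cpu"]
    if target_host_cpu:
        # JIT case: compilation and execution happen on the same machine,
        # so target the host CPU instead of a generic one.
        flags.append("--iree-llvmcpu-target-cpu=host")
    return flags


print(" ".join(default_cpu_compiler_flags()))
```

Keeping the host-CPU flag behind a switch is a design assumption of this sketch; it leaves room for cases where the compiled artifact is not run on the compiling machine.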

Member Author

@PragmaTwice PragmaTwice Nov 23, 2024

Yeah, I also noticed that but forgot to mention it here. I'm also available to submit a patch for it : )

@ScottTodd ScottTodd merged commit 38d8d0a into iree-org:main Nov 22, 2024
25 checks passed
Groverkss pushed a commit to Groverkss/iree that referenced this pull request Dec 1, 2024
giacs-epic pushed a commit to giacs-epic/iree that referenced this pull request Dec 4, 2024
ScottTodd pushed a commit that referenced this pull request Dec 9, 2024
As mentioned in
#19222 (comment),
since PJRT is acting as a JIT compiler, we can always expect
compilation and execution to happen on the same machine. So we can add
`target-cpu=host` to it for performance.

ci-exactly: build_packages, test_pjrt

Signed-off-by: PragmaTwice <twice@apache.org>