Multi-kernel tests (plus a few others) don't work with driver date 20240819 #700
Comments
Could you call out specific tests... They might not all get resolved in one go.
for name in [
    "two_matmul_switching",
    "matmul_f32_8_8_4",
    "matmul_f32_8_4_8",
]:
    aie_vs_llvm_cpu(config, test_files_dir / f"{name}.mlir")

aie_vs_llvm_cpu(
    config,
    test_files_dir / "three_matmuls.mlir",
    function_name="three_$mm$",
)

# Test(s) of the form matmul(A,B) where A:MxK, B:KxN
test_name = output_dir / "test_from_template.mlir"
template_name = matmul_template_dir / "matmul_MxK_KxN.mlir"
generate_matmul_test(test_name, template_name, 32, 32, 64, "bf16", "f32")
aie_vs_llvm_cpu(config, test_name)

# Test(s) of the form matmul(A,B) + C where A:MxK, B:KxN, C:N
test_name = output_dir / "test_from_template_bias_N.mlir"
template_name = matmul_template_dir / "matmul_bias_MxK_KxN_N.mlir"
generate_matmul_test(
    test_name, template_name, 1024, 1024, 512, "bf16", "f32"
)

if config.vitis_dir:
    aie_vs_llvm_cpu(
        config, test_name, tile_pipeline="pack-peel", use_ukernel=True
    )
aie_vs_llvm_cpu(
    config, test_name, tile_pipeline="pack-peel", use_ukernel=False
)
Note: I believe I already did the whole stratification thing (some on, some off), and they do indeed all fail independently of each other. But I'll do that again to be sure.
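The stratification described above (running each test on its own so that one failure cannot mask or cause another) could be scripted roughly as follows. This is a minimal sketch: `run_in_isolation` and the `run_one` callback are hypothetical names, not part of the test script, and a hard driver crash would still take down the whole process unless each test is launched in its own subprocess.

```python
from typing import Callable, Dict, Iterable


def run_in_isolation(
    names: Iterable[str], run_one: Callable[[str], None]
) -> Dict[str, str]:
    """Run every test by itself and record 'pass' or the failure message.

    `run_one` is a hypothetical callback that runs a single named test and
    raises an exception on failure (for example, a thin wrapper around the
    comparison helper used in the test script).
    """
    results: Dict[str, str] = {}
    for name in names:
        try:
            run_one(name)
            results[name] = "pass"
        except Exception as exc:  # keep going so every test gets a verdict
            results[name] = f"fail: {exc}"
    return results
```

Printing the returned dictionary gives a per-test verdict that can be pasted into the issue.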
Btw, locally I have debug turned on for my driver (linux) and I see this for every test:
On the new driver, for
Inside iree_hal_xrt_direct_command_buffer_dispatch:
prints
So seems
static bool runOnce;
static iree_status_t iree_hal_xrt_direct_command_buffer_dispatch(...) {
  ...
  if (runOnce)
    return iree_ok_status();
  run.start();
  run.wait();
  runOnce = true;
  IREE_TRACE_ZONE_END(z0);
  return iree_ok_status();
}
gives
So no crash but not numerically correct.
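"Not numerically correct" here means the device output disagrees with the CPU reference even though the run completes. A minimal sketch of such an element-wise check, assuming both results are available as flat sequences of floats; `first_mismatch` and its tolerances are hypothetical, and the actual comparison in the test script may differ.

```python
import math
from typing import Optional, Sequence


def first_mismatch(
    actual: Sequence[float],
    expected: Sequence[float],
    rel_tol: float = 1e-6,
    abs_tol: float = 1e-6,
) -> Optional[int]:
    """Return the index of the first element that differs, or None if all match."""
    if len(actual) != len(expected):
        raise ValueError("outputs have different lengths")
    for i, (a, e) in enumerate(zip(actual, expected)):
        if not math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol):
            return i
    return None
```

Reporting the first mismatching index can help tell apart a result that is wrong everywhere from one that is only wrong after a certain point.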
I guess their implementation of multiple kernels has changed (I vaguely remember Sonal discussing this during our on-site).
For the multi-kernel test this is expected, given the way it is implemented: if you have N calls in your test to M dispatches, you will see N
The Windows E2E PR #689 is blocked because in order to get Windows to work I had to use a particular commit of XRT (required by XRT-MCDM). At that commit, the current linux driver (date 20240819) fails some tests while the older driver (such as the one on the sharkbox phoenix nuc) passes.
Possibly there's some connection to https://github.com/nod-ai/iree-amd-aie/blob/main/runtime/src/iree-amd-aie/driver/xrt/native_executable.cc#L270, which I had to ifndef _WIN32 away on Windows (and which @jtuyls added when battling numerics issues), but it's not clear, since it's not only the multi-kernel tests that fail.