Fail case when running caffe opencl branch with isaac #12

listenlink · 2017-01-10T05:11:14Z

Hi,
I am running the caffe opencl branch (https://github.com/BVLC/caffe/tree/opencl) with the isaac master branch on an Intel Broadwell platform. The following command fails:
./build/test/test.testbin --gtest_filter=NetTest/2.TestLossWeight
If I comment out lines 94 to 141 of https://github.com/ptillet/isaac/blob/master/lib/runtime/profiles.cpp, the test case passes.

Can you reproduce the failure? There seems to be a problem with the copy operation in the predict_ logic.

ptillet · 2017-01-10T10:24:46Z

This all makes sense! The idea is that I ended up choosing top-5 to increase the prediction accuracy. But if you benchmark multiple kernels that modify one of their inputs, then you need to be careful to copy back the result. Can you confirm that setting N_TOP=1 resolves the issue?
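The hazard described above can be sketched in plain C++. This is not ISAAC's code: `Kernel`, `axpby_forward`, `axpby_backward`, `benchmark_in_place` and `benchmark_on_copy` are hypothetical names, and the "kernels" are ordinary host functions standing in for GPU kernels that update an in/out operand. The sketch only illustrates why trial runs should work on a copy and why exactly one result should be copied back.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical in-place kernel signature: computes c = a + beta * c.
using Kernel = void (*)(const std::vector<float>& a, std::vector<float>& c, float beta);

void axpby_forward(const std::vector<float>& a, std::vector<float>& c, float beta) {
  for (std::size_t i = 0; i < c.size(); ++i) c[i] = a[i] + beta * c[i];
}

void axpby_backward(const std::vector<float>& a, std::vector<float>& c, float beta) {
  for (std::size_t i = c.size(); i-- > 0;) c[i] = a[i] + beta * c[i];
}

// Hazard: every trial run updates the caller's buffer, so with N candidates
// the update is applied N times instead of once.
void benchmark_in_place(const std::vector<Kernel>& candidates,
                        const std::vector<float>& a, std::vector<float>& c, float beta) {
  for (Kernel k : candidates) k(a, c, beta);
}

// Safer: time each candidate on a scratch copy of the in/out operand, then
// copy the fastest candidate's result back into the caller's buffer.
// Precondition (assumed): candidates is non-empty and a.size() == c.size().
std::size_t benchmark_on_copy(const std::vector<Kernel>& candidates,
                              const std::vector<float>& a, std::vector<float>& c, float beta) {
  using Clock = std::chrono::steady_clock;
  std::size_t best = 0;
  Clock::duration best_time = Clock::duration::max();
  std::vector<float> best_result;
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    std::vector<float> scratch = c;  // fresh copy of the in/out operand
    const Clock::time_point start = Clock::now();
    candidates[i](a, scratch, beta);
    const Clock::duration elapsed = Clock::now() - start;
    if (elapsed < best_time) {
      best_time = elapsed;
      best = i;
      best_result = scratch;
    }
  }
  c = best_result;  // copy back exactly one result
  return best;
}
```

With two candidates, `a = {1, 2}`, `c = {10, 20}` and `beta = 1`, `benchmark_on_copy` leaves `c` updated once, while `benchmark_in_place` applies the update twice.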

listenlink · 2017-01-10T12:21:36Z

Hi, it still fails with N_TOP=1.

gongzg · 2017-01-12T02:03:18Z

@ptillet could you reproduce this issue on your BDW machine?

ptillet · 2017-01-12T04:44:40Z

Yes, I will try this tomorrow. I suspect it should come from copying back the output's result. Perhaps a problem with queues. In the meantime, could you try N_TOP=1 and set modify_output = false ?

listenlink · 2017-01-12T08:58:40Z

Sorry, it still fails on my side with N_TOP=1 and modify_output = false.

ptillet · 2017-01-12T13:30:41Z

Haha. That sounds pretty bad! I'll definitely take a look at it.

ptillet · 2017-01-14T20:32:14Z

Some updates: I've found a bunch of other bugs while trying to reproduce the issue with caffe. I've fixed some things, and I'm tuning ISAAC for Intel's latest driver and for double precision on Broadwell. I hope to have everything up and running by the end of the weekend.

ptillet · 2017-01-15T08:01:18Z

Could you try the latest master? It should not only fix bugs for clCaffe, but also add double-precision support and performance improvements with the latest Intel OpenCL 2.0 driver.

Caffe seems to work fine with ISAAC on my BROADWELL machine.

gongzg · 2017-01-16T09:54:14Z

@ptillet I tried the latest master with clcaffe. The test suite no longer crashes, but there are still some failures:

[ FAILED ] SGDSolverTest/2.TestSnapshot, where TypeParam = caffe::GPUDevice
[ FAILED ] SGDSolverTest/2.TestLeastSquaresUpdateWithEverything, where TypeParam = caffe::GPUDevice
[ FAILED ] SGDSolverTest/2.TestLeastSquaresUpdateWithWeightDecayMultiIter, where TypeParam = caffe::GPUDevice
[ FAILED ] SGDSolverTest/2.TestLeastSquaresUpdateWithEverythingAccum, where TypeParam = caffe::GPUDevice
[ FAILED ] AdaGradSolverTest/2.TestSnapshot, where TypeParam = caffe::GPUDevice
[ FAILED ] NesterovSolverTest/2.TestNesterovLeastSquaresUpdateWithEverything, where TypeParam = caffe::GPUDevice
[ FAILED ] AdaDeltaSolverTest/2.TestLeastSquaresUpdateWithEverythingAccum, where TypeParam = caffe::GPUDevice
[ FAILED ] AdaDeltaSolverTest/2.TestSnapshot, where TypeParam = caffe::GPUDevice
[ FAILED ] AdamSolverTest/2.TestSnapshot, where TypeParam = caffe::GPUDevice
[ FAILED ] AdamSolverTest/2.TestAdamLeastSquaresUpdateWithEverything, where TypeParam = caffe::GPUDevice
[ FAILED ] RMSPropSolverTest/2.TestRMSPropLeastSquaresUpdateWithEverything, where TypeParam = caffe::GPUDevice
[ FAILED ] RMSPropSolverTest/2.TestSnapshot, where TypeParam = caffe::GPUDevice
[ FAILED ] InnerProductLayerTest/2.TestGradientTranspose, where TypeParam = caffe::GPUDevice

And if I run some of the test cases directly, such as:
test/test.testbin --gtest_filter=SGDSolverTest/2.TestLeastSquaresUpdateWithEverything
they pass or fail at random. But if I use the old version:

commit 6ac5e1f55b1cae59394758f823d5c58f57ca561d
Author: Philippe Tillet <ptillet@g.harvard.edu>
Date:   Fri Jan 1 05:44:28 2016 -0500

    Templates/Reduce1D: now properly loading 2D scalars

it always passes all of the float GPU test cases.

ptillet · 2017-01-16T13:11:27Z

I see, thanks. I'm also having issues with some tests passing when called individually but failing when running the entire test suite. I'll solve this ASAP.

ptillet · 2017-01-18T01:03:56Z

Sometimes, ISAAC's GEMM uses two kernels. The event returned by clBlasSgemm always corresponded to the first one, which led to some synchronization issues. Setting label=0 solved the problem because it forced the library to use only one kernel. Fixed in f226837.
Could you retry now?
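The synchronization problem described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the OpenCL API and not the actual fix in f226837: `ToyQueue` is a hypothetical single-threaded in-order queue in which waiting on an event only guarantees completion of commands up to the one that event refers to, and `two_kernel_op` is a hypothetical operation split into two commands.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy in-order queue (hypothetical): commands run lazily, and waiting on an
// event only guarantees completion up to and including that command.
class ToyQueue {
 public:
  using Event = std::size_t;  // index of the command the event refers to

  Event enqueue(std::function<void()> cmd) {
    commands_.push_back(std::move(cmd));
    return commands_.size() - 1;
  }

  void wait(Event e) {
    for (; next_ <= e && next_ < commands_.size(); ++next_) commands_[next_]();
  }

 private:
  std::vector<std::function<void()>> commands_;
  std::size_t next_ = 0;  // first command that has not run yet
};

// Operation made of two commands: the first writes a partial result, the
// second finishes it. `return_last` selects which event the caller receives.
ToyQueue::Event two_kernel_op(ToyQueue& q, std::vector<int>& out, bool return_last) {
  const ToyQueue::Event first = q.enqueue([&out] { out[0] = 1; });
  const ToyQueue::Event last = q.enqueue([&out] { out[1] = out[0] + 1; });
  return return_last ? last : first;
}
```

In this model, a caller that waits on the event of the first command reads `out[1]` before the second command has run; waiting on the event of the last command guarantees both have completed.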

gongzg · 2017-01-18T05:21:15Z

@ptillet Nice catch, and it works great with clcaffe now. Thanks for your quick fix!

ptillet · 2017-01-18T06:50:35Z

That's good to hear :)

ptillet closed this as completed Jan 18, 2017
