
Slice op for NC4HW4 input tensor does not respect C4 data layout #2967

Closed
h6197627 opened this issue Jul 24, 2024 · 7 comments
Labels
bug Something isn't working

Comments

h6197627 commented Jul 24, 2024

MNN: 2.9.3
API: C++

I encountered a problem where the Slice op does not respect the data layout for models converted from the Caffe framework (NC4HW4).
Here is a simple Caffe prototxt:

layer {
	type: "Input"
	name: "input"
	top: "input"
	input_param {
		shape {
			dim: 1
			dim: 8
			dim: 6
			dim: 6
		}
	}
}
layer {
	type: "Slice"
	name: "slice"
	bottom: "input"
	top: "output1"
	top: "output2"
	top: "output3"
	top: "output4"
	slice_param {
		slice_point: 1
		slice_point: 5
		slice_point: 7
	}
}

It slices a 1x8x6x6 tensor into four tensors: 1x1x6x6, 1x4x6x6, 1x2x6x6, and 1x1x6x6.
After converting it with the MNNConvert tool, it turned out that the fourth channel of the second slice contains wrong values.
After debugging the MNN code, I found that this probably happens because the data format for Caffe models is NC4HW4: the input data is packed into two 1x4x6x6 chunks, and the second slice should take three channels from the first chunk and the fourth channel from the second chunk.
However, the final method that does the data copying, MNNTranspose32Bit, does not account for the C4 data chunking.
For this particular model, I tried adding a quick hack after this line:

if (i == 3)
{
    si = srcO + 143;
}

to force proceeding to the next C4 chunk, and the results are as expected.
Interestingly, the temporary output Tensor in CPURaster has NCHW format, which is converted to NC4HW4 as the last step. Maybe converting the input tensor to NCHW layout before slicing was forgotten.

Also, I tried converting an ONNX model that does the same thing, and it works normally. However, debugging shows that it uses a different tensor layout internally: NCHW.

I am attaching sample models (slice_caffe.mnn - converted from Caffe, slice_onnx.mnn - converted from ONNX) and sample code (inference.cpp) that can be used to reproduce the bug (it creates a 1x8x6x6 tensor filled with values 0 to 287, passes it through the slice model, and checks that the outputs match the input):
slice_caffe.mnn.zip
slice_onnx.mnn.zip
inference.cpp.zip

@jxt1234 jxt1234 added the question Further information is requested label Jul 26, 2024
jxt1234 (Collaborator) commented Jul 26, 2024

Can you use testMNNFromOnnx.py to reproduce the error?

jxt1234 (Collaborator) commented Jul 26, 2024

Reproduced, debugging...

@jxt1234 jxt1234 added bug Something isn't working and removed question Further information is requested labels Jul 26, 2024
jxt1234 (Collaborator) commented Jul 28, 2024

It's a bug in region fusing; modifying this code as follows solves it:

--- a/source/core/TensorUtils.cpp
+++ b/source/core/TensorUtils.cpp
@@ -538,10 +538,24 @@ static bool _ClipDst(int* stride, int srcOffset, int dstOffset, const int* srcSi
     if (0 != srcOffset || 0 != dstOffset) {
         return false;
     }
+    int srcMax = 0;
     for (int i=0; i<sizeNum; ++i) {
+        srcMax += srcSize[i] * stride[i];
         dstMin[i] = ALIMAX(0, -o[i]);
         dstMax[i] = ALIMIN(srcSize[i]-o[i], dstSize[i]);
     }
+    // Check If dstMax is inside src, it means one region can't describe dst - src
+    // TODO: Support slice region to support fuse
+    for (int i=0; i<sizeNum; ++i) {
+        if (dstMax[i] == dstSize[i]) {
+            continue;
+        }
+        int bias = offsetBias + dstMax[i] * stride[i];
+        if (bias < srcMax) {
+            // for [dstMax, dstSize], exist value match formula
+            return false;
+        }
+    }

h6197627 (Author) commented:
Thank you for the patch - it works!
May I ask you to help port this fix to MNN v2.8.3? I am using it in my project and cannot easily change the library's version. I tried to do it myself; I guess it should be somewhere in the _RegionValid function, but the code is completely different between the versions.
Also, should I close this issue, or wait until the fix is merged into the repository?

jxt1234 (Collaborator) commented Aug 8, 2024

For simplicity, you can disable region fusing entirely:

diff --git a/source/core/TensorUtils.cpp b/source/core/TensorUtils.cpp
index 104a9556..74bc0339 100644
--- a/source/core/TensorUtils.cpp
+++ b/source/core/TensorUtils.cpp
@@ -485,6 +485,7 @@ static bool _RegionValid(int* stride, int offset, int* size, int sizeNum, size_t
 
 // fuse srcRegion and dstRegion to dstRegion if return true
 bool TensorUtils::fuseRegion(Tensor::InsideDescribe::Region& srcReg, Tensor::InsideDescribe::Region& dstReg) {
+    return false;
     // src data isnot full data of dst
     if (srcReg.dst.offset > dstReg.src.offset ||
         srcReg.dst.stride[1] > srcReg.size[2] ||

h6197627 (Author) commented:
Well, it works, but unfortunately networks with slice ops that were unaffected by this bug (slice channel counts that are multiples of 4) show a performance degradation of ~15%.

jxt1234 (Collaborator) commented Aug 24, 2024

The best way is to update...

@jxt1234 jxt1234 closed this as completed Aug 24, 2024