
Slice op for NC4HW4 input tensor does not respect C4 data layout #2967

Closed
h6197627 opened this issue Jul 24, 2024 · 7 comments
Labels
bug Something isn't working

Comments

h6197627 commented Jul 24, 2024

MNN: 2.9.3
API: C++

I encountered a problem where the Slice op does not respect the data layout for models converted from the Caffe framework (NC4HW4).
Here is a simple Caffe prototxt:

layer {
	type: "Input"
	name: "input"
	top: "input"
	input_param {
		shape {
			dim: 1
			dim: 8
			dim: 6
			dim: 6
		}
	}
}
layer {
	type: "Slice"
	name: "slice"
	bottom: "input"
	top: "output1"
	top: "output2"
	top: "output3"
	top: "output4"
	slice_param {
		slice_point: 1
		slice_point: 5
		slice_point: 7
	}
}

It slices a 1x8x6x6 tensor into four tensors: 1x1x6x6, 1x4x6x6, 1x2x6x6, and 1x1x6x6.
After converting it with the MNNConvert tool, it turned out that the fourth channel of the second slice contains wrong values.
After debugging the MNN code, I found that this probably happens because the data format for Caffe models is NC4HW4: the input data is packed into two 1x4x6x6 chunks, and the second slice should take three channels from the first chunk and the fourth channel from the second chunk.
However, the final method that does the data copying, MNNTranspose32Bit, does not account for the C4 data chunking.
For this particular model, I tried adding a quick hack after this line:

if (i == 3)
{
    si = srcO + 143;
}

to force proceeding to the next C4 chunk, and the results are as expected.
Interestingly, the temporary output Tensor in CPURaster has NCHW format, which is converted to NC4HW4 as the last step. Maybe converting the input tensor to NCHW layout before slicing was forgotten.

Also, I tried converting an ONNX model that does the same thing, and it works normally. However, debugging shows that it uses a different tensor layout internally: NCHW.

I am attaching sample models (slice_caffe.mnn - converted from Caffe, slice_onnx.mnn - converted from ONNX) and sample code (inference.cpp) that can be used to reproduce the bug (it creates a 1x8x6x6 tensor filled with values 0 to 287, passes it through the slice model, and checks that the outputs match the input):
slice_caffe.mnn.zip
slice_onnx.mnn.zip
inference.cpp.zip

@jxt1234 jxt1234 added the question Further information is requested label Jul 26, 2024
jxt1234 (Collaborator) commented Jul 26, 2024

Can you use testMNNFromOnnx.py to reproduce the error?

jxt1234 (Collaborator) commented Jul 26, 2024

Reproduced, debugging...

@jxt1234 jxt1234 added bug Something isn't working and removed question Further information is requested labels Jul 26, 2024
jxt1234 (Collaborator) commented Jul 28, 2024

It's a bug in region fusing; modifying this code as follows solves it:

--- a/source/core/TensorUtils.cpp
+++ b/source/core/TensorUtils.cpp
@@ -538,10 +538,24 @@ static bool _ClipDst(int* stride, int srcOffset, int dstOffset, const int* srcSi
     if (0 != srcOffset || 0 != dstOffset) {
         return false;
     }
+    int srcMax = 0;
     for (int i=0; i<sizeNum; ++i) {
+        srcMax += srcSize[i] * stride[i];
         dstMin[i] = ALIMAX(0, -o[i]);
         dstMax[i] = ALIMIN(srcSize[i]-o[i], dstSize[i]);
     }
+    // Check If dstMax is inside src, it means one region can't describe dst - src
+    // TODO: Support slice region to support fuse
+    for (int i=0; i<sizeNum; ++i) {
+        if (dstMax[i] == dstSize[i]) {
+            continue;
+        }
+        int bias = offsetBias + dstMax[i] * stride[i];
+        if (bias < srcMax) {
+            // for [dstMax, dstSize], exist value match formula
+            return false;
+        }
+    }

h6197627 (Author) commented:
Thank you for the patch - it works!
May I ask you to help port this fix to MNN v2.8.3? I am using it in my project and cannot easily change the library's version. I tried to do it myself; I guess it should be somewhere in the _RegionValid function, but the code is completely different between the versions.
Also, should I close this issue, or wait until the fix is merged into the repository?

jxt1234 (Collaborator) commented Aug 8, 2024

For simplicity, you can disable region fusing entirely:

diff --git a/source/core/TensorUtils.cpp b/source/core/TensorUtils.cpp
index 104a9556..74bc0339 100644
--- a/source/core/TensorUtils.cpp
+++ b/source/core/TensorUtils.cpp
@@ -485,6 +485,7 @@ static bool _RegionValid(int* stride, int offset, int* size, int sizeNum, size_t
 
 // fuse srcRegion and dstRegion to dstRegion if return true
 bool TensorUtils::fuseRegion(Tensor::InsideDescribe::Region& srcReg, Tensor::InsideDescribe::Region& dstReg) {
+    return false;
     // src data isnot full data of dst
     if (srcReg.dst.offset > dstReg.src.offset ||
         srcReg.dst.stride[1] > srcReg.size[2] ||

h6197627 (Author) commented:
Well, it works, but unfortunately networks with slice ops that were unaffected by this bug (slice channel counts that are multiples of 4) show a performance degradation of ~15%.

jxt1234 (Collaborator) commented Aug 24, 2024

The best way is to update...

@jxt1234 jxt1234 closed this as completed Aug 24, 2024