Allow inplace memory optimization for different data type #1696
Conversation
The patch will introduce a bug in cases such as an int8->float32 in-place conversion, because of the difference in memory footprint; if you want to make it work, at the very least we should check that the data type sizes are the same. Usually in-place memory optimization is not that significant; we could even simply turn it off, as long as memory sharing is turned on. This is because the memory-sharing optimization will enable sharing of the int8 buffer with the next few layers anyway.
@tqchen Thanks for your comment. From my point of view, in-place memory optimization should only care about the total size of the memory, not the element size. If there's a pooling operator that can convert 2x2 int8 to 1x1 float32, I don't see any problem with allowing in-place memory for it. And in our case, it's not just an optimization for reducing memory usage, but a mandatory requirement from a special operator. It's like an add operator, C = A + B, where our library will always place the result C into operand A. I guess the memory-sharing optimization can't figure out whether A or B should be computed in place.
@ZhennanQin I think @tqchen is not saying that in-place optimization is not needed; his point is (please correct me if I am going the wrong way :) @tqchen) that if you are going to optimize this scenario, a well-designed mechanism is needed, rather than a workaround-like change that merely looks like it works (with respect)... btw, is it supposed to be "data type" in the title :)? And out of personal curiosity, are you from MLT?
@jackwish Thanks for pointing out the typo. Yes, I'm from the Intel MLT team, and I'm working on integrating MKL-DNN into mxnet. But I still don't understand why you think this is just a workaround. Is there any other way to describe such an in-place memory limitation imposed by the computing library?
Quote @tqchen:
> I am not sure what particular scenario (i mean the data types) you are trying to optimize, by
Yes. In other words, it's not an optimization, but a requirement from the underlying library that needs to be satisfied, just like the instruction restrictions that register allocation has to handle in the compiler field. We need to describe it in some way to memory planning and the scheduler, to ensure that a certain operand and the output share the same memory and that the overridden operand won't be used anymore.
By "describe it in some way to memory planning and scheduler", do you mean in TVM? I understand that the underlying library (MKL-DNN?) may have restrictions; however, personally, in this context I think changing things the way this patch does could be a bit aggressive. Is it safe (regarding memory layout)?
Currently, in-place optimization doesn't care about memory layout, and I don't think we need to care about it after removing the data type check. Memory layout may change after an in-place computation, just like data layout. Also, 'FInplaceOption' must be explicitly set by the developer for the operators that need it. It's the developer's responsibility to use it carefully and correctly, to ensure it doesn't break any rule that a particular backend may have.
While this may be OK for MKL-DNN, if we directly do an in-place optimization for an int8->fp32 cast, there will be a problem. The following in-place cast code can cause a bug, because when we write dfloat[0], it overrides dint8[1] (and the two bytes after it):

```cpp
void* data = malloc(4 * sizeof(float));
int8_t* dint8 = static_cast<int8_t*>(data);
float* dfloat = static_cast<float*>(data);
for (int i = 0; i < 4; ++i) {
  dfloat[i] = dint8[i];
}
```
Looks like in this specific case, simply checking the dtype size can satisfy both sides.
Firstly, we still check the memory size before doing in-place optimization, so a direct int8 -> fp32 cast won't be done in place. Secondly, if we define a new operator that casts the data type while keeping the memory size unchanged, then whether it can be done in place depends on the algorithm it uses. E.g., if the new cast is defined as slice+cast, which only casts the first N / 4 elements to the output (N is the element number), then we shouldn't do in-place optimization for it, because of the bug you mentioned. But if the new cast is defined as pool(kernel=[2, 2], stride=2) + cast, then it can be done in place. So a SliceCast operator should do an extra check on the data type size when registering the 'FInplaceOption' attribute, like
Overall, a data type size check isn't mandatory for all in-place optimizations; it should be done when registering the 'FInplaceOption' attribute, if necessary. I understand that you're worried that existing code may rely on the data type size check, so I will add it in this PR to keep consistency.
@tqchen Code is refactored to use a helper function to check data size. Please review. |
nnvm/src/pass/plan_memory.cc (Outdated)

```diff
@@ -13,6 +13,30 @@
namespace nnvm {
namespace pass {
namespace {
// Return bytes of data flag.
static int GetDTypeSize(int type_flag) {
```
@tqchen Sorry, I messed up the previous PR, so I closed it and recreated this clean one for review. For example, a quantized relu may convert data in place from int8 to uint8. We should support this kind of operation.