-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TVM][Bugfix] fix storage_rewrite bug when input is big #2580
Conversation
https://github.com/dmlc/tvm/blob/919bea8c79de5de9996cb4714fdb92b2149a023b/src/arithmetic/detect_linear_equation.cc#L29 needs to support |
@tqchen Could you please take a look? |
Let us make the behavior so that for now, we can keep arithmetic in int32 if there is no overflow, and use int64 only when there is a chance of overflow |
@tqchen Thanks. I will add it and update the PR. |
b451aba
to
00c41da
Compare
LGTM from my side. @wweic @yzhliu , please https://docs.tvm.ai/contribute/code_review.html#approve-and-request-changes-explicitly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
I have reservation about this code change. It seems to me this is too much a cost of increased risk and code complexity for something not of too much practical value, e.g., number of bits exceeds max int32. You end up having code which uses int32 in one case and int64 in another? Super confusing! If number of bits exceeds max int32 you likely have other problem to worry about, like, unable to allocate memory that big. I suggest either this should be left not fixed. Or fix in more conservative way, like using unsigned int32 instead of signed, so you can have up to 4 billion bits. Or if you feel strongly about supporting more than 4 billion bits, uniformly use int64. Just don't have a mixed case of int32/int64 which is too risky. |
@Anthony-Mai has some point on this. Perhaps a simple way forward is to first use a CHECK to make sure we do not go OOM and use int32 atm. As discussed in #2588 , we want to transition to int64 once we have a proper narrowing pass that detects the best data types, and we can move on from there |
@Anthony-Mai Thanks for your comment. I think allocating more than 2G or even 4G memory on a 64-bit system should be okay. We have an usage that requires a large amount of memory. We are also probably going to test very large input images on some networks, where uint32 might not be enough. I am not sure if we need to use int64 all around the other places because it seems that VTA at least only uses int32, it would result in many casts. In addition, as per @yzhliu comment, we are warning the users here. I understand that using one consistent type is ideal, but casting as needed in this case is more like a small optimization. Anyway, I am happy to hear other voices. @tqchen @yzhliu @wweic |
@tqchen Sorry, I didn't see your comment above. It seems we sent around the same time... Please take another look. But it seems we need more than max int32 to support the usage. |
I think we all agree on fixing the problem. The question is should we fix it now or wait and come back to fix it after we transition most defaults from int32->int64 and apply optimal narrowing. |
Does uint32 work? If so this can be a valid fix which satisfy both the special use case @zhiics and the team met so far, and avoiding type mismatch. |
@yzhliu uint32 should be enough for that special case. NVM, for even larger tests, I think we might be able to wait till everything is fixed. Let me change it to uint32. |
I would advise against uint32, int64 is a better choice mainly because the additional 1 bit wont really bring too much benefit |
To be honest, I personally also think int64 is better. uint32 doesn't change anything in this context because they are still different types. |
okay, if nobody disagrees, I will change it back to int64. |
oh I thought with uint32 we can avoid mismatch. If not I'm good with the i64 fix. Let's also track it in #2588 |
Thanks @zhiics @wweic @tqchen @Anthony-Mai Let’s merge it for now, and come back when i32->i64 transition has been finished. |
* fix storage_rewrite bug when input is big * cast when necessary * simplification * simplification * int64->uint32 * revert uint32->int64
* fix storage_rewrite bug when input is big * cast when necessary * simplification * simplification * int64->uint32 * revert uint32->int64
* fix storage_rewrite bug when input is big * cast when necessary * simplification * simplification * int64->uint32 * revert uint32->int64
This PR fixes a bug when input size is large. For example, the size of a tensor in terms of bits may exceed the maximum of int32. It causes core dump at runtime. This bug is identified with the help from @yzhliu
@yzhliu @tqchen @wweic please review