[X86][BF16] Improve vectorization of BF16 (#88486)
1. Move expansion to combineFP_EXTEND to help with small vectors;
2. Combine FP_ROUND to reduce fptrunc then fpextend after promotion.
1 parent 37ebf2a · commit 3cf8535
Showing 3 changed files with 66 additions and 229 deletions.
---
Just a heads-up: we are seeing internal tests fail in bf16 due to numerical differences after this commit. The differences are likely due to the change in the function combineFP_EXTEND. I am not an x86/floating-point expert myself, but I wanted to bring up the problem early while we work on a reproducer.
---
Forgive my ignorance here, but we are having some trouble reducing the differences we see with this change into a small test case.
Should we expect any precision differences after this change? If so, what are they?
Someone internally suggests that this change may just truncate from f32 to bf16, instead of preserving the "round-to-nearest-even" semantics during the conversion.
The algorithm is described here: https://github.com/DIPlib/diplib/blob/master/dependencies/eigen3/Eigen/src/Core/arch/Default/BFloat16.h, starting on line 283.
Any help would be greatly appreciated.
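For reference, the RTNE conversion linked above boils down to a well-known bit trick: add a rounding bias of 0x7FFF plus the lowest surviving mantissa bit, then keep the top 16 bits. A minimal C sketch of that idea (the helper name is ours, and NaN inputs would need special handling, which is omitted here):

```c
#include <stdint.h>
#include <string.h>

/* Round-to-nearest-even float -> bf16 bit pattern, following the bit
   trick used by the Eigen code linked above. Hypothetical helper;
   NaN handling omitted for brevity. */
static uint16_t float_to_bf16_rtne(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          /* bit-cast float to u32 */
    uint32_t lsb = (bits >> 16) & 1u;        /* lowest bit that survives */
    uint32_t rounding_bias = 0x7FFFu + lsb;  /* ties round to even */
    bits += rounding_bias;
    return (uint16_t)(bits >> 16);           /* keep the top 16 bits */
}
```

Because the bias is 0x7FFF + lsb, an exact halfway value rounds down when the kept mantissa is already even and up when it is odd, which is precisely round-to-nearest-even.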
---
The patch is intended to improve both performance and intermediate precision. A final numerical difference is expected, but it is arguable which result is preferable. BF16 is not an IEEE type, and considering its fewer fraction bits, improving the intermediate precision while allowing a final numerical difference is the better direction to me.
The BF16 conversion instruction doesn't support rounding modes other than "round-to-nearest-even". In fact, if we compare it with the similar type FP16 from the instruction-design perspective, we can see the same thing here: VCVTPS2PH(X) supports exceptions, different rounding modes, and DAZ (though no FTZ, compared to FP32/64), while VCVTNEPS2BF16 supports none of them. This implies that an exact numerical result may not be a concern for BF16.
---
@phoebewang What do you mean by improving the intermediate precision? The lack of standardization of rounding modes does not imply that numerical accuracy is not a concern - that would be a dangerous conclusion. In any case, if the new code implements RTNE, that is desirable. Does it support conversion to and from subnormal bfloat16 values?
---
@phoebewang could you please specify precisely how this change altered the conversion semantics?
---
@rmlarsen the change doesn't intend to alter the conversion semantics, only the intermediate calculation. For example, before this change every intermediate BF16 operation result was truncated to bf16 and then extended back to f32 before the next operation; now the intermediate result stays in f32.
The instruction VCVTNEPS2BF16 always takes a subnormal value as zero. If accuracy is the concern, we cannot leverage the native instruction in this case either.
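To make the intermediate-precision point above concrete, here is a hedged C sketch that simulates both lowerings with a software RTNE conversion (this is an illustration of the effect, not the actual codegen; `eval_stepwise` and `eval_fused` are our own names):

```c
#include <stdint.h>
#include <string.h>

/* Software round-to-nearest-even float -> bf16 bit pattern. */
static uint16_t bf16_rtne(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits += 0x7FFFu + ((bits >> 16) & 1u);   /* ties round to even */
    return (uint16_t)(bits >> 16);
}

/* bf16 bit pattern -> float: just widen the top 16 bits. */
static float bf16_to_float(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Old lowering: every intermediate result is rounded back to bf16. */
static float eval_stepwise(float a, float b, float c) {
    float t = bf16_to_float(bf16_rtne(a + b));
    return bf16_to_float(bf16_rtne(t + c));
}

/* New lowering: the intermediate stays in f32 (excess precision). */
static float eval_fused(float a, float b, float c) {
    return bf16_to_float(bf16_rtne((a + b) + c));
}
```

With a = 1.0 and b = c = 2^-8 (exactly the rounding bit of bf16 at magnitude 1.0), the stepwise evaluation hits a tie twice and stays at 1.0, while keeping the intermediate in f32 yields 1 + 2^-7 = 1.0078125 — the final results differ, and the f32-intermediate one is the more precise.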
---
@phoebewang thank you for the clarification. That is indeed a strict improvement.
---
Hold on, isn't this sort of thing only legal in the presence of contract? That is, I'd argue that if we start with IR that explicitly truncates a value to bfloat and extends it back to float, which, per promotion, becomes an fptrunc/fpext pair, and this change then cancels that pair, that is equivalent to the transformation that drops the intermediate rounding step, which is only legal in the presence of contract fastmath. Therefore, I'm still of the opinion that this merger violates LLVM's semantics.
---
No, it's more similar to excess-precision, which is independent of contract.
---
Right, and per the documentation on that, "assignments and explicit casts force the operand to be converted to its formal type, discarding any excess precision". So I'm arguing that LLVM IR containing an explicit fptrunc/fpext pair, which is equivalent to float x2 = (float)(bfloat)(x);, must be protected from this optimization. That is, there's a difference between the trunc/ext pairs introduced by promotion and those written in the input IR: if you cancel the pairs that promotion itself introduced, keeping %1 as a float, that's entirely fine, but the semantics of the input IR should be preserved. Or, alternatively, the documentation should be updated to reflect that you need an arithmetic fence to prevent extended precision ... which is probably too strict.
---
Notice the documentation emphasizes that this applies only to _Float16. I think it's reasonable for __bf16 to follow a looser rule under the same concept.
---
@phoebewang I disagree. Also notice that in the original Google implementation of bfloat16, which now resides in the Eigen library, we explicitly disallow implicit casting from float to bfloat16, as it is extremely lossy.
---
@rmlarsen @krzysz00 I think you are right, it's too aggressive to combine explicit fptrunc/fpext.
Created #91420 for it, PTAL.
---
@phoebewang I think that looks good. I'm not familiar enough with LLVM internals to give you a formal review. Thank you!
---
@rmlarsen You are welcome :)