Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a quite convoluted way of writing it over just
if
,else
etc?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth checking
@code_llvm
; it's true that LLVM now does the conversion ofif-else
=>ifelse
automatically pretty well these days.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They seem to produce different output at the
@code_llvm
and@code_native
stages consistent with my earlier experience thatifelse
generates LLVMselect
andif
generates LLVMbr
, but benchmarking suggests their performance is very similar in scalar applications and broadcasting: https://gist.github.com/johnmyleswhite/548f4eb18a028a237d52ae06811ca33cNote that the benchmarks also suggest that changing from the old implementation to the new implementation causes performance to drop because of processing subnormals in a mirror image: the old formulation was fast for
z = -710.0
and slow forz = +710.0
, whereas the new formulation is slow forz = -710.0
and fast forz = +710.0
.Happy to choose whichever form people prefer. I tend to defer to the side of assuming
select
is more likely to play well with SIMD, but it doesn't seem like that issue applies in this case for the benchmarks I've run.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I don't really care, just felt like the
if
version was easier to read and I couldn't find a benchmark where it was slower. I guess if you correctly predict the branch version is faster and otherwise the branchless is faster? 🤷♂️There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think in the absence of useful evidence either way I'll maintain my existing superstitious faith in the importance of the LLVM
select
instruction. Revising in the future would be trivial.