
Suboptimal codegen for Bool assignment #21712

Open
ancapdev opened this issue May 5, 2017 · 6 comments
Labels
compiler:codegen (Generation of LLVM IR and native code)
performance (Must go faster)

Comments

@ancapdev
Contributor

ancapdev commented May 5, 2017

Observing what looks like suboptimal codegen when loading and assigning Bool variables. The value seems to undergo some transformation where, I'm guessing, the compiler loses the information that it is already a Bool, and upon assignment forces it back into [0, 1] with a bitwise and, which at least naively seems redundant to me.

Example below, tested on 0.5.1 and 0.6.0-pre.alpha.

type Foo
    flag1::Bool
    flag2::Bool
end

function update(foo::Foo)
    foo.flag2 = foo.flag1
end

code_native(update, (Foo,))
code_llvm(update, (Foo,))

LLVM output

define i8 @julia_update_60814(i8** dereferenceable(2)) #0 !dbg !5 {
top:
  %1 = bitcast i8** %0 to i8*
  %2 = load i8, i8* %1, align 16
  %3 = getelementptr i8, i8* %1, i64 1
  %4 = and i8 %2, 1
  store i8 %4, i8* %3, align 1
  ret i8 %4
}

Native x64 output

	.text
Filename: In[1]
	pushq	%rbp
	movq	%rsp, %rbp
Source line: 9
	movb	(%rdi), %al
	andb	$1, %al
	movb	%al, 1(%rdi)
	popq	%rbp
	retq
	nopl	(%rax)
@JeffBezanson added the compiler:codegen (Generation of LLVM IR and native code) label May 5, 2017
@Keno
Member

Keno commented May 5, 2017

Could probably be fixed by putting !range metadata on boolean loads.
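
For illustration, a rough hand-written sketch (not actual Julia codegen output) of the IR above with such a !range annotation attached to the load; with that fact available, the and i8 %2, 1 from the original IR is provably a no-op and instcombine can delete it:

define i8 @julia_update_60814(i8** dereferenceable(2)) {
top:
  %1 = bitcast i8** %0 to i8*
  ; !range asserts the loaded byte is in [0, 2), i.e. already 0 or 1
  %2 = load i8, i8* %1, align 16, !range !0
  %3 = getelementptr i8, i8* %1, i64 1
  ; the masking and from the original IR is now redundant and folds away
  store i8 %2, i8* %3, align 1
  ret i8 %2
}

!0 = !{i8 0, i8 2}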

@StefanKarpinski
Member

Would using i1 in our LLVM for Bool be a feasible approach?

@Keno
Member

Keno commented May 8, 2017

In theory that should probably work, yes. In practice you'll probably run into LLVM bugs if you try, because no other frontend does it.

@StefanKarpinski
Member

IIRC that's probably why we didn't do it in the first place.

@tkelman
Contributor

tkelman commented May 13, 2017

I thought we did for a while, and changed because of said LLVM bugs.

@iamed2
Contributor

iamed2 commented Aug 13, 2021

The claim that no other frontend does it is no longer true! :)

Clang appears to use i1:
https://blog.matthieud.me/2020/exploring-clang-llvm-optimization-on-programming-horror/

Some of us were looking at that blog post on Slack and wondering why Julia couldn't make the same optimization. Julia currently generates code for the naive loop, and then vectorizes it during optimization.

function isEven2(num::T) where T
    numComp = zero(T)
    even = true
    while num != numComp
        even = !even
        numComp = numComp + one(T)
    end
    return even
end

I took the unoptimized code (@code_llvm optimize=false dump_module=true isEven2(Int32(300))) and popped it into Godbolt, compiling with clang -O1. This generates the naive loop in asm. But I noticed that the blog post's description of one of the passes said "even is no longer a byte (i8) but stored directly as a single bit (i1), thus removing several conversion instructions." I tried manually removing those conversions, and that resulted in clang generating the O(1) version!

Here's the version before and after my rewriting: https://gist.github.com/iamed2/863a8f329b006bdc200a1128dbf5bd6e
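
To illustrate the shape of the change (a simplified hand-written sketch, not the exact IR in the gist; the function names iseven_i8 and iseven_i1 are made up): before, the loop-carried flag lives in an i8 and round-trips through trunc/zext on every iteration; after, it stays an i1 throughout, so the conversions disappear.

; before: the loop-carried flag is an i8, converted to i1 and back each iteration
define i8 @iseven_i8(i32 %num) {
top:
  br label %loop
loop:
  %i    = phi i32 [ 0, %top ], [ %i.next, %body ]
  %even = phi i8  [ 1, %top ], [ %even.next, %body ]
  %done = icmp eq i32 %i, %num
  br i1 %done, label %exit, label %body
body:
  %even.i1   = trunc i8 %even to i1
  %flip      = xor i1 %even.i1, true
  %even.next = zext i1 %flip to i8
  %i.next    = add i32 %i, 1
  br label %loop
exit:
  ret i8 %even
}

; after: the flag stays an i1, so the trunc/zext pair is gone
define i1 @iseven_i1(i32 %num) {
top:
  br label %loop
loop:
  %i    = phi i32 [ 0, %top ], [ %i.next, %body ]
  %even = phi i1  [ true, %top ], [ %flip, %body ]
  %done = icmp eq i32 %i, %num
  br i1 %done, label %exit, label %body
body:
  %flip   = xor i1 %even, true
  %i.next = add i32 %i, 1
  br label %loop
exit:
  ret i1 %even
}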

All this to say it seems like it might be a good idea to take another look at using i1s for Bools.
