Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

T.init on arrays unrolls initialisation loop #119

Closed
dansanduleac opened this issue Jun 10, 2012 · 6 comments
Closed

T.init on arrays unrolls initialisation loop #119

dansanduleac opened this issue Jun 10, 2012 · 6 comments

Comments

@dansanduleac
Copy link
Contributor

If we have code like:

alias ubyte[4096] buf;

buf initUsingValue() { buf x = 0; return x; }
buf initDefault() { return buf.init; } // can just as easily replace buf for typeof(return)

then in LLVM, initialisation using the value 0 uses straight @llvm.memset, and alternatively when using a value other than 0 it uses a loop called arrayinit.
However, the .init unrolls the loop into 4096 llvm stores.

Looking at the assembly generated by DMD 2.059 (64-bit) on OSX, it seems to do the same thing i.e. generate 4096 movb instructions, or 512 movq if -Optimized, but it seems to me that this behavior is inefficient and should be changed to use a memset with the element's init / loop.

@dnadlinger
Copy link
Member

Yes, I noticed this as well. When building with optimizations enabled, LLVM recognizes the »loop« and replaces it with a zeroinitializer resp. efficient unrolled memset.

For codegen speed, it might still be beneficial to emit less IR instructions, though.

@dansanduleac
Copy link
Contributor Author

The memset vs loop performance is probably comparable, but what about the array size x MOV instructions that get generated? Do you think we should submit a patch report upstream about this, since DMD has the same behavior?

@dnadlinger
Copy link
Member

The codegen could (and should) definitely be improved, no question. Should not be too hard, although I'd regard it as a rather low-priority item compared to the other open bugs.

I can't judge the performance characteristics of what DMD currently emits without detailed benchmarking, but I suppose the size of the generated code alone is reason enough to report an upstream issue.

@redstar
Copy link
Member

redstar commented Aug 18, 2013

I looked into this. The loop unrolling is in method TypeSArray::defaultInitLiteral() in mtype.c. The IR is generated in method ArrayLiteralExp::toElem() in toir.cpp.
I already changed the IR generation to use a zeroinitializer and a memcpy, but I think this can still be improved.

redstar added a commit to redstar/ldc that referenced this issue Aug 18, 2013
Instead of creating individual stores to the array elements an constant
array is created and assigned to the destination memory. This is much
less IR than before. With -O it is optimized to a memset.
redstar added a commit to redstar/ldc that referenced this issue Aug 18, 2013
Instead of creating individual stores to the array elements an constant
array is created and assigned to the destination memory. This is much
less IR than before. With -O it is optimized to a memset.
redstar added a commit to redstar/ldc that referenced this issue Aug 18, 2013
Instead of creating individual stores to the array elements an constant
array is created and assigned to the destination memory. This is much
less IR than before. With -O it is optimized to a memset.
redstar added a commit to redstar/ldc that referenced this issue Aug 18, 2013
Instead of creating individual stores to the array elements an constant
array is created and assigned to the destination memory. This is much
less IR than before. With -O it is optimized to a memset.
redstar added a commit to redstar/ldc that referenced this issue Aug 18, 2013
Instead of creating individual stores to the array elements an constant
array is created and assigned to the destination memory. This is much
less IR than before. With -O it is optimized to a memset.
redstar added a commit to redstar/ldc that referenced this issue Aug 18, 2013
Instead of creating individual stores to the array elements an constant
array is created and assigned to the destination memory. This is much
less IR than before. With -O it is optimized to a memset.
redstar added a commit to redstar/ldc that referenced this issue Aug 18, 2013
Instead of creating individual stores to the array elements an constant
array is created and assigned to the destination memory. This is much
less IR than before. With -O it is optimized to a memset.
redstar added a commit that referenced this issue Aug 18, 2013
@redstar
Copy link
Member

redstar commented Aug 18, 2013

Turned out to be a bit more demanding than expected.
If all literals are constant then a llvm::ConstantArray is created with the values. Depending on the type of the destination, this array is directly stored or assigned to a global variable and then copied to the destination.

This should work in other cases, too. It looks like that there is a similar issue in method DtoArrayInit() in file arrays.cpp.

@redstar redstar closed this as completed Aug 18, 2013
@redstar
Copy link
Member

redstar commented Aug 18, 2013

As a side effect, this fixed the std.range failure on ubuntu-x86-64.

redstar pushed a commit that referenced this issue Sep 27, 2014
add back functions used in some Windows DLLs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants