-
-
Notifications
You must be signed in to change notification settings - Fork 749
ARM and Android fixes: version out some HardFloat tests, use TMPDIR logic and declaration from druntime, and correct comment #5358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
As these two changes are in completely unrelated parts of the code, it would be nice to at least do them in separate commits. |
Alternatively, you could just fix the problem with llvm. |
|
The problem isn't with llvm. The cast takes a possibly unaligned If you like, try it with gdc and see if it does any better. The last unaligned tests in the module trip up when optimized by ldc. |
|
Split the changes into two commits and fixed a misnamed temp variable. @kinke and @gchatelet, review needed. |
Yes, I know. It still seems strange that you are taking the It's probably not just ARM, affected, so maybe some |
I didn't want to touch anything on x86/x64, which @gchatelet did a lot of work to optimize, in #3916.
Sure, he mentioned MIPS also in a comment earlier in the module. I figure it's better for those ports to add it themselves, as needed. If you have some arch in mind, try it with gdc and let me know what needs adding. |
std/digest/murmurhash.d
Outdated
| { | ||
| // Since the block may be unaligned, this can trip up ARM | ||
| // codegen, so place it in a newly aligned Element first. | ||
| Element alignedTemp = block; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this fix is correct. Evaluating the right hand side (block) is still an unaligned read. The generated code just happens to be able to deal with that. The actual issue is the cast to Element[], which violates alignment guarantees. "Fixing" it like this is just asking for the problem to return later.
A correct fix would be a loop that iterates over data in chunks of the right size and memcpys it to a stack local (e.g. by using slice assignment on the variable casted to void[]).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'm not familiar with these alignment issues and exactly how D code is likely to trigger them: I just used the simplest fix that worked. I suppose you want something more like @kinke's patch, which uses different types based on the alignment, though maybe he had problems with certain types that are commented out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually surprised this apparently fixed the issue for LDC, it smells like wasted optimization opportunity somewhere. A rather compact proper (I hope, I just typed it blindly) fix with no optimizations for alignments of 4/2 bytes would be:
version(ARM)
{
// data.ptr appropriately aligned for type Element?
if ((cast(size_t) data.ptr) & (Element.alignof - 1) == 0)
{
foreach (ref const Element block; cast(const(Element[]))(data[0 .. remainderStart]))
putElement(block);
}
else
{
Element alignedBlock = void;
for (size_t i = 0; i != remainderStart; i += Element.sizeof)
{
(cast(ubyte*) &alignedBlock)[0 .. Element.sizeof] = data[i .. i + Element.sizeof];
putElement(alignedBlock);
}
}
}
else
{
foreach (ref const Element block; cast(const(Element[]))(data[0 .. remainderStart]))
putElement(block);
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'm not familiar with these alignment issues and exactly how D code is likely to trigger them
Although explicit documentation is scarce (with DMD being so x86-centric, though the same problem occurs with vector instructions there as well), the basic idea is that every pointer of type T* is guaranteed to point to an address aligned to T.alignof. The code generator can of course then assume this when using the pointer in loads/stores. In your code, the cast(Element[]) is still wrong, as .ptr is not aligned afterwards.
Note that assuming alignment for arrays is the only sensible thing to do, otherwise every access to an array with elements larger than a byte would be dog-slow on architectures that don't support unaligned loads, as we would need to emit a byte-by-byte load. (We should add a debug check to array casts to catch this, by the way.)
std/digest/murmurhash.d
Outdated
| numChunks = processChunks!(ushort[size / 16])(); | ||
| else | ||
| numChunks = processChunks!(ubyte[size / 8])(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lightly modified version of @kinke's fix, let me know if this suffices. It passes the extended tests below on ARM.
std/digest/murmurhash.d
Outdated
| numChunks = processChunks!(uint)(); | ||
| /* TODO: cannot cast ushort/ubyte slice to Element = uint | ||
| else if ((startAddress & 1) == 0) | ||
| numChunks = processChunks!(ushort[2])(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't a ushort[2] be cast to a uint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still needs to be sorted out, otherwise the 32-bit hasher will still fail for unaligned data.
| } | ||
|
|
||
| @system unittest | ||
| version(D_HardFloat) @system unittest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note this added commit, needed when running the tests on Android/ARM for the merge-2.074 branch of ldc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs moving to a separate pr IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's so small, I'd rather just do it here. What's the problem with this commit? These three commits are what I need to get the tests passing on Android/ARM, fairly small PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one of the commits is fine, we can merge it now without waiting on the other two. I'm not sure about this change, but I haven't looked at the body, and I don't wish to discuss here, as the topic is about murmurhash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The topic is some small changes I had to make for Android/ARM, ie the contents of this PR. The notion that one can't discuss this tiny commit here because of the surrounding discussion about murmurhash3 is laughable, as most non-negligible Phobos PRs have many such topical threads of discussion.
|
ping, made changes asked for. |
std/digest/murmurhash.d
Outdated
| immutable unalignedHash = digest!H(data[1 .. $]); // 1 .. 1024 | ||
| assert(alignedHash == unalignedHash); | ||
| immutable ubyte[1028] data = 0xAC; | ||
| immutable alignedHash = digest!H(data[0 .. 1023]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0 .. 1024 (contrary to the misleading previous comment) if you want to preserve the previous behavior (hashing a full 1K chunk); same for 3 lines below.
data isn't guaranteed to be aligned (wrt. 'alignedHash'), its declaration would need an align(Element.alignof) for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought that looked off, was just lazy to check it, corrected the last index now. As for the alignment, it doesn't have to be but it usually is, and the right behavior will be checked regardless, as it's cycling through 5 different byte alignments.
std/digest/murmurhash.d
Outdated
| import core.stdc.string : memcpy; | ||
| Element block; | ||
| ()@trusted{memcpy(&block, &chunk, Element.sizeof);}(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put this in to fix the 32-bit unaligned read you pointed out is still there, @kinke, following David's suggestion. I'm not having any issues with the way it was before or now, ie the test passes, but this won't screw up with some optimizer pass later on.
ibuclaw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make the std.math and allocator patches as a separate pr each? Discussing them here would just create noise in the thread.
| } | ||
|
|
||
| @system unittest | ||
| version(D_HardFloat) @system unittest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs moving to a separate pr IMO.
std/digest/murmurhash.d
Outdated
| import core.stdc.string : memcpy; | ||
| Element block; | ||
| ()@trusted{memcpy(&block, &chunk, Element.sizeof);}(); | ||
| putElement(block); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merged and reorganized the 32-bit alignment fix I added yesterday, @kinke, here's the memcpy workaround David suggested earlier. I benchmarked it on unaligned data using @gchatelet's benchmark, it's usually only 1-2% slower, around 4 GiB/s.
|
@ibuclaw, I'm not moving a two-line patch to a different PR, I see no reason not to discuss all three commits here. |
|
Hey Iain, since you did all that work to get gdc into gcc, I'd like to point out that this PR is essentially a gift from ldc to gdc, stemming from issues that we ran into when updating to 2.072 and 2.074, as dmd won't use the two ARM-related commits anyway. After David approved this PR, I merged the first two commits into ldc's branch of phobos, and when I tried to merge the last one into @kinke's WIP ldc branch for 2.074, I saw that he already stubbed out those tests because of other issues. Whenever gdc updates to a newer phobos, you will likely need these patches for ARM. If you want me to modify them in some way to better suit gdc, I'll be happy to discuss that. Until then, this PR is only here to benefit gdc, as ldc is already using it. |
|
(Closing & reopening the PR in order to restart the CI tests.) |
|
Thanks for your pull request, @joakim-noah! Bugzilla referencesYour PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog. |
std/digest/murmurhash.d
Outdated
| assert(alignedHash == unalignedHash); | ||
| immutable ubyte[1028] data = 0xAC; | ||
| immutable alignedHash = digest!H(data[0 .. 1024]); | ||
| foreach(i; 1 .. 5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CirclCI fails because there needs to be a space between foreach and (.
@CyberShadow it looks like DAutoTest was trying to build the newCTFE dmd branch, instead of master or stable. Any ideas? Probably restarting the test would fix the error. Is there a way I can do that without force-pushing in Joakim's branch? (I'm on the phone currently and I don't have a proper build environment setup). Edit: Nevermind closing & reopening tge PR did the trick, though there was a slight delay, compared to the other CIs. |
|
Thanks for the spacing tip, Petar, added it to my commit and removed yours. |
| { | ||
| // Don't check for a global temporary directory as | ||
| // Android doesn't have one. | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added another Android commit, as ldc is now packaged on Android itself, and the Termux app does set TMPDIR. This was versioned out because it was assumed that most would run D code from their own Android GUI app, ie an apk, which don't have any global temp directory. However, now that D can be used from the command-line on Android itself, with an app that sets TMPDIR for all command-line D tools, just use the below Posix logic. You will have to set the tempDir by hand for each apk anyway, if you use it.
This already hit me when building rdmd for Android, where I had to patch it to check TMPDIR, can take that out with this change.
The murmurhash changes are no longer part of this PR.
Commits merged from druntime.
Fix struct tls_index definition on x32
dlang/druntime#2354
Update SectionGroup signatures to match on all targets
dlang/druntime#2401
Fix issue 19128 - argument to alloca may be too large
dlang/druntime#2409
Define some common filesystem limits in core.stdc.limits
dlang/druntime#2460
Use version Darwin instead of OSX in core.sys.posix.aio
dlang/druntime#2470
Commits merged from phobos.
Don't run HardFloat tests on SoftFloat systems
dlang/phobos#5358
Remove reliance on stdin, stdout, stderr being aliasable
dlang/phobos#5718
Solaris: add import clock_gettime to currStdTime
dlang/phobos#5807
Don't print debug messages when building unittests
dlang/phobos#6827
Add HPPA support to phobos
Fixes https://gcc.gnu.org/PR89054
dlang/phobos#6836
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@268293 138bc75d-0d04-0410-961f-82ee72b054a4
[dmd-cxx] Backport non-math related patches from #5358
x86/x64 has no problem with unaligned reads, but the llvm optimizer gets confused by the cast in the
foreachand assumes theElementis 32-bit aligned, generating ARM instructions that require word alignment and causing problems with the last unaligned test.As for
posix_memalign, turns out its declared in druntime but wasn't forOSX, so added it in dlang/druntime#1819 and used the import instead. That druntime PR will need to be merged first.