ARM and Android fixes: version out some HardFloat tests, use TMPDIR logic and declaration from druntime, and correct comment #5358

joakim-noah · 2017-05-03T05:14:52Z

x86/x64 has no problem with unaligned reads, but the llvm optimizer gets confused by the cast in the foreach and assumes the Element is 32-bit aligned, generating ARM instructions that require word alignment and causing problems with the last unaligned test.

As for posix_memalign, it turns out it's declared in druntime but wasn't for OSX, so I added it in dlang/druntime#1819 and used the import instead. That druntime PR will need to be merged first.
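For reference, here is a minimal sketch of calling posix_memalign through the druntime import rather than a local extern(C) declaration. It assumes a Posix target whose druntime declares it in core.sys.posix.stdlib (for OSX that is what dlang/druntime#1819 adds). `allocAligned` and `freeAligned` are hypothetical helper names and are not part of Phobos.

```d
import core.stdc.stdlib : free;
import core.sys.posix.stdlib : posix_memalign;

/// Hypothetical helper: allocate `size` bytes whose address is a multiple
/// of `alignment`, or return null on failure. POSIX requires `alignment`
/// to be a power of two and a multiple of `(void*).sizeof`.
void* allocAligned(size_t alignment, size_t size)
{
    void* p;
    // posix_memalign reports failure through its return value, not errno.
    if (posix_memalign(&p, alignment, size) != 0)
        return null;
    return p;
}

/// Memory obtained from posix_memalign is released with the ordinary C free().
void freeAligned(void* p)
{
    free(p);
}
```

Keeping the declaration in druntime means every Posix port shares one prototype instead of each module repeating it.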

CyberShadow · 2017-05-03T05:30:12Z

As these two changes are in completely unrelated parts of the code, it would be nice to at least do them in separate commits.

ibuclaw · 2017-05-03T06:50:58Z

> the llvm optimizer gets confused by the cast in the foreach and assumes the Element is 32-bit aligned,

Alternatively, you could just fix the problem with llvm. ☺️

joakim-noah · 2017-05-03T08:06:50Z

The problem isn't with llvm. The cast takes a possibly unaligned ubyte[] chunk and puts it in a type that is word-aligned by default in D, like uint or ulong. That works fine on CPU arches that always allow unaligned reads, like x86, but not ARM.

If you like, try it with gdc and see if it does any better. The last unaligned tests in the module trip up when optimized by ldc.
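To make the hazard concrete, here is a minimal sketch. It assumes only that `T.alignof` is a power of two, which holds for D's built-in types. It shows that slicing a byte buffer at an odd offset produces a pointer that a cast to `uint[]` would treat as 4-byte aligned even though it isn't. `isAlignedFor` and `oddSliceIsMisalignedForUint` are hypothetical names.

```d
/// True if `p` meets the natural alignment of `T` (T.alignof is a power of two).
bool isAlignedFor(T)(const(void)* p)
{
    return ((cast(size_t) p) & (T.alignof - 1)) == 0;
}

/// A byte slice that starts one past a word boundary is fine as ubyte[],
/// but a uint* derived from it would break uint's alignment guarantee.
bool oddSliceIsMisalignedForUint()
{
    uint[2] words;                       // storage aligned for uint
    auto bytes = cast(ubyte[]) words[];  // same memory viewed as bytes
    auto tail = bytes[1 .. $];           // starts one byte past a word boundary
    return isAlignedFor!uint(bytes.ptr) && !isAlignedFor!uint(tail.ptr);
}
```

On x86 a load through the misaligned pointer still works. On ARM the backend may emit word-aligned load instructions, which is the failure described here.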

joakim-noah · 2017-05-03T08:12:46Z

Split the changes into two commits and fixed a misnamed temp variable.

@kinke and @gchatelet, review needed.

ibuclaw · 2017-05-03T08:40:06Z

> That works fine on CPU arches that always allow unaligned reads, like x86, but not ARM.

Yes, I know. It still seems strange that you are taking the ref const, then removing the ref. Why not just make the loop foreach (const Element; ...).

It's probably not just ARM that's affected, so maybe some version (NeedAlignedLoads) would be better; that's a topic for the bikeshed anyhow.

joakim-noah · 2017-05-03T08:59:29Z

> It still seems strange that you are taking the ref const, then removing the ref. Why not just make the loop foreach (const Element; ...).

I didn't want to touch anything on x86/x64, which @gchatelet did a lot of work to optimize, in #3916.

> It's probably not just ARM that's affected, so maybe some version (NeedAlignedLoads) would be better; that's a topic for the bikeshed anyhow.

Sure, he mentioned MIPS also in a comment earlier in the module. I figure it's better for those ports to add it themselves, as needed. If you have some arch in mind, try it with gdc and let me know what needs adding.
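Here is a sketch of what the suggested `version (NeedAlignedLoads)` could look like, grouping the architectures mentioned in this thread (ARM and MIPS) under one custom version identifier. The architecture list is illustrative only, not an audited set of strict-alignment targets, and `needAlignedLoads` is a hypothetical name.

```d
// Hypothetical: the custom version identifier suggested in review,
// enabled for the architectures discussed in this thread.
version (ARM)
    version = NeedAlignedLoads;
else version (MIPS32)
    version = NeedAlignedLoads;
else version (MIPS64)
    version = NeedAlignedLoads;

// A compile-time flag that ordinary code can test with `static if`.
version (NeedAlignedLoads)
    enum bool needAlignedLoads = true;
else
    enum bool needAlignedLoads = false;
```

With this, a port only has to add one line here rather than repeating `version (ARM)` at each alignment-sensitive site.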

dnadlinger · 2017-05-03T10:31:03Z

std/digest/murmurhash.d

+            {
+                // Since the block may be unaligned, this can trip up ARM
+                // codegen, so place it in a newly aligned Element first.
+                Element alignedTemp = block;


I don't think this fix is correct. Evaluating the right hand side (block) is still an unaligned read. The generated code just happens to be able to deal with that. The actual issue is the cast to Element[], which violates alignment guarantees. "Fixing" it like this is just asking for the problem to return later.

A correct fix would be a loop that iterates over data in chunks of the right size and memcpys it to a stack local (e.g. by using slice assignment on the variable casted to void[]).
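A minimal sketch of that loop follows. It assumes a `put` callback stands in for the hasher's `putElement`, and `forEachElement` is a hypothetical name. Each chunk is copied byte-wise into an aligned stack local, so no misaligned `Element*` is ever dereferenced. Trailing bytes that don't fill a whole Element are left for separate remainder handling, like the `remainderStart` logic in the module.

```d
/// Feed `data` to `put` one Element at a time without ever forming a
/// possibly misaligned Element* into `data`.
void forEachElement(Element)(const(ubyte)[] data, scope void delegate(Element) put)
{
    // Only whole Elements are processed; the tail is the caller's job.
    immutable end = data.length - data.length % Element.sizeof;
    Element tmp = void;
    for (size_t i = 0; i < end; i += Element.sizeof)
    {
        // Slice assignment copies the bytes into the aligned local.
        (cast(ubyte*) &tmp)[0 .. Element.sizeof] = data[i .. i + Element.sizeof];
        put(tmp);
    }
}
```

Because the copy preserves byte order, the values seen by `put` match a plain reinterpretation on any endianness. Only the alignment assumption goes away.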

OK, I'm not familiar with these alignment issues and exactly how D code is likely to trigger them: I just used the simplest fix that worked. I suppose you want something more like @kinke's patch, which uses different types based on the alignment, though maybe he had problems with certain types that are commented out.

I'm actually surprised this apparently fixed the issue for LDC, it smells like wasted optimization opportunity somewhere. A rather compact proper (I hope, I just typed it blindly) fix with no optimizations for alignments of 4/2 bytes would be:

version(ARM)
{
    // data.ptr appropriately aligned for type Element?
    // (The extra parentheses are required: D rejects `a & b == 0`
    // because `==` binds tighter than `&`.)
    if (((cast(size_t) data.ptr) & (Element.alignof - 1)) == 0)
    {
        foreach (ref const Element block; cast(const(Element[]))(data[0 .. remainderStart]))
            putElement(block);
    }
    else
    {
        Element alignedBlock = void;
        for (size_t i = 0; i != remainderStart; i += Element.sizeof)
        {
            // Copy each chunk into an aligned stack local before use.
            (cast(ubyte*) &alignedBlock)[0 .. Element.sizeof] = data[i .. i + Element.sizeof];
            putElement(alignedBlock);
        }
    }
}
else
{
    foreach (ref const Element block; cast(const(Element[]))(data[0 .. remainderStart]))
        putElement(block);
}

> OK, I'm not familiar with these alignment issues and exactly how D code is likely to trigger them

Although explicit documentation is scarce (with DMD being so x86-centric, though the same problem occurs with vector instructions there as well), the basic idea is that every pointer of type T* is guaranteed to point to an address aligned to T.alignof. The code generator can of course then assume this when using the pointer in loads/stores. In your code, the cast(Element[]) is still wrong, as .ptr is not aligned afterwards.

Note that assuming alignment for arrays is the only sensible thing to do, otherwise every access to an array with elements larger than a byte would be dog-slow on architectures that don't support unaligned loads, as we would need to emit a byte-by-byte load. (We should add a debug check to array casts to catch this, by the way.)
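The suggested debug check can be sketched at library level as a checked version of the array cast. This is only an illustration: `alignedCast` is a hypothetical helper, not something the compiler or Phobos provides. Like all D asserts, its checks are compiled out with -release.

```d
/// Hypothetical checked reinterpretation of raw bytes as a slice of T:
/// asserts that the start address is aligned for T and that the length
/// is a whole number of Ts before performing the cast.
const(T)[] alignedCast(T)(const(ubyte)[] bytes)
{
    assert(((cast(size_t) bytes.ptr) & (T.alignof - 1)) == 0,
           "start address is not aligned for " ~ T.stringof);
    assert(bytes.length % T.sizeof == 0,
           "length is not a multiple of " ~ T.stringof ~ ".sizeof");
    return cast(const(T)[]) bytes;
}
```

Passing `bytes[1 .. 5]` of a word-aligned buffer would trip the first assert in a debug build, instead of silently producing a `uint[]` whose `.ptr` breaks uint's alignment guarantee.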

joakim-noah · 2017-05-21T11:10:28Z

std/digest/murmurhash.d

+                    numChunks = processChunks!(ushort[size / 16])();
+                else
+                    numChunks = processChunks!(ubyte[size / 8])();
+            }


Lightly modified version of @kinke's fix, let me know if this suffices. It passes the extended tests below on ARM.

joakim-noah · 2017-05-21T11:11:02Z

std/digest/murmurhash.d

+                    numChunks = processChunks!(uint)();
+                /* TODO: cannot cast ushort/ubyte slice to Element = uint
+                else if ((startAddress & 1) == 0)
+                    numChunks = processChunks!(ushort[2])();


Why can't a ushort[2] be cast to a uint?

This still needs to be sorted out, otherwise the 32-bit hasher will still fail for unaligned data.

joakim-noah · 2017-05-21T11:14:28Z

std/math.d

 }

-@system unittest
+version(D_HardFloat) @system unittest


Note this added commit, needed when running the tests on Android/ARM for the merge-2.074 branch of ldc.
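The pattern in this commit relies on D's predefined `D_HardFloat` version identifier, which is set when the target has a hardware floating-point unit (`D_SoftFloat` covers the opposite case). A minimal sketch follows. `targetHasHardFloat` is a hypothetical name, and the unittest body is a placeholder, not the std.math test being versioned out.

```d
// D_HardFloat / D_SoftFloat are predefined by the compiler to describe
// whether the target has a hardware floating-point unit.
version (D_HardFloat)
    enum bool targetHasHardFloat = true;
else
    enum bool targetHasHardFloat = false;

// Same shape as the diff: the whole unittest is compiled only for
// hard-float targets, so SoftFloat builds (e.g. ARM_SoftFloat) skip it.
version (D_HardFloat) @system unittest
{
    double x = 0.5;
    assert(x + x == 1.0); // placeholder body
}
```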

Needs moving to a separate pr IMO.

It's so small, I'd rather just do it here. What's the problem with this commit? These three commits are what I need to get the tests passing on Android/ARM, fairly small PR.

Only one of the commits is fine, we can merge it now without waiting on the other two. I'm not sure about this change, but I haven't looked at the body, and I don't wish to discuss here, as the topic is about murmurhash.

The topic is some small changes I had to make for Android/ARM, ie the contents of this PR. The notion that one can't discuss this tiny commit here because of the surrounding discussion about murmurhash3 is laughable, as most non-negligible Phobos PRs have many such topical threads of discussion.

joakim-noah · 2017-05-24T09:33:16Z

ping, made changes asked for.

kinke · 2017-05-24T11:26:21Z

std/digest/murmurhash.d

-        immutable unalignedHash = digest!H(data[1 .. $]); // 1 .. 1024
-        assert(alignedHash == unalignedHash);
+        immutable ubyte[1028] data = 0xAC;
+        immutable alignedHash = digest!H(data[0 .. 1023]);


0 .. 1024 (contrary to the misleading previous comment) if you want to preserve the previous behavior (hashing a full 1K chunk); same for 3 lines below.
data isn't guaranteed to be aligned (wrt. 'alignedHash'), its declaration would need an align(Element.alignof) for that.

Yeah, I thought that looked off, was just lazy to check it, corrected the last index now. As for the alignment, it doesn't have to be but it usually is, and the right behavior will be checked regardless, as it's cycling through 5 different byte alignments.
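The shape of the extended test can be sketched as a standalone function. It hashes the same 1024 bytes starting at offsets 0 through 4 of one buffer, so at least four of the five starts are misaligned for a 4-byte Element. As kinke notes, offset 0 is only aligned if the buffer happens to be. `sameHashAtAllOffsets` is a hypothetical name; `digest` and `MurmurHash3` are the Phobos std.digest APIs already used in the module's tests.

```d
import std.digest : digest;
import std.digest.murmurhash : MurmurHash3;

/// True if hashing the same 1024 bytes gives the same digest whether the
/// input starts at offset 0 or at offsets 1 through 4 of the buffer.
bool sameHashAtAllOffsets(H)()
{
    // 1024 bytes of payload plus 4 bytes of slack so the window can slide;
    // every byte is identical, so every window has the same contents.
    static immutable ubyte[1028] data = 0xAC;
    immutable reference = digest!H(data[0 .. 1024]);
    foreach (i; 1 .. 5)
    {
        if (digest!H(data[i .. i + 1024]) != reference)
            return false;
    }
    return true;
}
```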

joakim-noah · 2017-05-24T18:03:12Z

std/digest/murmurhash.d

+                        import core.stdc.string : memcpy;
+                        Element block;
+                        ()@trusted{memcpy(&block, &chunk, Element.sizeof);}();
+                    }


Put this in to fix the 32-bit unaligned read you pointed out is still there, @kinke, following David's suggestion. I'm not having any issues with the way it was before or now, ie the test passes, but this won't screw up with some optimizer pass later on.
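The memcpy approach generalizes to a small load helper. The sketch below reads a value of type T from an arbitrary byte offset without ever dereferencing a misaligned T*. `loadUnaligned` is a hypothetical name; `memcpy` is the C function exposed by core.stdc.string, as in the diff.

```d
import core.stdc.string : memcpy;

/// Read a T starting at `offset` in `data`, with no alignment assumption:
/// memcpy copies bytes into an aligned local, so the compiler cannot
/// emit word-aligned loads against the original address.
T loadUnaligned(T)(const(ubyte)[] data, size_t offset)
{
    assert(offset + T.sizeof <= data.length, "read past end of buffer");
    T value = void;
    memcpy(&value, data.ptr + offset, T.sizeof);
    return value;
}
```

Compilers typically lower a fixed-size memcpy like this to a single load on targets that allow unaligned access. That is consistent with the small slowdown measured below, though it may vary by backend.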

ibuclaw

Can we make the std.math and allocator patches a separate PR each? Discussing them here would just create noise in the thread.

ibuclaw · 2017-05-25T06:36:34Z

std/math.d

 }

-@system unittest
+version(D_HardFloat) @system unittest


Needs moving to a separate pr IMO.

joakim-noah · 2017-05-25T12:14:40Z

std/digest/murmurhash.d

+                        import core.stdc.string : memcpy;
+                        Element block;
+                        ()@trusted{memcpy(&block, &chunk, Element.sizeof);}();
+                        putElement(block);


Merged and reorganized the 32-bit alignment fix I added yesterday, @kinke, here's the memcpy workaround David suggested earlier. I benchmarked it on unaligned data using @gchatelet's benchmark, it's usually only 1-2% slower, around 4 GiB/s.

joakim-noah · 2017-06-03T18:36:50Z

@ibuclaw, I'm not moving a two-line patch to a different PR, I see no reason not to discuss all three commits here.

joakim-noah · 2017-06-21T22:51:25Z

Hey Iain, since you did all that work to get gdc into gcc, I'd like to point out that this PR is essentially a gift from ldc to gdc, stemming from issues that we ran into when updating to 2.072 and 2.074, as dmd won't use the two ARM-related commits anyway.

After David approved this PR, I merged the first two commits into ldc's branch of phobos, and when I tried to merge the last one into @kinke's WIP ldc branch for 2.074, I saw that he already stubbed out those tests because of other issues.

Whenever gdc updates to a newer phobos, you will likely need these patches for ARM. If you want me to modify them in some way to better suit gdc, I'll be happy to discuss that. Until then, this PR is only here to benefit gdc, as ldc is already using it.

PetarKirov · 2017-06-23T09:17:11Z

(Closing & reopening the PR in order to restart the CI tests.)

dlang-bot · 2017-06-23T09:17:15Z

Thanks for your pull request, @joakim-noah!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

PetarKirov · 2017-06-23T09:19:17Z

std/digest/murmurhash.d

-        assert(alignedHash == unalignedHash);
+        immutable ubyte[1028] data = 0xAC;
+        immutable alignedHash = digest!H(data[0 .. 1024]);
+        foreach(i; 1 .. 5)


CircleCI fails because there needs to be a space between foreach and (.

PetarKirov · 2017-06-23T09:25:15Z

Sizeof BCValue: 40LU
60 opcodes remaining
/dev/shm/dtest/work/repo/dlang.org/../dmd/src/ddmd/ctfe/bc_skeleton.d(9): Error: undefined identifier BCFunction
SDCs LLVM Header are not avilable
LLVM_Backend is not compiled
BCGen
immutable(const(BCValue) function(BCValue[], BCHeap*) @safe)
C_BCGen
immutable(BCValue function(BCValue[], BCHeap*) pure @safe)
posix.mak:527: recipe for target '.generated/docs-prerelease.json' failed
make: *** [.generated/docs-prerelease.json] Error

@CyberShadow it looks like DAutoTest was trying to build the newCTFE dmd branch, instead of master or stable. Any ideas? Probably restarting the test would fix the error. Is there a way I can do that without force-pushing in Joakim's branch? (I'm on the phone currently and I don't have a proper build environment setup).

Edit: Never mind, closing & reopening the PR did the trick, though there was a slight delay compared to the other CIs.

joakim-noah · 2017-06-24T21:17:47Z

Thanks for the spacing tip, Petar, added it to my commit and removed yours.

joakim-noah · 2017-09-16T07:37:16Z

std/file.d

-        {
-            // Don't check for a global temporary directory as
-            // Android doesn't have one.
-        }


Added another Android commit, as ldc is now packaged on Android itself, and the Termux app does set TMPDIR. This was versioned out because it was assumed that most would run D code from their own Android GUI app, ie an apk, which doesn't have a global temp directory. However, now that D can be used from the command line on Android itself, with an app that sets TMPDIR for all command-line D tools, just use the below Posix logic. You will have to set the tempDir by hand for each apk anyway, if you use it.

This already hit me when building rdmd for Android, where I had to patch it to check TMPDIR; I can take that out with this change.
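A minimal sketch of a Posix-style temp-directory lookup of the kind described here: honour TMPDIR (as Termux sets it) first, then fall back to well-known system directories. The extra variable names and the fallback order are assumptions and may not match std.file.tempDir exactly; `pickTempDir` is hypothetical.

```d
import std.file : exists, isDir;
import std.process : environment;

/// Hypothetical Posix-style lookup: prefer a directory named by the
/// environment, then common system locations; null if none exists.
string pickTempDir()
{
    foreach (name; ["TMPDIR", "TEMP", "TMP"])
    {
        string dir = environment.get(name);
        if (dir.length && dir.exists && dir.isDir)
            return dir;
    }
    foreach (dir; ["/tmp", "/var/tmp", "/usr/tmp"])
    {
        if (dir.exists && dir.isDir)
            return dir;
    }
    return null;
}
```

An apk that sets none of these variables and has no such directories would get null here, which matches the note that apk users still have to choose a directory by hand.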


joakim-noah · 2017-12-22T13:17:43Z

Removed the largest murmurhash commit, as it was originally written by @kinke and he has since reworked it and submitted a new version in another pull. Also added one more commit with a doc fix, this pull is very small and easy to review now, @wilzbach.

The murmurhash changes are no longer part of this PR.

Commits merged from druntime.
Fix struct tls_index definition on x32 dlang/druntime#2354
Update SectionGroup signatures to match on all targets dlang/druntime#2401
Fix issue 19128 - argument to alloca may be too large dlang/druntime#2409
Define some common filesystem limits in core.stdc.limits dlang/druntime#2460
Use version Darwin instead of OSX in core.sys.posix.aio dlang/druntime#2470

Commits merged from phobos.
Don't run HardFloat tests on SoftFloat systems dlang/phobos#5358
Remove reliance on stdin, stdout, stderr being aliasable dlang/phobos#5718
Solaris: add import clock_gettime to currStdTime dlang/phobos#5807
Don't print debug messages when building unittests dlang/phobos#6827
Add HPPA support to phobos Fixes https://gcc.gnu.org/PR89054 dlang/phobos#6836

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@268293 138bc75d-0d04-0410-961f-82ee72b054a4

[dmd-cxx] Backport non-math related patches from #5358

joakim-noah mentioned this pull request May 3, 2017

Android and ARM regressions in master 1.2 ldc-developers/ldc#2024

Closed

dnadlinger suggested changes May 3, 2017


joakim-noah commented May 21, 2017


joakim-noah changed the title ~~Add ARM alignment fix and use declaration from druntime~~ Add ARM alignment fix, version out some HardFloat tests, and use declaration from druntime May 21, 2017

joakim-noah commented May 21, 2017


kinke reviewed May 24, 2017


joakim-noah commented May 24, 2017


ibuclaw previously requested changes May 25, 2017


joakim-noah commented May 25, 2017


dnadlinger approved these changes May 25, 2017


PetarKirov closed this Jun 23, 2017

PetarKirov reopened this Jun 23, 2017

PetarKirov reviewed Jun 23, 2017


joakim-noah requested review from CyberShadow, andralex and wilzbach as code owners September 16, 2017 07:26

joakim-noah changed the title ~~Add ARM alignment fix, version out some HardFloat tests, and use declaration from druntime~~ ARM and Android fixes: Add ARM alignment fix, version out some HardFloat tests, use TMPDIR logic and declaration from druntime Sep 16, 2017

joakim-noah commented Sep 16, 2017


dnadlinger referenced this pull request in ldc-developers/phobos Dec 1, 2017

Clean up LDC-specific std.digest.murmurhash extension

4057efb

kinke mentioned this pull request Dec 9, 2017

Upstream some LDC patches #5902

Merged

joakim-noah added 4 commits December 22, 2017 18:42

Use posix_memalign declaration from druntime instead

7fde88d

Don't run HardFloat tests on SoftFloat systems, like ARM_SoftFloat.

0080647

Use the Posix TMPDIR logic, as D can be used from the command-line too on Android.

ee4036a

std.system: Add comment fix for Android

f2186ef

joakim-noah changed the title ~~ARM and Android fixes: Add ARM alignment fix, version out some HardFloat tests, use TMPDIR logic and declaration from druntime~~ ARM and Android fixes: version out some HardFloat tests, use TMPDIR logic and declaration from druntime, and correct comment Dec 22, 2017

dnadlinger added the Merge:auto-merge label Dec 22, 2017

dlang-bot merged commit 15adac4 into dlang:master Dec 22, 2017

ibuclaw added a commit to ibuclaw/phobos that referenced this pull request Feb 10, 2019

[dmd-cxx] Backport non-math related patches from dlang#5358

2fd9573

ibuclaw mentioned this pull request Feb 10, 2019

[dmd-cxx] Backport non-math related patches from #5358 #6859

Merged

PetarKirov added a commit that referenced this pull request Feb 11, 2019

Merge pull request #6859 from ibuclaw/dmd-cxx-pr5358

791c5d2

[dmd-cxx] Backport non-math related patches from #5358
