Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AArch64 Fix getFrameRegs assertion during indirect fixup #7912

Conversation

jim-saxman
Copy link
Contributor

On AA4ch64 plaforms when calling getFrameRegs() and an indirect
fixup is found, the RIP needs to be present at the correct
offset. It was not, because the destructor was compiled with a
tail-call. It needs to have a standard stack frame.

By disabling the tail-call via the compiler flag on all destructors
in g_destructors[], we can ensure that all destructors for AArch64
are walked correctly.

This bug occurred sometimes in OSS Mediawiki after a few minutes.
We tried to create a simplified unit test but were unsuccessful.

This patch introduces no new unit test failures for either DebugOpt
or Release build types on AArch64 platforms and doesn't affect x64
or PPC64 destructors.

On AA4ch64 plaforms when calling getFrameRegs() and an indirect
fixup is found, the RIP needs to be present at the correct
offset. It was not, because the destructor was compiled with a
tail-call. It needs to have a standard stack frame.

By disabling the tail-call via the compiler flag on all destructors
in g_destructors[], we can ensure that all destructors for AArch64
are walked correctly.

This bug occurred sometimes in OSS Mediawiki after a few minutes.
We tried to create a simplified unit test but were unsuccessful.

This patch introduces no new unit test failures for either DebugOpt
or Release build types on AArch64 platforms and doesn't affect x64
or PPC64 destructors.
@jim-saxman
Copy link
Contributor Author

@mxw Hi Max, can you lease review this PR.
@swalk-cavium Are you still seeing the errors in test/quick/*dtor.php unti tests, and does this patch help?

@apinski-cavium
Copy link

apinski-cavium commented Jul 12, 2017

You just use __aarch64__ instead __AARCH64EL__ (not that matters but it is just a style thing).
Also it seems better to do something like:

inline void ArrayData::release() noexcept {
  assert(hasExactlyOneRef());
  g_array_funcs.release[kind()](this);
  AARCH64_WALKABLE_FRAME ();
 }

And have AARCH64_WALKABLE_FRAME (); expand to something like:
__asm__("":::"memory");

I am not a fan of the optimize attribute in this case :).

@mxw mxw self-assigned this Jul 12, 2017
@mxw
Copy link
Contributor

mxw commented Jul 12, 2017

I think __aarch64__ is more consistent with what we're using elsewhere.

I chatted with @dave-estes offline who seems to like the memory barrier approach. You can continue ifdeffing that macro to do nothing on x86-64.

@hhvm-bot
Copy link
Contributor

@jim-saxman updated the pull request - view changes

Copy link
Contributor

@mxw mxw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of nits.

return g_array_funcs.release[kind()](this);
g_array_funcs.release[kind()](this);
AARCH64_WALKABLE_FRAME();
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just delete this return; this function returns void.

@@ -468,7 +468,6 @@ ArrayData* MixedArray::MakeDictFromAPC(const APCArray* apc) {

//=============================================================================
// Destruction

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put this back?

This patch addresses @mxw's code review comments.
@hhvm-bot
Copy link
Contributor

@jim-saxman updated the pull request - view changes

@swalk-cavium
Copy link
Contributor

@jim-saxman - I ran a full MOP on this change.

These two tests now pass with these option sets
hphp/test/run -v -m jit -a '-vEval.JitPGO=1'
hphp/test/run -v -m jit -r -a '-vEval.JitPGO=1'
hphp/test/quick/setg-dtor.php
hphp/test/quick/sets-dtor.php

hphp/test/slow/async/simple_meth.php passes all option sets.

A bunch of failures have reappeared with this option set
hphp/test/run -v -m jit -r -a '-vEval.JitPGO=1'
hphp/test/zend/good/ext/spl/tests/iterator_002.php
hphp/test/zend/good/ext/spl/tests/iterator_033.php
hphp/test/zend/good/ext/spl/tests/iterator_040.php
hphp/test/zend/good/ext/spl/tests/iterator_048.php
hphp/test/zend/good/ext/spl/tests/observer_001.php
hphp/test/zend/good/ext/spl/tests/observer_002.php
hphp/test/zend/good/ext/spl/tests/RecursiveCallbackFilterIteratorTest.php

I think I saw them last between fixes for relocation.

@hhvm-bot
Copy link
Contributor

@mxw has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jim-saxman
Copy link
Contributor Author

Hi @swalk-cavium, Thanks for running tests on this patch. I re-ran all unit tests in all configurations and observed the following:

For PGO with and without repoAuth:
hphp/test/quick/setg-dtor.php
hphp/test/quick/sets-dtor.php
pass both before and after this patch.

hphp/test/slow/async/simple_meth.php
still fails with -r -m pgo

php/test/zend/good/ext/spl/tests/iterator_002.php
hphp/test/zend/good/ext/spl/tests/iterator_033.php
hphp/test/zend/good/ext/spl/tests/iterator_040.php
hphp/test/zend/good/ext/spl/tests/iterator_048.php
hphp/test/zend/good/ext/spl/tests/observer_001.php
hphp/test/zend/good/ext/spl/tests/observer_002.php
hphp/test/zend/good/ext/spl/tests/RecursiveCallbackFilterIteratorTest.php
never failed both before and after this patch, for -r -m pgo.

I performed these tests on both CentOS with GCC 6.2.1 and Ubuntu with GCC 5.4.

@swalk-cavium
Copy link
Contributor

Hi @jim-saxman, I'm wondering if one of my machines needs a firmware upgrade.

@vielmetti
Copy link

Hi @swalk-cavium @jim-saxman - just did a build and test cycle and am seeing these errors:

HipHop VM 3.20.0-dev (rel)
Compiler: heads/master-0-g26a2f8c2795726e89dd636c16b2e1e6316dee7df
Repo schema: c9ccdb2b68ae4d61f3c1c0232b4b2426ff39c26e
gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4)

test/slow/ext_icu/uspoof.php
test/slow/ext_image/1794.php
test/slow/ext_posix/ext_posix.php
test/slow/ext_spl_file/1795.php
test/slow/ir_inlining/1.php
test/quick/setg-dtor.php
test/quick/sets-dtor.php

System under test is a 96-core Packet Type 2A server, with Cavium ThunderX, using Ubuntu 16.04. Rather than noting a new issue, since the set*-dtor failures seem to be related, I'll just report here.

@swalk-cavium
Copy link
Contributor

swalk-cavium commented Jul 26, 2017 via email

@vielmetti
Copy link

Steve -

I did a fresh build, with the following results.

test/quick succeeds with only two failures (the dtor failures). I have not yet tested this patch which should address those.

ext_posix succeeded after I added the wheel group.

uspoof failures are described in #3925 as a result of Facebook testing against a system with ICU 49 instead of the current ICU 55 (or so).

I was not able to get 1794.php or 1795.php to fail again in a second test.

@vielmetti
Copy link

After applying this patch all of test/quick passes ("SHIP IT").

All of test/slow passes except test/slow/ext_icu/uspoof.php which is described by #3925.

Looking good!

@facebook-github-bot
Copy link
Contributor

@mxw has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@hhvm-bot hhvm-bot closed this in 1ecac73 Aug 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants