Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix some non-deterministic bugs. #638

Merged
merged 3 commits into from
Apr 7, 2023
Merged

Conversation

vext01
Copy link
Contributor

@vext01 vext01 commented Apr 5, 2023

This fixes our non-deterministic failures.

The first commit is the meat, the next commit raises a question I'd like to investigate next.

I have an idea why this is happening, but don't want to address it in this same PR.

(this was hard work :( )

vext01 added 2 commits April 5, 2023 14:46
Periodically the PT hardware emits a PSB+ packet sequence. Each of these
consists of:

 - A PSB packet.
 - other packets.
 - A PSBEnd packet.

The Intel manual (Section 33.3.7) explains that the "other packets" are special
and don't conform to the normal rules:

> When a PSB packet is generated, it is followed by a PSBEND packet (Section
> 33.4.2.18). One or more packets may be generated in between those two
> packets, and these inform the decoder of the current state of the
> processor. These packets, known collectively as PSB+, should be interpreted
> as “status only”, since they do not imply any change of state at the time
> of the PSB, nor are they associated directly with any instruction or event.
> Thus, the normal binding and ordering rules that apply to these packets
> outside of PSB+ can be ignored when these packets are between a PSB and
> PSBEND. They inform the decoder of the state of the processor at the time
> of the PSB.

Our mistake was to allow target ip payloads in these "other packets" to update
our decoder state, namely `self.cur_loc`. In some cases this would make it
appear as though certain blocks were being re-executed in part. In turn this
confused later parts of the JIT pipeline.
@ltratt
Copy link
Contributor

ltratt commented Apr 5, 2023

This looks fine to me, but let's put cargo test in a loop inside .buildbot.sh for, say, 10 iterations, to catch "obvious" non-determinism. [Our overnight cron version of this should probably run things 1000 times or similar.]

@vext01
Copy link
Contributor Author

vext01 commented Apr 5, 2023

5a3d807 looks good?

@ltratt
Copy link
Contributor

ltratt commented Apr 5, 2023

Works for me! Merge as-is?

@vext01
Copy link
Contributor Author

vext01 commented Apr 5, 2023

yup

@ltratt
Copy link
Contributor

ltratt commented Apr 5, 2023

bors r+

bors bot added a commit that referenced this pull request Apr 5, 2023
638: Fix some non-deterministic bugs. r=ltratt a=vext01



Co-authored-by: Edd Barrett <vext01@gmail.com>
@bors
Copy link
Contributor

bors bot commented Apr 5, 2023

Build failed:

@vext01
Copy link
Contributor Author

vext01 commented Apr 6, 2023

OK, so this is caused by new Rust.

The lazy_cell fix is easy:

-#![feature(once_cell)]
+#![feature(lazy_cell)]

But this is scary:

thread '<unnamed>' panicked at 'misaligned pointer dereference: address must be a multiple of 0x8 but is 0x200b2d', ykutil/src/obj.rs:154:23

Investigating...

@vext01
Copy link
Contributor Author

vext01 commented Apr 6, 2023

The loader has placed the embedded bitcode at an unaligned address!

@ltratt
Copy link
Contributor

ltratt commented Apr 6, 2023

Yikes!

@vext01
Copy link
Contributor Author

vext01 commented Apr 6, 2023

It has probably always been this way, but Rust is now checking for misaligned pointer derefs it seems...

It's not clear to me why there is an alignment requirement here. I thought that misaligned reads were valid on x86_64, but I may be wrong? I know sparc64 would outright crash...

You will probably tell me it is UB I imagine :)

@ltratt
Copy link
Contributor

ltratt commented Apr 6, 2023

I think "Rust the language" defines misaligned reads/writes as UB, no matter the underlying platform. [IMHO this is the correct definition: you don't want people to write code that will explode in weird ways on different platforms!]

@vext01
Copy link
Contributor Author

vext01 commented Apr 6, 2023

Here's a discussion I had on the Rust Zulip about this:

https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/misaligned.20pointer.20deref.20after.20updating.20Rust.2E

And here's an issue about the error message not being quite right:

rust-lang/rust#109997

@vext01
Copy link
Contributor Author

vext01 commented Apr 6, 2023

The failures in this PR have been addressed separately in ykjit/ykllvm#60 and #639

@ltratt
Copy link
Contributor

ltratt commented Apr 7, 2023

bors r+

@bors
Copy link
Contributor

bors bot commented Apr 7, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants