Regent: miscompilation with LLVM 11 #1385

Closed
elliottslaughter opened this issue Jan 25, 2023 · 5 comments
Labels: bug, QChem, Regent

Comments

elliottslaughter (Contributor) commented Jan 25, 2023

@seemamirch sent me a reproducer that appears to be miscompiled with LLVM 11. I am currently requesting permission to post the reproducer and will update this issue once I have it.

A failing run looks like:

[0 - 7f18633d2780]    0.094491 {4}{runtime}: [warning 1071] LEGION WARNING: Region requirement 0 of operation g3d (UID 3) in parent task main (UID 2) is using uninitialized data for field(s) 101 of logical region (1,1,1) (from file /scratch2/eslaught/legion_repro_for_seema/runtime/legion/legion_ops.cc:744)
For more information see:
http://legion.stanford.edu/messages/warning_code.html#warning_code_1071

0   terra (JIT)                         0x00007f18690156a3 $<g3d> + 1699 
[0x73726f6c6f63203d]

Failures are non-deterministic and occur about 90% of the time. (Possibly more; it's hard to tell. But it's not 100%, because the test does sometimes pass.)
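Because the failure is probabilistic, "about 90%" is itself an estimate. A minimal sketch of how one might measure it, assuming a failing run exits with a nonzero status (the crash above prints a JIT backtrace) and a passing run exits with 0. `count_failures` and `wilson_interval` are hypothetical helper names, and the Wilson score interval is a standard binomial confidence interval swapped in for illustration, not something used in this report.

```python
"""Estimate how often a non-deterministic test fails.

Assumption: a failing run exits with a nonzero status and a passing
run exits with status 0.
"""
import math
import subprocess
import sys


def count_failures(cmd, trials):
    """Run `cmd` `trials` times; return how many runs exited nonzero."""
    failures = 0
    for _ in range(trials):
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failures += 1
    return failures


def wilson_interval(failures, trials, z=1.96):
    """Approximate 95% Wilson score interval for the true failure rate."""
    p = failures / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return center - half, center + half


# Only runs when invoked with a command to test, e.g.
#   python3 failure_rate.py ./regent.py test.rg
if __name__ == "__main__" and len(sys.argv) > 1:
    n = 20
    k = count_failures(sys.argv[1:], n)
    lo, hi = wilson_interval(k, n)
    print(f"{k}/{n} runs failed; 95% CI for failure rate: [{lo:.2f}, {hi:.2f}]")
```

Here `failure_rate.py` and `test.rg` are placeholder names. With 18 failures out of 20 runs, the interval comes out at roughly 0.70 to 0.97, so 20 runs alone cannot distinguish a 90% failure rate from a 95% one.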

If I build Terra with LLVM 6, the test passes 100% of the time:

[0 - 7f0228d9b780]    0.088464 {4}{runtime}: [warning 1071] LEGION WARNING: Region requirement 0 of operation g3d (UID 3) in parent task main (UID 2) is using uninitialized data for field(s) 101 of logical region (1,1,1) (from file /scratch2/eslaught/legion_repro_for_seema/runtime/legion/legion_ops.cc:744)
For more information see:
http://legion.stanford.edu/messages/warning_code.html#warning_code_1071

In initial discussions, we thought this might be related to the Legion and/or Terra versions, but the only variable I have found that matters is the LLVM version. Legion master (b94ea70) and Terra master (687166f57447c17434b303709327be5d2b78e5c4) exhibit this problem with LLVM 11; the same commits with LLVM 6 do not.

Next, I'll try bisecting over LLVM versions to determine the first problematic version.

Bisect progress:

  • LLVM 9: fails
  • LLVM 7: fails
  • Last working version is LLVM 6
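Bisecting over LLVM versions amounts to a binary search over an ordered list of versions. A minimal sketch, assuming the bug is monotone (once a version fails, every later version fails too), which any bisection presumes. The predicate `fails` is a hypothetical stand-in for "build Terra against this LLVM and rerun the reproducer"; since failures here are non-deterministic, a single passing run is weak evidence, so a real probe would need several runs per version.

```python
def first_failing(versions, fails):
    """Binary search for the first version where fails(version) is True.

    Assumes `versions` is sorted oldest to newest and that failures are
    monotone. Returns None if no version fails.
    """
    lo, hi = 0, len(versions)  # first failure lies in versions[lo:hi], if any
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(versions[mid]):
            hi = mid       # mid fails: first failure is at mid or earlier
        else:
            lo = mid + 1   # mid passes: first failure is after mid
    return versions[lo] if lo < len(versions) else None


# Results reported in this issue (True = the test fails).
reported = {6: False, 7: True, 9: True, 11: True}
print(first_failing(sorted(reported), reported.get))  # prints 7
```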
elliottslaughter (Contributor, Author)

Bisection appears to show that the first failing version is LLVM 7 (i.e., only LLVM 6 works).

The next step is probably to reduce the reproducer, if possible, to isolate the miscompilation that is occurring.

elliottslaughter (Contributor, Author) commented Aug 9, 2023

Here's a reproducer for this issue:

https://gist.github.com/elliottslaughter/22feeeb495cc8fd429c83b77c957b37a

It is slightly minimized relative to what @seemamirch gave me; most notably, I was able to reduce the C array in the reproducer from [18][700][5] to [18][700], i.e., removing one dimension and cutting the data size by 80%.
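The 80% figure follows from the element counts: dropping the innermost dimension of size 5 leaves one fifth of the elements. A quick check, assuming 8-byte `double` elements as in the `double[700][18]` type printed in the example output (the percentage itself does not depend on the element size). `array_bytes` is a hypothetical helper.

```python
# Element size in bytes; assumes double, as in the "double[700][18]" type
# printed by the Regent run. The percentage below does not depend on it.
SIZEOF_DOUBLE = 8


def array_bytes(dims, elem_size=SIZEOF_DOUBLE):
    """Total size in bytes of a dense C array with the given dimensions."""
    count = 1
    for d in dims:
        count *= d
    return count * elem_size


original = array_bytes([18, 700, 5])  # 63,000 elements
reduced = array_bytes([18, 700])      # 12,600 elements
percent_saved = 100 * (original - reduced) // original

print(original, reduced, percent_saved)  # prints 504000 100800 80
```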

The code contains several versions of getGammaTableArrayBig, and all of them reproduce the problem. As a bonus, the Regent version can be run with -fbounds-checks 1 to confirm that there are no out-of-bounds accesses.

An example failure looks like:

$ ./regent.py seema_test3_min.rg
extern global gamma_table : double[700][18]
[0 - 7ff85a925700]    0.000171 {4}{threads}: reservation ('dedicated worker (generic) #1') cannot be satisfied
[0 - 7000013b0000]    0.111873 {4}{runtime}: [warning 1071] LEGION WARNING: Region requirement 0 of operation g3d (UID 3) in parent task main (UID 2) is using uninitialized data for field(s) 101 of logical region (1,1,1) (from file /Users/elliott/Programming/Legion/legion/runtime/legion/legion_ops.cc:752)
For more information see:
http://legion.stanford.edu/messages/warning_code.html#warning_code_1071

0   terra (JIT)                         0x000000010ce051ef $<getGammaTableArrayBig> + 703

Optionally, you can run with -fvectorize 0 to demonstrate that the vectorizer isn't causing the issue. (The failure still occurs.)

Tested with the latest Legion master and Terra, with LLVM 6, 13, and 16 (LLVM 6 works; the others fail), on macOS x86_64.

elliottslaughter (Contributor, Author)

I minimized the reproducer to pure Terra (gist below), so I'll move this over to the Terra issue tracker now:

https://gist.github.com/elliottslaughter/c8bc2252c92639f8d3618b9bf3d33fb2

elliottslaughter (Contributor, Author)

I have confirmed that all of my reproducers are now fixed with the latest Terra master (terralang/terra@a3a6799). Assigning back to @seemamirch to confirm that the original code is fixed with LLVM 11+.

seemamirch (Contributor)

It works; I tested it with LLVM 13.0.0.
