optimization exhibits non-deterministic behavior #143

Open
NeuralCoder3 opened this issue Nov 11, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@NeuralCoder3
Collaborator

NeuralCoder3 commented Nov 11, 2022

Sometimes, the behavior of the optimization pipeline seems to be non-deterministic.

Example:
./build/bin/thorin -d mem -o - lit/mem/no_mem.thorin -VVVV
in
https://github.com/NeuralCoder3/thorin2/tree/ad_ptr_merge
702d848

The issue might be due to the add_mem optimization, the pipeline builder, or an underlying bug in thorin.

This behavior might also be a side effect of the earlier (not yet merged) changes to mem and clos conv, which have far-reaching impact that may simply not have manifested until now.

NeuralCoder3 added the bug label Nov 11, 2022
@leissa
Member

leissa commented Nov 14, 2022

Yes, this is super annoying. Another source is this:

world.app(emit1(), emit2());

The order in which emit1() and emit2() are evaluated is unspecified in C++, so this code can behave differently across compilers and OSes.
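
A minimal sketch of the pitfall and one way around it. `ToyWorld`, `emit`, `app`, and the two `build_*` helpers are hypothetical stand-ins invented for illustration and are not Thorin's actual API; the only assumption is that each emitted node takes the next value of a global id counter.

```cpp
#include <utility>

// Hypothetical toy world: every emitted node receives the next gid.
struct ToyWorld {
    int next_gid = 0;
    int emit() { return next_gid++; }
    // Stand-in for an application node; it just records both operand gids.
    std::pair<int, int> app(int callee, int arg) { return {callee, arg}; }
};

// The two emit() calls are function arguments, so their relative order is
// unspecified: the result may be {0, 1} or {1, 0} depending on the compiler.
inline std::pair<int, int> build_unsequenced(ToyWorld& w) {
    return w.app(w.emit(), w.emit());
}

// Separate statements fix the order: the result is always {0, 1}.
inline std::pair<int, int> build_sequenced(ToyWorld& w) {
    int callee = w.emit();
    int arg    = w.emit();
    return w.app(callee, arg);
}
```

Binding each operand to a named local before the call costs nothing at runtime and makes the gid assignment independent of the compiler.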

I have implemented the --trace-gids switch that we could somehow use to test for this in our CI.

@NeuralCoder3
Collaborator Author

The issue happens only sometimes, with the same executable on the same computer under the same circumstances.
Therefore, timing issues or randomness might be the cause.

Probably related issue:
./build/bin/thorin -d matrix -d affine lit/matrix/mapReduce_mult.thorin -o - -VVVV in matrix_dialect f3a3def
sometimes generates thorin code and sometimes prints the following error:

:4294967295: error: cannot pass argument 
  '(__806508#2:(.Idx 3), ‹__806508#2:(.Idx 3); .Idx 4294967296›, 0)' of type 
  '[.Nat, «__806508#2:(.Idx 3); ★», .Nat]' to 
  '%mem.lea' of domain 
  '[n_834521: .Nat, _834535: «n_836768; ★», _834540: .Nat]'

which seems odd to me as the arguments are of the style

(n, <n; T>; 0)

which should be the type

[n:.Nat, <<n; *>>; .Nat]

which should agree with lea.

@leissa
Member

leissa commented Mar 8, 2023

I was fighting this issue in #184, as a Debug build produced different output than the Release one:

  • 05e833b
    A few asserts created new Defs resulting in slightly different behavior between Debug and Release builds. This commit fixes the issue.
  • 2997a1d
    This one fixes a subtle problem when a Def has coincidentally the same name as an external Def.
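
To illustrate the first point: a rough sketch of how a side effect inside an `assert` can shift ids between build types. `CountingWorld` and both helpers are hypothetical and only mimic the pattern; this is not the code changed in 05e833b, and hoisting the call is just one possible way to make both builds agree.

```cpp
#include <cassert>

// Hypothetical toy world: every emitted node receives the next gid.
struct CountingWorld {
    int next_gid = 0;
    int emit() { return next_gid++; }
};

// Problematic: the emit() inside assert is compiled out when NDEBUG is
// defined, so a Debug build returns gid 1 while a Release build returns gid 0.
inline int build_with_side_effect_in_assert(CountingWorld& w) {
    assert(w.emit() >= 0);
    return w.emit();
}

// One possible fix: create the checked node unconditionally and keep only
// the side-effect-free comparison inside the assert. Always returns gid 1.
inline int build_with_hoisted_check(CountingWorld& w) {
    [[maybe_unused]] int probe = w.emit();
    assert(probe >= 0);
    return w.emit();
}
```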

As mentioned above, --trace-gids and --reeval-breakpoints helped me track down the problem. We could probably write a test case with some non-trivial code, run it with --trace-gids, and double-check in our CI that all builds produce the same output.
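
A possible building block for such a CI check, sketched under the assumption that the output of each build has already been captured as text. `first_divergence` is a hypothetical helper; it makes no assumption about what --trace-gids actually prints and simply compares two outputs line by line.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>

// Returns the 1-based number of the first line at which the two texts differ,
// or std::nullopt if they match. A missing trailing newline is not treated
// as a difference, since std::getline reads the last line either way.
inline std::optional<std::size_t> first_divergence(const std::string& lhs,
                                                   const std::string& rhs) {
    std::istringstream a(lhs), b(rhs);
    std::string line_a, line_b;
    for (std::size_t line = 1;; ++line) {
        bool has_a = static_cast<bool>(std::getline(a, line_a));
        bool has_b = static_cast<bool>(std::getline(b, line_b));
        if (!has_a && !has_b) return std::nullopt;           // both exhausted
        if (has_a != has_b || line_a != line_b) return line; // mismatch here
    }
}
```

Reporting the first diverging line rather than a plain pass/fail would point directly at the earliest step where two builds start to disagree.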

This was referenced Mar 8, 2023
@leissa
Member

leissa commented Mar 27, 2023

While #185 fixes part of this problem, there are still some odd things happening and we need a test case to test for this.
