Program compiled with the CUDA backend hangs when run, but runs if you add the -D flag. #1883
Comments
Thank you for the report. I can reproduce a barrier divergence in generated code for the GPU backends. It's not surprising this can manifest as nontermination. I'm a little busy this morning, but I hope to have it fixed sometime today.
Feel free to reopen this issue if my diagnosis was wrong and this does not fix your problem.
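To illustrate why the barrier divergence mentioned in the diagnosis above can show up as a hang, here is a minimal CPU-thread analogy in Python. It is not the generated GPU code and says nothing about the actual compiler bug; `run_group` and `skips_barrier` are hypothetical names made up for this sketch. A barrier only releases once every participant has arrived, so if one thread takes a branch that skips the barrier, the others wait. The sketch uses a timeout so that it terminates; a GPU barrier has no such timeout, which is consistent with a program that never finishes.

```python
import threading


def run_group(n_threads, skips_barrier, timeout=1.0):
    """Run n_threads workers that share one barrier and report what happened to each.

    skips_barrier(i) decides whether worker i takes the divergent path
    and never reaches the barrier (hypothetical helper for this sketch).
    """
    barrier = threading.Barrier(n_threads)
    outcomes = [None] * n_threads

    def worker(i):
        if skips_barrier(i):
            # Divergent path: this worker returns without calling wait().
            outcomes[i] = "skipped"
            return
        try:
            # Blocks until all n_threads workers have arrived, or until the timeout.
            barrier.wait(timeout=timeout)
            outcomes[i] = "passed"
        except threading.BrokenBarrierError:
            # The barrier was never completed; without a timeout this would block forever.
            outcomes[i] = "stuck"

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcomes


if __name__ == "__main__":
    print(run_group(4, lambda i: False))   # every worker reaches the barrier
    print(run_group(4, lambda i: i == 0))  # worker 0 diverges, the rest cannot proceed
```

In the uniform case all four workers pass the barrier. In the divergent case worker 0 skips it and the remaining three are released only by the timeout.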
razetime pushed a commit to razetime/futhark that referenced this issue May 27, 2023
athas added a commit that referenced this issue Jun 15, 2023
A program I wrote (thetaPOT.fut, shown below) hangs when it is compiled with "futhark cuda thetaPOT.fut" and then run with "cat data.in | ./thetaPOT" (data.in is the data file shown at the bottom). It hangs until I stop it manually, e.g. with Ctrl + C. However, if I add the -D flag, as in "cat data.in | ./thetaPOT -D", it runs to completion. This is the issue.
The array produced by the -D run differs from the one I get with the C backend ("futhark c thetaPOT.fut" and then "cat data.in | ./thetaPOT").
I use WSL. My Futhark version is 0.23.1 and my nvcc version is 11.5.
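For anyone trying to reproduce this without sitting at the terminal waiting to press Ctrl + C, a small Python wrapper can run the compiled binary with a time limit and report a hang. This is only a sketch: `run_with_timeout` and `compare_runs` are hypothetical helper names, the 10 second limit is an arbitrary assumption, and the binary and file names are simply the ones from the description above.

```python
import subprocess


def run_with_timeout(cmd, stdin_text, timeout_s=10.0):
    """Run cmd with stdin_text on standard input.

    Returns the captured stdout, or None if the process did not
    finish within timeout_s seconds (it is killed in that case).
    """
    try:
        done = subprocess.run(
            cmd,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return done.stdout


def compare_runs(binary="./thetaPOT", data_path="data.in", timeout_s=10.0):
    """Run the binary with and without -D on the same input and report hangs."""
    with open(data_path) as f:
        data = f.read()
    results = {}
    for label, cmd in (("plain", [binary]), ("with -D", [binary, "-D"])):
        out = run_with_timeout(cmd, data, timeout_s)
        results[label] = "hung" if out is None else out
    return results
```

Calling `compare_runs()` in the directory that holds the compiled binary and data.in returns a dictionary with one entry per run. The same helper can be pointed at a binary built with "futhark c" to compare the resulting arrays.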
The program thetaPOT.fut:
My data file (data.in):
2i64 [[1.0f32, 2.0f32, 3.0f32, 4.0f32, 5.0f32, 6.0f32, 7.0f32, 8.0f32],[37.0f32, 63.0f32, 60.0f32, 20.0f32, 5.0f32, 6.0f32, 7.0f32, 8.0f32] , [37.0f32, 63.0f32, 60.0f32, 20.0f32, 5.0f32, 6.0f32, 7.0f32, 8.0f32] ]