Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Condition protecting 2D array read moved after read in compiled C code #1874

Closed
nqpz opened this issue Feb 12, 2023 · 1 comment
Closed

Condition protecting 2D array read moved after read in compiled C code #1874

nqpz opened this issue Feb 12, 2023 · 1 comment

Comments

@nqpz
Copy link
Member

nqpz commented Feb 12, 2023

Using the newest Futhark compiler from git, the following program results in correct compiled C code for the one-dimensional step, but incorrect (I think) C code for the two-dimensional step:

module dim1 = {
  type~ state = []f32

  def init (m: i64): state =
    replicate (m * m) 1f32

  def step' [n] (cells: *[n]f32): *[n]f32 =
    let step_cell (cell: f32) (i: i64) =
      let x = if i > 0
              then cells[i - 1]
              else 0
      in cell * x
    in map2 step_cell cells (0..<n)

  def step (cells: state): state =
    step' (copy cells)
}

module dim2 = {
  type~ state = [][]f32

  def init (m: i64): state =
    unflatten m m (replicate (m * m) 1f32)

  def step' [m] (cells: *[m][m]f32): *[m][m]f32 =
    let step_cell (cell: f32) ((y, x): (i64, i64)) =
      let x = if y > 0
              then cells[y - 1, x]
              else 0
      in cell * x
    in map2 (map2 step_cell) cells (tabulate_2d m m (\y x -> (y, x)))

  def step (cells: state): state =
    step' (copy cells)
}

type~ state_1d = dim1.state
entry init_1d: i64 -> state_1d = dim1.init
entry step_1d: state_1d -> state_1d = dim1.step

type~ state_2d = dim2.state
entry init_2d: i64 -> state_2d = dim2.init
entry step_2d: state_2d -> state_2d = dim2.step

Using the compiled futhark_entry_step_2d causes a segfault. See https://gist.github.com/nqpz/94e0e7499be770e86c90231ff0060014 for an example of how to trigger it.

Maybe the issue extends to other backends as well. I have only tried this program with the C backend. The original program (https://github.com/nqpz/graffito/blob/main/stencils/gameoflifeprob/stencil.fut) that provoked a segfault does work with the OpenCL backend on my laptop, but maybe my Intel driver is just more accepting towards reading bad memory.

The compiled futrts_entry_step_1d function looks like this:

    for (int64_t i_7603 = 0; i_7603 < dz2080U_6827; i_7603++) {
        float x_7424 = ((float *) x_mem_7617.mem)[i_7603];
        bool cond_7426 = slt64((int64_t) 0, i_7603);
        float x_7427;
        
        if (cond_7426 == 1) {
            int64_t i_7580 = sub64(i_7603, (int64_t) 1);
            float x_t_res_7581 = ((float *) x_mem_7617.mem)[i_7580];
            
            x_7427 = x_t_res_7581;
        } else {
            x_7427 = 0.0F;
        }
        // ...
    }

However, the inner loop of the compiled futrts_entry_step_2d function looks like this:

        for (int64_t i_7603 = 0; i_7603 < dz2082U_7251; i_7603++) {
            float x_7586 = ((float *) x_mem_7617.mem)[i_7607 * dz2082U_7251 + i_7603];
            float x_elem_7600 = ((float *) x_mem_7617.mem)[i_7591 * dz2082U_7251 + i_7603];
            float x_7590;
            
            if (cond_7589 == 1) {
                x_7590 = x_elem_7600;
            } else {
                x_7590 = 0.0F;
            }
            // ...
        }

I think that the assignment of x_elem_7600 should only happen if the condition holds true, like in the one-dimensional step.

This problem probably also extends to higher dimensions, would be my guess.

@athas
Copy link
Member

athas commented Feb 12, 2023

I can reproduce with this test block:

-- ==
-- entry: step_2d
-- script input { init_2d 10i64 }

@athas athas closed this as completed in d5b691f Feb 12, 2023
athas added a commit that referenced this issue Feb 12, 2023
cornelius-sevald pushed a commit that referenced this issue Mar 25, 2023
cornelius-sevald pushed a commit that referenced this issue Mar 25, 2023
razetime pushed a commit to razetime/futhark that referenced this issue May 27, 2023
razetime pushed a commit to razetime/futhark that referenced this issue May 27, 2023
athas added a commit that referenced this issue Jun 15, 2023
* Start function flattening

* `cmp-bench-json.py` rewritten in Haskell (Issue #748) (#1860)

* Note in CHANGELOG.

* Use new tool.

* Remove cmp-bench-json.py.

* Fix #1863. (#1864)

* This is 0.23.1.

* Onwards!

* Fix typo.

* Remove copyCopyToCopy rule. (#1866)

This is a very old (5+ years) rule that is much too naive in its
handling of memory.  We have better optimisations now, that aren't
buggy.

* Remove SrcLoc from ImportName.

Syntactic information does not belong in semantic objects.

* Use ImportName consistently. (#1869)

Previously some parts of the compiler would use FilePaths directly,
and it is ambiguous whether those refer to canonical import names.
Now it should be clearer.

* futhark-benchmarks: bump

* Workaround for tiny /tmp on these servers.

* futhark-benchmarks: bump

* futhark-benchmarks: bump

* futhark-benchmarks: bump

* Workaround for temporary ghcup breakage.

* Switch to GHC 9.4 in Cabal CI. (#1871)

If this does not fix Windows, then I will remove it (again).

* Plain values should never be Unique.

* No need for this.

* Also no setUniqueness here.

* futhark-benchmarks: bump

* Fix #1874.

* Avoid spurious space.

* Make consumption an effect on functions, rather than types. (#1873)

This is a breaking change, because until now we allowed functions like

    def f (a: *[]i32, b: []i32) = ...

where we could then pass in a tuple where in an application `f (x,y)`
the value `x` would be consumed, but not `y`.  However, this became
increasingly difficult to support as the language grew (and frankly,
it was always buggy).  With this commit, the syntax above is still
permitted, but it is interpreted as

    def f ((a,b): *([]i32, []i32)) = ...

i.e. the single tuple argument is consumed *as a whole*.  Long term we
can also consider amending the syntax or warning about cases where it
is misleading, but that is less urgent.

I've wanted to make this simplification for a long time, but I always
hit various snags.  Today I managed to make it work, and the next step
will be cleaning up the notion of "uniqueness" in return types as well
(it should be the more general notion of "aliases").

* Forgot a test for #1874.

* Avoid warnings about "potentially uninitialized" variables.

C compilers are (understandably) not smart enough to see that these
are never actually used uninitialised.

* Make source language Apply AST node multi-argument. (#1875)

This is a deviation from the concrete syntax, but humans tend to think
of function calls having multiple arguments.  Also, the AST had to
keep a lot of useless metadata around to express the results of the
intermediate applications.

And again, it is related to making #1872 more feasible.

* Better constant folding for CmpOp PrimExps.

This mostly has the effect of making generated code a little neater.

* futhark-benchmarks: bump

* Add some comments.

* More explicit.

* Fix #1878.

* Forbid access to interpreter.

* Ensure no apply-of-apply.

The symptom of this being wrong is that defunctionalisation would
create duplicate functions.  No more!

* Handle array results.

* Flattening of Copy.

* Use Hendrix for CI. (#1862)

* First experiment at using Hendrix for CI.

* Maybe like this.

* Import everything locally.

* Try this.

* More systems.

* Also OpenCL.

* Also depend on these.

* More readable when split.

* Import new CI actions.

* Testing with slurm.

* Forgot to specify hendrix and the partition flag might also be needed.

* The wrong composite actions was included

* Trying cuda and opencl on hendrix

* Trying to use the composite test action for benchmarks.

* Wrong amount of indentation

* Forgot to add a |.

* Some small changes that will most likely not change things.

* trying to use sbatch

* switching to titanrtx and used the p flag wrong.

* Trailing whitespace purge.

* Skip these on TITAN X.

* Any GPU will work for these.

* Trying to run benchmarks without slurmbench.py

* Syntax errors

* Accidentally used old keyword test.

* found another syntax error i think

* I think the equality sign broke it

* maybe this will work

* Used gres wrongly.

* Do not use old futhark-benchmarks.

* Trying to use srun and cleaned up composite actions.

* Add some comments.

* More explicit.

* Fix #1878.

* Forbid access to interpreter.

* Ensure no apply-of-apply.

The symptom of this being wrong is that defunctionalisation would
create duplicate functions.  No more!

* Revert "Trying to use srun and cleaned up composite actions."

This reverts commit 6c4111f.

* using srun and fixing commit history hopefully?

* Adding an 8 hour time limit.

* Missing -.

* Newer version og futhark-benchmarks

* Trying to use `${{ always() }}`.

* Revert "Newer version og futhark-benchmarks" because of `${{ always() }}`

This reverts commit 965e788.

* Hopefully this is the correct version of the futhark-benchmarks

* Remove always()

---------

Co-authored-by: due <williamhenrichdue@gmail.com>

* Do not use hendrix except where needed.

* Cleanup whitespace.

* Matplotlib is handy.

* Add job names.

* Avoid unnecessary deallocation.

* These seem broken.

* Style fixes.

* Bump GHC.

* Not needed anymore.

* Seems to fix the nontermination.

* Support rev AD of scanomaps and scatters with non-identity lambdas. (#1880)

* Fix #1883.

* Loop over all dimensions here.

* Precompute more chunk counts.

This is mostly to track the change in the parallelisation of Replicate
in the preceding commit.

* Allow arbitrary expressions in size expressions.

We still only permit elaboration of expressions that correspond to
variables or integer constants.  This is a step on the path to
realising #1659.

* Always forget about the unit tests.

* Avoid extra braces when printing.

* Oops; fix copy/paste error.

* These brackets are necessary.

* Fix typo.

* A few other wording fixes.

* A few more text improvements.

* Fix error in manifest schema discovered by @Erk-.

* Newer action.

* Fix invalid link

Thanks to @lkuty for noticing.

* Use explicit entry.

* Fix #1885.

* Better style.

* Plotting tool. (#1877)

Closes #1861.

* Make executable.

* Remove trailing whitespace.

* Final status message.

* Use GitHub machines for Python tests.

* Generate tuning param definitions in GenericC. (#1890)

This is a step towards #1884.  Now that GenericC is responsible for
all the work (and has all the information), it can generate new API
functions.

* Record which tuning params are relevant to which entry points. (#1891)

This involves extending the manifest and server protocol, and
modifying 'futhark autotune' to use this new information.

The main advantage (apart from general cleanup) is that we can now
tune threshold parameters used in non-inlined functions.

* This is 0.24.1.

* Onwards!

* Fix #1895.

* Do not use interpreter.

* Incomplete work on nested maps.

* More work on nested maps.

* Fix #1896.

* This goes in tests.

* Use Hendrix for A100 jobs. (#1898)

* Fail early.

* All these SegOps should be virtualised.

* Start function flattening

* Incomplete work on function lifting

* Very rudimentary lifted function results

Currently only handles lifting of functions whose return types are
scalar typed variables i.e. no constants or arrays.

* Work on lifted function results

* Further work on lifted function results

* Change way return types are lifted

* Correctly return constants from lifted functions

* Existential size return for lifted functions

Merge building of body statements and results for lifted functions.
Will probably need to filter out existential size quantifiers before
lifting results.

* Filter existential sizes from lifted functions

Remove existential quantifiers from the return type and result of a
function before lifting as I believe their lifted version aren't needed.

* Revert "Filter existential sizes from lifted functions"

This reverts commit d04ecc5.

It might be useful later but for now it complicates things.

* Application of lifted functions

* Do not lift entry points.

* Work in progress match-expression flattening

* Fix bug in lifting function parameters

Lifting irregular parameters was (wrongly) in the order
`[offsets, flags, segments, elements]`.
When calling, the arguments were (rightly) given in the order
`[segments, flags, offsets, elements]`.

* Fix bug in lifting of if-then-else

Wrote too many elements in the final scatters.

* Make lifted if-then-else a little nicer

* Handle irregular inputs to if-expressions

* Handle irregular results of if-expressions

* Handle general irregular match-expressions

* Irregular match-expr: handle empty arrays

* Better error messages

* Handle free variables in `liftArg`

`inputReps` now also gives type information, which is used by `liftArg`
to determine if free variables are regular or irregular.

* Flatten builtins scans over multi-dim arrays

Let scan functions (genScanomap, genScan, genExScan, ...) in the flatten
builtins module operate on multi-dimensional arrays.

Of note is that `exScanAndSum`, when given a single-dimensional array,
will return the # of segments and sum of segment sizes as scalar values
and when given a multi-dimensional array will return them as arrays.

Also move `segMap` from Flatten.hs to Flatten.Builtins.hs

* Make sure flag and elems array have same size

When passing flag and elems array to a function, or returning them from
a function, resize them to please the type checker.

* Replicate free vars in result of lifted functions

* Handle free variables in match-expressions

Move the common "if a subexp is a constant or free variable, replicate
it, and otherwise do a lookup in dist inputs and dist env" code to a
function `liftSubExp`. This is used in `liftArg`, `liftResult` and
lifting match-expressions.

* Add tests for lifting functions

* Add tests for flattening match-expressions

---------

Co-authored-by: Troels Henriksen <athas@sigkill.dk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants