
Corrupted double-linked list when running Galley scheduler #81

Closed · mtsokol opened this issue Dec 2, 2024 · 8 comments · Fixed by #79
mtsokol (Member) commented Dec 2, 2024

Hi @kylebd99,

I worked on benchmarks last week and ran into an issue when running the Galley scheduler on the "counting triangles" example with Finch 1.0 on my Linux machine.
Here's a default scheduler example that runs correctly:

import networkx as nx
import finch

LEN = 1000
DENSITY = 0.1

G = nx.gnp_random_graph(n=LEN, p=DENSITY)
a_sps = nx.to_scipy_sparse_array(G)

a = finch.asarray(a_sps)
a_l = finch.lazy(a)
b_l = finch.sum(a_l @ a_l * a_l) / finch.asarray(6)
finch.compute(b_l, opt="default", verbose=False)

LEN = 2000

G = nx.gnp_random_graph(n=LEN, p=DENSITY)
a_sps = nx.to_scipy_sparse_array(G)

a = finch.asarray(a_sps)
a_l = finch.lazy(a)
b_l = finch.sum(a_l @ a_l * a_l) / finch.asarray(6)
finch.compute(b_l, opt="default", verbose=False)

And here's one that fails:

import networkx as nx
import finch

LEN = 1000
DENSITY = 0.1

G = nx.gnp_random_graph(n=LEN, p=DENSITY)
a_sps = nx.to_scipy_sparse_array(G)

a = finch.asarray(a_sps)
a_l = finch.lazy(a)
b_l = finch.sum(a_l @ a_l * a_l) / finch.asarray(6)
finch.compute(b_l, opt="galley", verbose=True)

LEN = 2000

G = nx.gnp_random_graph(n=LEN, p=DENSITY)
a_sps = nx.to_scipy_sparse_array(G)

a = finch.asarray(a_sps)
a_l = finch.lazy(a)
b_l = finch.sum(a_l @ a_l * a_l) / finch.asarray(6)
finch.compute(b_l, opt="galley", verbose=True)

Here's the output with the verbose=True option:

Input Queries :
Query(Alias(##A#254),
   Materialize(
      MapJoin(Value(/),
         Aggregate(Value(+),Value(0),##i#249,##i#250,
            MapJoin(Value(*),
               Aggregate(Value(+),Value(0),##i#236,
                  MapJoin(Value(*),Input(Value(FIBER),##i#236,##i#249),Input(Value(FIBER),##i#250,##i#236))),Input(Value(FIBER),##i#250,##i#249))),Input(Value(FIBER)))))
Used DNF: false
 QUERY: Query(Alias(##A#254),
   Materialize(
      MapJoin(Value(/),
         Aggregate(Value(+),Value(0),##i#249,##i#250,##i#236,
            MapJoin(Value(*),Input(Value(FIBER),##i#236,##i#249),Input(Value(FIBER),##i#250,##i#236),Input(Value(FIBER),##i#250,##i#249))),Input(Value(FIBER)))))
FAQ Opt Time: 3.1587700843811035
--------------- Logical Plan ---------------
Query(Alias(A_28),
   Aggregate(Value(+),Value(0),##i#236,##i#250,##i#249,
      MapJoin(Value(*),Input(Value(FIBER),##i#236,##i#249),Input(Value(FIBER),##i#250,##i#236),Input(Value(FIBER),##i#250,##i#249))))
Query(##A#254,
   Materialize(
      MapJoin(Value(/),Alias(A_28),Input(Value(FIBER)))))
--------------------------------------------
Physical Opt Time: 3.1587700843811035
--------------- Physical Plan ---------------
Query(Alias(A_34,##i#236,##i#249),
   Materialize(Value(t_bytemap),Value(t_dense),##i#249,##i#236,
      Aggregate(Value(Finch.FinchNotation.InitWriter{0}()),Value(0),Input(##i#236::t_walk,##i#249::t_default))),##i#249,##i#236)
Query(Alias(A_35,##i#250,##i#249),
   Materialize(Value(t_bytemap),Value(t_dense),##i#249,##i#250,
      Aggregate(Value(Finch.FinchNotation.InitWriter{0}()),Value(0),Input(##i#250::t_walk,##i#249::t_default))),##i#249,##i#250)
Query(Alias(A_28),
   Materialize(
      Aggregate(Value(+),Value(0),##i#250,##i#236,##i#249,
         MapJoin(Value(*),Alias(A_34,##i#249::t_walk,##i#236::t_default),Input(##i#250::t_walk,##i#236::t_follow),Alias(A_35,##i#249::t_follow,##i#250::t_follow)))),##i#236,##i#250,##i#249)
Query(##A#254,
   Materialize(
      Aggregate(Value(Finch.FinchNotation.InitWriter{NaN}()),Value(nothing),
         MapJoin(Value(/),Alias(A_28),Input()))))
--------------------------------------------
Expected Output Size: 100220
Expected Output Size: 100220
Expected Output Size: 1
Expected Output Size: 1
Executing:
:(function var"##compute#265"(prgm)
      begin
          V = ((((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[2]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, SparseListLevel{Int64, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, ElementLevel{0, Int64, Int64, PyArray{Int64, 1, true, true, Int64}}}}}
          V_2 = ((((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[3]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, SparseListLevel{Int64, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, ElementLevel{0, Int64, Int64, PyArray{Int64, 1, true, true, Int64}}}}}
          V_3 = ((((((((((((((((((((((((((prgm.children[1]).children[2]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, SparseListLevel{Int64, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, ElementLevel{0, Int64, Int64, PyArray{Int64, 1, true, true, Int64}}}}}
          V_4 = ((((((((((((prgm.children[1]).children[2]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{ElementLevel{0, Int64, Int64, PyArray{Int64, 1, true, true, Int64}}}
          _A_1 = Tensor(Element{NaN, Float64, Int64}(Float64[]))
          A_28 = Tensor(Element{0, Int64, Int64}(Int64[]))
          A_35 = Tensor(Dense{Int32}(SparseByteMap{Int32}(Element{0, Int64, Int64}(Int64[]), 1000, [1], Bool[], Tuple{Int64, Int32}[]), 1000))
          A_34 = Tensor(Dense{Int32}(SparseByteMap{Int32}(Element{0, Int64, Int64}(Int64[]), 1000, [1], Bool[], Tuple{Int64, Int32}[]), 1000))
          ()
          @finch mode = :safe begin
                  A_34 .= 0
                  for _i_2 = _
                      for _i_3 = _
                          A_34[_i_2, _i_3] << Finch.FinchNotation.InitWriter{0}() >>= V[(Finch.FinchNotation.walk)(_i_3), _i_2]
                      end
                  end
              end
          @finch mode = :safe begin
                  A_35 .= 0
                  for _i_2 = _
                      for _i_4 = _
                          A_35[_i_2, _i_4] << Finch.FinchNotation.InitWriter{0}() >>= V_3[(Finch.FinchNotation.walk)(_i_4), _i_2]
                      end
                  end
              end
          @finch mode = :safe begin
                  A_28 .= 0
                  for _i_3 = _
                      for _i_4 = _
                          for _i_2 = _
                              A_28[] << + >>= (*)(V_2[(Finch.FinchNotation.walk)(_i_4), (Finch.FinchNotation.follow)(_i_3)], A_34[(Finch.FinchNotation.walk)(_i_2), _i_3], A_35[(Finch.FinchNotation.follow)(_i_2), (Finch.FinchNotation.follow)(_i_4)])
                          end
                      end
                  end
              end
          @finch mode = :safe begin
                  _A_1 .= NaN
                  _A_1[] << Finch.FinchNotation.InitWriter{NaN}() >>= (/)(A_28[], V_4[])
              end
          return Tuple([_A_1])
      end
  end)
┌ Warning: Performance Warning: non-concordant traversal of A_34[_i_2, _i_3] (hint: most arrays prefer column major or first index fast, run in fast mode to ignore this warning)
└ @ Finch ?:0
┌ Warning: Performance Warning: non-concordant traversal of A_35[_i_2, _i_4] (hint: most arrays prefer column major or first index fast, run in fast mode to ignore this warning)
└ @ Finch ?:0
Executing:
:(function var"##compute#265"(prgm)
      begin
          V = ((((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[2]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, SparseListLevel{Int64, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, ElementLevel{0, Int64, Int64, PyArray{Int64, 1, true, true, Int64}}}}}
          V_2 = ((((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[3]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, SparseListLevel{Int64, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, ElementLevel{0, Int64, Int64, PyArray{Int64, 1, true, true, Int64}}}}}
          V_3 = ((((((((((((((((((((((((((prgm.children[1]).children[2]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, SparseListLevel{Int64, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, ElementLevel{0, Int64, Int64, PyArray{Int64, 1, true, true, Int64}}}}}
          V_4 = ((((((((((((prgm.children[1]).children[2]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{ElementLevel{0, Int64, Int64, PyArray{Int64, 1, true, true, Int64}}}
          _A_1 = Tensor(Element{NaN, Float64, Int64}([167838.0]))
          A_28 = Tensor(Element{0, Int64, Int64}([1007028]))
          A_35 = Tensor(Dense{Int32}(SparseByteMap{Int32}(Element{0, Int64, Int64}([0, 0, 1, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), 1000, [1, 101, 210, 321, 403, 501, 600, 708, 817, 915  …  99327, 99438, 99527, 99623, 99713, 99818, 99933, 100018, 100112, 100221], Bool[0, 0, 1, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0], Tuple{Int64, Int32}[(1, 3), (1, 23), (1, 38), (1, 39), (1, 47), (1, 59), (1, 68), (1, 82), (1, 96), (1, 106)  …  (1000, 874), (1000, 889), (1000, 891), (1000, 892), (1000, 899), (1000, 926), (1000, 930), (1000, 941), (1000, 947), (1000, 965)]), 1000))
          A_34 = Tensor(Dense{Int32}(SparseByteMap{Int32}(Element{0, Int64, Int64}([0, 0, 1, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), 1000, [1, 101, 210, 321, 403, 501, 600, 708, 817, 915  …  99327, 99438, 99527, 99623, 99713, 99818, 99933, 100018, 100112, 100221], Bool[0, 0, 1, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0], Tuple{Int64, Int32}[(1, 3), (1, 23), (1, 38), (1, 39), (1, 47), (1, 59), (1, 68), (1, 82), (1, 96), (1, 106)  …  (1000, 874), (1000, 889), (1000, 891), (1000, 892), (1000, 899), (1000, 926), (1000, 930), (1000, 941), (1000, 947), (1000, 965)]), 1000))
          ()
          @finch mode = :safe begin
                  A_34 .= 0
                  for _i_2 = _
                      for _i_3 = _
                          A_34[_i_2, _i_3] << Finch.FinchNotation.InitWriter{0}() >>= V[(Finch.FinchNotation.walk)(_i_3), _i_2]
                      end
                  end
              end
          @finch mode = :safe begin
                  A_35 .= 0
                  for _i_2 = _
                      for _i_4 = _
                          A_35[_i_2, _i_4] << Finch.FinchNotation.InitWriter{0}() >>= V_3[(Finch.FinchNotation.walk)(_i_4), _i_2]
                      end
                  end
              end
          @finch mode = :safe begin
                  A_28 .= 0
                  for _i_3 = _
                      for _i_4 = _
                          for _i_2 = _
                              A_28[] << + >>= (*)(V_2[(Finch.FinchNotation.walk)(_i_4), (Finch.FinchNotation.follow)(_i_3)], A_34[(Finch.FinchNotation.walk)(_i_2), _i_3], A_35[(Finch.FinchNotation.follow)(_i_2), (Finch.FinchNotation.follow)(_i_4)])
                          end
                      end
                  end
              end
          @finch mode = :safe begin
                  _A_1 .= NaN
                  _A_1[] << Finch.FinchNotation.InitWriter{NaN}() >>= (/)(A_28[], V_4[])
              end
          return Tuple([_A_1])
      end
  end)
┌ Warning: Performance Warning: non-concordant traversal of A_34[_i_2, _i_3] (hint: most arrays prefer column major or first index fast, run in fast mode to ignore this warning)
└ @ Finch ?:0
┌ Warning: Performance Warning: non-concordant traversal of A_35[_i_2, _i_4] (hint: most arrays prefer column major or first index fast, run in fast mode to ignore this warning)
└ @ Finch ?:0
corrupted double-linked list (not small)
Aborted (core dumped)
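
For reference, the computation itself is standard triangle counting: sum((A @ A) * A) equals trace(A³), and every triangle contributes six to that trace (3 vertices × 2 orientations), hence the division by 6. A quick sanity check of the expected count using NetworkX alone, with no Finch involved (the seed here is just illustrative):

import networkx as nx

G = nx.gnp_random_graph(n=1000, p=0.1, seed=0)
# nx.triangles gives, per node, the number of triangles through that node,
# so every triangle is counted three times in the sum.
n_triangles = sum(nx.triangles(G).values()) // 3
print(n_triangles)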
kylebd99 (Collaborator) commented Dec 3, 2024

Looking into this now. I'm not getting an error, but it does seem like the verbose option is trying to print the full matrix, which it shouldn't do.

Do you still get these errors when the verbose option is turned off?

kylebd99 linked a pull request Dec 3, 2024 that will close this issue
kylebd99 reopened this Dec 6, 2024
kylebd99 (Collaborator) commented Dec 6, 2024

Accidentally closed this with the PR.

willow-ahrens (Collaborator) commented

Is this still not fixed?

mtsokol (Member, Author) commented Dec 11, 2024

@kylebd99 That's correct: I get the error only with verbose=True passed to finch.compute(...) together with the Galley scheduler. Without verbose mode it completes, just like the passing example.
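
In the meantime, simply dropping the verbose flag works as a workaround (same lazy expression as in the reproducer above):

# Identical computation and scheduler; only verbose output is disabled.
b_l = finch.sum(a_l @ a_l * a_l) / finch.asarray(6)
finch.compute(b_l, opt="galley", verbose=False)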

kylebd99 (Collaborator) commented Dec 12, 2024

Should be fixed with #670

willow-ahrens (Collaborator) commented

finch-tensor/Finch.jl#670 has been merged. Do we need a patch release?

willow-ahrens (Collaborator) commented

I don't think this bug should be blocking anyone, so once we verify that Finch main has fixed it, we can push the fix out with the next patch release.
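
If it helps, one way to check which Finch.jl version the Python package is actually running is to ask the embedded Julia session directly. This is just a sketch; it assumes finch-tensor's juliacall bridge, so treat the exact incantation as illustrative:

from juliacall import Main as jl

# Query the embedded Julia environment for the installed Finch.jl version,
# to confirm the patched release is the one in use.
jl.seval("import Pkg")
jl.seval('Pkg.status("Finch")')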

kylebd99 (Collaborator) commented

Should be fixed by the latest Finch patch release.
