
Corrupted double-linked list when running Galley scheduler #81

Closed · mtsokol opened this issue Dec 2, 2024 · 8 comments · Fixed by #79
mtsokol (Member) commented Dec 2, 2024

Hi @kylebd99,

I worked on benchmarks last week and ran into an issue when running the Galley scheduler on the "counting triangles" example with Finch 1.0 on my Linux machine.
Here's a default scheduler example that runs correctly:

import networkx as nx
import finch

LEN = 1000
DENSITY = 0.1

G = nx.gnp_random_graph(n=LEN, p=DENSITY)
a_sps = nx.to_scipy_sparse_array(G)

a = finch.asarray(a_sps)
a_l = finch.lazy(a)
b_l = finch.sum(a_l @ a_l * a_l) / finch.asarray(6)
finch.compute(b_l, opt="default", verbose=False)

LEN = 2000

G = nx.gnp_random_graph(n=LEN, p=DENSITY)
a_sps = nx.to_scipy_sparse_array(G)

a = finch.asarray(a_sps)
a_l = finch.lazy(a)
b_l = finch.sum(a_l @ a_l * a_l) / finch.asarray(6)
finch.compute(b_l, opt="default", verbose=False)

And here's one that fails:

import networkx as nx
import finch

LEN = 1000
DENSITY = 0.1

G = nx.gnp_random_graph(n=LEN, p=DENSITY)
a_sps = nx.to_scipy_sparse_array(G)

a = finch.asarray(a_sps)
a_l = finch.lazy(a)
b_l = finch.sum(a_l @ a_l * a_l) / finch.asarray(6)
finch.compute(b_l, opt="galley", verbose=True)

LEN = 2000

G = nx.gnp_random_graph(n=LEN, p=DENSITY)
a_sps = nx.to_scipy_sparse_array(G)

a = finch.asarray(a_sps)
a_l = finch.lazy(a)
b_l = finch.sum(a_l @ a_l * a_l) / finch.asarray(6)
finch.compute(b_l, opt="galley", verbose=True)

Here's the output with the verbose=True option:

Input Queries :
Query(Alias(##A#254),
   Materialize(
      MapJoin(Value(/),
         Aggregate(Value(+),Value(0),##i#249,##i#250,
            MapJoin(Value(*),
               Aggregate(Value(+),Value(0),##i#236,
                  MapJoin(Value(*),Input(Value(FIBER),##i#236,##i#249),Input(Value(FIBER),##i#250,##i#236))),Input(Value(FIBER),##i#250,##i#249))),Input(Value(FIBER)))))
Used DNF: false
 QUERY: Query(Alias(##A#254),
   Materialize(
      MapJoin(Value(/),
         Aggregate(Value(+),Value(0),##i#249,##i#250,##i#236,
            MapJoin(Value(*),Input(Value(FIBER),##i#236,##i#249),Input(Value(FIBER),##i#250,##i#236),Input(Value(FIBER),##i#250,##i#249))),Input(Value(FIBER)))))
FAQ Opt Time: 3.1587700843811035
--------------- Logical Plan ---------------
Query(Alias(A_28),
   Aggregate(Value(+),Value(0),##i#236,##i#250,##i#249,
      MapJoin(Value(*),Input(Value(FIBER),##i#236,##i#249),Input(Value(FIBER),##i#250,##i#236),Input(Value(FIBER),##i#250,##i#249))))
Query(##A#254,
   Materialize(
      MapJoin(Value(/),Alias(A_28),Input(Value(FIBER)))))
--------------------------------------------
Physical Opt Time: 3.1587700843811035
--------------- Physical Plan ---------------
Query(Alias(A_34,##i#236,##i#249),
   Materialize(Value(t_bytemap),Value(t_dense),##i#249,##i#236,
      Aggregate(Value(Finch.FinchNotation.InitWriter{0}()),Value(0),Input(##i#236::t_walk,##i#249::t_default))),##i#249,##i#236)
Query(Alias(A_35,##i#250,##i#249),
   Materialize(Value(t_bytemap),Value(t_dense),##i#249,##i#250,
      Aggregate(Value(Finch.FinchNotation.InitWriter{0}()),Value(0),Input(##i#250::t_walk,##i#249::t_default))),##i#249,##i#250)
Query(Alias(A_28),
   Materialize(
      Aggregate(Value(+),Value(0),##i#250,##i#236,##i#249,
         MapJoin(Value(*),Alias(A_34,##i#249::t_walk,##i#236::t_default),Input(##i#250::t_walk,##i#236::t_follow),Alias(A_35,##i#249::t_follow,##i#250::t_follow)))),##i#236,##i#250,##i#249)
Query(##A#254,
   Materialize(
      Aggregate(Value(Finch.FinchNotation.InitWriter{NaN}()),Value(nothing),
         MapJoin(Value(/),Alias(A_28),Input()))))
--------------------------------------------
Expected Output Size: 100220
Expected Output Size: 100220
Expected Output Size: 1
Expected Output Size: 1
Executing:
:(function var"##compute#265"(prgm)
      begin
          V = ((((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[2]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, SparseListLevel{Int64, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, ElementLevel{0, Int64, Int64, PyArray{Int64, 1, true, true, Int64}}}}}
          V_2 = ((((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[3]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, SparseListLevel{Int64, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, ElementLevel{0, Int64, Int64, PyArray{Int64, 1, true, true, Int64}}}}}
          V_3 = ((((((((((((((((((((((((((prgm.children[1]).children[2]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, SparseListLevel{Int64, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, ElementLevel{0, Int64, Int64, PyArray{Int64, 1, true, true, Int64}}}}}
          V_4 = ((((((((((((prgm.children[1]).children[2]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{ElementLevel{0, Int64, Int64, PyArray{Int64, 1, true, true, Int64}}}
          _A_1 = Tensor(Element{NaN, Float64, Int64}(Float64[]))
          A_28 = Tensor(Element{0, Int64, Int64}(Int64[]))
          A_35 = Tensor(Dense{Int32}(SparseByteMap{Int32}(Element{0, Int64, Int64}(Int64[]), 1000, [1], Bool[], Tuple{Int64, Int32}[]), 1000))
          A_34 = Tensor(Dense{Int32}(SparseByteMap{Int32}(Element{0, Int64, Int64}(Int64[]), 1000, [1], Bool[], Tuple{Int64, Int32}[]), 1000))
          ()
          @finch mode = :safe begin
                  A_34 .= 0
                  for _i_2 = _
                      for _i_3 = _
                          A_34[_i_2, _i_3] << Finch.FinchNotation.InitWriter{0}() >>= V[(Finch.FinchNotation.walk)(_i_3), _i_2]
                      end
                  end
              end
          @finch mode = :safe begin
                  A_35 .= 0
                  for _i_2 = _
                      for _i_4 = _
                          A_35[_i_2, _i_4] << Finch.FinchNotation.InitWriter{0}() >>= V_3[(Finch.FinchNotation.walk)(_i_4), _i_2]
                      end
                  end
              end
          @finch mode = :safe begin
                  A_28 .= 0
                  for _i_3 = _
                      for _i_4 = _
                          for _i_2 = _
                              A_28[] << + >>= (*)(V_2[(Finch.FinchNotation.walk)(_i_4), (Finch.FinchNotation.follow)(_i_3)], A_34[(Finch.FinchNotation.walk)(_i_2), _i_3], A_35[(Finch.FinchNotation.follow)(_i_2), (Finch.FinchNotation.follow)(_i_4)])
                          end
                      end
                  end
              end
          @finch mode = :safe begin
                  _A_1 .= NaN
                  _A_1[] << Finch.FinchNotation.InitWriter{NaN}() >>= (/)(A_28[], V_4[])
              end
          return Tuple([_A_1])
      end
  end)
┌ Warning: Performance Warning: non-concordant traversal of A_34[_i_2, _i_3] (hint: most arrays prefer column major or first index fast, run in fast mode to ignore this warning)
└ @ Finch ?:0
┌ Warning: Performance Warning: non-concordant traversal of A_35[_i_2, _i_4] (hint: most arrays prefer column major or first index fast, run in fast mode to ignore this warning)
└ @ Finch ?:0
Executing:
:(function var"##compute#265"(prgm)
      begin
          V = ((((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[2]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, SparseListLevel{Int64, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, ElementLevel{0, Int64, Int64, PyArray{Int64, 1, true, true, Int64}}}}}
          V_2 = ((((((((((((((((((((((((((((((prgm.children[1]).children[2]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[3]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, SparseListLevel{Int64, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, ElementLevel{0, Int64, Int64, PyArray{Int64, 1, true, true, Int64}}}}}
          V_3 = ((((((((((((((((((((((((((prgm.children[1]).children[2]).children[1]).children[1]).children[2]).children[1]).children[2]).children[1]).children[1]).children[1]).children[1]).children[2]).children[3]).children[1]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{DenseLevel{Int64, SparseListLevel{Int64, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, PlusOneVector{Int64, PyArray{Int64, 1, true, true, Int64}}, ElementLevel{0, Int64, Int64, PyArray{Int64, 1, true, true, Int64}}}}}
          V_4 = ((((((((((((prgm.children[1]).children[2]).children[1]).children[1]).children[2]).children[1]).children[3]).children[1]).children[1]).children[1]).children[1]).children[2]).tns.val::Tensor{ElementLevel{0, Int64, Int64, PyArray{Int64, 1, true, true, Int64}}}
          _A_1 = Tensor(Element{NaN, Float64, Int64}([167838.0]))
          A_28 = Tensor(Element{0, Int64, Int64}([1007028]))
          A_35 = Tensor(Dense{Int32}(SparseByteMap{Int32}(Element{0, Int64, Int64}([0, 0, 1, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), 1000, [1, 101, 210, 321, 403, 501, 600, 708, 817, 915  …  99327, 99438, 99527, 99623, 99713, 99818, 99933, 100018, 100112, 100221], Bool[0, 0, 1, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0], Tuple{Int64, Int32}[(1, 3), (1, 23), (1, 38), (1, 39), (1, 47), (1, 59), (1, 68), (1, 82), (1, 96), (1, 106)  …  (1000, 874), (1000, 889), (1000, 891), (1000, 892), (1000, 899), (1000, 926), (1000, 930), (1000, 941), (1000, 947), (1000, 965)]), 1000))
          A_34 = Tensor(Dense{Int32}(SparseByteMap{Int32}(Element{0, Int64, Int64}([0, 0, 1, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), 1000, [1, 101, 210, 321, 403, 501, 600, 708, 817, 915  …  99327, 99438, 99527, 99623, 99713, 99818, 99933, 100018, 100112, 100221], Bool[0, 0, 1, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0], Tuple{Int64, Int32}[(1, 3), (1, 23), (1, 38), (1, 39), (1, 47), (1, 59), (1, 68), (1, 82), (1, 96), (1, 106)  …  (1000, 874), (1000, 889), (1000, 891), (1000, 892), (1000, 899), (1000, 926), (1000, 930), (1000, 941), (1000, 947), (1000, 965)]), 1000))
          ()
          @finch mode = :safe begin
                  A_34 .= 0
                  for _i_2 = _
                      for _i_3 = _
                          A_34[_i_2, _i_3] << Finch.FinchNotation.InitWriter{0}() >>= V[(Finch.FinchNotation.walk)(_i_3), _i_2]
                      end
                  end
              end
          @finch mode = :safe begin
                  A_35 .= 0
                  for _i_2 = _
                      for _i_4 = _
                          A_35[_i_2, _i_4] << Finch.FinchNotation.InitWriter{0}() >>= V_3[(Finch.FinchNotation.walk)(_i_4), _i_2]
                      end
                  end
              end
          @finch mode = :safe begin
                  A_28 .= 0
                  for _i_3 = _
                      for _i_4 = _
                          for _i_2 = _
                              A_28[] << + >>= (*)(V_2[(Finch.FinchNotation.walk)(_i_4), (Finch.FinchNotation.follow)(_i_3)], A_34[(Finch.FinchNotation.walk)(_i_2), _i_3], A_35[(Finch.FinchNotation.follow)(_i_2), (Finch.FinchNotation.follow)(_i_4)])
                          end
                      end
                  end
              end
          @finch mode = :safe begin
                  _A_1 .= NaN
                  _A_1[] << Finch.FinchNotation.InitWriter{NaN}() >>= (/)(A_28[], V_4[])
              end
          return Tuple([_A_1])
      end
  end)
┌ Warning: Performance Warning: non-concordant traversal of A_34[_i_2, _i_3] (hint: most arrays prefer column major or first index fast, run in fast mode to ignore this warning)
└ @ Finch ?:0
┌ Warning: Performance Warning: non-concordant traversal of A_35[_i_2, _i_4] (hint: most arrays prefer column major or first index fast, run in fast mode to ignore this warning)
└ @ Finch ?:0
corrupted double-linked list (not small)
Aborted (core dumped)
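
For reference, the computation itself is standard triangle counting: sum((A @ A) * A) equals trace(A³), and every triangle contributes six to that trace (3 vertices × 2 orientations), hence the division by 6. A quick sanity check of the expected count using NetworkX alone, with no Finch involved (the seed here is just illustrative):

import networkx as nx

G = nx.gnp_random_graph(n=1000, p=0.1, seed=0)
# nx.triangles gives, per node, the number of triangles through that node,
# so every triangle is counted three times in the sum.
n_triangles = sum(nx.triangles(G).values()) // 3
print(n_triangles)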
kylebd99 (Collaborator) commented Dec 3, 2024

Looking into this now. I'm not getting an error, but it does seem like the verbose option is trying to print the full matrix, which it shouldn't do.

Do you still get these errors when the verbose option is turned off?

kylebd99 linked a pull request Dec 3, 2024 that will close this issue
kylebd99 reopened this Dec 6, 2024
kylebd99 (Collaborator) commented Dec 6, 2024

Accidentally closed this with the PR.

willow-ahrens (Collaborator) commented

Is this still not fixed?

mtsokol (Member, Author) commented Dec 11, 2024

@kylebd99 That's correct: I get the error only with verbose=True passed to finch.compute(...) together with the Galley scheduler. Without verbose mode it completes, just like the passing example.
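
In the meantime, simply dropping the verbose flag works as a workaround (same lazy expression as in the reproducer above):

# Identical computation and scheduler; only verbose output is disabled.
b_l = finch.sum(a_l @ a_l * a_l) / finch.asarray(6)
finch.compute(b_l, opt="galley", verbose=False)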

kylebd99 (Collaborator) commented Dec 12, 2024

Should be fixed with #670

willow-ahrens (Collaborator) commented

finch-tensor/Finch.jl#670 has been merged. Do we need a patch release?

willow-ahrens (Collaborator) commented

I don't think this bug should be blocking anyone, so once we verify that Finch main has fixed it, we can push the fix out with the next patch release.
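
If it helps, one way to check which Finch.jl version the Python package is actually running is to ask the embedded Julia session directly. This is just a sketch; it assumes finch-tensor's juliacall bridge, so treat the exact incantation as illustrative:

from juliacall import Main as jl

# Query the embedded Julia environment for the installed Finch.jl version,
# to confirm the patched release is the one in use.
jl.seval("import Pkg")
jl.seval('Pkg.status("Finch")')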

kylebd99 (Collaborator) commented

Should be fixed by the latest Finch patch release.
