
CHOLMOD default ordering options: METIS vs AMD #548

Closed
gerw opened this issue Jul 30, 2024 · 28 comments · Fixed by #550

Comments

@gerw

gerw commented Jul 30, 2024

Consider the following snippet

using SparseArrays
using LinearAlgebra

function setup(n)
	e = ones(n)
	e2 = -ones(n-1)
	K = spdiagm(n, n, 0 => 2.0 * e, 1 => e2, -1 => e2)
	Id = sparse(1.0*I, n, n)
	return kron(K, Id) + kron(Id, K)
end

function solve(n)
	A = setup(n)
	F = ones(n^2)
	@time A \ F
end

It just builds a discretization of a PDE and solves it. Now let us execute

solve(1156);
solve(1157);

Using julia 1.9.3 I get

  2.095628 seconds (66 allocations: 1.339 GiB, 0.08% gc time)
  2.134111 seconds (66 allocations: 1.339 GiB, 0.08% gc time)

and all is fine. With julia 1.10.4 instead I get

  1.966967 seconds (69 allocations: 1.339 GiB, 0.17% gc time)
  7.775528 seconds (1.99 M allocations: 9.856 GiB, 2.97% gc time)

Thus, the second solve on a slightly larger system is suddenly much slower and allocates far more memory. Profiling shows that cholmod_l_analyze runs much longer. The same happens with the release candidate of 1.11.

Digging a little bit deeper with

A = setup(1157);
B = SparseArrays.CHOLMOD.Sparse(A);
@time F = SparseArrays.CHOLMOD.symbolic(B, perm=nothing);
@time cholesky!(F, A; shift = 1., check = true);

and looking at the output of top, one can see that symbolic runs on only a single core and allocates memory, while cholesky! seems to be fine (and appears to run in parallel).

I also tried to fiddle around with

SparseArrays.CHOLMOD.getcommon()[].nthreads_max

(which is 1) after startup, but it does not seem to change anything.

@ViralBShah
Member

It seems the first order of business should be to see what is driving those allocations and fix that.

@gerw
Author

gerw commented Jul 30, 2024

Can you reproduce this behaviour?

If I see it correctly, symbolic just boils down to a ccall to cholmod_l_analyze...

@ViralBShah
Member

ViralBShah commented Jul 30, 2024

Yes, I see the same thing with 1.10.4. The allocations are all in

julia> S = CHOLMOD.Sparse(A);

julia> @time CHOLMOD.cholmod_l_analyze(S, g)
  6.854387 seconds (1.99 M allocations: 9.025 GiB, 0.04% gc time)
Ptr{SparseArrays.LibSuiteSparse.cholmod_factor_struct} @0x0000600003638010

@ViralBShah
Member

@KristofferC Any ideas?

@KristofferC
Member

Not immediately, but I will do a bisection and start from there.

@KristofferC
Member

Bisected to JuliaLang/julia#48977 on the julia side.

@KristofferC
Member

Reverting to SuiteSparse_jll 5.10.1 seems to fix it (basically reverting the libsuitesparse.version change in JuliaLang/julia#48977):

  2.333998 seconds (4.16 k allocations: 1.618 GiB, 2.01% gc time, 0.66% compilation time)
  1.968816 seconds (140 allocations: 1.698 GiB, 1.90% gc time)

cc @rayegun

@ViralBShah
Member

ViralBShah commented Jul 30, 2024

But the offending line is just a one-liner that shows 2M Julia allocations - that suggests the issue is not in SuiteSparse.

@KristofferC
Member

KristofferC commented Jul 30, 2024

that suggests the issue is not in SuiteSparse.

This is not accurate because we register the CHOLMOD allocation functions with our GC:

cnfg = cglobal((:SuiteSparse_config, :libsuitesparseconfig), Ptr{Cvoid})
unsafe_store!(cnfg, cglobal(:jl_malloc, Ptr{Cvoid}), 1)
unsafe_store!(cnfg, cglobal(:jl_calloc, Ptr{Cvoid}), 2)
unsafe_store!(cnfg, cglobal(:jl_realloc, Ptr{Cvoid}), 3)
unsafe_store!(cnfg, cglobal(:jl_free, Ptr{Cvoid}), 4)

so even allocations within CHOLMOD are counted in @time. Doing a profile

julia> @profile CHOLMOD.cholmod_l_analyze(S, g)

julia> Profile.print(noisefloor=2, C=true)

most of the time seems to be spent inside CHOLMOD's METIS code:

3646 …c/solvers/lib/aarch64-apple-darwin20.jl:1286; cholmod_l_analyze(arg1::SparseArrays.CHOLMOD.Sparse{Float64, Int64}…
 3646 …arwin14/lib/julia/libcholmod.4.2.1.dylib:?; cholmod_l_analyze_p2
  211  …rwin14/lib/julia/libcholmod.4.2.1.dylib:?; cholmod_l_amd
191   ╎    194  …e.darwin14/lib/julia/libamd.3.2.1.dylib:?; amd_l2
      ╎   3313 …rwin14/lib/julia/libcholmod.4.2.1.dylib:?; cholmod_l_metis
  1   ╎    3292 …rwin14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_METIS_NodeND
      ╎     3268 …rwin14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎ 2787 …win14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎  2339 …win14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎   1896 …in14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎    1475 …in14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎     1057 …in14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎    ╎ 682  …n14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎    ╎  386  …n14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎    ╎   194  …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
  1   ╎    ╎    ╎    71   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎    ╎     23   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNestedDissection
      ╎    ╎    ╎    ╎ 15   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNodeBisectionL2
      ╎    ╎    ╎    ╎  9    …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__InitSeparator
      ╎    ╎    ╎    ╎   8    …/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__GrowBisection
  6   ╎    ╎    ╎    ╎    6    …/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__FM_2WayCutRefine
      ╎    ╎    ╎     25   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNodeBisectionL2
      ╎    ╎    ╎    ╎ 14   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CoarsenGraph
  4   ╎    ╎    ╎    ╎  10   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Match_SHEM
      ╎    ╎    ╎    ╎ 11   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__InitSeparator
  1   ╎    ╎    ╎    ╎  10   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__GrowBisection
  6   ╎    ╎    ╎    ╎   7    …/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__FM_2WayCutRefine
      ╎    ╎    ╎    71   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNodeBisectionL2
      ╎    ╎    ╎     26   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CoarsenGraph
  8   ╎    ╎    ╎    ╎ 17   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Match_SHEM
      ╎    ╎    ╎     44   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__InitSeparator
  1   ╎    ╎    ╎    ╎ 33   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__GrowBisection
 29   ╎    ╎    ╎    ╎  30   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__FM_2WayCutRefine
      ╎    ╎    ╎    39   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Refine2WayNode
 32   ╎    ╎    ╎     32   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__FM_2WayNodeRefine1Sided
      ╎    ╎    ╎   119  …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNodeBisectionL2
  2   ╎    ╎    ╎    59   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CoarsenGraph
  6   ╎    ╎    ╎     18   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Match_RM
  9   ╎    ╎    ╎    ╎ 9    …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CreateCoarseGraphNoMask
 25   ╎    ╎    ╎     38   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Match_SHEM
 12   ╎    ╎    ╎    ╎ 12   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CreateCoarseGraphNoMask
      ╎    ╎    ╎    56   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__InitSeparator
  3   ╎    ╎    ╎     40   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__GrowBisection
 29   ╎    ╎    ╎    ╎ 33   …4/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__FM_2WayCutRefine
      ╎    ╎    ╎   59   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Refine2WayNode
 49   ╎    ╎    ╎    51   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__FM_2WayNodeRefine1Sided
      ╎    ╎    ╎  184  …n14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__MlevelNodeBisectionL2
      ╎    ╎    ╎   78   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CoarsenGraph
 10   ╎    ╎    ╎    26   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Match_RM
 11   ╎    ╎    ╎     11   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__CreateCoarseGraphNoMask
 22   ╎    ╎    ╎    52   …14/lib/julia/libcholmod.4.2.1.dylib:?; SuiteSparse_metis_libmetis__Match_SHEM

@KristofferC
Member

I don't know if the lack of multithreading is coupled to the increase in allocations, but both performance and allocations seem to be worse when going from v5 -> v7.

@ViralBShah
Member

Ok - that's good to know. So should we report the issue upstream then?

@KristofferC
Member

KristofferC commented Jul 30, 2024

On v5, I only see the following in the profile:

julia> @time CHOLMOD.cholmod_l_analyze(S, g)
  0.549651 seconds (26 allocations: 373.199 MiB, 9.98% gc time)

julia> @profile CHOLMOD.cholmod_l_analyze(S, g);

julia> Profile.print(noisefloor=2, C=true)
...
   ╎  321 ...src/solvers/lib/aarch64-apple-darwin20.jl:1084; cholmod_l_analyze(arg1::SparseArrays.CHOLMOD.Sparse{Float64}, a...
  3╎   321 ...apple.darwin14/lib/julia/libcholmod.dylib:?; cholmod_l_analyze_p2
   ╎    215 ...apple.darwin14/lib/julia/libcholmod.dylib:?; cholmod_l_amd
192╎   ╎ 195 ...ple.darwin14/lib/julia/libamd.2.4.6.dylib:?; amd_l2
   ╎    50  ...apple.darwin14/lib/julia/libcholmod.dylib:?; cholmod_l_analyze_ordering
 18╎   ╎ 18  ...apple.darwin14/lib/julia/libcholmod.dylib:?; cholmod_l_rowcolcounts
   ╎   ╎ 20  ...apple.darwin14/lib/julia/libcholmod.dylib:?; permute_matrices
   ╎   ╎  20  ...pple.darwin14/lib/julia/libcholmod.dylib:?; cholmod_l_ptranspose
 18╎   ╎   18  ...pple.darwin14/lib/julia/libcholmod.dylib:?; cholmod_l_transpose_sym

@ViralBShah
Member

ViralBShah commented Jul 30, 2024

SuiteSparse 7 comes with METIS built in for partitioning, which is where it seems most of the time is being spent. Is this kind of thing expected from METIS? Perhaps @fredrikekre has some experience with it.

@KristofferC
Member

KristofferC commented Jul 30, 2024

So should we report the issue upstream then?

Maybe, assuming it isn't our fault due to different build options etc. The following is a decent Julia reproducer at least:

using SparseArrays, LinearAlgebra
using SparseArrays: CHOLMOD, LibSuiteSparse

function setup(n)
	e = ones(n)
	e2 = -ones(n-1)
	K = spdiagm(n, n, 0 => 2.0 * e, 1 => e2, -1 => e2)
	Id = sparse(1.0*I, n, n)
	return kron(K, Id) + kron(Id, K)
end

n = 1157;
A = setup(n);
F = ones(n^2);

S = CHOLMOD.Sparse(A);
g = CHOLMOD.getcommon()

using .LibSuiteSparse: libcholmod, cholmod_l_analyze, cholmod_sparse, cholmod_common, cholmod_factor
@time @ccall libcholmod.cholmod_l_analyze(S::Ptr{cholmod_sparse},
                                    g::Ptr{cholmod_common})::Ptr{cholmod_factor}

1.9:

  0.560903 seconds (125 allocations: 454.909 MiB, 6.46% gc time, 1.14% compilation time)

1.10:

  5.704131 seconds (1.99 M allocations: 9.105 GiB, 1.86% gc time, 0.13% compilation time)

@ViralBShah
Member

@DrTimothyAldenDavis Is this kind of slowdown expected when METIS kicks in?

@DrTimothyAldenDavis

METIS produces good orderings but it takes a very long time, particularly for single analyze/factorize/solves.

The default ordering strategy is to try AMD first. If AMD produces a good ordering, I use that; otherwise, I try METIS. You can disable METIS.

See page 104 of the user guide for CHOLMOD 5.2.1. Or the cholmod.h include file here:

https://github.com/DrTimothyAldenDavis/SuiteSparse/blob/26ababc7f3af725c5fb9168a1b94850eab74b666/CHOLMOD/Include/cholmod.h#L555-L574
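From the Julia side, this might look like the following sketch (an illustrative example only: it reuses the matrix construction from the report above and the `@cholmod_param` macro that appears later in this thread; `nmethods = 2` restricts the ordering search to the given permutation plus AMD, so the METIS trial never runs):

```julia
using SparseArrays, LinearAlgebra

# Same 2-D Laplacian pattern as in the report, at a small size.
n = 50
e2 = -ones(n - 1)
K = spdiagm(n, n, 0 => fill(2.0, n), 1 => e2, -1 => e2)
Id = sparse(1.0I, n, n)
A = kron(K, Id) + kron(Id, K)
b = ones(n^2)

# nmethods = 2: try only the user permutation and AMD, skipping METIS.
x = SparseArrays.CHOLMOD.@cholmod_param nmethods = 2 begin
    cholesky(A) \ b
end
```

For a single analyze/factorize/solve this trades a possibly better METIS ordering for a much cheaper analysis phase.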

@ViralBShah
Member

@DrTimothyAldenDavis I assume METIS can also be disabled in SuiteSparse 7.

@DrTimothyAldenDavis

Yes, CHOLMOD 5.2.1 is the most recent stable release of CHOLMOD, in SuiteSparse 7.7.0.

@ViralBShah
Member

Ah thank you. I misread CHOLMOD 5 to mean SuiteSparse 5.

@gerw
Author

gerw commented Jul 30, 2024

Thank you for all the discussion. Running

@SparseArrays.CHOLMOD.cholmod_param nmethods = 2 solve(1157);

gives back the behaviour from julia 1.9.3.

Now, I am wondering whether this should be used automatically if the user just wants to solve a single system?

@DrTimothyAldenDavis

Now, I am wondering whether this should be used automatically if the user just wants to solve a single system?

That might be a good idea. It would be the least disruptive to the performance seen in julia.

@DrTimothyAldenDavis

P.S. The title of this issue could be revised to something like "CHOLMOD default ordering options: METIS vs AMD".

@gerw changed the title from "cholmod no longer uses multithreading?" to "CHOLMOD default ordering options: METIS vs AMD" on Jul 31, 2024
@gerw
Author

gerw commented Jul 31, 2024

Now, I am wondering whether this should be used automatically if the user just wants to solve a single system?

That might be a good idea. It would be the least disruptive to the performance seen in julia.

And somewhere in the documentation, one should point to the possibility of increasing nmethods when the factorization is reused for multiple solves.

That being said, I cannot find any reference to the Cholesky decomposition from SparseArrays in the documentation.
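The factor-reuse pattern in question could be sketched like this (a hypothetical example, reusing the issue's matrix construction; `cholesky`, `cholesky!` with `shift`, and `\` are the same calls used earlier in this thread):

```julia
using SparseArrays, LinearAlgebra

n = 50
e2 = -ones(n - 1)
K = spdiagm(n, n, 0 => fill(2.0, n), 1 => e2, -1 => e2)
Id = sparse(1.0I, n, n)
A = kron(K, Id) + kron(Id, K)

# Pay for the (possibly METIS-assisted) symbolic analysis once...
F = cholesky(A)

# ...then reuse the factorization across many right-hand sides:
x1 = F \ ones(n^2)
x2 = F \ rand(n^2)

# For a new matrix with the same sparsity pattern, reuse the symbolic
# analysis and redo only the numeric factorization (here with a shift,
# so F now factors A + 1.0*I):
cholesky!(F, A; shift = 1.0)
x3 = F \ ones(n^2)
```

When the analysis cost is amortized like this, a more expensive but higher-quality ordering can pay off over the repeated solves.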

@ViralBShah
Member

ViralBShah commented Jul 31, 2024

The documentation for the solvers has been, uh, sparse. But I don't know why it doesn't show up at all. We should at least have the solvers show up in Julia docs like they do in SparseArrays docs.

I suspect only the index.md from SparseArrays gets picked up when the Julia stdlib docs are built, and none of the other files do. So perhaps we should just integrate solvers.md into index.md and have one big file for all the SparseArrays.jl documentation.

Based on the discussion and findings here, METIS should certainly be opt-in and documented. As for threads, I suppose we should use threading by default. Would you be up for making a PR?

@gerw
Author

gerw commented Jul 31, 2024

Based on the discussion and findings here, METIS should certainly opt-in and documented. As for threads, I suppose we should use threading by default. Would you be up for making a PR?

Uh, I don't know; rather not. I have not set up the infrastructure to build Julia (or even just SparseArrays)...

@ViralBShah
Member

ViralBShah commented Jul 31, 2024

I'll get around to it in the next few days, unless someone beats me to it.

@ViralBShah
Member

@DrTimothyAldenDavis Is it possible to change the number of threads for symbolic() and other phases after CHOLMOD is already loaded - or between solves and such?

@DrTimothyAldenDavis

Yes, just change Common->nthreads_max on the "Common" object passed as the last parameter to all CHOLMOD methods.
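From Julia, a hypothetical (and decidedly unsafe) sketch of poking that field might look as follows. It assumes `getcommon()` yields something convertible to `Ptr{cholmod_common}` (as in the `@ccall` reproducer above) and that `nthreads_max` is a C `int` in the struct; this is an illustrative hack, not a supported API:

```julia
using SparseArrays
using SparseArrays: CHOLMOD, LibSuiteSparse
using .LibSuiteSparse: cholmod_common

common = CHOLMOD.getcommon()

GC.@preserve common begin
    p = Base.unsafe_convert(Ptr{cholmod_common}, common)
    # Find the byte offset of nthreads_max in the Julia struct mirror
    # and overwrite the field in place.
    idx = findfirst(==(:nthreads_max), collect(fieldnames(cholmod_common)))
    off = fieldoffset(cholmod_common, idx)
    unsafe_store!(Ptr{Cint}(p + off), Cint(4))  # assumes nthreads_max is a C int
end

# Read it back the same way the original report did.
CHOLMOD.getcommon()[].nthreads_max
```

A proper fix would presumably expose this as an option in SparseArrays rather than require pointer surgery.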
