Large 2000x performance regression from 5.1 to 5.2 #71

Open
FreddieWitherden opened this issue Aug 11, 2023 · 7 comments

@FreddieWitherden

Using METIS 5.1:

pyfr -p partition 8 -ebalanced -pmetis inc-cylinder.pyfrm foo/
 • Combine mesh parts (0.02s)
 • Construct graph (0.00s)
 • Partition graph (0.01s)
 • Renumber vertices (0.03s)
 • Repartition mesh (0.01s)
 • Write mesh (0.01s)

where the partitioning and renumbering (both of which make calls to METIS_PartGraphRecursive) complete almost immediately. By contrast, using METIS 5.2.1:

pyfr -p partition 8 -ebalanced -pmetis inc-cylinder.pyfrm foo/
 • Combine mesh parts (0.01s)
 • Construct graph (0.00s)
 • Partition graph (17.33s)
 • Renumber vertices (7.22s)
 • Repartition mesh (0.01s)
 • Write mesh (0.01s)

where the partition graph step, which makes a single call to METIS_PartGraphRecursive, slows from 0.01s to 17.33s (on the order of 2000x). The inputs are identical in both cases. The slowdown also reproduces with METIS_PartGraphKway, and on both Linux (x86-64) and macOS (AArch64).

This occurs with all of our grids/meshes. Profiling 5.2.1 with perf record we find:

    17.05%  pyfr      libmetis.so.0                                      [.] libmetis__FM_Mc2WayCutRefine
     8.88%  pyfr      libmetis.so.0                                      [.] libmetis__CreateCoarseGraph
     8.03%  pyfr      libmetis.so.0                                      [.] libmetis__FM_2WayCutRefine
     7.93%  pyfr      libmetis.so.0                                      [.] libmetis__rpqInsert
     5.24%  pyfr      libmetis.so.0                                      [.] libmetis__rpqUpdate
     5.12%  pyfr      libc.so.6                                          [.] random
     4.21%  pyfr      libmetis.so.0                                      [.] libmetis__Compute2WayPartitionParams
     4.21%  pyfr      libmetis.so.0                                      [.] libmetis__rpqGetTop
     4.21%  pyfr      libmetis.so.0                                      [.] libmetis__Match_SHEM
     4.01%  pyfr      libmetis.so.0                                      [.] libmetis__SelectQueue
     2.79%  pyfr      libmetis.so.0                                      [.] libmetis__iset
     2.77%  pyfr      libmetis.so.0                                      [.] libmetis__Project2WayPartition
     2.20%  pyfr      libmetis.so.0                                      [.] libmetis__Match_RM
     1.99%  pyfr      libmetis.so.0                                      [.] libmetis__ComputeLoadImbalanceDiffVec
     1.93%  pyfr      libmetis.so.0                                      [.] libmetis__McGeneral2WayBalance
     1.85%  pyfr      libmetis.so.0                                      [.] libmetis__iaxpy
     1.34%  pyfr      libmetis.so.0                                      [.] libmetis__rpqDelete
     1.22%  pyfr      libmetis.so.0                                      [.] libmetis__BucketSortKeysInc

whereas with 5.1 (good) we find:

    10.18%  pyfr      libopenblas64_p-r0-15028c96.3.21.so                [.] blas_thread_server
     9.73%  pyfr      [unknown]                                          [k] 0xffffffff900001a2
     9.30%  pyfr      libc.so.6                                          [.] __sched_yield
     8.22%  pyfr      libpython3.11.so.1.0                               [.] _PyEval_EvalFrameDefault
     1.00%  pyfr      libpython3.11.so.1.0                               [.] 0x0000000000192fb0
     0.96%  pyfr      libpython3.11.so.1.0                               [.] 0x00000000001949c0
     0.80%  pyfr      libmetis.so.0                                      [.] libmetis__FM_Mc2WayCutRefine
     0.59%  pyfr      libpython3.11.so.1.0                               [.] _PyType_Lookup
     0.57%  pyfr      libmetis.so.0                                      [.] libmetis__rpqInsert

where METIS is just a rounding error in the runtime.

@karypis
Contributor

karypis commented Aug 16, 2023

Can you share the graphs in Metis format to reproduce this locally?

@FreddieWitherden
Author

So I sat down and bisected the git revisions and found the culprit was:

5ba1580

which causes ABI breakage. Without recompilation, any METIS 5.1 application will pass an incorrect options array with 5.2 due to every option past METIS_OPTION_DBGLVL being shifted down by one.

I'll put together a PR later which gives these enum options explicit values so such breakage can be avoided in the future as/when new options are added.
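
A minimal sketch of what pinning the option indices could look like. This is not the actual PR: the `DEMO_` names and the choice to give the three newer options indices 18 to 20 are assumptions made for illustration, and the command-line-only entries are left out. The idea is that every existing option keeps its 5.1.0 index and new options only ever take fresh values.

```c
#include <assert.h>  /* static_assert (C11) */

/* Hypothetical stand-in for moptions_et. The DEMO_ names and this exact
   layout are illustrative only, not the contents of any released metis.h. */
typedef enum {
  /* Indices as they appear in the 5.1.0 enum */
  DEMO_OPTION_PTYPE     = 0,
  DEMO_OPTION_OBJTYPE   = 1,
  DEMO_OPTION_CTYPE     = 2,
  DEMO_OPTION_IPTYPE    = 3,
  DEMO_OPTION_RTYPE     = 4,
  DEMO_OPTION_DBGLVL    = 5,
  DEMO_OPTION_NITER     = 6,
  DEMO_OPTION_NCUTS     = 7,
  DEMO_OPTION_SEED      = 8,
  DEMO_OPTION_NO2HOP    = 9,
  DEMO_OPTION_MINCONN   = 10,
  DEMO_OPTION_CONTIG    = 11,
  DEMO_OPTION_COMPRESS  = 12,
  DEMO_OPTION_CCORDER   = 13,
  DEMO_OPTION_PFACTOR   = 14,
  DEMO_OPTION_NSEPS     = 15,
  DEMO_OPTION_UFACTOR   = 16,
  DEMO_OPTION_NUMBERING = 17,

  /* Newer options get fresh indices instead of being inserted mid-list */
  DEMO_OPTION_NIPARTS   = 18,
  DEMO_OPTION_ONDISK    = 19,
  DEMO_OPTION_DROPEDGES = 20
} demo_options_et;

/* Compile-time guards: a later edit that moves an old option fails to build */
static_assert(DEMO_OPTION_NITER == 6, "NITER index must not move");
static_assert(DEMO_OPTION_NUMBERING == 17, "NUMBERING index must not move");
```

With explicit values, inserting a new enumerator in the middle of the list no longer renumbers the ones after it, so a binary built against the older header would still address the same slots.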

@svigerske

This reordering of options broke using METIS 5.2.1 from MUMPS for me.
They have code like

  MUMPS_INT ncon, edgecut, options[40];
  ierr=METIS_SetDefaultOptions(options);
  options[0]  = 0;
  /* Use 1-based fortran numbering */
  options[17] = 1;
  ncon        = 1;
  ierr = METIS_PartGraphKway(n, &ncon, iptr, jcn,
                             NULL, NULL, NULL,
                             k, NULL, NULL, options,
                             &edgecut, part);

and I got a lot of complaints from METIS about the graph in the log, followed by a crash.
Of course, it's not good that the MUMPS people assumed that METIS_OPTION_NUMBERING will always be 17 (they even include metis.h), but it seems this could easily have been avoided on the METIS side, too (or could be fixed in 5.2.2).

If I change

@@ -271,12 +271,10 @@ typedef enum {
   METIS_OPTION_IPTYPE,
   METIS_OPTION_RTYPE,
   METIS_OPTION_DBGLVL,
-  METIS_OPTION_NIPARTS,
   METIS_OPTION_NITER,
   METIS_OPTION_NCUTS,
   METIS_OPTION_SEED,
   METIS_OPTION_NO2HOP,
-  METIS_OPTION_ONDISK,
   METIS_OPTION_MINCONN,
   METIS_OPTION_CONTIG,
   METIS_OPTION_COMPRESS,
@@ -285,6 +283,8 @@ typedef enum {
   METIS_OPTION_NSEPS,
   METIS_OPTION_UFACTOR,
   METIS_OPTION_NUMBERING,
+  METIS_OPTION_NIPARTS,
+  METIS_OPTION_ONDISK,
   METIS_OPTION_DROPEDGES,
 
   /* Used for command-line parameter purposes */

Mumps works fine again. (I would have complained there if they had a public issue tracker :))
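
To make the failure mode concrete, here is a small self-contained sketch of what the literal `options[17] = 1` from the MUMPS snippet ends up touching under the 5.1.1/5.2.1 layout. It does not call METIS; the `DEMO_` names are hypothetical stand-ins, the index values are read off the enum listings in this thread, and `-1` is used only as a placeholder for "left at its default".

```c
#include <assert.h>

/* Stand-in indices taken from the enum listings in this thread.
   The DEMO_ names are hypothetical and exist only for this sketch. */
enum { DEMO_510_NUMBERING = 17 };                    /* 5.1.0 layout */
enum { DEMO_52_NSEPS = 17, DEMO_52_NUMBERING = 19 }; /* 5.1.1 / 5.2.1 layout */
enum { DEMO_NOPTIONS = 40 };                         /* size used by the MUMPS snippet */

/* Mimics the MUMPS snippet: mark every slot as "default" (-1 is a
   placeholder here), then request 1-based numbering via literal index 17. */
void demo_literal_setup(int options[DEMO_NOPTIONS])
{
  for (int i = 0; i < DEMO_NOPTIONS; i++)
    options[i] = -1;
  options[0]  = 0;
  options[17] = 1;
}

/* Same intent, but the caller supplies the index by enum symbol, so it
   follows whichever header the code is compiled against. */
void demo_symbolic_setup(int options[DEMO_NOPTIONS], int numbering_index)
{
  assert(numbering_index >= 0 && numbering_index < DEMO_NOPTIONS);
  for (int i = 0; i < DEMO_NOPTIONS; i++)
    options[i] = -1;
  options[0] = 0;
  options[numbering_index] = 1;
}
```

Under the newer layout, `demo_literal_setup` writes 1 into the slot named NSEPS and leaves the NUMBERING slot untouched, so a caller passing 1-based arrays would have them read as 0-based. That would be consistent with the graph complaints and crash described above, though this sketch does not prove that is the exact path taken.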

@mikemhenry

I am maintaining the conda-forge build of METIS, so when the dust settles here let me know and I can bump the version and/or add a patch.

@traversaro

> So I sat down and bisected the git revisions and found the culprit was:
>
> 5ba1580
>
> which causes ABI breakage. Without recompilation, any METIS 5.1 application will pass an incorrect options array with 5.2 due to every option past METIS_OPTION_DBGLVL being shifted down by one.

Just for the sake of completeness, that commit is also included in METIS 5.1.1, so even an application built against METIS 5.1.0 will already give wrong results when run against METIS 5.1.1.

@traversaro

For reference, this is moptions_et in METIS 5.1.0:

/*! Options codes (i.e., options[]) */
typedef enum {
  METIS_OPTION_PTYPE,
  METIS_OPTION_OBJTYPE,
  METIS_OPTION_CTYPE,
  METIS_OPTION_IPTYPE,
  METIS_OPTION_RTYPE,
  METIS_OPTION_DBGLVL,
  METIS_OPTION_NITER,
  METIS_OPTION_NCUTS,
  METIS_OPTION_SEED,
  METIS_OPTION_NO2HOP,
  METIS_OPTION_MINCONN,
  METIS_OPTION_CONTIG,
  METIS_OPTION_COMPRESS,
  METIS_OPTION_CCORDER,
  METIS_OPTION_PFACTOR,
  METIS_OPTION_NSEPS,
  METIS_OPTION_UFACTOR,
  METIS_OPTION_NUMBERING,

  /* Used for command-line parameter purposes */
  METIS_OPTION_HELP,
  METIS_OPTION_TPWGTS,
  METIS_OPTION_NCOMMON,
  METIS_OPTION_NOOUTPUT,
  METIS_OPTION_BALANCE,
  METIS_OPTION_GTYPE,
  METIS_OPTION_UBVEC
} moptions_et;

and this is in METIS 5.1.1 and 5.2.1:

/*! Options codes (i.e., options[]) */
typedef enum {
  METIS_OPTION_PTYPE,
  METIS_OPTION_OBJTYPE,
  METIS_OPTION_CTYPE,
  METIS_OPTION_IPTYPE,
  METIS_OPTION_RTYPE,
  METIS_OPTION_DBGLVL,
  METIS_OPTION_NIPARTS,
  METIS_OPTION_NITER,
  METIS_OPTION_NCUTS,
  METIS_OPTION_SEED,
  METIS_OPTION_NO2HOP,
  METIS_OPTION_ONDISK,
  METIS_OPTION_MINCONN,
  METIS_OPTION_CONTIG,
  METIS_OPTION_COMPRESS,
  METIS_OPTION_CCORDER,
  METIS_OPTION_PFACTOR,
  METIS_OPTION_NSEPS,
  METIS_OPTION_UFACTOR,
  METIS_OPTION_NUMBERING,
  METIS_OPTION_DROPEDGES,

  /* Used for command-line parameter purposes */
  METIS_OPTION_HELP,
  METIS_OPTION_TPWGTS,
  METIS_OPTION_NCOMMON,
  METIS_OPTION_NOOUTPUT,
  METIS_OPTION_BALANCE,
  METIS_OPTION_GTYPE,
  METIS_OPTION_UBVEC
} moptions_et;

the diff is:

--- 5.1.0
+++ 5.1.1
@@ -6,10 +6,12 @@
   METIS_OPTION_IPTYPE,
   METIS_OPTION_RTYPE,
   METIS_OPTION_DBGLVL,
+  METIS_OPTION_NIPARTS,
   METIS_OPTION_NITER,
   METIS_OPTION_NCUTS,
   METIS_OPTION_SEED,
   METIS_OPTION_NO2HOP,
+  METIS_OPTION_ONDISK,
   METIS_OPTION_MINCONN,
   METIS_OPTION_CONTIG,
   METIS_OPTION_COMPRESS,
@@ -18,6 +20,7 @@
   METIS_OPTION_NSEPS,
   METIS_OPTION_UFACTOR,
   METIS_OPTION_NUMBERING,
+  METIS_OPTION_DROPEDGES,
 
   /* Used for command-line parameter purposes */
   METIS_OPTION_HELP,
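
The size of the shift per option can be read off the two listings. The sketch below does this mechanically; the name tables are copied from the enums above with the `METIS_OPTION_` prefix dropped, only the entries that index `options[]` are included, and the `demo_` helpers are hypothetical names for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* options[] entries in declaration order, copied from the listings above */
static const char *const demo_opts_510[] = {
  "PTYPE", "OBJTYPE", "CTYPE", "IPTYPE", "RTYPE", "DBGLVL", "NITER",
  "NCUTS", "SEED", "NO2HOP", "MINCONN", "CONTIG", "COMPRESS", "CCORDER",
  "PFACTOR", "NSEPS", "UFACTOR", "NUMBERING"
};
static const char *const demo_opts_511[] = {
  "PTYPE", "OBJTYPE", "CTYPE", "IPTYPE", "RTYPE", "DBGLVL", "NIPARTS",
  "NITER", "NCUTS", "SEED", "NO2HOP", "ONDISK", "MINCONN", "CONTIG",
  "COMPRESS", "CCORDER", "PFACTOR", "NSEPS", "UFACTOR", "NUMBERING",
  "DROPEDGES"
};

#define DEMO_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Returns the position of `name` in `layout`, or -1 if it is absent */
int demo_index_of(const char *const *layout, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp(layout[i], name) == 0)
      return (int)i;
  return -1;
}

/* Returns how many slots `name` moved between 5.1.0 and 5.1.1/5.2.1.
   The option must exist in both layouts. */
int demo_shift(const char *name)
{
  int old_i = demo_index_of(demo_opts_510, DEMO_COUNT(demo_opts_510), name);
  int new_i = demo_index_of(demo_opts_511, DEMO_COUNT(demo_opts_511), name);
  assert(old_i >= 0 && new_i >= 0);
  return new_i - old_i;
}
```

By this count, options up to DBGLVL keep their index, NITER through NO2HOP move by one slot, and MINCONN through NUMBERING move by two, which puts NUMBERING at index 19 rather than 17.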

@traversaro

For anyone interested, a patch that makes mumps 5.2.1 work with metis 5.1.1 and 5.2.1 (at the cost of breaking compatibility with metis 5.1.0), and that seems to work, is available at https://github.com/conda-forge/mumps-feedstock/blob/c524cb3c71686bee59d9b12df5d9d6ce20782ce4/recipe/mumps_support_only_metis_5_1_1.patch
