
Unexpected memory usage when scaling up via num_generation_jobs #8

Closed

khurtado opened this issue Jul 27, 2021 · 22 comments

@khurtado

Hello,

This was discussed via slack at some point, so I just wanted to open an issue so this is not forgotten.
When scaling up a workflow via num_generation_jobs, the number of jobs in the physics stage increases properly, but the memory usage per job also increases considerably.

For example, if num_generation_jobs is increased by a factor of 10 (from 6 to 60), the memory usage per Delphes job goes from ~700 MB to ~7 GB:

num_generation_jobs: 6

```
12   733   /reana/users/00000000-0000-0000-0000-000000000000/workflows/653743d7-6878-4e6a-b991-72529e19aeed madminertool/madminer-workflow-ph:0.3.0 sh -c '/madminer/scripts/4_delphes.sh -p /madminer -m software/MG5_aMC_v2_9_3 -c /reana/users/00000000-0000-0000-0000-000000000000/workflows/653743d7-6878-4e6a-b991-72529e19aeed/workflow_ph/configure/data/madminer_config.h5 -i /reana/users/00000000-0000-0000-0000-000000000000/workflows/653743d7-6878-4e6a-b991-72529e19aeed/ph/input.yml -e /reana/users/00000000-0000-0000-0000-000000000000/workflows/653743d7-6878-4e6a-b991-72529e19aeed/workflow_ph/pythia_0/events/Events.tar.gz -o /reana/users/00000000-0000-0000-0000-000000000000/workflows/653743d7-6878-4e6a-b991-72529e19aeed/workflow_ph/delphes_0'
```

num_generation_jobs: 60

```
122  7325  /reana/users/00000000-0000-0000-0000-000000000000/workflows/8aa9df1b-168b-4622-981e-01be73344b90 madminertool/madminer-workflow-ph:0.3.0 sh -c '/madminer/scripts/4_delphes.sh -p /madminer -m software/MG5_aMC_v2_9_3 -c /reana/users/00000000-0000-0000-0000-000000000000/workflows/8aa9df1b-168b-4622-981e-01be73344b90/workflow_ph/configure/data/madminer_config.h5 -i /reana/users/00000000-0000-0000-0000-000000000000/workflows/8aa9df1b-168b-4622-981e-01be73344b90/ph/input.yml -e /reana/users/00000000-0000-0000-0000-000000000000/workflows/8aa9df1b-168b-4622-981e-01be73344b90/workflow_ph/pythia_33/events/Events.tar.gz -o /reana/users/00000000-0000-0000-0000-000000000000/workflows/8aa9df1b-168b-4622-981e-01be73344b90/workflow_ph/delphes_33'
```
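These columns match HTCondor job attributes (they are labelled ClusterId, MemoryUsage, Args further down in this thread, with MemoryUsage in MB). As an assumption about how such numbers can be collected, not necessarily the exact command used here, a query of this shape would produce that output:

```sh
# Illustrative HTCondor query: one line per job with its cluster id,
# current memory usage (MB) and the job arguments
condor_q -autoformat ClusterId MemoryUsage Args
```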


This puts a limit on how far the madminer workflow can be scaled up, since the scaling factor becomes tied to the memory available per worker in the cluster we are submitting to.

Is this behaviour understood, or does it need some investigation?
Does it require a fix?
@Sinclert
Member

Hi @khurtado ,

Thanks for the debugging efforts 😄

In principle, that seems like undesired behaviour. I say "in principle" because I do not fully understand how Pythia and Delphes make use of the scaling factor (i.e. the number of jobs the workflow needs × N) internally.

In theory, each parallel job computes a set of values for a given benchmark (sm, w, ...), so it makes sense to compute them in parallel. In this scenario, increasing the number of jobs so that there is more than one job per benchmark is a way to parallelize the computation of every single benchmark on its own. I am unsure whether Pythia / Delphes are prepared to handle this, and if so, how it is done.


If you could confirm that this internal parallelization of benchmark-based computed values makes sense, and that it is done correctly, then we could start debugging the memory consumption of each job.

As an initial hint, I always found this particular code snippet a bit funny. Bear in mind it is a rewrite of its older version, which generated a similar list. Maybe @irinaespejo knows where this code snippet comes from.

@irinaespejo
Member

irinaespejo commented Aug 2, 2021

Hi @khurtado,

Thanks for the update. I think @Sinclert's intuition is right. We need to investigate how to parallelize the jobs that have the same benchmark within a Pythia+Delphes step (so 6 times) instead of calling Pythia+Delphes 6*n_jobs times. I'm looking into the snippet. Luckily, Delphes is on GitHub (delphes/delphes) and so is Pythia (alisw/pythia8), so we can ask the developer teams.

@irinaespejo
Member

Hi @khurtado, is there a way we can access the cluster you are using, for debugging purposes? Thank you!

@khurtado
Author

khurtado commented Aug 3, 2021

@irinaespejo Yes, let's discuss via slack

@irinaespejo
Member

Hi all,

@Sinclert and I discussed a solution offline and I'll write it here for the record:

Proposed solution

The problem behind this issue is that the madminer-workflow, particularly its Pythia and Delphes steps, does not scale well.
Right now, we control the number of jobs with an external parameter called num_generation_jobs (here), i.e. the number of arrows (or jobs) leaving the generate step in the current architecture is num_generation_jobs. Each arrow leaving the generate step performs computations according to the distribution of the benchmarks, which is controlled by this snippet. This means a Pythia and a Delphes instance are called num_generation_jobs times, which could be a cause of the bad scalability.

Instead, we propose a subtle change in the architecture of the workflow. The number of arrows (jobs) leaving the generate step will be num_benchmarks and not num_generation_jobs. Then each arrow will pass num_jobs to the Pythia and Delphes step. We hope that Delphes and Pythia know how to internally parallelize a big chunk of jobs. Maybe @khurtado can comment on this Delphes/Pythia internal parallelization.

The num_benchmarks depends on the user-specified benchmarks here and on the morphing max_overall_power.
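A rough sketch of the proposed fan-out, purely illustrative: the benchmark names below are the ones that appear in the generate logs later in this thread, and num_jobs_per_benchmark is a hypothetical knob, not an existing workflow parameter.

```sh
# Illustrative only: one generation arrow per benchmark instead of
# num_generation_jobs arrows, each arrow carrying a per-benchmark job count.
num_jobs_per_benchmark=10   # hypothetical parameter

for benchmark in sm w morphing_basis_vector_2 morphing_basis_vector_3 \
                 morphing_basis_vector_4 morphing_basis_vector_5; do
    echo "generate arrow -> benchmark=${benchmark}, num_jobs=${num_jobs_per_benchmark}"
    # downstream: a single Pythia+Delphes step per benchmark, which would
    # parallelize the num_jobs internally
done
```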

Changes to make:

  • In 2_generate.sh, change num_generation_jobs to num_benchmarks
  • In generate.py, remove the snippet and replace it with a distribution of jobs for each arrow (benchmark). How to do this is still unclear.

(please do not hesitate to update the to-do list in the comments below)

Non-solved questions about the proposed solution

  • Crank up scalability using events instead of jobs

@khurtado
Author

This makes sense and sounds good to me!
I don't know much about the internal parallelization details on Delphes/Pythia unfortunately, so I can't comment on that.

Please let me know once the changes are done and I will be happy to test (or help with anything besides testing).

@Sinclert
Member

After a bit of research, it seems that MadGraph (the pseudo-engine used to run Pythia and Delphes) has an optional argument called run_mode (MadGraph forum comment).

This could be used to specify:

  • 👎🏻 run_mode=0: single core (no parallelization).
  • 👎🏻 run_mode=1: cluster mode (not useful, as we are relying on REANA to deal with back-ends).
  • run_mode=2: multi-core (process-based parallelization).

Sadly, I could not find an official reference to this argument, so I am not sure whether the accepted values have changed in modern versions of MadGraph (2.9.X and 3.X.X). In any case, this would be the "last piece" to migrate:

  • From: split num_jobs among M benchmarks.
  • To: assign num_jobs to each of the M benchmarks.
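For reference, these settings live in MadGraph's me5_configuration.txt card. The excerpt below is only an illustration, using the values adopted later in this thread:

```
# me5_configuration.txt (illustrative excerpt)
run_mode = 2     # 0 = single core, 1 = cluster, 2 = multi-core
nb_core = None   # None lets MadGraph use all detected cores
```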

@irinaespejo
Member

Wow, that's interesting. Maybe just setting run_mode=2 with the current architecture is enough to scale. I'll try it and get back to you tomorrow.

@khurtado
Author

@Sinclert the options for run_mode seem to be the same in modern versions of MadGraph:

https://bazaar.launchpad.net/~madteam/mg5amcnlo/3.x/view/head:/Template/LO/README#L80

@Sinclert
Member

Sinclert commented Sep 1, 2021

@khurtado @irinaespejo

I have created a new branch, mg_process_parallelization, to implement the changes we discussed. In principle, the Docker image coming from that branch (madminer-workflow-ph:0.5.0-test) should be able to parallelize the MadGraph steps of each benchmark.

In a nutshell:

Bear in mind that the num_generation_jobs workflow-level parameter has not been removed, but it is currently unused, as we are setting the number of parallel processes per benchmark to the maximum possible (using nb_core=None).

Let me know if fine-tuning the number of processes per benchmark is something of interest.
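For example, fine-tuning could amount to overriding nb_core in the generated card before launching. This is illustrative only: the card path is the one shown in later comments, and the value 4 is arbitrary.

```sh
# Hypothetical override: pin each benchmark job to 4 cores
# instead of letting MadGraph use all detected cores
sed -i -e "s/nb_core = None/nb_core = 4/" \
    mg_processes/signal/Cards/me5_configuration.txt
```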


Please, run the sub-workflow with the new Docker image (0.5.0-test), and compare the results with the old one (0.4.0).

@irinaespejo
Member

irinaespejo commented Sep 1, 2021

@Sinclert wow, nice. I was also working on this, without success. Regarding point 3:

> A me5_configuration.txt file has been added to the set of cards, with options:
> run_mode=2: to run in multi-core mode.
> nb_core=None: to assign as many processes as cores detected.

When I uncommented # run_mode=2 and ran the workflow with yadage-run, I saw that cards still containing the commented-out # run_mode=2 line were being created in the generate step and transmitted to the pythia step.

I think the easiest way to check whether we are really running with run_mode=2 is for @khurtado to run the mg-process-parallelization branch on the VT3 cluster and let us know if the scalability issue is solved. @khurtado let us know right away if you run into trouble. Thank you!!

@irinaespejo
Member

Actually, since I have access to the cluster, I'm going to run the mg-process-parallelization branch of the workflow now.

@irinaespejo
Member

Hi everyone,

The results from running scailfin/madminer-workflow-ph (mg-process-parallelization) on VC3:

Sanity checks:

  1. The workflow finishes successfully (the status is still "running", but all the step files are there).

  2. The physics workflow indeed uses the branch code.

[screenshot: 2021-09-01 14-33-18]

Other checks:

  3. The pythia stage preserves the run_mode = 2 change introduced in the docker image here.

[screenshot: 2021-09-01 14-34-53]

The command grep -R "run_mode = 2" indeed shows that:

```
./pythia_3/mg_processes/signal/Cards/me5_configuration.txt:run_mode = 2
./pythia_3/mg_processes/signal/madminer/cards/me5_configuration_0.txt:run_mode = 2
./delphes_3/extract/madminer/cards/me5_configuration_3.txt:run_mode = 2
```

(and the same for all the other pythia and delphes steps)

All good!

Now, on to scalability tests. Answering Sinclert: yes, we are interested in fine-tuning the number of processes per benchmark.

@irinaespejo
Member

Memory usage results from running the mg_process_parallelization branch:

Example of Delphes:

```
ClusterId MemoryUsage Args
217 318 /reana/users/00000000-0000-0000-0000-000000000000/workflows/e23b2ff1-8625-45da-8c13-0c05665dd6e2 madminertool/madminer-workflow-ph:0.5.0-test sh -c '/madminer/scripts/4_delphes.sh -p /madminer -m software/MG5_aMC_v2_9_4 -c /reana/users/00000000-0000-0000-0000-000000000000/workflows/e23b2ff1-8625-45da-8c13-0c05665dd6e2/workflow_ph/configure/data/madminer_config.h5 -i /reana/users/00000000-0000-0000-0000-000000000000/workflows/e23b2ff1-8625-45da-8c13-0c05665dd6e2/ph/input.yml -e /reana/users/00000000-0000-0000-0000-000000000000/workflows/e23b2ff1-8625-45da-8c13-0c05665dd6e2/workflow_ph/pythia_4/events/Events.tar.gz -o /reana/users/00000000-0000-0000-0000-000000000000/workflows/e23b2ff1-8625-45da-8c13-0c05665dd6e2/workflow_ph/delphes_4'
```

Example of Pythia:

```
210 196 /reana/users/00000000-0000-0000-0000-000000000000/workflows/e23b2ff1-8625-45da-8c13-0c05665dd6e2 madminertool/madminer-workflow-ph:0.5.0-test sh -c '/madminer/scripts/3_pythia.sh -p /madminer -m software/MG5_aMC_v2_9_4 -z /reana/users/00000000-0000-0000-0000-000000000000/workflows/e23b2ff1-8625-45da-8c13-0c05665dd6e2/workflow_ph/generate/folder_0.tar.gz -o /reana/users/00000000-0000-0000-0000-000000000000/workflows/e23b2ff1-8625-45da-8c13-0c05665dd6e2/workflow_ph/pythia_3'
```

@irinaespejo
Member

irinaespejo commented Sep 7, 2021

Hi @Sinclert, I've been testing the mg-process-parallelization branch on scailfin/workflow-madminer-ph. When running make yadage-run I found the following error.

In the file .yadage/workflow_ph/generate/_packtivity/generate.run.log there is:

```
2021-09-07 09:10:38,583 | pack.generate.run | INFO | starting file logging for topic: run
2021-09-07 09:11:03,362 | pack.generate.run | INFO | b'Benchmark: 0 sm'
2021-09-07 09:11:03,362 | pack.generate.run | INFO | b'Benchmark: 1 w'
2021-09-07 09:11:03,362 | pack.generate.run | INFO | b'Benchmark: 2 morphing_basis_vector_2'
2021-09-07 09:11:03,362 | pack.generate.run | INFO | b'Benchmark: 3 morphing_basis_vector_3'
2021-09-07 09:11:03,362 | pack.generate.run | INFO | b'Benchmark: 4 morphing_basis_vector_4'
2021-09-07 09:11:03,362 | pack.generate.run | INFO | b'Benchmark: 5 morphing_basis_vector_5'
2021-09-07 09:11:03,610 | pack.generate.run | INFO | b"sed: can't read s/nb_core = None/nb_core = 1/: No such file or directory"
```

This was solved by doing the following changes:

  1. Remove the " " from this line in scripts/2_generate.sh. The new line should look like sed -i \
  2. Re-build the madminer-workflow-ph image; in this case I named it madminertool/madminer-workflow-ph:0.5.0-test-2
  3. Update the image tag in steps.yml here
  4. make yadage-run

The workflow finishes successfully now without any further errors.
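For context on why the original quoting failed on Linux: GNU sed does not accept a separate argument as the -i backup suffix, so the empty string is parsed as the sed script and the real expression is then treated as a file name. An illustrative reproduction (the card name is just an example):

```sh
# Works on macOS (BSD sed, where -i takes a backup suffix), but on GNU sed the ""
# becomes the (empty) script and the s/// expression is read as an input file,
# giving "sed: can't read ...: No such file or directory".
sed -i "" "s/nb_core = None/nb_core = 1/" me5_configuration_0.txt

# Dropping the "" (or using -e, as in the snippet proposed below) works on GNU sed:
sed -i "s/nb_core = None/nb_core = 1/" me5_configuration_0.txt
```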

@Sinclert
Member

Sinclert commented Sep 7, 2021

Hi @irinaespejo ,

I included "" because of macOS compatibility. I thought it was a quick fix to make the script runnable both in macOS and Linux. It seems it did not work.

According to this StackOverflow post, we could achieve this by using the -e flag instead. Could you try the following snippet and confirm that it runs on Linux?

```sh
sed -i \
    -e "s/${default_spec}/${custom_spec}/" \
    "${SIGNAL_ABS_PATH}/madminer/cards/me5_configuration_${i}.txt"
```

@irinaespejo
Member

I just tested the snippet you posted and it runs successfully ✔️ (my upload internet connection is pretty slow)

@Sinclert
Member

Sinclert commented Sep 9, 2021

The PR changing the parallelization strategy (#11) has been merged.

We should be in a better spot to test the total time + memory consumption of each benchmark job.

@Sinclert changed the title from "Scaling up via num_generation_jobs increases the memory usage per job by a somewhat similar factor" to "Unexpected memory usage when scaling up via num_generation_jobs" on Oct 5, 2021
@Sinclert
Member

Sinclert commented Oct 5, 2021

Hi @khurtado and @irinaespejo,

Is there anything else to discuss within this issue? Have you tried the latest version of the workflow?

@irinaespejo
Member

The last version of the workflow ran successfully after Kenyi did some fixing of the cluster permissions.
@khurtado what is the situation on the cluster for submitting computationally intensive workflows? Can we just try? Thanks!!

@khurtado
Author

khurtado commented Oct 7, 2021

@irinaespejo Yes, the cluster should have workers to work with. I still need to fix the website certs; I will do that tomorrow.

@Sinclert
Member

Hi. I am closing this issue for now.

For future reports of performance issues, configuration tweaks, etc., please open a separate issue.
