MUON: add remote workflows for fwd matched tracks #638

Merged (2 commits) on Mar 18, 2024

aferrero2707
Contributor

This adds a new switch to enable/disable the QC node workflows for forward matched tracks, similar to what is already done for ITS-TPC.

Three alternative workflows are selected depending on the list of detectors (MFT/MCH/MID) included in the data taking.

- name: qc-remote-workflow-fwd # GLO is not a detector so we won't iterate on it with 'detectors' and we need special enable logic
  enabled: "{{ 'MCH' in json.Unmarshal(detectors) }}"
  vars:
    qc_remote_workflow: "{{ util.PrefixedOverride( 'qc_remote_workflow', 'glo' ) }}"
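
For readers less familiar with the template syntax, the `enabled` condition above can be read roughly as the Python sketch below. It assumes that `detectors` is a JSON-encoded list of detector names, as the call to `json.Unmarshal` suggests; the function name `fwd_workflow_enabled` is hypothetical and is not part of the PR.

```python
import json


def fwd_workflow_enabled(detectors: str) -> bool:
    """Return True when 'MCH' appears in the JSON-encoded detector list.

    Hypothetical helper that mirrors the template condition
    "'MCH' in json.Unmarshal(detectors)"; it is not code from the PR.
    """
    return "MCH" in json.loads(detectors)


# Example inputs are illustrative only.
print(fwd_workflow_enabled('["MFT", "MCH", "MID"]'))
print(fwd_workflow_enabled('["ITS", "TPC"]'))
```

Under this reading, the role is enabled for any run that includes MCH, whether or not MFT and MID are also present.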
Contributor Author

@knopers8 is it OK to use the glo prefix for the workflow in two separate rules?

Collaborator

knopers8, Mar 15, 2024

Well, I don't think this variable is used for generating the actual workflow. It is probably a leftover from earlier approaches.

The only potential issue I can see is that qc_remote_workflow != 'none' is used to enable shmem cleanup. Although I see that it is not enabled in a prod environment, to be on the safe side I would propose removing fairmq-shmmonitor from this role and from the role qc-remote-workflow-glo.

The problem could appear if we have two environments, one for ITS and one for MCH, running at the same time: the first one to be destroyed would affect the remaining processing of the other environment on the same machine.

Contributor Author

@knopers8 thanks! I have removed the fairmq-shmmonitor from the two roles. I will organize a test in production with replay data to check that all is fine, and post the outcome here. The initial version was already checked in staging.

@aferrero2707
Contributor Author

@vascobarroso @knopers8 @teo could you please have a look and check if it is all good? Thanks!

@aferrero2707
Contributor Author

@knopers8 @teo we just took one validation run in production (548347), and it seems to be working properly.

@knopers8 knopers8 merged commit 0d27fb9 into AliceO2Group:master Mar 18, 2024
1 check passed
@vascobarroso
Member

@aferrero2707 @knopers8 should this go into production tomorrow with the deployment?
