This repository has been archived by the owner on Jan 30, 2024. It is now read-only.

2.2.0 #219

Merged
merged 68 commits into from
Oct 10, 2019

Conversation

PalNilsson
Collaborator

  • Added MANIFEST file - Pilot 2 is now registered with PyPI, which means it can be pip-installed without referencing GitHub
    • pip install panda-pilot
  • Data component upgrade
    • Refactored and unified ES StagingClients
    • Automatically prefer the LAN protocol (read_lan/write_lan) for file stage-in/stage-out if the source/destination RSE is local for the given PQ (defined in inputddms=astorages['read_lan'])
    • Base movers workflow upgraded
    • Introduced require_input_protocols mode to look up and manually form input replicas for specific copytool (activated for the objectstore mover, ES workflow)
    • Refactored and simplified the objectstore copytool
    • Implemented fail-over transfer for ES stage-out
  • Preparing for containerized middleware commands [minor update]
  • Added debug messages for potential problem with relying on SC_CLK_TCK
  • Added LFN in diagnostics message for checksum errors. Corrected mislabelled checksum types (MD5SUM reported instead of ADLER32). Requested by R. Walker
  • Cleaned up stage-in/out error messages containing irrelevant Traceback info (should now be concise)
  • Following an update in the auto-setup script, the pilot is now using RUCIO_LOCAL_SITE_ID instead of the deprecated DQ2_LOCAL_SITE_ID for localsite in Rucio traces
  • Simplification of pilot arguments: now using resource name from queuedata instead of relying on pilot option -r (which can now be removed from wrapper)
  • Now reporting the real error returned from rucio download or upload instead of a traceback. However, the current version of Rucio does not propagate errors well, so the message will always be "None of the requested files have been downloaded". D. Cameron is working on a fix so that a future version of Rucio reports the real error
  • Changed minimum allowed local space from 5 GB to 2 GB (as verified during payload running); the higher limit affected event index jobs run at OU. Requested by H. Severini
  • Pilot is now always setting ATHENA_CORE_NUMBER (previously only set for event service jobs)
  • Updated memory leak calculation to be consistent with new prmon field names (changed PSS+Swap to pss+swap)
  • Added new error code 1352, “Failed to stat proc file for CPU consumption calculation”, which is set when the pilot cannot access /proc/pid/stat. Requested by P. Svirin
  • Corrected the local/remoteSite sent with the traces - previously, if the pilot overwrote the requested ddmendpoint (i.e. if the requested ddmendpoint was not allowed), the trace was not updated to match. Now it is.
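
For the SC_CLK_TCK and proc-file items above, the following is a minimal Python sketch of how CPU time can be derived from a Linux process stat line. The function names `cpu_seconds_from_stat` and `read_cpu_seconds` are hypothetical and not taken from the pilot code; the field positions follow the standard Linux proc stat layout, and the mapping to error code 1352 is only indicated in a comment.

```python
import os


def cpu_seconds_from_stat(stat_line, ticks_per_second=None):
    """Return user + system CPU time in seconds from one proc stat line."""
    if ticks_per_second is None:
        # Clock ticks per second as reported by the system (SC_CLK_TCK).
        ticks_per_second = os.sysconf("SC_CLK_TCK")
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split only what follows the last closing parenthesis.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state), so utime (field 14) and stime (field 15)
    # are found at indices 11 and 12.
    utime, stime = int(fields[11]), int(fields[12])
    return (utime + stime) / float(ticks_per_second)


def read_cpu_seconds(pid):
    """Read the stat file of a process; return None if it cannot be accessed."""
    try:
        with open("/proc/%d/stat" % pid) as stat_file:
            return cpu_seconds_from_stat(stat_file.read())
    except (IOError, OSError):
        # Assumption: a caller could set an error code such as 1352 here.
        return None
```

Passing `ticks_per_second` explicitly makes it possible to check the arithmetic without depending on the value the host reports for SC_CLK_TCK.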

Code contributions from D. Cameron, A. Anisenkov, W. Guan, F. Barreiro, P. Nilsson

anisyonk and others added 30 commits August 3, 2019 16:50
 - refactor and unify ES StagingClients
 - automatically prefer `LAN` protocol for stage-in/stage-out transfers if source/destination RSE is local for given PQ (defined in `inputddms=astorages['read_lan']`)
 - introduce `require_input_protocols` mode to look up and manually form input replicas for specific copytool (activated for the objectstore mover)
 - refactor and simplify `objectstore` copytool

To BE TESTED
… for allowed schemas)

 - add root:// protocol into supported list for the gfal sitemover
fix typo in StageInESClient
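
The LAN preference described in these commits can be illustrated with a small Python sketch. This is not the pilot's implementation: the function `choose_activity`, the RSE names, and the `read_wan`/`write_wan` fallback labels are assumptions made for illustration; only `read_lan`/`write_lan` and the `astorages` key appear in the text above.

```python
def choose_activity(rse, local_rses, mode="read"):
    """Pick the LAN activity when the RSE is local to the queue, else WAN."""
    # Assumption: anything not listed as local falls back to a WAN activity.
    suffix = "lan" if rse in local_rses else "wan"
    return "%s_%s" % (mode, suffix)


# Hypothetical queue configuration, shaped like the astorages dictionary
# mentioned in the commit message.
astorages = {
    "read_lan": ["SITE_A_DATADISK"],
    "write_lan": ["SITE_A_DATADISK"],
}

stage_in_activity = choose_activity("SITE_A_DATADISK", astorages["read_lan"])
stage_out_activity = choose_activity("SITE_B_DATADISK", astorages["write_lan"], mode="write")
```

Here `stage_in_activity` resolves to the LAN activity because the source RSE is listed as local, while `stage_out_activity` falls back to the WAN label because the destination is not.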
PalNilsson and others added 29 commits September 24, 2019 16:31
…lot side as an extra layer of protection for cases where the time-out mechanism on the rucio api side does not work - pilot waits an additional 10 s to let rucio abort first
fix stage in/out code and propagate first error message
Propagate error info from rucio traces
… in case the requested ddmendpoint was not allowed and thus overwritten - trace report was not updated for this; also, the localsite was only set to the RSE value and not the env variable RUCIO_LOCAL_SITE_ID)
…t calls since the rucio client was rolled back to the previous version
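
The idea of an outer time-out that waits slightly longer than the inner one, so that the called API gets the chance to abort first, might be sketched as follows in Python 3. The names `call_with_outer_timeout` and `slow_transfer` are hypothetical, and the thread-based approach is an assumption; the commits do not say how the pilot implements its time-out.

```python
import concurrent.futures
import time


def call_with_outer_timeout(func, inner_timeout, grace=10.0):
    """Run func(inner_timeout) in a thread; stop waiting after inner_timeout + grace.

    Returns (finished, result). The worker thread is not killed on time-out;
    this sketch only stops waiting for it.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, inner_timeout)
    try:
        # The extra grace period lets the callee's own time-out fire first.
        return True, future.result(timeout=inner_timeout + grace)
    except concurrent.futures.TimeoutError:
        return False, None
    finally:
        # Do not block on a worker that may be hung.
        executor.shutdown(wait=False)


def slow_transfer(timeout):
    """Hypothetical transfer call that ignores its own time-out."""
    time.sleep(timeout + 0.5)
    return "late"
```

A call such as `call_with_outer_timeout(slow_transfer, 0.1, grace=0.1)` returns `(False, None)` once the outer limit passes, whereas a well-behaved callee returns its result with `finished` set to `True`.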
@PalNilsson PalNilsson merged commit 092335b into master Oct 10, 2019
3 participants