This repository has been archived by the owner on Jan 30, 2024. It is now read-only.

2.2.0 (25) #217

Merged
merged 47 commits into from
Oct 10, 2019

PalNilsson
Collaborator

  • Added MANIFEST file. Pilot 2 is now registered with PyPI, so it can be pip-installed without referencing GitHub
    • pip install panda-pilot
  • Data component upgrade
    • Refactored and unified ES StagingClients
    • Automatically prefer the LAN protocol (read_lan/write_lan) for stage-in/stage-out of a file if the source/destination RSE is local to the given PQ (defined in inputddms=astorages['read_lan'])
    • Base movers workflow upgraded
    • Introduced require_input_protocols mode to look up and manually form input replicas for a specific copytool (activated for the objectstore mover, ES workflow)
    • Refactored and simplified the objectstore copytool
    • Implemented fail-over transfer for ES stage-out
  • Preparing for containerized middleware commands [minor update]
  • Added debug messages for potential problem with relying on SC_CLK_TCK
  • Added LFN in diagnostics message for checksum errors. Corrected mislabelled checksum types (MD5SUM reported instead of ADLER32). Requested by R. Walker
  • Cleaned up stage-in/out error messages containing irrelevant Traceback info (should now be concise)
  • Following an update in the auto-setup script, the pilot is now using RUCIO_LOCAL_SITE_ID instead of the deprecated DQ2_LOCAL_SITE_ID for localsite in Rucio traces
  • Simplification of pilot arguments: now using resource name from queuedata instead of relying on pilot option -r (which can now be removed from wrapper)
  • Instead of a traceback, the pilot now reports the real error returned from Rucio download or upload. However, the current version of Rucio does not propagate errors well, so the message will always be "None of the requested files have been downloaded". D. Cameron is working on a fix so that a future version of Rucio will report the real error
  • Changed minimum allowed local space from 5 GB to 2 GB (as verified during payload running); the higher limit affected event index jobs run at OU. Requested by H. Severini
  • Pilot is now always setting ATHENA_CORE_NUMBER (previously only set for event service jobs)
  • Updated memory leak calculation to be consistent with new prmon field names (changed PSS+Swap to pss+swap)
  • Added new error code 1352, “Failed to stat proc file for CPU consumption calculation”, which is set when the pilot cannot access /proc/pid/stat. Requested by P. Svirin
  • Corrected the local/remoteSite sent with the traces. Previously, if the pilot overwrote the requested ddmendpoint (i.e. if the requested ddmendpoint was not allowed), the trace was not updated accordingly; now it is.
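
For readers unfamiliar with the SC_CLK_TCK and proc stat items above, here is a minimal sketch of how CPU time is typically derived from the stat file of a process on Linux. It is not the pilot's actual code: the function names, the `StatFileError` class, and the `ERROR_NOSTATFILE` constant are hypothetical and chosen only for illustration. The only details taken from the notes are the error code 1352 and its message.

```python
import os

# Hypothetical constant mirroring the error code named in the notes above.
ERROR_NOSTATFILE = 1352


class StatFileError(Exception):
    """Illustrative exception carrying a pilot-style error code."""

    def __init__(self, message, code=ERROR_NOSTATFILE):
        super().__init__(message)
        self.code = code


def parse_cpu_ticks(stat_line):
    """Return (utime, stime) in clock ticks from one proc stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so only the text after the last ')' is split on whitespace.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]), int(rest[12])


def ticks_to_seconds(ticks, clk_tck):
    """Convert clock ticks to seconds for a given ticks-per-second value."""
    return ticks / float(clk_tck)


def read_cpu_seconds(pid, clk_tck=None):
    """Return user + system CPU seconds used so far by the given process."""
    path = "/proc/%d/stat" % pid
    try:
        with open(path) as stat_file:
            line = stat_file.read()
    except OSError as exc:
        raise StatFileError(
            "Failed to stat proc file for CPU consumption calculation: %s" % exc
        ) from exc
    if clk_tck is None:
        # Ticks per second as reported by the system; logging this value is
        # one way to spot hosts where relying on it gives odd results.
        clk_tck = os.sysconf("SC_CLK_TCK")
    utime, stime = parse_cpu_ticks(line)
    return ticks_to_seconds(utime + stime, clk_tck)
```

On a Linux host, `read_cpu_seconds(os.getpid())` returns the CPU seconds consumed so far by the calling process. Passing `clk_tck` explicitly keeps the conversion testable on machines where `os.sysconf` is unavailable.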

Code contributions from D. Cameron, A. Anisenkov, W. Guan, F. Barreiro, P. Nilsson

PalNilsson and others added 30 commits September 9, 2019 13:48
… instead of DQ2_LOCAL_SITE_ID in traces if it is available, otherwise still use DQ2_LOCAL_SITE_ID (remove asap)
…data instead of relying on pilot option -r <resource name>
…lot side as an extra layer of protection for cases where the time-out mechanism on the rucio api side does not work - pilot waits an additional 10 s to let rucio abort first
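
The first commit message above describes preferring RUCIO_LOCAL_SITE_ID and falling back to DQ2_LOCAL_SITE_ID. A minimal sketch of that lookup follows, assuming both values are plain environment variables. The function name `get_local_site_id` and the `None` return for the unset case are assumptions, not taken from the pilot source.

```python
import os


def get_local_site_id(environ=None):
    """Return the local site id used for Rucio traces, or None if unset."""
    env = os.environ if environ is None else environ
    # Prefer the new variable; fall back to the deprecated one if it is
    # missing or empty.
    return env.get("RUCIO_LOCAL_SITE_ID") or env.get("DQ2_LOCAL_SITE_ID") or None
```

Accepting an optional mapping makes the fallback order easy to check without modifying the real environment.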
@PalNilsson PalNilsson merged commit e9d7bb4 into PanDAWMS:next Oct 10, 2019