This repository has been archived by the owner on Jan 30, 2024. It is now read-only.

2.3.1 (30)

@PalNilsson PalNilsson released this 14 Nov 09:25
· 1211 commits to master since this release
fd71c50
  • Internal changes to enable Python 3 support (in progress)
  • Reset of a timing counter that led to increasing setup time measurements in multi jobs.
    • Reported by Claire A. Bourdarios (not released as planned in v 2.2.2)
  • Prevented a case where a failed transfer led to failure to send final server update. Reported by R. Walker
  • Now cutting the tail of long stage-in/out error messages rather than the beginning, which cut the LFN+DDM endpoint info in some cases. Removed useless "Copy operation failed .." substring from stage-in/out messages
  • Switched off containers when neither platform nor alrbuserplatform is set. Requested by T. Maeno
  • Now only including logExtracts for failed/holding jobs, which also cleans up the log somewhat (the logExtracts are just the tail of the pilot log and are never interesting for finished jobs)
  • Using Rucio traces info to improve error messages from Rucio (requires version 1.20.8). Note: this change means we break compatibility with SLC6 since only deprecated rucio versions run there (a few queues in test mode remain on SLC6)
  • Resolving the real source location when using mv
  • Data API and top workflow updates
    • Direct access workflow upgraded (support allow_lan & allow_wan, direct_access_lan & direct_access_wan)
    • Removed direct access handling from copytools (gfal, lsm, rucio, xrdcp); the logic is now applied at the top level
    • Stage-in: ignore already processed files on stage-in retry with failover copy tool
    • (Additional details below)

Contributions from A. Anisenkov, D. Cameron, P. Nilsson. Thanks to Ilija Vukotic for help with testing.

Affected (direct) access settings:

Job settings requested by server: transfertype=direct (per job) for remoteio, storage_token=local * (per specific file) for copy2scratch

  • Job input options: --accessmode=direct for remoteio, --accessmode=copy or --useLocalIO for copy2scratch
  • PandaQueue settings: allow_lan + direct_access_lan for remoteio over LAN, allow_wan + direct_access_wan for remoteio over WAN

Logic

  • By default the access method is copy2scratch.
  • Each input file with transfertype = 'direct' and storage_token != local, as requested by PanDA, could potentially use remoteio (direct access)
  • Job input options (--accessmode, --useLocalIO) override the PanDA server decision
  • The PandaQueue configuration finally controls whether a LAN or WAN replica can be used and whether the direct access method can be applied to a given file. The pilot takes the PanDA server decision and the job input options and applies the PandaQueue settings to them:
    • for LAN: requires allow_lan=True + direct_access_lan=True + availability of an appropriate replica for remoteio
    • for WAN: requires allow_wan=True + direct_access_wan=True + availability of an appropriate replica for remoteio
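
The decision order described above can be sketched roughly as follows. This is only an illustration, not the pilot's actual code: the names `QueueSettings`, `requested_remoteio` and `access_method`, the returned strings, and the choice to try LAN before WAN are all assumptions made for the example. It also assumes that --useLocalIO wins if it is combined with --accessmode=direct.

```python
from dataclasses import dataclass


@dataclass
class QueueSettings:
    """Hypothetical container for the PandaQueue settings named above."""
    allow_lan: bool = False
    allow_wan: bool = False
    direct_access_lan: bool = False
    direct_access_wan: bool = False


def requested_remoteio(transfertype, storage_token, accessmode=None, use_local_io=False):
    """Return True if remoteio is requested before queue settings are applied."""
    # Job input options override the PanDA server decision.
    if use_local_io or accessmode == "copy":
        return False
    if accessmode == "direct":
        return True
    # Server decision: transfertype=direct and a storage token other than local.
    return transfertype == "direct" and storage_token != "local"


def access_method(requested, queue, has_lan_replica, has_wan_replica):
    """Apply the queue settings to the request; copy2scratch is the default."""
    if not requested:
        return "copy2scratch"
    # Assumption: a LAN replica is preferred over a WAN replica.
    if queue.allow_lan and queue.direct_access_lan and has_lan_replica:
        return "remoteio_lan"
    if queue.allow_wan and queue.direct_access_wan and has_wan_replica:
        return "remoteio_wan"
    return "copy2scratch"


# Example: the server asks for direct access, the queue only enables it over LAN.
queue = QueueSettings(allow_lan=True, direct_access_lan=True)
wanted = requested_remoteio("direct", storage_token="")
print(access_method(wanted, queue, has_lan_replica=True, has_wan_replica=True))
```

In this sketch a file falls back to copy2scratch whenever any one of the three conditions (allow flag, direct access flag, suitable replica) is missing for both LAN and WAN.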