This repository was archived by the owner on Jan 30, 2024. It is now read-only.

2.12.1.62 #353

Merged (106 commits, Jun 30, 2021)

Commits
0d07511  New version 2.11.3.1 (PalNilsson, May 12, 2021)
830ab0a  Added diagnostics to failed remote file open verification (PalNilsson, May 13, 2021)
2f86607  Updated comment (May 13, 2021)
ef73e5b  Updated comment (May 13, 2021)
0b928df  Updated comment (May 13, 2021)
5213742  Updated log message (PalNilsson, May 13, 2021)
d67cab8  Updated comment (PalNilsson, May 17, 2021)
f3bbf96  Now adding resimevents to job metrics also when it is zero (PalNilsson, May 17, 2021)
9787898  Changed gs.py for GCS buckets with automatic bucket name extraction (May 18, 2021)
c54abb2  Removed default protocol value from trace - previously set to copy to… (PalNilsson, May 19, 2021)
8273d47  Added new pilot option for turning on/off rucio traces (PalNilsson, May 19, 2021)
77589e8  Pilot now only sends rucio traces if required (PalNilsson, May 19, 2021)
396c66d  Removed -p all option for xcache kill. Added debug info for xcache me… (PalNilsson, May 21, 2021)
fca62f2  Adjusted gs.py text format to pass the flake8 check (May 21, 2021)
37b26eb  Corrected xcache kill. Refactored env var support functions. Initial … (PalNilsson, May 25, 2021)
d95d559  Added function comments (PalNilsson, May 25, 2021)
3f1df91  Moved xcache code to atlas area. Corrected xcache output handling. Lo… (PalNilsson, May 25, 2021)
515f623  Dask validation (May 25, 2021)
d6e8c26  Removed args object from establish_logging() (May 25, 2021)
14a86fd  Added logging to dask (May 25, 2021)
8669f8d  Update (May 25, 2021)
92b1571  Update (May 25, 2021)
df9f0a0  Added override values (May 25, 2021)
0d505f3  Update (May 25, 2021)
72bc46d  Removed tabs (May 25, 2021)
d13735e  Parsing of kubectl output (May 25, 2021)
c0d7da5  Checking if service is running (May 25, 2021)
7480a83  Update (May 25, 2021)
02f40e0  Update (May 25, 2021)
172cceb  Update (May 25, 2021)
f91c677  Update (May 25, 2021)
6b92b21  Update (May 25, 2021)
60a9bef  Installation and uninstallation of dask (May 25, 2021)
49e5c0c  Update (May 25, 2021)
5826863  Connected cluster (May 25, 2021)
c8cc5d5  Update (May 25, 2021)
8126ef2  Update (May 25, 2021)
6e2b151  Update (May 25, 2021)
beed675  Update (May 25, 2021)
7a177e4  Fixed bad xcache debugging (PalNilsson, May 25, 2021)
cac5620  Part 1 of 2; fix for postprocesses and xcache (PalNilsson, May 25, 2021)
4964894  Update (PalNilsson, May 25, 2021)
3eb851a  Flake8 corrections (PalNilsson, May 25, 2021)
e0c80b5  Flake8 corrections (PalNilsson, May 25, 2021)
574bdaa  Update (PalNilsson, May 25, 2021)
ab1406b  Now allowing for different dask_kubernetes manager (May 26, 2021)
49b0775  Update (May 26, 2021)
0b2462c  Updated post-process handling to support multiple post-processes (lik… (PalNilsson, May 27, 2021)
dd8e5aa  Updated post-process handling (added label) to support multiple post-… (PalNilsson, May 27, 2021)
e8d2fef  Refactored get_utility_commands() (PalNilsson, May 27, 2021)
9cc2fac  Update (PalNilsson, May 27, 2021)
18a30b7  Refactored validate() (PalNilsson, May 27, 2021)
c7bcc4c  Lowercased some variable names in gs.py to comply with flake8 (May 27, 2021)
cb8d6fb  Added error code for no ctypes. Now using ctypes to guarantee orphans… (PalNilsson, May 27, 2021)
e95e769  Merge pull request #350 from yesw2000/next (PalNilsson, May 27, 2021)
2cf2b5a  Implemented tail and ls debug commands (PalNilsson, May 27, 2021)
405fe95  Skipping xrootd when finding pid for prmon (PalNilsson, May 28, 2021)
fa2debc  Many fixes for debug mode, including full containerisation of gdb com… (PalNilsson, Jun 4, 2021)
523442d  Now moving raythena/AthenaMP output to shared directory (if --output-… (Jun 4, 2021)
e826770  Added some debug info for direct access (PalNilsson, Jun 7, 2021)
be5e9ff  Flake8 (PalNilsson, Jun 7, 2021)
de4e150  Refactoring (PalNilsson, Jun 7, 2021)
667b4fb  Will not fail jobs on sites that fail to import ctypes (PalNilsson, Jun 8, 2021)
b95210f  Fixed case where job object size calculation fails with exception due… (PalNilsson, Jun 8, 2021)
a9cac88  make free space check (check_availablespace) being optional for speci… (anisyonk, Jun 9, 2021)
d73a175  Merge pull request #351 from anisyonk/skip_spacecheck_mv (PalNilsson, Jun 9, 2021)
9ec4cad  Merge remote-tracking branch 'upstream/next' into next (PalNilsson, Jun 9, 2021)
196b79d  gdb updates. Now avoiding containerisation of gdb command since core … (PalNilsson, Jun 9, 2021)
8bb688d  Testing du. Lazy logging updates. Pylint fixes (PalNilsson, Jun 11, 2021)
552b548  General debug commands now supported. Added event number to job metri… (PalNilsson, Jun 14, 2021)
29a819a  Added prmon to list with unwanted files in looping job killer (PalNilsson, Jun 14, 2021)
f53b7d5  Pylint updates. Fixed debug mode to allow 'debug,<cmd+options>' (PalNilsson, Jun 15, 2021)
0dd9750  Pylint updates (PalNilsson, Jun 15, 2021)
f26dfde  Flake8 corrections (PalNilsson, Jun 15, 2021)
54b5a0a  Pylint corrections. Fixes for localSite problem in traces (PalNilsson, Jun 17, 2021)
a77cd07  Pylint corrections. Fixes for localSite problem in traces (PalNilsson, Jun 17, 2021)
cb0adac  Pylint corrections (PalNilsson, Jun 17, 2021)
5c09bb5  Flake8 correction. UTF-8 fix for Popen (PalNilsson, Jun 17, 2021)
c70d879  Fix pylint issues (Jun 21, 2021)
d40f349  Remove trailing whitespace (Jun 21, 2021)
11c1a4e  Fix flake8 issues (Jun 21, 2021)
401dbfb  Avoiding the decode problem with strings. Added protection against UT… (PalNilsson, Jun 21, 2021)
7aa6bd8  Improve code in jobreport parsing function (Jun 21, 2021)
20fa8d2  Merge pull request #4 from brinick/improve-code (PalNilsson, Jun 21, 2021)
df87ff5  Version update (PalNilsson, Jun 21, 2021)
e6106e2  Merge branch 'next' into pylint-fixes (Jun 21, 2021)
2020e3d  Fix logging calls (Jun 22, 2021)
8fc2d5f  Added handling for new preprocess exit codes. Removed thread name inf… (PalNilsson, Jun 23, 2021)
a304373  Merge pull request #3 from brinick/pylint-fixes (PalNilsson, Jun 23, 2021)
e234690  Updated build number after merge with Brinick's pylint updates (PalNilsson, Jun 23, 2021)
347b22c  Fix initialisation bug and logging bug (Jun 23, 2021)
48de986  Merge pull request #5 from brinick/next (PalNilsson, Jun 23, 2021)
5c5b62b  Merge with Brinick's pylint updates. Number of concurrent remote file … (PalNilsson, Jun 23, 2021)
76d399a  Flake8 correction (PalNilsson, Jun 23, 2021)
90125ed  Popen corrections for Python 3 and utf-8 (PalNilsson, Jun 25, 2021)
fe032cf  Add nojekyll file to build docs workflow (Jun 28, 2021)
43930ec  Merge pull request #6 from brinick/add-nojekyll (PalNilsson, Jun 28, 2021)
395e0fa  Improved check for singularity errors (PalNilsson, Jun 28, 2021)
f33755b  Merge branch 'next' of https://github.com/PalNilsson/pilot2 into next (PalNilsson, Jun 28, 2021)
74d3829  Added missing result queue in remote file open script (PalNilsson, Jun 28, 2021)
514f83e  Added log messages (PalNilsson, Jun 28, 2021)
8d8c142  Flake8 correction (PalNilsson, Jun 29, 2021)
cce6852  Fixes and cleanup (PalNilsson, Jun 29, 2021)
02b7d21  Update (PalNilsson, Jun 29, 2021)
90b8a61  Flake8 correction (PalNilsson, Jun 29, 2021)
59af5af  Merge pull request #352 from PalNilsson/next (PalNilsson, Jun 30, 2021)
Files changed
9 changes: 6 additions & 3 deletions .github/workflows/build-docs.yml
@@ -30,13 +30,16 @@ jobs:
run: |
cd ./doc
make github
cd ..
- name: Add nojekyll file to repo root dir
run: |
touch .nojekyll
- name: Push docs to repo
run: |
git config user.name "brinick"
git config user.email "brinick@users.noreply.github.com"
git add docs
git commit -m "Adding documentation"
git add docs .nojekyll
git commit -m "Adding Pilot documentation"
git push
2 changes: 1 addition & 1 deletion PILOTVERSION
@@ -1 +1 @@
2.11.2.22
2.12.1.62
94 changes: 25 additions & 69 deletions pilot.py
@@ -10,6 +10,7 @@
# - Paul Nilsson, paul.nilsson@cern.ch, 2017-2019

from __future__ import print_function # Python 2 (2to3 complains about this)
from __future__ import absolute_import

import argparse
import logging
@@ -68,7 +69,7 @@ def main():
infosys.init(args.queue)
# check if queue is ACTIVE
if infosys.queuedata.state != 'ACTIVE':
logger.critical('specified queue is NOT ACTIVE: %s -- aborting' % infosys.queuedata.name)
logger.critical('specified queue is NOT ACTIVE: %s -- aborting', infosys.queuedata.name)
return errors.PANDAQUEUENOTACTIVE
except PilotException as error:
logger.fatal(error)
@@ -81,14 +82,14 @@ def main():
environ['PILOT_SITENAME'] = infosys.queuedata.resource #args.site # TODO: replace with singleton

# set requested workflow
logger.info('pilot arguments: %s' % str(args))
logger.info('pilot arguments: %s', str(args))
workflow = __import__('pilot.workflow.%s' % args.workflow, globals(), locals(), [args.workflow], 0) # Python 3, -1 -> 0

# execute workflow
try:
exit_code = workflow.run(args)
except Exception as e:
logger.fatal('main pilot function caught exception: %s' % e)
logger.fatal('main pilot function caught exception: %s', e)
exit_code = None

return exit_code
@@ -101,62 +102,6 @@ class Args:
pass


# rename module to pilot2 to avoid conflict in import with pilot directory
def import_module(**kwargs):
"""
This function allows for importing the pilot code.

:param kwargs: pilot options (dictionary).
:return: pilot error code (integer).
"""

argument_dictionary = {'-a': kwargs.get('workdir', ''),
'-d': kwargs.get('debug', None),
'-w': kwargs.get('workflow', 'generic'),
'-l': kwargs.get('lifetime', '3600'),
'-q': kwargs.get('queue'), # required
'-r': kwargs.get('resource'), # required
'-s': kwargs.get('site'), # required
'-j': kwargs.get('job_label', 'ptest'), # change default later to 'managed'
'-i': kwargs.get('version_tag', 'PR'),
'-t': kwargs.get('verify_proxy', True),
'-z': kwargs.get('update_server', True),
'--cacert': kwargs.get('cacert', None),
'--capath': kwargs.get('capath'),
'--url': kwargs.get('url', ''),
'-p': kwargs.get('port', '25443'),
'--country-group': kwargs.get('country_group', ''),
'--working-group': kwargs.get('working_group', ''),
'--allow-other-country': kwargs.get('allow_other_country', 'False'),
'--allow-same-user': kwargs.get('allow_same_user', 'True'),
'--pilot-user': kwargs.get('pilot_user', 'generic'),
'--input-dir': kwargs.get('input_dir', ''),
'--output-dir': kwargs.get('output_dir', ''),
'--hpc-resource': kwargs.get('hpc_resource', ''),
'--harvester-workdir': kwargs.get('harvester_workdir', ''),
'--harvester-datadir': kwargs.get('harvester_datadir', ''),
'--harvester-eventstatusdump': kwargs.get('harvester_eventstatusdump', ''),
'--harvester-workerattributes': kwargs.get('harvester_workerattributes', ''),
'--harvester-submitmode': kwargs.get('harvester_submitmode', ''),
'--resource-type': kwargs.get('resource_type', '')
}

args = Args()
parser = argparse.ArgumentParser()
try:
_items = list(argument_dictionary.items()) # Python 3
except Exception:
_items = argument_dictionary.iteritems() # Python 2
for key, value in _items:
print(key, value)
parser.add_argument(key)
parser.parse_args(args=[key, value], namespace=args) # convert back int and bool strings to int and bool??

# call main pilot function

return 0


def str2bool(v):
""" Helper function to convert string to bool """

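The new --use-rucio-traces option added in the next hunk feeds its value through this helper. A minimal sketch of such a converter, assuming the common argparse idiom rather than the repository's exact implementation:

import argparse

def str2bool(value):
    """Convert a command-line string such as 'True' or 'False' to a bool."""
    if isinstance(value, bool):
        return value
    if value.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    if value.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    raise argparse.ArgumentTypeError('boolean value expected, got %r' % value)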
@@ -379,6 +324,11 @@ def get_args():
dest='jobtype',
default='',
help='Job type (managed, user)')
arg_parser.add_argument('--use-rucio-traces',
dest='use_rucio_traces',
type=str2bool,
default=True,
help='Use rucio traces')

# HPC options
arg_parser.add_argument('--hpc-resource',
@@ -413,10 +363,10 @@ def create_main_work_dir(args):
try:
# create the main PanDA Pilot work directory
mkdirs(mainworkdir)
except Exception as e:
except PilotException as error:
# print to stderr since logging has not been established yet
print('failed to create workdir at %s -- aborting: %s' % (mainworkdir, e), file=sys.stderr)
exit_code = shell_exit_code(e._errorCode)
print('failed to create workdir at %s -- aborting: %s' % (mainworkdir, error), file=sys.stderr)
exit_code = shell_exit_code(error._errorCode)
else:
mainworkdir = getcwd()

@@ -467,9 +417,15 @@ def set_environment_variables(args, mainworkdir):
# set the (HPC) resource name (if set in options)
environ['PILOT_RESOURCE_NAME'] = args.hpc_resource

# allow for the possibility of turning off rucio traces
environ['PILOT_USE_RUCIO_TRACES'] = str(args.use_rucio_traces)

# event service executor type
environ['PILOT_ES_EXECUTOR_TYPE'] = args.executor_type

if args.output_dir:
environ['PILOT_OUTPUT_DIR'] = args.output_dir

# keep track of the server urls
_port = ":%s" % args.port
url = args.url if _port in args.url else args.url + _port
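The PILOT_USE_RUCIO_TRACES variable set in this hunk lets downstream transfer code decide whether to emit a trace. A minimal sketch of such a check; the helper name is illustrative, not the pilot's actual API:

import os

def rucio_traces_enabled():
    """Return False only if the pilot was started with --use-rucio-traces False."""
    # set_environment_variables() stores str(args.use_rucio_traces),
    # i.e. the literal string 'True' or 'False'.
    return os.environ.get('PILOT_USE_RUCIO_TRACES', 'True') == 'True'

# usage: skip trace reporting when the option is turned off
if not rucio_traces_enabled():
    print('rucio traces are disabled for this pilot')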
@@ -495,9 +451,9 @@ def wrap_up(initdir, mainworkdir, args):
try:
rmtree(mainworkdir)
except Exception as e:
logging.warning("failed to remove %s: %s" % (mainworkdir, e))
logging.warning("failed to remove %s: %s", mainworkdir, e)
else:
logging.info("removed %s" % mainworkdir)
logging.info("removed %s", mainworkdir)

# in Harvester mode, create a kill_worker file that will instruct Harvester that the pilot has finished
if args.harvester:
@@ -509,15 +465,15 @@
except Exception:
exit_code = trace
else:
logging.info('traces error code: %d' % exit_code)
logging.info('traces error code: %d', exit_code)
if trace.pilot['nr_jobs'] <= 1:
if exit_code != 0:
logging.info('an exit code was already set: %d (will be converted to a standard shell code)' % exit_code)
logging.info('an exit code was already set: %d (will be converted to a standard shell code)', exit_code)
elif trace.pilot['nr_jobs'] > 0:
if trace.pilot['nr_jobs'] == 1:
logging.getLogger(__name__).info('pilot has finished (%d job was processed)' % trace.pilot['nr_jobs'])
logging.getLogger(__name__).info('pilot has finished (%d job was processed)', trace.pilot['nr_jobs'])
else:
logging.getLogger(__name__).info('pilot has finished (%d jobs were processed)' % trace.pilot['nr_jobs'])
logging.getLogger(__name__).info('pilot has finished (%d jobs were processed)', trace.pilot['nr_jobs'])
exit_code = SUCCESS
elif trace.pilot['state'] == FAILURE:
logging.critical('pilot workflow failure -- aborting')
@@ -579,7 +535,7 @@ def get_pilot_source_dir():
set_environment_variables(args, mainworkdir)

# setup and establish standard logging
establish_logging(args)
establish_logging(debug=args.debug, nopilotlog=args.nopilotlog)

# execute main function
trace = main()
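Many of the hunks above replace eager %-formatting with lazy logging calls, where the logging framework interpolates the arguments only if the record is actually emitted; this is the pattern pylint's logging-not-lazy check asks for. Schematically, as a generic illustration rather than repository code:

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
queue = 'SOME_QUEUE'  # placeholder value

# before: the message string is built even if the INFO level is disabled
logger.info('specified queue: %s' % queue)

# after: interpolation is deferred to the logging framework
logger.info('specified queue: %s', queue)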
17 changes: 8 additions & 9 deletions pilot/api/analytics.py
@@ -5,7 +5,7 @@
# http://www.apache.org/licenses/LICENSE-2.0
#
# Authors:
# - Paul Nilsson, paul.nilsson@cern.ch, 2018
# - Paul Nilsson, paul.nilsson@cern.ch, 2018-2021

from .services import Services
from pilot.common.exception import NotDefined, NotSameLength, UnknownException
@@ -146,21 +146,20 @@ def get_fitted_data(self, filename, x_name='Time', y_name='pss+swap', precision=
y = y[:-2]

if (len(x) > 7 and len(y) > 7) and len(x) == len(y):
logger.info('fitting %s vs %s' % (y_name, x_name))
logger.info('fitting %s vs %s', y_name, x_name)
try:
fit = self.fit(x, y)
_slope = self.slope()
except Exception as e:
logger.warning('failed to fit data, x=%s, y=%s: %s' % (str(x), str(y), e))
logger.warning('failed to fit data, x=%s, y=%s: %s', str(x), str(y), e)
else:
if _slope:
slope = float_to_rounded_string(fit.slope(), precision=precision)
chi2 = float_to_rounded_string(fit.chi2(), precision=0) # decimals are not needed for chi2
if slope != "":
logger.info('current memory leak: %s B/s (using %d data points, chi2=%s)' %
(slope, len(x), chi2))
logger.info('current memory leak: %s B/s (using %d data points, chi2=%s)', slope, len(x), chi2)
else:
logger.warning('wrong length of table data, x=%s, y=%s (must be same and length>=4)' % (str(x), str(y)))
logger.warning('wrong length of table data, x=%s, y=%s (must be same and length>=4)', str(x), str(y))

return {"slope": slope, "chi2": chi2}

@@ -182,8 +181,8 @@ def extract_from_table(self, table, x_name, y_name):
y2_name = y_name.split('+')[1]
y1_value = table.get(y1_name, [])
y2_value = table.get(y2_name, [])
except Exception as e:
logger.warning('exception caught: %s' % e)
except Exception as error:
logger.warning('exception caught: %s', error)
x = []
y = []
else:
@@ -238,7 +237,7 @@ def __init__(self, **kwargs):
self.set_intersect()
self.set_chi2()
else:
logger.warning("\'%s\' model is not implemented" % self._model)
logger.warning("\'%s\' model is not implemented", self._model)
raise NotImplementedError()

def fit(self):
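The memory-leak estimate above comes from fitting a straight line to pss+swap versus Time and reporting the slope in B/s together with a chi2 value. A rough sketch of that arithmetic, not the actual Fit class in pilot/api/analytics.py:

def linear_fit(x, y):
    """Least-squares straight line y = slope * x + intercept, plus a simple chi2."""
    n = len(x)
    mean_x = sum(x) / float(n)
    mean_y = sum(y) / float(n)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    chi2 = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, chi2

# usage with dummy monitoring data (seconds vs bytes)
print(linear_fit([0, 60, 120, 180], [1.0e9, 1.1e9, 1.2e9, 1.3e9]))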