You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Impact of the new feature
WMAgent logs, potentially Microservices
Is your feature request related to a problem? Please describe.
As reported here, the stderr and stdout are mixed up in the job logs, and this make difficult to understand the reports, with obvious consequences on the capacity to detect and eventually fix bugs.
An example is reported:
2024-10-15 21:02:07,975:INFO:StageOutImpl:Creating output directory...
2024-10-15 21:02:07,982:INFO:StageOutImpl:Running the stage out...
2024-10-15 21:02:07,990:ERROR:StageOutImpl:Attempt 0 to stage out failed.
Error details:
<@========== WMException Start ==========@>
Exception Class: StageOutError
Message: Simulated stage out failure
ClassName : None
ModuleName : WMCore.Storage.StageOutError
MethodName : __init__
ClassInstance : None
FileName : /srv/job/WMCore.zip/WMCore/Storage/StageOutError.py
LineNumber : 32
ErrorNr : 0
ErrorCode : 60311
ErrorType : GeneralStageOutFailure
Traceback:
<@---------- WMException End ----------@>
2024-10-15 21:07:08,091:ERROR:StageOutImpl:Maximum number of retries exhausted. Further details on the failed command reported below.
2024-10-15 21:07:08,100:INFO:StageOutMgr:===> Stage Out Failure for file:
2024-10-15 21:07:08,100:INFO:StageOutMgr:======> /store/unmerged/logs/prod/2024/10/15/apiccine_TC_LHE_PFN_GFALStageoutTest_v11_FNAL_241015_194912_6819/DarkSUSY_14TeV_2023D17_GenSimHLBeamSpotFull14/0000/0/01280c59-36ef-4642-9465-eb098d075a4e-3-0-logArchive.tar.gz
2024-10-15 21:07:08,100:INFO:StageOutMgr:======> storageSite: T1_US_FNAL, volume: FNAL_dCache_EOS, protocol: XRootD, command: xrdcp, options: None, phedex-node: T1_US_FNAL_Disk
using this stage out
2024-10-15 21:07:08,101:INFO:StageOutMgr:LFN to PFN match made:
LFN: /store/unmerged/logs/prod/2024/10/15/apiccine_TC_LHE_PFN_GFALStageoutTest_v11_FNAL_241015_194912_6819/DarkSUSY_14TeV_2023D17_GenSimHLBeamSpotFull14/0000/0/01280c59-36ef-4642-9465-eb098d075a4e-3-0-logArchive.tar.gz
PFN: davs://cmsdcadisk.fnal.gov:2880/dcache/uscmsdisk/store/unmerged/logs/prod/2024/10/15/apiccine_TC_LHE_PFN_GFALStageoutTest_v11_FNAL_241015_194912_6819/DarkSUSY_14TeV_2023D17_GenSimHLBeamSpotFull14/0000/0/01280c59-36ef-4642-9465-eb098d075a4e-3-0-logArchive.tar.gz
2024-10-15 21:07:08,101:INFO:StageOutImpl:Creating output directory...
2024-10-15 21:07:08,102:INFO:StageOutImpl:Running the stage out...
2024-10-15 21:07:08,111:ERROR:StageOutImpl:Attempt 0 to stage out failed.
Error details:
<@========== WMException Start ==========@>
Exception Class: StageOutError
Message: Simulated stage out failure
ClassName : None
ModuleName : WMCore.Storage.StageOutError
MethodName : __init__
ClassInstance : None
FileName : /srv/job/WMCore.zip/WMCore/Storage/StageOutError.py
LineNumber : 32
ErrorNr : 0
ErrorCode : 60311
ErrorType : GeneralStageOutFailure
Traceback:
<@---------- WMException End ----------@>
2024-10-15 21:12:08,211:ERROR:StageOutImpl:Maximum number of retries exhausted. Further details on the failed command reported below.
2024-10-15 21:12:08,421:INFO:StageOutImpl:Command exited with status: 0
Output message: stdout: -----------------------------------------------------------
===========================================================
Debugging information on failing gfal-copy command
Current date and time: 2024-10-15 21:12:08
gfal-copy command which failed: env -i X509_USER_PROXY=/srv/myproxy.pem JOBSTARTDIR=/srv bash -c '. /srv/startup_environment.sh; date; gfal-copy -t 2400 -T 2400 -p -v --abort-on-failure -K adler32 file:///srv/job/WMTaskSpace/logArch1/logArchive.tar.gz davs://cmsdcadisk.fnal.gov:2880/dcache/uscmsdisk/store/un\
merged/logs/prod/2024/10/15/apiccine_TC_LHE_PFN_GFALStageoutTest_v11_FNAL_241015_194912_6819/DarkSUSY_14TeV_2023D17_GenSimHLBeamSpotFull14/0000/0/01280c59-36ef-4642-9465-eb098d075a4e-3-0-logArchive.tar.gz'
Hostname: cmsgli-428988-0-cmswn2423.fnal.gov
OS: Linux 5.14.0-427.37.1.el9_4.x86_64
GFAL environment variables:
GFAL_PLUGIN_DIR=/cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.57/el6-x86_64/usr/lib64/gfal2-plugins/
GFAL_CONFIG_DIR=/cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.57/el6-x86_64/etc/gfal2.d/
PYTHON environment variables:
PYTHONPATH=/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/py2-future/0.16.0/lib/python3.6/site-packages:/cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.57/el6-x86_64/usr/lib/python2.6/site-packages:/cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.57/el6-x86_64/usr/lib64/python2.6/site-pack\
ages:/srv/job/WMCore.zip:/srv/job
LD_* environment variables:
LD_LIBRARY_PATH=/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/py2-future/0.16.0/lib:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/python3/3.6.4/lib:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/python3/3.6.4/lib:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/libffi/3.2.1-omkpbe2/lib64:/cvmfs/cms.cern.ch/slc6_a\
md64_gcc700/external/sqlite/3.22.0-omkpbe/lib:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/gdbm/1.10-omkpbe2/lib:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/db6/6.0.30-omkpbe2/lib:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/expat/2.1.0-omkpbe2/lib:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/bz2lib/1.0.\
6-omkpbe2/lib:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/openssl/1.0.2d-omkpbe2/lib:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/zlib-x86_64/1.2.11-omkpbe2/lib:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/zlib-x86_64/1.2.11-omkpbe2/lib:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/gcc/7.0.0-omkpbe2/lib64\
:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/gcc/7.0.0-omkpbe2/lib:/cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.57/el6-x86_64/lib64:/cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.57/el6-x86_64/lib:/cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.57/el6-x86_64/usr/lib64:/c\
vmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.57/el6-x86_64/usr/lib:/cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.57/el6-x86_64/usr/lib64/dcap:/cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.57/el6-x86_64/usr/lib64/lcgdm:/.singularity.d/libs
gfal-copy location: /cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.57/el6-x86_64/usr/bin/gfal-copy.
Source PFN: file:///srv/job/WMTaskSpace/logArch1/logArchive.tar.gz
Target PFN: davs://cmsdcadisk.fnal.gov:2880/dcache/uscmsdisk/store/unmerged/logs/prod/2024/10/15/apiccine_TC_LHE_PFN_GFALStageoutTest_v11_FNAL_241015_194912_6819/DarkSUSY_14TeV_2023D17_GenSimHLBeamSpotFull14/0000/0/01280c59-36ef-4642-9465-eb098d075a4e-3-0-logArchive.tar.gz
Here are informations about certification methods
Bearer token content:
Bearer token file:
httokendecode path:
httokendecode:
VOMS proxy info:
subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=amaltaro/CN=718748/CN=Alan Malta Rodrigues/CN=1132199593/CN=978440072/CN=1090606534/CN=1857206451/CN=1707977247
issuer : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=amaltaro/CN=718748/CN=Alan Malta Rodrigues/CN=1132199593/CN=978440072/CN=1090606534/CN=1857206451
identity : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=amaltaro/CN=718748/CN=Alan Malta Rodrigues/CN=1132199593/CN=978440072/CN=1090606534/CN=1857206451
type : RFC compliant proxy
strength : 2048 bits
path : /srv/myproxy.pem
timeleft : 22:52:19
key usage : Digital Signature, Key Encipherment
=== VO cms extension information ===
VO : cms
subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=amaltaro/CN=718748/CN=Alan Malta Rodrigues
issuer : /DC=ch/DC=cern/OU=computers/CN=cms-auth.web.cern.ch
attribute : /cms/Role=production/Capability=NULL
attribute : /cms/Role=NULL/Capability=NULL
attribute : /cms/country/Role=NULL/Capability=NULL
attribute : /cms/country/us/Role=NULL/Capability=NULL
attribute : /cms/uscms/Role=NULL/Capability=NULL
timeleft : 159:42:55
uri : voms-cms-auth.app.cern.ch:15000
===========================================================
-----------------------------------------------------------
stderr: which: no httokendecode in (/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/py2-future/0.16.0/bin:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/python3/3.6.4/bin:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/python3/3.6.4/bin:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/sqlite/3.22.0-omkpbe/bin:/cvmfs/\
cms.cern.ch/slc6_amd64_gcc700/external/gdbm/1.10-omkpbe2/bin:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/db6/6.0.30-omkpbe2/bin:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/expat/2.1.0-omkpbe2/bin:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/bz2lib/1.0.6-omkpbe2/bin:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/ex\
ternal/openssl/1.0.2d-omkpbe2/bin:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/gcc/7.0.0-omkpbe2/bin:/cvmfs/cms.cern.ch/common:/srv/.gwms.d/bin:/storage/local/data1/condor/execute/dir_269578/glide_kXmu43/main/condor/libexec:/cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.57/el6-x86_64/usr/bin:/cvm\
fs/oasis.opensciencegrid.org/mis/osg-wn-client/3.4/3.4.57/el6-x86_64/usr/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
2024-10-15 21:12:08,436:INFO:StageOutMgr:===> Stage Out Failure for file:
2024-10-15 21:12:08,436:INFO:StageOutMgr:======> /store/unmerged/logs/prod/2024/10/15/apiccine_TC_LHE_PFN_GFALStageoutTest_v11_FNAL_241015_194912_6819/DarkSUSY_14TeV_2023D17_GenSimHLBeamSpotFull14/0000/0/01280c59-36ef-4642-9465-eb098d075a4e-3-0-logArchive.tar.gz
2024-10-15 21:12:08,437:INFO:StageOutMgr:======> storageSite: T1_US_FNAL, volume: FNAL_dCache_EOS, protocol: WebDAV, command: gfal2, options: None, phedex-node: T1_US_FNAL_Disk
using this stage out
2024-10-15 21:12:08,440:ERROR:ExecuteMaster:Exception occured when executing step
2024-10-15 21:12:08,440:ERROR:ExecuteMaster:Exception is <@========== WMException Start ==========@>
Exception Class: StageOutFailure
Message: Failure for stage out:
<@========== WMException Start ==========@>
Exception Class: StageOutError
Message: Simulated stage out failure
ClassName : None
ModuleName : WMCore.Storage.StageOutError
MethodName : __init__
ClassInstance : None
FileName : /srv/job/WMCore.zip/WMCore/Storage/StageOutError.py
LineNumber : 32
ErrorNr : 0
ErrorCode : 60311
ErrorType : GeneralStageOutFailure
Traceback:
<@---------- WMException End ----------@>Traceback (most recent call last):
Describe the solution you'd like
Avoid separation bewteen stderr and stdout, for example in this way:
Impact of the new feature
WMAgent logs, potentially Microservices
Is your feature request related to a problem? Please describe.
As reported here, the stderr and stdout are mixed up in the job logs, and this make difficult to understand the reports, with obvious consequences on the capacity to detect and eventually fix bugs.
An example is reported:
Describe the solution you'd like
Avoid separation bewteen stderr and stdout, for example in this way:
Describe alternatives you've considered
None for now
FYI @amaltaro
The text was updated successfully, but these errors were encountered: