This repository has been archived by the owner on Sep 12, 2024. It is now read-only.

Adding Rucio to DBS-Phedex consistency check #561

@todor-ivanov todor-ivanov commented May 9, 2020

Fixes #560

Status

tested

Description

Adds a simple wrapper class around the Rucio client so that we can safely replicate the set of checks done through PhEDEx and provide a default fallback mechanism to Rucio for NANOAOD data tiers.

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

N/A

External dependencies / deployment changes

Requires:

rucio-client

Configuration file:

~/.local/etc/rucio.cfg
[common]
[client]
rucio_host = http://cms-rucio.cern.ch
auth_host = https://cms-rucio-auth.cern.ch
auth_type = x509
ca_cert = /etc/grid-security/certificates/
client_cert = $X509_USER_CERT
client_key = $X509_USER_KEY
client_x509_proxy = $X509_USER_PROXY
request_retries = 3

Environment:

export X509_USER_PROXY=/tmp/x509up_$UID
export RUCIO_HOME=~/.local/
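
For anyone who prefers to generate the file rather than edit it by hand, here is a minimal sketch (Python 3, standard library only) that writes the same settings to `etc/rucio.cfg` under a given directory. The function name `write_rucio_cfg` is hypothetical and not part of this PR; the values are copied verbatim from the configuration above.

```python
import configparser
import os
import tempfile


def write_rucio_cfg(rucio_home):
    """Write etc/rucio.cfg under rucio_home and return the file path.

    Hypothetical helper for illustration; values mirror the config above.
    """
    # interpolation=None keeps values such as $X509_USER_CERT untouched.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["common"] = {}
    cfg["client"] = {
        "rucio_host": "http://cms-rucio.cern.ch",
        "auth_host": "https://cms-rucio-auth.cern.ch",
        "auth_type": "x509",
        "ca_cert": "/etc/grid-security/certificates/",
        "client_cert": "$X509_USER_CERT",
        "client_key": "$X509_USER_KEY",
        "client_x509_proxy": "$X509_USER_PROXY",
        "request_retries": "3",
    }
    etc_dir = os.path.join(rucio_home, "etc")
    os.makedirs(etc_dir, exist_ok=True)
    path = os.path.join(etc_dir, "rucio.cfg")
    with open(path, "w") as handle:
        cfg.write(handle)
    return path


# Demo against a throwaway directory instead of the real ~/.local/.
demo_home = tempfile.mkdtemp()
print(write_rucio_cfg(demo_home))
```

Pointing `RUCIO_HOME` at the directory passed to the helper would then match the environment setup above.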

Mention people to look at PRs

@sharad1126 @vlimant @amaltaro

@todor-ivanov todor-ivanov force-pushed the feature_Rucio_Phedex_DBS_consist_check branch from b87239a to 55fc894 Compare May 10, 2020 03:31
@todor-ivanov
Contributor Author

@ericvaandering Could you please take a look too? If I am doing the right thing, I will squash the commits into a single one and we can merge it.

@ericvaandering

The code looks correct, but I really worry about this approach of listing all the files in a dataset on both PhEDEx and Rucio. Unified has a history of overloading services and I think this could easily do that later on when it's expanded beyond NanoAOD. E.g. the premixed pileup dataset has something like 200k files.

I'd prefer that you implement what we discussed, which is just a count of files in blocks (easier on Rucio).
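
To make the count-based idea concrete, here is a minimal sketch of comparing per-block file counts from two sources. It assumes the counts have already been fetched into plain dictionaries; the function name, block names and numbers are made up for illustration and are not the code in this PR.

```python
def find_mismatched_blocks(reference_counts, candidate_counts):
    """Return {block: (reference, candidate)} for blocks whose counts differ.

    A block that is missing on one side is reported with None for that side.
    """
    mismatched = {}
    for block in sorted(set(reference_counts) | set(candidate_counts)):
        ref = reference_counts.get(block)
        cand = candidate_counts.get(block)
        if ref != cand:
            mismatched[block] = (ref, cand)
    return mismatched


# Illustrative, made-up block names and counts.
dbs = {"/A/B/NANOAOD#1": 10, "/A/B/NANOAOD#2": 7}
rucio = {"/A/B/NANOAOD#1": 10, "/A/B/NANOAOD#2": 5}
print(find_mismatched_blocks(dbs, rucio))  # {'/A/B/NANOAOD#2': (7, 5)}
```

Only one integer per block crosses the wire in this scheme, which is what keeps the load on the services small compared with listing every file name.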

@vlimant
Contributor

vlimant commented May 11, 2020

One probably wants to relocate

export RUCIO_HOME=~/.local/
${RUCIO_HOME}/etc/rucio.cfg

to something more central. Is that "home" supposed to be large, or does it just hold the configuration?
There are base_dir, base_eos_dir and cache_dir.

@vlimant
Contributor

vlimant commented May 11, 2020

One thing we should do to make @ericvaandering less worried is to make sure that WMCore takes care of all data consistency before a workflow is set to completed.
@amaltaro @todor-ivanov could you please open a GH issue for that? Unless there is already one that has been pending for years, and that is what forced people to develop the consistency check by other means.

@ericvaandering

ericvaandering commented May 11, 2020 via email

@sharad1126 sharad1126 added this to the phedex to rucio transition milestone May 11, 2020
@todor-ivanov
Contributor Author

Hi @vlimant, thanks for the feedback. I will search for any old issues regarding data consistency in WMCore. As for the location of the Rucio config file, I agree it must be somewhere more central. I would leave it to @sharad1126 to decide where, since he knows those machines best right now.
@ericvaandering could you please check the final commit? I know I left one place that still fetches/lists file names for the full dataset, but that line should be hit only for broken datasets, i.e. those that do not pass the general check based on file count per block.
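
The two-stage idea described here (cheap count first, full listing only on a mismatch) can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: `check_dataset` and its callable parameters are hypothetical, and the "catalogs" are in-memory dictionaries standing in for the real services.

```python
def check_dataset(blocks, count_in_a, count_in_b, list_in_a, list_in_b):
    """Compare two catalogs block by block, listing files only on mismatch.

    count_in_* : callable(block) -> int, assumed cheap
    list_in_*  : callable(block) -> iterable of file names, assumed expensive
    Returns {block: (only_in_a, only_in_b)} for inconsistent blocks.
    """
    problems = {}
    for block in blocks:
        if count_in_a(block) == count_in_b(block):
            continue  # counts agree: skip the expensive listing
        files_a = set(list_in_a(block))
        files_b = set(list_in_b(block))
        problems[block] = (sorted(files_a - files_b), sorted(files_b - files_a))
    return problems


# Made-up catalogs; `listed` records which blocks triggered a full listing.
catalog_a = {"blk#1": ["f1", "f2"], "blk#2": ["f3", "f4"]}
catalog_b = {"blk#1": ["f1", "f2"], "blk#2": ["f3"]}
listed = []


def listing(catalog):
    def _list(block):
        listed.append(block)
        return catalog[block]
    return _list


result = check_dataset(
    catalog_a,
    lambda block: len(catalog_a[block]),
    lambda block: len(catalog_b[block]),
    listing(catalog_a),
    listing(catalog_b),
)
print(result)  # {'blk#2': (['f4'], [])}
print(listed)  # ['blk#2', 'blk#2']
```

One limitation of any count-only first pass: a block with equal counts but different file names on the two sides would not be flagged.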

@todor-ivanov todor-ivanov force-pushed the feature_Rucio_Phedex_DBS_consist_check branch from 0870076 to e2b9cb8 Compare May 12, 2020 04:50
@sharad1126
Contributor

sharad1126 commented May 12, 2020

Regarding the Rucio config file, I would just place it in WmAgentScripts itself. Why hide it in base_dir, base_eos_dir or cache_dir when it contains no secrets:

[common]
[client]
rucio_host = http://cms-rucio.cern.ch
auth_host = https://cms-rucio-auth.cern.ch
auth_type = x509
ca_cert = /etc/grid-security/certificates/
client_cert = $X509_USER_CERT
client_key = $X509_USER_KEY
client_x509_proxy = $X509_USER_PROXY
request_retries = 3

@vlimant @ericvaandering @todor-ivanov any issues with putting it in the GH repo?

The only problem I see is that the Rucio client apparently only checks the following places for the cfg file:

raise Exception('Could not load rucio configuration file rucio.cfg.'
Exception: Could not load rucio configuration file rucio.cfg.Rucio looks in the following directories for a configuration file, in order:
	${RUCIO_HOME}/etc/rucio.cfg
	/opt/rucio/etc/rucio.cfg
	${VIRTUAL_ENV}/etc/rucio.cfg
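
As a debugging aid, the lookup order printed in that error message can be mirrored in a few lines of Python. This sketch only reproduces the three locations listed above; it is not Rucio's own code, the function names are hypothetical, and whether the client skips unset variables exactly this way is an assumption.

```python
import os


def candidate_config_paths(environ=None):
    """List the rucio.cfg locations from the error message, in order."""
    environ = os.environ if environ is None else environ
    paths = []
    if environ.get("RUCIO_HOME"):
        paths.append(os.path.join(environ["RUCIO_HOME"], "etc", "rucio.cfg"))
    paths.append("/opt/rucio/etc/rucio.cfg")
    if environ.get("VIRTUAL_ENV"):
        paths.append(os.path.join(environ["VIRTUAL_ENV"], "etc", "rucio.cfg"))
    return paths


def find_config(environ=None):
    """Return the first candidate path that exists as a file, else None."""
    for path in candidate_config_paths(environ):
        if os.path.isfile(path):
            return path
    return None


# Made-up home directory, used only to show the resulting order.
print(candidate_config_paths({"RUCIO_HOME": "/home/user/.local/"}))
```

Under this lookup order, keeping the file in the WmAgentScripts checkout would mean placing it at `etc/rucio.cfg` inside a directory that `RUCIO_HOME` points to.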

@ericvaandering

ericvaandering commented May 12, 2020 via email

@sharad1126
Contributor

@ericvaandering I hope you don't mind, but I removed all the extra lines that your email reply added to the comments. Please check whether I removed anything I shouldn't have. Also, if it's not too much to ask, could you please reply to the review conversations individually? Thanks.

@sharad1126
Contributor

@ericvaandering I am now just waiting for two changes from @todor-ivanov:

  1. Fix the error in the docstring.
  2. Add account='unified' either in the call or in the default config. I would suggest passing it in the call at object creation.

Then we can test the Unified checkor.py module!
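
To illustrate the second point, passing the account at object creation could look roughly like the sketch below. The class name `RucioWrapperSketch` and the `client_factory` parameter are hypothetical and not the RucioClient class from this PR; the factory is injected so the example runs without rucio-client installed. In real use the factory would presumably be `rucio.client.Client`, on the assumption that it accepts an `account` keyword.

```python
class RucioWrapperSketch(object):
    """Hypothetical wrapper; not the RucioClient class from this PR."""

    def __init__(self, client_factory, account="unified", **client_kwargs):
        # The account is fixed when the wrapper is created and is passed
        # straight through to the underlying client constructor.
        self.account = account
        self.client = client_factory(account=account, **client_kwargs)


class FakeClient(object):
    """Stand-in so the sketch runs without rucio-client installed."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


wrapper = RucioWrapperSketch(FakeClient)
print(wrapper.client.kwargs)  # {'account': 'unified'}
```

Keeping the default in the constructor signature makes the account visible in the code, while still letting a tester pass their own account explicitly.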

From my side, I just need to pip install the rucio client in all prod machines.

@todor-ivanov todor-ivanov force-pushed the feature_Rucio_Phedex_DBS_consist_check branch from e2b9cb8 to 3383bf4 Compare May 12, 2020 14:59
@todor-ivanov
Contributor Author

@sharad1126 account='unified' is now added to the default config. It works with my account, so it should work with 'unified' too. One question though: is the unified account using a proxy or a cert/key pair? If it is a pair, then the authentication type needs to be changed to 'x509' instead of 'x509_proxy', and we will need the right configuration variables too.

@sharad1126
Contributor

sharad1126 commented May 12, 2020

> @sharad1126 account='unified' is now added to the default config. It works with my account, so it should work with 'unified' too. One question though: is the unified account using a proxy or a cert/key pair? If it is a pair, then the authentication type needs to be changed to 'x509' instead of 'x509_proxy', and we will need the right configuration variables too.

Let's leave it as it is: I can confirm that I am able to call the RucioClient functions from the unified account. @todor-ivanov here is what it returns.

[cmsunified@vocms0277 WmAgentScripts]$ python
Python 2.7.13+ (default, Dec 18 2018, 10:09:09) 
[GCC 6.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from RucioClient import RucioClient
/data/srv/wmagent/v1.3.0/sw/slc7_amd64_gcc630/external/py2-cryptography/2.4.2-comp2/lib/python2.7/site-packages/cryptography/hazmat/bindings/openssl/binding.py:163: CryptographyDeprecationWarning: OpenSSL version 1.0.1 is no longer supported by the OpenSSL project, please upgrade. A future version of cryptography will drop support for it.
  utils.CryptographyDeprecationWarning
>>> rucioClient = RucioClient()
>>> 
>>> rucioClient.getFileCountPerBlock('/MinBias_TuneCP1_13TeV-pythia8/RunIISummer19UL17SIM-106X_mc2017_realistic_v6-v2/GEN-SIM#2923941b-47fe-42cd-8c9e-9376498cdce1')
[(u'/store/mc/RunIISummer19UL17SIM/MinBias_TuneCP1_13TeV-pythia8/GEN-SIM/106X_mc2017_realistic_v6-v2/270000/02EFF986-3A5F-BC43-8294-C2C290912C20.root', None), (u'/store/mc/RunIISummer19UL17SIM/MinBias_TuneCP1_13TeV-pythia8/GEN-SIM/106X_mc2017_realistic_v6-v2/270000/033F6D24-8296-B04C-B621-A760AAE31909.root', None), (u'/store/mc/RunIISummer19UL17SIM/MinBias_TuneCP1_13TeV-pythia8/GEN-SIM/106X_mc2017_realistic_v6-v2/270000/04D5812A-3345-EA45-B83E-8CB2C6E8FC80.root', None), (u'/store/mc/RunIISummer19UL17SIM/MinBias_TuneCP1_13TeV-pythia8/GEN-SIM/106X_mc2017_realistic_v6-v2/270000/06C0BEA8-37B1-3144-BCBE-591D7F53183D.root', None), (u'/store/mc/RunIISummer19UL17SIM/MinBias_TuneCP1_13TeV-pythia8/GEN-SIM/106X_mc2017_realistic_v6-v2/270000/06C1E2E2-33C9-EF4D-B085-F35A482DED2E.root', None), (u'/store/mc/RunIISummer19UL17SIM/MinBias_TuneCP1_13TeV-pythia8/GEN-SIM/106X_mc2017_realistic_v6-v2/270000/06E6618B-D181-0541-B07E-34A992A290D8.root', None), (u'/store/mc/RunIISummer19UL17SIM/MinBias_TuneCP1_13TeV-pythia8/GEN-SIM/106X_mc2017_realistic_v6-v2/270000/06FC8039-17B2-154B-8CE8-98513284D2B1.root', None), (u'/store/mc/RunIISummer19UL17SIM/MinBias_TuneCP1_13TeV-pythia8/GEN-SIM/106X_mc2017_realistic_v6-v2/270000/0712749B-B929-6248-8916-61E491F051DC.root', None), (u'/store/mc/RunIISummer19UL17SIM/MinBias_TuneCP1_13TeV-pythia8/GEN-SIM/106X_mc2017_realistic_v6-v2/270000/071F9993-3A32-7644-88FB-D80BDEC41B32.root', None), (u'/store/mc/RunIISummer19UL17SIM/MinBias_TuneCP1_13TeV-pythia8/GEN-SIM/106X_mc2017_realistic_v6-v2/270000/07D418E7-5CE3-3247-AEDC-DAB8235CA195.root', None), (u'/store/mc/RunIISummer19UL17SIM/MinBias_TuneCP1_13TeV-pythia8/GEN-SIM/106X_mc2017_realistic_v6-v2/270000/08E67093-C11F-B340-9735-257377E09116.root', None), (u'/store/mc/RunIISummer19UL17SIM/MinBias_TuneCP1_13TeV-pythia8/GEN-SIM/106X_mc2017_realistic_v6-v2/270000/08EEBF1A-CB22-E04C-AB55-F7BA011DC694.root', None), 
(u'/store/mc/RunIISummer19UL17SIM/MinBias_TuneCP1_13TeV-pythia8/GEN-SIM/106X_mc2017_realistic_v6-v2/270000/0AC1AFA0-9F34-E647-BFC5-60154852434B.root', None), (u'/store/mc/RunIISummer19UL17SIM/MinBias_TuneCP1_13TeV-pythia8/GEN-SIM/106X_mc2017_realistic_v6-v2/270000/0B0A9215-4338-6349-B6EB-B077765B3CFB.root', None), (u'/store/mc/RunIISummer19UL17SIM/MinBias_TuneCP1_13TeV-pythia8/GEN-SIM/106X_mc2017_realistic_v6-v2/270000/0C0CDD68-2F67-084E-B0E9-E84DC2975152.root', None), (u'/store/mc/RunIISummer19UL17SIM/MinBias_TuneCP1_13TeV-pythia8/GEN-SIM/106X_mc2017_realistic_v6-v2/270000/0C0E375B-10E1-AD42-A194-5D068C008050.root', None),...

@sharad1126
Contributor

sharad1126 commented May 12, 2020

@todor-ivanov @ericvaandering I just encountered a problem installing rucio-client on vocms0268. The production machines that use the checkor module are vocms0268, vocms0269 and vocms0273, so I intend to pip install the Rucio client on these machines only.

Installing collected packages: urllib3, requests, zipp, configparser, python-dateutil
  Found existing installation: zipp 0.0.0
    Uninstalling zipp-0.0.0:
      Successfully uninstalled zipp-0.0.0
  Running setup.py install for zipp ... done
  Running setup.py install for configparser ... error
    Complete output from command /usr/bin/python2 -u -c "import setuptools, tokenize;__file__='/tmp/sagarwal/pip-build-K5L7gu/configparser/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/sagarwal/pip-8liMyp-record/install-record.txt --single-version-externally-managed --compile --user --prefix=:
    running install
    running build
    running build_py
    creating build
    creating build/lib
    copying src/configparser.py -> build/lib
    creating build/lib/backports
    copying src/backports/__init__.py -> build/lib/backports
    creating build/lib/backports/configparser
    copying src/backports/configparser/__init__.py -> build/lib/backports/configparser
    copying src/backports/configparser/helpers.py -> build/lib/backports/configparser
    running egg_info
    error: 'egg_base' must be a directory name (got `src`)
    
    ----------------------------------------
Command "/usr/bin/python2 -u -c "import setuptools, tokenize;__file__='/tmp/sagarwal/pip-build-K5L7gu/configparser/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/sagarwal/pip-8liMyp-record/install-record.txt --single-version-externally-managed --compile --user --prefix=" failed with error code 1 in /tmp/sagarwal/pip-build-K5L7gu/configparser/

@nsmith- any suggestions?

@nsmith-

nsmith- commented May 13, 2020

Some googling suggests you might be missing python headers. Try yum install python-devel or something similar.

@sharad1126
Contributor

@nsmith- yes, I googled it as well before posting, but most of the comments I found attributed it to a hard-coded dependency on the Python version or similar. So I thought that removing that hard-coded dependency and making the version flexible might help. I need to investigate more before installing anything on the prod machines.

@nsmith-

nsmith- commented May 13, 2020

At least you can confirm if the python-devel headers are installed. It should not cause any trouble, assuming you are using the system python.

@sharad1126
Contributor

sharad1126 commented May 14, 2020

@nsmith- @ericvaandering we don't have the python-devel package on vocms0275 or vocms0277, where we installed rucio-client with pip without any issues. We don't have the package on vocms0268 either, which is the prod machine where I am trying to install Rucio right now.

We have only the following python-* packages on all our vocms machines.

[sagarwal@vocms0268 ~]$ pip list | grep python
Flask (0.10.1, /usr/lib/python2.7/site-packages)
python-collectd-certificate (0.0.11)
python-dateutil (1.5)
python-ldap (2.4.15)
python-linux-procfs (0.4.9)
python-magic (0.4.18)
python-utils (2.4.0)

@sharad1126
Contributor

Since the pip install failed with:
Running setup.py install for configparser ... error

I checked configparser on all machines, and it is the same everywhere:

[sagarwal@vocms0268 ~]$ pip list | grep config
configobj (4.7.2)
configparser (0.0.0)

@sharad1126
Contributor

sharad1126 commented May 14, 2020

The only difference I just noticed here is:

[sagarwal@vocms0268 ~]$ pip list | grep setup
setuptools (40.6.3)
[sagarwal@vocms0275 ~]$ pip list | grep setup
setuptools (41.4.0)

I have asked Caio to upgrade it to 41.4.0 on vocms0268 as well, in case that is what is causing the issue. Let's wait for his action now.

@nsmith-

nsmith- commented May 14, 2020

Hi Sharad, pip and yum are two very different package managers for different purposes. pip is unique to python packages, and yum is for system packages, including such packages as the python C binding headers. Can you check if python-devel is installed, per yum?

@sharad1126
Contributor

@nsmith- I did check for python-devel with yum, and it is not installed on any machine. I thought I mentioned that above. The comments after that are further investigation into the differences between the two sets of machines, i.e. vocms0275/vocms0277 (where the Rucio client installed without issues) and vocms0268 (where we need the Rucio client).

@sharad1126
Contributor

@nsmith- @ericvaandering as expected, it was a setuptools issue. I just successfully installed the Rucio client on vocms0268.
This means you can list this as a dependency for the Rucio client installation in your docs:

[sagarwal@vocms0268 ~]$ pip list | grep setup
setuptools (41.4.0)
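
A quick way to check a machine before running pip is to compare the installed setuptools version numerically. The sketch below is illustrative: the helper names are hypothetical, and treating 41.4.0 as the threshold is an assumption based only on this thread (40.6.3 failed, 41.4.0 worked); the true minimum version is not established here.

```python
def parse_version(text):
    """Turn '41.4.0' into (41, 4, 0); parsing stops at a non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_at_least(installed, required):
    """Compare dotted versions numerically rather than as strings."""
    return parse_version(installed) >= parse_version(required)


print(is_at_least("40.6.3", "41.4.0"))  # False
print(is_at_least("41.4.0", "41.4.0"))  # True

# Check the version on the current machine, if setuptools is importable.
try:
    import setuptools
    print(setuptools.__version__, is_at_least(setuptools.__version__, "41.4.0"))
except ImportError:
    print("setuptools is not installed")
```

Comparing integer tuples avoids the string-comparison trap where "41.10.0" would sort before "41.4.0".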

@sharad1126
Contributor

I have successfully installed the Rucio client on all machines that run the checkor module, i.e. vocms0268, vocms0269 and vocms0273, under the user cmsunified.

@sharad1126
Contributor

I also created the cfg file to test the client.

[cmsunified@lxplus715 ~]$ touch ~/.local/etc/rucio.cfg
[cmsunified@lxplus715 ~]$ vim ~/.local/etc/rucio.cfg
[cmsunified@vocms0268 ~]$ export RUCIO_HOME=~/.local/
[cmsunified@vocms0268 ~]$ python
Python 2.7.5 (default, Apr  2 2020, 13:16:51) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import rucio.client
>>> 
[cmsunified@vocms0268 ~]$ 

@todor-ivanov
Contributor Author

Super! Thanks @sharad1126

@todor-ivanov todor-ivanov force-pushed the feature_Rucio_Phedex_DBS_consist_check branch from 3383bf4 to 6e337dd Compare May 15, 2020 16:31
Check file presence at both systems - Rucio & Phedex

Recalculate missing_phedex with corrections for files managed by Rucio.

Typo

Check filecount on block level, fetch filecount from metadata instead if len(filenames)

Adding 'account=unified' in default config && typo

Split unified config lists to relval and nonrelval

Split unified config lists to relval and nonrelval / 2
@todor-ivanov todor-ivanov force-pushed the feature_Rucio_Phedex_DBS_consist_check branch from cca87e5 to 892a255 Compare May 15, 2020 16:37
7 participants