root:// TPC transfer fail with xrootd 5.6.x #2202
Comments
Could you please attach here the configuration used for the server and the command used for triggering the TPC? There are many things that could go wrong, just the error above is not enough to understand the problem. Thanks! |
TPC copy command:
server config (our side):
xrdcp-tpc.sh
#!/bin/sh
#Original code
#/usr/bin/xrdcp --server -f $1 root://$XRDXROOTD_PROXY/$2
# Get the last two variables as SRC and DST, all others are assumed as additional arguments
OTHERARGS="${@:1:$#-2}"
DSTFILE="${@:$#:1}"
SRCFILE="${@:$#-1:1}"
/usr/bin/xrdcp $OTHERARGS --server -f $SRCFILE root://ceph-gw8.gridpp.rl.ac.uk:1095/$DSTFILE |
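The wrapper above relies on `${@:offset:length}` slicing, which is a bash extension, while its shebang is `/bin/sh`. On systems where `/bin/sh` is not bash, that slicing fails. Below is a minimal sketch of the same "last two arguments are SRC and DST" split in portable POSIX sh. The function name `split_tpc_args` is hypothetical and is not part of the posted script; like the original, it flattens the extra options into one string, so options containing spaces are not preserved.

```sh
#!/bin/sh
# Sketch only: split "[options...] SRC DST" without bash-only slicing.
# Sets the global variables OTHERARGS, SRCFILE and DSTFILE
# (POSIX sh has no "local", so these are deliberately global).
split_tpc_args() {
    if [ "$#" -lt 2 ]; then
        echo "usage: split_tpc_args [options...] SRC DST" >&2
        return 1
    fi
    n=$#
    i=0
    OTHERARGS=
    for arg in "$@"; do
        i=$((i + 1))
        if [ "$i" -eq $((n - 1)) ]; then
            SRCFILE=$arg            # second-to-last argument
        elif [ "$i" -eq "$n" ]; then
            DSTFILE=$arg            # last argument
        else
            # Join remaining options with single spaces
            OTHERARGS="${OTHERARGS:+$OTHERARGS }$arg"
        fi
    done
}
```

After calling `split_tpc_args "$@"`, the three variables can be passed to the `xrdcp` line exactly as in the original script.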
Thank you very much, I will set up local servers with the same configuration and investigate this problem. I just noticed this:
Note the |
Hello, |
I see. I'm hoping to get 5.6.9 out soon and will definitely investigate this issue before that release; I just have not had the time yet. |
I'm trying to reproduce this problem, but it seems that you may have a failure due to using a relative path (i.e. not using two slashes) in the destination URL. This is what I see on my machine just after the "permission denied" error:
$ xrdcp -d2 -v -f --tpc delegate only root://gentoo.cern.ch:9091//file0 root://gentoo.cern.ch:9092/test0
...
[2024-03-13 10:58:11.283337 +0100][Debug ][ExDbgMsg ] [gentoo.cern.ch:9092] Calling MsgHandler: 0x425e1860 (message: kXR_stat (path: test0, flags: none) ) with status: [ERROR] Error response: permission denied.
[2024-03-13 10:58:11.283363 +0100][Debug ][ExDbgMsg ] [gentoo.cern.ch:9092] Destroying MsgHandler: 0x425e1860.
[2024-03-13 10:58:11.283386 +0100][Error ][App ] [ERROR] Error response: permission denied (destination)
[ERROR] Server responded with an error: [3010] Stating relative path 'test0' is disallowed. |
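The relative-path pitfall above can be caught before a transfer is attempted. The following is a minimal sketch in POSIX sh, assuming URLs of the form `root://host:port/path`; the helper names `root_url_path` and `is_absolute_root_url` are hypothetical and are not part of xrootd. The path is treated as everything after the first slash that follows `host:port`, so a single slash yields a relative path (`test0`) and a double slash yields an absolute one (`/file0`).

```sh
#!/bin/sh
# Sketch only: print the path component of a root:// URL,
# i.e. everything after the first "/" that follows host:port.
root_url_path() {
    rest=${1#root://}                       # strip the scheme
    case $rest in
        */*) printf '%s\n' "${rest#*/}" ;;  # drop "host:port/"
        *)   printf '\n' ;;                 # no path at all
    esac
}

# Succeeds (exit status 0) only if the path is absolute.
is_absolute_root_url() {
    case $(root_url_path "$1") in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}
```

For the two URLs in the command above, `is_absolute_root_url` succeeds for the source (`//file0`) and fails for the destination (`/test0`), which matches the `path: test0` seen in the `kXR_stat` debug line.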
Hi @amadio, |
including in the destination URL in the copy command: |
Could you please provide a log file with |
clientlog.txt |
Hi @Jo-stfc - I don't see how this TPC will work given the circumstances. The relevant failure is here:
[prometheus.desy.de:33140] Got a kXR_error response to request kXR_sync (handle: 0x00000000) [3010] Authentication to ceph-gw8.gridpp.rl.ac.uk:1094 failed; all protocols have been tried.
The way this works is that the client issuing the TPC request contacts the destination, which happens to be a test dCache instance named prometheus:1095. Authentication works there, and the subsequent open redirects the client to a door which is responsible for copying the file to its local storage. This happens to be the same machine but a different service (i.e. prometheus:33140). So this service, which has no credentials that I can think of (it's a test instance), connects to gw8.gridpp.rl.ac.uk:1094 to open the source file and pull it in. Since the dCache door has no credentials, the login fails, and hence the whole TPC fails. I'm not sure what the intent was here, but it certainly will not work with the systems involved. I suppose if dCache implemented Unix authentication, then enabling Unix for Prometheus might be a solution. |
xrd554working.txt |
The problem in the above seems to be the same, authentication failure after a redirect:
I'd look for configuration differences between the two different sites to which you get redirected. Note also the |
Hi @amadio. The two machines in question were configured to have different xrootd versions: ceph-svc26 is running xrootd 5.5.4, while ceph-gw8 is on 5.6.7-3. So the TPC transfer on version 5.5.4 was run from svc26 to golias, while the 5.6.7 transfer was from gw8 to golias. |
Is the configuration on these two machines the same? Is the authentication used for TPC supposed to work with both? Maybe it's worth to add |
svc26 and gw8 have the same configuration. These are the logs for those servers for the transfer:
svc26(v5.5.4)
|
I don't see the error in the first log, can you include a bit more of the output? It seems that the VOMS attribute extraction or the grid-mapfile may differ between the instances, which could explain the authentication failure on one of the sides. |
There is no error logged on gw8. That's all it logged for the transfer. |
these are the client logs from the machine that initiated the TPC |
I see yet another redirection in the new log you sent me:
So after all redirects, the connection is actually a pull from |
Could you please also check the log of the client on |
I don't have access to those logs as they're on another site. I'll ask them and update when I have the logs |
I have almost no experience with xroot-tpc. This is in our dpmpool24 dCache 9.2.6 log file:
We use default dCache configuration when it comes to xroot protocol configuration. Our dCache have |
I'm also getting the same (?) error with dCache 8.2.36, which runs a bit closer to RAL:
$ XRD_LOGLEVEL=Info gfal-copy -f root://ceph-gw8.gridpp.rl.ac.uk:1094//dteam:test450_2.txt root://mover.pp.rl.ac.uk:1094/pnfs/pp.rl.ac.uk/data/atlas/atlasdatadisk/SAM/x
...
Copying root://ceph-gw8.gridpp.rl.ac.uk:1094//dteam:test450_2.txt 2s File size: 122MB [2024-03-20 14:03:06.421648 +0100][Error ][File ] [0x1c0013c0@root://mover.pp.rl.ac.uk:1094///pnfs/pp.rl.ac.uk/data/atlas/atlasdatadisk/SAM/x?oss.asize=127979454&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=2f4257fbb91e4d0d65fade88&tpc.lfn=//dteam:test450_2.txt&tpc.scgi=tpc.stage=placement xrd.gsiusrpxy=/tmp/x509up_u8021&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root&xrd.gsiusrpxy=/tmp/x509up_u8021&xrdcl.intent=tpc&xrdcl.requuid=059dbcf9-d0cb-40b8-b555-18d853904042] Fatal file state error. Message kXR_sync (handle: 0x00000000) returned with [ERROR] Server responded with an error: [3010] Authentication to ceph-gw8.gridpp.rl.ac.uk:1094 failed; all protocols have been tried.
...
I also upgraded dpmpool20.farm.particle.cz in our dCache storage to the latest 9.2.15 release, and the error is still the same. By the way, is there any advantage to using the xroot protocol for third-party copy? The majority of WLCG TPC transfers are done with HTTP-TPC. |
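The long URLs in these client logs carry the TPC parameters (`tpc.src`, `tpc.dlg`, `tpc.stage`, and so on) in the query string, and they are easier to compare between runs once listed one per line. Below is a minimal sketch in POSIX sh; the function name `tpc_params` is hypothetical. It assumes parameters are separated by `&` and simply prints every `key=value` pair whose key starts with `tpc.`.

```sh
#!/bin/sh
# Sketch only: print the tpc.* query parameters of a URL, one per line.
tpc_params() {
    query=${1#*\?}                     # text after the first "?"
    [ "$query" != "$1" ] || return 0   # no "?" present: nothing to print
    old_ifs=$IFS
    IFS='&'                            # split the query string on "&"
    set -f                             # disable globbing while splitting
    for kv in $query; do
        case $kv in
            tpc.*) printf '%s\n' "$kv" ;;
        esac
    done
    set +f
    IFS=$old_ifs
}
```

Running it on a URL copied from a log line, for example `tpc_params "$url" | sort`, gives a compact list that shows at a glance which host is named as the source and whether delegation is requested.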
Thanks Petr. |
This seems to finally get to the real issue. It seems dCache does not support |
I've tried setting -md:sha1 and -md:sha1:md5. The -md:sha1:md5 setting worked |
thanks! |
Alright! Closing then! Glad we could figure this one out. |
TPC transfers over the root protocol fail on servers with xrootd 5.6.0 and above.
I haven't been able to replicate this failure between servers at RAL, only with pulls from remote sites.
The same servers with the same configuration would succeed with xrootd 5.5.4 installed.
[2024-02-22 13:13:08.310905 +0000][Debug ][File ] [0x2391810@root://prometheus.desy.de:1095/VOs/dteam/test450.txt?oss.asize=128255006&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=1f31bb888c238ce765d74863&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root&xrdcl.intent=tpc&xrdcl.requuid=c0a7f9f0-9d1b-4a3a-b481-b5a0665920b2] Sending a sync command for handle 0x0 to prometheus.desy.de:33121
[2024-02-22 13:13:08.310943 +0000][Debug ][ExDbgMsg ] [prometheus.desy.de:33121] MsgHandler created: 0x2397a80 (message: kXR_sync (handle: 0x00000000) ).
[2024-02-22 13:13:08.311052 +0000][Debug ][ExDbgMsg ] [prometheus.desy.de:33121] Moving MsgHandler: 0x2397a80 (message: kXR_sync (handle: 0x00000000) ) from out-queue to in-queue.
[2024-02-22 13:13:08.568243 +0000][Debug ][ExDbgMsg ] [msg: 0x340009d0] Assigned MsgHandler: 0x2397a80.
[2024-02-22 13:13:08.568300 +0000][Debug ][ExDbgMsg ] [handler: 0x2397a80] Removed MsgHandler: 0x2397a80 from the in-queue.
[2024-02-22 13:13:08.568363 +0000][Debug ][XRootD ] [prometheus.desy.de:33121] Handling error while processing kXR_sync (handle: 0x00000000): [ERROR] Error response: permission denied.