Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

root:// TPC transfer fail with xrootd 5.6.x #2202

Closed
Jo-stfc opened this issue Feb 22, 2024 · 31 comments
Closed

root:// TPC transfer fail with xrootd 5.6.x #2202

Jo-stfc opened this issue Feb 22, 2024 · 31 comments
Assignees
Milestone

Comments

@Jo-stfc
Copy link
Collaborator

Jo-stfc commented Feb 22, 2024

TPC transfers on root protocol fail on servers with xrootd 5.6.0 and above.
I haven't been able to replicate failure this between servers at RAL, only with pulls from remote.
The same servers with the same configuration would succeed with xrootd 5.5.4 installed.

[2024-02-22 13:13:08.310905 +0000][Debug ][File ] [0x2391810@root://prometheus.desy.de:1095/VOs/dteam/test450.txt?oss.asize=128255006&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=1f31bb888c238ce765d74863&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root&xrdcl.intent=tpc&xrdcl.requuid=c0a7f9f0-9d1b-4a3a-b481-b5a0665920b2] Sending a sync command for handle 0x0 to prometheus.desy.de:33121
[2024-02-22 13:13:08.310943 +0000][Debug ][ExDbgMsg ] [prometheus.desy.de:33121] MsgHandler created: 0x2397a80 (message: kXR_sync (handle: 0x00000000) ).
[2024-02-22 13:13:08.311052 +0000][Debug ][ExDbgMsg ] [prometheus.desy.de:33121] Moving MsgHandler: 0x2397a80 (message: kXR_sync (handle: 0x00000000) ) from out-queue to in-queue.
[2024-02-22 13:13:08.568243 +0000][Debug ][ExDbgMsg ] [msg: 0x340009d0] Assigned MsgHandler: 0x2397a80.
[2024-02-22 13:13:08.568300 +0000][Debug ][ExDbgMsg ] [handler: 0x2397a80] Removed MsgHandler: 0x2397a80 from the in-queue.
[2024-02-22 13:13:08.568363 +0000][Debug ][XRootD ] [prometheus.desy.de:33121] Handling error while processing kXR_sync (handle: 0x00000000): [ERROR] Error response: permission denied.

@amadio
Copy link
Member

amadio commented Feb 22, 2024

Could you please attach here the configuration used for the server and the command used for triggering the TPC? There are many things that could go wrong, just the error above is not enough to understand the problem. Thanks!

@amadio amadio self-assigned this Feb 22, 2024
@Jo-stfc
Copy link
Collaborator Author

Jo-stfc commented Feb 22, 2024

TPC copy command:
xrdcp -d2 -v -f --tpc delegate only root://ceph-gw8.gridpp.rl.ac.uk:1094//dteam:test450_2.txt root://prometheus.desy.de:1095/VOs/dteam/test450.txt --force

server config(our side):
xrootd-tpc.txt
xrootd-unified.txt

xrdcp-tpc.sh

#!/bin/sh

#Original code
#/usr/bin/xrdcp --server -f $1 root://$XRDXROOTD_PROXY/$2

# Get the last two variables as SRC and DST, all others are assumed as additional arguments
OTHERARGS="${@:1:$#-2}"
DSTFILE="${@:$#:1}"
SRCFILE="${@:$#-1:1}"


/usr/bin/xrdcp $OTHERARGS --server -f $SRCFILE root://ceph-gw8.gridpp.rl.ac.uk:1095/$DSTFILE

@amadio
Copy link
Member

amadio commented Feb 22, 2024

Thank you very much, I will setup local servers with the same configuration and investigate this problem. I just noticed this:

gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root&xrdcl.intent=tpc&xrdcl.requuid=c0a7f9f0-9d1b-4a3a-b481-b5a0665920b2] Sending a sync command for handle 0x0 to prometheus.desy.de:33121

Note the handle 0x0, which could be the problem here.

@mdoidge
Copy link

mdoidge commented Mar 6, 2024

Hello,
Just to comment we also see the same issue here at Lancaster, with our 5.6.7 XRoot servers and DUNE root:// tpc transfers.
Cheers,
Matt

@amadio amadio added this to the 5.6.9 milestone Mar 6, 2024
@amadio
Copy link
Member

amadio commented Mar 6, 2024

I see. I'm hoping to get a 5.6.9 out soon, and will definitely investigate this issue before to include it, just have not had the time yet.

@amadio amadio modified the milestones: 5.6.9, 5.7.0 Mar 8, 2024
@amadio
Copy link
Member

amadio commented Mar 13, 2024

I'm trying to reproduce this problem, but it seems that you may have a failure due to using a relative path (i.e. not using two slashes) in the destination URL. This is what I see on my machine just after the "permission denied" error:

$ xrdcp -d2 -v -f --tpc delegate only root://gentoo.cern.ch:9091//file0 root://gentoo.cern.ch:9092/test0
...
[2024-03-13 10:58:11.283337 +0100][Debug  ][ExDbgMsg          ] [gentoo.cern.ch:9092] Calling MsgHandler: 0x425e1860 (message: kXR_stat (path: test0, flags: none) ) with status: [ERROR] Error response: permission denied.
[2024-03-13 10:58:11.283363 +0100][Debug  ][ExDbgMsg          ] [gentoo.cern.ch:9092] Destroying MsgHandler: 0x425e1860.
[2024-03-13 10:58:11.283386 +0100][Error  ][App               ] [ERROR] Error response: permission denied (destination)
[ERROR] Server responded with an error: [3010] Stating relative path 'test0' is disallowed.

@Jo-stfc
Copy link
Collaborator Author

Jo-stfc commented Mar 13, 2024

Hi @amadio,
RAL has name2name mapping configurations set to allow single slash paths for historic reasons. I've amended the TPC script (xrdcp-tpc.sh) in gw8 to use double slashes instead but still get the same permission denied error

@Jo-stfc
Copy link
Collaborator Author

Jo-stfc commented Mar 13, 2024

including in the destination URL in the copy command:
-bash-4.2$ xrdcp -v -f --tpc delegate only root://ceph-gw8.gridpp.rl.ac.uk:1094//dteam:test450_2.txt root://prometheus.d
esy.de:1095//VOs/dteam/test450.txt --force
[0B/0B][100%][==================================================][0B/s]
Run: [ERROR] Server responded with an error: [3010] Authentication to ceph-gw8.gridpp.rl.ac.uk:1094 failed; all protocols have been tried.

@amadio
Copy link
Member

amadio commented Mar 13, 2024

Could you please provide a log file with XRD_LOGLEVEL=Dump for the command above? It seems likely that the credentials are not working to authenticate with ceph-gw8.gridpp.rl.ac.uk:1094

@Jo-stfc
Copy link
Collaborator Author

Jo-stfc commented Mar 13, 2024

clientlog.txt
Hi, this is the client log with loglevel dump

@abh3
Copy link
Member

abh3 commented Mar 13, 2024

Hi @Jo-stfc - I don't see how this TPC will work given the circumstances. The relevant failure is here:

[prometheus.desy.de:33140] Got a kXR_error response to request kXR_sync (handle: 0x00000000) [3010] Authentication to ceph-gw8.gridpp.rl.ac.uk:1094 failed; all protocols have been tried.

The way this works is that the client issuing the TPC request contacts the destination which happens to be a test dCache instances named prometheus:1095. Authentication works there and the subsequent open redirects the client to a door which is responsible for copying the file to its local storage. This happens to be the same machine but a different service (i.e. prometheus:33140). So, this service which has no credentials that I can think of (it's a test instance) connect to gw8.gridpp.rl.ac.uk:1094 to open the source file and pull it in. Since the dCache door has no credentials the login fails. Hence the whole TPC fails. I'm not sure what the intent was here but it certainly will not work with the systems involved. I suppose if dCache implemented Unix authentication then enabling Unix for Prometheus might be a solution.

@Jo-stfc
Copy link
Collaborator Author

Jo-stfc commented Mar 14, 2024

xrd554working.txt
xrd567denied.txt
Hi @abh3,
thanks for the explanation, I've checked against the prometeus endpoint and it indeed doesn't work (on either version). I've attached the log dumps against another endpoint where I could replicate the issue (root TPC working on 5.5.4 and getting denied on 5.6.7)

@amadio
Copy link
Member

amadio commented Mar 19, 2024

The problem in the above seems to be the same, authentication failure after a redirect:


### WORKING LOG

~ $ grep -Ei '\[error\]| TPC |redir' ~/tmp/xrd554working.txt 
[2024-03-14 09:08:10.892926 +0000][Debug  ][Utility           ] Unable to process user config file: [ERROR] OS Error: no such file or directory
[2024-03-14 09:08:10.893209 +0000][Debug  ][PlugInMgr         ] Unable to process directory /home/tier1/fur43467/.xrootd/client.plugins.d: [ERROR] OS Error: no such file or directory
[2024-03-14 09:08:11.229268 +0000][Debug  ][Utility           ] Generating the destination TPC URL
[2024-03-14 09:08:11.496345 +0000][Dump   ][XRootD            ] [golias100.farm.particle.cz:1094] Got kXR_redirect response to message kXR_open (file: /dpm/farm.particle.cz//home/dteam/test/test450_2.txt?oss.asize=127979454&tpc.dlg=ceph-svc26.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=0daa7c8e2d1e2e0965f2be7b&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-svc26.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root, mode: 0644, flags: kXR_delete kXR_open_updt kXR_async kXR_retstat ): ds28.farm.particle.cz?tpc.dlg=ceph-svc26.gridpp.rl.ac.uk:1094&tpc.key=0daa7c8e2d1e2e0965f2be7b&tpc.tpr=root&tpc.spr=root&tpc.dlgon=1&tpc.lfn=/dteam:test450_2.txt&tpc.src=ceph-svc26.gridpp.rl.ac.uk:1094&org.dcache.xrootd.client=fur43467.1977865@lcgui05.gridpp.rl.ac.uk&oss.asize=127979454&org.dcache.uuid=1a96a241-9b1f-47e7-a8e8-fdba72e60b12&tpc.scgi=tpc.stage=placement&tpc.stage=copy?org.dcache.door=2001:718:401:6017:2:0:0:1000:1094, port 23509
[2024-03-14 09:08:11.688903 +0000][Debug  ][XRootD            ] Redirect trace-back:
[2024-03-14 09:08:11.688903 +0000][Debug  ][XRootD            ] 	0. Redirected from: root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz//home/dteam/test/test450_2.txt to: root://ds28.farm.particle.cz:23509/
[2024-03-14 09:08:11.688987 +0000][Debug  ][Utility           ] Checking if the data server ds28.farm.particle.cz:23509 supports tpc / tpc lite
[2024-03-14 09:08:11.794831 +0000][Debug  ][Utility           ] TPC lite supported at: ds28.farm.particle.cz:23509

### FAILING LOG

~ $ grep -Ei '\[error\]| TPC |redir' ~/tmp/xrd567denied.txt
[2024-03-14 09:09:50.774686 +0000][Debug  ][Utility           ] Unable to process user config file: [ERROR] OS Error: no such file or directory
[2024-03-14 09:09:50.775010 +0000][Debug  ][PlugInMgr         ] Unable to process directory /home/tier1/fur43467/.xrootd/client.plugins.d: [ERROR] OS Error: no such file or directory
[2024-03-14 09:09:51.411516 +0000][Debug  ][Utility           ] Generating the destination TPC URL
[2024-03-14 09:09:51.661676 +0000][Dump   ][XRootD            ] [golias100.farm.particle.cz:1094] Got kXR_redirect response to message kXR_open (file: /dpm/farm.particle.cz//home/dteam/test/test450_2.txt?oss.asize=127979454&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=188762392e7e2ec865f2bedf&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root, mode: 0644, flags: kXR_delete kXR_open_updt kXR_async kXR_retstat ): ds32.farm.particle.cz?tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.key=188762392e7e2ec865f2bedf&tpc.tpr=root&tpc.spr=root&tpc.dlgon=1&tpc.lfn=/dteam:test450_2.txt&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&org.dcache.xrootd.client=fur43467.1978056@lcgui05.gridpp.rl.ac.uk&oss.asize=127979454&org.dcache.uuid=067b5f41-2375-4e59-bfd4-bd7022e05055&tpc.scgi=tpc.stage=placement&tpc.stage=copy?org.dcache.door=2001:718:401:6017:2:0:0:1000:1094, port 23767
[2024-03-14 09:09:51.813868 +0000][Debug  ][XRootD            ] Redirect trace-back:
[2024-03-14 09:09:51.813868 +0000][Debug  ][XRootD            ] 	0. Redirected from: root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz//home/dteam/test/test450_2.txt to: root://ds32.farm.particle.cz:23767/
[2024-03-14 09:09:51.813946 +0000][Debug  ][Utility           ] Checking if the data server ds32.farm.particle.cz:23767 supports tpc / tpc lite
[2024-03-14 09:09:51.920400 +0000][Debug  ][Utility           ] TPC lite supported at: ds32.farm.particle.cz:23767
[2024-03-14 09:09:52.144044 +0000][Debug  ][XRootD            ] [ds32.farm.particle.cz:23767] Handling error while processing kXR_sync (handle: 0x00000000): [ERROR] Error response: permission denied.
[2024-03-14 09:09:52.144066 +0000][Debug  ][ExDbgMsg          ] [ds32.farm.particle.cz:23767] Calling MsgHandler: 0x19f8d00 (message: kXR_sync (handle: 0x00000000) ) with status: [ERROR] Error response: permission denied.
[2024-03-14 09:09:52.144086 +0000][Dump   ][File              ] [0x19ecd00@root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz//home/dteam/test/test450_2.txt?oss.asize=127979454&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=188762392e7e2ec865f2bedf&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root&xrdcl.intent=tpc&xrdcl.requuid=12baa290-5837-4252-aff5-3281962e3b78] File state error encountered. Message kXR_sync (handle: 0x00000000) returned with [ERROR] Server responded with an error: [3010] Authentication to ceph-gw8.gridpp.rl.ac.uk:1094 failed; all protocols have been tried.
[2024-03-14 09:09:52.144109 +0000][Error  ][File              ] [0x19ecd00@root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz//home/dteam/test/test450_2.txt?oss.asize=127979454&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=188762392e7e2ec865f2bedf&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root&xrdcl.intent=tpc&xrdcl.requuid=12baa290-5837-4252-aff5-3281962e3b78] Fatal file state error. Message kXR_sync (handle: 0x00000000) returned with [ERROR] Server responded with an error: [3010] Authentication to ceph-gw8.gridpp.rl.ac.uk:1094 failed; all protocols have been tried.
[2024-03-14 09:09:52.144126 +0000][Dump   ][File              ] [0x19ecd00@root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz//home/dteam/test/test450_2.txt?oss.asize=127979454&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=188762392e7e2ec865f2bedf&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root&xrdcl.intent=tpc&xrdcl.requuid=12baa290-5837-4252-aff5-3281962e3b78] Failing message kXR_sync (handle: 0x00000000) with [ERROR] Server responded with an error: [3010] Authentication to ceph-gw8.gridpp.rl.ac.uk:1094 failed; all protocols have been tried.
[2024-03-14 09:09:54.488986 +0000][Debug  ][XRootD            ] [ds32.farm.particle.cz:23767] Handling error while processing kXR_stat (handle: 0x00000000, flags: none): [ERROR] Error response: invalid request code.
[2024-03-14 09:09:54.488997 +0000][Debug  ][ExDbgMsg          ] [ds32.farm.particle.cz:23767] Calling MsgHandler: 0x1a00dc0 (message: kXR_stat (handle: 0x00000000, flags: none) ) with status: [ERROR] Error response: invalid request code.
[2024-03-14 09:09:54.489010 +0000][Dump   ][File              ] [0x19ecd00@root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz//home/dteam/test/test450_2.txt?oss.asize=127979454&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=188762392e7e2ec865f2bedf&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root&xrdcl.intent=tpc&xrdcl.requuid=12baa290-5837-4252-aff5-3281962e3b78] File state error encountered. Message kXR_stat (handle: 0x00000000, flags: none) returned with [ERROR] Server responded with an error: [3006] Invalid request; user not authenticated
[2024-03-14 09:09:54.489020 +0000][Error  ][File              ] [0x19ecd00@root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz//home/dteam/test/test450_2.txt?oss.asize=127979454&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=188762392e7e2ec865f2bedf&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root&xrdcl.intent=tpc&xrdcl.requuid=12baa290-5837-4252-aff5-3281962e3b78] Fatal file state error. Message kXR_stat (handle: 0x00000000, flags: none) returned with [ERROR] Server responded with an error: [3006] Invalid request; user not authenticated
[2024-03-14 09:09:54.489028 +0000][Dump   ][File              ] [0x19ecd00@root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz//home/dteam/test/test450_2.txt?oss.asize=127979454&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=188762392e7e2ec865f2bedf&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root&xrdcl.intent=tpc&xrdcl.requuid=12baa290-5837-4252-aff5-3281962e3b78] Failing message kXR_stat (handle: 0x00000000, flags: none) with [ERROR] Server responded with an error: [3006] Invalid request; user not authenticated
[2024-03-14 09:09:54.489091 +0000][Error  ][Utility           ] Third party copy from root://ceph-gw8.gridpp.rl.ac.uk:1094//dteam:test450_2.txt?xrdcl.intent=tpc to root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz//home/dteam/test/test450_2.txt?xrdcl.intent=tpc failed: [ERROR] Server responded with an error: [3010] Authentication to ceph-gw8.gridpp.rl.ac.uk:1094 failed; all protocols have been tried.
[2024-03-14 09:09:54.522770 +0000][Debug  ][XRootD            ] [ds32.farm.particle.cz:23767] Handling error while processing kXR_close (handle: 0x00000000): [ERROR] Error response: numerical argument out of domain.
[2024-03-14 09:09:54.522780 +0000][Debug  ][ExDbgMsg          ] [ds32.farm.particle.cz:23767] Calling MsgHandler: 0x1a00dc0 (message: kXR_close (handle: 0x00000000) ) with status: [ERROR] Error response: numerical argument out of domain.
[2024-03-14 09:09:54.522793 +0000][Debug  ][File              ] [0x19ecd00@root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz//home/dteam/test/test450_2.txt?oss.asize=127979454&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=188762392e7e2ec865f2bedf&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root&xrdcl.intent=tpc&xrdcl.requuid=12baa290-5837-4252-aff5-3281962e3b78] Close returned from ds32.farm.particle.cz:23767 with: [ERROR] Server responded with an error: [3019] File size mismatch (expected=127979454, actual=0)
Run: [ERROR] Server responded with an error: [3010] Authentication to ceph-gw8.gridpp.rl.ac.uk:1094 failed; all protocols have been tried.

I'd look for configuration differences between the two different sites to which you get redirected. Note also the tpc.src=ceph-svc26.gridpp.rl.ac.uk:1094 on the working case and tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094 on the failing case. So the pull happens from different machines after the redirection, and to different sources. Is the authentication method used supposed to work with all of them?

@Jo-stfc
Copy link
Collaborator Author

Jo-stfc commented Mar 19, 2024

Hi @amadio. The 2 machines in question were configured to have different xrootd versions:

ceph-svc26 is running on xrootd-5.5.4, while ceph-gw8 is on 5.6.7-3.

so the TPC transfer on version 5.5.4 was run from svc26 to golias, while the 5.6.7 transfer was from gw8 to golias.

@amadio
Copy link
Member

amadio commented Mar 19, 2024

Is the configuration on these two machines the same? Is the authentication used for TPC supposed to work with both? Maybe it's worth to add tpc.trace all (or the dcache equivalent) and look at the client log for both pull attempts to see what's wrong. I only see authentication failure from ds32.farm.particle.cz:23767 to ceph-gw8.gridpp.rl.ac.uk:1094 to perform the pull, but the logs of the tpc client pull are needed to know why authentication fails. You may also look at the server logs of ceph-gw8.gridpp.rl.ac.uk:1094 to see what happened with the connection attempt from ds32.farm.particle.cz:23767.

@Jo-stfc
Copy link
Collaborator Author

Jo-stfc commented Mar 19, 2024

svc26 and gw8 have the same configuration. These are the logs for those servers for the transfer:
gw8(v5.6.7):

240319 10:24:20 3503760 XrdLinkXeq: anon.0:118@lcgui05 connection upgraded to TLSv1.2
240319 10:24:21 3503760  XrdVomsFun: proxy: <proxycert>
240319 10:24:21 3503760  XrdVomsFun: adding cert: /C=UK/O=eScience/OU=CLRC/L=RAL/CN=jyothish thomas
240319 10:24:21 3503760  XrdVomsFun: retrieval successful
240319 10:24:21 3503760  XrdVomsFun: found VO: dteam
240319 10:24:21 3503760  XrdVomsFun:  ---> group: '/dteam', role: 'NULL', cap: 'NULL'
240319 10:24:21 3503760  XrdVomsFun:  ---> fqan: '/dteam/Role=NULL/Capability=NULL'
240319 10:24:21 3503760 XrootdXeq: fur43467.2365846:118@lcgui05 pub IP64 TLSv1.2 login as 45954061.0
240319 10:24:21 3503760 acc_Audit: fur43467.2365846:118@lcgui05 grant gsi 45954061.0@lcgui05.gridpp.rl.ac.uk read /dteam:test450_2.txt
240319 10:24:21 ceph_namelib : translated /dteam:test450_2.txt to dteam:test450_2.txt
240319 10:24:21 dumpClusterInfo : Counts: 5 5 0 0 5 5 5 CountsbyCluster: [0:24, 1:24, 2:23, 3:23, 4:23, ],
240319 10:24:21 Access Mode: /dteam:test450_2.txt flags&O_ACCMODE 0
240319 10:24:21 File descriptor 25 associated to file /dteam:test450_2.txt opened in read mode
XrdCephOssBufferedFile::Open got fd: 25 /dteam:test450_2.txt
240319 10:24:21 ceph_posix_fstat
XrdCephOssBufferedFile::Summary: {"fd":25, "Elapsed_time_ms":2, "path":"/dteam:test450_2.txt", read_B:0, readV_B:0, readAIO_B:0, writeB:0, writeAIO_B:0, startTime:"2024-03-19 10:24:21", endTime:"2024-03-19 10:24:21", nBuffersRead:0}
240319 10:24:21 ceph_close: closed fd 25 for file test450_2.txt, read ops count 0, write ops count 0, async write ops 0/0, async pending write bytes 0, async read ops 0/0, bytes written/max offset 0/0, longest async write 0.000000, longest callback invocation 0.000000, last async op age 0.000000
240319 10:24:21 3656008 XrdLinkXeq: anon.0:115@dpmpool24.farm.particle.cz connection upgraded to TLSv1.2
240319 10:24:22 3656008 XrootdXeq: fur43467.52021:115@dpmpool24.farm.particle.cz disc 0:00:01

svc26(v5.5.4)

240319 10:25:30 2670243 XrdLinkXeq: anon.0:79@lcgui05 connection upgraded to TLSv1.2
240319 10:25:30 2670243  XrdVomsFun: proxy: <proxycert>
240319 10:25:30 2670243  XrdVomsFun: adding cert: /C=UK/O=eScience/OU=CLRC/L=RAL/CN=jyothish thomas
240319 10:25:30 2670243  XrdVomsFun: retrieval successful
240319 10:25:30 2670243  XrdVomsFun: found VO: dteam
240319 10:25:30 2670243  XrdVomsFun:  ---> group: '/dteam', role: 'NULL', cap: 'NULL'
240319 10:25:30 2670243  XrdVomsFun:  ---> fqan: '/dteam/Role=NULL/Capability=NULL'
240319 10:25:30 2670243 XrootdXeq: fur43467.2365950:79@lcgui05 pub IP64 TLSv1.2 login as 45954061.0
240319 10:25:30 2670243 acc_Audit: fur43467.2365950:79@lcgui05 grant gsi 45954061.0@lcgui05.gridpp.rl.ac.uk read /dteam:test450_2.txt
240319 10:25:30 ceph_namelib : translated /dteam:test450_2.txt to dteam:test450_2.txt
240319 10:25:30 dumpClusterInfo : Counts: 5 5 0 0 5 5 5 CountsbyCluster: [0:3, 1:3, 2:3, 3:3, 4:3, ],
240319 10:25:30 Access Mode: /dteam:test450_2.txt flags&O_ACCMODE 0
240319 10:25:30 File descriptor 2 associated to file /dteam:test450_2.txt opened in read mode
XrdCephOssBufferedFile::Open got fd: 2 /dteam:test450_2.txt
240319 10:25:30 ceph_posix_fstat
XrdCephOssBufferedFile::Summary: {"fd":2, "Elapsed_time_ms":2, "path":"/dteam:test450_2.txt", read_B:0, readV_B:0, readAIO_B:0, writeB:0, writeAIO_B:0, startTime:"2024-03-19 10:25:30", endTime:"2024-03-19 10:25:30", nBuffersRead:0}
240319 10:25:30 ceph_close: closed fd 2 for file test450_2.txt, read ops count 0, write ops count 0, async write ops 0/0, async pending write bytes 0, async read ops 0/0, bytes written/max offset 0/0, longest async write 0.000000, longest callback invocation 0.000000, last async op age 0.000000
240319 10:25:30 2670333 XrdLinkXeq: anon.0:36@dpmpool21.farm.particle.cz connection upgraded to TLSv1.2
240319 10:25:31 2670333  XrdVomsFun: proxy: <proxycert>/CN=871422721
240319 10:25:31 2670333  XrdVomsFun: adding cert: <proxycert>
240319 10:25:31 2670333  XrdVomsFun: adding cert: /C=UK/O=eScience/OU=CLRC/L=RAL/CN=jyothish thomas
240319 10:25:31 2670333  XrdVomsFun: retrieval successful
240319 10:25:31 2670333  XrdVomsFun: found VO: dteam
240319 10:25:31 2670333  XrdVomsFun:  ---> group: '/dteam', role: 'NULL', cap: 'NULL'
240319 10:25:31 2670333  XrdVomsFun:  ---> fqan: '/dteam/Role=NULL/Capability=NULL'
240319 10:25:31 2670333 XrootdXeq: fur43467.11114:36@dpmpool21.farm.particle.cz pub IPv6 TLSv1.2 login as 45954061.0
240319 10:25:31 2670333 acc_Audit: fur43467.11114:36@dpmpool21.farm.particle.cz grant gsi 45954061.0@dpmpool21.farm.particle.cz read /dteam:test450_2.txt
240319 10:25:31 ceph_namelib : translated /dteam:test450_2.txt to dteam:test450_2.txt
240319 10:25:31 dumpClusterInfo : Counts: 5 5 0 0 5 5 5 CountsbyCluster: [0:4, 1:4, 2:4, 3:3, 4:3, ],
240319 10:25:31 Access Mode: /dteam:test450_2.txt flags&O_ACCMODE 0
240319 10:25:31 File descriptor 3 associated to file /dteam:test450_2.txt opened in read mode
XrdCephOssBufferedFile::Open got fd: 3 /dteam:test450_2.txt
240319 10:25:31 ceph_posix_fstat
XrdCephOssBufferedFile: buffer: got 0 buffers already
XrdCephBufferDataSimple:  Global: 1 16777216
XrdCephOssBufferedFile::Open: fd: 3 Buffer created: 16777216
XrdCephOssBufferedFile::Summary: {"fd":3, "Elapsed_time_ms":4229, "path":"/dteam:test450_2.txt", read_B:127979454, readV_B:0, readAIO_B:0, writeB:0, writeAIO_B:0, startTime:"2024-03-19 10:25:31", endTime:"2024-03-19 10:25:35", nBuffersRead:1}
240319 10:25:35 ceph_close: closed fd 3 for file test450_2.txt, read ops count 8, write ops count 0, async write ops 0/0, async pending write bytes 0, async read ops 0/0, bytes written/max offset 0/0, longest async write 0.000000, longest callback invocation 0.000000, last async op age 0.000000
XrdCephBufferAlgSimple::Destructor, fd=3, retrieved_bytes=127979454, bypassed_bytes=0, delivered_bytes=127979454, cache_hit_frac=1
CephIOAdapterRaw::Summary fd:3 nwrite:0 byteswritten:0 write_s:0 writemax_s0 write_MBs:0 nread:8 bytesread:127979454 read_s:1074.78  readmax_s:262.472 read_MBs:0 striperlessRead: 1
XrdCephBufferDataSimple~:  Global: 0 0
240319 10:25:35 2670333 XrootdXeq: fur43467.11114:36@dpmpool21.farm.particle.cz disc 0:00:05
240319 10:25:35 2670470 XrootdXeq: fur43467.2365950:79@lcgui05 disc 0:00:05

@amadio
Copy link
Member

amadio commented Mar 19, 2024

I don't see the error in the first log, can you include a bit more of the output? It seems that the VOMS attribute extraction or the grid-mapfile may differ between the instances, which could explain the authentication failure on one of the sides.

@Jo-stfc
Copy link
Collaborator Author

Jo-stfc commented Mar 19, 2024

there is no error logged on gw8. that's all it logged for the transfer.
I've set:
xrd.trace debug
xrootd.trace debug
tpc.trace all

@Jo-stfc
Copy link
Collaborator Author

Jo-stfc commented Mar 19, 2024

these are the client logs from the machine that initiated the TPC
initclientlogs.txt

@amadio
Copy link
Member

amadio commented Mar 19, 2024

This looks suspicious, the ^A in the username. Maybe an encoding problem in the gridmap?
screenshot

@amadio
Copy link
Member

amadio commented Mar 19, 2024

I see yet another redirection in the new log you sent me:

$ grep -Ei '\[error\]| TPC |redir|balanc' ~/tmp/initclientlogs.txt 
[2024-03-19 10:37:20.715660 +0000][Debug  ][Utility           ] Unable to process user config file: [ERROR] OS Error: no such file or directory
[2024-03-19 10:37:20.715971 +0000][Debug  ][PlugInMgr         ] Unable to process directory /home/tier1/fur43467/.xrootd/client.plugins.d: [ERROR] OS Error: no such file or directory
[2024-03-19 10:37:21.161875 +0000][Debug  ][Utility           ] Generating the destination TPC URL
[2024-03-19 10:37:21.431134 +0000][Dump   ][XRootD            ] [golias100.farm.particle.cz:1094] Got kXR_redirect response to message kXR_open (file: /dpm/farm.particle.cz//home/dteam/test/test450_2.txt?oss.asize=127979454&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=09a620ed18271d7765f96ae1&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root, mode: 0644, flags: kXR_delete kXR_open_updt kXR_async kXR_retstat ): dpmpool24.farm.particle.cz?tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.key=09a620ed18271d7765f96ae1&tpc.tpr=root&tpc.spr=root&tpc.dlgon=1&tpc.lfn=/dteam:test450_2.txt&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&org.dcache.xrootd.client=fur43467.2366839@lcgui05.gridpp.rl.ac.uk&oss.asize=127979454&org.dcache.uuid=6968bed8-182f-474c-a110-fc4770be3297&tpc.scgi=tpc.stage=placement&tpc.stage=copy?org.dcache.door=2001:718:401:6017:2:0:0:1000:1094, port 22176
[2024-03-19 10:37:21.431177 +0000][Dump   ][XRootD            ] [golias100.farm.particle.cz:1094] Current server has been assigned as a load-balancer for message kXR_open (file: /dpm/farm.particle.cz//home/dteam/test/test450_2.txt?oss.asize=127979454&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=09a620ed18271d7765f96ae1&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root, mode: 0644, flags: kXR_delete kXR_open_updt kXR_async kXR_retstat )
[2024-03-19 10:37:21.551096 +0000][Debug  ][XRootD            ] Redirect trace-back:
[2024-03-19 10:37:21.551096 +0000][Debug  ][XRootD            ] 	0. Redirected from: root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz//home/dteam/test/test450_2.txt to: root://dpmpool24.farm.particle.cz:22176/
[2024-03-19 10:37:21.551186 +0000][Debug  ][Utility           ] Checking if the data server dpmpool24.farm.particle.cz:22176 supports tpc / tpc lite
[2024-03-19 10:37:21.659105 +0000][Debug  ][Utility           ] TPC lite supported at: dpmpool24.farm.particle.cz:22176
[2024-03-19 10:37:21.937619 +0000][Debug  ][XRootD            ] [dpmpool24.farm.particle.cz:22176] Handling error while processing kXR_sync (handle: 0x00000000): [ERROR] Error response: permission denied.
[2024-03-19 10:37:21.937640 +0000][Debug  ][ExDbgMsg          ] [dpmpool24.farm.particle.cz:22176] Calling MsgHandler: 0xbc7f20 (message: kXR_sync (handle: 0x00000000) ) with status: [ERROR] Error response: permission denied.
[2024-03-19 10:37:21.937660 +0000][Dump   ][File              ] [0xbb1d00@root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz//home/dteam/test/test450_2.txt?oss.asize=127979454&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=09a620ed18271d7765f96ae1&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root&xrdcl.intent=tpc&xrdcl.requuid=765a2c13-3867-46e2-ba68-6a04bbe9c3ea] File state error encountered. Message kXR_sync (handle: 0x00000000) returned with [ERROR] Server responded with an error: [3010] Authentication to ceph-gw8.gridpp.rl.ac.uk:1094 failed; all protocols have been tried.
[2024-03-19 10:37:21.937672 +0000][Error  ][File              ] [0xbb1d00@root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz//home/dteam/test/test450_2.txt?oss.asize=127979454&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=09a620ed18271d7765f96ae1&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root&xrdcl.intent=tpc&xrdcl.requuid=765a2c13-3867-46e2-ba68-6a04bbe9c3ea] Fatal file state error. Message kXR_sync (handle: 0x00000000) returned with [ERROR] Server responded with an error: [3010] Authentication to ceph-gw8.gridpp.rl.ac.uk:1094 failed; all protocols have been tried.
[2024-03-19 10:37:21.937698 +0000][Dump   ][File              ] [0xbb1d00@root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz//home/dteam/test/test450_2.txt?oss.asize=127979454&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=09a620ed18271d7765f96ae1&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root&xrdcl.intent=tpc&xrdcl.requuid=765a2c13-3867-46e2-ba68-6a04bbe9c3ea] Failing message kXR_sync (handle: 0x00000000) with [ERROR] Server responded with an error: [3010] Authentication to ceph-gw8.gridpp.rl.ac.uk:1094 failed; all protocols have been tried.
[2024-03-19 10:37:24.239583 +0000][Debug  ][XRootD            ] [dpmpool24.farm.particle.cz:22176] Handling error while processing kXR_stat (handle: 0x00000000, flags: none): [ERROR] Error response: invalid request code.
[2024-03-19 10:37:24.239594 +0000][Debug  ][ExDbgMsg          ] [dpmpool24.farm.particle.cz:22176] Calling MsgHandler: 0xbc7f20 (message: kXR_stat (handle: 0x00000000, flags: none) ) with status: [ERROR] Error response: invalid request code.
[2024-03-19 10:37:24.239608 +0000][Dump   ][File              ] [0xbb1d00@root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz//home/dteam/test/test450_2.txt?oss.asize=127979454&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=09a620ed18271d7765f96ae1&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root&xrdcl.intent=tpc&xrdcl.requuid=765a2c13-3867-46e2-ba68-6a04bbe9c3ea] File state error encountered. Message kXR_stat (handle: 0x00000000, flags: none) returned with [ERROR] Server responded with an error: [3006] Invalid request; user not authenticated
[2024-03-19 10:37:24.239619 +0000][Error  ][File              ] [0xbb1d00@root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz//home/dteam/test/test450_2.txt?oss.asize=127979454&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=09a620ed18271d7765f96ae1&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root&xrdcl.intent=tpc&xrdcl.requuid=765a2c13-3867-46e2-ba68-6a04bbe9c3ea] Fatal file state error. Message kXR_stat (handle: 0x00000000, flags: none) returned with [ERROR] Server responded with an error: [3006] Invalid request; user not authenticated
[2024-03-19 10:37:24.239628 +0000][Dump   ][File              ] [0xbb1d00@root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz//home/dteam/test/test450_2.txt?oss.asize=127979454&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=09a620ed18271d7765f96ae1&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root&xrdcl.intent=tpc&xrdcl.requuid=765a2c13-3867-46e2-ba68-6a04bbe9c3ea] Failing message kXR_stat (handle: 0x00000000, flags: none) with [ERROR] Server responded with an error: [3006] Invalid request; user not authenticated
[2024-03-19 10:37:24.239705 +0000][Error  ][Utility           ] Third party copy from root://ceph-gw8.gridpp.rl.ac.uk:1094//dteam:test450_2.txt?xrdcl.intent=tpc to root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz//home/dteam/test/test450_2.txt?xrdcl.intent=tpc failed: [ERROR] Server responded with an error: [3010] Authentication to ceph-gw8.gridpp.rl.ac.uk:1094 failed; all protocols have been tried.
[2024-03-19 10:37:24.294346 +0000][Debug  ][XRootD            ] [dpmpool24.farm.particle.cz:22176] Handling error while processing kXR_close (handle: 0x00000000): [ERROR] Error response: numerical argument out of domain.
[2024-03-19 10:37:24.294363 +0000][Debug  ][ExDbgMsg          ] [dpmpool24.farm.particle.cz:22176] Calling MsgHandler: 0xbc7f20 (message: kXR_close (handle: 0x00000000) ) with status: [ERROR] Error response: numerical argument out of domain.
[2024-03-19 10:37:24.294377 +0000][Debug  ][File              ] [0xbb1d00@root://golias100.farm.particle.cz:1094//dpm/farm.particle.cz//home/dteam/test/test450_2.txt?oss.asize=127979454&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=09a620ed18271d7765f96ae1&tpc.lfn=/dteam:test450_2.txt&tpc.scgi=tpc.stage=placement&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root&xrdcl.intent=tpc&xrdcl.requuid=765a2c13-3867-46e2-ba68-6a04bbe9c3ea] Close returned from dpmpool24.farm.particle.cz:22176 with: [ERROR] Server responded with an error: [3019] File size mismatch (expected=127979454, actual=0)
Run: [ERROR] Server responded with an error: [3010] Authentication to ceph-gw8.gridpp.rl.ac.uk:1094 failed; all protocols have been tried.

So after all redirects, the connection is actually a pull from dpmpool24.farm.particle.cz:22176 to gw8.gridpp.rl.ac.uk:1094. I think that with all these redirections, the authentication information is getting lost somewhere.

@Jo-stfc
Copy link
Collaborator Author

Jo-stfc commented Mar 19, 2024

This looks suspicious, the ^A in the username. Maybe an encoding problem in the gridmap? screenshot

that encoded character also appears in the working transfer against svc26

@amadio
Copy link
Member

amadio commented Mar 19, 2024

Could you please also check the log of the client on dpmpool24.farm.particle.cz:22176 trying to connect to gw8.gridpp.rl.ac.uk:1094? That's where the authentication failure is happening. Please follow all the redirections and paste the very final authentication log here.

@Jo-stfc
Copy link
Collaborator Author

Jo-stfc commented Mar 19, 2024

I don't have access to those logs as they're on another site. I'll ask them and update when I have the logs

@vokac
Copy link

vokac commented Mar 20, 2024

I have almost no experience with xroot-tpc, this is in our dpmpool24 dCache 9.2.6 log file

Mar 19 11:37:21 dpmpool24.farm.particle.cz dcache@poolsDomain_dpmpool24_heppool1: 19 Mar 2024 11:37:21 (heppool1_060) [] TPC with ztn and scitokens is not yet established; this is not guaranteed to work with version 0.
Mar 19 11:37:21 dpmpool24.farm.particle.cz dcache@poolsDomain_dpmpool24_heppool1: 19 Mar 2024 11:37:21 (heppool1_060) [] Unable to complete ztn authentication to ceph-gw8.gridpp.rl.ac.uk:1094, channel d20d02b9, stream 2, session 530000007E7635007300000062000000: org.dcache.xrootd.core.XrootdException: TPC was not provided a ztn token..
Mar 19 11:37:21 dpmpool24.farm.particle.cz dcache@poolsDomain_dpmpool24_heppool1: 19 Mar 2024 11:37:21 (heppool1_060) [] Unable to complete gsi authentication to ceph-gw8.gridpp.rl.ac.uk:1094, channel d20d02b9, stream 2, session 530000007E7635007300000062000000: org.dcache.xrootd.core.XrootdException: all sender digests are unsupported: [sha256].
Mar 19 11:37:21 dpmpool24.farm.particle.cz dcache@poolsDomain_dpmpool24_heppool1: 19 Mar 2024 11:37:21 (heppool1_060) [] Exception caught on channel d20d02b9: org.dcache.xrootd.core.XrootdException: Authentication to ceph-gw8.gridpp.rl.ac.uk:1094 failed; all protocols have been tried..
Mar 19 11:37:21 dpmpool24.farm.particle.cz dcache@poolsDomain_dpmpool24_heppool1: 19 Mar 2024 11:37:21 (heppool1_060) [] Exception caught on channel d20d02b9: org.dcache.xrootd.core.XrootdException: Invalid request; user not authenticated
Mar 19 11:37:21 dpmpool24.farm.particle.cz dcache@poolsDomain_dpmpool24_heppool1: .

We use default dCache configuration when it comes to xroot protocol configuration. Our dCache have oidc plugin enabled and support access with WLCG tokens.

@vokac
Copy link

vokac commented Mar 20, 2024

I'm getting same (?) error also with dCache 8.2.36 that runs a bit closer to RAL

$ XRD_LOGLEVEL=Info gfal-copy -f root://ceph-gw8.gridpp.rl.ac.uk:1094//dteam:test450_2.txt root://mover.pp.rl.ac.uk:1094/pnfs/pp.rl.ac.uk/data/atlas/atlasdatadisk/SAM/x
...
Copying root://ceph-gw8.gridpp.rl.ac.uk:1094//dteam:test450_2.txt     2s  File size: 122MB                                                                                                                                                                                      [2024-03-20 14:03:06.421648 +0100][Error  ][File              ] [0x1c0013c0@root://mover.pp.rl.ac.uk:1094///pnfs/pp.rl.ac.uk/data/atlas/atlasdatadisk/SAM/x?oss.asize=127979454&tpc.dlg=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.dlgon=1&tpc.key=2f4257fbb91e4d0d65fade88&tpc.lfn=//dteam:test450_2.txt&tpc.scgi=tpc.stage=placement	xrd.gsiusrpxy=/tmp/x509up_u8021&tpc.spr=root&tpc.src=ceph-gw8.gridpp.rl.ac.uk:1094&tpc.stage=copy&tpc.tpr=root&xrd.gsiusrpxy=/tmp/x509up_u8021&xrdcl.intent=tpc&xrdcl.requuid=059dbcf9-d0cb-40b8-b555-18d853904042] Fatal file state error. Message kXR_sync (handle: 0x00000000) returned with [ERROR] Server responded with an error: [3010] Authentication to ceph-gw8.gridpp.rl.ac.uk:1094 failed; all protocols have been tried.
...

I also upgraded dpmpool20.farm.particle.cz at our dCache storage to the latest 9.2.15 release and error is still same.

btw: is there any advantage using xroot protocol for third-party-copy? Majority of WLCG TPC transfers are done with HTTP-TPC.

@Jo-stfc
Copy link
Collaborator Author

Jo-stfc commented Mar 20, 2024

Thanks Petr.
It rarely used but it's a valid use case that some VOs might want to use ( and we did see some prod DUNE transfers using tpc root at some point)

@amadio
Copy link
Member

amadio commented Mar 20, 2024

org.dcache.xrootd.core.XrootdException: all sender digests are unsupported: [sha256].

This seems to finally get to the real issue. It seems dCache does not support SHA256, which is the new default message digest algorithm in XRootD 5.6.x (see #2030). You may be able to work around this issue by setting it back to SHA1 (by passing option -md:sha1 in your sec.protocol gsi section of the configuration), but this is a problem that ultimately needs to be fixed in dCache.

@Jo-stfc
Copy link
Collaborator Author

Jo-stfc commented Mar 20, 2024

I've tried setting -md:sha1 and -md:sha1:md5. sha1:md5 worked

@Jo-stfc
Copy link
Collaborator Author

Jo-stfc commented Mar 20, 2024

thanks!

@amadio
Copy link
Member

amadio commented Mar 20, 2024

Alright! Closing then! Glad we could figure this one out.

@amadio amadio closed this as not planned Won't fix, can't repro, duplicate, stale Mar 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants