
Upload failures due to locking issue #5024

Closed
rhafer opened this issue Nov 10, 2022 · 6 comments
Labels
Priority:p1-urgent Consider a hotfix release with only that fix Status:Needs-Info Type:Bug

Comments

rhafer commented Nov 10, 2022

Describe the bug

In a test environment where I placed the filesystem for decomposedfs on a remote NFS volume, I see quite a few locking issues when uploading a small folder hierarchy (3 levels deep, about 500MB, 550 files in 300 directories) via the WebUI. This seems to be caused by lock contention (like we already saw before):

```
[tusd] 2022/11/10 12:12:25.692423 event="ResponseOutgoing" status="500" method="PATCH" path="/d4341f00-705f-43ed-b9e0-4629f8d631ce" error="Decomposedfs: could not set extended attribute: xattrs: Can not acquire write log: unable to acquire a lock on the file" requestId="" 
```

This was when testing with v2.0.0-rc.1; the behavior is much better when compiling against the latest reva/edge (because cs3org/reva#3397 improved the locking behavior). So it might actually be a non-issue 🤞 once we bump reva for the next RC.

An easy way to reproduce this seems to be creating a deep folder structure with a couple of files at the deepest level and uploading that folder via the WebUI (use an NFS share for the ocis storage-users service):

```shell
mkdir -p $(for i in $(seq -w 1 100); do echo -n level_$i/; done)
cd $(for i in $(seq -w 1 100); do echo -n level_$i/; done)
for i in $(seq -w 1 1000); do echo $i > file_$i; done
```
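For convenience, the three commands above can be combined into one small script (a repro helper sketch, not part of the original report; run it inside the NFS-backed sync folder):

```shell
# Build the 100-level directory chain once, then reuse it for mkdir and cd.
path=$(for i in $(seq -w 1 100); do printf 'level_%s/' "$i"; done)
mkdir -p "$path"
cd "$path" || exit 1
# Fill the deepest level with 1000 small files to stress per-node locking
# during the subsequent folder upload.
for i in $(seq -w 1 1000); do echo "$i" > "file_$i"; done
```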
micbar commented Nov 15, 2022

@rhafer Can you update this ticket with your findings?

@micbar micbar added this to the 2.0.0 General Availability milestone Nov 15, 2022
@micbar micbar added Priority:p2-high Escalation, on top of current planning, release blocker GA-Blocker Priority:p1-urgent Consider a hotfix release with only that fix and removed Priority:p2-high Escalation, on top of current planning, release blocker labels Nov 15, 2022
butonic commented Nov 23, 2022

with #5061 we no longer calculate the dir size, which also reduces the number of read locks we have to take.

rhafer commented Nov 24, 2022

I can confirm that the behavior with latest master is a lot better. But I still get the locking error when uploading 1000 files (in a single directory) via the web frontend (on an ocis server backed by NFS storage):

```
[tusd] 2022/11/24 14:08:09.139959 event="ResponseOutgoing" status="500" method="PATCH" path="/d6979c17-043c-4cd5-9945-7d40bbe23234" error="xattrs: Can not acquire write log: unable to acquire a lock on the file" requestId="" 
{"level":"error","service":"storage-users","pkg":"rhttp","traceid":"00000000000000000000000000000000","host":"127.0.0.1","method":"PATCH","uri":"/data/tus/d6979c17-043c-4cd5-9945-7d40bbe23234","url":"/data/tus/d6979c17-043c-4cd5-9945-7d40bbe23234","proto":"HTTP/1.1","status":500,"size":72,"start":"24/Nov/2022:14:08:07 +0000","end":"24/Nov/2022:14:08:09 +0000","time_ns":1246257073,"time":"2022-11-24T14:08:09.139993501Z","message":"http"}
{"level":"error","service":"frontend","pkg":"rhttp","traceid":"00000000000000000000000000000000","host":"127.0.0.1","method":"PATCH","uri":"/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNjY5Mzg1Mjg3LCJpYXQiOjE2NjkyOTg4ODcsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE1OC9kYXRhL3R1cy9kNjk3OWMxNy0wNDNjLTRjZDUtOTk0NS03ZDQwYmJlMjMyMzQifQ.48NOZ1qImhwFM8tM363m5Tc3XWRmQym6lKm4q_vKdc8","url":"/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNjY5Mzg1Mjg3LCJpYXQiOjE2NjkyOTg4ODcsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE1OC9kYXRhL3R1cy9kNjk3OWMxNy0wNDNjLTRjZDUtOTk0NS03ZDQwYmJlMjMyMzQifQ.48NOZ1qImhwFM8tM363m5Tc3XWRmQym6lKm4q_vKdc8","proto":"HTTP/1.1","status":500,"size":0,"start":"24/Nov/2022:14:08:07 +0000","end":"24/Nov/2022:14:08:09 +0000","time_ns":1246850458,"time":"2022-11-24T14:08:09.140264525Z","message":"http"}
```

This happens a lot less often now. With rc1 I got about 50 failed uploads; with latest master it's 0-3 failed uploads.

rhafer commented Nov 24, 2022

I've been experimenting a bit with the locking timeout and retry values in https://github.com/cs3org/reva/blob/edge/pkg/storage/utils/filelocks/filelocks.go, as the default of 3ms seemed a bit low. Increasing that timeout helped a lot: I tried 30ms (not sure if that value makes much sense) and didn't see lock failures anymore.

It might make sense to add configuration knobs for the file locking timeout and retries. (I assume the behavior will depend a lot on the underlying storage.)

butonic commented Nov 25, 2022

we made the lock cycle duration factor configurable and increased it in cs3org/reva#3493

micbar commented Nov 25, 2022

Reva update done.

@micbar micbar closed this as completed Nov 25, 2022