
activitylog eventually fails to persist activities with "nats: maximum payload exceeded" #9462

Closed
aduffeck opened this issue Jun 25, 2024 · 12 comments · Fixed by #10377
Assignees
Labels
Priority:p2-high Escalation, on top of current planning, release blocker Type:Bug
Milestone

Comments

@aduffeck
Contributor

Describe the bug

The activitylog service eventually runs into the problem that it can no longer persist new activities when the maximum payload size of the underlying store is reached. The space root is the first resource to run into that problem as events bubble up the tree to the root.

Steps to reproduce

  1. Upload a few thousand files into a space. For me the problem started happening on the space root after around 7.5k files across ~10 directories.

Expected behavior

New activities should be persisted.

Actual behavior

New activities are no longer persisted, the log shows errors like

2024-06-25T10:38:20+02:00 ERR could not process event error="could not store activity: Failed to store data in bucket 'ON2G64TBM5SS25LTMVZHGLJRERZW63LFFVQWI3LJNYWXK43FOIWWSZBNGAYDAMBNGAYDAMBQGAYDAMBQGAYCC43PNVSS2YLENVUW4LLVONSXELLJMQWTAMBQGAWTAMBQGAYDAMBQGAYDAMA=': nats: maximum payload exceeded" event={"Event":{"ExecutingUser":{"id":{"idp":"https://localhost:9200","opaque_id":"some-admin-user-id-0000-000000000000"},"username":"admin"},"Failed":false,"FileRef":{"path":"./500 files (4)/random148.file","resource_id":{"opaque_id":"some-admin-user-id-0000-000000000000","space_id":"some-admin-user-id-0000-000000000000","storage_id":"storage-users-1"}},"Filename":"random148.file","SpaceOwner":{"idp":"https://localhost:9200","opaque_id":"some-admin-user-id-0000-000000000000","type":1},"Timestamp":{"nanos":342093138,"seconds":1719304692},"UploadID":"5f623527-3916-4315-8493-0510d256018f"},"ID":"74da64f1-9366-4f7f-a319-42cac2a67a6f","InitiatorID":"2118ff3e-a9ef-4b20-94ca-7f9a02183bc7","TraceParent":"00-a2854b4fe336e07ce2aa74fd64e65356-0dbd7a77379fe10a-00","Type":"events.UploadReady"} line=/home/andre/src/owncloud/ocis/services/activitylog/pkg/service/service.go:106 service=activitylog
@aduffeck
Contributor Author

We just looked into it briefly in Zoom; apparently the default max payload size in NATS is 1MB and it could be increased to 64MB, but that is still a limit that could eventually be reached.
Another option would be to limit the number of stored events per resource in FIFO fashion.

@nmagill123

@aduffeck
What is a good solution here for about 1.5 TB of files with a huge directory tree? I ran into this issue trying to upload 100GB of files with the s3ng backend and got:

ERR could not process event error="could not store activity: Failed to store data in bucket 'HBRTIYJRMNSDALJVHE3WGLJUMI2TALJZHE2GCLLGGFSTKODBMNRDSNLEMUSGGYTBHAYWIMBWFUYTEMTBFU2DGYJQFU4TGMZSFU2DCOBXGAYDEN3FGFQWEIJUGJRWCZRTGBSS2N3GHFRC2NBWGRSC2OJTG43C2NDFMFRDKNTGMQ2GGYZZ': nats: maximum payload exceeded" event={"Event":{"ExecutingUser":{"id":{"idp":"h

I am guessing this is an s3-specific error?

@Kartoffelbauer

Kartoffelbauer commented Sep 26, 2024

@aduffeck

> We just looked into it briefly in Zoom; apparently the default max payload size in NATS is 1MB and could be increased to 64MB, but it's still a limit that could eventually be reached. Another option would be to limit the number of stored events per resource in FIFO fashion.

Is there an option to change this value via an environment variable or the yml configuration? I looked into the nats service docs, but did not find anything.
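There does not appear to be a dedicated ocis setting for this. For an external nats-server, the limit can be raised in the server's own configuration; a sketch, assuming a standalone `nats-server.conf` (this would not apply to the embedded store, and NATS caps the setting at 64MB):

```conf
# nats-server.conf (external NATS server, hypothetical deployment)
max_payload: 8MB   # default is 1MB; NATS allows at most 64MB
```

Even at 64MB this only postpones the failure, which is why the discussion below moves toward bounding the stored activities instead.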

@butonic butonic moved this from Qualification to Prio 2 in Infinite Scale Team Board Oct 4, 2024
@jvillafanez
Member

Very likely related #10173

@Kartoffelbauer

I have also noticed that this problem seems to cause OCIS to use far more system resources than normal. If the ~7.5k file limit is severely exceeded, OCIS will permanently use a lot more CPU and disk time trying to persist the activities (while also causing huge log build-ups).

For reference, I'm running a small installation on a Raspberry Pi 4. When the problem starts to occur (with no uploads or online users), OCIS permanently ramps up to about 45% CPU time and permanently writes to disk at about 12MB/s. The latter is particularly bad for the lifespan of an SSD.

This makes OCIS in its current state not really usable for small setups like mine, as it slows down the whole machine and wears out the SSD in the process.

@micbar
Contributor

micbar commented Oct 16, 2024

@butonic @kobergj

Is it possible to remove old entries when new ones are added? That would be a quick solution.

@kobergj
Collaborator

kobergj commented Oct 16, 2024

Yes. That would be my first approach too.

@dragotin
Contributor

Other alternative: We persist "old" activities to a (hidden) file in the file system, which can be displayed separately.

@kobergj
Collaborator

kobergj commented Oct 16, 2024

@Kartoffelbauer if you don't necessarily need activities, you can just disable the activitylog service (OCIS_EXCLUDE_RUN_SERVICES=activitylog). This will disable the activities but should get rid of the high CPU usage and disk writes.

@kobergj
Collaborator

kobergj commented Oct 16, 2024

> Other alternative: We persist "old" activities to a (hidden) file in the file system, which can be displayed separately.

They are stored in nats-js, so we would save "old" activities to nats as well. At the moment we use the fileID as key. We could just extend that: fileID-1, fileID-2, ...
On read we could list all keys matching fileID-* and append all activities. The downside is that this requires a significantly larger amount of logic.

In my opinion it is good enough to just drop "old" activities, but the other option is feasible too.

Which way should we go? @micbar @dragotin

@jvillafanez
Member

> On read we could list all keys fileID-* and append all activities. Downside would be that this is a significantly larger amount of logic.

Consider also the performance: listing is usually expensive and not recommended on key-value stores. I'm speaking mainly of Redis, but it's possible that nats-js has similar limitations, so we should probably check this for all the officially supported stores.

@micbar micbar added the Priority:p2-high Escalation, on top of current planning, release blocker label Oct 21, 2024
@micbar
Contributor

micbar commented Oct 21, 2024

@kobergj We agree on purging the old entries for now.

@kobergj kobergj self-assigned this Oct 21, 2024
@kobergj kobergj moved this from Prio 2 to In progress in Infinite Scale Team Board Oct 21, 2024
@micbar micbar added this to the Release 7.0.0 milestone Oct 21, 2024
@github-project-automation github-project-automation bot moved this from In progress to Done in Infinite Scale Team Board Oct 22, 2024