Process workflow logs in batches #4045
Conversation
In itself it looks good, we just need a:
Thank you for the review. Line numbers aren't affected, as log records are simply passed on unchanged, in the same order as before. The queue should indeed be drained on shutdown; thanks for catching this. It should be fixed now (as far as gRPC connection lifetime permits; I did not change it and don't think it should be touched by this PR).
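For illustration, a minimal Go sketch of the drain-on-shutdown pattern being described; the names are illustrative, not the actual Woodpecker code. Closing the queue channel lets the consumer finish ranging over whatever is still buffered:

```go
// Sketch: drain remaining queued entries on shutdown by closing the
// channel and letting the consumer range over what is left.
package main

import (
	"fmt"
	"sync"
)

func main() {
	queue := make(chan string, 100) // buffered queue of log lines
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		// range exits only after the channel is closed AND emptied,
		// so nothing queued before shutdown is lost.
		for line := range queue {
			fmt.Println("persisting:", line)
		}
	}()

	queue <- "line 1"
	queue <- "line 2"

	close(queue) // signal shutdown: no more sends allowed
	wg.Wait()    // block until the consumer has drained the queue
}
```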
The second commit depends on the first one and is more of a demo at the moment, to show that the protocol breakage is worth it. This job with 1 million lines of output:

```yaml
when:
  event: push

steps:
  logs:
    image: ubuntu:24.04
    commands:
      - seq 1 1000000
```

finishes in 18-20 seconds when writing to disk, and in 5.5 minutes when writing to PostgreSQL. IIRC, Drone shows better performance for the latter case by saving the entire massive log blob as one record. Probably not something the project is interested in.
Yes, Drone lets the agent stream the log and uses that stream just for live logs; at the end it sends the whole log again as a blob to be stored. We want a single stream to both store and view the logs at the same time... it ensures that what you see is what is actually happening, and later on you can still verify that the logs you saw are the same ones that were stored. In the end, your pull request finishes what I tested and had in mind in #2072 but never got around to implementing...
Well, it does not improve the log-storing part... but as you wrote, that's up for the next pull request...
I misunderstood this: the log lines were indeed getting reordered, but only for live-streaming web clients. It should be fixed now, but with limited Go knowledge I only came up with a solution that blocks the publisher once a channel fills up. While this shouldn't happen much in practice (each channel stores up to 100 seconds of logs, or up to about 25 megabytes of text, whichever limit comes first), it's still a limitation that can probably be removed. If you're not happy with it, I'll try and explore some more.
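For illustration, here is that back-pressure behavior in miniature: a bounded buffered channel whose producer blocks once the buffer is full. The limit below is a stand-in for the 100-second/25 MB limits mentioned above, not the real code:

```go
// Sketch of the blocking behavior described above: a bounded buffer
// whose producer stalls once the consumer falls too far behind.
package main

import "fmt"

const maxBufferedLines = 4 // stands in for the 100 s / ~25 MB limits

func main() {
	ch := make(chan string, maxBufferedLines)

	// Producer: a send on a full buffered channel blocks until the
	// consumer makes room, which is the limitation mentioned above.
	go func() {
		for i := 1; i <= 10; i++ {
			ch <- fmt.Sprintf("log line %d", i) // blocks when full
		}
		close(ch)
	}()

	for line := range ch { // the consumer drains the buffer
		fmt.Println(line)
	}
}
```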
It solves the problem for file storage, which is likely already about as fast as it's going to get; less so for the database. It's not taking hours anymore, but saving only 12 megabytes of logs per minute is still not ideal. I'm not sure what else can be done about it, to be honest, besides increasing granularity from "1 log line = 1 db row" to "full logs for 1 step = 1 db row" (like Drone does), or "logs printed at [time XX, truncated to the second] = 1 db row".
Speeding up the saving of log entries to the database could be done by inserting batches directly (https://xorm.io/docs/chapter-04/readme/) and, if still necessary, dropping the auto-increment primary id and replacing it with something like a UUID. However, before adjusting any further code parts, it would be awesome to do some profiling to determine the actual bottlenecks, so we don't complicate the code in unrelated parts.
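For reference, a minimal sketch of a direct batch insert via xorm: passing a slice produces one multi-row INSERT instead of one statement per row. The `LogEntry` schema, driver, and DSN below are assumptions for illustration, not Woodpecker's actual model:

```go
package main

import (
	"log"

	_ "github.com/lib/pq" // PostgreSQL driver, registered by import
	"xorm.io/xorm"
)

// LogEntry is a hypothetical model, not the real Woodpecker table.
type LogEntry struct {
	ID     int64  `xorm:"pk autoincr"`
	StepID int64  `xorm:"index"`
	Line   int    // line number within the step
	Data   []byte // raw log text
}

// saveBatch inserts all entries in a single multi-row INSERT,
// avoiding one round trip per log line.
func saveBatch(engine *xorm.Engine, entries []LogEntry) error {
	_, err := engine.Insert(&entries)
	return err
}

func main() {
	engine, err := xorm.NewEngine("postgres",
		"postgres://user:pass@localhost/ci?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	if err := engine.Sync(new(LogEntry)); err != nil {
		log.Fatal(err)
	}

	batch := []LogEntry{
		{StepID: 1, Line: 1, Data: []byte("hello")},
		{StepID: 1, Line: 2, Data: []byte("world")},
	}
	if err := saveBatch(engine, batch); err != nil {
		log.Fatal(err)
	}
}
```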
Sure, records are already saved in batches, regardless of which type of storage is used. Everything but the database storage is relatively efficient (if file storage is used, the server can process ~50k log rows/second, compared to ~100-150 rows/second on the database).
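Conceptually, the batching here follows the standard flush-on-size-or-interval pattern. A minimal Go sketch, with illustrative names and limits rather than the actual Woodpecker code:

```go
package main

import (
	"fmt"
	"time"
)

// batchLogs collects incoming log lines and flushes them together
// when the batch is full or a timer fires, whichever comes first.
// The flush callback must not retain the slice: its backing array
// is reused between batches.
func batchLogs(in <-chan string, flush func([]string)) {
	const (
		maxBatch      = 1000
		flushInterval = time.Second
	)
	batch := make([]string, 0, maxBatch)
	ticker := time.NewTicker(flushInterval)
	defer ticker.Stop()

	for {
		select {
		case line, ok := <-in:
			if !ok { // input closed: flush the remainder and stop
				if len(batch) > 0 {
					flush(batch)
				}
				return
			}
			batch = append(batch, line)
			if len(batch) >= maxBatch {
				flush(batch)
				batch = batch[:0]
			}
		case <-ticker.C:
			if len(batch) > 0 {
				flush(batch)
				batch = batch[:0]
			}
		}
	}
}

func main() {
	in := make(chan string)
	go func() {
		for i := 0; i < 2500; i++ {
			in <- fmt.Sprintf("line %d", i)
		}
		close(in)
	}()
	batchLogs(in, func(b []string) {
		fmt.Println("flushing", len(b), "lines")
	})
}
```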
If you need help resolving the conflict, just ask :)
No problem, I just haven't had the time lately. FWIW, I've been using this in production for the past few weeks, and it at least works well… the part that pushes updates to web clients is less than ideal, though.
Besides these two nits, it looks good to go 🎉
Thanks for the awesome work!!!
@hg if you want, you could join the maintainers group and/or get some swag :)
Not sure if I've earned it yet, but if other maintainers don't mind, why not. I don't have a lot of time right now, but I should be able to start working on the project in a week or so. As for the swag, I live very far away from both the US and Europe, and shipping will cost more than the swag itself, but thank you for the offer.
In this case you might DM me via Matrix (https://matrix.to/#/@marddl:obermui.de) if you find a timeslot, so I can onboard you. If you don't have an account, just use matrix.org to create one...
Hopefully this removes the first bottleneck. The benchmark mentioned in the issue now finishes in 2 seconds because its output fits into one batch. Anything that spills over into more batches hangs waiting for the server to write out the previous ones. (edit: this is the original description and is not relevant anymore.)
Closes #3999
Ref: #2072
(Close #2064)