
Review process to abort UDP request when the ring buffer is full #842

Closed
josecelano opened this issue May 7, 2024 · 4 comments · Fixed by #846
@josecelano
Member

Relates to: #830

    async fn run_udp_server(tracker: Arc<Tracker>, socket: Arc<UdpSocket>) {
        let tracker = tracker.clone();
        let socket = socket.clone();

        let reqs = &mut ActiveRequests::default();

        // Main Waiting Loop, awaits on async [`receive_request`].
        loop {
            if let Some(h) = reqs.rb.push_overwrite(
                Self::spawn_request_processor(Self::receive_request(socket.clone()).await, tracker.clone(), socket.clone())
                    .abort_handle(),
            ) {
                if !h.is_finished() {
                    // The task is still running; let's yield and give it a chance to finish before we abort it.
                    tokio::task::yield_now().await;

                    h.abort();

                    let server_socket_addr = socket.local_addr().expect("Could not get local_addr for socket.");

                    tracing::span!(
                        target: "UDP TRACKER",
                        tracing::Level::WARN, "request-aborted", server_socket_addr = %server_socket_addr);
                }
            }
        }
    }

We are using the push_overwrite method.

[Screenshot: documentation for the `push_overwrite` method]

That method overwrites the latest item, not the oldest. Does that make sense in our case? Wouldn't it be better to overwrite the oldest request? That's the one that has had the longest time to be processed.
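For illustration, here is a minimal sketch of the two eviction policies. These are hypothetical helpers built on a bounded VecDeque, not the API of the ring-buffer crate we use:

```rust
use std::collections::VecDeque;

/// Overwrite-newest: when the buffer is full, evict the most recently
/// pushed item (the request that has had the least time to run).
fn push_overwrite_newest<T>(buf: &mut VecDeque<T>, cap: usize, item: T) -> Option<T> {
    let evicted = if buf.len() == cap { buf.pop_back() } else { None };
    buf.push_back(item);
    evicted
}

/// Overwrite-oldest: when the buffer is full, evict the item that has
/// been waiting the longest (the request most likely to be finished).
fn push_overwrite_oldest<T>(buf: &mut VecDeque<T>, cap: usize, item: T) -> Option<T> {
    let evicted = if buf.len() == cap { buf.pop_front() } else { None };
    buf.push_back(item);
    evicted
}
```

With capacity 2 and pushes 1, 2, 3: overwrite-newest evicts 2 (the request that just started), while overwrite-oldest evicts 1 (the request that has had the longest time to finish).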

cc @da2ce7

@josecelano josecelano changed the title Review process to abort UDP request when the ringbuffer is full Review process to abort UDP request when the ring buffer is full May 7, 2024
@josecelano josecelano added the Question / Discussion Community Feedback label May 7, 2024
@josecelano josecelano added this to the v3.0.0 milestone May 7, 2024
@da2ce7
Contributor

da2ce7 commented May 7, 2024

@josecelano I think that you are correct:

da2ce7@ff9b43d

This commit should fix the problem. However, now I don't see any advantage of using the ring buffer over a standard vector, since we are not inserting or removing entries from the buffer, just swapping them for new ones.

@josecelano
Member Author

> @josecelano I think that you are correct:
>
> da2ce7@ff9b43d
>
> This commit should fix the problem. However now I don't see any advantage of using the ring-buffer over a standard vector. Since we are not inserting-or-removing entries from the buffer, but just swapping them for new entries.

Hi @da2ce7, after opening the issue:

I restarted the live demo. I can now produce this graph in Datadog by parsing the "request-aborted" lines in the logs.

[Screenshot: Datadog graph of aborted requests]

Regarding your patch, I guess the socket now acts as a buffer for all incoming requests, and we only handle them when the previous ones have finished. I suppose that will consume more memory, but it's a trade-off between rejecting requests on one hand, and consuming more memory and increasing response time on the other.

NOTE: Remember, we still don't have a timeout for processing the UDP requests.
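As a side note, a per-request deadline could be sketched with std-only primitives. This is a hypothetical `with_timeout` helper, not what the server would actually use (that would more likely be `tokio::time::timeout`):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Run `work` on a helper thread and wait at most `limit` for its result.
/// Returns None on timeout. Note: unlike aborting a tokio task, the
/// timed-out thread keeps running in the background.
fn with_timeout<T: Send + 'static>(
    work: impl FnOnce() -> T + Send + 'static,
    limit: Duration,
) -> Option<T> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(work()); // the receiver may be gone already; ignore the error
    });
    rx.recv_timeout(limit).ok()
}
```

The caveat in the comment is the interesting part: a timeout that merely stops waiting is weaker than the `abort_handle` approach in the current loop, which actually cancels the task.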

I have to think about this, but I would prefer to control that buffer directly in the application. I mean, I would read all incoming packets as soon as possible and keep the pending tasks in the app. That way, we could implement policies to handle overload. For example, we could:

  • Reject new requests when the buffer is full, sending a "busy" response. With the current solution, we don't know when the buffer is full (I guess), or finding out would make the code less readable.
  • Ignore new requests when the buffer is full. We don't even send a response in the current version. That doesn't have to be bad: it's also a way to save bandwidth, and with UDP the client shouldn't have a problem with that.
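Both policies could be expressed on top of a bounded channel. A sketch with hypothetical names, using std's `sync_channel` to stand in for whatever bounded queue the app would use:

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// Hypothetical overload policies for a full request buffer.
enum OverloadPolicy {
    Reject, // answer the client with a "busy" response
    Ignore, // drop silently; saves bandwidth, UDP clients can retry
}

/// Try to enqueue a request; return the reply (if any) to send back.
fn enqueue(
    tx: &SyncSender<u32>,
    request: u32,
    policy: &OverloadPolicy,
) -> Option<&'static str> {
    match tx.try_send(request) {
        Ok(()) => None, // buffered; a worker will pick it up
        Err(TrySendError::Full(_)) => match policy {
            OverloadPolicy::Reject => Some("busy"),
            OverloadPolicy::Ignore => None,
        },
        Err(TrySendError::Disconnected(_)) => None, // shutting down
    }
}
```

The point is that with an application-owned buffer, "full" is an explicit, observable state, which is exactly what the socket-as-buffer approach hides.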

Maybe a more conventional solution would be easier for newcomers to understand. For example, we could have a pool of workers processing requests. The main loop would read all incoming requests from the socket and send them to the workers through a channel; the channel would be the buffer. Under the hood this implementation might be the same, but it has two advantages:

  1. The app controls the buffer. We can probably get info from the socket buffer, but I guess that API is harder to understand.
  2. The worker pattern is well-known and documented. ChatGPT can generate the code for it :-) and explain it to other developers.
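A sketch of that worker pattern, using std threads and a bounded mpsc channel in place of the tokio equivalents (all names, like `process` and `run_pool`, are hypothetical):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Placeholder for handling one parsed UDP request.
fn process(request: u32) -> u32 {
    request * 2
}

/// The main loop sends requests into a bounded channel (the buffer);
/// `workers` threads drain it. `send` blocks when the buffer is full,
/// which gives natural backpressure.
fn run_pool(requests: Vec<u32>, workers: usize, buffer: usize) -> Vec<u32> {
    let (tx, rx) = mpsc::sync_channel::<u32>(buffer);
    let rx = Arc::new(Mutex::new(rx)); // one receiver shared by the pool
    let (out_tx, out_rx) = mpsc::channel();

    let mut handles = Vec::new();
    for _ in 0..workers {
        let rx = Arc::clone(&rx);
        let out_tx = out_tx.clone();
        handles.push(thread::spawn(move || loop {
            // Take the next request; stop when the channel is closed and drained.
            let request = match rx.lock().unwrap().recv() {
                Ok(r) => r,
                Err(_) => break,
            };
            let _ = out_tx.send(process(request));
        }));
    }
    drop(out_tx); // so out_rx.iter() ends once all workers exit

    for r in requests {
        tx.send(r).unwrap(); // blocks while the buffer is full
    }
    drop(tx); // close the channel; workers drain it and exit

    let mut results: Vec<u32> = out_rx.iter().collect();
    for h in handles {
        h.join().unwrap();
    }
    results.sort_unstable();
    results
}
```

In the real server the workers would be tokio tasks and the channel a `tokio::sync::mpsc` bounded channel, but the shape (and the backpressure behavior) is the same.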

Finally, limiting the number of active requests, as the current version does (without your new patch), is a way of limiting resources. In this case, we limit CPU usage (without even sending a "busy" response to clients) and memory usage, because task processors consume memory.

I'm not sure we should limit resources without providing a way to monitor the actual server load; this limitation was probably hiding an overload problem. I have been monitoring memory consumption in the demo and didn't see a problem there (while we were rejecting a lot of requests). I only saw a CPU problem, and it was caused by swapping.

In conclusion, I think we should disable all limits (your new patch disables them, unless there is a limit on the socket buffer) and monitor the servers so we can scale up. If we want to introduce some kind of limit, we have to think about how to monitor it so sysadmins can detect overload.

I've been discussing these limitations here.

I will merge your patch for the time being.

@da2ce7
Contributor

da2ce7 commented May 8, 2024

@josecelano there was a bug in my patch that caused the test to fail; now I yield before returning the new "filled" ring buffer.

Here is the updated patch:

da2ce7@d030030

@josecelano josecelano linked a pull request May 8, 2024 that will close this issue
@josecelano
Member Author

Just for the record, this is the Datadog dashboard where I plot the number of handled requests (bottom) and the number of aborted requests (top).

[Screenshot: Datadog dashboard showing handled requests (bottom) and aborted requests (top)]
