Drain udp socket recv buffer without any significant delay #87

goshawk-3 · 2022-01-04T17:18:51Z

Describe what you want implemented
Under heavy load, a kadcast peer may lose chunks (and therefore whole messages) because the UDP receive buffer (udp_recv_buffer) fills up. We should drain the receive socket as fast as possible.

Currently, we receive a chunk from the socket, try to decode the full message (a CPU-intensive task), and only then receive the next chunk:

    let (_, remote_address) =
        socket.recv_from(&mut bytes).await.map_err(|e| {
            error!("Error receiving from socket {}", e);
            e
        })?;

    match Message::unmarshal_binary(&mut &bytes[..]) {
        Ok(deser) => {
            trace!("> Received {:?}", deser);
            let to_process = decoder.decode(deser);

Instead, we could process and decode each received chunk asynchronously, so that the receive loop only reads from the socket.

Rough idea (pseudocode):

loop {
    // receive a chunk from the socket
    socket.recv_from(&mut chunk).await.map_err(|e| {
        error!("Error receiving from socket {}", e);
        e
    })?;

    // run a fast sanity check; drop and report the chunk if it is invalid

    // hand the chunk over to another task for decoding
    channel.send(chunk)
}
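A minimal sketch of the idea above, under stated assumptions: it uses a blocking `std::net::UdpSocket`, a plain thread, and a bounded `std::sync::mpsc::sync_channel` instead of tokio so that it compiles without external crates. The names `looks_valid`, `drain_socket`, and `demo` are hypothetical, the sanity check is a placeholder, and the loop is bounded by `expected` only to keep the example finite.

```rust
use std::io;
use std::net::UdpSocket;
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::thread;
use std::time::Duration;

/// Hypothetical cheap check run on the receive path.
/// The real criteria are not stated in this issue; here only empty chunks are rejected.
fn looks_valid(chunk: &[u8]) -> bool {
    !chunk.is_empty()
}

/// Reads `expected` datagrams and hands each valid one to the decoding side via `tx`.
/// Never decodes on this path. Returns how many chunks were dropped, either because
/// they failed the sanity check or because the channel was full.
fn drain_socket(
    socket: &UdpSocket,
    tx: &SyncSender<Vec<u8>>,
    expected: usize,
) -> io::Result<usize> {
    let mut buf = [0u8; 65_507]; // largest UDP payload over IPv4
    let mut dropped = 0;
    for _ in 0..expected {
        let (len, _remote) = socket.recv_from(&mut buf)?;
        let chunk = buf[..len].to_vec();
        if !looks_valid(&chunk) {
            dropped += 1; // report invalid chunk
            continue;
        }
        match tx.try_send(chunk) {
            Ok(()) => {}
            // The decoder is lagging: the loss is now visible to us and can be traced.
            Err(TrySendError::Full(_)) => dropped += 1,
            // The decoding side is gone; stop reading.
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    Ok(dropped)
}

/// Sends three datagrams over loopback (one of them empty) and returns
/// the chunks that reached the worker plus the drop count.
fn demo() -> io::Result<(Vec<Vec<u8>>, usize)> {
    let receiver = UdpSocket::bind("127.0.0.1:0")?;
    receiver.set_read_timeout(Some(Duration::from_secs(2)))?;
    let sender = UdpSocket::bind("127.0.0.1:0")?;
    let addr = receiver.local_addr()?;

    // Bounded channel: its capacity is the application-level chunk buffer.
    let (tx, rx) = sync_channel::<Vec<u8>>(16);
    // Stand-in for the CPU-intensive decoding task: it just collects the chunks.
    let worker = thread::spawn(move || rx.iter().collect::<Vec<_>>());

    let payloads: [&[u8]; 3] = [b"chunk-1", b"", b"chunk-2"];
    for payload in payloads.iter() {
        sender.send_to(payload, addr)?;
    }

    let dropped = drain_socket(&receiver, &tx, 3)?;
    drop(tx); // closing the channel lets the worker finish
    let decoded = worker.join().expect("worker panicked");
    Ok((decoded, dropped))
}
```

Calling `demo()` forwards the two non-empty chunks to the worker and counts the empty one as dropped. Using `try_send` on a bounded channel is one possible design choice: a full channel turns a silent kernel-level drop into an application-level drop that can be counted and reported, at the cost of losing the chunk instead of applying backpressure.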

Describe "Why" this is needed
This should relieve pressure on the OS-level UDP receive buffer and give us more control over chunk buffering. In particular, we will be able to trace and report when a chunk is lost, and to resize the buffer or channel dynamically if needed.

Describe alternatives you've considered
In addition to this idea, a kadcast peer could set the UDP receive buffer size at the socket level (SO_RCVBUF). However, the effective size may be capped by the OS-level net.core.rmem_max setting.
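To illustrate the cap, here is a small sketch of the arithmetic Linux applies to an SO_RCVBUF request, as described in socket(7): the requested value is limited to net.core.rmem_max and then doubled to leave room for bookkeeping overhead. The function names are hypothetical, the kernel's lower bound on the buffer size is ignored, and 212992 is used only as an example value for rmem_max.

```rust
/// Parses the contents of a sysctl file such as /proc/sys/net/core/rmem_max.
fn parse_sysctl(text: &str) -> Option<usize> {
    text.trim().parse().ok()
}

/// Approximate size that getsockopt(SO_RCVBUF) would report on Linux after
/// requesting `requested` bytes without elevated privileges:
/// the request is capped at `rmem_max`, then doubled by the kernel.
/// Assumption: the kernel's minimum buffer size is not modelled here.
fn effective_rcvbuf(requested: usize, rmem_max: usize) -> usize {
    requested.min(rmem_max).saturating_mul(2)
}

/// Reads the current cap from procfs (Linux only).
fn read_rmem_max() -> std::io::Result<Option<usize>> {
    let text = std::fs::read_to_string("/proc/sys/net/core/rmem_max")?;
    Ok(parse_sysctl(&text))
}
```

With an rmem_max of 212992, a request for 8 MiB yields the same effective size as a request for 212992 bytes, which is why raising SO_RCVBUF alone may not help without also adjusting the sysctl.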

Additional context
Currently, a system test with the following parameters passes on Linux only if net.core.rmem_default is set to a large value: CLUSTER_SIZE=10 MSG_SIZE=1000000 WAIT=5s go test -count=10 -tags=testbed -run TestCluster


goshawk-3 added the status:minor, type:enhancement, and mark:testnet labels Jan 4, 2022

goshawk-3 changed the title ~~Drain udp socket recv buffer without any significant delays~~ Drain udp socket recv buffer without any significant delay Jan 4, 2022

goshawk-3 self-assigned this Jan 5, 2022

goshawk-3 mentioned this issue Jan 5, 2022

Handle and try to decode chunks in a separate tokio task #89

Merged

goshawk-3 closed this as completed in #89 Jan 7, 2022
