Fixing performance issues and out-of-order packets #916

urbanishimwe · 2021-04-11T21:29:37Z

Reducing CPU context switching and the number of goroutines

Packet capture and packet processing now use only two goroutines, which helps to minimize CPU context switches. Spawning too many goroutines is harmful here.

Optimized packet capture: memory is allocated only when required, and only for data that is used

Packet capture now uses the ZeroCopy methods from the libpcap library to avoid unnecessary allocations. Memory is allocated ONLY for valid packets that carry data; for example, SYN/FIN packets are no longer used. Additionally, we now use sync.Pool to reuse packet objects, which recycles already allocated memory.
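To illustrate the pooling idea, here is a minimal sketch. It is not the code from this PR: the `Packet` struct, `acquirePacket` and `releasePacket` are hypothetical names, and the sketch assumes the payload must be copied out of the capture buffer because a zero-copy read is only valid until the next read.

```go
package main

import (
	"fmt"
	"sync"
)

// Packet is a hypothetical, simplified stand-in for the real packet type.
type Packet struct {
	Seq     uint32
	Payload []byte
}

// packetPool hands out reusable Packet values so the capture path does
// not allocate a new object for every packet.
var packetPool = sync.Pool{
	New: func() interface{} { return new(Packet) },
}

// acquirePacket returns nil for packets without payload (for example bare
// SYN/FIN segments), so nothing is taken from the pool for them.
func acquirePacket(seq uint32, payload []byte) *Packet {
	if len(payload) == 0 {
		return nil
	}
	p := packetPool.Get().(*Packet)
	p.Seq = seq
	// Reuse the backing array from a previous use when it is large enough.
	p.Payload = append(p.Payload[:0], payload...)
	return p
}

// releasePacket resets the packet and returns it to the pool.
func releasePacket(p *Packet) {
	p.Seq = 0
	p.Payload = p.Payload[:0]
	packetPool.Put(p)
}

func main() {
	p := acquirePacket(1, []byte("GET / HTTP/1.1\r\n\r\n"))
	fmt.Println(len(p.Payload))
	releasePacket(p)
}
```

A packet must not be touched after `releasePacket`, since the pool may hand the same object to another caller.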

Simplification and optimization of request/response detection

SYN/FIN packets are no longer used. Only the packet payload is now used to detect the start and end of a message. Moreover, payload detection no longer requires building a complete “message” buffer; it works on individual packet payloads.

Message payloads are now concatenated from packets only at the end, when the message is dispatched. Also, before checking whether a message is complete, there is an additional check that all received packets are in a valid order, i.e. that their SEQ numbers are valid and no packets are missing.
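The ordering check and the late concatenation can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `segment`, `contiguous` and `assemble` are hypothetical names, and sorting by raw SEQ ignores 32-bit sequence wraparound.

```go
package main

import (
	"fmt"
	"sort"
)

// segment is a hypothetical, minimal view of a captured TCP packet.
type segment struct {
	Seq     uint32
	Payload []byte
}

// contiguous sorts the segments by SEQ and reports whether each segment
// starts exactly where the previous one ended, i.e. nothing is missing.
func contiguous(segs []segment) bool {
	sort.SliceStable(segs, func(i, j int) bool { return segs[i].Seq < segs[j].Seq })
	for i := 1; i < len(segs); i++ {
		prev := segs[i-1]
		if prev.Seq+uint32(len(prev.Payload)) != segs[i].Seq {
			return false
		}
	}
	return true
}

// assemble concatenates the payloads once, with a single allocation,
// and is meant to be called only when the message is dispatched.
func assemble(segs []segment) []byte {
	total := 0
	for _, s := range segs {
		total += len(s.Payload)
	}
	out := make([]byte, 0, total)
	for _, s := range segs {
		out = append(out, s.Payload...)
	}
	return out
}

func main() {
	segs := []segment{
		{Seq: 105, Payload: []byte("world")}, // arrived first, belongs second
		{Seq: 100, Payload: []byte("hello")},
	}
	if contiguous(segs) {
		fmt.Println(string(assemble(segs)))
	}
}
```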

Chunked encoding validation was reworked and no longer needs the expensive operation of re-calculating all the chunks. It now “trusts” that the client sends a valid chunk body, checks that the packets are in the right order (i.e. their SEQ numbers match), and checks that the message ends with the right suffix. All of this is done with 0 allocations.
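A suffix check of this kind might look like the sketch below. It is an assumption-laden illustration rather than the PR's code: `hasChunkedEnd` is a hypothetical name, the expected suffix is taken to be the final zero-length chunk `0\r\n\r\n`, and bodies with trailer fields are not handled. It walks the packet payloads backwards so the terminator is found even when it is split across packets, without building a joined buffer.

```go
package main

import "fmt"

// chunkedTerminator is the last-chunk marker of a chunked body
// that carries no trailer fields.
var chunkedTerminator = []byte("0\r\n\r\n")

// hasChunkedEnd reports whether the payloads, read in order, end with
// the chunked terminator. It compares bytes from the end backwards and
// never concatenates the payloads.
func hasChunkedEnd(payloads [][]byte) bool {
	t := len(chunkedTerminator) - 1
	for i := len(payloads) - 1; i >= 0 && t >= 0; i-- {
		p := payloads[i]
		for j := len(p) - 1; j >= 0 && t >= 0; j-- {
			if p[j] != chunkedTerminator[t] {
				return false
			}
			t--
		}
	}
	// t < 0 means every terminator byte was matched.
	return t < 0
}

func main() {
	payloads := [][]byte{
		[]byte("5\r\nhello\r\n0\r"), // terminator starts here...
		[]byte("\n\r\n"),            // ...and ends in the next packet
	}
	fmt.Println(hasChunkedEnd(payloads))
}
```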

Parsing all headers using proto.GetHeaders proved to be very slow. Now we parse only the headers we need (and do it only once).
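As a rough sketch of looking up a single header instead of parsing all of them: `findHeader` below is a hypothetical helper written for this example, not the function in the proto package. It scans the header block line by line and returns a slice that aliases the payload rather than copying it.

```go
package main

import (
	"bytes"
	"fmt"
)

// findHeader returns the value of the first header called name
// (case-insensitive) or nil when it is absent. The scan stops at the
// blank line that ends the header block, so the body is never inspected.
func findHeader(payload []byte, name string) []byte {
	// Skip the request or status line.
	i := bytes.IndexByte(payload, '\n')
	if i < 0 {
		return nil
	}
	rest := payload[i+1:]
	for len(rest) > 0 {
		end := bytes.IndexByte(rest, '\n')
		if end < 0 {
			end = len(rest)
		}
		line := bytes.TrimRight(rest[:end], "\r")
		if len(line) == 0 {
			return nil // end of headers
		}
		if colon := bytes.IndexByte(line, ':'); colon > 0 {
			if bytes.EqualFold(line[:colon], []byte(name)) {
				return bytes.TrimSpace(line[colon+1:])
			}
		}
		if end == len(rest) {
			break
		}
		rest = rest[end+1:]
	}
	return nil
}

func main() {
	req := []byte("POST /upload HTTP/1.1\r\nHost: example.com\r\nTransfer-Encoding: chunked\r\n\r\n")
	fmt.Printf("%s\n", findHeader(req, "transfer-encoding"))
}
```

Caching the result on the message state is what makes the "only once" part work; that bookkeeping is left out here.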

Packets are matched together using ACK, which removes the chance of duplicate IDs at high RPS. Additionally, even if packets are received out of order, they are now sorted properly before the message is dispatched.

Changes in ID generation algorithm

Message ID generation and the relation between request and response IDs are fully rewritten. Responses no longer have to look up request data in order to get the same ID. The ID now relies on the fact that the SEQ of the first packet of the response should be the same as the ACK of the request. Previously the message ID contained random values, like the current timestamp; now it uses a consistent algorithm based on the TCP stream ID (SrcPort + DstPort + SrcIP/DstIP) and the current ACK/SEQ number (to distinguish multiple messages within the same stream).
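One way such a deterministic ID could be derived is sketched below. The hash choice (FNV-1a), the `endpoint` type and the `messageID` function are assumptions made for this example; the PR's actual algorithm may combine the fields differently. The caller always passes the client endpoint first, so a request and its response hash the same stream key.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"net"
)

// endpoint is a hypothetical address/port pair for one side of a stream.
type endpoint struct {
	IP   net.IP
	Port uint16
}

// messageID hashes both endpoints together with a sequence marker.
// For a request the marker is the ACK of its packets; for a response it
// is the SEQ of its first packet, which is expected to equal that ACK.
func messageID(client, server endpoint, marker uint32) uint64 {
	h := fnv.New64a()
	var buf [4]byte
	h.Write(client.IP.To16())
	binary.BigEndian.PutUint16(buf[:2], client.Port)
	h.Write(buf[:2])
	h.Write(server.IP.To16())
	binary.BigEndian.PutUint16(buf[:2], server.Port)
	h.Write(buf[:2])
	binary.BigEndian.PutUint32(buf[:], marker)
	h.Write(buf[:])
	return h.Sum64()
}

func main() {
	client := endpoint{IP: net.ParseIP("10.0.0.1"), Port: 50000}
	server := endpoint{IP: net.ParseIP("10.0.0.2"), Port: 80}
	requestACK := uint32(4242)  // ACK carried by the request packets
	responseSEQ := uint32(4242) // SEQ of the first response packet
	// Both sides compute the same ID without looking each other up.
	fmt.Println(messageID(client, server, requestACK) == messageID(client, server, responseSEQ))
}
```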

BPF filter optimizations

When tracking responses, it now uses a more accurate BPF rule to capture only the needed traffic.

Misc

The packet code is now fully moved to tcp/Packet, so packet processing is done only once, in one place.

TCP output now has a 5-second timeout and a proper Close method.

Fully switched to Go modules and removed vendoring.

tcp/tcp_message.go

buger · 2021-04-12T08:59:36Z

Since there is a lot of mutex usage now, maybe consider using https://github.com/cornelk/hashmap instead.

See the difference:

BenchmarkReadHashMapWithWritesUint-8      	  200000	      8395 ns/op
BenchmarkReadGoMapWithWritesUintMutex-8   	   10000	    143793 ns/op

buger · 2021-04-12T09:07:53Z

The main thing to ensure here is that packet "capture" happens in one thread, and packet "processing" in another.
And actually, if we ensure that all packet processing happens inside a "single" goroutine, then we do not need to put mutexes around maps. And a "slower" processing goroutine will not affect the "capture" goroutine that much.
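A minimal sketch of that layout, with made-up names (`packet`, `process`, `run`) and an arbitrary buffer size: one goroutine stands in for capture and only sends on a channel, while a single processing goroutine owns the map, so no mutex is needed.

```go
package main

import "fmt"

// packet is a hypothetical, minimal captured packet.
type packet struct {
	streamID uint64
	payload  []byte
}

// process is the only goroutine that reads or writes the messages map,
// which is why the map needs no locking.
func process(in <-chan packet, done chan<- map[uint64][]byte) {
	messages := make(map[uint64][]byte)
	for p := range in {
		messages[p.streamID] = append(messages[p.streamID], p.payload...)
	}
	done <- messages
}

// run feeds the given packets through a capture goroutine and a
// processing goroutine and returns the assembled messages.
func run(captured []packet) map[uint64][]byte {
	// The buffer lets capture keep going while processing is briefly slow.
	in := make(chan packet, 1024)
	done := make(chan map[uint64][]byte)
	go process(in, done)
	go func() { // stands in for the capture loop
		for _, p := range captured {
			in <- p
		}
		close(in)
	}()
	return <-done
}

func main() {
	out := run([]packet{
		{streamID: 1, payload: []byte("ab")},
		{streamID: 2, payload: []byte("x")},
		{streamID: 1, payload: []byte("cd")},
	})
	fmt.Println(string(out[1]), string(out[2]))
}
```

If the buffer fills up, the capture side blocks; a real capture loop would have to decide whether to drop packets at that point.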

houndci-bot · 2021-04-29T15:56:08Z

tcp/tcp_message.go

+	}
+}
+
+func (parser *MessageParser) Close() error {


exported method MessageParser.Close should have comment or be unexported

tcp/tcp_message.go

urbanishimwe

this naming makes more sense, thanks!

proto/proto.go

Too smart, and can cause unexpected errors.

urbanishimwe · 2021-05-03T07:41:09Z

proto/proto.go

-// Message is an interface used to provide feedback or store dummy data for future use
-type Message interface {
+// Message is an interface used to provide protocol state or store dummy data for future use
+type ProtocolStateSetter interface {


Since this interface has both a setter (SetProtocolState) and a getter (ProtocolState), it could be called ProtocolState.

Should have way better performance, and can fix a few bugs

houndci-bot · 2021-05-03T18:41:55Z

proto/proto.go

@@ -19,6 +19,7 @@ package proto
 import (
 	"bufio"
 	"bytes"
+	_ "fmt"


a blank import should be only in a main or test package, or have a comment justifying it

houndci-bot · 2021-05-17T18:11:25Z

output_tcp.go

@@ -129,3 +143,7 @@ func (o *TCPOutput) connect(address string) (conn net.Conn, err error) {
 func (o *TCPOutput) String() string {
 	return fmt.Sprintf("TCP output %s, limit: %d", o.address, o.limit)
 }
+
+func (o *TCPOutput) Close() {


exported method TCPOutput.Close should have comment or be unexported

houndci-bot · 2021-05-17T18:11:25Z

proto/proto.go

+			state.headerStart = MIMEHeadersStartPos(data)
+			if state.headerStart < 0 {
+				return false
+			} else {


if block ends with a return statement, so drop this else and outdent its block

houndci-bot · 2021-05-17T18:11:26Z

tcp/tcp_message.go

+	parser.Emit(m)
+}
+
+func (parser *MessageParser) Emit(m *Message) {


exported method MessageParser.Emit should have comment or be unexported

houndci-bot · 2021-05-17T18:11:26Z

tcp/tcp_message.go

@@ -117,171 +157,176 @@ func (m *Message) Sort() {
 	sort.SliceStable(m.packets, func(i, j int) bool { return m.packets[i].Seq < m.packets[j].Seq })
 }

-// Handler message handler
-type Handler func(*Message)
+func (m *Message) Finalize() {


exported method Message.Finalize should have comment or be unexported

houndci-bot · 2021-05-17T18:11:26Z

tcp/tcp_message.go

+	return false
+}
+
+func (m *Message) PacketData() [][]byte {


exported method Message.PacketData should have comment or be unexported

houndci-bot · 2021-05-17T18:11:26Z

tcp/tcp_message.go

 }

 // Packets returns packets of the message
 func (m *Message) Packets() []*Packet {
 	return m.packets
 }

+func (m *Message) MissingChunk() bool {


exported method Message.MissingChunk should have comment or be unexported

houndci-bot · 2021-05-17T18:11:26Z

tcp/tcp_packet.go

 	return
 }

+func (pckt *Packet) MessageID() uint64 {


exported method Packet.MessageID should have comment or be unexported

fix hasChunked

007dcb4

houndci-bot reviewed Apr 11, 2021

buger reviewed Apr 12, 2021

urbanishimwe requested a review from buger April 12, 2021 09:52

urbanishimwe force-pushed the tcp-timer branch 2 times, most recently from 00a86fd to 6ea7244 Compare April 25, 2021 19:04

global message timer

6cb06f9

urbanishimwe force-pushed the tcp-timer branch from 6ea7244 to 6cb06f9 Compare April 25, 2021 19:05

Refactoring

257d4d3

houndci-bot reviewed Apr 29, 2021

urbanishimwe commented Apr 30, 2021

Refactoring/renaming

fcad715

houndci-bot reviewed May 2, 2021

houndci-bot reviewed May 2, 2021

houndci-bot reviewed May 2, 2021

buger added 2 commits May 2, 2021 21:17

Fix comment

c641e62

Remove variable ticker functionality

01f5a1c

Too smart, and can cause unexpected errors.

urbanishimwe commented May 3, 2021

buger added 2 commits May 3, 2021 14:34

Do not use empty bpf filter when tracking response

c7d0473

Should have way better performance, and can fix a few bugs

Add handling for out of order packets

32c005a

houndci-bot reviewed May 3, 2021

buger changed the title from "global message timer and improved proto.HasFullPayload" to "Fixing performance issues and out-of-order packets" May 3, 2021

Updates

a940bad

houndci-bot reviewed May 17, 2021

buger added 2 commits May 18, 2021 13:48

Remove memory limits

139441b

Remove debugging

0198847

buger merged commit e74e945 into master May 19, 2021

buger deleted the tcp-timer branch May 19, 2021 17:11
