fix: agent - eBPF Fix TCP DNS client request loss #9218

Merged
merged 1 commit into from
Feb 28, 2025


yinjiping

TCP DNS prefixes each message with a two-byte length field, whereas UDP DNS does not. These two length bytes must be stripped so that they are not passed to the upper layer.

The two sides read data differently. The client receives the whole response, length prefix included, in a single read, whereas the server first reads the two length bytes and then reads the remaining bytes.

When the client sends a request, it combines the 'A' and 'AAAA' queries into a single write to the CoreDNS server. The first two bytes hold only the length of the 'A' query, not the combined length of both queries (the total size of the write is referred to as "count" here). As a result, the length check may miss this case.

This PR fixes the loss of the client request by adding a direction check.
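
To make the length mismatch concrete, here is a minimal Python sketch. It is not the agent's eBPF code; `build_query`, `tcp_frame`, and `split_tcp_dns` are hypothetical helpers, and the strict equality test only illustrates one kind of length check that would miss a combined write. The query name and transaction IDs are taken from the capture in the comment below.

```python
import struct

def build_query(txid, name, qtype):
    """Build a minimal DNS query (header + one question), without a TCP length prefix."""
    # Flags 0x0100 (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN

def tcp_frame(msg):
    """Prepend the two-byte big-endian length used by DNS over TCP."""
    return struct.pack("!H", len(msg)) + msg

def split_tcp_dns(buf):
    """Split a buffer holding one or more length-prefixed DNS messages."""
    msgs, off = [], 0
    while off + 2 <= len(buf):
        (n,) = struct.unpack_from("!H", buf, off)
        if off + 2 + n > len(buf):
            break  # incomplete trailing message
        msgs.append(buf[off + 2:off + 2 + n])
        off += 2 + n
    return msgs

NAME = "reviews.deepflow-ebpf-istio-demo.svc.cluster.local"
a_query = build_query(0xB158, NAME, 1)      # A
aaaa_query = build_query(0x4061, NAME, 28)  # AAAA

# The client writes both framed queries in one syscall ("count" bytes).
payload = tcp_frame(a_query) + tcp_frame(aaaa_query)
count = len(payload)
(prefix,) = struct.unpack_from("!H", payload)

# A strict check (prefix + 2 == count) rejects the combined write,
# because the prefix covers only the first (A) query.
strict_ok = prefix + 2 == count
msgs = split_tcp_dns(payload)
print(prefix, count, strict_ok, len(msgs))
```

Walking the prefixes one message at a time recovers both queries from the single write, which is why the first prefix alone cannot be compared against the total size.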

This PR is for:

  • Agent

Affected branches

  • main
  • v6.6
  • v6.5
  • v6.4

@yinjiping

Below is the DNS exchange between the client (wrk:worker_0) and the server (coredns).

  • Client sends the request (note that the A and AAAA queries are sent together):
  • --------------------------------- +
    2025-02-27 16:38:10.643777 [datadump] SEQ 2790 DIR out TYPE unknown(7) PID 61039 THREAD_ID 61096 COROUTINE_ID 0 ROLE client CONTAINER_ID 824004fd7683479932062c03052e530ee3a97ce3b67d5466f53e36d8630caaa3 SOURCE 0 COMM wrk:worker_0 TCP 172.18.144.157.48170 > 172.18.144.156.53 LEN 140 SYSCALL_LEN 140 SOCKET_ID 737982394291899595 TRACE_ID 0 TCP_SEQ 3441319224 DATA_SEQ 0 TLS false KernCapTime 2025-02-27 16:38:10.612108 KernMonoTime 18126903250837 us
    00 44(D) B1 58(X) 01 00 00 01 00 00 00 00 00 00 07 72(r) 65(e) 76(v) 69(i) 65(e) 77(w) 73(s) 18 64(d) 65(e) 65(e) 70(p) 66(f) 6C(l) 6F(o) 77(w) 2D(-) 65(e) 62(b) 70(p) 66(f) 2D(-) 69(i) 73(s) 74(t) 69(i) 6F(o) 2D(-) 64(d) 65(e) 6D(m) 6F(o) 03 73(s) 76(v) 63(c) 07 63(c) 6C(l) 75(u) 73(s) 74(t) 65(e) 72(r) 05 6C(l) 6F(o) 63(c) 61(a) 6C(l) 00 00 01 00 01 00 44(D) 40(@) 61(a) 01 00 00 01 00 00 00 00 00 00 07 72(r) 65(e) 76(v) 69(i) 65(e) 77(w) 73(s) 18 64(d) 65(e) 65(e) 70(p) 66(f) 6C(l) 6F(o) 77(w) 2D(-) 65(e) 62(b) 70(p) 66(f) 2D(-) 69(i) 73(s) 74(t) 69(i) 6F(o) 2D(-) 64(d) 65(e) 6D(m) 6F(o) 03 73(s) 76(v) 63(c) 07 63(c) 6C(l) 75(u) 73(s) 74(t) 65(e) 72(r) 05 6C(l) 6F(o) 63(c) 61(a) 6C(l) 00 00 1C 00 01
  • --------------------------------- +

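Dual-record query: the client sends the A and AAAA queries together, which is typical behavior in an IPv4/IPv6 dual-stack environment (for example, a Pod network configured in dual-stack mode).
Domain name validity:
svc.cluster.local is the default Kubernetes cluster domain and follows the service discovery rules.
The service name (reviews) and the namespace (deepflow-ebpf-istio-demo) follow Kubernetes naming conventions.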
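
As a cross-check of the dump above, the following Python sketch decodes the question section of the second (AAAA) query. `parse_question` is a hypothetical helper written for this illustration; it assumes an uncompressed name, which holds for the queries in this capture, and the bytes are transcribed from the dump.

```python
import struct

def parse_question(msg):
    """Return (qname, qtype, qclass) for the first question of a DNS message.

    Assumes no name compression, which is the case for a plain query.
    """
    off = 12  # the DNS header is always 12 bytes
    labels = []
    while msg[off] != 0:
        n = msg[off]  # label length byte
        labels.append(msg[off + 1:off + 1 + n].decode("ascii"))
        off += 1 + n
    off += 1  # skip the zero-length root label
    qtype, qclass = struct.unpack_from("!HH", msg, off)
    return ".".join(labels), qtype, qclass

# Second message from the client dump, without its 00 44 length prefix.
aaaa_query = (
    bytes.fromhex("406101000001000000000000")  # ID 0x4061, flags 0x0100, QDCOUNT 1
    + b"\x07" + b"reviews"
    + b"\x18" + b"deepflow-ebpf-istio-demo"
    + b"\x03" + b"svc"
    + b"\x07" + b"cluster"
    + b"\x05" + b"local"
    + b"\x00"
    + bytes.fromhex("001c0001")  # QTYPE 28 (AAAA), QCLASS 1 (IN)
)

name, qtype, qclass = parse_question(aaaa_query)
print(name, qtype, qclass, len(aaaa_query))
```

The message is 68 bytes long, which matches the `00 44` prefix in front of each query in the dump.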

  • Server (it first receives the two length bytes, then the DNS data that follows):
  • --------------------------------- +
    2025-02-27 16:38:10.612182 [datadump] SEQ 2768 DIR in TYPE unknown(7) PID 4141214 THREAD_ID 4141228 COROUTINE_ID 0 ROLE server CONTAINER_ID d0bef00b7306605026c7c26cf972bdca3d8a82502bee457fb5708627bf5aefba SOURCE 0 COMM coredns TCP 172.18.144.157.48170 > 172.18.144.156.53 LEN 2 SYSCALL_LEN 2 SOCKET_ID 17406453912620263 TRACE_ID 17406453912620316 TCP_SEQ 3441319224 DATA_SEQ 0 TLS false KernCapTime 2025-02-27 16:38:10.612129 KernMonoTime 18126903250857 us
    00 44(D)
  • --------------------------------- +
  • --------------------------------- +
    2025-02-27 16:38:10.638114 [datadump] SEQ 2770 DIR in TYPE unknown(7) PID 4141214 THREAD_ID 4141228 COROUTINE_ID 0 ROLE server CONTAINER_ID d0bef00b7306605026c7c26cf972bdca3d8a82502bee457fb5708627bf5aefba SOURCE 0 COMM coredns TCP 172.18.144.157.48170 > 172.18.144.156.53 LEN 68 SYSCALL_LEN 68 SOCKET_ID 17406453912620263 TRACE_ID 17406453912620316 TCP_SEQ 3441319226 DATA_SEQ 1 TLS false KernCapTime 2025-02-27 16:38:10.612137 KernMonoTime 18126903250865 us
    B1 58(X) 01 00 00 01 00 00 00 00 00 00 07 72(r) 65(e) 76(v) 69(i) 65(e) 77(w) 73(s) 18 64(d) 65(e) 65(e) 70(p) 66(f) 6C(l) 6F(o) 77(w) 2D(-) 65(e) 62(b) 70(p) 66(f) 2D(-) 69(i) 73(s) 74(t) 69(i) 6F(o) 2D(-) 64(d) 65(e) 6D(m) 6F(o) 03 73(s) 76(v) 63(c) 07 63(c) 6C(l) 75(u) 73(s) 74(t) 65(e) 72(r) 05 6C(l) 6F(o) 63(c) 61(a) 6C(l) 00 00 01 00 01
  • --------------------------------- +
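
The 2-byte read followed by a 68-byte read is what a prefix-then-body read loop produces. The Python sketch below simulates that pattern on an in-memory stream. It is an assumption about the reading pattern suggested by the trace, not CoreDNS code; `read_tcp_dns_messages` is a hypothetical helper and the message bodies are zero-filled placeholders.

```python
import io
import struct

def read_tcp_dns_messages(stream):
    """Read length-prefixed DNS messages, recording the size of every read."""
    messages, read_sizes = [], []
    while True:
        prefix = stream.read(2)  # first read: the two length bytes
        if len(prefix) < 2:
            break  # end of stream
        read_sizes.append(len(prefix))
        (n,) = struct.unpack("!H", prefix)
        body = stream.read(n)  # second read: the DNS message itself
        read_sizes.append(len(body))
        if len(body) < n:
            break  # truncated message
        messages.append(body)
    return messages, read_sizes

# Two 68-byte placeholder messages, each framed with a 00 44 prefix,
# standing in for the A and AAAA queries (140 bytes in total).
stream = io.BytesIO((b"\x00\x44" + bytes(68)) * 2)
messages, read_sizes = read_tcp_dns_messages(stream)
print(read_sizes)
```

The recorded read sizes line up with the `LEN 2` and `LEN 68` entries seen on the server side, while the client side shows a single `LEN 140` write.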

The server first reads the A query (above) and replies to it:

  • --------------------------------- +
    2025-02-27 16:38:10.638154 [datadump] SEQ 2771 DIR out TYPE unknown(7) PID 4141214 THREAD_ID 4141228 COROUTINE_ID 0 ROLE server CONTAINER_ID d0bef00b7306605026c7c26cf972bdca3d8a82502bee457fb5708627bf5aefba SOURCE 0 COMM coredns TCP 172.18.144.156.53 > 172.18.144.157.48170 LEN 136 SYSCALL_LEN 136 SOCKET_ID 17406453912620263 TRACE_ID 17406453912620316 TCP_SEQ 3839016411 DATA_SEQ 2 TLS false KernCapTime 2025-02-27 16:38:10.612167 KernMonoTime 18126903250895 us
    00 86 B1 58(X) 85 00 00 01 00 01 00 00 00 00 07 72(r) 65(e) 76(v) 69(i) 65(e) 77(w) 73(s) 18 64(d) 65(e) 65(e) 70(p) 66(f) 6C(l) 6F(o) 77(w) 2D(-) 65(e) 62(b) 70(p) 66(f) 2D(-) 69(i) 73(s) 74(t) 69(i) 6F(o) 2D(-) 64(d) 65(e) 6D(m) 6F(o) 03 73(s) 76(v) 63(c) 07 63(c) 6C(l) 75(u) 73(s) 74(t) 65(e) 72(r) 05 6C(l) 6F(o) 63(c) 61(a) 6C(l) 00 00 01 00 01 07 72(r) 65(e) 76(v) 69(i) 65(e) 77(w) 73(s) 18 64(d) 65(e) 65(e) 70(p) 66(f) 6C(l) 6F(o) 77(w) 2D(-) 65(e) 62(b) 70(p) 66(f) 2D(-) 69(i) 73(s) 74(t) 69(i) 6F(o) 2D(-) 64(d) 65(e) 6D(m) 6F(o) 03 73(s) 76(v) 63(c) 07 63(c) 6C(l) 75(u) 73(s) 74(t) 65(e) 72(r) 05 6C(l) 6F(o) 63(c) 61(a) 6C(l) 00 00 01 00 01 00 00 00 1B 00 04 C0 A8 39(9) 45(E)
  • --------------------------------- +

Answer record
Resource record type: A record (00 01)
IP address: C0 A8 39 45 → 192.168.57.69
(C0 A8 = 192.168, 39 45 = 57.69)
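
The same decoding can be reproduced with a few lines of Python. The sketch takes the last 14 bytes of the response dump (the fixed fields that follow the answer's owner name) and unpacks them; the variable names are chosen for this illustration only.

```python
import ipaddress
import struct

# Tail of the answer record from the dump:
# TYPE, CLASS, TTL, RDLENGTH, then RDATA.
tail = bytes.fromhex("0001" "0001" "0000001b" "0004" "c0a83945")

rtype, rclass, ttl, rdlength = struct.unpack_from("!HHIH", tail)
rdata = tail[10:10 + rdlength]
addr = ipaddress.IPv4Address(rdata)  # accepts 4 packed bytes

print(rtype, rclass, ttl, rdlength, addr)
```

Besides the address, this shows the record's TTL (`00 00 00 1B`, 27 seconds) and its RDLENGTH of 4, as expected for an A record.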

The server then reads the AAAA query and replies to it:

  • --------------------------------- +
    2025-02-27 16:38:10.638179 [datadump] SEQ 2772 DIR in TYPE unknown(7) PID 4141214 THREAD_ID 4141228 COROUTINE_ID 0 ROLE server CONTAINER_ID d0bef00b7306605026c7c26cf972bdca3d8a82502bee457fb5708627bf5aefba SOURCE 0 COMM coredns TCP 172.18.144.157.48170 > 172.18.144.156.53 LEN 2 SYSCALL_LEN 2 SOCKET_ID 17406453912620263 TRACE_ID 17406453912620317 TCP_SEQ 3441319294 DATA_SEQ 3 TLS false KernCapTime 2025-02-27 16:38:10.612219 KernMonoTime 18126903250947 us
    00 44(D)
  • --------------------------------- +
  • --------------------------------- +
    2025-02-27 16:38:10.638189 [datadump] SEQ 2773 DIR in TYPE unknown(7) PID 4141214 THREAD_ID 4141228 COROUTINE_ID 0 ROLE server CONTAINER_ID d0bef00b7306605026c7c26cf972bdca3d8a82502bee457fb5708627bf5aefba SOURCE 0 COMM coredns TCP 172.18.144.157.48170 > 172.18.144.156.53 LEN 68 SYSCALL_LEN 68 SOCKET_ID 17406453912620263 TRACE_ID 17406453912620317 TCP_SEQ 3441319296 DATA_SEQ 4 TLS false KernCapTime 2025-02-27 16:38:10.612223 KernMonoTime 18126903250952 us
    40(@) 61(a) 01 00 00 01 00 00 00 00 00 00 07 72(r) 65(e) 76(v) 69(i) 65(e) 77(w) 73(s) 18 64(d) 65(e) 65(e) 70(p) 66(f) 6C(l) 6F(o) 77(w) 2D(-) 65(e) 62(b) 70(p) 66(f) 2D(-) 69(i) 73(s) 74(t) 69(i) 6F(o) 2D(-) 64(d) 65(e) 6D(m) 6F(o) 03 73(s) 76(v) 63(c) 07 63(c) 6C(l) 75(u) 73(s) 74(t) 65(e) 72(r) 05 6C(l) 6F(o) 63(c) 61(a) 6C(l) 00 00 1C 00 01
  • --------------------------------- +
  • --------------------------------- +
    2025-02-27 16:38:10.638205 [datadump] SEQ 2774 DIR out TYPE unknown(7) PID 4141214 THREAD_ID 4141228 COROUTINE_ID 0 ROLE server CONTAINER_ID d0bef00b7306605026c7c26cf972bdca3d8a82502bee457fb5708627bf5aefba SOURCE 0 COMM coredns TCP 172.18.144.156.53 > 172.18.144.157.48170 LEN 163 SYSCALL_LEN 163 SOCKET_ID 17406453912620263 TRACE_ID 17406453912620317 TCP_SEQ 3839016547 DATA_SEQ 5 TLS false KernCapTime 2025-02-27 16:38:10.612244 KernMonoTime 18126903250972 us
    00 A1 40(@) 61(a) 85 00 00 01 00 00 00 01 00 00 07 72(r) 65(e) 76(v) 69(i) 65(e) 77(w) 73(s) 18 64(d) 65(e) 65(e) 70(p) 66(f) 6C(l) 6F(o) 77(w) 2D(-) 65(e) 62(b) 70(p) 66(f) 2D(-) 69(i) 73(s) 74(t) 69(i) 6F(o) 2D(-) 64(d) 65(e) 6D(m) 6F(o) 03 73(s) 76(v) 63(c) 07 63(c) 6C(l) 75(u) 73(s) 74(t) 65(e) 72(r) 05 6C(l) 6F(o) 63(c) 61(a) 6C(l) 00 00 1C 00 01 07 63(c) 6C(l) 75(u) 73(s) 74(t) 65(e) 72(r) 05 6C(l) 6F(o) 63(c) 61(a) 6C(l) 00 00 06 00 01 00 00 00 1B 00 44(D) 02 6E(n) 73(s) 03 64(d) 6E(n) 73(s) 07 63(c) 6C(l) 75(u) 73(s) 74(t) 65(e) 72(r) 05 6C(l) 6F(o) 63(c) 61(a) 6C(l) 00 0A 68(h) 6F(o) 73(s) 74(t) 6D(m) 61(a) 73(s) 74(t) 65(e) 72(r) 07 63(c) 6C(l) 75(u) 73(s) 74(t) 65(e) 72(r) 05 6C(l) 6F(o) 63(c) 61(a) 6C(l) 00 67(g) C0 24($) 66(f) 00 00 1C 20( ) 00 00 07 08 00 01 51(Q) 80 00 00 00 1E
  • --------------------------------- +

Client receives (the A response; note that it does not first read the 2 length bytes):

  • --------------------------------- +
    2025-02-27 16:38:10.612252 [datadump] SEQ 2769 DIR in TYPE unknown(7) PID 61039 THREAD_ID 61096 COROUTINE_ID 0 ROLE client CONTAINER_ID 824004fd7683479932062c03052e530ee3a97ce3b67d5466f53e36d8630caaa3 SOURCE 0 COMM wrk:worker_0 TCP 172.18.144.156.53 > 172.18.144.157.48170 LEN 136 SYSCALL_LEN 136 SOCKET_ID 737982394291899595 TRACE_ID 161521641988476197 TCP_SEQ 3839016411 DATA_SEQ 1 TLS false KernCapTime 2025-02-27 16:38:10.612231 KernMonoTime 18126903250960 us
    00 86 B1 58(X) 85 00 00 01 00 01 00 00 00 00 07 72(r) 65(e) 76(v) 69(i) 65(e) 77(w) 73(s) 18 64(d) 65(e) 65(e) 70(p) 66(f) 6C(l) 6F(o) 77(w) 2D(-) 65(e) 62(b) 70(p) 66(f) 2D(-) 69(i) 73(s) 74(t) 69(i) 6F(o) 2D(-) 64(d) 65(e) 6D(m) 6F(o) 03 73(s) 76(v) 63(c) 07 63(c) 6C(l) 75(u) 73(s) 74(t) 65(e) 72(r) 05 6C(l) 6F(o) 63(c) 61(a) 6C(l) 00 00 01 00 01 07 72(r) 65(e) 76(v) 69(i) 65(e) 77(w) 73(s) 18 64(d) 65(e) 65(e) 70(p) 66(f) 6C(l) 6F(o) 77(w) 2D(-) 65(e) 62(b) 70(p) 66(f) 2D(-) 69(i) 73(s) 74(t) 69(i) 6F(o) 2D(-) 64(d) 65(e) 6D(m) 6F(o) 03 73(s) 76(v) 63(c) 07 63(c) 6C(l) 75(u) 73(s) 74(t) 65(e) 72(r) 05 6C(l) 6F(o) 63(c) 61(a) 6C(l) 00 00 01 00 01 00 00 00 1B 00 04 C0 A8 39(9) 45(E)
  • --------------------------------- +

Client receives (the AAAA response; again, it does not first read the 2 length bytes): omitted

@yinjiping yinjiping enabled auto-merge (squash) February 28, 2025 11:23
@yinjiping yinjiping merged commit 30708dc into main Feb 28, 2025
8 checks passed
@yinjiping yinjiping deleted the fix_coredns_loss branch February 28, 2025 11:27