Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FPC LRU cache related changes #2497

Merged
merged 10 commits into from
Jul 22, 2024
Merged

FPC LRU cache related changes #2497

merged 10 commits into from
Jul 22, 2024

Conversation

mmanoj
Copy link
Contributor

@mmanoj mmanoj commented Jul 12, 2024

Please sign (check) the below before submitting the Pull Request:

Link to the related issue:

Describe changes:
changes related to #2322
item# 2
2.1.Create a new LRU cache where the key is the pair "Client_IP - Resolved_IP" and the value is a protocol id:
2.2 For each flow (at the very first packet) lookup into the cache using "SRC IP - DST IP" as key and, if a match is found, save the protocol id into ndpi_flow_struct structure as a "fpc result"

mmanoj added 2 commits July 12, 2024 19:58
update fpc lru cache related changes
fpc lru cache related changes
Copy link
Collaborator

@IvanNardi IvanNardi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmanoj, some things to improve; this is only a preliminary review.
The most obvious missing piece is the cache initialization! You must add it in ndpi_finalize_initialization
Please, be sure that the code compile (at least on your machine), before pushing it

src/include/ndpi_private.h Outdated Show resolved Hide resolved
src/include/ndpi_typedefs.h Outdated Show resolved Hide resolved
src/include/ndpi_private.h Outdated Show resolved Hide resolved
src/include/ndpi_private.h Outdated Show resolved Hide resolved
src/include/ndpi_typedefs.h Outdated Show resolved Hide resolved
src/lib/ndpi_main.c Outdated Show resolved Hide resolved
src/lib/ndpi_main.c Outdated Show resolved Hide resolved
src/lib/ndpi_main.c Outdated Show resolved Hide resolved
@mmanoj mmanoj requested a review from IvanNardi July 14, 2024 16:18
mmanoj added 2 commits July 14, 2024 21:54
Update for review changes
Update as per review comments
Copy link
Contributor Author

@mmanoj mmanoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update as per review comments.Howeer need some clarifications regarding section of "first packet of this flow to be analyzed" in ndpi_main.c we can discuss it after this review.

@mmanoj
Copy link
Contributor Author

mmanoj commented Jul 14, 2024

@mmanoj, some things to improve; this is only a preliminary review. The most obvious missing piece is the cache initialization! You must add it in ndpi_finalize_initialization Please, be sure that the code compile (at least on your machine), before pushing it

All changes done and compiled in local env with latest dev branch.

Copy link
Collaborator

@IvanNardi IvanNardi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general idea of the cache is:

  • add an entry in the cache with DNS traffic. In the DNS code, if ndpi_match_host_subprotocol() return a protocol different than NDPI_PROTOCOL_UNKNOWN, we should do somethig like ndpi_lru_add_to_cache(ndpi_str->fpc_dns_cache, key, ret.app_protocol, ndpi_get_current_time(flow));.
    The key is something like SRC_FLOW_IP + ip_required_in_the_dns_query

  • perform a lookup in ndpi_internal_detection_process_packet just after fpc_check_ip() call. If there is an entry, call something like fpc_update(ndpi_str, flow, NDPI_PROTOCOL_UNKNOWN, protocol_returned_from_the_cache, NDPI_FPC_CONFIDENCE_DNS)

Any specific doubts?

src/include/ndpi_private.h Outdated Show resolved Hide resolved
@@ -361,6 +367,9 @@ struct ndpi_detection_module_struct {

/* NDPI_PROTOCOL_MSTEAMS */
struct ndpi_lru_cache *msteams_cache;

/* NDPI_FIRST_PKT_CLASSIFICATION_CACHE */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above

src/include/ndpi_typedefs.h Outdated Show resolved Hide resolved
src/lib/ndpi_main.c Outdated Show resolved Hide resolved
src/lib/ndpi_main.c Outdated Show resolved Hide resolved
src/lib/ndpi_main.c Outdated Show resolved Hide resolved
@@ -10250,6 +10330,10 @@ int ndpi_get_lru_cache_stats(struct ndpi_global_context *g_ctx,
case NDPI_LRUCACHE_MSTEAMS:
ndpi_lru_get_stats(is_local ? ndpi_struct->msteams_cache : g_ctx->msteams_global_cache, stats);
return 0;
//ToDo:add fpc cache details
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the comment, please

src/lib/ndpi_main.c Outdated Show resolved Hide resolved
@mmanoj
Copy link
Contributor Author

mmanoj commented Jul 15, 2024

The general idea of the cache is:

* add an entry in the cache with DNS traffic. In the DNS code, if `ndpi_match_host_subprotocol()` return a protocol different than `NDPI_PROTOCOL_UNKNOWN`, we should do somethig like `ndpi_lru_add_to_cache(ndpi_str->fpc_dns_cache,  key, ret.app_protocol, ndpi_get_current_time(flow));`.
  The key is something like SRC_FLOW_IP + ip_required_in_the_dns_query

* perform a lookup in `ndpi_internal_detection_process_packet` just after `fpc_check_ip()` call. If there is an entry, call something like `fpc_update(ndpi_str, flow, NDPI_PROTOCOL_UNKNOWN, protocol_returned_from_the_cache, NDPI_FPC_CONFIDENCE_DNS)`

Any specific doubts?

can you explain bit more about this? SRC_FLOW_IP + ip_required_in_the_dns_query
is this flow source IP and DNS response IP (like A or AAA address?)

@IvanNardi
Copy link
Collaborator

IvanNardi commented Jul 15, 2024

can you explain bit more about this? SRC_FLOW_IP + ip_required_in_the_dns_query is this flow source IP and DNS response IP (like A or AAA address?)

That was the idea, but I was wrong (because of ipv4 and ipv6 possible mismatch) We should use only one address as key.

Anyway, let try together to analyze what we should do step by step.
In this simple trace there is a DNS flow and a TCP flow: fpc_dns.zip

  1. We should take a look at every DNS A or AAAA replies. If the domain required provides a sub-classification, i.e. if ndpi_match_host_subprotocol() in DNS code returns a protocol different than NDPI_PROTOCOL_UNKNOWN, we should save that information into the cache. I think we can always add only one entry, even if there are multiple answers in the DNS reply
    *) Packet 2 is a DNS AAAA reply with Youtube as subclassification -> add an entry with key "2a00:1450:4002:416::200e" (i.e. the first answer) and protocol NDPI_PROTOCOL_YOUTUBE

  2. At the very first pkt of every flow, we should lookup into the cache using the server address (i.e. flow->s_addr), in ndpi_internal_detection_process_packet just after fpc_check_ip() call. If a match is found, we should call fpc_update(ndpi_str, flow, NDPI_PROTOCOL_UNKNOWN, protocol_returned_from_the_cache, NDPI_FPC_CONFIDENCE_DNS)
    *) At packet 3 we should lookup using "2a00:1450:4002:416::200e" as key (the server address). The returned protocol is NDPI_PROTOCOL_YOUTUBE, so the FPC of this flow is Youtube with confidence "dns"

If everything is implemented correctly we should got something like:

./example/ndpiReader -t -i ~/fpc_dns.pcapng -v2


	1	TCP [2001:b07:a3d:c112:9326:58a1:68f7:5712]:40088 <-> [2a00:1450:4002:410::200e]:443 ... [FPC: 124/Youtube, Confidence: DNS]

Am I missing something?

@mmanoj
Copy link
Contributor Author

mmanoj commented Jul 16, 2024

can you explain bit more about this? SRC_FLOW_IP + ip_required_in_the_dns_query is this flow source IP and DNS response IP (like A or AAA address?)

That was the idea, but I was wrong (because of ipv4 and ipv6 possible mismatch) We should use only one address as key.

Anyway, let try together to analyze what we should do step by step. In this simple trace there is a DNS flow and a TCP flow: fpc_dns.zip

1. We should take a look at every DNS A or AAAA replies. If the domain required provides a sub-classification, i.e. if `ndpi_match_host_subprotocol()` in DNS code returns a protocol different than `NDPI_PROTOCOL_UNKNOWN`, we should save that information into the cache. I think we can always add only one entry, even if there are multiple answers in the DNS reply
   *) Packet 2 is a DNS AAAA reply with Youtube as subclassification -> add an entry with key "2a00:1450:4002:416::200e" (i.e. the first answer) and protocol `NDPI_PROTOCOL_YOUTUBE`

2. At the very first pkt of every flow, we should lookup into the cache using the server address (i.e. `flow->s_addr`), in `ndpi_internal_detection_process_packet` just after `fpc_check_ip()` call. If a match is found, we should call `fpc_update(ndpi_str, flow, NDPI_PROTOCOL_UNKNOWN, protocol_returned_from_the_cache, NDPI_FPC_CONFIDENCE_DNS)`
   *) At packet 3 we should lookup using "2a00:1450:4002:416::200e"  as key (the server address). The returned protocol is `NDPI_PROTOCOL_YOUTUBE`, so the FPC of this flow is Youtube with confidence "dns"

If everything is implemented correctly we should got something like:

./example/ndpiReader -t -i ~/fpc_dns.pcapng -v2


	1	TCP [2001:b07:a3d:c112:9326:58a1:68f7:5712]:40088 <-> [2a00:1450:4002:410::200e]:443 ... [FPC: 124/Youtube, Confidence: DNS]

Am I missing something?

@IvanNardi

Thanks for detailed clarification,let me go through the attached dns packet capture and described logic flow with ongoing FPC implementation and update you any clarifications required.Thank you very much for quick support and all valuable advice.Really appreciate this advice to narrow down the implementation.

@mmanoj
Copy link
Contributor Author

mmanoj commented Jul 17, 2024

can you explain bit more about this? SRC_FLOW_IP + ip_required_in_the_dns_query is this flow source IP and DNS response IP (like A or AAA address?)

That was the idea, but I was wrong (because of ipv4 and ipv6 possible mismatch) We should use only one address as key.
Anyway, let try together to analyze what we should do step by step. In this simple trace there is a DNS flow and a TCP flow: fpc_dns.zip

1. We should take a look at every DNS A or AAAA replies. If the domain required provides a sub-classification, i.e. if `ndpi_match_host_subprotocol()` in DNS code returns a protocol different than `NDPI_PROTOCOL_UNKNOWN`, we should save that information into the cache. I think we can always add only one entry, even if there are multiple answers in the DNS reply
   *) Packet 2 is a DNS AAAA reply with Youtube as subclassification -> add an entry with key "2a00:1450:4002:416::200e" (i.e. the first answer) and protocol `NDPI_PROTOCOL_YOUTUBE`

2. At the very first pkt of every flow, we should lookup into the cache using the server address (i.e. `flow->s_addr`), in `ndpi_internal_detection_process_packet` just after `fpc_check_ip()` call. If a match is found, we should call `fpc_update(ndpi_str, flow, NDPI_PROTOCOL_UNKNOWN, protocol_returned_from_the_cache, NDPI_FPC_CONFIDENCE_DNS)`
   *) At packet 3 we should lookup using "2a00:1450:4002:416::200e"  as key (the server address). The returned protocol is `NDPI_PROTOCOL_YOUTUBE`, so the FPC of this flow is Youtube with confidence "dns"

If everything is implemented correctly we should got something like:

./example/ndpiReader -t -i ~/fpc_dns.pcapng -v2


	1	TCP [2001:b07:a3d:c112:9326:58a1:68f7:5712]:40088 <-> [2a00:1450:4002:410::200e]:443 ... [FPC: 124/Youtube, Confidence: DNS]

Am I missing something?

@IvanNardi

Thanks for detailed clarification,let me go through the attached dns packet capture and described logic flow with ongoing FPC implementation and update you any clarifications required.Thank you very much for quick support and all valuable advice.Really appreciate this advice to narrow down the implementation.

@IvanNardi

The logic seems fine, let me implement and perform the initial test and update you for any clarifications required.

@mmanoj
Copy link
Contributor Author

mmanoj commented Jul 18, 2024

2a00:1450:4002:416::200e
@IvanNardi

I almost implement the logic from dns.c side and ndpi_main.c, still need to fix some the key generation logic with IP address as it's not working as expected.However hard coded key it's working fine.I will perform more test and share the code.

@mmanoj
Copy link
Contributor Author

mmanoj commented Jul 20, 2024

2a00:1450:4002:416::200e
@IvanNardi

I almost implement the logic from dns.c side and ndpi_main.c, still need to fix some the key generation logic with IP address as it's not working as expected.However hard coded key it's working fine.I will perform more test and share the code.

@IvanNardi
Key generation issue fixed

1	TCP 192.168.1.8:48516 <-> 157.240.235.60:443 [proto: 91.142/TLS.WhatsApp][IP: 142/WhatsApp][Encrypted][Confidence: DPI][**FPC: 242/WhatsAppFiles, Confidence: DNS**][DPI packets: 6][cat: Chat/9][179 pkts/21880 bytes <-> 178 pkts/155403 bytes][Goodput ratio: 46/92][12.09 sec][Hostname/SNI: web.whatsapp.com][(Advertised) ALPNs: h2;http/1.1][TLS Supported Versions: TLSv1.3;TLSv1.2][bytes ratio: -0.753 (Download)][IAT c2s/s2c min/avg/max/stddev: 0/0 43/11 2752/306 264/38][Pkt Len c2s/s2c min/avg/max/stddev: 66/66 122/873 1103/1446 143/629][TLSv1.3][JA3C: a195b9c006fcb23ab9a2343b0871e362][JA4: t13d1715h2_5b57614c22b0_7121afd63204][JA3S: fcb2d4d0991292272fcb1e464eedfd43][ECH: version 0xfe0d][Firefox][Cipher: TLS_AES_128_GCM_SHA256][Plen Bins: 1,26,10,2,0,3,6,0,0,0,3,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,44,0,0,0,0]

mmanoj added 5 commits July 20, 2024 19:50
FPC review change updates with DNS cache
FPC review change updates with DNS cache
FPC review change updates with DNS cache
FPC review change updates with DNS cache
@mmanoj mmanoj requested a review from IvanNardi July 20, 2024 14:30
@mmanoj
Copy link
Contributor Author

mmanoj commented Jul 20, 2024

@IvanNardi

All changes updated with test pcap file also.Please review and advice if any changes required.

./ndpiReader -t -i fpc_dns.pcap -v2

1 TCP 192.168.1.8:39432 <-> 142.250.199.4:443 [proto: 91/TLS][IP: 126/Google][Encrypted][Confidence: DPI][FPC: 126/Google, Confidence: DNS][DPI packets: 3][cat: Web/5][405 pkts/47359 bytes <-> 742 pkts/962242 bytes][Goodput ratio: 44/95][4.54 sec][bytes ratio: -0.906 (Download)][IAT c2s/s2c min/avg/max/stddev: 0/0 12/5 1094/1181 74/52][Pkt Len c2s/s2c min/avg/max/stddev: 66/66 117/1297 1466/1466 208/436][Risk: ** Probing attempt **][Risk Score: 50][Risk Info: TLS/QUIC Probing][Plen Bins: 3,3,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,91,0,0,0,0]

Update documetation.
Fix key with ipv6 addreses over ipv4 flow.
Update unit tests results.
Copy link
Collaborator

@IvanNardi IvanNardi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed some small changes and I'll update the commit message.
But the code was almost good anyway...
Thanks a lot for that and for your perseverance

@IvanNardi IvanNardi merged commit 67f5cda into ntop:dev Jul 22, 2024
35 checks passed
@mmanoj
Copy link
Contributor Author

mmanoj commented Jul 22, 2024

I pushed some small changes and I'll update the commit message. But the code was almost good anyway... Thanks a lot for that and for your perseverance

@IvanNardi

Thanks for the support and I was review the fixes you posted and it's good improvements which I was missed.I'm happy to do some valuable contribution to community and thank again for your advice and guidelines.Really appreciated.

Hope this change will improve the detection accuracy overall as per the test results I can see good detection specially services based on CDN's.

Once This pull request merge I will work on adding more IP lists from some reliable sources specially malware and C&C servers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants