streamline configuration of Zeek live capture worker load balancing using AF_PACKET and fanout #475

Closed
mmguero opened this issue May 13, 2024 · 1 comment

mmguero commented May 13, 2024

EDIT: After investigation it turns out that my assumptions about what the various parameters in node.cfg were doing were not quite correct.

The way we were generating node.cfg was already creating multiple workers per interface, based on the lb_procs variable. However, with this better understanding I have changed the environment variables to make this more automated.

See the next comment for updated documentation that outlines how to use the changes I've made.
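
For readers unfamiliar with node.cfg, here is a rough sketch of the kind of worker stanza that lb_procs controls. This is not Malcolm's actual generator: the function name `render_worker_stanza` and its parameters are hypothetical, and the key names follow the zeekctl AF_Packet worker layout as I understand it from Zeek's cluster setup documentation, so verify them there before relying on them.

```python
def render_worker_stanza(name, interface, lb_procs, pin_cpus=None, host="localhost"):
    """Render one zeekctl-style worker stanza (hypothetical helper).

    With lb_method=custom and lb_procs=N, zeekctl starts N worker
    processes that all capture from the same AF_Packet interface.
    """
    lines = [
        f"[{name}]",
        "type=worker",
        f"host={host}",
        f"interface=af_packet::{interface}",
        "lb_method=custom",
        f"lb_procs={lb_procs}",
    ]
    # pin_cpus is optional: one CPU ID per worker process
    if pin_cpus:
        lines.append("pin_cpus=" + ",".join(str(c) for c in pin_cpus))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_worker_stanza("worker-1", "eth0", 4, pin_cpus=[12, 13, 14, 15]))
```

The point is only that a single stanza with lb_procs greater than 1 already yields several workers on one interface, which is what the original assumption missed.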

Original issue text for context:


Looking at the "cluster setup" documentation for AF_Packet, a few notes:

Zeek is not multithreaded, so once the limitations of a single processor core are reached the only option currently is to spread the workload across many cores, or even many physical computers

and, later:

Load balancing

The more interesting use-case is to use AF_PACKET to run multiple Zeek workers and have their packet sockets join what is called a fanout group. In such a setup, the network traffic is load-balanced across Zeek workers. By default load balancing is based on symmetric flow hashes.

For example, running two Zeek workers listening on the same network interface, each worker analyzing approximately half of the network traffic, can be done as follows:

zeek -i af_packet::eth0 &
zeek -i af_packet::eth0 &

The fanout group is identified by an id and configurable using the AF_Packet::fanout_id constant which defaults to 23. In the example above, both Zeek workers join the same fanout group.
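
To make "symmetric flow hashes" a bit more concrete, here is a toy illustration of the idea. It is not the kernel's actual fanout hash: the function name and the use of CRC32 are assumptions chosen only to show that both directions of a connection land on the same worker.

```python
import zlib


def symmetric_flow_worker(src_ip, src_port, dst_ip, dst_port, proto, num_workers):
    """Pick a worker index for a flow so that A->B and B->A map to the same worker.

    Illustrative only; the real hashing is done by the kernel for the fanout group.
    """
    # Sort the two endpoints so the key does not depend on packet direction
    first, second = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{proto}|{first[0]}:{first[1]}|{second[0]}:{second[1]}".encode()
    # CRC32 is deterministic across runs, unlike Python's built-in hash() for str
    return zlib.crc32(key) % num_workers


if __name__ == "__main__":
    fwd = symmetric_flow_worker("10.0.0.1", 51000, "10.0.0.2", 443, "tcp", 2)
    rev = symmetric_flow_worker("10.0.0.2", 443, "10.0.0.1", 51000, "tcp", 2)
    print(fwd == rev)
```

Because every worker in the fanout group sees complete connections rather than one direction of each, Zeek's per-connection analysis keeps working when traffic is split.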

EDIT: This is the part of my assumption that was incorrect:

We are, right now, doing the simpler "one worker per interface" or "single worker mode," which isn't going to cut it for high throughput. One of our users is getting some packet drops that are probably related to this limitation.


We need to examine the following:

  • allow a way to specify the number of workers per interface (rather than just 1)
    • EDIT: this was already possible, but is now handled better through ZEEK_LB_PROCS_WORKER_DEFAULT
  • make sure that the fanout ID gets set appropriately
    • EDIT: this was already happening correctly
  • reexamine the existing variables and make sure they cover our needs
  • create a documentation section for high-performance capture and gather all of this zeek stuff, the suricata stuff, arkime capture stuff, etc., into one place
    • EDIT: done, see here
@mmguero mmguero added enhancement New feature or request zeek Relating to Malcolm's use of Zeek performance Related to speed/performance labels May 13, 2024
@mmguero mmguero added this to the z.staging milestone May 13, 2024
@mmguero mmguero self-assigned this May 13, 2024
@mmguero mmguero added this to Malcolm May 13, 2024
@mmguero mmguero moved this to Todo (develop) in Malcolm May 13, 2024
@mmguero mmguero modified the milestones: z.staging, v24.05.0 May 13, 2024
mmguero added a commit to mmguero-dev/Malcolm that referenced this issue May 15, 2024
…k and documentation updates

- see idaholab#475 for the zeek deploy changes
- see idaholab#435 for an issue about Suricata settings (documentation changes)
- Arkime documentation changes as well
@mmguero mmguero changed the title Zeek live capture worker load balancing using AF_PACKET and fanout more straightforward way to configure Zeek live capture worker load balancing using AF_PACKET and fanout May 15, 2024
@mmguero mmguero changed the title more straightforward way to configure Zeek live capture worker load balancing using AF_PACKET and fanout streamline configuration of Zeek live capture worker load balancing using AF_PACKET and fanout May 15, 2024

mmguero commented May 15, 2024

From the new documentation:

Zeek's resource utilization and performance can be tuned using environment variables. These environment variables are the same for both Hedgehog Linux and Malcolm's own monitoring of local network interfaces. For Hedgehog Linux, they are found in /opt/sensor/sensor_ctl/control_vars.conf, and for Malcolm they should be added to or modified in zeek-live.env.

Malcolm and Hedgehog Linux use Zeek's support for AF_Packet sockets for packet capture. Review Zeek's documentation on cluster setup to better understand the parameters discussed below.

The environment variables relevant to tuning Zeek for live packet capture are:

  • ZEEK_AF_PACKET_BUFFER_SIZE - AF_Packet ring buffer size in bytes (default 67108864)
  • ZEEK_AF_PACKET_FANOUT_MODE - AF_Packet fanout mode (default FANOUT_HASH)
  • ZEEK_LB_PROCS_WORKER_DEFAULT - "Zeek is not multithreaded, so once the limitations of a single processor core are reached the only option currently is to spread the workload across many cores". This value defines the number of processors to be assigned to each group of workers created for each capture interface for load balancing (default 1). A value of 0 means "autocalculate based on the number of CPUs present in the system."
  • ZEEK_LB_PROCS_WORKER_n - Explicitly defines the number of processors to be assigned to the group of workers for the n-th capture interface. If unspecified, this defaults to the number of CPUs listed in ZEEK_PIN_CPUS_WORKER_n if defined, or ZEEK_LB_PROCS_WORKER_DEFAULT otherwise.
  • ZEEK_LB_PROCS_LOGGER - Defines the number of processors to be assigned to the loggers (default 1)
  • ZEEK_LB_PROCS_PROXY - Defines the number of processors to be assigned to the proxies (default 1)
  • ZEEK_LB_PROCS_CPUS_RESERVED - If ZEEK_LB_PROCS_WORKER_DEFAULT is 0 ("autocalculate"), exclude this number of CPUs from the autocalculation (defaults to 1 (kernel) + 1 (manager) + ZEEK_LB_PROCS_LOGGER + ZEEK_LB_PROCS_PROXY)
  • ZEEK_PIN_CPUS_WORKER_AUTO - Automatically pin worker CPUs (default false)
  • ZEEK_PIN_CPUS_WORKER_n - Explicitly defines the processor IDs to be assigned to the group of workers for the n-th capture interface (e.g., 0 means "the first CPU"; 12,13,14,15 means "the last four CPUs" on a 16-core system)
  • ZEEK_PIN_CPUS_OTHER_AUTO - Automatically pin CPUs for manager, loggers, and proxies if possible (default false)
  • ZEEK_PIN_CPUS_MANAGER - List of CPUs to pin for the manager process (default is unset; only used if ZEEK_PIN_CPUS_OTHER_AUTO is false)
  • ZEEK_PIN_CPUS_LOGGER - List of CPUs to pin for the logger processes (default is unset; only used if ZEEK_PIN_CPUS_OTHER_AUTO is false)
  • ZEEK_PIN_CPUS_PROXY - List of CPUs to pin for the proxy processes (default is unset; only used if ZEEK_PIN_CPUS_OTHER_AUTO is false)
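
The fallback order described above for the worker process counts can be sketched roughly as follows. This is a reading of the documentation text, not the actual Malcolm or Hedgehog Linux code: the function names are hypothetical, the `max(1, ...)` floor is an assumption, and whether the autocalculated value is further divided among multiple capture interfaces is not stated here, so the sketch does not attempt it.

```python
def reserved_cpus(env):
    """CPUs excluded from autocalculation (ZEEK_LB_PROCS_CPUS_RESERVED)."""
    explicit = env.get("ZEEK_LB_PROCS_CPUS_RESERVED")
    if explicit:
        return int(explicit)
    loggers = int(env.get("ZEEK_LB_PROCS_LOGGER", "1"))
    proxies = int(env.get("ZEEK_LB_PROCS_PROXY", "1"))
    # Documented default: 1 (kernel) + 1 (manager) + loggers + proxies
    return 1 + 1 + loggers + proxies


def default_worker_procs(env, cpu_count):
    """Resolve ZEEK_LB_PROCS_WORKER_DEFAULT, where 0 means autocalculate."""
    default = int(env.get("ZEEK_LB_PROCS_WORKER_DEFAULT", "1"))
    if default != 0:
        return default
    # Assumption: never go below one worker process, even on small systems
    return max(1, cpu_count - reserved_cpus(env))


def worker_procs(env, n, cpu_count):
    """Number of worker processes for the n-th capture interface."""
    explicit = env.get(f"ZEEK_LB_PROCS_WORKER_{n}")
    if explicit:
        return int(explicit)
    pinned = env.get(f"ZEEK_PIN_CPUS_WORKER_{n}")
    if pinned:
        # One worker per pinned CPU ID, e.g. "12,13,14,15" gives 4
        return len([c for c in pinned.split(",") if c.strip()])
    return default_worker_procs(env, cpu_count)


if __name__ == "__main__":
    env = {"ZEEK_LB_PROCS_WORKER_DEFAULT": "0"}
    print(worker_procs(env, 1, cpu_count=16))
```

In practice `env` would be `os.environ` as populated from control_vars.conf or zeek-live.env; a plain dict is used here so the logic is easy to check in isolation.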

@mmguero mmguero moved this from Todo (develop) to Testing in Malcolm May 15, 2024
@mmguero mmguero moved this from Testing to In Progress in Malcolm May 22, 2024
mmguero added a commit to mmguero-dev/Malcolm that referenced this issue May 22, 2024
mmguero added a commit to mmguero-dev/Malcolm that referenced this issue May 22, 2024
mmguero added a commit to mmguero-dev/Malcolm that referenced this issue May 22, 2024
mmguero added a commit to mmguero-dev/Malcolm that referenced this issue May 23, 2024
@mmguero mmguero moved this from In Progress to Testing in Malcolm May 24, 2024
This was referenced May 29, 2024
@mmguero mmguero closed this as completed May 29, 2024
@github-project-automation github-project-automation bot moved this from Testing to Done in Malcolm May 29, 2024
@mmguero mmguero moved this from Done to Released in Malcolm May 30, 2024