virtio-net: add multiqueue support #2011

Merged

merged 8 commits into master from feature/irq_affinity on Apr 17, 2024

Conversation

francescolavra
Member

@francescolavra francescolavra commented Mar 22, 2024

This changeset enhances the virtio-net driver to support multiple tx/rx queues, which improves network performance in multi-CPU instances running workloads that serve many network connections simultaneously.
By default, the virtio-net driver uses as many queues as the attached device supports; this behavior can be overridden by specifying the "io-queues" configuration option in the manifest tuple corresponding to a given network interface. For example,
the following snippet of an Ops configuration file instructs the driver to use 2 queues for the first network interface:

"ManifestPassthrough": {
  "en1": {
    "io-queues": "2"
  }
}

The number of queues used by the driver is always limited to the number of CPUs in the running instance (this behavior cannot be overridden by the "io-queues" option).
In order to optimize parallelism, each tx/rx queue is configured with an interrupt affinity such that different queues are served
by different CPUs.

This field is incremented in the thread_return() Unix function, but
is never used.
The current code initializes the PLIC threshold for all APs from
the start_secondary_cores() function, executed by the boot
processor. However, the PLIC configuration for a given CPU can be
changed by the boot ROM executed by that CPU at startup, and it has
been observed that the OpenSBI firmware changes the PLIC threshold
value, overwriting the configuration set by the boot processor.
This change fixes the above issue by moving the configuration of
the PLIC threshold for a given CPU to the ap_start() function,
executed by the CPU itself.
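A minimal sketch of the idea, assuming a SiFive-compatible PLIC register layout; PLIC_BASE, plic_set_threshold() and ap_plic_init() below are illustrative stand-ins, not the actual kernel code:
```c
#include <stdint.h>

#define PLIC_BASE          0x0c000000UL            /* assumed PLIC MMIO base */
#define PLIC_THRESHOLD(c)  (PLIC_BASE + 0x200000 + 0x1000 * (unsigned long)(c))

/* Set the priority threshold for the given PLIC context; a threshold of 0
 * lets through interrupts of any non-zero priority. */
static void plic_set_threshold(unsigned int context, uint32_t threshold)
{
    *(volatile uint32_t *)PLIC_THRESHOLD(context) = threshold;
}

/* Run on the application processor itself (e.g. from ap_start()), i.e.
 * after any firmware such as OpenSBI has executed on this hart, so the
 * value can no longer be overwritten behind the kernel's back. */
void ap_plic_init(unsigned int my_context)
{
    plic_set_threshold(my_context, 0);
}
```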
In the RISC-V SMP initialization code, the device tree is parsed in
order to retrieve the number of present CPUs. Commit
364b517 introduced a regression by
which a cpu device tree entry, whose name is null-terminated in the
device tree, can no longer be recognized, thereby preventing the
kernel from enumerating the application processors.
This change fixes the above regression by using the
sstring_from_cstring() function when parsing a device tree entry
name.
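A rough illustration of the failure mode and fix, where the sstring type and helpers below are hypothetical stand-ins for the kernel's own definitions:
```c
#include <string.h>
#include <stdbool.h>

/* hypothetical length-delimited string, standing in for the kernel's sstring */
typedef struct {
    const char *ptr;
    unsigned long len;
} sstring;

/* Build an sstring from a NUL-terminated buffer: strnlen() stops at the
 * terminator, so a name stored as "cpu@0\0<padding>" gets length 5 rather
 * than the full buffer length. */
static sstring sstring_from_cstring(const char *s, unsigned long maxlen)
{
    return (sstring){ .ptr = s, .len = strnlen(s, maxlen) };
}

/* With the trailing NUL (and any padding) excluded from the length, a
 * prefix comparison against "cpu" recognizes the entry as expected. */
static bool dt_entry_is_cpu(sstring name)
{
    return name.len >= 3 && memcmp(name.ptr, "cpu", 3) == 0;
}
```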
This function claims all pending interrupts without invoking their
handlers; therefore, if it is called when enabling a new interrupt,
it may cause other (previously enabled) interrupts to be missed.
This issue has been observed to cause missed interrupts from the
virtio-blk device while the virtio-net device is being initialized,
which stalls the kernel during startup.
This change fixes the above issue by removing the usage of
plic_clear_pending_int(); this should not cause other problems,
because interrupt handlers are expected to handle any spurious
interrupts that may be triggered.
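A sketch of the spurious-interrupt-tolerant handler pattern this relies on (the virtqueue layout and names below are illustrative, not the actual driver code):
```c
#include <stdint.h>

/* illustrative virtqueue state: 'used_idx' advanced by the device,
 * 'last_seen' tracked by the driver */
struct virtqueue {
    volatile uint16_t used_idx;
    uint16_t last_seen;
};

/* Interrupt handler written so that a spurious invocation (no new
 * completed buffers) is a harmless no-op. Because every handler can
 * tolerate spurious interrupts, the kernel no longer needs to claim
 * pending interrupts when a new interrupt source is enabled. */
void virtio_queue_irq(struct virtqueue *vq)
{
    while (vq->last_seen != vq->used_idx) {
        /* ... process the completed buffer at index vq->last_seen ... */
        vq->last_seen++;
    }
    /* if nothing completed, the loop body never runs: spurious IRQs are safe */
}
```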
With this change, the stage3 startup() function is always invoked
after all attached peripherals have been probed, even if the root
filesystem is loaded before PCI bus discovery is complete (which
can happen if disk device interrupts are handled on a CPU other
than the boot CPU).
The next commit will allow device interrupts to be assigned to an
arbitrary CPU, and this commit prevents errors such as "NET: no
network interface found" from occurring if network device probing
is not complete before the root filesystem is loaded.
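One way to picture the ordering constraint is a completion counter shared by the two events; the counter and function names below are hypothetical, not the kernel's actual mechanism:
```c
#include <stdatomic.h>
#include <stdio.h>

/* startup() must run only after both events below have happened; the
 * two events may complete in either order, possibly on different CPUs. */
static atomic_int pending = 2;          /* root fs load + PCI bus discovery */

static void stage3_startup(void)        /* stand-in for the real startup() */
{
    printf("all peripherals probed and root filesystem loaded\n");
}

static void event_complete(void)
{
    /* whichever event finishes last invokes startup(), exactly once */
    if (atomic_fetch_sub(&pending, 1) == 1)
        stage3_startup();
}

void rootfs_loaded(void)  { event_complete(); }
void pci_probe_done(void) { event_complete(); }
```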
In the current code, all device interrupts are routed to either
the boot CPU (on x86 and ARM), or to any CPU (on RISC-V).
This new feature allows interrupts to be configured to target a
specific CPU, which makes it possible to optimize interrupt
allocation and spread a workload over multiple CPUs.
The arch-specific msi_format() and dev_irq_enable() functions have
been changed to take an additional `target_cpu` parameter, while
msi_get_vector() has been replaced by msi_get_config(), which
retrieves both the interrupt number and the target CPU associated
with a given PCI MSI. Other changes in the arch-specific interrupt
code have been made to support routing an arbitrary interrupt to an
arbitrary CPU.
Various functions in the PCI and virtIO code have been amended to
take an additional `cpu_affinity` range parameter, which is used as
a hint to select a target CPU when a given interrupt vector is
enabled in the interrupt controller. The irq_get_target_cpu()
utility function takes a CPU affinity range as a parameter and
returns an optimal target CPU, selected from the supplied range
based on which CPUs have the smallest number of device interrupts
targeted at them, so as to spread the interrupt handling work over
different CPUs. The irq_put_target_cpu() function is called to
signal that an interrupt has been removed from a given target CPU.
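A sketch of the selection policy described above; the per-CPU counter array and the function signatures are assumptions, since only the function names come from this change:
```c
#include <limits.h>

#define MAX_CPUS 64

/* number of device interrupts currently targeted at each CPU (assumed state) */
static unsigned int irq_count[MAX_CPUS];

/* Pick the CPU in [first, last] with the fewest device interrupts
 * assigned to it, and account for the new interrupt. */
unsigned int irq_get_target_cpu(unsigned int first, unsigned int last)
{
    unsigned int best = first, best_count = UINT_MAX;
    for (unsigned int cpu = first; cpu <= last && cpu < MAX_CPUS; cpu++) {
        if (irq_count[cpu] < best_count) {
            best = cpu;
            best_count = irq_count[cpu];
        }
    }
    irq_count[best]++;
    return best;
}

/* Called when an interrupt is removed from a CPU, so that CPU becomes a
 * better candidate for future allocations. */
void irq_put_target_cpu(unsigned int cpu)
{
    if (cpu < MAX_CPUS && irq_count[cpu] > 0)
        irq_count[cpu]--;
}
```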
With this change, a network device driver can optionally register a
setup closure that will be invoked (after the root filesystem is
loaded) with a configuration tuple as argument. This allows using
manifest options to pass arbitrary configuration parameters to
network device drivers for each attached network interface.
The next commit will use this new feature in the virtio-net driver,
which will implement multiqueue support with a configurable number
of queues.
Duplicated code across the different network device drivers has
been consolidated in the generic init_network_iface() function.
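A hedged sketch of how a driver might consume such a configuration tuple in its setup callback; the tuple accessor and structure names below are illustrative placeholders, not the kernel's actual API:
```c
#include <stdlib.h>

/* illustrative config tuple: flat key/value string pairs */
struct net_cfg {
    const char *(*get)(struct net_cfg *cfg, const char *key);
};

struct net_driver {
    unsigned int requested_queues;    /* 0 means "not specified" */
};

/* Setup callback invoked after the root filesystem (and thus the manifest)
 * is available, with the tuple for this network interface as argument. */
void example_net_setup(struct net_driver *drv, struct net_cfg *cfg)
{
    const char *v = cfg->get(cfg, "io-queues");
    if (v)
        drv->requested_queues = (unsigned int)strtoul(v, 0, 10);
}
```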
This change enhances the virtio-net driver to support multiple
tx/rx queues, which improves network performance in multi-CPU
instances running workloads that serve many network connections
simultaneously.
By default, the virtio-net driver uses as many queues as supported
by the attached device; it is possible to override this behavior by
specifying the "io-queues" configuration option in the manifest
tuple corresponding to a given network interface. For example,
the following snippet of an Ops configuration file instructs the
driver to use 2 queues for the first network interface:
```
"en1": {
  "io-queues": "2"
}
```
The number of queues used by the driver is always limited to the
number of CPUs in the running instance (this behavior cannot be
overridden by the "io-queues" option).
In order to optimize parallelism, each tx/rx queue is configured
with an interrupt affinity such that different queues are served
by different CPUs.
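The queue-count clamping and per-queue interrupt spreading described above might look roughly like the following sketch; the function and parameter names are illustrative assumptions:
```c
#define MAX_QUEUE_PAIRS 16            /* illustrative upper bound */

/* Decide how many tx/rx queue pairs to use: never more than the device
 * advertises, never more than the number of online CPUs, and optionally
 * capped by the "io-queues" manifest option (0 means "not specified"). */
static unsigned int choose_queue_count(unsigned int dev_max_pairs,
                                       unsigned int ncpus,
                                       unsigned int io_queues_opt)
{
    unsigned int n = dev_max_pairs;
    if (io_queues_opt && io_queues_opt < n)
        n = io_queues_opt;
    if (ncpus < n)
        n = ncpus;                    /* the CPU limit cannot be overridden */
    return n;
}

/* Spread the queues' interrupts across CPUs so that different queues are
 * served by different CPUs; queue_cpu[q] would then be passed as the
 * affinity hint when the queue's interrupt is enabled. */
static void assign_queue_affinity(unsigned int queue_cpu[MAX_QUEUE_PAIRS],
                                  unsigned int nqueues, unsigned int ncpus)
{
    for (unsigned int q = 0; q < nqueues && q < MAX_QUEUE_PAIRS; q++)
        queue_cpu[q] = q % ncpus;
}
```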
@francescolavra francescolavra merged commit 62182be into master Apr 17, 2024
5 checks passed
@francescolavra francescolavra deleted the feature/irq_affinity branch April 17, 2024 11:22