virtio-net: add multiqueue support #2011

Merged

merged 8 commits into master from feature/irq_affinity on Apr 17, 2024

Conversation

francescolavra
Member

@francescolavra francescolavra commented Mar 22, 2024

This changeset enhances the virtio-net driver to support multiple tx/rx queues, which improves network performance in multi-CPU instances running workloads that serve many network connections simultaneously.
By default, the virtio-net driver uses as many queues as the attached device supports; this behavior can be overridden by specifying the "io-queues" configuration option in the manifest tuple corresponding to a given network interface. For example,
the following snippet of an Ops configuration file instructs the driver to use 2 queues for the first network interface:

"ManifestPassthrough": {
  "en1": {
    "io-queues": "2"
  }
}

The number of queues used by the driver is always limited to the number of CPUs in the running instance (this behavior cannot be overridden by the "io-queues" option).
In order to optimize parallelism, each tx/rx queue is configured with an interrupt affinity such that different queues are served
by different CPUs.

This field is incremented in the thread_return() Unix function, but
is never used.
The current code initializes the PLIC threshold for all APs from
the start_secondary_cores() function, executed by the boot
processor. However, the PLIC configuration for a given CPU can be
changed by the boot ROM executed by that CPU at startup, and it has
been observed that the OpenSBI firmware changes the PLIC threshold
value, overwriting the configuration set by the boot processor.
This change fixes the above issue by moving the configuration of
the PLIC threshold for a given CPU to the ap_start() function,
executed by the CPU itself.
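A minimal sketch of the idea, assuming a SiFive-compatible PLIC register layout; PLIC_BASE, plic_set_threshold() and ap_plic_init() below are illustrative stand-ins, not the actual kernel code:
```c
#include <stdint.h>

#define PLIC_BASE          0x0c000000UL            /* assumed PLIC MMIO base */
#define PLIC_THRESHOLD(c)  (PLIC_BASE + 0x200000 + 0x1000 * (unsigned long)(c))

/* Set the priority threshold for the given PLIC context; a threshold of 0
 * lets through interrupts of any non-zero priority. */
static void plic_set_threshold(unsigned int context, uint32_t threshold)
{
    *(volatile uint32_t *)PLIC_THRESHOLD(context) = threshold;
}

/* Run on the application processor itself (e.g. from ap_start()), i.e.
 * after any firmware such as OpenSBI has executed on this hart, so the
 * value can no longer be overwritten behind the kernel's back. */
void ap_plic_init(unsigned int my_context)
{
    plic_set_threshold(my_context, 0);
}
```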
In the RISC-V SMP initialization code, the device tree is parsed in
order to retrieve the number of present CPUs. Commit
364b517 introduced a regression by
which a cpu device tree entry, whose name is null-terminated in the
device tree, can no longer be recognized, thereby preventing the
kernel from enumerating the application processors.
This change fixes the above regression by using the
sstring_from_cstring() function when parsing a device tree entry
name.
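A rough illustration of the failure mode and fix, where the sstring type and helpers below are hypothetical stand-ins for the kernel's own definitions:
```c
#include <string.h>
#include <stdbool.h>

/* hypothetical length-delimited string, standing in for the kernel's sstring */
typedef struct {
    const char *ptr;
    unsigned long len;
} sstring;

/* Build an sstring from a NUL-terminated buffer: strnlen() stops at the
 * terminator, so a name stored as "cpu@0\0<padding>" gets length 5 rather
 * than the full buffer length. */
static sstring sstring_from_cstring(const char *s, unsigned long maxlen)
{
    return (sstring){ .ptr = s, .len = strnlen(s, maxlen) };
}

/* With the trailing NUL (and any padding) excluded from the length, a
 * prefix comparison against "cpu" recognizes the entry as expected. */
static bool dt_entry_is_cpu(sstring name)
{
    return name.len >= 3 && memcmp(name.ptr, "cpu", 3) == 0;
}
```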
This function claims all pending interrupts without invoking their
handlers; therefore, if it is called when enabling a new interrupt,
it may cause other (previously enabled) interrupts to be missed.
This issue has been observed to cause missed interrupts from the
virtio-blk device while the virtio-net device is being initialized,
which stalls the kernel during startup.
This change fixes the above issue by removing the usage of
plic_clear_pending_int(); this should not cause other problems,
because interrupt handlers are expected to handle any spurious
interrupts that may be triggered.
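A sketch of the spurious-interrupt-tolerant handler pattern this relies on (the virtqueue layout and names below are illustrative, not the actual driver code):
```c
#include <stdint.h>

/* illustrative virtqueue state: 'used_idx' advanced by the device,
 * 'last_seen' tracked by the driver */
struct virtqueue {
    volatile uint16_t used_idx;
    uint16_t last_seen;
};

/* Interrupt handler written so that a spurious invocation (no new
 * completed buffers) is a harmless no-op. Because every handler can
 * tolerate spurious interrupts, the kernel no longer needs to claim
 * pending interrupts when a new interrupt source is enabled. */
void virtio_queue_irq(struct virtqueue *vq)
{
    while (vq->last_seen != vq->used_idx) {
        /* ... process the completed buffer at index vq->last_seen ... */
        vq->last_seen++;
    }
    /* if nothing completed, the loop body never runs: spurious IRQs are safe */
}
```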
With this change, the stage3 startup() function is always invoked
after all attached peripherals have been probed, even if the root
filesystem is loaded before PCI bus discovery is complete (which
can happen if disk device interrupts are handled on a CPU other
than the boot CPU).
The next commit will allow device interrupts to be assigned to an
arbitrary CPU, and this commit prevents errors such as "NET: no
network interface found" from occurring if network device probing
is not complete before the root filesystem is loaded.
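One way to picture the ordering constraint is a completion counter shared by the two events; the counter and function names below are hypothetical, not the kernel's actual mechanism:
```c
#include <stdatomic.h>
#include <stdio.h>

/* startup() must run only after both events below have happened; the
 * two events may complete in either order, possibly on different CPUs. */
static atomic_int pending = 2;          /* root fs load + PCI bus discovery */

static void stage3_startup(void)        /* stand-in for the real startup() */
{
    printf("all peripherals probed and root filesystem loaded\n");
}

static void event_complete(void)
{
    /* whichever event finishes last invokes startup(), exactly once */
    if (atomic_fetch_sub(&pending, 1) == 1)
        stage3_startup();
}

void rootfs_loaded(void)  { event_complete(); }
void pci_probe_done(void) { event_complete(); }
```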
In the current code, all device interrupts are routed to either
the boot CPU (on x86 and ARM), or to any CPU (on RISC-V).
This new feature allows interrupts to be configured to target a
specific CPU, which makes it possible to optimize interrupt
allocation and spread a workload over multiple CPUs.
The arch-specific msi_format() and dev_irq_enable() functions have
been changed to take an additional `target_cpu` parameter, while
msi_get_vector() has been replaced by msi_get_config(), which
retrieves both the interrupt number and the target CPU associated
with a given PCI MSI. Other changes in the arch-specific interrupt
code have been made to support routing an arbitrary interrupt to an
arbitrary CPU.
Various functions in the PCI and virtIO code have been amended to
take an additional `cpu_affinity` range parameter, which is used as
a hint to select a target CPU when a given interrupt vector is
enabled in the interrupt controller. The irq_get_target_cpu()
utility function takes a CPU affinity range as a parameter and
returns an optimal target CPU, selected from the supplied range
based on which CPUs have the smallest number of device interrupts
targeted at them, so as to spread the interrupt handling work over
different CPUs. The irq_put_target_cpu() function is called to
signal that an interrupt has been removed from a given target CPU.
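A sketch of the selection policy described above; the per-CPU counter array and the function signatures are assumptions, since only the function names come from this change:
```c
#include <limits.h>

#define MAX_CPUS 64

/* number of device interrupts currently targeted at each CPU (assumed state) */
static unsigned int irq_count[MAX_CPUS];

/* Pick the CPU in [first, last] with the fewest device interrupts
 * assigned to it, and account for the new interrupt. */
unsigned int irq_get_target_cpu(unsigned int first, unsigned int last)
{
    unsigned int best = first, best_count = UINT_MAX;
    for (unsigned int cpu = first; cpu <= last && cpu < MAX_CPUS; cpu++) {
        if (irq_count[cpu] < best_count) {
            best = cpu;
            best_count = irq_count[cpu];
        }
    }
    irq_count[best]++;
    return best;
}

/* Called when an interrupt is removed from a CPU, so that CPU becomes a
 * better candidate for future allocations. */
void irq_put_target_cpu(unsigned int cpu)
{
    if (cpu < MAX_CPUS && irq_count[cpu] > 0)
        irq_count[cpu]--;
}
```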
With this change, a network device driver can optionally register a
setup closure that will be invoked (after the root filesystem is
loaded) with a configuration tuple as argument. This allows using
manifest options to pass arbitrary configuration parameters to
network device drivers for each attached network interface.
The next commit will use this new feature in the virtio-net driver,
which will implement multiqueue support with a configurable number
of queues.
Duplicated code across the different network device drivers has
been consolidated in the generic init_network_iface() function.
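A hedged sketch of how a driver might consume such a configuration tuple in its setup callback; the tuple accessor and structure names below are illustrative placeholders, not the kernel's actual API:
```c
#include <stdlib.h>

/* illustrative config tuple: flat key/value string pairs */
struct net_cfg {
    const char *(*get)(struct net_cfg *cfg, const char *key);
};

struct net_driver {
    unsigned int requested_queues;    /* 0 means "not specified" */
};

/* Setup callback invoked after the root filesystem (and thus the manifest)
 * is available, with the tuple for this network interface as argument. */
void example_net_setup(struct net_driver *drv, struct net_cfg *cfg)
{
    const char *v = cfg->get(cfg, "io-queues");
    if (v)
        drv->requested_queues = (unsigned int)strtoul(v, 0, 10);
}
```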
This change enhances the virtio-net driver to support multiple
tx/rx queues, which improves network performance in multi-CPU
instances running workloads that serve many network connections
simultaneously.
By default, the virtio-net driver uses as many queues as supported
by the attached device; it is possible to override this behavior by
specifying the "io-queues" configuration option in the manifest
tuple corresponding to a given network interface. For example,
the following snippet of an Ops configuration file instructs the
driver to use 2 queues for the first network interface:
```
"en1": {
  "io-queues": "2"
}
```
The number of queues used by the driver is always limited to the
number of CPUs in the running instance (this behavior cannot be
overridden by the "io-queues" option).
In order to optimize parallelism, each tx/rx queue is configured
with an interrupt affinity such that different queues are served
by different CPUs.
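The queue-count clamping and per-queue interrupt spreading described above might look roughly like the following sketch; the function and parameter names are illustrative assumptions:
```c
#define MAX_QUEUE_PAIRS 16            /* illustrative upper bound */

/* Decide how many tx/rx queue pairs to use: never more than the device
 * advertises, never more than the number of online CPUs, and optionally
 * capped by the "io-queues" manifest option (0 means "not specified"). */
static unsigned int choose_queue_count(unsigned int dev_max_pairs,
                                       unsigned int ncpus,
                                       unsigned int io_queues_opt)
{
    unsigned int n = dev_max_pairs;
    if (io_queues_opt && io_queues_opt < n)
        n = io_queues_opt;
    if (ncpus < n)
        n = ncpus;                    /* the CPU limit cannot be overridden */
    return n;
}

/* Spread the queues' interrupts across CPUs so that different queues are
 * served by different CPUs; queue_cpu[q] would then be passed as the
 * affinity hint when the queue's interrupt is enabled. */
static void assign_queue_affinity(unsigned int queue_cpu[MAX_QUEUE_PAIRS],
                                  unsigned int nqueues, unsigned int ncpus)
{
    for (unsigned int q = 0; q < nqueues && q < MAX_QUEUE_PAIRS; q++)
        queue_cpu[q] = q % ncpus;
}
```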
@francescolavra francescolavra merged commit 62182be into master Apr 17, 2024
5 checks passed
@francescolavra francescolavra deleted the feature/irq_affinity branch April 17, 2024 11:22