Skip to content

The BESS Scheduler

Matt Mukerjee edited this page Jul 20, 2018 · 3 revisions

Modules and Tasks

Let's take a look at simple pipeline from a sample script (bess/bessctl/conf/samples/acl.bess):

localhost:10514 $ show pipeline 
+---------+                    +----------+                    +-----+                    +-------+
| source0 |  :0 200285248 0:   | rewrite0 |  :0 200257248 0:   | fw  |  :0 100117232 0:   | sink0 |
| Source  | -----------------> | Rewrite  | -----------------> | ACL | -----------------> | Sink  |
+---------+                    +----------+                    +-----+                    +-------+

The modules classes above can be divided into two categories, based on their behavior:

  • The Rewrite , ACL and Sink classes are only called by their left neighbor when there are packets to process.
  • The Source module class is called periodically and generates packets on its own.

The Source module class behaves differently because its instances create a task.

The QueueInc class behaves a lot like Source: it registers a task that periodically gets called to read packets from a Port rxq. Remember: BESS ports operate in polling mode, to avoid interrupt overhead.

The PortInc module is very similar to the QueueInc module, except that it may register more tasks (one per each rxq of the Port).

A module class is not forced to choose between receiving packets and generating them: there are classes that mix the two behaviors. Let's take a look at another pipeline (bess/bessctl/conf/samples/queue.bess):

localhost:10514 $ show pipeline 
+--------+                   +----------+                   +-----------+                  +-------------------+                  +-------+
|  src   |                   | rewrite0 |                   |   queue   |                  |    vlan_push0     |                  | sink0 |
| Source |  :0 26043968 0:   | Rewrite  |  :0 26040608 0:   |   Queue   |  :0 2897536 0:   |     VLANPush      |  :0 2898400 0:   | Sink  |
|        | ----------------> |          | ----------------> | 1023/1024 | ---------------> | PCP=0 DEI=0 VID=2 | ---------------> |       |
+--------+                   +----------+                   +-----------+                  +-------------------+                  +-------+

The queue module above (instance of Queue, very different from QueueInc!), receives packets generated from a task registered by src, but also registers its own task. It doesn't immediately forwards the packets received, but it stores them in a ring buffer. The task created by queue will later read packets from the ring buffer and forwards them to its right neighbor.

There are two different tasks in the pipeline. How often does the src task get called? How often does the queue task get called?

The BESS scheduler

BESS implements a fully hierarchical task scheduler that supports different policies. The job of the scheduler is to decide which task needs to be executed next. The tasks in the scheduler are organized in a tree-like data structure, where the leaf nodes are the tasks itself, and the other nodes represents particular policies.

When the user or the author of the BESS script doesn't configure the scheduler, the execution of all the tasks is interleaved in a round robin fashion. The scheduler tree can be examined using the show tc command:

localhost:10514 $ show tc
<worker 0>
  +-- !default_rr_0            round_robin
      +-- !leaf_src:0          leaf
      +-- !leaf_queue:0        leaf

The above command shows that we have only one thread (worker 0) with a very simple tree: there's a root node (called !default_rr_0) with type round_robin and two children (!leaf_src:0) and (!leaf_queue:0), which are the two tasks registered by the src and queue modules. In this case the scheduler behavior is very simple: it will simply alternate the execution of src and queue over and over.

How to configure the scheduler

Add rate limiting

The rate of the execution of a task can be throttled with a 'rate_limit' node in the scheduler tree. We can create the node with a line in the BESS configuration script:

bess.add_tc('fast', policy='rate_limit', resource='packet', limit={'packet': 9000000})

If we inspect the tree now we see:

localhost:10514 $ show tc
<worker 0>
  +-- !default_rr_0            round_robin
      +-- !leaf_src:0          leaf
      +-- !leaf_queue:0        leaf
      +-- fast                 rate_limit          9.000 Mpps

The newly created note doesn't have any effect on the src or queue tasks, because they're still under the round_robin policy. To actually enforce the limit on a task, we have to make it a child of the rate_limit node, using this code:

src.attach_task('fast')

With the above line we tell the module src to attach its task under the 'fast' policy. Now the scheduler tree looks like:

localhost:10514 $ show tc
<worker 0>
  +-- !default_rr_0            round_robin
      +-- !leaf_queue:0        leaf
      +-- fast                 rate_limit          9.000 Mpps
          +-- !leaf_src:0      leaf

Similarly, we can also limit the execution of the task registered by the queue module to a slower rate with these two lines:

bess.add_tc('slow', policy='rate_limit', resource='packet', limit={'packet': 1000000})
queue.attach_task('slow')

The final tree looks like this:

localhost:10514 $ show tc
<worker 0>
  +-- !default_rr_0            round_robin
      +-- fast                 rate_limit          9.000 Mpps
      |   +-- !leaf_src:0      leaf
      +-- slow                 rate_limit          1.000 Mpps
          +-- !leaf_queue:0    leaf

BESS python API references

The following functions are used to interact with the scheduler from a BESS script:

  • bess.add_tc(name, policy, wid=-1, parent='', resource=None, priority=None, share=None, limit=None, max_burst=None)

    Create a new node in the scheduler tree called name of type policy. name must be a unique string that identifies the node (it cannot start with '!'). policy can be one of the following:

    • 'round_robin': Each time this node is visited by the scheduler, a child is picked in a round robin fashion.
    • 'weighted_fair': The children of this nodes are executed in proportion to their share.
    • 'rate_limit': The node can have at most one child. When the child execution exceeds the limits imposed by limit and max_burst the node will be put in a blocked state (it will be unblocked after an appropriate amount of time).
    • 'priority' The node always schedules the child with the highest priority that's not blocked (in the sense described by the rate_limit node.). If a node has no children, or if all its children are temporarily blocked, it is considered blocked itself.
    • 'leaf' Nodes with this policy represent a task. They cannot be created with add_tc, they're added by the modules when a task is registered.

    The wid and parent arguments control where to place the new node in the tree. They're mutually exclusive, i.e. only one of them can have a non default value. If parent is specified, its value must be the name of an existing node in the tree: the newly added node will become one of its children. If wid is specified, the newly added node will become the root of the tree on worker wid; if there's a root already, the roots will be placed under a round robin node named 'default_rr_<wid>'. If also wid is unspecified (i.e. -1), the worker will be chosen in a round robin fashion.

    The weighted_fair or rate_limit policies can (respectively) share among children or limit different types of resources. The resource parameter must be used when creating one of them to choose. It can be:

    • count: The new node will share fairly or limit the number of times a child is scheduled.
    • cycle: The new node will share fairly or limit the number of cycles (as measured by the TSC) a child's execution takes.
    • packet: The new node will schedule the children nodes to try to share fairly or to limit the number of packets generated.
    • bit: The new node will schedule the children nodes to try to share fairly or to limit the number of bits generated.

    The next two parameters are only used when attaching to certain types of parent nodes. The priority parameter control the priority that this node has among the children of a priority parent (a lower number means higher priority). The share parameter controls the relative share of the resource among the children of a weighted_fair parent.

    limit and max_burst are only used when creating rate_limit nodes: they control the rate of the resource used and the excess allowed. They must be in the form of an object with the resource as key (which must be the same as the resource parameter) and an integer as value (e.g. {'packet': 1000}). The node will schedule its children to consume no more than limit units per second.

  • <module>.attach_task(parent='', wid=-1, module_taskid=0, priority=None, share=None)

    bess.attach_task(module_name, parent='', wid=-1, module_taskid=0, priority=None, share=None)

    Moves a task in the scheduler tree. The two forms are equivalent.

    The task to move is the task numbered module_taskid (it is usually 0), registered by a module. The module is identified by the <module> object in the first form, or by module_name in the second form.

    parent and wid behave like in bess.add_tc().

    priority and share behave like in bess.add_tc().

  • update_tc_params(self, name, resource=None, limit=None, max_burst=None)

    Update the parameters of the node named name in the scheduler tree. Only weighted_fair and rate_limit nodes have parameters that can be updated.

    resource, limit and max_burst behave like in bess.add_tc()