Scalability Issues
ConMan will need to support increased scalability for upcoming clusters. Current scalability is limited by the following:
- Use of `poll()` in the `mux_io()` event loop (see Reactor pattern). This should be replaced with `epoll()`; but since that is Linux-specific, investigate event-notification libraries such as libevent, libuv, libev, and libeio.
- Use of a singly-linked list to manage console objects. While this was acceptable with `select()` and later `poll()`, it will limit any performance gains from replacing `poll()`. This also negatively impacts performance at startup when checking for duplicate console names during object creation. The objects list should be replaced with a tree-based data structure providing efficient random access.
- Use of a sorted singly-linked list to manage timers resulting in O(n) for insertions and deletions (although dispatch is only O(1)). This should be replaced with a heap data structure which would have O(log n) for insertion, deletion, and dispatch. Another possibility would be hashed timing wheels [Varghese and Lauck 1996] which can be as efficient as O(1) for insertion, deletion, and dispatch.
- Use of Expect to support SSH connections. Each console connected in this manner requires an additional two processes: one for Expect and another for ssh. While previous testing on MCR showed acceptable performance for ~1280 consoles using Expect to drive a telnet process, increasing process counts are expected to impact scalability (or at the very least, clutter the `ps` listing). Investigate libssh and libssh2.
- Inability to add/remove/edit consoles without restarting conmand (see #13). Starting the daemon causes a burst of network activity as connections are established. This is problematic when managing a large number of consoles since CPU cycles are wasted traversing the console list to process `poll()` events. Furthermore, requiring the daemon to be restarted in order to add/remove/edit consoles will likely result in console messages being dropped while connections are re-established.
- IPv4 only. While the use of IPv4-only connections is not expected to directly impact performance, the lack of IPv6 support limits usability at sites requiring IPv6 addressing.
- Outdated client/server protocol. The current protocol is largely unchanged since its inception. It is not particularly efficient. Queries for a list of matching consoles are limited by a maximum buffer size (currently 128KB; see `MAX_SOCK_LINE`). Furthermore, the protocol is not encrypted and not easily extended.
- Single-threaded event loop. `mux_io()` is a single-threaded non-blocking I/O event loop. As such, it is largely unable to take advantage of increasing core/processor counts. However, consoles using FreeIPMI are able to benefit from increasing parallelism since `libipmiconsole` manages its own thread pool. Multi-threaded support may be necessary for increased scalability.
- Centralized daemon. Each conmand process is a standalone server managing all client connections. A decentralized or hierarchical model may be necessary for increased scalability.
- Fixed-size static console object buffer (currently 16KB; see `OBJ_BUF_SIZE`). As the number of consoles increases, it would be advantageous to use a dynamically-sized buffer to reduce overall memory usage, or at least allow the buffer size to be specified in the configuration file.
- Much of the event processing is based on the underlying object type and controlled by `if` or `case` statements. A small performance gain might be achieved by switching to function pointers for common object operations. This would also make the code more understandable and maintainable.
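
As a rough illustration of the `epoll()` replacement suggested for the `mux_io()` event loop, the sketch below registers a descriptor and stores a pointer to its owning object in the event, so a ready event maps straight back to its console without traversing a list. `struct console`, `watch_console()`, and `wait_one()` are hypothetical names chosen for this example, not ConMan's actual types or functions, and the code is Linux-only.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical stand-in for a console object; only the fd matters here. */
struct console {
    int fd;
    const char *name;
};

/* Register the console's fd for read readiness.  The console pointer is
 * stored in the event so the loop can recover the object directly.
 * Returns 0 on success, -1 on error (as epoll_ctl() does). */
static int watch_console(int epfd, struct console *c)
{
    struct epoll_event ev = {0};

    assert(c != NULL);
    ev.events = EPOLLIN;            /* level-triggered read readiness */
    ev.data.ptr = c;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
}

/* Wait up to timeout_ms for one ready console.
 * Returns the console, or NULL on timeout or error. */
static struct console *wait_one(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);

    return (n == 1) ? ev.data.ptr : NULL;
}
```

A real loop would pass a larger event array to `epoll_wait()` and handle every returned entry; the single-event form here only keeps the example short.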
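
For the duplicate-name check mentioned in the console-list item, one possible tree-based approach is the POSIX `tsearch()` family from `<search.h>`. The sketch below is only an assumption about how such a check could look; `add_console_name()` and `has_console_name()` are hypothetical helpers. POSIX does not require the tree to be balanced, so the worst-case cost depends on the C library in use.

```c
#include <assert.h>
#include <search.h>
#include <stddef.h>
#include <string.h>

/* Comparison callback for tsearch()/tfind(): keys are NUL-terminated names. */
static int name_cmp(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

/* Insert name into the tree rooted at *root.
 * Returns 1 if inserted, 0 if an equal name already exists,
 * -1 if tsearch() could not allocate a node.
 * The tree stores the pointer, not a copy, so name must outlive the tree. */
static int add_console_name(void **root, const char *name)
{
    assert(root != NULL && name != NULL);
    if (tfind(name, root, name_cmp) != NULL)
        return 0;                   /* duplicate console name */
    return (tsearch(name, root, name_cmp) != NULL) ? 1 : -1;
}

/* Returns nonzero if an equal name is present in the tree. */
static int has_console_name(void *const *root, const char *name)
{
    assert(root != NULL && name != NULL);
    return tfind(name, root, name_cmp) != NULL;
}
```

Storing a pointer to the whole console object as the key, with a comparison function that looks at its name field, would let the same tree serve lookups by name as well as duplicate detection.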
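
The heap proposed for timers could be sketched as an array-backed binary min-heap keyed on expiry time. This is a minimal example under stated assumptions: the `struct timer` fields, the fixed capacity, and the function names are all hypothetical. It covers insertion and removal of the earliest timer only; cancelling an arbitrary timer in O(log n) would additionally require each timer to track its index in the array.

```c
#include <assert.h>
#include <stddef.h>

#define TIMER_HEAP_MAX 1024         /* fixed capacity keeps the sketch short */

struct timer {
    long expires_ms;                /* absolute expiry time */
    int id;                         /* hypothetical timer identifier */
};

struct timer_heap {
    struct timer items[TIMER_HEAP_MAX];
    size_t len;
};

static void swap_timers(struct timer *a, struct timer *b)
{
    struct timer t = *a;
    *a = *b;
    *b = t;
}

/* O(log n): append at the end, then sift up.  Returns 0, or -1 if full. */
static int timer_heap_push(struct timer_heap *h, struct timer t)
{
    assert(h != NULL);
    if (h->len == TIMER_HEAP_MAX)
        return -1;
    size_t i = h->len++;
    h->items[i] = t;
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h->items[parent].expires_ms <= h->items[i].expires_ms)
            break;
        swap_timers(&h->items[parent], &h->items[i]);
        i = parent;
    }
    return 0;
}

/* O(1): earliest timer, or NULL if the heap is empty. */
static const struct timer *timer_heap_peek(const struct timer_heap *h)
{
    assert(h != NULL);
    return h->len ? &h->items[0] : NULL;
}

/* O(log n): remove the earliest timer into *out.  Returns 0, or -1 if empty. */
static int timer_heap_pop(struct timer_heap *h, struct timer *out)
{
    assert(h != NULL && out != NULL);
    if (h->len == 0)
        return -1;
    *out = h->items[0];
    h->items[0] = h->items[--h->len];   /* move last element to the root */
    size_t i = 0;
    for (;;) {                          /* sift down */
        size_t l = 2 * i + 1, r = l + 1, min = i;
        if (l < h->len && h->items[l].expires_ms < h->items[min].expires_ms)
            min = l;
        if (r < h->len && h->items[r].expires_ms < h->items[min].expires_ms)
            min = r;
        if (min == i)
            break;
        swap_timers(&h->items[i], &h->items[min]);
        i = min;
    }
    return 0;
}
```

An event loop would typically peek at the earliest expiry to compute its wait timeout, then pop and dispatch every timer whose expiry has passed.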
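
On the IPv4-only item, a common route to protocol-independent connections is `getaddrinfo()` with `AF_UNSPEC`, which returns IPv4 and IPv6 candidates through one interface. The helper below is a hypothetical sketch and not taken from ConMan; it only reports the address family of the first result, whereas a real client would try to connect to each result in turn.

```c
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve host and port for a TCP connection without assuming IPv4.
 * Returns the address family of the first result (for example AF_INET
 * or AF_INET6), or -1 if resolution fails. */
static int first_family(const char *host, const char *port)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int family;

    assert(host != NULL && port != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;        /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;    /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    family = res->ai_family;
    freeaddrinfo(res);
    return family;
}
```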
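
The function-pointer idea in the last item can be illustrated with a per-type operations table. All names here (`struct obj_ops`, `obj_handle_readable()`, the `serial` and `telnet` handlers) are hypothetical, and the handler bodies are placeholders; the point is only that the event loop calls through the table instead of branching on the object type.

```c
#include <assert.h>
#include <stddef.h>

struct obj;

/* Per-type operations table; one static instance per object type. */
struct obj_ops {
    const char *type_name;
    int (*on_readable)(struct obj *o);
};

/* Hypothetical object: a pointer to its type's operations plus state. */
struct obj {
    const struct obj_ops *ops;
    int bytes_seen;
};

/* Placeholder handlers standing in for type-specific read logic. */
static int serial_on_readable(struct obj *o)
{
    o->bytes_seen += 1;
    return 0;
}

static int telnet_on_readable(struct obj *o)
{
    o->bytes_seen += 2;
    return 0;
}

static const struct obj_ops serial_ops = { "serial", serial_on_readable };
static const struct obj_ops telnet_ops = { "telnet", telnet_on_readable };

/* The event loop calls this without inspecting the object type. */
static int obj_handle_readable(struct obj *o)
{
    assert(o != NULL && o->ops != NULL);
    return o->ops->on_readable(o);
}
```

Adding a new object type then means defining one more table rather than extending every `if` or `case` chain.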