host-metrics: use psutil #3391

oliver-sanders · 2019-09-30T14:03:32Z

At present, the cylc get-host-metrics command (used to pick one host from a group based on system activity) gathers its metrics with Linux-only commands, which are fragile and not portable (e.g. to Darwin).
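To illustrate the fragility: a Linux-only approach has to parse files such as /proc/meminfo, whose fields vary with the kernel version. MemAvailable only appeared in Linux 3.14, which is why #3388 can only use that field "where present". Below is a minimal sketch of such parsing with a fallback to MemFree; available_memory_kib is a hypothetical helper for illustration, not the actual cylc code:

```python
def available_memory_kib(meminfo_text):
    """Return available memory in KiB from /proc/meminfo-style text.

    Hypothetical helper: prefers MemAvailable and falls back to MemFree,
    a cruder estimate that ignores reclaimable cache, on older kernels.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        # lines look like "MemFree:         123456 kB"
        name, sep, rest = line.partition(':')
        if not sep:
            continue
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    if 'MemAvailable' in fields:
        return fields['MemAvailable']
    return fields['MemFree']


if __name__ == '__main__':
    with open('/proc/meminfo') as handle:  # Linux only
        print(available_memory_kib(handle.read()))
```

Every metric gathered this way needs its own version-specific parsing logic, and none of it works on Darwin.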

The amazing psutil library provides portable abstractions that cover what we need and much more.

Upgrade the cylc get-host-metrics command to allow users to specify "thresholds" as Python expressions using psutil functions, e.g.:

virtual_memory().available > 123456789
getloadavg()[0] < 5
cpu_count() > 1
disk_usage('/').free > 123
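Written as ordinary Python, the four thresholds above amount to a single predicate that a host must pass. This is a minimal sketch that assumes every threshold must hold at once. host_passes is a hypothetical name, and the psutil module is passed in as the ps parameter so the function can also be exercised with a stand-in object:

```python
def host_passes(ps):
    """Return True if every example threshold holds.

    ``ps`` is expected to be the psutil module (or a stand-in exposing
    the same functions); this is an illustrative sketch only.
    """
    return all((
        ps.virtual_memory().available > 123456789,
        ps.getloadavg()[0] < 5,  # 1-minute load average
        ps.cpu_count() > 1,
        ps.disk_usage('/').free > 123,
    ))


if __name__ == '__main__':
    import psutil  # psutil.getloadavg needs psutil >= 5.6.2
    print(host_passes(psutil))
```

Hard-coding the predicate like this is exactly what the proposal avoids. The point of accepting expressions is that users can change the thresholds without changing code.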

As a starting point, here is one approach that could be used:

import ast
from io import BytesIO
import pickle
from subprocess import Popen, PIPE
from tokenize import tokenize
import token


THRESHOLD_STRING = '''
    virtual_memory().available > 123456789
    getloadavg()[0] < 5
    cpu_count() > 1
    disk_usage('/').free > 123
'''


class SimpleVisitor(ast.NodeVisitor):
    """Abstract syntax tree node visitor for simple safe operations."""

    def visit(self, node):
        if not isinstance(node, self.whitelist):
            # permit only whitelisted operations
            raise ValueError(type(node))
        return super().visit(node)

    whitelist = (
        ast.Expression,
        # variables
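        # (Python 3.9+ no longer wraps subscript indices in ast.Index)
        ast.Name, ast.Load, ast.Attribute, ast.Subscript,
        # opers
        ast.BinOp, ast.operator,
        # types: numbers and strings (ast.Num and ast.Str were deprecated
        # in Python 3.8 in favour of ast.Constant and removed in 3.14)
        ast.Constant,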
        # comparisons
        ast.Compare, ast.cmpop, ast.List, ast.Tuple
    )


def simple_eval(expr, **variables):
    """Safely evaluates simple python expressions.

    Supports a minimal subset of Python operators:
    * Binary operations
    * Simple comparisons

    Supports a minimal subset of Python data types:
    * Numbers
    * Strings
    * Tuples
    * Lists

    Examples:
        >>> simple_eval('1 + 1')
        2
        >>> simple_eval('1 < a', a=2)
        True
        >>> simple_eval('1 in (1, 2, 3)')
        True
        >>> import psutil
        >>> simple_eval('a.available > 0', a=psutil.virtual_memory())
        True

        If you try to get it to do something it's not supposed to:
        >>> simple_eval('open("foo")')
        Traceback (most recent call last):
        ValueError: open("foo")

    """
    try:
        node = ast.parse(expr.strip(), mode='eval')
        SimpleVisitor().visit(node)
        return eval(
            compile(node, '<string>', 'eval'),
            {'__builtins__': None},
            variables
        )
    except Exception:
        raise ValueError(expr)


def get_thresholds(string):
    """Yield parsed threshold expressions.

    Examples:
        The first ``token.NAME`` encountered is returned as the query:
        >>> get_thresholds('foo() == 123').__next__()
        (('foo',), 'RESULT == 123')

        If multiple calls are present, only the first is parsed:
        >>> get_thresholds('foo() in bar()').__next__()
        (('foo',), 'RESULT in bar()')

        Positional arguments are added to the query tuple:
        >>> get_thresholds('1 in foo("a")').__next__()
        (('foo', 'a'), '1 in RESULT')

    Yields:
        tuple - (query, expression)
        query (tuple):
            The method to call followed by any positional arguments.
        expression (str):
            The expression with the method call replaced by `RESULT`

    """
    for line in string.splitlines():
        # parse the string one line at a time
        # purposefully don't support multi-line expressions
        line = line.strip()
        if not line:
            # skip blank lines
            continue

        query = []
        start = None
        in_args = False

        line_feed = BytesIO(line.encode())
        for item in tokenize(line_feed.readline):
            if item.type == token.ENCODING:
                # encoding tag, not of interest
                pass
            elif not query:
                # the first token.NAME has not yet been encountered
                if item.type == token.NAME and item.string != 'in':
                    # this is the first token.NAME, assume it is the method
                    start = item.start[1]
                    query.append(item.string)
            elif item.string == '(':
                # positional arguments follow this
                in_args = True
            elif item.string == ')':
                # end of positional arguments
                in_args = False
                break
            elif item.string == ',':
                pass
            elif in_args:
                # literal eval each argument
                query.append(ast.literal_eval(item.string))
        end = item.end[1]

        yield (
            tuple(query),
            line[:start] + 'RESULT' + line[end:]
        )


def get_script(keys):
    """Return a Python script for obtaining the requested keys."""
    return '; '.join([
        'import pickle',
        'import psutil',
        'print(pickle.dumps([%s]))' % (
            ', '.join((
                f'getattr(psutil, "{key[0]}"){key[1:]}'
                for key in keys
            ))
        )
    ])


def run(script):
    """Run the provided script, un-pickling the result."""
    # the child must be Python 3: its output is parsed as a bytes literal
    cmd = ['python3', '-']
    stdout, stderr = Popen(
        cmd, stdout=PIPE, stdin=PIPE
    ).communicate(script.encode())

    return pickle.loads(ast.literal_eval(stdout.decode()))


def main():
    # get the threshold strings
    string = THRESHOLD_STRING
    thresholds = [x for x in get_thresholds(string)]

    # get a list of metrics we need to obtain from each host
    keys = list({x for x, _ in thresholds})

    # obtain these metrics
    script = get_script(keys)
    results = dict(zip(keys, run(script)))

    # evaluate the thresholds
    return all(
        simple_eval(expression, RESULT=results[key])
        for key, expression in thresholds
    )


if __name__ == '__main__':
    print(main())
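The tokenize-based get_thresholds above treats the first name it meets as the method, so a line such as `not foo()` would yield `not` as the query. For comparison, here is a sketch of a swapped-in technique that works on the syntax tree instead. It replaces the first function call with the name RESULT and returns the call's name plus its literal positional arguments. It assumes Python 3.9+ for ast.unparse, and get_thresholds_ast is a hypothetical name:

```python
import ast


class _FirstCallReplacer(ast.NodeTransformer):
    """Replace the first call node found with the name ``RESULT``."""

    def __init__(self):
        self.query = None

    def visit_Call(self, node):
        if self.query is not None:
            # only the first call is extracted; later ones are left alone
            return self.generic_visit(node)
        if not isinstance(node.func, ast.Name) or node.keywords:
            raise ValueError('only plain positional calls are supported')
        # positional arguments must be literals, e.g. '/' in disk_usage('/')
        args = tuple(ast.literal_eval(arg) for arg in node.args)
        self.query = (node.func.id,) + args
        return ast.copy_location(ast.Name(id='RESULT', ctx=ast.Load()), node)


def get_thresholds_ast(string):
    """Yield (query, expression) pairs, one per non-blank line."""
    for line in string.splitlines():
        line = line.strip()
        if not line:
            continue
        replacer = _FirstCallReplacer()
        tree = replacer.visit(ast.parse(line, mode='eval'))
        if replacer.query is None:
            raise ValueError(f'no function call in {line!r}')
        yield replacer.query, ast.unparse(tree)
```

Because the call is found by structure rather than by token position, leading literals and keywords such as `in` or `not` need no special-casing.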

Caveats:

- ~~Calls python with a generated program rather than calling a cylc subcommand, which is bad for whitelisting~~ resolved.
- ~~Uses pickle for serialisation (psutil isn't great for serialisation)~~ uses JSON.
- Uses eval, though in a restricted way.
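On the switch to JSON: many psutil results are named tuples, which json.dumps flattens to plain lists, losing field names such as available and free that the thresholds rely on. One way to keep them is sketched below. It assumes the remote side converts named tuples with _asdict() and the local side rebuilds attribute access with SimpleNamespace. dump_results and load_results are hypothetical helpers, not cylc's actual code:

```python
import json
from types import SimpleNamespace


def dump_results(results):
    """Serialise psutil-style results, keeping named-tuple field names."""
    return json.dumps([
        # named tuples expose _asdict(); plain values pass through unchanged
        {'fields': result._asdict()} if hasattr(result, '_asdict')
        else {'value': result}
        for result in results
    ])


def load_results(text):
    """Rebuild results so that thresholds like RESULT.available still work."""
    loaded = []
    for item in json.loads(text):
        if 'fields' in item:
            loaded.append(SimpleNamespace(**item['fields']))
        else:
            # plain tuples come back as lists, so RESULT[0] still works
            loaded.append(item['value'])
    return loaded
```

Unlike pickle, this keeps the wire format to plain data, so the local side never needs to import psutil or unpickle arbitrary objects.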

Pull requests welcome!


oliver-sanders added this to the some-day milestone Sep 30, 2019

oliver-sanders mentioned this issue Sep 30, 2019: host-metrics: use MemAvailable field where present #3388 (Merged)

oliver-sanders mentioned this issue Nov 19, 2019: Platform proposal oliver-sanders/cylc-admin#1 (Closed)

oliver-sanders modified the milestones: some-day, cylc-8.0.0 Jan 27, 2020

oliver-sanders self-assigned this Jan 27, 2020

oliver-sanders mentioned this issue Jan 28, 2020: Host select #3489 (Merged)

hjoliver closed this as completed in #3489 Mar 30, 2020

oliver-sanders modified the milestones: cylc-8.0.0, cylc-8.0a2 Jun 8, 2020
