host-metrics: use psutil #3391

Closed
oliver-sanders opened this issue Sep 30, 2019 · 0 comments · Fixed by #3489

oliver-sanders commented Sep 30, 2019

At present the cylc get-host-metrics command, which is used to pick one host from a group based on system activity, gathers its metrics using Linux-only commands. These are fragile and not portable (e.g. to Darwin).

The amazing psutil library provides portable abstractions which cover what we need and much more.

Upgrade the cylc get-host-metrics command to allow users to specify "thresholds" as Python expressions using psutil functions, e.g.:

virtual_memory().available > 123456789
getloadavg()[0] < 5
cpu_count() > 1
disk_usage('/').free > 123

As a start point here is an approach which could be used:

import ast
from io import BytesIO
import pickle
from subprocess import Popen, PIPE
from tokenize import tokenize
import token


THRESHOLD_STRING = '''
    virtual_memory().available > 123456789
    getloadavg()[0] < 5
    cpu_count() > 1
    disk_usage('/').free > 123
'''


class SimpleVisitor(ast.NodeVisitor):
    """Abstract syntax tree node visitor for simple safe operations."""

    def visit(self, node):
        if not isinstance(node, self.whitelist):
            # permit only whitelisted operations
            raise ValueError(type(node))
        return super().visit(node)

    whitelist = (
        ast.Expression,
        # variables
        ast.Name, ast.Load, ast.Attribute, ast.Subscript, ast.Index,
        # opers
        ast.BinOp, ast.operator,
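        # types (ast.Constant replaces ast.Num and ast.Str, which have been
        # deprecated since Python 3.8 and are removed in later releases)
        ast.Constant,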
        # comparisons
        ast.Compare, ast.cmpop, ast.List, ast.Tuple
    )


def simple_eval(expr, **variables):
    """Safely evaluates simple python expressions.

    Supports a minimal subset of Python operators:
    * Binary operations
    * Simple comparisons

    Supports a minimal subset of Python data types:
    * Numbers
    * Strings
    * Tuples
    * Lists

    Examples:
        >>> simple_eval('1 + 1')
        2
        >>> simple_eval('1 < a', a=2)
        True
        >>> simple_eval('1 in (1, 2, 3)')
        True
        >>> import psutil
        >>> simple_eval('a.available > 0', a=psutil.virtual_memory())
        True

        If you try to get it to do something it's not supposed to:
        >>> simple_eval('open("foo")')
        Traceback (most recent call last):
        ValueError: open("foo")

    """
    try:
        node = ast.parse(expr.strip(), mode='eval')
        SimpleVisitor().visit(node)
        return eval(
            compile(node, '<string>', 'eval'),
            {'__builtins__': None},
            variables
        )
    except Exception:
        raise ValueError(expr)


def get_thresholds(string):
    """Yield parsed threshold expressions.

    Examples:
        The first ``token.NAME`` encountered is returned as the query:
        >>> next(get_thresholds('foo() == 123'))
        (('foo',), 'RESULT == 123')

        If multiple are present they will not get parsed:
        >>> next(get_thresholds('foo() in bar()'))
        (('foo',), 'RESULT in bar()')

        Positional arguments are added to the query tuple:
        >>> next(get_thresholds('1 in foo("a")'))
        (('foo', 'a'), '1 in RESULT')

    Yields:
        tuple - (query, expression)
        query (tuple):
            The method to call followed by any positional arguments.
        expression (str):
            The expression with the method call replaced by `RESULT`

    """
    for line in string.splitlines():
        # parse the string one line at a time
        # purposefully don't support multi-line expressions
        line = line.strip()
        if not line:
            # skip blank lines
            continue

        query = []
        start = None
        in_args = False

        line_feed = BytesIO(line.encode())
        for item in tokenize(line_feed.readline):
            if item.type == token.ENCODING:
                # encoding tag, not of interest
                pass
            elif not query:
                # the first token.NAME has not yet been encountered
                if item.type == token.NAME and item.string != 'in':
                    # this is the first token.NAME, assume it is the method
                    start = item.start[1]
                    query.append(item.string)
            elif item.string == '(':
                # positional arguments follow this
                in_args = True
            elif item.string == ')':
                # end of positional arguments
                in_args = False
                break
            elif item.string == ',':
                pass
            elif in_args:
                # literal eval each argument
                query.append(ast.literal_eval(item.string))
        end = item.end[1]

        yield (
            tuple(query),
            line[:start] + 'RESULT' + line[end:]
        )


def get_script(keys):
    """Return a Python script for obtaining the requested keys."""
    return '; '.join([
        'import pickle',
        'import psutil',
        'print(pickle.dumps([%s]))' % (
            ', '.join((
                f'getattr(psutil, "{key[0]}"){key[1:]}'
                for key in keys
            ))
        )
    ])


def run(script):
    """Run the provided script, un-pickling the result."""
    cmd = ['python', '-']
    stdout, stderr = Popen(
        cmd, stdout=PIPE, stdin=PIPE
    ).communicate(script.encode())

    return pickle.loads(ast.literal_eval(stdout.decode()))


def main():
    # get the threshold strings
    string = THRESHOLD_STRING
    thresholds = list(get_thresholds(string))

    # get a list of metrics we need to obtain from each host
    keys = list({x for x, _ in thresholds})

    # obtain these metrics
    script = get_script(keys)
    results = dict(zip(keys, run(script)))

    # evaluate the thresholds
    return all(
        simple_eval(expression, RESULT=results[key])
        for key, expression in thresholds
    )


print(
    main()
)
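
The get_script and run functions above pass results between processes using pickle. As a rough sketch of a JSON-based alternative (not necessarily how any eventual fix works), results can be converted to plain dicts before dumping and turned back into attribute-style objects on load, so that expressions such as `RESULT.available > 0` keep working. The names `to_jsonable`, `dump_metrics`, `load_metrics` and `FakeMem` are hypothetical; `FakeMem` is a stand-in for a psutil result such as `virtual_memory()` so the example runs without psutil installed.

```python
import json
from collections import namedtuple
from types import SimpleNamespace


def to_jsonable(value):
    """Recursively convert namedtuples to dicts so field names survive.

    json.dumps would otherwise encode a namedtuple as a plain list.
    """
    if isinstance(value, tuple) and hasattr(value, '_asdict'):
        return {k: to_jsonable(v) for k, v in value._asdict().items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    if isinstance(value, dict):
        return {k: to_jsonable(v) for k, v in value.items()}
    return value


def dump_metrics(values):
    """Serialise a list of metric results to a JSON string."""
    return json.dumps(to_jsonable(values))


def load_metrics(text):
    """Load metrics, exposing JSON objects via attribute access."""
    return json.loads(text, object_hook=lambda d: SimpleNamespace(**d))


# hypothetical stand-in for the namedtuple psutil.virtual_memory() returns
FakeMem = namedtuple('FakeMem', 'total available')

text = dump_metrics([FakeMem(total=8, available=4), (0.5, 1.0, 2.0), 4])
mem, load, cpus = load_metrics(text)
print(text)
print(mem.available, load[0], cpus)
```

One trade-off of this approach is that tuples come back as lists, which is harmless for indexing (`RESULT[0]`) but would matter if an expression compared a result against a tuple literal.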

Caveats:

  • Calls python with a generated program rather than calling a cylc subcommand, which is bad for whitelisting (resolved).
  • Uses pickle for serialisation, as psutil isn't great for serialisation (uses JSON).
  • Uses eval, though in a restricted way.
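
On the last caveat, one way to avoid eval altogether is to walk the parsed AST and compute the result directly. The following is only a sketch under assumptions: `walk_eval`, `BIN_OPS`, `CMP_OPS` and `Usage` are hypothetical names, it assumes Python 3.9+ (where a subscript's index is a plain expression node), it supports only a handful of operators, and the rejection of underscore-prefixed attributes is an extra precaution that is not part of the proof of concept above.

```python
import ast
import operator
from collections import namedtuple

# map AST operator node types to the functions that implement them
BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
CMP_OPS = {
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
    ast.In: lambda a, b: a in b,
    ast.NotIn: lambda a, b: a not in b,
}


def walk_eval(expr, **variables):
    """Evaluate a simple expression by walking its AST, without eval."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id]
        if isinstance(node, ast.Attribute) and not node.attr.startswith('_'):
            return getattr(_eval(node.value), node.attr)
        if isinstance(node, ast.Subscript):
            return _eval(node.value)[_eval(node.slice)]
        if isinstance(node, ast.Tuple):
            return tuple(_eval(item) for item in node.elts)
        if isinstance(node, ast.List):
            return [_eval(item) for item in node.elts]
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            return BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.Compare):
            # handle chained comparisons such as 1 < a < 3
            left = _eval(node.left)
            for op, comparator in zip(node.ops, node.comparators):
                if type(op) not in CMP_OPS:
                    raise ValueError(expr)
                right = _eval(comparator)
                if not CMP_OPS[type(op)](left, right):
                    return False
                left = right
            return True
        # anything else (calls, lambdas, slices, ...) is rejected
        raise ValueError(expr)

    return _eval(ast.parse(expr.strip(), mode='eval'))


# hypothetical stand-in for a psutil result such as disk_usage('/')
Usage = namedtuple('Usage', 'total used free')
print(walk_eval('RESULT.free > 123', RESULT=Usage(1000, 400, 600)))
print(walk_eval('RESULT[0] < 5', RESULT=(0.5, 1.0, 2.0)))
```

Because only explicitly handled node types are evaluated, anything outside the supported subset, such as `open("foo")`, raises ValueError rather than reaching the interpreter.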

Pull requests welcome!
