Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to support executing EPP configurations #56

Merged
merged 11 commits into from
Oct 3, 2024
Merged

Conversation

cpelley
Copy link
Collaborator

@cpelley cpelley commented Sep 5, 2024

  • Fixed dagrunner logger (sometimes there was duplicate entries).
  • Remote data polling plugin separating glob patterns and grouping non-glob patterns which fixes an issue where 1 file found would obscure another missing file.
  • Rationalised the plugin executer call args UI (users no longer need to configure empty dictionaries).

More detailed explanation of changes

  1. Running of scheduler now managed outside dagrunner execution (through dagrunner-logger script).

    • More flexible to manage the scheduler outside the execution of the configuration by dagrunner.
    • I don't think the ThreadingTCPServer socket server was ever intended to be run within its own thread of the same python instance that the client attaches to the socket. The ThreadingTCPServer creates a new thread for each client connection.
    • Minimal downside to managing the server independently where we launch the logging server and kill it after we are done. Basic illustration to how this can be done:
      dagrunner-logger --sqlite-filepath ... &
      pid1=$!
      dagrunner-execute-graph graph_config.graphs.gen_networkx ...
      kill $pid1
      
  2. Overhaul remote data polling plugin (final time hopefully).

    • Bugfix: Ensuring that globular pattern matching is differentiated with matches against non-globular matches. Not differentiating between the two could cause the data polling plugin to return True when not ALL files were actually found.
    • Grouping filepath polling into groups: by host and glob/non-glob. This ensures that:
      1. Minimise ssh calls (ssh calls are comparatively slow).
      2. Ensuring that any glob calls are independently verified as expanding to a result. Where globular patterns were included in the same call as non-globular patterns, we wouldn't know whether each pattern was successfully expanding based on filepaths found (hence the bug mentioned above).
  3. Refactored logic around plugin_executor handling of the call parameter. This makes it a lot clearer how keyword arguments are passed along to the underlying 'plugin'. This change also means that an empty dictionary being defined to represent keyword arguments is no longer necessary (i.e. a technical implementation detail imposed on end users).
    Illustration of what this means (where 'kwargs' represents a dictionary of keyword arguments to pass):

    For 'plugin' callables (functions etc.):

    no kwargs:

    call={callable_obj}
    res = callable_obj()

    kwargs passed to callable_obj:

    call={callable_obj, kwargs}
    res = callable_obj(**kwargs)

    For callables, we check that we are given only 0 or 1 item under 'call' in addition to the callable_obj ref.


    For plugin classes:

    no kwargs:

    call={callable_obj}
    res = callable_obj()()

    kwargs passed for initialisation of callable_obj:

    call={callable_obj, kwargs}
    res = callable_obj(**kwargs)()

    kwargs passed to plugin object callable:

    call={callable_obj, {}, kwargs}
    res = callable_obj()(**kwargs)

    kwargs passed for initialisation and kwargs2 passed for object callable:

    call={callable_obj, kwargs, kwargs2}
    res = callable_obj(**kwargs2)(**kwargs2)

    For classes, we check that we are given only 0, 1 or 2 items under 'call' in addition to the callable_obj ref.

Issues

@cpelley cpelley self-assigned this Sep 5, 2024
@cpelley cpelley changed the title independent logging server Changes to support executing EPP configurations Sep 24, 2024
@cpelley cpelley marked this pull request as ready for review September 27, 2024 11:34
@cpelley cpelley force-pushed the indep_logging_server branch 2 times, most recently from 993478d to 844d664 Compare September 27, 2024 11:58
@MetOffice MetOffice deleted a comment from github-actions bot Sep 27, 2024
@MetOffice MetOffice deleted a comment from github-actions bot Sep 27, 2024
@MetOffice MetOffice deleted a comment from github-actions bot Sep 27, 2024
@MetOffice MetOffice deleted a comment from github-actions bot Sep 27, 2024
@cpelley cpelley merged commit 0c52328 into main Oct 3, 2024
1 check passed
@cpelley cpelley deleted the indep_logging_server branch October 3, 2024 08:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants