Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cache get_platform #5182

Closed
wants to merge 2 commits into from
Closed

Conversation

oliver-sanders
Copy link
Member

Spotted that the get_platform_from_name function was using a lot of CPU.

Turns out that it was called 7662 times for this sample workflow:
[task parameters]
    a = 1..5
    b = 1..5
    c = 1..5

[scheduling]
    initial cycle point = 2000
    final cycle point = 20000110T00Z
    runahead limit = P5
    [[queues]]
        [[[default]]]
            limit = 25
    [[graph]]
        P1D = """
            <a> => <b> => <c>
            <b>[-P1D] => <b>
        """

[runtime]
    [[<a>, <b>, <c>]]
        script = true

Embarrassingly this workflow only uses the localhost platform.

Added lru_cache to the function to reduce the impact:

Before:
Screenshot from 2022-10-07 15-19-27

After:
Screenshot from 2022-10-07 15-19-38

Caching does have an overhead, but inspecting the call tree it would appear this saved ~3 seconds of a ~35s run, which is ~10% of CPU!

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg and conda-environment.yml.
  • Tests are included (or explain why tests are not needed).
  • CHANGES.md entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PRs raised to both master and the relevant maintenance branch.

@oliver-sanders oliver-sanders added small efficiency For notable efficiency improvements labels Oct 7, 2022
@oliver-sanders oliver-sanders added this to the cylc-8.1.0 milestone Oct 7, 2022
@oliver-sanders oliver-sanders self-assigned this Oct 7, 2022
@oliver-sanders oliver-sanders marked this pull request as draft October 7, 2022 15:02
@oliver-sanders
Copy link
Member Author

oliver-sanders commented Oct 7, 2022

Dammit the bad_hosts make a mess of this. Will need to split the function into the cacheable and non-cacheable parts.

@oliver-sanders oliver-sanders modified the milestones: cylc-8.1.0, cylc-8.2.0 Oct 18, 2022
@oliver-sanders oliver-sanders linked an issue Nov 28, 2022 that may be closed by this pull request
@oliver-sanders
Copy link
Member Author

Closing for now pending two changes:

  1. A reduction in the number of get_platform calls (done).
  2. A refactor of the get_platform logic to separate the cachable and non-cachable parts.

@oliver-sanders oliver-sanders removed this from the cylc-8.2.0 milestone Mar 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
efficiency For notable efficiency improvements small
Projects
None yet
Development

Successfully merging this pull request may close these issues.

platforms: cache results or reduce overheads
1 participant