
Optimise Scenario #1434

Open
PietroPasotti opened this issue Oct 11, 2024 · 9 comments
Labels: 25.04, testing (Related to ops.testing)

Comments

@PietroPasotti (Contributor) commented Oct 11, 2024

The traefik unit-test suite is probably the largest Scenario test battery around.
Today I ran it and realized it took over a minute and a half to complete, so I decided to profile it.

This is the result: [profiling graph image]

To produce the profile, run with this tox env:

```ini
[testenv:profile-scenario-tests]
description = Profile scenario unit test battery
deps =
    pytest
    pytest-profiling
    ops[testing]>=2.17
    -r{toxinidir}/requirements.txt
commands =
    pytest -v --tb native {[vars]tst_path}/scenario --profile --profile-svg --log-cli-level=INFO -s {posargs}
```

There are some obvious candidates for optimization:

  • Using an in-memory SQLite database for the unit state instead of the real file could shave off a good chunk of the time spent on pointless IO.
  • A ridiculous amount of time is spent in State.__new__. Can we do something about that?
  • Why does mocking juju-log take so long?
  • A single list comprehension at state.py:143 takes 2 seconds of our time: can we make some of this code lazy?

Profiling Scenario's own test suite yields a very similar picture: [profiling graph image]

PietroPasotti changed the title from "optimize scenario tests" to "optimize scenario" on Oct 11, 2024
@tonyandrewmeyer (Contributor) commented

> • Using an in-memory SQLite database for the unit state instead of the real file could shave off a good chunk of the time spent on pointless IO.

This indeed seems like an obvious win. Presumably we can just pass in :memory: instead of the unit state filename; that's trivial to do, and since we're not testing SQLite itself, there are no downsides.
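A minimal sketch of the idea in plain sqlite3 (the real change would live in ops' storage layer, so the path and schema below are illustrative only):

```python
import sqlite3

# File-backed (roughly what happens today): every write hits the disk.
#   db = sqlite3.connect("/path/to/unit-state.db")

# In-memory: same SQL semantics, no disk IO; the database disappears
# when the connection closes, which is fine for a test's lifetime.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
db.execute("INSERT INTO kv VALUES (?, ?)", ("foo", "bar"))
print(db.execute("SELECT value FROM kv WHERE key = ?", ("foo",)).fetchone())
```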

> • A ridiculous amount of time is spent in State.__new__. Can we do something about that?
> • A single list comprehension at state.py:143 takes 2 seconds of our time: can we make some of this code lazy?

Both of these are the positional/keyword argument processing. I wonder if we can find a tidy way to have newer Python use the built-in support and only fall back to this on old Python.
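For illustration, a hedged sketch of what that could look like: on Python 3.10+ dataclasses enforce keyword-only fields natively via kw_only, so a hand-rolled positional-argument check would only be needed on older interpreters (the class and fields below are made up, not Scenario's actual State):

```python
import sys
from dataclasses import dataclass

if sys.version_info >= (3, 10):
    # The interpreter enforces keyword-only fields for us: no
    # per-instantiation Python-level argument processing.
    @dataclass(frozen=True, kw_only=True)
    class ExampleState:
        leader: bool = False
        app_name: str = "example"
else:
    # Fallback: a hand-rolled check in __new__, which is the kind of
    # code that costs time when many States are created.
    @dataclass(frozen=True)
    class ExampleState:
        leader: bool = False
        app_name: str = "example"

        def __new__(cls, *args, **kwargs):
            if args:
                raise TypeError(f"{cls.__name__} takes no positional arguments")
            return super().__new__(cls)
```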

> • Why does mocking juju-log take so long?

I wonder if this is also the same issue: lots of log lines and each one involves creating a JujuLogLine object.
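One quick way to test that hypothesis is a micro-benchmark of the instantiation cost (the class below is a simplified stand-in, not Scenario's actual definition):

```python
import timeit
from dataclasses import dataclass

@dataclass(frozen=True)
class JujuLogLine:  # simplified stand-in
    level: str
    message: str

# Frozen dataclasses route every field assignment through
# object.__setattr__, making each instantiation comparatively expensive.
cost = timeit.timeit(lambda: JujuLogLine("INFO", "hello"), number=100_000)
print(f"100k instantiations: {cost:.2f}s")
```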

@benhoyt (Collaborator) commented Oct 13, 2024

> Presumably we can just pass in :memory: instead of the unit state filename; that's trivial to do, and since we're not testing SQLite itself, there are no downsides.

Yeah, this is probably the single biggest win. I made this change locally when running tox -e scenario on the traefik-k8s-operator tests. My times differ from the ones above, but I get about 35s without the change and 20s with it, so a nice improvement! FWIW, it looks like Harness already uses :memory:.

Happy for us to spend a few hours of rainy-day time looking at the other bottlenecks too. (Let's just be careful not to get too carried away on the long tail...)

tonyandrewmeyer changed the title from "optimize scenario" to "Optimise Scenario" on Oct 13, 2024
benhoyt added the "testing" (Related to ops.testing) label on Oct 16, 2024
@james-garner-canonical (Contributor) commented

Alex Batisse mentioned that Scenario tests take much longer, and that the culprit is YAML? Maybe they just need the compiled extension (cyaml, i.e. PyYAML built against libyaml)?

@Batalex (Contributor) commented Oct 31, 2024

> Alex Batisse mentioned that Scenario tests take much longer, and that the culprit is YAML? Maybe they just need the compiled extension (cyaml, i.e. PyYAML built against libyaml)?

I checked, and I do have libyaml bundled with/used by PyYAML. When I profiled my tests, I noticed that they were spending a long time serializing YAML in scenario/runtime:Runtime._virtual_charm_root.
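For anyone wanting to run the same check: PyYAML reports whether it was built against libyaml, but the C fast path is only used when the C loader/dumper variants are requested explicitly, which plain yaml.safe_dump does not do:

```python
import yaml

# True if PyYAML was compiled against libyaml.
print(yaml.__with_libyaml__)

# Opt in to the C implementations where available.
SafeLoader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)
SafeDumper = getattr(yaml, "CSafeDumper", yaml.SafeDumper)

data = yaml.load("a: 1\nb: [2, 3]", Loader=SafeLoader)
print(yaml.dump(data, Dumper=SafeDumper))
```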

Mind you, this is just a quick-and-dirty experiment, but here is what I tried:

  • I added a new attribute to Context holding the already-serialized meta/config/actions charm spec, and passed it all the way down to the runtime.py method mentioned above, where I replaced the yaml.safe_dump calls with the new attribute.
  • Since I use a fixture to set up the Context in my tests, I widened the fixture scope from "function" to "module" (see the sketch after this list).

This yielded a ~15% speed increase for my test file.
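A minimal sketch of the fixture-scope change, assuming a pytest suite where building the Context is the expensive part (MyCharm and the metadata are placeholders, not names from the actual project):

```python
import pytest
from scenario import Context, State

from charm import MyCharm  # placeholder: your charm class


# scope="module" builds the Context once per test module instead of once
# per test function, amortizing the meta/config/actions processing.
@pytest.fixture(scope="module")
def ctx():
    return Context(MyCharm, meta={"name": "my-charm"})


def test_start(ctx):
    ctx.run(ctx.on.start(), State())
```

Note that a module-scoped Context accumulates captured output (e.g. emitted events and juju-log lines) across tests, so this trades some isolation for speed.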

@tonyandrewmeyer (Contributor) commented

Some initial times in seconds (best of 3), running tox -e scenario against traefik-k8s (branch scenario-7-migraine 😂):

  • 105.373 base (operator and operator/testing main@HEAD)
  • 181.183 :memory: unit state location (I must be doing something wrong here, but I'm not sure what)
  • 103.840 avoid YAML dump of metadata by linking files
  • 103.196 avoid YAML dump of metadata by linking files and cache spec objects to avoid loading YAML
  • 72.423 with _max_posargs(n) replaced by plain object (as a proxy for somehow simplifying or avoiding all that argument-processing code on recent Python)

@tonyandrewmeyer (Contributor) commented

> I checked, and I do have libyaml bundled with/used by PyYAML. When I profiled my tests, I noticed that they were spending a long time serializing YAML in scenario/runtime:Runtime._virtual_charm_root.

@Batalex what repo (and branch, if appropriate) are you testing against? I'll use it as a second benchmark so that we're checking against multiple types of tests.

@Batalex (Contributor) commented Nov 12, 2024

> @Batalex what repo (and branch, if appropriate) are you testing against? I'll use it as a second benchmark so that we're checking against multiple types of tests.

I think I was using the latest release (7.0.5) because I directly edited the files in the virtual env between two sessions during the sprint. I don't have this venv around any more, but the project itself has ops-scenario 7.0.5 in its dependencies lock file.

@tonyandrewmeyer (Contributor) commented

> I think I was using the latest release (7.0.5)

Hey @Batalex, sorry for being unclear. I meant: which test suite are you running (one of the data platform repos, I assume)? Then I can run against it too with my changes, to make sure they actually improve things.

@Batalex (Contributor) commented Nov 13, 2024

Ah, sorry, I was quite slow-witted yesterday. I was testing on https://github.com/canonical/kafka-k8s-operator (branch feat/dpe-5591-status, which should be merged into main by today). IIRC, I was specifically running tox -e unit -- -s tests/unit/test_charm.py
