Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: integration testing battery #3616

Closed
wants to merge 40 commits into from

Conversation

oliver-sanders
Copy link
Member

@oliver-sanders oliver-sanders commented May 19, 2020

tldr

Here is a POC Python test framework which could be used to re-implement our flaky unit tests which run workflows (an alternative option to #3611 which opens more doors).

def test_someting(make_flow, run_flow, run_dir):
    # bare minimum boilerplate
    scheduler = make_flow(
        {
            'scheduling': {
                'dependencies': {
                    'graph': 'foo'
                }
            }
        }
    )
    with run_flow(scheduler):
        # write your tests here
        assert client('ping_suite')
        assert Path(run_dir, reg, '.service', 'contact').exists()

This writeup was probably more work than the code so I'm happy to drop it if not desired. Might cherry pick a few commits onto master if dropped.

The Testing Situation

At present we have two test batteries:

  • Functional tests (tests/) written in bash and run with prove.
  • Unit tests (cylc/flow/tests) written in python and run with pytest.

We are missing a layer.

image

  • Unit tests test the minimal unit of functionality
  • Integration tests test the interaction between modules.
  • Functional tests test the interaction between systems.

(note our functional tests are muddled in with some unit tests and quite a lot of integration tests).

CylcWorkflowTestCase

In the unit tests we have a framework called CylcWorkflowTestCase, a number of tests are already implemented using it:

  • cylc/flow/tests/network/test_client.py
  • cylc/flow/tests/network/test_publisher.py
  • cylc/flow/tests/network/test_resolvers.py
  • cylc/flow/tests/network/test_server.py
  • cylc/flow/tests/network/test_subscriber.py
  • cylc/flow/tests/test_data_store_mgr.py
  • cylc/flow/tests/test_job_pool.py

These aren't unit tests, they are really more like integration tests. They have been implemented as unittests because it is more convenient to test Python from Python.

Unfortunately running Cylc from Python is not really possible at the moment, consequently CylcWorkflowTestCase works by mocking something that looks like a scheduler and mounting the necessary parts onto it. This makes the tests very complex to write and fragile to changes to the scheduler.

Here's an excerpt from cylc/flow/tests/network/test_client.py so you can see what I'm going on about:

# the test has to re-implment scheduler functionality:
self.task_pool.release_runahead_tasks() 
self.scheduler.data_store_mgr.initiate_data_model() 
self.workflow_id = self.scheduler.data_store_mgr.workflow_id 
# and put in-place system-level functionality
create_auth_files(self.suite_name)  # auth keys are required for comms 
# and handle fine implementation detail which could change at any time
barrier = Barrier(2, timeout=20) 
self.server = SuiteRuntimeServer( 
    self.scheduler, 
    context=SERVER_CONTEXT, 
    threaded=True, 
    barrier=barrier, 
    daemon=True) 

There are 100 lines of boilerplate but all the test actually wants to do is to call a ZMQ endpoint.

These tests would be a lot simpler if written in bash (see #3611) but there is more to it than that...

Testing And Python

I think there is a strong argument for running cylc tests in Python, a few quick examples:

  • Powerful testing framework.
  • Much faster.
  • Mocking.
  • Ability to inspect the internals of Cylc.

Unfortunately CylcWorkflowTestCase does not yet have the benefit of the same evolution as the functional test harness, for example it uses the user/site configuration rather than the test configuration and doesn't tidy up afterwards.

A New Framework

By making some small changes to the Scheduler, namely moving command-line logic back into the command line we can make it possible to run suites by invoking the Scheduler directly.

This makes it possible to write a Python test harness for workflows without re-inventing the wheel.

This PR shows a proof of concept of how that can be done.

A New Battery?

We could just bung this code into the existing unit tests, however, I'm keen to keep integration tests (ones that run suites) out of the unit tests:

  • Keep the unit tests, simple, fast and clean.
    • Often in development we break cylc run, but we might still want to run the unit tests.
  • Encourage unit tests over integration tests.
    • Resist the urge to run workflows where not necessary.
  • Create a hierarchy of tests
    • It's a lot easier to fix simple tests than complex ones, the mix of testing in tests/ makes it hard to flush out the simple errors.
    • Devs should run the simplest tests first, fix the issues then move onto the more complex tests.
    • unit tests < integration tests < functional tests.

This PR puts the integration test framework in its own directory, this forces us to maintain a distinction between unit and functional tests. (And yes the integration framework is unit tested but "A Foolish Consistency is the Hobgoblin of Little Minds".)

This POC

  • Registers all tests under a common hierarchy (like the functional tests).
  • Deletes suites if the tests pass.
  • Leaves behind no files if all the tests pass.
  • Handles startup and shutdown.
  • Is implemented in about 100 lines of code (see the fixtures at the bottom of itests/__init__)

Going Forward

This PR isn't ready to roll, there are a few major issues:

  • The suite log is blank (because the tests share a logger with the flows).
  • The tests aren't parallel safe YET because the Scheduler cannot be invoked twice in the same session YET.
  • Etc.

And some minor features yet to be implemented:

  • Use the test configuration.
  • Potentially support no_detach=False
  • etc.

Oh, and by the way, the tests in this PR take ~2s each, which includes suite installation, execution and shutdown so this could be quicker to run than the bash tests :)

Questions

@cylc/core

  1. Different / better ideas?
  2. Are my arguments for a third test battery compelling enough?
  3. Should we co-locate tests e.g:
    • cylc
      • tests
        • unit
        • integration
        • functional
        • u -> unit
        • i -> integration
        • f -> functoinal
          Makes it easy to strip them from distribution.
          Makes more sense if we manage to get the functional tests running under pytest too.
  4. Are these tests close enough to integration tests for this approach to make sense.
  5. Do the guidelines I've dropped in (see README.md files) make sense.

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Appropriate tests are included (unit and/or functional).
  • Already covered by existing tests.
  • Does not need tests (why?).
  • Appropriate change log entry included.
  • No change log entry required (why? e.g. invisible to users).
  • (master branch) I have opened a documentation PR at cylc/cylc-doc/pull/XXXX.
  • (7.8.x branch) I have updated the documentation in this PR branch.
  • No documentation update required.

@oliver-sanders oliver-sanders added POC Proof of Concept question Flag this as a question for the next Cylc project meeting. labels May 19, 2020
@oliver-sanders oliver-sanders added this to the cylc-8.0.0 milestone May 19, 2020
@oliver-sanders oliver-sanders self-assigned this May 19, 2020
@oliver-sanders oliver-sanders modified the milestones: cylc-8.0.0, some-day May 19, 2020
@hjoliver hjoliver mentioned this pull request May 20, 2020
6 tasks
@hjoliver
Copy link
Member

After a quick skim through (all I have time for today 😬 ) - sounds brilliant 👍

@wxtim
Copy link
Member

wxtim commented May 20, 2020

It seems sane as a concept and the examples look reasonable. I'm not sure I could dive in and use it to write a test of the scheduler straight out, although it'd be fun to try. Would you like me to give that a go?

@oliver-sanders
Copy link
Member Author

Not quite ready for general use yet, the interface may well change, need to get logging working, etc.

@oliver-sanders
Copy link
Member Author

oliver-sanders commented May 22, 2020

I've managed to get to the point where cylc.flow.scheduler.Scheduler can be imported and run in the same process (meaning you can run multiple schedulers in a single event loop). Syntax simplified and direct access to the scheduler object (in the same process) granted.

It seems robust and I'm confident it can handle the existing use cases. Working my way through the existing unit tests and translating them across.

For me a simple test takes less than 2 seconds, using pytest scoping for sharing objects between test functions that's approx 2 seconds for a small battery of tests.

Copy link
Member Author

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this is now a working system, not quite ready for review but good enough for discussion. I've converted two of the unittests over as a POC.

Here are some examples of the sort of thing that is now possible.

import logging
from pathlib import Path
  
import pytest
  
  
# no more gripping the log file, log tuples can be obtained returned by run_flow()
  
@pytest.mark.asyncio
async def test_cylc_version(flow, run_flow, simple_conf):
    """Ensure the flow logs the cylc version 8.0a1."""
    scheduler = flow(simple_conf)
    async with run_flow(scheduler) as log:
        assert (
            ('cylc', logging.INFO, 'Cylc version: 8.0a1')
            in log.record_tuples
        )   
  
  
# command line options can be provided to flow() using their "dest" names
  
@pytest.mark.asyncio
async def test_hold_start(flow, run_flow, simple_conf):
    """Ensure the flow starts in held mode when run with hold_start=True."""
    scheduler = flow(simple_conf, hold_start=True)
    async with run_flow(scheduler):
        assert scheduler.paused()
  
  
# when the flow stops the scheduler object is still there for us to poke
  
@pytest.mark.asyncio
async def test_shutdown(flow, run_flow, simple_conf):
    """Ensure the server shutsdown with the flow."""
    scheduler = flow(simple_conf)
    async with run_flow(scheduler):
        await scheduler.shutdown('because i said so')
        assert scheduler.server.socket.closed
  
  
# you don't have to run suites, infact we should avoid it when possible
  
@pytest.mark.asyncio
async def test_install(flow, run_flow, simple_conf, run_dir):
    """Ensure the flow starts in held mode when run with hold_start=True."""
    scheduler = flow(simple_conf)
    jobscript = Path(run_dir, scheduler.suite, '.service', 'etc', 'job.sh')
    assert jobscript.exists() 

And the best bit:

[gw0] [ 25%] PASSED itests/test_foo.py::test_cylc_version 
itests/test_foo.py::test_hold_start 
[gw0] [ 50%] PASSED itests/test_foo.py::test_hold_start 
itests/test_foo.py::test_shutdown 
[gw0] [ 75%] PASSED itests/test_foo.py::test_shutdown 
itests/test_foo.py::test_install 
[gw0] [100%] PASSED itests/test_foo.py::test_install

success = True
contact = (run_dir / scheduler.suite / '.service' / 'contact')
try:
asyncio.get_event_loop().create_task(scheduler.start())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can now import cylc.flow.scheduler.Scheduler, initiate and run it from Python, all you need to do is to call Scheduler.start from an event loop. You can start multiple schedulers in the same event loop.

success = False
raise exc from None # raise the exception so the test fails
finally:
await scheduler.shutdown(SchedulerStop(StopMode.AUTO.value))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All schedulers start in a single process which makes tidying up afterwards a lot easier, just call the shutdown method.



@pytest.fixture
def run_flow(run_dir):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The basic fixtures are:

  • make_flow - writes the suite.rc to disk.
  • make_scheduler - initiates the Scheduler.
  • flow - shorthand for make_scheduler(make_flow) because I'm lazy.
  • run_flow - calls Scheduler.start.

There are mod_* and ses_* variants of each of these to allow you to create Pytest fixtures with module or session level scoping. This allows us to run a workflow once and use it in multiple tests for efficiency reasons.

This is parallel safe!

Pytest has been configured to run tests from the same module together, so all the tests in a module can share the same scheduler, no problem.

caveat the exception is you can't run workflows in the session scope, but that would be crazy anyway.

Comment on lines 8 to 23
@pytest.mark.asyncio
async def test_ping(flow_a_w_client):
"""It should return True if running."""
scheduler, client = flow_a_w_client
assert await client.async_request('ping_suite')
assert not client.socket.closed
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a cylc ping example, the bash equivalent would be:

set_tests 2
mkdir "$HOME/cylc-run/$REG" -p
cat > "$HOME/cylc-run/$REG" <<<__SUITERC__
[scheduling
    [[dependencies]]
        graph = foo
__SUITERC__
run_ok cylc run "$REG"
_poll_suite_started
run_ok cylc ping "$REG"
cylc stop "$REG"
_poll_suite_stopped
purge_suite
exit

@oliver-sanders
Copy link
Member Author

Other thoughts:

  • Raising nasty exceptions whilst the suite is running.
  • Testing the main loop plugins independently.
  • Compatible with pytest.monkeypatch for hack-the-codebase tests (e.g. simulate FS lag).
  • Triggering tests run in simulation mode (would be good for testing alternate branding in SOD).
  • Other ideas?

@oliver-sanders oliver-sanders modified the milestones: some-day, cylc-8.0a3 May 22, 2020
@hjoliver
Copy link
Member

Other ideas?

I'm already overwhelmed by the existing great ideas here 😁

* move cli stuff into the cli
* move functional stuff out of the cli
* add an interface for creating scheduler options objects
* tidy the --format argument
* move daemonise logic to scheudler_cli
* move event loop logic to scheduler_cli
* move logging into scheduler_cli
* move start message to scheduler_cli
* store id as top-level attr
@oliver-sanders oliver-sanders force-pushed the scheduler-options branch 3 times, most recently from 955724a to 6a0be64 Compare June 12, 2020 13:45
@oliver-sanders oliver-sanders removed this from the cylc-8.0a3 milestone Jun 12, 2020
@oliver-sanders oliver-sanders removed the question Flag this as a question for the next Cylc project meeting. label Jun 12, 2020
@oliver-sanders
Copy link
Member Author

No one screamed loud enough so I'm going to close this proposal and re-raise as a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
POC Proof of Concept
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants