Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check50.check(dynamic=True) #230

Closed
wants to merge 23 commits into from
Closed

check50.check(dynamic=True) #230

wants to merge 23 commits into from

Conversation

Jelleas
Copy link
Contributor

@Jelleas Jelleas commented Jul 17, 2020

Adressing #222

Backstory

check50 currently runs checks in parallel through multiprocessing. The idea is simple, once a check passes, all of its dependents are started simultaneously. Take for instance the following set of checks:

@check50.check()
def A():
     pass
@check50.check(A)
def B():
     pass
@check50.check(A)
def C():
     pass
...

Once A passes, both B and C will start running. And once those pass, their dependents start to run too. But perhaps this is easier to grasp visually:

image

Do note that this is a simplified representation of check50s execution order, in practice check B might take much longer than C to run. That could cause F and/or G to start or even finish before B finishes.

Theoretically this model works wonders if checks spend most of their time waiting on IO, or if checks are lengthy and there is a machine with many cores available. In practice, this often isn't the case.

Motivation

In practice not every set of checks benefits all that much from concurrency. But the current design does force multiprocessing on everything. This makes it:

  • Impossible to create checks dynamically. Each check needs to be defined up front.
  • Impossible to pass complex objects in memory between checks. For instance, there is currently no way to load a (students') module once and then pass it around between checks.
  • Complex and hacky in places. To combat some gripes with multiprocessing, check50 uses a pool of processes instead of a process for each check. But now some checks (non deterministically) run in the same process. So check50 reloads the module for every check. This can create some seemingly weird behavior that's hard to explain to new users.

Approach

Make check50 single process again. Sort of. I've iterated on this design quite a bit and landed on the following:

check50.check(dynamic=True)

A dynamic check runs in one process in order of definition. That's it. Jumping back to the example earlier. Let's mark C as dynamic:

@check50.check()
def A():
     pass
@check50.check(A)
def B():
     pass
@check50.check(A, dynamic=True)
def C():
     pass
...

Here's what happens:

image

All dynamic checks run after static (old-style) checks. Each check that depends on a dynamic check or is created (more on this soon) by a dynamic check is automatically marked as dynamic itself. So no need to mark F and G as such.

Dynamic checks have a deterministic execution order, so there's no need to re-import the checks' module in between checks anymore. In fact, if all checks are dynamic, the checks' module is imported exactly once. 🎊

State between dynamic checks is shared in memory. There's no serialization, anything can be returned and received.

Dynamic checks can create other checks. For example, instead of:

https://github.com/cs50/problems/blob/7b83decd08252ce0d9b85fedc2281d0e01fea74b/hello/__init__.py#L14-L25

@check50.check(dynamic=True)
def prints_hello():
    """prints "hello, world\\n" """
    expected = "[Hh]ello, world!?"
    check50.run("./hello").stdout(expected , str_output="hello, world!\n")
    
    @check50.check()
    def prints_newline():
        """a newline \\n is printed after hello world"""
        check50.run("./hello").stdout(expected + "\n" , str_output="hello, world!\n")
...

But far more interesting, through dynamic checks it is easy to wrap an existing testing tool, like Python's unittest or JUnit. Here's a meta unittest for check50, where we now run unittest to check whether check50 can create checks from a unittest module. You'll get it once you see it:

https://github.com/cs50/check50/blob/single_process/tests/checks/dynamic/python_unittest/__init__.py

Implementation

Dynamic checks use a queue for execution order. Once a dynamic check is created with the check50.check decorator, it is placed at the end of the queue. There's no need to add the check to the module scope.

I've dubbed "importing the module to find all checks": check discovery. This now runs as a job in its own separate process, The check's module is no longer imported in the main check50 process. This process is later re-used for dynamic checks (to import the module just once).

The old hack with runner._check_names is replaced with a new hack runner.CHECK_REGISTRY. That in similar spirit keeps track of all defined checks, but now also all side_effects (state / newly created checks) from checks. To wrap your head around this, imagine there's two sides, static and dynamic. The static side has access to static data: check names,descriptions, etc. and static side effects. The dynamic side has access to dynamic data, the actual checks in memory and dynamic side effects.

In case you're worried, it is quite difficult to create a circular dependency through dynamic checks. The check50.check decorator forces the dependency to exist (be a check50 check) before it can be used as a dependency of another check. So there's no chicken and the egg question here, the dependency must exist prior to the dependent. And because dependencies are not mutable (unless you hack your way in) you can't go back and introduce a circular dependency.

Tests

Many: https://github.com/cs50/check50/blob/single_process/tests/dynamic_tests.py

Docs

None yet. Let me know what you think first! :)

@Jelleas Jelleas requested review from cmlsharp and kzidane July 17, 2020 19:13
Copy link
Contributor

@cmlsharp cmlsharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do a more thorough review later. These are just some nitpicks.

One general concern is I think this would arguably make more sense as a module-wide decision rather than a per-check decision. I'd guess that this would also substantially simplify the implementation.

@attr.s(slots=True)
class CheckSideEffects:
state = attr.ib(default=None)
new_checks = attr.ib(default=attr.Factory(list))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just do attr.ib(factory=list)

# - It's dependending on a dynamic check
check._is_dynamic = dynamic \
or internal.check_running \
or (check._check_dependency and check._check_dependency._is_dynamic)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just do or getattr(check._check_dependency, "_is_dynamic", False)



def register_check(self, check):
self.checks.append({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be an attrs class or at least a namedtuple rather than just a dict.

@Jelleas
Copy link
Contributor Author

Jelleas commented Jul 17, 2020

One general concern is I think this would arguably make more sense as a module-wide decision rather than a per-check decision. I'd guess that this would also substantially simplify the implementation.

This was something I tried out too, in fact here's a working implementation in an old commit: 702652e. I scrapped the API call later on, because in most situations marking an exists check or something similar as dynamic, would have the same effect.

It seemed weird to me that you'd need to choose between the static/dynamic, especially if only a couple of checks want to pass something unpickle-able for instance. Even more so because static/dynamic can happily co-exist. You'd be surprised I think, allowing both to exist doesn't add all that much complexity if at all.

@kzidane kzidane closed this Jul 21, 2021
@kzidane kzidane deleted the branch develop July 21, 2021 12:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants