We're using unasync to generate synchronous code from async code. We have a few choices about when to run it. Which is best? There's some logic behind our current approach, but we haven't really talked it through, and @sethmlarson raised some questions about it, so let's discuss.
Options
Option 1: We could run unasync at commit time, and check the generated code into the repo. (With a CI check to make sure that it stays up to date.)
Upsides: tooling generally expects your Python code to live inside the repo, and optimizes for this case. So e.g. setuptools, pytest, flit, development installs, etc. all automatically work. It's easy to look at the final code since it's present on disk / in PR diffs etc., instead of having to imagine it. You can use the library directly from a checkout without any "build" step. Coverage "just works".
Downsides: since the generated code affects most of the library logic + tests, almost all commits will cause changes in the generated code. That means we'd have to run the generator script on basically every single commit, and every PR diff would have double copies of every change. We'd get to tell contributors that they forgot to re-run the generator script, over and over.
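To make that CI check concrete, here's a rough sketch of what it could look like. The `src/mypackage` layout and the `_async`/`_sync` directory names are placeholders for illustration, not necessarily what this repo actually uses:

```python
# check_generated.py: hypothetical CI check for Option 1 (sketch only).
# Regenerate the _sync tree from _async, then fail if the committed copy is stale.
import pathlib
import subprocess
import sys

import unasync

PKG = pathlib.Path("src/mypackage")  # assumed layout
async_files = [str(p) for p in (PKG / "_async").glob("*.py")]

# Rewrite the async sources into the _sync directory on disk.
unasync.unasync_files(
    async_files,
    rules=[unasync.Rule(fromdir="/_async/", todir="/_sync/")],
)

# Any difference means someone forgot to re-run the generator.
result = subprocess.run(["git", "diff", "--exit-code", str(PKG / "_sync")])
sys.exit(result.returncode)
```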
Option 2: We could run unasync when building wheels or installing. (This is what we do now.)
Upsides: Only actual "source of truth" code gets checked into version control, so no doubled diffs, no need to worry about things getting out of sync, etc.
Downsides: We need a "build" step before we can use the code, which makes testing and packaging more complicated. (E.g., this rules out using flit. Unless we can convince @takluyver to add a hook to run some arbitrary code during flit's build step, but so far he's been resistant to adding such features :-).) Though I guess even with "option 1" we'd still want to make our test harness always re-run the code generator before running tests, b/c anything else is too prone to confusing mistakes, so that's probably pretty similar between these two approaches. Coverage is messy though. (Maybe we could use some hack to re-map coverage of lines in _sync/*.py back to the corresponding file in _async/*.py?)
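If it helps to picture Option 2, a minimal sketch using unasync's setuptools hook might look like the following (package name and directory layout are placeholders, and this assumes a plain setuptools build):

```python
# setup.py: minimal sketch of Option 2, build-time generation.
import setuptools
import unasync

setuptools.setup(
    name="mypackage",  # placeholder
    package_dir={"": "src"},
    packages=setuptools.find_packages("src"),
    # Rewrite the _async sources into _sync whenever build_py runs, i.e. on
    # wheel builds and installs; only _async is checked into the repo.
    cmdclass={
        "build_py": unasync.cmdclass_build_py(
            rules=[unasync.Rule(fromdir="/_async/", todir="/_sync/")]
        )
    },
)
```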
Option 3: We could run unasync at import time.
Upsides: can't possibly get out of sync; no need for special workflow steps.
Downsides: Probably makes import too slow to be acceptable. Source may not be available at import time (some people distribute apps with only .pycs included, no .pys). And in fact, eval may not even be available at import time (e.g. Dropbox patches it out of their client's bundled interpreter as a hardening measure, or some environments may disable it using PEP 578 audit hooks). If we want to run unasync on tests (and we do want to run unasync on tests), then we'd have to figure out how to make our magic import stuff play nicely with pytest's magic import stuff (where they rewrite asserts).
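Just to make Option 3 concrete, the most naive version would be regenerating the sync files from the sync package's `__init__.py` at import time. This is purely illustrative (names and layout are made up), and it runs straight into the downsides above: it needs the `.py` sources to exist and the install to be writable.

```python
# mypackage/_sync/__init__.py: naive illustration of import-time generation.
# Regenerates the sync sources from the async ones every time the package is
# imported; assumes .py sources are present and the install dir is writable.
import pathlib

import unasync

_async_dir = pathlib.Path(__file__).parent.parent / "_async"

# Skip __init__.py so we don't rewrite the module that's currently executing.
_sources = [str(p) for p in _async_dir.glob("*.py") if p.name != "__init__.py"]

unasync.unasync_files(
    _sources,
    rules=[unasync.Rule(fromdir="/_async/", todir="/_sync/")],
)
# The freshly written _sync modules can then be imported as usual below.
```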
Are there any other options I'm missing?
Comments
Code generation at import time isn't really viable, except maybe as a convenience for development. In that case it might have some merit – anything that makes test iterations faster is great! – but for that we'd still have to figure out how to get pytest to play along, which sounds painful. (I once went digging in pytest to try to figure out how their import magic works, in hopes of fixing non-src/ layouts to work better, and I completely failed to understand it at all.)
So aside from that special case, I think the choice is really between committing the generated code, versus auto-generating it at build time.
For Trio, we have some generated code, and we actually commit it to the repo, like Option 1. But that's a bit of a different situation: the generated code in Trio is all trivial boilerplate, and rarely changes. So having to re-generate it manually isn't much of a burden on contributors – in fact most contributors never encounter it at all. And in the rare cases where it does need to be regenerated, it's easy to ignore during reviews – because there's nothing interesting in those files anyway, so you can just skip them.
For this project, most of the interesting logic and tests are in files that affect the generated code, so I think we'll need to optimize our regular contributing workflow around dealing with that. That's why we decided to go with build-time generation initially – it has more of an up-front cost in terms of building the infrastructure to make it work at all, but it's not much worse than any project with a build step (e.g. any project that uses C extensions), and the hope is that it'll produce less manual churn during day-to-day development.
I could be wrong about that though :-). If anyone else has thoughts, please share!
I should probably clean this up and move the logic into noxfile.py instead of calling it from there. It was necessary to get sync coverage, since only the _sync files were covered but the source tree only had _async files. Maybe we won't need it once we get coverage for the _async files: since the same tests run, we can assume the same code paths will be exercised. The problem with the current setup is that codecov can't show the _sync files, because they don't exist on GitHub.
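For the remapping idea, one possible hack (an untested sketch, relying on unasync keeping line numbers identical between the two trees and on coverage.py's CoverageData API) would be to rewrite the recorded file paths after the test run:

```python
# remap_coverage.py: hypothetical post-processing step that folds coverage
# recorded against the generated _sync/*.py files back onto the _async/*.py
# sources, so reports only reference files that actually exist on GitHub.
import coverage

data = coverage.CoverageData()
data.read()  # loads the default .coverage data file

remapped = coverage.CoverageData(basename=".coverage.remapped")
for path in data.measured_files():
    # unasync only swaps tokens, so line numbers should match 1:1.
    new_path = path.replace("/_sync/", "/_async/")
    remapped.add_lines({new_path: data.lines(path) or []})
remapped.write()
```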