AST-based program generation #2

Zac-HD · 2019-08-06T09:09:44Z

Grammar-based generation works, and gives us syntactically valid source code.

The next step is to get semantically valid source code! The clear best approach for this is to generate a syntax tree, and "unparse" it into source code. Based on experiments at the PyCon Australia sprints the best AST to use is probably from lib2to3 - and that will give us the unparsing for free via black.

After that, I'd like to go to a concrete syntax tree where we draw formatting information at the same time as the node. This would massively improve our usefulness for black, but it's a lot of extra work.

The text was updated successfully, but these errors were encountered:

Zac-HD · 2019-09-11T07:42:10Z

I have a WIP based on libCST, which does indeed generate source code... though it's also very rough around the edges.

To take this further, I think a better approach would be to use typed_ast to generate a (probably) semantically-valid program, then convert this to a libCST version and unparse that. Done concurrently it should shrink well too!

Development strategy: instead of starting with everything (as above) and then fixing each node, I think a more productive approach would be to start with a minimal subset (e.g. integer arithmetic) and then gradually expand. Keeping everything well-behaved while adding a single node type should be tractable, and that means incremental development works again 😄

Zac-HD · 2020-04-24T05:06:50Z

The simplest possible way to generate CST nodes is to start with from_type(Module), and then register just enough patches that we can actually instantiate each node. This is exactly how Hypothesmith 0.0.8 actually works - do the easy thing first! - and it was good enough to find two way to generate invalid code with LibCST.

However, this approach uses default values for all node parameters which have defaults, and that makes us much less likely to find any issues related to e.g. whitespace and parentheses. Supporting those makes us responsible for balancing parentheses though, which is non-trivial.

I'd also like to support swarm testing over the generated productions. This isn't too hard conceptually, but hypothesis FeatureFlags is not (yet) public API, and an ideal implementation would have some way of avoiding 'dead ends' where we generate something that the feature config doesn't allow us to complete.

Finally, there are many nodes dependent on other nodes - which makes registering strategies for them a little tricky. We therefore make very liberal use of the deferred(lambda: from_type(...)) idiom. This could be reduced a great deal - though not eliminated! - by partitioning node types into topologically ordered groups and using deferred only where there really are cycles in the dependency graph. Maybe after the MVP is out the marginal gains to introspection and performace will seem worth it, but they're not at the moment.

Zac-HD · 2021-05-08T11:15:03Z

This is basically working now - registering more specific strategies for individual node types would be a substantial performance boost, but I'm happy with the current design for now.

This was referenced Apr 19, 2020

Property-based testing for the parser we-like-parsers/cpython#91

Closed

Improved get_type_hints() for classes with forward references HypothesisWorks/hypothesis#2409

Merged

Should invalid code be representable? Instagram/LibCST#287

Closed

Zac-HD mentioned this issue Apr 27, 2020

Better support for typing_extensions module HypothesisWorks/hypothesis#2413

Closed

Zac-HD closed this as completed May 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AST-based program generation #2

AST-based program generation #2

Zac-HD commented Aug 6, 2019

Zac-HD commented Sep 11, 2019

Zac-HD commented Apr 24, 2020

Zac-HD commented May 8, 2021

AST-based program generation #2

AST-based program generation #2

Comments

Zac-HD commented Aug 6, 2019

Zac-HD commented Sep 11, 2019

Zac-HD commented Apr 24, 2020

Zac-HD commented May 8, 2021