Skip to content

Autogenerate boilerplate code for expressions from a single tool #3264

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 82 commits into from

Conversation

kripken
Copy link
Member

@kripken kripken commented Oct 19, 2020

This proposes a new way of developing Binaryen: Instead of hand-writing the
boilerplate for expressions - like the logic for walking, comparing, etc. - we
have a single declaration of each expression, and a script that generates
the code.

The goal is to reduce the hand-written boilerplate code, mainly to make it
easier to work on Binaryen - reduce the time to add new instructions, or
change them - and to reduce the risk for bugs in those places. A minor
goal is to also improve the speed of the code.

This PR implements the declaration and emitting in Python. This can't
be done in C++ itself - in theory macros or templates can get close, but
we want a lot more power here than macros provide even in other
languages (AFAIK - but there is lisp...). Fundamentally we want the power to
process the declarations of expressions and to generate something
pretty arbitrary from that, using a full programming language. A possible
future use of this, for example, is to automatically reorder our instructions
in some optimal way, so that it's a single comparison to check if something
is control flow structure or not; or to reorder (reorderable) fields on an
expression; and more complex things are possible.

Example of what this PR does: We declare call once,

class Call(Expression):
    operands = ChildList()
    target = Name()
    isReturn = Bool(init='false')

This reads as: Call is an expression. It has a field "operands" which is
a list of child expressions. It has a field "target" which is a name (the
called function). And it has a field "isReturn" which is a bool that has
an initial value.

The declarations are all ported from the existing header, with initial
values and comments and so forth all copied over, so this should be
NFC (hopefully!)

We then run the tool and it autogenerates the C++ header:

class Call : public SpecificExpression<Expression::CallId> {
public:
  Call(MixedArena& allocator) : operands(allocator) {}
  ExpressionList operands;
  Name target;
  bool isReturn = false;
  void finalize();
};

And also it autogenerates C++ code to compare calls:

case Expression::CallId: {
  auto* castLeft = left->cast<Call>();
  auto* castRight = right->cast<Call>();
  if (castLeft->operands.size() != castRight->operands.size()) {
    return false;
  }
  for (auto* child : castLeft->operands) {
    leftStack.push_back(child);
  }
  for (auto* child : castRight->operands) {
    rightStack.push_back(child);
  }
  if (castLeft->target != castRight->target) {
    return false;
  }
  if (castLeft->isReturn != castRight->isReturn) {
    return false;
  }
  break;
}

That code shows how this can prevent bugs: the tool won't forget a field
or anything like that.

This PR is just a proof of concept for discussion. It emits just the two things
just shown: the main header declaration of the class, and the logic to
compare things. If we like this, the goal will be to automate all the other
boilerplate as well - walking, hashing, even the C and JS APIs.

Reading the Python, it's not perfect: f-strings would be nice (f'{foo}' will
replace {foo} with the local var foo), but you need to escape curly braces,
and we are emitting C++ here... so I avoid f-strings in some places. In
theory JavaScript could be better here, as the formatting there doesn't have
the issue. I don't have a strong feeling myself between the two. Benefits
of python include that it's already used in Binaryen, and we can import and reuse
a little code from scripts/shared, but it's not major.

This PR passes the test suite as well as fuzzing and emscripten tests, so it
looks stable. As mentioned earlier speed is not a major goal here, but still,
the comparison code is faster as it avoids the stack of immediates we had
before (perf stat reports 9% fewer instructions are run, but the wall time
is not much changed, likely because my test was memory bandwidth bound).

This integrates with the existing CMake build system, running the tool
automatically at the start. It's pretty fast, and it only writes out the output if
there are changes, so it doesn't impact build times that I can tell.

Process-wise, the generated C++ is all checked in. It's also pretty
readable (the tool runs clang-format on it for you). So it should be ok in
terms of debugging, for example - no weird stack trace issues like with
macros. You generally don't need to care about how those files were
generated, unless you are adding a new instruction (and then hopefully
it saves you a lot of time!).

Very interested to hear feedback here! Overall, I think I've convinced myself
at least that this is a good idea. While the script code for a single thing is
less readable than the emitted C++ code for it, you do write that script code
once - and not once per instruction. So it's one slightly worse task rather than
many slightly better ones.

One thing I am not happy with in the current code is how Method()s are
declared. Open to suggestions there.

@tlively
Copy link
Member

tlively commented Oct 22, 2020

Oh, @aardappel do you mean still having a separate generation script but having it be written in C++ instead of Python? I'd be fine with that.

@aardappel
Copy link
Contributor

@tlively I suggested that would be an improvement, yes. But still, not having it at all would be even better.

@kripken
Copy link
Member Author

kripken commented Oct 22, 2020

Thinking more on this, I agree we should do as much as possible in C++ itself. The best I can think of is a combination of autogeneration, as in this PR, plus C macros, for later work, to avoid autogeneration except where actually necessary. That is, I agree it's not good to use autogeneration where C macros can solve things reasonably well.

I do not see a way to replace some amount of core autogeneration with either C macros or with C++ templates (at least not without really weird template tricks, that I don't think would be maintainable). It's possible I'm missing something, so I'd be interested to hear concrete ideas!

To summarize how I see things: There are places where you just want to "do X for each class", and in that case, C macros are fine. But autogeneration is better when you need to do something different based on the properties of each field of the class. (Again, I know there are ways to do it with macros or templates - but I can't think of a good way that I'd like to maintain.)

I also want to stress the simplicity of the autogeneration here. This is orders of magnitude less powerful and less complex than tblgen! Maybe that wasn't clear earlier, but I think it makes a difference. I'll give some examples of what I mean. Note that these examples also represent future work: I've explored further in side branches, and have not had to add significant complexity to the system. And I've run out of new things to autogenerate, so at least for all initial goals, I think this will be very simple. (Speculative later goals may require more complexity, but we can assess them later - maybe those are bad ideas.)

Here's what I mean by simplicity, using an example of the output for comparing expressions, here a LocalSet:

case Expression::LocalSetId: {
  auto* castLeft = left->cast<LocalSet>();
  auto* castRight = right->cast<LocalSet>();
  if (castLeft->index != castRight->index) {
    return false;
  }
  leftStack.push_back(castLeft->value);
  rightStack.push_back(castRight->value);
  break;
}

It casts the two expressions being compared, then it compares their index, then it pushes the children to the stack.

This is pretty much the C++ you would write by hand. So compiler errors, debugging, etc. are really great, all in C++. Most of the time that's all you need to care about, as a binaryen developer.

In the rare cases when you need to add or modify an instruction, you need to edit a few lines of Python, something like this:

class LocalSet(Expression):
    # The index of the local.
    index = Index()
    # The value to assign to that local.
    value = Child()

Again, this is really simple! I don't think this is harmed by being in Python.

The even-more-rare case is that you want to write a new code autogeneration rule. This will be very very rare! It's entirely possible that after creating the first batch, very few will ever be written. But if you do, then here's a key fragment of the generator for the above code:

for key, field in fields.items():
    if is_a(field, Child):
        operations.append(f'leftStack.push_back(castLeft->{key});')
        operations.append(f'rightStack.push_back(castRight->{key});')
    elif ...

You iterate the fields, then do a sequence of is the field a X? then emit this string. I'm not sure that C++'s static typing would help with that much: The possible bugs here are in the strings, that is, the errors happen when you try to compile the emitted C++ later. (But maybe I'm missing a cool C++ idea?)

@aardappel
Copy link
Contributor

To me that's not simple, for reasons mentioned above. I'd much rather deal with even moderately complex C++ templates, since they have the huge advantage of being directly part of the program I am working on.

But anyway, this sounds like it is coming down to personal engineering sensibilities.

Have we tried what these crazy templates and macros look like? Would be good to have them for comparison, even if just for a single method, and a few node types.

I could imagine something like:

compare(get(left->index), get(right->index)) && compare(get(left->child), get(right->child))

Where get template deals with how a particular field is retrieved for comparison (for scalars it just returns itself, for child nodes it performs a push_back and just returns true). compare can also do something special or just be replaced by == and rely on overloading. The overal structure of inserting field names and chaining them with && generated by a macro.

Of course, there's variadic templates, but I sense they wouldn't be welcome here :)

@aardappel
Copy link
Contributor

Actually I am realizing that I am more arguing for what I think is "right", than any practical need to not have this codegen step. I am sure I will handle it just fine if I ever need to use it, so feel free to forge ahead.

@kripken
Copy link
Member Author

kripken commented Oct 23, 2020

@tlively recently added SIMDLoadStoreLane. After fixing the merge conflict here, I added it in 4ac589d, which may be a nice example of how easy it is to add a new instruction. Only the python lines there were handwritten, the rest is autogenerated.

@dcodeIO
Copy link
Contributor

dcodeIO commented Oct 26, 2020

Btw, are there plans to generate C and JS API code as well?

Edit: Oh, found it :)

the goal will be to automate all the other
boilerplate as well - walking, hashing, even the C and JS APIs.

Copy link
Member

@aheejin aheejin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for joining the party late.

For ExpressionDefinitionRenderer, I'm not sure writing python code for a new instruction is significantly easier than writing similar code in wasm.h / wasm.cpp, but I also understand you intend to use this data structure to generate other parts of code, such as ExpressionComparisonRenderer.

But I am also a little unsure about approaches like ExpressionComparisonRenderer too, especially about embedding actual code logic in python code. I also wonder what other parts of the codebase you want to rewrite in this way? I understand why you would prefer it over the existing ExpressionAnalyzer.cpp though.

When I heard about code auto generation I was thinking more in line of.... cases like when you want to add a single SIMD instruction, running some script adds bunch of skeleton template with WASM_UNREACHABLE in 20 different places so that it is easier to track which part is missing or something, and not the autogeneration of actual programming logic, but maybe I miunderstood what people talked about.

This is just a random vague idea and I'm not sure if it would work, but can we make something akin to ChildIterator for other fields too, like, FieldIterator<Address> or something? AFAIK C++ does not offer an easy way for reflection, but I guess we can specialize each Expression class for each specialized type of FieldIterator. Then maybe we can compare two expressions in the same manner using ChildIterator and FieldIterator..? (Names need some special treatment, but that wouldn't be too hard..?) This kind of efforts wouldn't be worth it if we only intend to use it for comparison, but just in case you plan to use this for other parts too.

I haven't managed to read all the code in detail, but while reading it I just made some random nitpicky comments, and then I also realized I was supposed talk about the high level stuff 😅 But anyway I'll post them...

def is_subclass_of(x, y):
"""Whether x is a subclass of y."""

return y in getattr(x, '__bases__', {})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same with issubclass?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be, yes. I experimented with a few ideas here and preferred to add an indirection layer over raw python, while doing so. But yes, it seems we can just use python stuff.

def is_instance(x, cls):
"""Whether x is an instance of cls."""

return x.__class__ == cls
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same with isinstance?

// must be exactly identical.
if (iter != rightNames.end()) {
left = iter->second;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this existing code very well.. Can you elaborate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want (block $x) to compare the same as (block $y), that is, to see things as equal if they are isomorphic, ignoring the concrete name choices. So we keep a map of names on the left to names on the right, and then when we see (br $x) vs (br $y), we check if $x means "the same" as $y (which in those blocks it would).

@kripken
Copy link
Member Author

kripken commented Oct 27, 2020

@aheejin

I also wonder what other parts of the codebase you want to rewrite in this way?

I've experimented in side branches with rewriting the following:

  • wasm.h declaration of the class. I agree this by itself doesn't save much work.
  • Comparing and hashing. I agree these could be done with a FieldIterator class. In fact the current ExpressionAnalyzer does have a walker for fields, that shares that logic between both comparing and hashing. But, autogeneration avoids the need to write that out.
  • Copying. This could I guess be also done with a ChildIterator plus a FieldIterator.
  • wasm-traversal.h visiting and walking. Much of this can be done with a C macro. However, part has to be written, as the walking logic is what ChildIterator is implemented using.
  • ReFinalize. This could be also be done with a C macro.

Other things in the possible future:

  • C and JS APIs. I see no real alternative to a script autogenerating these (aside from writing by hand).

Overall I agree more can be done than I thought using C++ templates and C macros. However, some things just can't, like the core walking and the C and JS APIs, so this will still save work.

There is also the performance aspect. I agree this is less important, but it's still a nice benefit of the autogeneration approach. The benefit it gives is that a ChildIterator or FieldIterator uses a vector to store the fields etc., instead of just writing out the actual operations (copy.x = original.x; copy.y = original.y;). (In theory the compiler could optimize the Iterators too, if we do some work to help it, but I'm skeptical.)

It's possible I haven't fully understood your Iterator idea, though, I'm not sure what you meant by specializing there?

@kripken
Copy link
Member Author

kripken commented Oct 28, 2020

There's a pretty big bar for a change like this, and it looks like some of us are (understandably) skeptical, so I opened #3294 with another option for discussion.

@aheejin
Copy link
Member

aheejin commented Oct 29, 2020

  • Comparing and hashing. I agree these could be done with a FieldIterator class. In fact the current ExpressionAnalyzer does have a walker for fields, that shares that logic between both comparing and hashing. But, autogeneration avoids the need to write that out.
  • Copying. This could I guess be also done with a ChildIterator plus a FieldIterator.

Where do we have something akin to FieldIterator class now? But anyway assuming we have one, I'm wondering if these need delegation at all..? Can we possibly treat all expression in the same manner?

  • wasm-traversal.h visiting and walking. Much of this can be done with a C macro. However, part has to be written, as the walking logic is what ChildIterator is implemented using.

I think this logic is not that ugly to be written as is now, and I don't think there are a lot of churn there, given that we only add something there not every time we add a new instruction but only when we add a new instruction category? In theory they can be also written using things like ChildIterator of FieldIterator but those are currently written using routines in wasm-traversal.h, I don't think it's feasible..

  • ReFinalize. This could be also be done with a C macro.

These also we need handwritten routines only for control flow instructions, and I believe others can be treated similarly to #3290.

  • C and JS APIs. I see no real alternative to a script autogenerating these (aside from writing by hand).

Yeah I'm not very sure about this one can be done fully without a script.

Overall I'm not very opinionated and if I'm the only person who's little skeptical about this approach I'm happy to go with what most people like. I was mostly just wondering if there would be a cleaner approach than printing code for actual logic (not repetitive boilerplate) from Python code.

@kripken
Copy link
Member Author

kripken commented Oct 29, 2020

@aheejin

Where do we have something akin to FieldIterator class now?

We have the ImmediateVisitor that is used for hashing and comparing, in trunk now,

Can we possibly treat all expression in the same manner?

I'm not sure what you mean?

Overall my goal is to avoid writing large amounts of boilerplate like we need to now. I agree there are incremental things like #3290, and I think some of your suggestions are along those lines, and make sense. But also, fundamentally I want us to have a single "place" where we define fields, so that we can derive everything from there. That would avoid bugs and save time. And, I want us to do that without becoming slower (the ImmediateVisitor linked above has the downside that e.g. comparing ends up saving to a vector). I think both this PR and #3294 achieve those goals, but definitely if there is another approach that would be good to consider too.

@kripken kripken marked this pull request as ready for review October 29, 2020 17:32
@tlively
Copy link
Member

tlively commented Oct 29, 2020

FWIW, I do still prefer the Python script solution over the pure-macro solution as well. I think it will be more readable overall.

@aheejin
Copy link
Member

aheejin commented Oct 30, 2020

I tried something like this with macros hoping that we can use this as a single source of truth and use macros to generate wasm.h and even C/JS API, but this looks very uglier ;( (I'm NOT suggesting we do this 😂)

wasm.h:

...
#define EXPRESSION(id)                                                         \
  class id : public SpecificExpression<Expression::id##Id> {                   \
  public:
#define FIELD_ALLOC(id, field)                                                 \
  id(MixedArena& allocator) : field(allocator) {}
#define NO_FIELD_ALLOC(id)                                                     \
  id() {}                                                                      \
  id(MixedArena& allocator) : id(allocator) {}
#define FIELD(type, name) type name;
#define METHOD(method) method
#define EXPRESSION_END }
#include "wasm-delegations.h"
#undef EXPRESSION
#undef NEED_ALLOC
#undef FIELD
#undef METHOD
...

wasm-delegations.h:

EXPRESSION(Nop)
NO_FIELD_ALLOC(Nop)
EXPRESSION_END

EXPRESSION(Block)
FIELD_ALLOC(Block, list)
FIELD(Name, name)
FIELD(ExpressionList, list)
METHOD(void finalize(type type_);)
EXPRESSION_END
...

Anyway I think I'm OK with Python code generation, but I'm still wondering if we can get away with not generating the logic part of the code if we use some Immediates-like class, which also I guess can be generated from the same Python structure. I imagine its structure could be something like (I haven't compiled the below, so there may be many errors)

struct ImmediatesBase { // copied from ExpressionAnalyzer.cpp
  SmallVector<Name, 1> scopeNames;
  SmallVector<Name, 1> nonScopeNames;
  SmallVector<int32_t, 3> ints;
  SmallVector<Literal, 1> literals;
  SmallVector<Type, 1> types;
  SmallVector<Index, 1> indexes;
  SmallVector<Address, 2> addresses;
  operator=(const Immediates& other) { ... }
};

template<class T>
struct Immediates  {};

// I think the below can be generated from the same Python data structure
template<> struct Immediates<Nop> : public ImmediateBase {
  Immediates(Nop* curr) {}
};

template<> struct Immedates<Call> : public ImmediateBase {
  Immediates(Call* curr) {
    nonScopeNames.push_back(curr->target);
    ints.push_back(curr-isReturn);
  }
};
...

And then comparing or copying can be handled within a single method in a uniform way within ~30 lines of code... hopefully? This involves SmallVectors which you have concerns with, but I'm not very sure whether, or how much, this will be slower.

I can be mistaken about this approach, and I haven't actually coded and tested this myself yet, so I'm not totally sure if this will work or will be reasonably fast or anything though.

@kripken
Copy link
Member Author

kripken commented Oct 30, 2020

I think a reasonable compromise here might be to not aim for a single source of truth, but to have two: wasm.h which is handwritten, and a univeral delegation thing - either a C macro as in #3294 or a C++ template if we can find one - to implement things with. The benefit of two places is that wasm.h is still readable (I agree with @aheejin that it's hard to find any nice way to autogenerate wasm.h! At least that isn't a separate script), and also it avoids a separate script (with the downsides of depending on python more in the build system, etc. - some downstream user of binaryen will be annoyed by adding it, I'm sure...). And two places isn't that bad.

I do think we shouldn't compromise on performance, though. @aheejin in your ImmediatesBase example, using vectors will add overhead. Even a SmallVector is not as efficient as just directly comparing/copying/etc. without any intermediate memory reads and writes. #3294 avoids that overhead.

Overall I think I am changing my mind in favor of #3294 . It's not as nice as the python, but as mentioned above 2 places as the source of truth seems acceptable, and speed is ok. Overall very few people (maybe only me?) will write the C macros themselves. More commonly people will add or modify an instruction, and it's as easy to add

class NewInst(Expression):
  field = Name()

as it is to add (in addition to wasm.h)

case Expression::NewInstId:
  DELEGATE_FIELD_NAME(NewInst, field);

@aheejin
Copy link
Member

aheejin commented Oct 30, 2020

I'm not very concerned about using Python itself; we are already doing it with gen-s-parser.py and it works great. My concern was mostly about generating not only the boilerplate but logic itself from Python, and this concern applies in the same way to #3294, so it is not really related to Python vs. macro stuff. But maybe it is not really avoidable, if we want to avoid using vectors.

@kripken
Copy link
Member Author

kripken commented Nov 4, 2020

Closing in favor of #3294. That one just uses C macros, which avoids the need for python, but still removes almost all boilerplate. Less risky/controversial.

@kripken kripken closed this Nov 4, 2020
@kripken kripken deleted the gen2 branch November 4, 2020 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants