First checkin of partial meta function support, with interface meta type function

This commit includes "just enough" to make this first meta function work, which can be used like this...

```
Human: @interface type = {
    speak: (this);
}
```

... where the implementation of `interface` is just about line-for-line from my paper P0707, and now (just barely!) compiles and runs in cppfront (and I did test the `.require` failure cases and it's quite lovely to see them merge with the compiler's own built-in diagnostics):

```
//-----------------------------------------------------------------------
//  interface: an abstract base class having only pure virtual functions
auto interface( meta::type_declaration&  t ) -> void {
    bool has_dtor = false;
    for (auto m : t.get_members()) {
        m.require( !m.is_object(),
                   "interfaces may not contain data objects");
        if (m.is_function()) {
            auto mf = m.as_function();
            mf.require( !mf.is_copy_or_move(),
                        "interfaces may not copy or move; consider a virtual clone() instead");
            mf.require( !mf.has_initializer(),
                        "interface functions must not have a function body; remove the '=' initializer");
            mf.require( mf.make_public(),
                        "interface functions must be public");
            mf.make_function_virtual();
            has_dtor |= mf.is_destructor();
        }
    }
    if (!has_dtor) {
        t.require( t.add_member( "operator=: (virtual move this) = { }"),
                   "could not add pure virtual destructor");
    }
}
```

That's the only example that works so far.

To make this example work, so far I've added:

- The beginnings of a reflection API.

- The beginnings of generation from source code: The above `t.add_member` call now takes the source code fragment string, lexes it, parses it, and adds it to the `meta::type_declaration` object `t`.

- The first compile-time meta function that participates in interpreting the meaning of a type definition immediately after the type grammar is initially parsed (we'll never modify a type after it's defined, that would be ODR-bad).
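The three pieces above can be pictured with a small, self-contained C++ sketch. This is only a toy model under stated assumptions, not cppfront's actual `meta::` API: `toy_member`, `toy_type`, and `apply_interface` are hypothetical names, members are plain structs rather than parse tree nodes, and `add_member` merely records the source fragment instead of lexing and parsing it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for reflection objects; not cppfront's meta:: API.
struct toy_member {
    std::string name;
    bool is_object     = false;
    bool is_function   = false;
    bool is_destructor = false;
    bool has_body      = false;
    bool is_virtual    = false;
};

struct toy_type {
    std::string              name;
    std::vector<toy_member>  members;
    std::vector<std::string> generated;  // source fragments added by metafunctions
    std::vector<std::string> errors;     // diagnostics collected by require()

    // Record a diagnostic if the condition does not hold
    void require(bool ok, std::string const& msg) {
        if (!ok) { errors.push_back(name + ": " + msg); }
    }

    // The real add_member lexes and parses the fragment; this toy only stores it
    bool add_member(std::string const& source) {
        assert(!source.empty());
        generated.push_back(source);
        return true;
    }
};

// Same overall shape as `interface` in the commit message, on the toy model
void apply_interface(toy_type& t) {
    bool has_dtor = false;
    for (auto& m : t.members) {
        t.require(!m.is_object, "interfaces may not contain data objects");
        if (m.is_function) {
            t.require(!m.has_body,
                      "interface functions must not have a function body");
            m.is_virtual = true;            // counterpart of make_function_virtual()
            has_dtor |= m.is_destructor;
        }
    }
    if (!has_dtor) {
        t.require(t.add_member("operator=: (virtual move this) = { }"),
                  "could not add pure virtual destructor");
    }
}
```

Applying `apply_interface` to a toy `Human` with a single `speak` function would mark `speak` virtual and record one generated destructor fragment; a type with a data member would instead collect a diagnostic in `errors`.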

I have NOT yet added the following, and won't get to them in the short term (thanks in advance for understanding):

- There is not yet a general reflection operator/expression.

- There is not yet a general Cpp2 interpreter that runs inside the cppfront compiler and lets users write meta functions like `interface` as external code outside the compiler. For now I've added `interface`, and I plan to add a few more from P0707, as meta functions provided within the compiler. But with this commit, `interface` is legitimately doing everything except being run through an interpreter -- it's using the `meta::` API and exercising it so I can learn how that API should expand and become richer, it's spinning up a new lexer and parser to handle code generation to add a member, it's stitching the generated result into the parse tree as if it had been written by the user explicitly... it's doing everything I envisioned for it in P0707 except for being run through an interpreter.

This commit is just one step. That said, it is a pretty big step, and I'm quite pleased to finally have reached this point.

---

This example is now part of the updated `pure2-types-inheritance.cpp2` test case:

    // Before this commit it was this
    Human: type = {
        speak: (virtual this);
    }

    //  Now it's this... and this fixed a subtle bug (can you spot it?)
    Human: @interface type = {
        speak: (this);
    }

That's a small change, but it actually also silently fixed a bug that I had written in the original code but hadn't noticed: Before this commit, the `Human` interface did not have a virtual destructor (oops). But now it does, because part of `interface`'s implementation is to generate a virtual destructor if the user didn't write one, and so by letting the user (today, that was me) express their intent, we get to do more on their behalf. I didn't even notice the omission until I saw the diff for the test case's generated `.cpp` had added a `virtual ~Human()`... sweet.

Granted, if `Human` were a class I was writing for real use, I would have later discovered that I forgot to write a virtual destructor when I did more testing or tried to do a polymorphic destruction, or maybe a lint/checker tool might have told me. But by declaratively expressing my intent, I got to not only catch the problem earlier, but even prevent it.
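For readers less familiar with why the missing virtual destructor matters, here is a minimal C++ sketch of polymorphic destruction. It is an illustration only: `Human` is a simplified version of the generated class (the real output declares `~Human()` and defines it out of line), and `Student` and `destroy_through_base` are hypothetical names that do not appear in the test case.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Simplified shape of what @interface produces for Human:
// a pure virtual function plus a virtual destructor.
class Human {
public:
    virtual auto speak() const -> void = 0;
    virtual ~Human() = default;
};

// This is the property the metafunction guarantees on the user's behalf
static_assert(std::has_virtual_destructor_v<Human>);

// Hypothetical derived type that counts how often its destructor runs
class Student : public Human {
    int* destroyed_;
public:
    explicit Student(int& counter) : destroyed_{&counter} {}
    auto speak() const -> void override {}
    ~Student() override { ++*destroyed_; }
};

// Destroys a Student through a Human pointer and returns how many
// Student destructors ran.
auto destroy_through_base() -> int {
    int destroyed = 0;
    {
        std::unique_ptr<Human> h = std::make_unique<Student>(destroyed);
        assert(destroyed == 0);   // object is still alive inside this scope
    }   // ~Human is virtual, so ~Student runs here
    return destroyed;
}
```

Without the `virtual` on `~Human`, deleting a `Student` through a `Human` pointer would be undefined behavior in C++, which is exactly the latent bug the generated destructor prevents.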

I think it's a promising data point that my own first attempt to use a metaclass in such a simple way already fixed a latent simple bug in my own code that I hadn't noticed. Cool beans.

---

Re syntax: I considered several options to request a meta function `m` be applied to the type being defined, including variations of `is(m)` and `as(m)` and `type(m)` and `$m`. I'm going with `@m` for now, and not because of Python envy... there are two main reasons:

- I think "generation of new code is happening here" is such a fundamental and important new concept that it should be very visible, and actually warrants taking a precious new symbol. The idea of "generation" is likely to be more widely used, so being able to have a symbol reserved for that meaning everywhere is useful. The list of unused symbols is quite short (Cpp2 already took `$` for capture), and the `@` swirl maybe even visually connotes generation (like the swirl in a stirred pot -- we're stirring/cooking something up here -- or maybe it's just me).

- I want the syntax to not close the door on applying meta functions to declarations other than types. So putting the decoration up front right after `:` is important, because putting it at the end of the type would likely be much harder to read for variables and especially functions.
hsutter committed Apr 19, 2023
1 parent 65fcd0f commit d8c1a50
Showing 9 changed files with 945 additions and 184 deletions.
4 changes: 2 additions & 2 deletions regression-tests/pure2-types-inheritance.cpp2
@@ -1,6 +1,6 @@

Human: type = {
speak: (virtual this);
Human: @interface type = {
speak: (this);
}

N: namespace = {
6 changes: 5 additions & 1 deletion regression-tests/test-results/pure2-types-inheritance.cpp
@@ -9,6 +9,7 @@

#line 2 "pure2-types-inheritance.cpp2"
class Human;


#line 6 "pure2-types-inheritance.cpp2"
namespace N {
@@ -27,10 +28,11 @@ class Cyborg;
#line 2 "pure2-types-inheritance.cpp2"
class Human {
public: virtual auto speak() const -> void = 0;

public: virtual ~Human();
public: Human() = default;
public: Human(Human const&) = delete;
public: auto operator=(Human const&) -> void = delete;

#line 4 "pure2-types-inheritance.cpp2"
};

@@ -86,6 +88,8 @@ auto main() -> int;

//=== Cpp2 function definitions =================================================

#line 0 "pure2-types-inheritance.cpp2"
Human::~Human(){}

#line 6 "pure2-types-inheritance.cpp2"
namespace N {
12 changes: 6 additions & 6 deletions source/common.h
@@ -54,8 +54,8 @@ struct source_line
bool all_tokens_are_densely_spaced = true; // to be overridden in lexing if they're not

source_line(
std::string const& t = {},
category c = category::empty
std::string_view t = {},
category c = category::empty
)
: text{t}
, cat{c}
@@ -258,10 +258,10 @@ struct error_entry
bool fallback = false; // only emit this message if there was nothing better

error_entry(
source_position w,
std::string const& m,
bool i = false,
bool f = false
source_position w,
std::string_view m,
bool i = false,
bool f = false
)
: where{w}
, msg{m}
97 changes: 81 additions & 16 deletions source/cppfront.cpp
@@ -724,7 +724,7 @@ class positional_printer
if (auto newline_pos = s.find('\n');
!leave_newlines_alone
&& s.length() > 1
&& newline_pos != std::string_view::npos
&& newline_pos != s.npos
)
{
while (newline_pos != std::string_view::npos)
@@ -1228,7 +1228,7 @@ class cppfront
assert (!section.second.empty());

// Get the parse tree for this section and emit each forward declaration
auto decls = parser.get_parse_tree(section.second);
auto decls = parser.get_parse_tree_declarations_in_range(section.second);
for (auto& decl : decls) {
assert(decl);
emit(*decl);
@@ -1329,7 +1329,7 @@ class cppfront
assert (!map_iter->second.empty());

// Get the parse tree for this section and emit each forward declaration
auto decls = parser.get_parse_tree(map_iter->second);
auto decls = parser.get_parse_tree_declarations_in_range(map_iter->second);
for (auto& decl : decls) {
assert(decl);
emit(*decl);
@@ -1398,7 +1398,7 @@ class cppfront
assert (!section.second.empty());

// Get the parse tree for this section and emit each forward declaration
auto decls = parser.get_parse_tree(section.second);
auto decls = parser.get_parse_tree_declarations_in_range(section.second);
for (auto& decl : decls) {
assert(decl);
emit(*decl);
@@ -1412,7 +1412,7 @@ class cppfront
// Finally, some debug checks
printer.finalize_phase();
assert(
tokens.num_unprinted_comments() == 0
(!errors.empty() || tokens.num_unprinted_comments() == 0)
&& "ICE: not all comments were printed"

@filipsajdak (Contributor), Apr 19, 2023:

In the case of `!errors.empty()`, the `ICE: not all comments were printed` message is misleading.

@hsutter (Author, Owner), Apr 19, 2023:

The intent was not to trigger the assert if there were errors (so we shouldn't be generating a cpp1 file anyway and don't care if all the comments were printed). Is there a thinko in that condition?

@filipsajdak (Contributor), Apr 19, 2023:

Right. Correct. My mistake here.

@hsutter (Author, Owner), Apr 19, 2023:

NP! Thanks.

);

@@ -3226,7 +3226,7 @@ class cppfront

emit(*n.expr);

// emit == and != as infix a @ b operators (since we don't have
// emit == and != as infix a ? b operators (since we don't have
// any checking/instrumentation we want to do for those)
if (flag_safe_comparisons) {
switch (op.type()) {
@@ -3322,7 +3322,7 @@ class cppfront

lambda_body += lhs_name;

// emit == and != as infix a @ b operators (since we don't have
// emit == and != as infix a ? b operators (since we don't have
// any checking/instrumentation we want to do for those)
if (flag_safe_comparisons) {
switch (term.op->type()) {
@@ -4660,6 +4660,69 @@ class cppfront
)
-> void
{
// First, do some deferred sema checks - deferred to here because
// they may be satisfied by metafunction application

// If this is a nonvirtual function, it must have an initializer
if (
n.is_function()
&& !n.is_virtual_function()
&& !n.has_initializer()
)
{
errors.emplace_back(
n.position(),
"a nonvirtual function must have a body ('=' initializer)"
);
return;
}

{
auto this_index = n.index_of_parameter_named("this");
auto that_index = n.index_of_parameter_named("that");

if (this_index >= 0) {
if (!n.parent_is_type()) {
errors.emplace_back(
n.position(),
"'this' must be the first parameter of a type-scope function"
);
return;
}
if (this_index != 0) {
errors.emplace_back(
n.position(),
"'this' must be the first parameter"
);
return;
}
}

if (that_index >= 0) {
if (!n.parent_is_type()) {
errors.emplace_back(
n.position(),
"'that' must be the second parameter of a type-scope function"
);
return;
}
if (that_index != 1) {
errors.emplace_back(
n.position(),
"'that' must be the second parameter"
);
return;
}
if (this_index != 0) {
errors.emplace_back(
n.position(),
"'that' must come after an initial 'this' parameter"
);
return;
}
}
}

// In phase 0, only need to consider namespaces and types

if (
@@ -4700,8 +4763,8 @@ class cppfront

// If we're in a type scope, handle the access specifier
if (n.parent_is_type()) {
if (n.access) {
printer.print_cpp2(n.access->to_string(true) + ": ", n.access->position());
if (!n.is_default_access()) {
printer.print_cpp2(to_string(n.access) + ": ", n.position());
}
else {
printer.print_cpp2("public: ", n.position());
@@ -4859,9 +4922,9 @@ class cppfront
// is one, or default to private for data and public for functions
if (printer.get_phase() == printer.phase1_type_defs_func_decls)
{
if (n.access) {
if (!n.is_default_access()) {
assert (is_in_type);
printer.print_cpp2(n.access->to_string(true) + ": ", n.access->position());
printer.print_cpp2(to_string(n.access) + ": ", n.position());
}
else if (is_in_type) {
if (n.is_object()) {
@@ -5319,10 +5382,9 @@ class cppfront
{
assert(
!is_main
&& prefix.empty()
// prefix can be "virtual"
// suffix1 will be " &&" though we'll ignore that
&& suffix2.empty()
&& "ICE: a destructor shouldn't have been able to generate a prefix or suffix (or be main)"
// suffix2 can be "= 0"
);

// Print the ~-prefixed type name instead of the operator= function name
@@ -5331,10 +5393,13 @@ class cppfront
&& n.parent_declaration->name()
);
printer.print_cpp2(
type_qualification_if_any_for(n)
prefix
+ type_qualification_if_any_for(n)
+ "~" + n.parent_declaration->name()->to_string(true),
n.position() );
n.position()
);
emit( *func, n.name(), false, true);
printer.print_cpp2( suffix2, n.position() );
}

// Ordinary functions are easier, do all their declarations except
2 changes: 0 additions & 2 deletions source/io.h
@@ -846,14 +846,12 @@ class source
)
{
cpp1_found = true;
//lines.push_back({ &buf[0], source_line::category::preprocessor });
add_preprocessor_line();
while (
pre.has_continuation
&& in.getline(&buf[0], max_line_len)
)
{
//lines.push_back({ &buf[0], source_line::category::preprocessor });
add_preprocessor_line();
pre = is_preprocessor(buf, false);
}
20 changes: 15 additions & 5 deletions source/lex.h
@@ -86,6 +86,7 @@ enum class lexeme : std::int8_t {
Dot,
Ellipsis,
QuestionMark,
At,
Dollar,
FloatLiteral,
BinaryLiteral,
@@ -182,6 +183,7 @@ auto __as(lexeme l)
break;case lexeme::Dot: return "Dot";
break;case lexeme::Ellipsis: return "Ellipsis";
break;case lexeme::QuestionMark: return "QuestionMark";
break;case lexeme::At: return "At";
break;case lexeme::Dollar: return "Dollar";
break;case lexeme::FloatLiteral: return "FloatLiteral";
break;case lexeme::BinaryLiteral: return "BinaryLiteral";
@@ -515,20 +517,23 @@ auto expand_raw_string_literal(
//-----------------------------------------------------------------------
// lex: Tokenize a single line while maintaining inter-line state
//
// line the line to be tokenized
// mutable_line the line to be tokenized
// lineno the current line number
// in_comment are we currently in a comment
// current_comment the current partial comment
// current_comment_start the current comment's start position
// tokens the token list to add to
// comments the comment token list to add to
// errors the error message list to use for reporting problems
// raw_string_multiline the current optional raw_string state
//

// A stable place to store additional text for source tokens that are merged
// into a whitespace-containing token (to merge the Cpp1 multi-token keywords)
// -- this isn't about tokens generated later, that's tokens::generated_tokens
static auto generated_text = std::deque<std::string>{};
static auto generated_text = std::deque<std::string>{};
static auto generated_lines = std::deque<std::vector<source_line>>{};


static auto multiline_raw_strings = std::deque<multiline_raw_string>{};

@@ -677,7 +682,7 @@ auto lex_line(
auto i = std::ssize(tokens)-1;

// If the third-to-last token is "operator", we may need to
// merge an "operator@" name into a single identifier token
// merge an "operator?" name into a single identifier token

if (
i >= 2
@@ -1286,6 +1291,9 @@ auto lex_line(
break; case '?':
store(1, lexeme::QuestionMark);

break; case '@':
store(1, lexeme::At);

break;case '$':
if (peek1 == 'R' && peek2 == '"') {
// if peek(j-2) is 'R' it means that we deal with raw-string literal
@@ -1721,7 +1729,7 @@ class tokens
//-----------------------------------------------------------------------
// lex: Tokenize the Cpp2 lines
//
// lines tagged source lines
// lines tagged source lines
//
auto lex(
std::vector<source_line>& lines
@@ -1732,7 +1740,7 @@ class tokens
auto raw_string_multiline = std::optional<raw_string>();

assert (std::ssize(lines) > 0);
auto line = std::begin(lines)+1;
auto line = std::begin(lines);
while (line != std::end(lines)) {

// Skip over non-Cpp2 lines
@@ -1879,6 +1887,8 @@ class tokens

};

static auto generated_lexers = std::deque<tokens>{};

}

#endif

4 comments on commit d8c1a50

@msadeqhe

Thanks for this pretty big step, which opens the door to code generation in Cpp2. And thanks for your smart decision to introduce `@` for code generation. I also think the `@` symbol fits well for user-defined language constructs (if you plan to support them, or something similar, in the future):

    if: <T: type> if let (@opt => std::optional<T>) do (@run => : (: T)) = {
        if @opt.has_value() {
            value: T = @opt.value();
            @run(value);
        }
    }

    if let optional_variable do: (value) {
        ...
        value.call();
        ...
    }

Obviously `@identifier` is much easier to recognize as code generation than my earlier suggestion, which was to put parentheses around it: `(identifier)`.

@JohelEGP (Contributor)

> There is not yet a general Cpp2 interpreter that runs inside the cppfront compiler and lets users write meta functions like `interface` as external code outside the compiler.

This seems to group together two big steps: the interpreter and name lookup.

You can write an interpreter with name lookup fixed on the built-in metafunctions.

Actual name lookup would be needed for the latter part:

> and lets users write meta functions like `interface` as external code outside the compiler.

That would at least require also parsing the included .h2 headers.
Do you repeat this for every Cpp2 TU?
Or do you adopt a more involved solution like Cpp1 modules?

The former would result in simpler cppfront invocations, but increase compile times.
The latter would require more build system support, but decrease compile times.

@JohelEGP (Contributor)

> There is not yet a general Cpp2 interpreter that runs inside the cppfront compiler and lets users write meta functions like `interface` as external code outside the compiler.

Is such a Cpp2 interpreter possible for Cppfront?
Even the built-in metafunctions require a Cpp1 compiler with reflect.h and the C++ standard library.
I can't see how you could interpret those with a running cppfront program.

Alternatives are possible with build system support.
Here I will present my current design based on a plug-in system.
I have no hands-on experience with those,
so the actual shape is more speculative,
although the idea itself is sound.

IIUC, libraries can be loaded at program startup.
So before cppfront's main, metafunctions can be loaded.
That means replacing the fixed list of metafunctions with a global registry of metafunctions.

  1. When Cppfront compiles a TU, it can emit a library that just loads the declared metafunctions.
  2. When Cppfront compiles a TU, the libraries of dependencies are loaded at cppfront program startup.

You can't consume a metafunction on the same non-module TU or module that declares it.
Otherwise, you run into similar problems as when emitting concepts in Phase 1 "Cpp2 type declarations" (#578 (comment)).

This process should be fast with modules.
Ideally, the reflection API in reflect.h2 is refactored into a module
that doesn't depend on parse.h (which depends on all sources but sema.h and cppfront.cpp).

Here's an example.

  1. Define the module game_lib in Cpp2, which declares the metafunction entity.
  2. Define the module game_lib in CMake.
  3. cppfront lowers game_lib and emits the module/library cpp2.metafunctions_for.game_lib.
    cpp2.metafunctions_for.game_lib populates the global registry of metafunctions
    with the metafunctions declared in game_lib.
  4. A CMake module takes care of the plug-in system part so that things just work.
  5. Define the TU game, which imports game_lib, and uses @entity.
  6. The cppfront program loaded by the CMake module to compile game
    will have the library cpp2.metafunctions_for.game_lib loaded.
    This means that @entity will be available to cppfront as a metafunction.
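The "global registry of metafunctions" idea can be sketched in a few lines of C++. This is only an illustration under stated assumptions: `type_declaration`, `metafunction_registry`, `register_metafunction`, and `apply_metafunction` are hypothetical names, not cppfront's API, and the dynamic loading of the library (the plug-in part) is left out; the sketch only shows self-registration at startup and lookup by name.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for meta::type_declaration
struct type_declaration {
    std::string              name;
    std::vector<std::string> applied;   // names of metafunctions applied so far
};

using metafunction = std::function<void(type_declaration&)>;

// Global registry; the function-local static avoids initialization-order issues
auto metafunction_registry() -> std::map<std::string, metafunction>& {
    static std::map<std::string, metafunction> registry;
    return registry;
}

// A library registers its metafunctions by defining objects of this type at
// namespace scope; their constructors run when the library is initialized.
struct register_metafunction {
    register_metafunction(std::string name, metafunction f) {
        assert(f && "metafunction must be callable");
        metafunction_registry().emplace(std::move(name), std::move(f));
    }
};

// What a hypothetical cpp2.metafunctions_for.game_lib might contain
static register_metafunction register_entity{
    "entity",
    [](type_declaration& t) { t.applied.push_back("entity"); }
};

// Looks up @name and applies it; returns false if nothing is registered
auto apply_metafunction(std::string const& name, type_declaration& t) -> bool {
    auto& registry = metafunction_registry();
    auto it = registry.find(name);
    if (it == registry.end()) { return false; }
    it->second(t);
    return true;
}
```

With this shape, a compiler that sees `@entity` would call `apply_metafunction("entity", t)` and report an unknown metafunction when the call returns `false`, instead of consulting a fixed built-in list.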

@JohelEGP (Contributor)

Opened #797 with a proof of concept using Boost.DLL.
