[SUGGESTION] User-defined Language Constructs #382

Closed
msadeqhe opened this issue Apr 18, 2023 · 9 comments

Comments

@msadeqhe

msadeqhe commented Apr 18, 2023

Preface

"Express language facilities as libraries" and "Eliminate need for preprocessor" are two parts of Cpp2's 2016 roadmap.

If we look at the for construct in Cpp2:

for items do: (item) = { ... }

a question may arise: "Is that a lambda after do?" The answer seems to be yes. What if the for construct could be implemented as a library feature? What about the if and while constructs?

A language construct can do something that normal functions and lambdas cannot: it does not evaluate an expression where it first appears, and it can evaluate that expression multiple times. For example:

while count < 10 next count++ {
    std::cout << count;
}

Here, count++ is not evaluated before the statement block { ... }; it is evaluated on each iteration of the loop. Consider what would happen if it were a function:

while_func(count < 10, count++, : () = {
    std::cout << count;
});

No, while_func doesn't work as you might expect, because count < 10 and count++ are each evaluated once, before the call. Lambdas come close to this feature today, but we have to pass the variables as arguments if we don't capture them:

while_func(count, : (count) = count < 10, : (inout count) = count++, : () = {
    std::cout << count;
});

while_func(: () = count$ < 10, : () = count$++, : () = {
    std::cout << count;
});

In the second one, the capture in count$++ doesn't work, because count$ is a copy of the count variable; the implementation of while_func has to be aware of that.
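
The difference between deferred evaluation and capture by copy can be sketched in today's C++ (Cpp1). This is only an illustration under assumptions: while_func, count_by_reference and count_by_value are hypothetical names, and the lambdas stand in for the expressions that a user-defined language construct would receive.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: every "expression" arrives as a callable, so
// while_func decides when and how often each one is evaluated.
template <typename Cond, typename Next, typename Body>
void while_func(Cond cond, Next next, Body body) {
    while (cond()) {
        body();
        next();
    }
}

// Capture by reference: the callables observe and modify the caller's count.
std::string count_by_reference() {
    std::string out;
    int count = 0;
    while_func([&] { return count < 3; },
               [&] { ++count; },
               [&] { out += std::to_string(count); });
    return out;
}

// Capture by value: the step lambda increments its own private copy,
// so the caller's variable never changes.
int count_by_value() {
    int count = 0;
    auto step = [count]() mutable { return ++count; };
    step();
    step();
    return count;
}
```

With reference captures the loop body sees 0, 1 and 2; with a value capture the caller's count stays at 0, which is the problem described above for count$++.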

Suggestion Detail

User-defined Language Constructs are a way to pass expressions without evaluating them where they first appear. They look different (e.g. while condition next count++ { ... }) but behave like functions (e.g. wwhile(condition, count++, : () { ... })), except that their parameters are expressions instead of variables and lambdas, and they use keywords (e.g. next) instead of parentheses and commas.

The syntax is not important. I introduce one syntax here (the one I have been thinking about), but it could be anything else:

name: <template-params> name opt-keys (expr-param) ... = { implementation }

Everything before = is the signature of a user-defined language construct:

  • name is the name of the user-defined language construct.
  • <template-params> is optional. It's the list of template parameters (e.g. <T: type, U: type>).
  • opt-keys is optional. It's a list of keywords separated by whitespace (e.g. do next).
  • (expr-param) is optional. It's the name and type of an expression parameter. I'll explain it in more detail.
  • ... is optional. It's a group of opt-keys (expr-param) which can be repeated.
  • implementation is optional. It's the actual code that implements the user-defined language construct.

I have to explain (expr-param) and how to declare an expression parameter. An expression parameter with name expr-name and type expr-type looks like this:

(expr-name => expr-type)

The type of an expression parameter can be a value, a type, an expression, a statement block or an init statement. The syntax is described below:

  • param => something: Here, param is an expression parameter which accepts only expressions with type something.
  • param => {} -> something: Here, param is an expression parameter which accepts only a statement block which returns a value of type something.
  • param => : (params) -> something: Here, param is an expression parameter which accepts only a function with parameters (params) and return type something.
  • and so on.

I don't want to go into detail, because the syntax of my suggestion is not important. Now I'll write a user-defined language construct to explain it (expression parameters are always between parentheses):

check: <T: type> check (condition => T) do (run_always => void) (run => {} -> void) = {
    if (condition) {
        (run);
    }
    (run_always);
}

Here are the descriptions:

  • check is the name of the user-defined language construct. Note that this name has to be repeated before and after :.
  • <T: type> is the template parameter. It is used to specify the type of the expression parameter condition.
  • (condition => T) is an expression with name condition and its type after evaluation must be T which is a template parameter.
  • do is a user-defined keyword.
  • (run_always => void) is an expression with name run_always and it doesn't return anything after evaluation.
  • (run => {} -> void) is a statement block which must not return anything.

We can use check in the following way:

check count < 0 do call_always() {
    call_it();
    call_another_one();
}

It generates the following code:

if count < 0 {
    call_it();
    call_another_one();
}
call_always();
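
For comparison, here is a rough sketch of how the same behavior could be emulated in today's C++ by passing each expression parameter as a callable. It is not the proposed feature, only an approximation: the check function template and the check_demo helper are hypothetical, and the log strings stand in for the calls to call_it, call_another_one and call_always.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical C++ stand-in for the proposed `check` construct.
// Each parameter is a callable, so nothing runs until check invokes it.
template <typename Cond, typename RunAlways, typename Run>
void check(Cond condition, RunAlways run_always, Run run) {
    if (condition()) {
        run();
    }
    run_always();
}

// Records which "expressions" were evaluated, and in what order.
std::vector<std::string> check_demo(int count) {
    std::vector<std::string> log;
    check([&] { return count < 0; },
          [&] { log.push_back("call_always"); },
          [&] {
              log.push_back("call_it");
              log.push_back("call_another_one");
          });
    return log;
}
```

The extra `[&] { ... }` wrappers around every argument are the verbosity that the suggested construct would remove.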

Now, let's define something similar to the for loop, which is currently a built-in language construct in Cpp2:

for: <T: type> for (list: std::container<T>) do (lambda: (a: T)) = {
    item: = begin(list);
    while item != end(list) next item++ {
        lambda(item*);
    }
}
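
As a point of reference, a library-level loop of this shape can already be written in C++ as an ordinary function template. The sketch below is an assumption-laden illustration: for_each_item and sum_items are hypothetical names (for is a keyword in C++), and the body is passed as a lambda rather than as a bare statement block.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical library version of the loop: walks the container with
// iterators and hands each element (not the iterator) to the body.
template <typename Container, typename Body>
void for_each_item(Container& list, Body body) {
    for (auto item = std::begin(list); item != std::end(list); ++item) {
        body(*item);
    }
}

// Usage: sum the elements through the library loop.
int sum_items(const std::vector<int>& items) {
    int total = 0;
    for_each_item(items, [&](int item) { total += item; });
    return total;
}
```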

Cpp2 keywords like do, else, next, if and so on can be used in user-defined language constructs. Fortunately, Cpp2 doesn't treat if/while/... constructs as expressions, so keywords can be used in User-defined Language Constructs without any ambiguity.

User-defined language constructs are like functions in that they can be overloaded, in their case by the keywords in their signature. For example, we can define two check language constructs:

check: check (condition => bool) (run => {} -> void) = { ... }
check: check (condition => bool) turn (something => sometype) (run => {} -> void) = { ... }

main: () = {
    x: = 1;
    y: sometype = 0;

    check x < 10 { ... }
    check x < 10 turn y { ... }
}
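
The closest analogue in today's C++ is probably tag dispatch, where an empty tag object plays the role of the keyword. The sketch below assumes that mapping; turn_t, turn and pick_overload are hypothetical names, and since the bodies of the two constructs above are elided, each overload here simply reports which one was selected.

```cpp
#include <cassert>
#include <string>

// Hypothetical tag standing in for the user-defined keyword `turn`.
struct turn_t {};
inline constexpr turn_t turn{};

// Corresponds to: check (condition) (run)
template <typename Cond, typename Run>
std::string check(Cond condition, Run run) {
    if (condition()) run();
    return "check";
}

// Corresponds to: check (condition) turn (something) (run)
template <typename Cond, typename T, typename Run>
std::string check(Cond condition, turn_t, T& something, Run run) {
    if (condition()) run(something);
    return "check turn";
}

// The presence of the tag argument selects the overload.
std::string pick_overload(bool with_turn) {
    int x = 1;
    int y = 0;
    if (with_turn) {
        return check([&] { return x < 10; }, turn, y, [&](int& v) { v = x; });
    }
    return check([&] { return x < 10; }, [&] {});
}
```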

Also they can be used in generic programming with template parameters and requires:

check: <T: type> requires std::is_integral_v<T>
    check (condition => bool) turn (something => T) (run => {} -> void)
    = { ... }
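
In today's C++ the same kind of constraint can be placed on a function template. The sketch below uses std::enable_if_t so that it compiles as C++17 (a C++20 requires clause would express the same thing); turn_t, always_true, ignore_value and accepts_turn are hypothetical helpers introduced only to show that a non-integral argument is rejected at compile time.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical tag standing in for the keyword `turn`.
struct turn_t {};
inline constexpr turn_t turn{};

// Only participates in overload resolution when T is an integral type.
template <typename T, typename Cond, typename Run,
          typename = std::enable_if_t<std::is_integral_v<T>>>
void check(Cond condition, turn_t, T something, Run run) {
    if (condition()) run(something);
}

// Small callables used to probe the constraint.
struct always_true { bool operator()() const { return true; } };
struct ignore_value { template <typename U> void operator()(U) const {} };

// Detection trait: true when check(..., turn, T, ...) is a valid call.
template <typename T, typename = void>
struct accepts_turn : std::false_type {};

template <typename T>
struct accepts_turn<T, std::void_t<decltype(check(
        always_true{}, turn, std::declval<T>(), ignore_value{}))>>
    : std::true_type {};
```

Here accepts_turn<int>::value is true and accepts_turn<double>::value is false, which mirrors what requires std::is_integral_v<T> is meant to do in the declaration above.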

This suggestion is not mature yet. I don't know if there is interest in user-defined language constructs... If you agree, I would improve the syntax and semantics and ...

Examples and Usages

Many language constructs can be user-defined in libraries:

for: <T: type> for (list: std::container<T>) do (lambda: (a: T)) = {
    item: = begin(list);
    while item != end(list) next item++ {
        lambda(item*);
    }
}

// => : _ is for variable declaration
for: for (initialize => : _) if (condition => bool) next (step => _) (run => {} -> void) = {
    (initialize);
    while (condition) {
        (run);
        (step);
    }
}

for: for every (duration => seconds) (run => {} -> void) = {
    ...
}

loop: loop (run => {} -> void) = {
    while true {
        (run);
    }
}

skip: skip (run => {} -> void) = {
    try {
        (run);
    }
    catch {
        // ignore
    }
}

// It's simply a user-defined language construct that gets nothing and does nothing
// It's like pass keyword in Python.
pass: pass;

// => : T is for variable declaration
using: <T: type> using (initialize => variable : T = value) (run => {} -> void) = {
    (variable): T;
    try {
        (variable) = (value);
        (run);
    }
    finally {
        (variable).close();
    }
}

if: if is not (condition => bool) (run => {} -> void) else (run_else => {} -> void) = {
    if ! (condition) {
        (run);
    }
    else {
        (run_else);
    }
}

These are examples of how we use the above user-defined language constructs:

for items do: (item) = {
    // statements...
}

for x: int = 0 if x < 10 next x++ {
    // statements...
}

// min is a user-defined literal which means minutes
// This will repeat the statements every minute
for every 1min {
    // statements...
}

loop {
    // statements...
}

// Ignore errors
skip {
    // statements...
}

pass;

// It's like the C# using construct
using x: some_type = init_value {
    // statements...
}

// Multi-keywords are allowed...
if is not condition {
    // statements...
}
else do {
    // statements...
}

In the last example, is and not are not the ordinary is and not; they are keywords specific to this user-defined language construct.
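
Two of these constructs, skip and using, can be approximated in today's C++ with function templates. The sketch below is an illustration under assumptions: using_resource is a hypothetical name (using is a C++ keyword), a small RAII guard replaces the finally block because C++ has none, and fake_file is a made-up type that only records when close() is called.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// skip: run the block and swallow any exception it throws.
template <typename Run>
void skip(Run run) {
    try {
        run();
    } catch (...) {
        // ignore
    }
}

// using_resource: run the block, then always call close() on the resource,
// even when the block exits by throwing.
template <typename T, typename Run>
void using_resource(T resource, Run run) {
    struct closer {
        T& r;
        ~closer() { r.close(); }
    } guard{resource};
    run(resource);
}

// Made-up resource type that logs its own close() call.
struct fake_file {
    std::string* log;
    void close() { *log += "closed;"; }
};

std::string using_demo() {
    std::string log;
    skip([&] {
        using_resource(fake_file{&log}, [&](fake_file&) {
            log += "body;";
            throw std::runtime_error("boom");
        });
    });
    return log;
}
```

The body runs, the exception unwinds through using_resource so close() still executes, and skip discards the error.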

Why do I suggest this change?

Because it's a general language feature. User-defined Language Constructs have the following benefits:

  1. This feature is not possible today with Cpp1 (without macros) or Cpp2. We want a way to pass an expression and evaluate it later. Lambdas are too verbose and awkward in comparison to user-defined language constructs, and captures are by value (see the Preface section for an explanation).
  2. They are integrated into the language:
    • They can be used in generic programming with template parameters, concepts and requires clause.
    • They are a better replacement for macros, because their expression parameters are typed and integrated.
    • They can overload existing and built-in language constructs such as if/while/... by different keywords, e.g. if let and if has are overloads of built-in if construct. Another example is that user-defined language constructs check in, check at and check from can be used in the same code without any problem.
    • They can be inside namespaces, e.g. my::for
  3. They allow some Cpp2 language constructs (e.g. for) to be implemented as library features.
  4. They help library writers to express the power of their library.
  5. They are more readable than using indirect solutions.
  6. They are not ambiguous with normal functions and lambdas:
    • Expression parameters sit between the keywords of a language construct, so they are not evaluated where they first appear. Statement blocks start with { ..., so variables don't have to be captured inside them, just as with if/while/....
    • Function parameters are inside parentheses, so they are evaluated at the call site. Lambdas start with : (args) = { ..., so variables have to be captured inside them.
  7. They add more power to Cpp2 in addition to reflections and expressing features like namespace, enum and ... as ordinary libraries.
  8. They make Cpp2 a pseudo-language for language designers, or a way to test new language construct features, or ...

Will your feature suggestion eliminate X% of security vulnerabilities of a given kind in current C++ code?

No.

Will your feature suggestion automate or eliminate X% of current C++ guidance literature?

Yes.

Describe alternatives you've considered.

At first I tried to make expression parameters look like captures (e.g. (expr-param$: expr-type) or (expr-param: expr-type)$), but the relation between expression parameters and captures wasn't strong enough. Also, I currently don't know whether @hsutter wants to support a language feature similar to user-defined language constructs, so I haven't thought much about alternatives for the declaration syntax.

@msadeqhe
Author

msadeqhe commented Apr 18, 2023

I have to mention that user-defined language constructs can be inside namespaces in addition to overloading by keywords:

my::for i: = 0 if i < 10 next i++ {
    std::cout << i;
}

my::for items do: (item) = {
    std::cout << item.value() << "\n";
}

@msadeqhe
Author

msadeqhe commented Apr 19, 2023

Here is an example similar to Swift's if let construct, which could be user-defined in Cpp2. To declare if let, it's possible to use is, similar to pattern matching with inspect:

if: <T: type> if let (initialization => value: T = opt is std::optional<T>) (run => {} -> void) = {
    if (opt).has_value() {
        (value): T = (opt).value();
        (run);
    }
}

with this usage:

if let value: int = optional_var {
    ...
    value.call();
    ...
}

Alternatively it can be declared in this way:

if: <T: type> if let (opt => std::optional<T>) do (run => : (v: T)) = {
    if (opt).has_value() {
        value: T = (opt).value();
        (run)(value);
    }
}

with this alternative usage:

if let optional_var do: (value) = {
    ...
    value.call();
    ...
}
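
The second declaration maps fairly directly onto a C++ function template over std::optional. The sketch below is only an approximation of the idea: if_let and describe are hypothetical names, and the body is an ordinary lambda that receives the unwrapped value.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical library version of `if let`: the body runs only when the
// optional holds a value, and it receives that value directly.
template <typename T, typename Run>
void if_let(const std::optional<T>& opt, Run run) {
    if (opt.has_value()) {
        run(*opt);
    }
}

// Usage: the lambda is skipped entirely for an empty optional.
std::string describe(const std::optional<int>& maybe) {
    std::string out = "none";
    if_let(maybe, [&](int value) { out = "value " + std::to_string(value); });
    return out;
}
```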

The declaration syntax of user-defined language constructs is not a part of my suggestion, it can be anything else. If this feature is acceptable in Cpp2, I'll suggest a declaration syntax in addition to its usage in a new issue.

msadeqhe referenced this issue Apr 19, 2023
… type function

This commit includes "just enough" to make this first meta function work, which can be used like this...

```
Human: @interface type = {
    speak: (this);
}
```

... where the implementation of `interface` is just about line-for-line from my paper P0707, and now (just barely!) compiles and runs in cppfront (and I did test the `.require` failure cases and it's quite lovely to see them merge with the compiler's own built-in diagnostics):

```
//-----------------------------------------------------------------------
//  interface: an abstract base class having only pure virtual functions
auto interface( meta::type_declaration&  t ) -> void {
    bool has_dtor = false;
    for (auto m : t.get_members()) {
        m.require( !m.is_object(),
                   "interfaces may not contain data objects");
        if (m.is_function()) {
            auto mf = m.as_function();
            mf.require( !mf.is_copy_or_move(),
                        "interfaces may not copy or move; consider a virtual clone() instead");
            mf.require( !mf.has_initializer(),
                        "interface functions must not have a function body; remove the '=' initializer");
            mf.require( mf.make_public(),
                        "interface functions must be public");
            mf.make_function_virtual();
            has_dtor |= mf.is_destructor();
        }
    }
    if (!has_dtor) {
        t.require( t.add_member( "operator=: (virtual move this) = { }"),
                   "could not add pure virtual destructor");
    }
}
```

That's the only example that works so far.

To make this example work, so far I've added:

- The beginnings of a reflection API.

- The beginnings of generation from source code: The above `t.add_member` call now takes the source code fragment string, lexes it,  parses it, and adds it to the `meta::type_declaration` object `t`.

- The first compile-time meta function that participates in interpreting the meaning of a type definition immediately after the type grammar is initially parsed (we'll never modify a type after it's defined, that would be ODR-bad).

I have NOT yet added the following, and won't get to them in the short term (thanks in advance for understanding):

- There is not yet a general reflection operator/expression.

- There is not yet a general Cpp2 interpreter that runs inside the cppfront compiler and lets users write meta functions like `interface` as external code outside the compiler. For now I've added `interface`, and I plan to add a few more from P0707, as meta functions provided within the compiler. But with this commit, `interface` is legitimately doing everything except being run through an interpreter -- it's using the `meta::` API and exercising it so I can learn how that API should expand and become richer, it's spinning up a new lexer and parser to handle code generation to add a member, it's stitching the generated result into the parse tree as if it had been written by the user explicitly... it's doing everything I envisioned for it in P0707 except for being run through an interpreter.

This commit is just one step. That said, it is a pretty big step, and I'm quite pleased to finally have reached this point.

---

This example is now part of the updated `pure2-types-inheritance.cpp2` test case:

    // Before this commit it was this
    Human: type = {
        speak: (virtual this);
    }

    //  Now it's this... and this fixed a subtle bug (can you spot it?)
    Human: @interface type = {
        speak: (this);
    }

That's a small change, but it actually also silently fixed a bug that I had written in the original code but hadn't noticed: Before this commit, the `Human` interface did not have a virtual destructor (oops). But now it does, because part of `interface`'s implementation is to generate a virtual destructor if the user didn't write one, and so by letting the user (today, that was me) express their intent, we get to do more on their behalf. I didn't even notice the omission until I saw the diff for the test case's generated `.cpp` had added a `virtual ~Human()`... sweet.

Granted, if `Human` were a class I was writing for real use, I would have later discovered that I forgot to write a virtual destructor when I did more testing or tried to do a polymorphic destruction, or maybe a lint/checker tool might have told me. But by declaratively expressing my intent, I got to not only catch the problem earlier, but even prevent it.
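
To make the bug concrete, here is a hand-written C++ approximation of what an interface like `Human` needs. It is not actual cppfront output; `Student` and `destroy_through_base` are hypothetical names added to show polymorphic destruction through a base pointer.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hand-written approximation of the interface, not generated code.
class Human {
public:
    virtual void speak() = 0;
    virtual ~Human() = default;  // the member that was originally missing
};

// Hypothetical implementer that logs its own destruction.
class Student : public Human {
public:
    explicit Student(std::string* log) : log_(log) {}
    void speak() override { *log_ += "hi;"; }
    ~Student() override { *log_ += "~Student;"; }

private:
    std::string* log_;
};

// Destroys a Student through a Human pointer and reports what ran.
std::string destroy_through_base() {
    std::string log;
    {
        std::unique_ptr<Human> h = std::make_unique<Student>(&log);
        h->speak();
    }
    return log;
}
```

Because `~Human()` is virtual, destroying through the base pointer runs `~Student()`. Without the virtual destructor, that deletion would be undefined behavior.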

I think it's a promising data point that my own first attempt to use a metaclass in such a simple way already fixed a latent simple bug in my own code that I hadn't noticed. Cool beans.

---

Re syntax: I considered several options to request a meta function `m` be applied to the type being defined, including variations of `is(m)` and `as(m)` and `type(m)` and `$m`. I'm going with `@m` for now, and not because of Python envy... there are two main reasons:

- I think "generation of new code is happening here" is such a fundamental and important new concept that it should be very visible, and actually warrants taking a precious new symbol. The idea of "generation" is likely to be more widely used, so being able to have a symbol reserved for that meaning everywhere is useful. The list of unused symbols is quite short (Cpp2 already took `$` for capture), and the `@` swirl maybe even visually connotes generation (like the swirl in a stirred pot -- we're stirring/cooking something up here -- or maybe it's just me).

- I want the syntax to not close the door on applying meta functions to declarations other than types. So putting the decoration up front right after `:` is important, because putting it at the end of the type would likely be much harder to read for variables and especially functions.
@hsutter
Owner

hsutter commented Apr 20, 2023

Thanks.

Good timing: You get "some" of that with meta functions, the first commit of which I just pushed yesterday. It's just a start, more to follow in the future.

But I'm actually actively avoiding going further and creating effectively a mutable language. I cover this design decision in more detail in paper P0707 (section 5.2.1), with some discussion about the kinds of problems I want to avoid.

So I won't be pursuing this, but thanks for understanding!

@hsutter hsutter closed this as completed Apr 20, 2023
@msadeqhe
Author

Thanks ☺️, I believe in your right decisions.

@msadeqhe
Author

@hsutter, control structures (aka language constructs) have one feature in particular: they can evaluate an expression multiple times or postpone its evaluation (like count++ after the next keyword in the while control structure). This is similar to macros, or to capture by reference, but without the extra : () = { ... } and without the additional symbols &$*. Do you have a plan to support a feature like this in Cpp2?

@msadeqhe
Author

msadeqhe commented Apr 21, 2023

But I'm actually actively avoiding going further and creating effectively a mutable language. I cover this design decision in more detail in paper P0707 (section 5.2.1), with some discussion about the kinds of problems I want to avoid.

If I prepare a suggestion for the feature I described in my previous comment, without making Cpp2 a mutable language and while avoiding the kinds of problems described in paper P0707, would you review and perhaps accept it? Or should that feature not be available in Cpp2 at all?

@JohelEGP
Contributor

That sounds like something that might be possible someday, given reflection and P2806 (see https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2806r1.html#what-about-reflection, and http://github.com/cplusplus/papers/issues/1462#issuecomment-1426546113).

@hsutter
Owner

hsutter commented Apr 21, 2023

I do suspect a lot of these kinds of things will be satisfied with reflection. See also the conversation on #386.

So I'll suggest the same thing that the SG7 (ISO C++ compile-time-programming) subgroup urged for narrow-feature proposals in the area of generative programming: Wait until we have reflection, and in the meantime try to show how the effect can be achieved using a general reflection feature instead of by adding a narrower/special-purpose language feature (because showing how reflection might do it also generates use cases for reflection which can feed into the reflection design).

@msadeqhe
Author

Thanks. I've created the new suggestion in this issue.
