Skip to content

Fix removing parentheses when using deducing assignement, closes #327 #331

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

filipsajdak
Copy link
Contributor

@filipsajdak filipsajdak commented Apr 8, 2023

The issue is caused by emit(expression_list_node) that skips parentheses when a node is inside the initializer - that serves cases like:

v : std::vector<int> = (1,2,3);

Where it generates:

std::vector<int> v{1,2,3};

When := is used in the following cases:

d := (1 + 2) * (3 + 4) * (5 + 6);

It removes the first parentheses, and we end up with the following:

auto d = {1 + 2 * (3 + 4) * ( 5 + 6)};

This change corrects this behavior on the parsing side. After parsing
the expression list it checks if the next lexeme is Semicolon. If it is it
means that we are on the initializer of the form:

d1 := ((2 + 1) * (4 - 1) * (8 - 3));
d3 := (move d2);
v : std::vector<int> = ();

And we can suppress printing of parentheses - as there will be braces:

auto d1 {(2 + 1) * (4 - 1) * (8 - 3)};
auto d3 {std::move(d2)};
std::vector<int> v {};

When the next lexeme is not Semicolon it means that we are in an initializer
of the form:

d2 := (2 + 1) * (4 - 1) * (8 - 3);
d4 : _ = (1 + 2) * (3 + 4) * (5 + 6);
d5 : int = (1 + 2) * (3 + 4) * (5 + 6);

And we need to keep all the parentheses, and it will be generated to:

auto d2 {(2 + 1) * (4 - 1) * (8 - 3)};
auto d4 {(1 + 2) * (3 + 4) * (5 + 6)};
int d5 {(1 + 2) * (3 + 4) * (5 + 6)};

Closes #327. All regression tests pass.

There is only one difference in generated code in pure2-ufcs-member-access-and-chaining.cpp2 test. The following line:

    res := (42).ufcs();

Generates after this change:

    auto res {CPP2_UFCS_0(ufcs, (42))}; // previously: auto res {CPP2_UFCS_0(ufcs, 42)}; 

This is correct behavior, as the parentheses are not required in this context so they should be left. I have checked the following code:

    r1 := 42.ufcs();
    r2 := (42).ufcs();

That will generate after this change:

    auto r1 {CPP2_UFCS_0(ufcs, 42)}; 
    auto r2 {CPP2_UFCS_0(ufcs, (42))}; 

@hsutter
Copy link
Owner

hsutter commented Apr 8, 2023

Thanks! I'm not sure this is the right fix yet. Two things:

d := (1 + 2) * (3 + 4) * (5 + 6);

I see that this change fixes that, but it doesn't fix

d : _ = (1 + 2) * (3 + 4) * (5 + 6);
d : int = (1 + 2) * (3 + 4) * (5 + 6);

which still have incorrect code gen with this fix. (Thanks for pointing this out!)

there are slight differences in regression tests

I see that those differences are that code gen like this today

auto z {std::move(x)}; 

with this fix becomes instead

auto z {(std::move(x))}; 

and we should be avoiding generating extra parens (they're often innocuous, but I'm concerned about ending up hitting the comma operator).

@filipsajdak
Copy link
Contributor Author

Will work on that.

The issue is caused by `emit(expression_list_node)` that skips parentheses
when node is inside initializer - that serves cases like:
```cpp
v : std::vector<int> = (1,2,3);
```
Where it generates:
```cpp
std::vector<int> v{1,2,3};
```

When `:=` is used in the following cases:
```cpp
d := (1 + 2) * (3 + 4) * (5 + 6);
```
It removes first parentheses and we end up with:
```cpp
auto d = {1 + 2 * (3 + 4) * ( 5 + 6)};
```

This change corrects this behaviour on the parsing side. After parsing
expression list it checks if the next lexeme is `Semicolon`. If it is it
means that we are on the initializer of the form:
```cpp
d1 := ((2 + 1) * (4 - 1) * (8 - 3));
d3 := (move d2);
v : std::vector<int> = ();
```
And we can suppres printing of parentheses - as there will be braces:
```cpp
auto d1 {(2 + 1) * (4 - 1) * (8 - 3)};
auto d3 {std::move(d2)};
std::vector<int> v {};
```

When next lexeme is not `Semicolon` it means that we are in initializer
of the form:
```cpp
d2 := (2 + 1) * (4 - 1) * (8 - 3);
d4 : _ = (1 + 2) * (3 + 4) * (5 + 6);
d5 : int = (1 + 2) * (3 + 4) * (5 + 6);
```
And we need to keep all the parentheses and it will be generates to:
```cpp
auto d2 {(2 + 1) * (4 - 1) * (8 - 3)};
auto d4 {(1 + 2) * (3 + 4) * (5 + 6)};
int d5 {(1 + 2) * (3 + 4) * (5 + 6)};
```
@filipsajdak filipsajdak force-pushed the fsajdak-fix-missing-parentheses-in-deducing-assignement branch from 86bfe85 to 4ab0643 Compare April 8, 2023 21:24
@filipsajdak
Copy link
Contributor Author

@hsutter I found the solution, and the current proposed solution solves all the cases.

There is only one difference in generated code in pure2-ufcs-member-access-and-chaining.cpp2 test. The following line:

    res := (42).ufcs();

Generates after this change:

    auto res {CPP2_UFCS_0(ufcs, (42))}; // previously: auto res {CPP2_UFCS_0(ufcs, 42)}; 

This is correct behavior, as the parentheses are not required in this context so they should be left. I have checked the following code:

    r1 := 42.ufcs();
    r2 := (42).ufcs();

That will generate after this change:

    auto r1 {CPP2_UFCS_0(ufcs, 42)}; 
    auto r2 {CPP2_UFCS_0(ufcs, (42))}; 

I have updated the description of this PR.

@filipsajdak
Copy link
Contributor Author

There is one issue left: shall we rename inside_initializer?

@hsutter
Copy link
Owner

hsutter commented Apr 8, 2023

Thanks! Trying it out now...

BTW in case I wasn't clear in my comment about hitting the comma operator, I meant that when there are multiple parameters, I want to avoid accidentally hit a case where a list of multiple things like {1, 2} accidentally becomes a list of a single thing {(1, 2)} because the comma turns into a comma operator and the 1 gets thrown away. (Have I mentioned Cpp2 has no comma operator? 😁 Or at least it's trying very hard not to have one, even accidentally.)

@hsutter
Copy link
Owner

hsutter commented Apr 8, 2023

OK, looks good -- including that now the one change in the code gen is actually a correction, that's always a good sign.

Thanks! Merging...

There is one issue left: shall we rename inside_initializer?

What would be a better name, inside_assignment?

@hsutter hsutter merged commit aa1411f into hsutter:main Apr 8, 2023
@filipsajdak
Copy link
Contributor Author

inside_initializer is fine but it only is used in cases when it is in the following form:

a : type = ( /* any complex expression */ );

So, it is more about the initializer being surrounded entirely by parenthesis.

@filipsajdak
Copy link
Contributor Author

@hsutter I have just realized that we have anonymous objects that we can create in place e.g. as a function argument. Need to check but probably my change broke the following code:

fun(:std::vector=(1,2,3));

Need to check it.

@filipsajdak
Copy link
Contributor Author

Sorry, this change brake:

fun(:std::vector=(1,2,3));

Generates:

fun(std::vector{(1, 2, 3)});

Looking for a fix... and we need to add a regression test for in-place creation of objects.

@filipsajdak
Copy link
Contributor Author

@hsutter OK, one way to fix is to replace (in parse.h line 2875):

            if (curr().type() != lexeme::Semicolon) {
                expr_list->inside_initializer = false;
            } 

with

            if (
                   curr().type() != lexeme::Semicolon
                && curr().type() != lexeme::RightParen 
                && curr().type() != lexeme::RightBracket 
                && curr().type() != lexeme::Comma
            ) {
                expr_list->inside_initializer = false;
            } 

That should cover cases:

    f1(:std::vector=(1,2,3));
    f2(:std::vector=(1,2,3),:std::vector=(4,5));
    ar[:index=(1,2)];
    ar2[:index=(1,2), :index=(2,1)];

that will generate:

    f1(std::vector{1, 2, 3});
    f2(std::vector{1, 2, 3}, std::vector{4, 5});
    cpp2::assert_in_bounds(ar, index{1, 2});
    cpp2::assert_in_bounds(ar2, index{1, 2}, index{2, 1});

Are there any other cases that we should cover in-placed created objects?

filipsajdak added a commit to filipsajdak/cppfront that referenced this pull request Apr 9, 2023
That should cover cases:
```cpp
    f1(:std::vector=(1,2,3));
    f2(:std::vector=(1,2,3),:std::vector=(4,5));
    ar[:index=(1,2)];
    ar2[:index=(1,2), :index=(2,1)];
```
that will generate:
```cpp
    f1(std::vector{1, 2, 3});
    f2(std::vector{1, 2, 3}, std::vector{4, 5});
    cpp2::assert_in_bounds(ar, index{1, 2});
    cpp2::assert_in_bounds(ar2, index{1, 2}, index{2, 1});
```
filipsajdak added a commit to filipsajdak/cppfront that referenced this pull request Apr 9, 2023
That should cover cases:
```cpp
    f1(:std::vector=(1,2,3));
    f2(:std::vector=(1,2,3),:std::vector=(4,5));
    ar[:index=(1,2)];
    ar2[:index=(1,2), :index=(2,1)];
```
that will generate:
```cpp
    f1(std::vector{1, 2, 3});
    f2(std::vector{1, 2, 3}, std::vector{4, 5});
    cpp2::assert_in_bounds(ar, index{1, 2});
    cpp2::assert_in_bounds(ar2, index{1, 2}, index{2, 1});
```
@filipsajdak filipsajdak deleted the fsajdak-fix-missing-parentheses-in-deducing-assignement branch April 10, 2023 15:07
filipsajdak added a commit to filipsajdak/cppfront that referenced this pull request Apr 10, 2023
That should cover cases:
```cpp
    f1(:std::vector=(1,2,3));
    f2(:std::vector=(1,2,3),:std::vector=(4,5));
    ar[:index=(1,2)];
    ar2[:index=(1,2), :index=(2,1)];
```
that will generate:
```cpp
    f1(std::vector{1, 2, 3});
    f2(std::vector{1, 2, 3}, std::vector{4, 5});
    cpp2::assert_in_bounds(ar, index{1, 2});
    cpp2::assert_in_bounds(ar2, index{1, 2}, index{2, 1});
```
hsutter pushed a commit that referenced this pull request Apr 14, 2023
That should cover cases:
```cpp
    f1(:std::vector=(1,2,3));
    f2(:std::vector=(1,2,3),:std::vector=(4,5));
    ar[:index=(1,2)];
    ar2[:index=(1,2), :index=(2,1)];
```
that will generate:
```cpp
    f1(std::vector{1, 2, 3});
    f2(std::vector{1, 2, 3}, std::vector{4, 5});
    cpp2::assert_in_bounds(ar, index{1, 2});
    cpp2::assert_in_bounds(ar2, index{1, 2}, index{2, 1});
```
zaucy pushed a commit to zaucy/cppfront that referenced this pull request Dec 5, 2023
The issue is caused by `emit(expression_list_node)` that skips parentheses
when node is inside initializer - that serves cases like:
```cpp
v : std::vector<int> = (1,2,3);
```
Where it generates:
```cpp
std::vector<int> v{1,2,3};
```

When `:=` is used in the following cases:
```cpp
d := (1 + 2) * (3 + 4) * (5 + 6);
```
It removes first parentheses and we end up with:
```cpp
auto d = {1 + 2 * (3 + 4) * ( 5 + 6)};
```

This change corrects this behaviour on the parsing side. After parsing
expression list it checks if the next lexeme is `Semicolon`. If it is it
means that we are on the initializer of the form:
```cpp
d1 := ((2 + 1) * (4 - 1) * (8 - 3));
d3 := (move d2);
v : std::vector<int> = ();
```
And we can suppres printing of parentheses - as there will be braces:
```cpp
auto d1 {(2 + 1) * (4 - 1) * (8 - 3)};
auto d3 {std::move(d2)};
std::vector<int> v {};
```

When next lexeme is not `Semicolon` it means that we are in initializer
of the form:
```cpp
d2 := (2 + 1) * (4 - 1) * (8 - 3);
d4 : _ = (1 + 2) * (3 + 4) * (5 + 6);
d5 : int = (1 + 2) * (3 + 4) * (5 + 6);
```
And we need to keep all the parentheses and it will be generates to:
```cpp
auto d2 {(2 + 1) * (4 - 1) * (8 - 3)};
auto d4 {(1 + 2) * (3 + 4) * (5 + 6)};
int d5 {(1 + 2) * (3 + 4) * (5 + 6)};
```
zaucy pushed a commit to zaucy/cppfront that referenced this pull request Dec 5, 2023
That should cover cases:
```cpp
    f1(:std::vector=(1,2,3));
    f2(:std::vector=(1,2,3),:std::vector=(4,5));
    ar[:index=(1,2)];
    ar2[:index=(1,2), :index=(2,1)];
```
that will generate:
```cpp
    f1(std::vector{1, 2, 3});
    f2(std::vector{1, 2, 3}, std::vector{4, 5});
    cpp2::assert_in_bounds(ar, index{1, 2});
    cpp2::assert_in_bounds(ar2, index{1, 2}, index{2, 1});
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] parentheses missing when emitting binary expression tree
2 participants