
Add cpp2 raw string literals support with interpolation #251


Merged


filipsajdak
Contributor

@filipsajdak filipsajdak commented Feb 8, 2023

The current implementation of cppfront does not support raw string literals on the cpp2 side. The raw strings are supported only on the cpp1 side. This change introduces:

  • support for cpp2 raw string literals,
  • support for all string literal prefixes (L, u8, u, U),
  • string interpolation for cpp2 raw string literals (enabled by adding $ before the raw string literal).

That makes the following code:

main: () -> int = {
    i := 42;
    m : std::map<std::string, int> = ();
    m["one"] = 1;
    m["two"] = 2;

    str : std::string = "this is a string";

    raw_str : std::string = R"string(raw string without interpolation)string";

    raw_str_multi : std::string = R"test(this is raw string literal

that can last for multiple

lines)test";

    raw_str_inter : std::string = $R"test(this is raw string literal
that can last for multiple
lines
(i)$ R"(this can be added too)"
calculations like m["one"] + m["two"] = (m["one"] + m["two"])$ also works
("at the beginning of the line")$!!!)test";

    raw_str_inter_multi : std::string = $R"(

    )" + $R"((i)$)" + $R"((i)$)";

    std::cout << str << std::endl;
    std::cout << raw_str << std::endl;
    std::cout << raw_str_multi << std::endl;
    std::cout << raw_str_inter << std::endl;
    std::cout << raw_str_inter_multi << std::endl;
    std::cout << ($R"((m["one"])$.)" + $R"((m["two"])$.)" + $R"((m["three"])$.)" + $R"((i)$)") << std::endl;
}

Generates the following cpp1 code (skipping boilerplate):

[[nodiscard]] auto main() -> int{
    auto i {42}; 
    std::map<std::string,int> m {}; 
    cpp2::assert_in_bounds(m, "one") = 1;
    cpp2::assert_in_bounds(m, "two") = 2;

    std::string str {"this is a string"}; 

    std::string raw_str {R"string(raw string without interpolation)string"}; 

    std::string raw_str_multi {R"test(this is raw string literal

that can last for multiple

lines)test"}; 

    std::string raw_str_inter {R"test(this is raw string literal
that can last for multiple
lines
)test" + cpp2::to_string(i) + R"test( R"(this can be added too)"
calculations like m["one"] + m["two"] = )test" + cpp2::to_string(cpp2::assert_in_bounds(m, "one") + cpp2::assert_in_bounds(m, "two")) + R"test( also works
)test" + cpp2::to_string("at the beginning of the line") +  R"test(!!!)test"}; 

    std::string raw_str_inter_multi {R"(

    )" + cpp2::to_string(i) + cpp2::to_string(i)}; 

    std::cout << std::move(str) << std::endl;
    std::cout << std::move(raw_str) << std::endl;
    std::cout << std::move(raw_str_multi) << std::endl;
    std::cout << std::move(raw_str_inter) << std::endl;
    std::cout << std::move(raw_str_inter_multi) << std::endl;
    std::cout << (cpp2::to_string(cpp2::assert_in_bounds(m, "one")) + R"(.)" + cpp2::to_string(cpp2::assert_in_bounds(m, "two")) + R"(.)" + cpp2::to_string(cpp2::assert_in_bounds(std::move(m), "three")) + R"(.)" + cpp2::to_string(std::move(i))) << std::endl;
}

All regression tests pass.

Limitations

cppfront accepts all prefixes for string literals (both regular and raw string literals). Unfortunately, string interpolation does not support wide characters: the generated code is correct, but the cpp2::to_string() functions return std::string, which cannot be combined with strings that use other character types.
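To illustrate the limitation above: `cpp2::to_string()` yields `std::string`, and a `std::string` cannot be concatenated with a `std::wstring`. One possible direction, sketched here only as an assumption, is a hypothetical helper `to_basic_string<CharT>` (not part of cppfront) that formats a value into the character type of the surrounding literal. It relies on `std::basic_ostringstream`, which the standard library supports for `char` and `wchar_t` only, so `u8`/`u`/`U` literals would still need another approach.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of cppfront): format a value into a string whose
// character type matches the literal it will be concatenated with.
// std::basic_ostringstream works out of the box only for char and wchar_t.
template <typename CharT, typename T>
auto to_basic_string(T const& value) -> std::basic_string<CharT>
{
    std::basic_ostringstream<CharT> out;
    out << value;
    return out.str();
}

// What generated code for a wide raw string with one interpolated `i`
// could look like if such a helper replaced cpp2::to_string().
auto wide_interpolation_example(int i) -> std::wstring
{
    return LR"(value: )" + to_basic_string<wchar_t>(i);
}
```

The design choice here is to make the conversion generic over the character type instead of adding one overload per string type.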

@filipsajdak filipsajdak marked this pull request as draft February 15, 2023 01:43
@filipsajdak
Contributor Author

I found some bugs in this patch. Working on a fix.

@filipsajdak filipsajdak force-pushed the fsajdak-add-cpp2-raw-string-literals branch 4 times, most recently from 4f3cb9f to d48545d February 17, 2023 01:26
@filipsajdak filipsajdak marked this pull request as ready for review February 17, 2023 01:27
@filipsajdak
Contributor Author

@hsutter, the code is fixed. All tests passed. I have added a test covering edge cases I found along the way.

I decided to add a raw_strings static variable, similar to generated_text. I considered reusing generated_text, but I also need the start and end source_positions. When I recognize a raw string literal, I add a new raw_string to the raw_strings container and use it in the subsequent steps, which ensures that the start and end variables are set to the proper values.

@neumannt

I don't think interpolation in raw string literals makes sense. After all, you use raw string literals to avoid quoting. If you start interpreting the content of a raw string that defeats the purpose of raw strings. cpp2 should support raw strings, of course, but it should not interpret the content but pass it along as is.

@filipsajdak
Contributor Author

@neumannt you are right. This is misleading at a minimum. I like being able to write an interpolated string without escaping the quotation marks. But a raw string should be a raw string 😄

Point taken. Maybe I will propose a way to enable interpolation.

@hsutter
Owner

hsutter commented Feb 19, 2023

Just spitballing here, I want to think about this more:

I can see value in "fully raw" string literals, and "raw plus interpolation" string literals.

One option could be to have the usual R"..." syntax for raw strings, and to enable interpolation put a $ immediately before the " without whitespace (e.g., R$"...").

Of course, doing that would resurrect previous requests that normal "..." strings not be interpolated either, and for consistency now also be prefixed with $"..." to enable interpolation. I've resisted that prefix distinction, but I do realize that would follow the existing practice in other languages, which seems to be fine and usable, to enable interpolation in string literals via a prefix in C# ($"...") and Python (f'...').

Requiring the $ up front to enable interpolation also has the benefit of declaring intent up front. That can be good for readability, so the code readers (who outnumber the code author) don't have to read through a long string to see if there are any interpolations. And it can be good for diagnostics, because we could emit a warning if a $" string doesn't contain an interpolation ("did you forget...?", probably there would be no false positives but some users might be annoyed at "having to add the $ to shut up the compiler") and possibly also if a non-$" string contains what would be an interpolation ("did you mean...?", but this one could have false positives and I would expect be more likely to annoy users).

I know we've discussed the above prefix-to-enable-interpolation idea before in another issue, and I resisted it. But now that we have the additional use case of raw string literals, where there also seems to be a reason to let users distinguish string literals that allow interpolation and ones that don't, it's an additional data point that might point toward "string literals 'want to' be prefixed for interpolation"...
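The two diagnostics described above could start from a very simple check. The following is a hypothetical sketch (`interpolation_diagnostic` is an illustrative name, not cppfront code) that assumes the lexer has already stripped the quotes and knows whether a `$` prefix was present. It only looks for the `)$` marker, so, as noted above, the second warning can fire on text that merely happens to contain `)$`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical lint (not cppfront code).
// body: the literal's content without quotes; has_dollar_prefix: written as $"..."?
auto interpolation_diagnostic(std::string_view body, bool has_dollar_prefix)
    -> std::optional<std::string>
{
    // Crude heuristic: every interpolation ends with ")$".
    bool const looks_interpolated = body.find(")$") != std::string_view::npos;

    if (has_dollar_prefix && !looks_interpolated) {
        return "warning: $-prefixed string contains no interpolation; did you forget one?";
    }
    if (!has_dollar_prefix && looks_interpolated) {
        // Possible false positive: the text may legitimately contain ")$".
        return "warning: string contains what looks like an interpolation; did you mean $\"...\"?";
    }
    return std::nullopt; // prefix and content agree
}
```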

@hsutter
Owner

hsutter commented Feb 19, 2023

Editing to answer my own question:

Also: Do we really need the entire 2x2 matrix of { raw, non-raw } x { with-interpolations, no-interpolations } ? In particular, is there a need for { non-raw, with-interpolations } ? If not, then $"..." could imply raw as well as allow interpolations, if those two things should naturally go together. I don't have enough experience yet with using the feature to have a good feel for the answer.

Yes, we want the 2x2 matrix: (a) because we want to distinguish raw with/without interpolations even if it's just to customize the prefix + the usage examples already shown in previous comments, and (b) eliminating one option wouldn't actually reduce concept count (we would still need to teach two concepts), it would just reduce expressiveness (because the concepts wouldn't be orthogonal), which is usually bad -- we prefer general orthogonal combinable concepts.

So perhaps it might be most natural to have the R prefix for raw (including customizing the string introducer) and the $ prefix for with-interpolations, which can be used orthogonally in any combination, and leave the default "..." as non-raw non-interpolated.
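Treating `R` and `$` as independent prefixes means the lexer can decode each one separately, which yields all four combinations of the 2x2 matrix. A hypothetical sketch follows: the names `literal_kind`, `consume`, and `classify_literal_prefix` are illustrative, and the prefix order (`$`, then an encoding prefix, then `R`) is an assumption of this sketch rather than a settled cpp2 rule.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical classification (not cppfront's lexer): two independent flags.
struct literal_kind {
    bool interpolated = false;  // '$' prefix
    bool raw          = false;  // 'R' prefix
    std::string_view encoding;  // "", "u8", "u", "U" or "L"
};

// Removes `prefix` from the front of `tok` if present.
auto consume(std::string_view& tok, std::string_view prefix) -> bool
{
    if (tok.substr(0, prefix.size()) != prefix) { return false; }
    tok.remove_prefix(prefix.size());
    return true;
}

// Assumed order for this sketch: '$'? encoding-prefix? 'R'? '"'
auto classify_literal_prefix(std::string_view tok) -> std::optional<literal_kind>
{
    constexpr std::string_view encodings[] = {"u8", "u", "U", "L"}; // "u8" before "u"
    literal_kind kind;
    kind.interpolated = consume(tok, "$");
    for (auto enc : encodings) {
        if (consume(tok, enc)) { kind.encoding = enc; break; }
    }
    kind.raw = consume(tok, "R");
    if (!consume(tok, "\"")) { return std::nullopt; } // not a string literal
    return kind;
}
```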

@filipsajdak
Contributor Author

filipsajdak commented Feb 19, 2023

I have worked on this a bit more and have prepared an implementation that uses $ as the first character of the d-char-sequence:

prefix(optional) R"d-char-sequence(optional) (r-char-sequence(optional))d-char-sequence(optional)"

That means that

i := 42;
rs1 := R"(raw string: (i)$)";         // this will not interpolate, pure raw string
rs2 := R"seq(raw string: (i)$)seq";   // this will not interpolate, pure raw string

rs3 := R"$(raw string: (i)$)$";       // this will interpolate
rs4 := R"$seq(raw string: (i)$)$seq"; // this will interpolate

Currently, $ is not allowed in a d-char-sequence, but it probably will be in C++26 thanks to https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2558r0.html (in my current code, when I detect it, I replace it with S to make it work with current compilers).

This is my current state of work.
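The intermediate rule above (a `$` at the start of the d-char-sequence enables interpolation) could be decoded roughly as follows. This is a hypothetical sketch, not the code from this PR: `split_raw_literal` expects text of the form `R"delim(body)delim"` with any encoding prefix already removed, reports whether the delimiter begins with `$`, and rewrites `$` to `S` in the delimiter, mirroring the workaround mentioned above.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

struct raw_literal_parts {
    bool             interpolate = false; // d-char-sequence started with '$'
    std::string      delimiter;           // d-char-sequence with '$' rewritten to 'S'
    std::string_view body;                // r-char-sequence (points into the input text)
};

// Hypothetical helper for the intermediate design discussed in this comment.
auto split_raw_literal(std::string_view text) -> std::optional<raw_literal_parts>
{
    if (text.substr(0, 2) != "R\"") { return std::nullopt; }
    auto const open = text.find('(', 2); // a d-char-sequence cannot contain '('
    if (open == std::string_view::npos) { return std::nullopt; }

    auto const delim = text.substr(2, open - 2);
    auto const closing = ")" + std::string(delim) + "\""; // the literal must end with )delim"
    if (text.size() < open + 1 + closing.size() ||
        text.substr(text.size() - closing.size()) != std::string_view(closing)) {
        return std::nullopt;
    }

    raw_literal_parts parts;
    parts.interpolate = !delim.empty() && delim.front() == '$';
    parts.delimiter = std::string(delim);
    for (auto& c : parts.delimiter) {
        if (c == '$') { c = 'S'; } // current compilers reject '$' in a d-char-sequence
    }
    parts.body = text.substr(open + 1, text.size() - closing.size() - open - 1);
    return parts;
}
```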

@hsutter, if I understand you correctly, you propose adding a $ prefix to all strings that need to be interpolated (whether they are regular string literals or raw string literals), right?

So, the above example can be rewritten to:

i := 42;

s1  := "string: (i)$";               // this will not interpolate, pure string
rs1 := R"(raw string: (i)$)";        // this will not interpolate, pure raw string
rs2 := R"seq(raw string: (i)$)seq";  // this will not interpolate, pure raw string

s2  := $"string: (i)$";              // this will interpolate
rs3 := $R"(raw string: (i)$)";       // this will interpolate
rs4 := $R"seq(raw string: (i)$)seq"; // this will interpolate

Did I get it right? If yes, I can rework my code to adjust it to this logic.

I like that there will be one rule instead of two.

@hsutter
Owner

hsutter commented Feb 20, 2023

Yes, thanks. I like that you changed the order, you're right that $ before R is clearer.

@filipsajdak
Contributor Author

Ok, I will adjust the code.

@switch-blade-stuff

Resurrecting an old topic, but since the interpolation-enabled literals in this proposal are different from the non-interpolating ones, why not consider the $"{}" syntax? This could also allow (fmt/std::format-compatible) formatting support to be added, something like $"Some int: {some_int} Some int (hex): {some_int:#x}". Fewer characters to type, looks cleaner, etc.

This also makes it more consistent with other languages, Python especially, since this formatting syntax originally comes from Python, AFAIK.

Having to use ()$ after already specifying $" seems like too much dollars :)

@switch-blade-stuff

switch-blade-stuff commented Feb 20, 2023

As for formatting, it could be enabled only if std::format is available (otherwise, only {id} would work).
For example, interpolation codegen could work like this:

  • Is std::format available (and literal is char or wchar_t)?
    • Pass directly to std::format
  • Does the capture have a format specifier?
    • Fail with a "format unsupported" error.
  • Use string concat.

Considering that Cpp2 does not have to deal with backwards-compatibility and that the required compiler versions are currently the most "bleeding edge" ones, it seems like a reasonable thing to do.

Would also neatly integrate a C++20 feature and reduce potential fragmentation, otherwise you'll have another competing choice for string formatting: Cpp2 interpolation, std::format, iostreams and old school printf.
If Cpp2 interpolation uses the same mechanism as std::format, it can be reduced to just 3 (technically 2, since printf is discouraged).
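The decision list above could be modeled as a small code generator. This is a hypothetical sketch only: `piece` and `generate_interpolation` are illustrative names, the generator emits C++1 source text rather than calling `std::format` itself, and the concatenation fallback imitates the `cpp2::to_string(...)` chains shown in the generated code at the top of this PR. Escaping of quotes and braces in the literal text is omitted.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of an interpolated literal: alternating text and captures.
struct piece {
    bool        is_capture = false;
    std::string text; // literal text, or the captured expression
    std::string spec; // format specifier such as "#x" (captures only, may be empty)
};

// Returns generated C++1 code, or a message starting with "error:".
auto generate_interpolation(std::vector<piece> const& pieces,
                            bool format_available,
                            bool char_or_wchar_literal) -> std::string
{
    if (format_available && char_or_wchar_literal) {
        // Step 1: one format string, captures passed as arguments.
        std::string fmt, args;
        for (auto const& p : pieces) {
            if (p.is_capture) {
                fmt  += p.spec.empty() ? "{}" : "{:" + p.spec + "}";
                args += ", " + p.text;
            } else {
                fmt += p.text;
            }
        }
        return "std::format(\"" + fmt + "\"" + args + ")";
    }
    // Step 2: a format specifier cannot be honored without std::format.
    for (auto const& p : pieces) {
        if (p.is_capture && !p.spec.empty()) {
            return "error: format specifier '" + p.spec + "' unsupported without std::format";
        }
    }
    // Step 3: plain concatenation.
    std::string code;
    for (auto const& p : pieces) {
        if (!code.empty()) { code += " + "; }
        code += p.is_capture ? "cpp2::to_string(" + p.text + ")"
                             : "std::string(\"" + p.text + "\")";
    }
    return code;
}
```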

@hsutter
Owner

hsutter commented Feb 20, 2023

Understood, but please remember the considerations in Design note: Capture... having a single way to spell a thing consistently is important to me, and I'm currently exploring the path of making interpolation be the same syntax and meaning as all other captures, which is why it looks like a language operator (it is, it's writing an expression inside the string).

@switch-blade-stuff

switch-blade-stuff commented Feb 20, 2023

Understood, but please remember the considerations in Design note: Capture... having a single way to spell a thing consistently is important to me, and I'm currently exploring the path of making interpolation be the same syntax and meaning as all other captures, which is why it looks like a language operator (it is, it's writing an expression inside the string).

The thing is, if interpolated strings require a prefix, this already breaks the lambda parallel. And IMHO, it would be better to have 100% consistency with how other strings and formatting works (both in C++20 and other languages), rather than partial consistency with lambdas that have nothing to do with strings (since lambdas capture state, strings just print it).

@switch-blade-stuff

Also, to me, creating competing standards is the opposite of Cpp2's goal of cleaning up and simplifying the language.

@hsutter
Owner

hsutter commented Feb 20, 2023

@filipsajdak For now please do the $ prefix only for raw strings ($R" or R$" perhaps), to designate a raw string with interpolation. I'm not yet comfortable with requiring $" to start every ordinary string literal that uses interpolation, and don't want to cross that bridge yet. Thanks again for the PR.

@jarzec
Copy link
Contributor

jarzec commented Feb 22, 2023

I have a very naïve question concerning string interpolation. This is a new concept from a C++ point of view, and one that can (at least to some extent) be replaced by the existing std::format. If I am not mistaken, there was an idea for cppfront to avoid adding things that can be achieved through a library. To paraphrase Sugarman: std::format is the answer that makes the questions disappear 😉: i.e., one thing less to teach, no extra prefixes/postfixes, no extra escape characters (beyond what already exists), no trouble interoperating with, e.g., regexps, less trouble with parsing/transpilation, ...
Not to mention that libraries are easier to fix/extend than the core syntax.

On the other hand you might tell me that string interpolation in cppfront can work as a proof of concept of a possible proposal for ISO C++.

@JohelEGP
Copy link
Contributor

@gregmarr
Copy link
Contributor

If I am not mistaken there was an idea for cppfront to avoid adding things that can be achieved through a library

You can't do the same thing with a library. std::format gives you this, where you still need to make sure that your replacement fields in the string line up with your arguments.

  auto mystring = std::format("Good morning {}, the time is {}, and the weather today is {}.", name, time, weather);

String interpolation gives you this:

    auto mystring = "Good morning (name)$, the time is (time)$, and the weather today is (weather)$.";

Now, with language support, you can convert the second into the first. Maybe we can even standardize localization hooks at some point, so that the compiler converts it into something like this:

    auto mystring = std::format(std::localize("Good morning {0}, the time is {1}, and the weather today is {2}."), name, time, weather);
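The conversion described above, from an interpolated string to a format string plus an argument list, could look roughly like this. It is a hypothetical sketch (`lower_to_format` is not a cppfront or standard function): it finds each `)$`, scans backwards for the matching `(` so that nested parentheses inside the captured expression work, and emits indexed placeholders `{0}`, `{1}`, ... as in the example above. Escaping of literal `{`/`}` and any localization hook are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

struct format_call {
    std::string              format_string; // with {0}, {1}, ... placeholders
    std::vector<std::string> arguments;     // captured expressions, in order
};

// Hypothetical lowering of "text (expr)$ text" into a format string plus arguments.
auto lower_to_format(std::string_view s) -> format_call
{
    format_call out;
    std::size_t copied = 0; // start of the text not yet copied to format_string
    for (auto end = s.find(")$"); end != std::string_view::npos; end = s.find(")$", end + 2)) {
        int depth = 0;
        std::size_t open = end;
        // Walk left from the ')' to its matching '(' (handles nested parentheses).
        for (std::size_t i = end + 1; i-- > copied; ) {
            if (s[i] == ')') { ++depth; }
            else if (s[i] == '(' && --depth == 0) { open = i; break; }
        }
        if (open == end) { continue; } // unmatched ")$": leave it as plain text
        out.format_string += s.substr(copied, open - copied);
        out.format_string += "{" + std::to_string(out.arguments.size()) + "}";
        out.arguments.emplace_back(s.substr(open + 1, end - open - 1));
        copied = end + 2;
    }
    out.format_string += s.substr(copied);
    return out;
}
```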

you might tell me that string interpolation in cppfront can work as a proof of concept of a possible proposal for ISO C++.

That basically describes the mission of cppfront.

@switch-blade-stuff

You can't do the same thing with a library.

Correct me if I am mistaken, but doesn't std::format also support named arguments? Or is it an fmt feature that did not get carried over?

@gregmarr
Copy link
Contributor

Correct me if I am mistaken, but doesn't std::format also support named arguments?

No, just by index.

@gregmarr
Copy link
Contributor

    print("You clicked {button} at {x},{y}.", arg("button", "b1"), arg("x", 50), arg("y", 30));

I'm not quite sure that this qualifies, anyway. You still need to make sure that your arg() strings match your format string, and it's a lot more boilerplate.

@jarzec
Copy link
Contributor

jarzec commented Feb 22, 2023

You can't do the same thing with a library.

Correct me if I am mistaken, but doesn't std::format also support named arguments? Or is it an fmt feature that did not get carried over?

I think it is only fmt, at least for now 😉.

@jarzec
Copy link
Contributor

jarzec commented Feb 22, 2023

    print("You clicked {button} at {x},{y}.", arg("button", "b1"), arg("x", 50), arg("y", 30));

I'm not quite sure that this qualifies, anyway. You still need to make sure that your arg() strings match your format string, and it's a lot more boilerplate.

It is true that compile-time checks are a strong plus.

@hsutter
Owner

hsutter commented Mar 1, 2023

IIRC, it's about generality. See https://github.com/hsutter/cppfront/wiki/Design-note%3A-Capture.

Yes... in a nutshell, I really want to see if interpolation is (can naturally be) just another case of capture, rather than a special feature that works only in strings.

@msadeqhe
Copy link

msadeqhe commented Mar 9, 2023

The " symbol in a string literal is already a special character and needs to be escaped. CPP2 could make the interpolated vs. non-interpolated string debate disappear if it supported automatically joining "..." literals and ...$ expressions.

For example, I suggest changing the normal string interpolation syntax from:

x := "His name is (name)$!";

to:

x0 := "His name is "(name)$"!";

The above line is a combination of "His name is ", (name)$ and "!".
It could equivalently be written in any of the following ways:

x1 := "His name is "      (name)$               "!"; // Spaces are not mandatory
x2 := "His name is " name$ "!"; // Parenthesis are not mandatory
x3 := "His name is "name$"!"; // Also it doesn't require spaces
x  := "His name is (name)$!"; // Current syntax

For that to work, CPP2 just has to join capture expressions and string literals together automatically.
While it would be possible to allow spaces between "..." and ...$, maybe CPP2 should disallow whitespace between them, because that makes it easier to see how many spaces are inside the string. More examples:

y := "I saw "name$" yesterday!";
z := "The result of 2 * 2 is "(2 * 2)$", and you knew it";

This way, for raw string literals we can write:

rawstr := R"I saw "name$R" yesterday!";

But if CPP2 supported automatically joining "..." and ...$ expressions, it might be better to change the raw string literal from the R"..." notation to a "(...)" notation or something similar:

rawstr := "(I saw )"name$"( yesterday!)";

@msadeqhe

msadeqhe commented Mar 9, 2023

Also, to avoid defining a new raw string literal, CPP2 could simply accept escape sequences before, between, or after string literals, while an escape sequence could not be used alone without a string literal:

x := "First line\nSecond line"; // \n is not new-line
a := "First line"\n"Second line"; // \n is new-line
b := "First line"\n; // \n is new-line
c := \n"Second line"; // \n is new-line
d := \n; // ERROR! but CPP2 can allow it...
e := ""\n; // \n is new-line

Writing string literals such as "First line"\n is very common, and it is more readable than "First line\n" because the \n is set apart by the ". Every " between the first and the last " acts as a separator.

Since " is the only special character, it can be escaped the same way:

f := "I write "\"" character"; // Output: I write " character.

@msadeqhe

msadeqhe commented Mar 9, 2023

In addition, string literals could be integrated more deeply into the language for reflection and generation, for example:

(func.name()+"_wrapper")$: (forward func.params.first().name$: _) = {
    do_wrapped_extra_stuff();
    func.name()$(func.params.first().name$);
}

If CPP2 allowed using a string literal directly in a function name, this could be changed to:

func.name()$"_wrapper": (forward func.params.first().name$: _) = {
    do_wrapped_extra_stuff();
    func.name()$(func.params.first().name$);
}

@msadeqhe
Copy link

msadeqhe commented Mar 9, 2023

Finally, a combination of string literals can have a prefix; the prefix of the first string literal determines the behavior of the whole combined literal. For example, if CPP2 supports the u8 prefix:

x := u8"First "second$" Last"\n;

then u8 applies to all of First, second$, Last and \n. The middle " characters are just visual separators, like the digit grouping in 1'234'258 or 1_234_258, which the brain can easily skip over.

@msadeqhe

msadeqhe commented Mar 10, 2023

I forgot to mention that all string literals will be raw strings and we can break them into several lines:

x := "First name: "first$\n"Last name: "last$\n"Age: "age$\n"Sex: "sex$;

y := "First name: "first$"
Last name: "last$"
Age: "age$"
Sex: "sex$;

z := "First name: "first$\n
     "Last name: "last$\n
     "Age: "age$\n
     "Sex: "sex$;

x, y, and z all have the same value. If a combination of "...", ...$ and \... spans multiple lines, each line may have leading and trailing spaces, but no spaces should be allowed between the parts within a line.

@filipsajdak
Contributor Author

filipsajdak commented Mar 10, 2023

@msadeqhe Thank you for your suggestion. I am not eager to experiment with this further: I am trying to add features that are present in cpp1 but missing in cppfront, and to follow the papers that Herb mentions (in the end, it is all about C++ and not about a new thing).

If you'd like to play with it, please consider how it would impact the other features, e.g., captures in lambdas (see #247) or captures in contracts:

[[post: vec.size() == vec.size()$ + 1]]

I think all use cases are collected here: https://github.com/hsutter/cppfront/wiki/Design-note:-Capture#q-why-use-postfix--for-capture-wouldnt---be-nicer-for-string-interpolation-like-python

Thanks!

@msadeqhe

msadeqhe commented Mar 11, 2023

Thank you for the clarification, and sorry if this was the wrong place for the suggestion. I'll open an issue for it with more detailed information.

@filipsajdak
Contributor Author

Your feedback is always welcome! I just want to fix all the bugs and align the code with Herb's style so that it can be merged into the main branch.

@filipsajdak
Contributor Author

Rebased onto the newest changes; all regression tests passed.

@filipsajdak filipsajdak force-pushed the fsajdak-add-cpp2-raw-string-literals branch 2 times, most recently from 0f971cb to 0a31284 March 12, 2023 20:50
@filipsajdak filipsajdak force-pushed the fsajdak-add-cpp2-raw-string-literals branch from 0a31284 to c99e6a2 March 22, 2023 08:33
source/common.h Outdated
struct end_visit {
    std::string end_seq;
    adds_sequences strategy;
    auto operator()(const raw_string& part) const -> std::string {
Owner


Thanks, I tried applying this manually and it passes regressions and the test case. The only thing I noticed is that it doesn't build without warnings...

Here's the first one: Unreferenced parameter? Did you mean to use part, or is it not needed?

Contributor Author


Checking...

Contributor Author


OK, I have checked it. I only need the type of the part (raw_string or cpp_code); the name of the variable is not required.

I am preparing a fix.
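The warning reported here (MSVC calls it C4100, "unreferenced formal parameter"; GCC and Clang report `-Wunused-parameter` with `-Wall -Wextra`) goes away when a visitor overload that only needs the alternative's type leaves its parameter unnamed. A minimal sketch, assuming hypothetical stand-ins for `raw_string`, `cpp_code`, and `adds_sequences` from `source/common.h` (their real definitions are not shown in this thread; the enumerators other than `on_the_end` and the `cpp_code` behavior are guesses):

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-ins for the types in source/common.h.
struct raw_string { std::string text; };
struct cpp_code   { std::string text; };
using string_part = std::variant<raw_string, cpp_code>;

enum adds_sequences { no_sequence = 0, on_the_beginning = 1, on_the_end = 2 };

struct end_visit {
    std::string    end_seq;
    adds_sequences strategy;

    // Only the alternative's type matters, so the parameter is left unnamed;
    // this avoids the unused-parameter warning without changing behavior.
    auto operator()(raw_string const&) const -> std::string {
        return (strategy & on_the_end) ? end_seq : "";
    }
    auto operator()(cpp_code const&) const -> std::string {
        return ""; // assumption: no closing sequence after a code part
    }
};
```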

Contributor Author


Fixed

source/common.h Outdated
    auto operator()(const raw_string& part) const -> std::string {
        return strategy & on_the_end ? end_seq : "";
    }
    auto operator()(const cpp_code& part) const -> std::string {
Owner


Ditto

Contributor Author


Fixed

source/lex.h Outdated
@@ -366,25 +365,8 @@ auto expand_string_literal(
auto first_quote_pos = pos;
Owner


This variable is no longer needed, right? Seems to be unused now

Contributor Author


Yes, true. Sorry for not paying attention to details.

Contributor Author


Fixed

source/lex.h Outdated
{
    auto const length = std::ssize(text);
    auto pos = 0;
    auto first_quote_pos = pos;
Owner


Ditto, unused?

Contributor Author


Ah, that is an artifact from the previous implementation - I have corrected my compiler flags to spot unused variables.

Do you have a list of the flags you enable during compilation? I will use the same set so that I don't send faulty changes.

I have prepared the fix and I am running local tests to ensure that I did not break anything.

Contributor Author


Fixed

Helper classes that were used for raw strings can replace the expansion
of string literals.
Raw string literals that start with $ (dollar sign) will interpolate.
That means that the following code:
```cpp
rs := $R"(m["one"] + m["two"] = (m["one"] + m["two"])$)";
```
will generate the following cpp1 code:
```cpp
auto rs { R"(m["one"] + m["two"] = )" + cpp2::to_string(cpp2::assert_in_bounds(m, "one") + cpp2::assert_in_bounds(m, "two")) };
```
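For a quick check of what the generated line above evaluates to, here is a standalone approximation. It assumes `std::to_string` and `std::map::at` as stand-ins for `cpp2::to_string` and `cpp2::assert_in_bounds`, which live in cppfront's support header and are not reproduced here.

```cpp
#include <cassert>
#include <map>
#include <string>

// Approximation of the generated C++1 for
//   rs := $R"(m["one"] + m["two"] = (m["one"] + m["two"])$)";
// The raw part keeps its quotes unescaped; the capture becomes a to_string call.
auto interpolated_raw_string(std::map<std::string, int> const& m) -> std::string
{
    return R"(m["one"] + m["two"] = )" + std::to_string(m.at("one") + m.at("two"));
}
```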

It handles raw strings on a single line and across multiple lines.
It processes the input line by line and stores the parts of a multiline raw string in a separate buffer (multiline_raw_strings).
As there is only one place that checks for `$R"`,
I have moved this check out of the is_encoding_prefix_and() function.
This prefix is now checked directly after matching `$` in lex_line().

Update comment section of is_encoding_prefix_and() to include
all prefixes that are supported by the function.
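The line-at-a-time handling described in this commit message can be sketched as follows. This is a hypothetical, simplified version, not cppfront's `lex_line()`: `collect_raw_strings` receives the source already split into lines, keeps the text of a raw string that is still open at the end of a line in a buffer (the role the commit gives `multiline_raw_strings`), and resumes searching for the `)delim"` terminator on the next line. It ignores comments, ordinary string literals, and identifiers that happen to end in `R`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch of line-at-a-time raw string lexing.
auto collect_raw_strings(std::vector<std::string> const& lines) -> std::vector<std::string>
{
    std::vector<std::string> found;
    std::string buffer;  // text of the raw string collected so far
    std::string closing; // expected terminator such as )test" ; empty when none is open

    for (auto const& line : lines) {
        std::size_t pos = 0;
        for (;;) {
            if (closing.empty()) {
                auto const start = line.find("R\"", pos);
                if (start == std::string::npos) { break; }
                auto const open = line.find('(', start + 2);
                if (open == std::string::npos) { break; } // malformed; ignore rest of line
                closing = ")" + line.substr(start + 2, open - start - 2) + "\"";
                buffer  = line.substr(start, open + 1 - start); // R"delim(
                pos     = open + 1;
            }
            auto const end = line.find(closing, pos);
            if (end == std::string::npos) {
                // Literal continues on the next line: keep this line's text and the newline.
                buffer += line.substr(pos);
                buffer += '\n';
                break;
            }
            auto const after = end + closing.size();
            buffer += line.substr(pos, after - pos);
            found.push_back(buffer);
            closing.clear();
            pos = after;
        }
    }
    return found;
}
```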
@filipsajdak filipsajdak force-pushed the fsajdak-add-cpp2-raw-string-literals branch from c99e6a2 to 085e492 March 22, 2023 22:29
@filipsajdak
Contributor Author

Applied the review comments. All regression tests pass.

@hsutter hsutter merged commit fda45aa into hsutter:main Mar 23, 2023
@hsutter
Owner

hsutter commented Mar 23, 2023

Thanks Filip!

@filipsajdak filipsajdak deleted the fsajdak-add-cpp2-raw-string-literals branch March 26, 2023 18:02
@@ -1284,54 +1472,80 @@ auto lex_line(

//G string-literal:
//G encoding-prefix? '"' s-char-seq? '"'
//G encoding-prefix? 'R"' d-char-seq? '(' s-char-seq? ')' d-char-seq? '"'

This comment was marked as resolved.

Contributor


Actually, see #387 (comment).

zaucy pushed a commit to zaucy/cppfront that referenced this pull request Dec 5, 2023
* Add string_parts

* Add raw_string struct

* Refactor expand_string_literal to use string_parts

Helper classes that were used for raw strings can replace the expansion
of string literals.

* Add support for raw string literals in cpp2

* Add raw string interpolation support for cpp2

Raw string literals that start with $ (dollar sign) will interpolate.
That means that the following code:
```cpp
rs := $R"(m["one"] + m["two"] = (m["one"] + m["two"])$)";
```
will generate the following cpp1 code:
```cpp
auto rs { R"(m["one"] + m["two"] = )" + cpp2::to_string(cpp2::assert_in_bounds(m, "one") + cpp2::assert_in_bounds(m, "two")) };
```

It handles raw strings on a single line and across multiple lines.
It processes the input line by line and stores the parts of a multiline raw string in a separate buffer (multiline_raw_strings).

* Add regression-tests

* Move `$R"` prefix out from is_encoding_prefix_and()

As there is only one place that checks for `$R"`,
I have moved this check out of the is_encoding_prefix_and() function.
This prefix is now checked directly after matching `$` in lex_line().

Update comment section of is_encoding_prefix_and() to include
all prefixes that are supported by the function.
8 participants