Add cpp2 raw string literals support with interpolation #251
I found some bugs with this patch. Working on a patch to fix it. |
4f3cb9f
to
d48545d
Compare
@hsutter, the code is fixed. All tests passed. I have added a test covering edge cases I found along the way. I decided to add |
I don't think interpolation in raw string literals makes sense. After all, you use raw string literals to avoid quoting. If you start interpreting the content of a raw string that defeats the purpose of raw strings. cpp2 should support raw strings, of course, but it should not interpret the content but pass it along as is. |
@neumannt you are right. This is misleading at a minimum. I like the possibility of writing a string that interpolates without escaping the quotation marks. But a raw string should be a raw string 😄 Point taken. Maybe I will propose a way to enable interpolation.
Just spitballing here, I want to think about this more: I can see value in "fully raw" string literals, and "raw plus interpolation" string literals. One option could be to have the usual Of course, doing that would resurrect previous requests that normal Requiring the I know we've discussed the above prefix-to-enable-interpolation idea before in another issue, and I resisted it. But now that we have the additional use case of raw string literals, where there also seems to be a reason to let users distinguish string literals that allow interpolation and ones that don't, it's an additional data point that might point toward "string literals 'want to' be prefixed for interpolation"... |
Editing to answer my own question:
Yes, we want the 2x2 matrix: (a) because we want to distinguish raw with/without interpolations even if it's just to customize the prefix + the usage examples already shown in previous comments, and (b) eliminating one option wouldn't actually reduce concept count (we would still need to teach two concepts), it would just reduce expressiveness (because the concepts wouldn't be orthogonal), which is usually bad -- we prefer general orthogonal combinable concepts. So perhaps it might be most natural to have the
I have worked a little bit more on that and currently prepared an implementation that uses

That means that

```cpp
i := 42;
rs1 := R"(raw string: (i)$)"; // this will not interpolate, pure raw string
rs2 := R"seq(raw string: (i)$)seq"; // this will not interpolate, pure raw string
rs3 := R"$(raw string: (i)$)$"; // this will interpolate
rs4 := R"$seq(raw string: (i)$)$seq"; // this will interpolate
```

Currently, This is my current state of work.

@hsutter, if I understand you correctly, you propose adding a So, the above example can be rewritten to:

```cpp
i := 42;
s1 := "string: (i)$"; // this will not interpolate, pure string
rs1 := R"(raw string: (i)$)"; // this will not interpolate, pure raw string
rs2 := R"seq(raw string: (i)$)seq"; // this will not interpolate, pure raw string
s2 := $"string: (i)$"; // this will interpolate
rs3 := $R"(raw string: (i)$)"; // this will interpolate
rs4 := $R"seq(raw string: (i)$)seq"; // this will interpolate
```

Did I get it right? If yes, I can rework my code to adjust it to this logic. I like that there will be one rule instead of two.
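The prefix rule proposed in this comment (a leading `$` enables interpolation, `R"` makes the literal raw) can be sketched in plain C++ as a small classifier. This is only an illustration of the 2x2 matrix under discussion; `literal_kind` and `classify_literal` are hypothetical names, encoding prefixes are ignored, and it does not claim to match what cppfront's lexer does or what was finally merged.

```cpp
#include <optional>
#include <string_view>

// Hypothetical result type: the two orthogonal properties of a literal.
struct literal_kind {
    bool raw          = false;  // introduced by R"
    bool interpolates = false;  // introduced by a leading $
};

// Classify the start of a token according to the proposed rule.
// Returns std::nullopt when the token does not start a string literal.
auto classify_literal(std::string_view token) -> std::optional<literal_kind>
{
    literal_kind kind;
    if (!token.empty() && token.front() == '$') {
        kind.interpolates = true;
        token.remove_prefix(1);
    }
    if (token.size() >= 2 && token[0] == 'R' && token[1] == '"') {
        kind.raw = true;
        return kind;
    }
    if (!token.empty() && token.front() == '"') {
        return kind;
    }
    return std::nullopt;
}
```

With this sketch, `"..."` and `R"(...)"` classify as non-interpolating, while `$"..."` and `$R"(...)"` classify as interpolating, which gives the single rule mentioned above.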
Yes, thanks. I like that you changed the order, you're right that |
Ok, I will adjust the code. |
Resurrecting old topic, but since the "interpolation-enabled" literals in this proposal are different from the non-interpolating ones, why not consider the This also makes it more consistent with other languages, Python especially, since this formatting syntax is originally from Python afaik. Having to use
As for formatting, it could be enabled only if
Considering that Cpp2 does not have to deal with backwards-compatibility and that the required compiler versions are currently the most "bleeding edge" ones, it seems like a reasonable thing to do. Would also neatly integrate a C++20 feature and reduce potential fragmentation, otherwise you'll have another competing choice for string formatting: Cpp2 interpolation, |
Understood, but please remember the considerations in Design note: Capture... having a single way to spell a thing consistently is important to me, and I'm currently exploring the path of making interpolation be the same syntax and meaning as all other captures, which is why it looks like a language operator (it is, it's writing an expression inside the string). |
The thing is, if interpolated strings require a prefix, this already breaks the lambda parallel. And IMHO, it would be better to have 100% consistency with how other strings and formatting works (both in C++20 and other languages), rather than partial consistency with lambdas that have nothing to do with strings (since lambdas capture state, strings just print it). |
Also creating competing standards is to me the opposite of Cpp2's goal of cleaning up and simplifying the language |
@filipsajdak For now please do the |
I have a very naïve question concerning string interpolation. This is a new concept from a C++ point of view and one that can (at least to some extent) be replaced by existing On the other hand you might tell me that string interpolation in cppfront can work as a proof of concept of a possible proposal for ISO C++.
IIRC, it's about generality. See https://github.com/hsutter/cppfront/wiki/Design-note%3A-Capture. |
You can't do the same thing with a library.
String interpolation gives you this:
Now, with language support, you can convert the second into the first, maybe even we can standardize localization hooks at some point, so you get the compiler converting it into something like this:
That basically describes the mission of cppfront. |
Correct me if I am mistaken, but doesn't
No, just by index. |
I'm not quite sure that this qualifies, anyway. You still need to make sure that your |
I think it is only |
It is true that compile-time checks are a strong plus. |
Yes... in a nutshell, I really want to see if interpolation is (can naturally be) just another case of capture, rather than a special feature that works only in strings. |
Symbol For example, I suggest to change the normal string interpolation syntax from:
to:
The above line is a combination of
For that to work, CPP2 has to just automatically join capture expressions and string literals together.
This way, for raw string literals we can write:
But if CPP2 could support automatically joining
|
Also to avoid defining a new raw-string literal, CPP2 can just accept escape sequences between or before or after a string literal, but an escape sequence cannot be used alone without string literal:
And writing string literals such as While
|
In addition, string literals can be more integrated into the language for reflection and generation example:
If CPP2 could allow directly using string literal for function name, then it can be changed to:
|
Finally, combination of string literals can have prefix, the prefix of the first string literal determines the behavior of the combined string literal. For example if CPP2 supports
then |
I forgot to mention that all string literals will be raw strings and we can break them into several lines:
All |
@msadeqhe Thank you for your suggestion. I am not eager to experiment with this more - I am trying to add missing features in cppfront that are present in cpp1, and I am trying to follow papers that Herb mentions (in the end, it is all about C++ and not the new thing). If you'd like to play with it, please consider how it will impact the other features, e.g., captures in the lambda. (check here: #247) or capture in contracts:
I think all use cases are collected here: https://github.com/hsutter/cppfront/wiki/Design-note:-Capture#q-why-use-postfix--for-capture-wouldnt---be-nicer-for-string-interpolation-like-python Thanks! |
Thank you for clarification and sorry if I shouldn't write the suggestion here. I'll open an issue for it with more detailed information. |
Your feedback is always welcome! I just want to fix all the bugs and align that code with Herb style to be merged with the main branch. |
0455de1
to
ad74942
Compare
Rebased to newest changes - all regression tests passed. |
0f971cb
to
0a31284
Compare
0a31284
to
c99e6a2
Compare
source/common.h
Outdated
struct end_visit {
    std::string end_seq;
    adds_sequences strategy;
    auto operator()(const raw_string& part) const -> std::string {
Thanks, I tried applying this manually and it passes regressions and the test case. The only thing I noticed is that it doesn't build without warnings...
Here's the first one: Unreferenced parameter? Did you mean to use `part`, or is it not needed?
Checking...
OK. I have checked it. I only need the type of the part (`raw_string` or `cpp_code`); the name of the variable is not required.
I am preparing a fix.
Fixed
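For readers following the thread, the fix described here (keep the parameter type for overload selection, drop the unused name) might look roughly like the sketch below. Only `end_visit`, `end_seq`, `strategy`, `adds_sequences`, `on_the_end`, `raw_string`, and `cpp_code` appear in the review; the member layouts, the enumerator values, the `string_part` alias, and the body of the `cpp_code` overload are assumptions made so the example compiles on its own.

```cpp
#include <string>
#include <variant>

// Assumed minimal part types; the real definitions live in source/common.h.
struct raw_string { std::string text; };
struct cpp_code   { std::string text; };
using string_part = std::variant<raw_string, cpp_code>;  // hypothetical alias

// Assumed flag values; only on_the_end is visible in the reviewed diff.
enum adds_sequences { no_adds = 0, on_the_beginning = 1, on_the_end = 2 };

struct end_visit {
    std::string    end_seq;
    adds_sequences strategy;

    // The parameters are unnamed: overload resolution needs only the type,
    // and leaving the name out avoids the unreferenced-parameter warning.
    auto operator()(raw_string const&) const -> std::string {
        return (strategy & on_the_end) ? end_seq : std::string{};
    }
    auto operator()(cpp_code const&) const -> std::string {
        return {};  // assumption: code parts need no closing sequence
    }
};
```

A visitor like this would be applied with `std::visit(end_visit{...}, part)`, where the active alternative of `part` selects the overload.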
source/common.h
Outdated
    auto operator()(const raw_string& part) const -> std::string {
        return strategy & on_the_end ? end_seq : "";
    }
    auto operator()(const cpp_code& part) const -> std::string {
Ditto
Fixed
source/lex.h
Outdated
@@ -366,25 +365,8 @@ auto expand_string_literal(
    auto first_quote_pos = pos;
This variable is no longer needed, right? Seems to be unused now
Yes, true. Sorry for not paying attention to details.
Fixed
source/lex.h
Outdated
{
    auto const length = std::ssize(text);
    auto pos = 0;
    auto first_quote_pos = pos;
Ditto, unused?
Ah, that is an artifact from the previous implementation - I have corrected my compiler flags to spot unused variables.
Do you have a list of flags that you have enabled during compilation? I will make sure I have the same set of flags set not to send faulty changes.
I have prepared the fix and I am running local tests to ensure that I did not break anything.
Fixed
c99e6a2
to
085e492
Compare
Apply review comments. All regression tests pass.
Thanks Filip! |
@@ -1284,54 +1472,80 @@ auto lex_line(

//G string-literal:
//G     encoding-prefix? '"' s-char-seq? '"'
//G     encoding-prefix? 'R"' d-char-seq? '(' s-char-seq? ')' d-char-seq? '"'
Actually, see #387 (comment).
* Add string_parts
* Add raw_string struct
* Refactor expand_string_literal to use string_parts

  The helper class that was used for raw strings can replace the expansion of string literals.
* Add support for raw string literals in cpp2
* Add raw string interpolation support for cpp2

  Raw-string literals that start with `$` (dollar sign) will interpolate. That means that the following code:

  ```cpp
  rs := $R"(m["one"] + m["two"] = (m["one"] + m["two"])$)";
  ```

  will generate the following cpp1 code:

  ```cpp
  auto rs { R"(m["one"] + m["two"] = )" + cpp2::to_string(cpp2::assert_in_bounds(m, "one") + cpp2::assert_in_bounds(m, "two")) };
  ```

  It handles raw strings on a single line and on multiple lines. It processes the input line by line and stores the parts of a multiline raw string in a separate buffer (multiline_raw_strings).
* Add regression-tests
* Move `$R"` prefix out from is_encoding_prefix_and()

  As there is only one place where there is a check for `$R"`, I have moved this check out of the is_encoding_prefix_and() function. This prefix is now checked directly after matching `$` in lex_line(). Update the comment section of is_encoding_prefix_and() to include all prefixes that are supported by the function.
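The lowering described in the commit message (literal text stays a raw string, each `(expr)$` capture becomes a `cpp2::to_string(...)` call, and the pieces are joined with `+`) can be sketched as follows. This is a simplified illustration, not the code from this PR: `part`, `split_interpolation`, and `emit_cpp1` are hypothetical names, parentheses are matched naively (quotes inside the expression are not considered), delimiters and multiline input are not handled, and the rewriting of `m["one"]` into `cpp2::assert_in_bounds(...)` is a separate cppfront step that is left out.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical part type: either literal text or a captured expression.
struct part {
    bool        is_code;  // true: captured expression, false: literal text
    std::string text;
};

// Split the body of an interpolating raw string into literal text and
// `(expr)$` captures. Each `)$` is matched backward to its opening '('.
auto split_interpolation(std::string_view body) -> std::vector<part>
{
    std::vector<part> parts;
    std::size_t literal_start = 0;
    std::size_t pos = 0;
    while ((pos = body.find(")$", pos)) != std::string_view::npos) {
        int depth = 0;
        std::size_t open = std::string_view::npos;
        for (std::size_t i = pos + 1; i-- > literal_start; ) {
            if (body[i] == ')') { ++depth; }
            else if (body[i] == '(' && --depth == 0) { open = i; break; }
        }
        if (open == std::string_view::npos) {  // unmatched: treat as text
            pos += 2;
            continue;
        }
        if (open > literal_start) {
            parts.push_back({false, std::string(body.substr(literal_start, open - literal_start))});
        }
        parts.push_back({true, std::string(body.substr(open + 1, pos - open - 1))});
        pos += 2;
        literal_start = pos;
    }
    if (literal_start < body.size()) {
        parts.push_back({false, std::string(body.substr(literal_start))});
    }
    return parts;
}

// Join the parts into a cpp1 expression in the style shown above.
auto emit_cpp1(std::vector<part> const& parts) -> std::string
{
    std::string out;
    for (auto const& p : parts) {
        if (!out.empty()) { out += " + "; }
        out += p.is_code ? "cpp2::to_string(" + p.text + ")"
                         : "R\"(" + p.text + ")\"";
    }
    return out;
}
```

Applied to the body of the example literal, the sketch yields one literal part followed by one code part, which mirrors the shape of the generated cpp1 expression in the commit message.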
The current implementation of cppfront does not support raw string literals on the cpp2 side. The raw strings are supported only on the cpp1 side. This change introduces:
- raw string literals on the cpp2 side (with the encoding prefixes `L`, `u8`, `u`, `U`),
- raw string interpolation (with `$` before the raw-string-literal).

That makes the following code:
Generates the following cpp1 code (skipping boilerplate):
All regression tests pass.
Limitations
cppfront accepts all prefixes for string literals (regular string literals and raw string literals). Unfortunately, wide characters are not supported by string interpolation: the generated code is OK, but the `cpp2::to_string()` functions produce `std::string`, which is incompatible with strings that use other character types.
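The incompatibility can be shown in standard C++ without cppfront. The sketch below uses a small detection trait (`can_concat`, a hypothetical helper written for this example) to check at compile time whether two string types can be joined with `+`. A `std::string`, which is what a `to_string`-style function returns, cannot be appended to a `std::wstring` or `std::u16string`.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical detection trait: true when `L + R` is a valid expression.
template <typename L, typename R, typename = void>
struct can_concat : std::false_type {};

template <typename L, typename R>
struct can_concat<L, R,
    std::void_t<decltype(std::declval<L>() + std::declval<R>())>>
    : std::true_type {};

// Narrow literal text plus a narrow to_string result is fine.
static_assert( can_concat<std::string, std::string>::value,
              "std::string + std::string should compile");

// Wide or UTF-16 literal text cannot be joined with a std::string.
static_assert(!can_concat<std::wstring, std::string>::value,
              "std::wstring + std::string should not compile");
static_assert(!can_concat<std::u16string, std::string>::value,
              "std::u16string + std::string should not compile");
```

This is why a prefixed interpolating literal would fail in the C++ compiler at the concatenation step, even though the code cppfront emits is well-formed on its own.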