-
Notifications
You must be signed in to change notification settings - Fork 381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raw strings + multi-line string discussion #1004
Comments
In the example of the interpolated multi-line banner, why the "{" is escaped? |
@andylokandy in the section about how Multi-line strings interact with string interpolation I suggest we change #555 to
|
Why not adopt the common |
Looks like #999 now implements the raw string proposal thanks to @andylokandy |
@andylokandy Either of them could work in principle. However, I'd rather not overload the meaning of |
Since raw string and interpolation string are two non-orthogonal features, I think it's ok to share the same special escape character. Further, with main : IO ()
main =
do
let name = generateName
let body = generateBody complexArg0 complexArg1 complexArg2
let content = f"""
module Test
#name : IO ()
#name = #body
"""
writeToFile content "Test.idr" |
That looks okay but it adds |
I feel good with adding |
I am wondering if we can completely get rid of the leading |
This proposal aims to motivate and explain the design behind two new string features in Idris: Raw and multi-line strings. Each proposal has a motivation, from which follow design goals. The proposal answers the design goals at the end, a discussion section talks about the different alternatives and how they do, or do not answer the design goals stated. This 3000 words long expose ends with a note on implementation and an invitation to discuss the design goals.
Multi-line strings
Mutli-line strings are a feature that allow to write string literals across multiple lines.
Motivation
Strings are ubiquitous in programming and allow programmers to communicate through text. Another use for strings is to carry data that will be used by other systems that read strings. Typical examples of this are:
What every example has in common is to write down a string literal across multiple lines. Additionally, some of them would benefit from string interpolation #555 while others would benefit from avoiding character escapes. The first two examples would benefit from avoid escaping characters, while the last two would benefit from string interpolation.
Here are each example with existing string syntax
HTML header
or
in order to show the layout of the file using vertical space.
ascii-art
This example is taken from the idris compiler itself
As you can see this suffers from two problems: the
\
needs to be escaped, and the version number needs to be inserted. In this proposal I will not suggest to fix both of them at the same time but will provide a solution for each of them individually, leading to two different compromises that have different properties.Write out manually an HTTP request.
This program describes an HTTP request that could be used for testing purposes (for example testing either the parsing, or the execution of the request). As you can see we have to add additional
\n
in order and escape the inner"
Output files given a known template.
This program allows to describe idris packages containing the information necessary to create a binary out of an idris project. Typically a package manager written in idris would use such programs to generate project files interactively. We will see that this program does not actually benefit from multi-line strings if they do not support string interpolation.
The design goals are therefore as follows:
• Allow strings to be written across multiple lines.
• Allow indentation to be respected in nested programs.
• Remain compatible with string interpolation.
The proposal
Support multi-line strings using the same syntax rules as Swift that is:
is an error, because the string is not multi-line
is allowed because the
"""
token is escapedtrims the leading whitespace using the last
"""
such that the string has valueThis allows to control both the indentation and the trimming. In turn
This error because the content of the multi-line string has an indentation level that is lower than what we ask to trim from the left.
The following are errors:
In more technical terms,
"""
is both the leading and closing tokens of multi-line strings. The leading"""
has to be followed by a line return that will be trimmed out of the string. The end"""
has to be only preceded by whitespace. This whitespace allows to pick how many leading white-spaces are trimmed out of every line of the multi-line block.The previous examples can be rewritten as
As you can see, the trimming whitespace allows to have a natural indentation in the let block.
However multi-line string is not a miracle solution to all our problems, in particular in the following:
This example suffers from being mixed with concatenation, a problem that is fixed with string interpolation.
Allowing to pick the trimming white-space allows to copy-paste code easily from example, specially when they come from external sources (like a request-creating tool like postman), and inserted in code (for tests typically).
The last example does not benefit from multi-line strings despite looking like it would. This is because the feature would need to be compatible with interpolated strings (#555) in order to behave as one would expect:
Interaction with string interpolation
String interpolation #555 has not been implemented, however, the last example clearly shows that this feature is incomplete without it. We could salvage the same design from #555 and use
s"""
as starting token of interpolated multi-line strings. One issue we notice is that the braces{
symbol might not be the correct symbol to use anymore since one might be tempted to write json templates using the multi-line syntax:This is because
{
and}
are defined in #555 as delimiters for interpolated expressions. Which means they need to be escaped for defining json objects.Since JSON is a very common format to use and expose as API. I suggest we amend #555 in order to replace the delimiters for interpolated strings to
\{
and}
.With the amendement the previous example would look like this:
Which allows the JSON structure to be more visible.
Raw-strings proposal
This proposal might seem unrelated at first glance but is here to address its overlap wrt multi-line strings. Raw strings allow to write string literal without escaping any characters.
Motivation
Raw strings is a feature that allow programmers to write strings between special delimiters such that they do not have to escape anything and the string is interpreted as-is without any transformation from the compiler. This allows for the following use cases:
• Writing regex expression while avoiding escape characters
• Writing strings which are heavy in
"
which need escape characters• Writing strings which contains special character that need to be rendered as is, specially characters like
\n
Without raw strings, this is how the given example would be expressed in Idris:
Those example get particularly bad when mixed together:
Additionally, raw strings are useful when pasting code from other sources, typically one might want to insert a bash script as a literal string as part of a CI jobs:
The design goal for this proposal is thefore as follow:
• Allow any string to be represented within raw string, including line breaks
• Forbid any kind of escape sequences
The proposal
Reuse the syntax and semantics of Rust's raw strings
r#"…"#
. It comes with the following benefits:• Already familiar for Rust programmers
• Handles any string without escape characters using dynamic delimiters for the string
• The changes from normal strings are minimal
This means the previous examples can be rewritten like this:
You will notice the second line is wrapping to the next line, that is beacause raw strings also interpret the carriage return from the text editor. This allows programmers to copy paste code safely without having character be accidentally escaped or white space trimmed like happens with string interpolation.
Additionally, the Rust implementation allows to declare exactly what the closing delimiter should be by using the opening delimiter. This allows to express every string witout restriction on its content. Take this HTML example
This would not parse because the sequence
"#
appears in the content of the string. However we can change the delimiters for raw strings to be"##
instead in order to insert"#
while avoiding escapes:FInally, the bash example can be pasted without risk of errors:
Overlap and difference with multi-line strings
One might notice that we now have two ways to writing a line across multiple lines, we can use either multi-line strings:
or we can use raw strings:
The difference is that the first string allows characters to escape, while the second interpret the content between quotes literally. For reference this is how each of those strings will be interpreted:
and
Again, this is because raw strings do not trim or interpret the content between quotes, it parses everything in quote literally, including line breaks.
Additionally, by definition, raw strings cannot be interpolated, since they interpret their content witout any modification whatsoever. Supporting interpolation would go against the design goal nr2 because it would include escape sequence for the interpolation.
If we take the banner example we see that there is no mix of feature that allow to both bypass the escaping of character and allow for interpollation:
where
\
has to be escaped because it itself is the escape characterHere we do not need to escape
\
anymore but\{
is now interpreted as raw text rather than an instruction to interpolate the string.Discussion and alternatives
The reason why those two proposals are brought up together is that both of those proposals is that they both have the ability to write strings over multiple lines. The great thing about those two proposals is that they borrow from other proposals which already have gone through their stages of discussion and design.
Here I will summarise some examples of already implemented proposals and why I personally think are unsuitable for Idris:
Swift's raw strings
Swift uses
#"…"#
as delimiters for raw strings, and the same delimiters for multi-line strings with"""
. The proposal is here:https://github.com/apple/swift-evolution/blob/master/proposals/0200-raw-string-escaping.md
The issue with this design is that is allows for
#
to be an escape character for interpolated strings. Which blurs the line between what is actually raw and what is still interpreted.This aspect is the one I am the least confident about since it can fix the problem we see on the
banner
example. But clashes with the second design goal of raw strings.Python's multi-line and raw strings
Python uses
"""
as multi-line strings but does not allow to trim prefix whitespace on each line, which make it incompatible with Idris' culture of indentation:Raw strings are delcared by prepending
r
in front of a string, they work with multi-line strings as well. However they have the problem that they do not allow every single character to appear in raw strings since the closing delimiter could appear within the string, and since the delimiter is not customisable they cannot be represented. This clashes with the first design goal of raw stringsScala's raw and multi-line strings
Scala has a very interesting way to go about this because it does all the string transformations at runtime with the pattern
id"…"
whereid
is a method call which will use the string literal following it as an argument.Scala also supports multiline strings with
"""
which does not perform any trimming either in term of line breaks or leading whitespace, however it supports a method calledstripMargin
which remove leading whitespace using|
as a delimiter:This works but is not very ideal since it incurs runtime execution for something that could be done at compile-time. This is specially important for proofs on multi-line strings.
C++ Raw multi-line strings
As far as I know, c++ does not have interpolated string. It has multiline strings through two mechanisms:
or
The second one is a raw string, which means leading whitespace won't be trimmed away. Additionally, this syntax has the benefit of allowing arbitrary delimiters for multiline strings. However, allowing arbitrary delimiters might lead to completely unreadable code. Additionally, the semantics of the double quote
"
has been changed to mean "start selecting the delimiter" since the delimiter is between the double quote and the left parenthesis. This does meet the design goals of raw strings, but might be more complicated than necessary as the discussion on the rust proposal already pointed out.Implementation considerations
This document is not a way to encourage implementing both those features at the same time. As of today, we have a Pull Request (#999) which almost implements multi-line strings. I suggest we perform the following changes to it so that it conforms to the design outlined here:
• Use the second delimiter to pick the whitespace trimming
• Forbid anyother character than a line break after the oppening token
• Forbid any other character others than whitespace before the closing token
As for raw strings, if someone needs them very quickly we can start by implementing
r#"…"#
as delimiters for raw strings without being able to adjust the delimiters. And implement the dynamic delimiters at a later date.Finally, since technically speaking multi-line strings and string interpolation are orthogonal, we can implement interpolation at a later date on multi-line strings.
Issues tracking the progress of each of those features should be openned so that we can track the state of their implementation. Issue #555 should be amended to mention support for multi-line strings.
Conclusion
It seems both those proposals capture something we want in the language, I propose to talk here about them. I am particularly interested in the design goal of raw strings, that is, supporting mult-line raw strings and forbid any kind of escape characters. Maybe we do not actually want this in the language and raw string should be single line and have escape characters!
Additionally, I would suggest we keep with existing solutions without modifications, such that users comming from either rust or swift don't get tricked into thinking the syntax works the way the same way as their host language, when in reality the behaviour has been altered.
Finally I propose to close this issue once the discussion is over and split those issues in two in order to keep track of them separately. Thank you for your time!
The text was updated successfully, but these errors were encountered: