Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-line strings proposal #1029

Closed
4 tasks done
andrevidela opened this issue Feb 4, 2021 · 1 comment
Closed
4 tasks done

Multi-line strings proposal #1029

andrevidela opened this issue Feb 4, 2021 · 1 comment

Comments

@andrevidela
Copy link
Collaborator

andrevidela commented Feb 4, 2021

Current Status

This issue splits off from #1004 so that we can keep track of its progress separately here.

ToDo:

  • Implement multi-line strings with the correct trimming behaviour Implement multi-line string #1097
  • add documentation
  • add to changelog
    • explain why old single quote behavior is not allowed to span multiple lines

Summary

We propose to introduce a special syntax for string literals that span multiple lines. They allow to insert larges pieces of text in a singular literal while remaining compatible with string interpolation (#555), raw-strings (#1027) and preserving the natural indentation level of the surrounding program.

Motivation

Strings are ubiquitous in programming and allow programmers to communicate through text. Another use for strings is to carry data that will be used by other systems that read strings. Typical examples of this are:

  • Generate some HTML headers
  • Print out some ascii-art.
  • Write out manually an HTTP request.
  • Output files given a known template.

What every example has in common is to write down a string literal across multiple lines. Additionally, some of them would benefit from string interpolation #555 while others would benefit from avoiding character escapes. The first two examples would benefit from avoid escaping characters, while the last two would benefit from string interpolation.

Here are each example with existing string syntax

HTML header

let header = "<head>\n    <script type=\"text/javascript\" src=\"./tracking.js\"></script>\n</head>"

or

let header = "<head>\n"
          ++ "    <script type=\"text/javascript\" src=\"./tracking.js\"></script>\n" 
          ++ "</head>"

in order to show the layout of the file using vertical space.

ascii-art

This example is taken from the idris compiler itself

banner : String
banner = "     ____    __     _         ___                                           \n" ++
         "    /  _/___/ /____(_)____   |__ \\                                          \n" ++
         "    / // __  / ___/ / ___/   __/ /     Version " ++ showVersion True version ++ "\n" ++
         "  _/ // /_/ / /  / (__  )   / __/      https://www.idris-lang.org           \n" ++
         " /___/\\__,_/_/  /_/____/   /____/      Type :? for help                     \n" ++
         "\n" ++
         "Welcome to Idris 2.  Enjoy yourself!"

As you can see this suffers from two problems: the \ needs to be escaped, and the version number needs to be inserted. In this proposal I will not suggest to fix both of them at the same time but will provide a solution for each of them individually, leading to two different compromises that have different properties.

Write out manually an HTTP request.

testRequest : String
testRequest = 
"POST /cgi-bin/process.cgi HTTP/1.1\n" ++ 
"User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)\n" ++
"Host: www.tutorialspoint.com\n" ++
"Content-Type: application/x-www-form-urlencoded\n" ++
"Content-Length: length\n" ++
"Accept-Language: en-us\n" ++
"Accept-Encoding: gzip, deflate\n" ++
"Connection: Keep-Alive\n" ++
"\n" ++
"<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"++
"<string xmlns=\"http://clearforest.com/\">string</string>"

This program describes an HTTP request that could be used for testing purposes (for example testing either the parsing, or the execution of the request). As you can see we have to add additional \n in order and escape the inner "

Output files given a known template.

packageDesc : (name, source, execName : String) -> String
packageDesc name source exec =
    "package = " ++ name ++ "\n"
    "sourcedir = " ++ source ++ "\n"
    "executable = " ++ exec

This program allows to describe idris packages containing the information necessary to create a binary out of an idris project. Typically a package manager written in idris would use such programs to generate project files interactively. We will see that this program does not actually benefit from multi-line strings if they do not support string interpolation.

The design goals are therefore as follows:

• Allow strings to be written across multiple lines.
• Allow indentation to be respected in nested programs.
• Remain compatible with string interpolation.

The proposal

Support multi-line strings using the same syntax rules as Swift that is:

"""hello"""

is an error, because the string is not multi-line

"""
hello\"""
this is an escaped multiline
string
\"""
"""

is allowed because the """ token is escaped

let verylongfunctionname = """
    the indentation here is 
    the one that
    everyone expects
    """

trims the leading whitespace using the last """ such that the string has value

the indentation here is 
the one that
everyone expects

This allows to control both the indentation and the trimming. In turn

hello = """
this is an error
        """

This error because the content of the multi-line string has an indentation level that is lower than what we ask to trim from the left.

The following are errors:

hello = """"""
hello = """""""
hello = """
this is not escaped """
"""
hello = """
the closing has to be on the next line """

In more technical terms, """ is both the leading and closing tokens of multi-line strings. The leading """ has to be followed by a line return that will be trimmed out of the string. The end """ has to be only preceded by whitespace. This whitespace allows to pick how many leading white-spaces are trimmed out of every line of the multi-line block.

The previous examples can be rewritten as

let header = """
    <head>
        <script type="text/javascript" src="./tracking.js"></script>
    </head>
    """

As you can see, the trimming whitespace allows to have a natural indentation in the let block.

However multi-line string is not a miracle solution to all our problems, in particular in the following:

banner : String
banner = """
         ____    __     _         ___                                           
        /  _/___/ /____(_)____   |__ \\                                          
        / // __  / ___/ / ___/   __/ /     Version 
    """ ++ showVersion True version ++ """
      _/ // /_/ / /  / (__  )   / __/      https://www.idris-lang.org           
     /___/\\__,_/_/  /_/____/   /____/      Type :? for help                     
    
    Welcome to Idris 2.  Enjoy yourself!
    """

This example suffers from being mixed with concatenation, a problem that is fixed with string interpolation.

testRequest : String
testRequest = """
POST /cgi-bin/process.cgi HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Content-Type: application/x-www-form-urlencoded
Content-Length: length
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive

<?xml version="1.0" encoding="utf-8"?>
<string xmlns="http://clearforest.com/">string</string>
"""

Allowing to pick the trimming white-space allows to copy-paste code easily from example, specially when they come from external sources (like a request-creating tool like postman), and inserted in code (for tests typically).

The last example does not benefit from multi-line strings despite looking like it would. This is because the feature would need to be compatible with interpolated strings (#555) in order to behave as one would expect:

packageDesc : (name, source, execName : String) -> String
packageDesc name source exec = """
    package = {name}
    sourcedir = {source}
    executable = {exec}
    """

Interaction with string interpolation

String interpolation #555 has not been implemented, however, the last example clearly shows that this feature is incomplete without it. We could salvage the same design from #555 and use s""" as starting token of interpolated multi-line strings. One issue we notice is that the braces{ symbol might not be the correct symbol to use anymore since one might be tempted to write json templates using the multi-line syntax:

let jsonUser = """
\{ "name" = {name}
, "date" = \{ "day" : {day}, 
           "month" : {month}, 
           "year" : {year} \}
\}
"""

This is because { and } are defined in #555 as delimiters for interpolated expressions. Which means they need to be escaped for defining json objects.

Since JSON is a very common format to use and expose as API. I suggest we amend #555 in order to replace the delimiters for interpolated strings to \{ and }.

With the amendement the previous example would look like this:

let jsonUser = """
{ "name" = {name}
, "date" = { "day" : \{day}, 
           "month" : \{month}, 
           "year" : \{year} }
}
"""

Which allows the JSON structure to be more visible.

Conclusion

Multi-line strings particularly shine when implemented along with its sister features string interpolation and raw string. Using both of them we can achieve string manipulation that is easy to read and offers predictable semantics. Mundane tasks like displaying the Idris banner become obvious to read.

@mattpolzin
Copy link
Collaborator

I think we could close this one out as done (🎉) unless I'm missing something.

@gallais gallais closed this as completed Nov 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants