String Interpolation #165

Merged
merged 1 commit into from
Dec 11, 2019
161 changes: 161 additions & 0 deletions DIPs/4NNN-WGB.md
@@ -0,0 +1,161 @@
# String Interpolation

| Field | Value |
|-----------------|-----------------------------------------------------------------|
| DIP: | (number/id -- assigned by DIP Manager) |
| Review Count: | 0 (edited by DIP Manager) |
| Author: | Walter Bright walter@digitalmars.com |
| Implementation: | (links to implementation PR if any) |
| Status: | Will be set by the DIP manager (e.g. "Approved" or "Rejected") |

## Abstract

Instead of a format string followed by an argument list, string interpolation enables
embedding the arguments in the string itself.


## Contents
* [Rationale](#rationale)
* [Prior Work](#prior-work)
* [Description](#description)
* [Breaking Changes and Deprecations](#breaking-changes-and-deprecations)
* [Reference](#reference)
* [Copyright & License](#copyright--license)
* [Reviews](#reviews)

## Rationale

While the conventional format string followed by the argument list is perfectly fine for
short strings and a small number of arguments, it tends to break down with longer strings
with many arguments. Omitting an argument, having an extra argument, and having a mismatch
between a format specifier and its corresponding argument are common errors.
Embedding the arguments in the format string tends to eliminate these errors. It is also easier
to read and visually easier to review for correctness.

## Prior Work

* Interpolated strings have been implemented and well-received in many languages.
For many such examples, see [String Interpolation](https://en.wikipedia.org/wiki/String_interpolation).
* Jason Helson has submitted a DIP [String Syntax for Compile-Time Sequences](https://github.com/dlang/DIPs/pull/140).
* [Adam's string interpolation proposal](http://dpldocs.info/this-week-in-d/Blog.Posted_2019_05_13.html)

## Description

```
writefln(i"I ate %apples and %{d}bananas totalling %(apples + bananas) fruit.");
```
gets rewritten as:
```
writefln("I ate %s and %d totalling %s fruit.", apples, bananas, apples + bananas);
```
It will also work with printf:

```
printf(i"I ate %{d}apples and %{d}bananas totalling %{d}(apples + bananas) fruit.\n");
```
becomes:
```
printf("I ate %s and %d totalling %s fruit.\n", apples, bananas, apples + bananas);
```

Review comment (Contributor): should they all be d's?

The `{d}` syntax is for when the format specifier needs to be anything other than `s`,
which is the default. What goes between the `{` and `}` is not specified, so this capability
can work with foreseeable format specification improvements without needing to update
the core language. It also makes interpolated strings agnostic about what the format
specifications are, as long as they start with `%`.
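
Because `i"..."` literals are only proposed here, the effect of `{}` can only be illustrated today with the hand-written call that such a literal would become. The sketch below assumes a hypothetical literal `i"ratio=%{08.3f}ratio mask=%{x}mask count=%count"`; the function name `report` and its parameters are illustrative and not part of the proposal.

```d
import std.format : format;
import std.stdio : writeln;

/// Hand-written equivalent of the hypothetical literal
/// i"ratio=%{08.3f}ratio mask=%{x}mask count=%count".
string report(double ratio, int mask, int count)
{
    // The text inside {} is copied after '%' unchanged; a bare argument gets %s.
    return format("ratio=%08.3f mask=%x count=%s", ratio, mask, count);
}

void main()
{
    writeln(report(3.14159, 255, 7)); // ratio=0003.142 mask=ff count=7
}
```

The specifier text is passed through verbatim, so whether `08.3f` or `x` is meaningful is decided by the formatting function, not by the interpolation step.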


The interpolated string starts as a special string, `InterpolatedString`, which is the same as a
`DoubleQuotedString` but with an `i` prefix and no `StringPostfix`. This appears in the grammar
as an `InterpolatedExpression`, which is under `PrimaryExpression`.

`InterpolatedExpression`s undergo semantic analysis similar to `MixinExpression`.
The string is scanned from left to right, according to the following grammar:

```
Elements:
    Element
    Element Elements

Element:
    Character
    '%%'
    '%' Argument
    '%' FormatString Argument

FormatString:
    '{' FormatString '}'
    CharacterNoBraces

CharacterNoBraces:
    CharacterNoBrace
    CharacterNoBrace CharacterNoBraces

CharacterNoBrace:
    characters excluding '{' and '}'

Argument:
    Identifier
    Expression

Expression:
    '(' Expression ')'
    CharacterNoParens

CharacterNoParens:
    CharacterNoParen
    CharacterNoParen CharacterNoParens

CharacterNoParen:
    characters excluding '(' and ')'
```

The `InterpolatedExpression` is converted to a tuple expression, where the first element
is the transformed string literal, and the `Argument`s form the rest of the elements.
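
The shape of that tuple can be modelled in current D. The sketch below uses the library type `std.typecons.Tuple` as a stand-in for the compiler-level tuple expression the text describes, which is an assumption made for illustration; the function name `fruitReport` is likewise hypothetical.

```d
import std.format : format;
import std.stdio : writeln;
import std.typecons : tuple;

/// Models the lowering of
/// i"I ate %apples and %{d}bananas totalling %(apples + bananas) fruit."
string fruitReport(int apples, int bananas)
{
    // First element: the transformed string literal; the rest: the Arguments.
    auto lowered = tuple("I ate %s and %d totalling %s fruit.",
                         apples, bananas, apples + bananas);
    // .expand turns the tuple's fields into an ordinary argument list.
    return format(lowered.expand);
}

void main()
{
    writeln(fruitReport(2, 3)); // I ate 2 and 3 totalling 5 fruit.
}
```

Since the elements simply become consecutive function arguments, any function that accepts a format string followed by values can receive the result.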

The transformed string literal is constructed as follows:

If the `Element` is:

* `Character`, it is written to the output string.
* `'%%'`, a '%' is written to the output string.
Review comment (Contributor):

On "'%%', a '%' is written to the output string":

Wouldn't it be better for `%%` to stay `%%` in the resulting format string? Otherwise one would need to write `%%%%` in the interpolated string to get a `%` in the output of `writef` or `printf`.

Review comment (Member):

How would you write a % to the output otherwise?

Review comment (Contributor):

If the `%%` becomes a `%` during transformation of the interpolated string, the format string will contain an isolated `%`, which is an error in a format string. A double percent in the interpolated string has to stay a double percent in the format string, or else you would have to write four `%` characters.
The transformation is
interpolated string => format string => output
Example:
writef(i"Percent %{d}value %%") becomes
writef("Percent %d %", value), which is an error (or at least undefined behaviour for printf).

* `'%' Argument` then '%s' is written to the output string.
* `'%' '{' FormatString '}' Argument` then '%' `FormatString` is written to the output string.

If the `Argument` is an `Identifier` it is inserted in the tuple as an `IdentifierExpression`.
If the `Argument` is an `Expression` it is lexed and parsed (including the surrounding parentheses)
like `MixinExpressions` and inserted in the tuple as an `Expression`.

Compile time errors will be generated if the `Elements` do not fit the grammar.
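
The scanning rules above can be sketched as an ordinary D function that runs under CTFE. This is a rough model and not the proposed compiler implementation: `Lowered`, `skipNested`, and `lower` are hypothetical names, arguments are returned as source text rather than parsed expressions, identifiers are assumed to be ASCII, and `%%` is handled exactly as the rule is written above (a single `%` is emitted).

```d
import std.ascii : isAlpha, isAlphaNum;

/// Format string plus the source text of each argument, in order.
struct Lowered
{
    string fmt;
    string[] args;
}

/// Returns the index just past the delimiter matching s[i] (nesting allowed).
size_t skipNested(string s, size_t i, char open, char close)
{
    int depth = 0;
    do
    {
        if (i == s.length)
            throw new Exception("unterminated delimiter");
        if (s[i] == open) ++depth;
        else if (s[i] == close) --depth;
        ++i;
    } while (depth > 0);
    return i;
}

/// Hypothetical helper: lowers the text between the quotes of an
/// interpolated string, following the Element rules as written.
Lowered lower(string s)
{
    Lowered r;
    size_t i = 0;
    while (i < s.length)
    {
        if (s[i] != '%')
        {
            r.fmt ~= s[i++];                 // Character: copied unchanged
            continue;
        }
        ++i;                                 // consume '%'
        if (i < s.length && s[i] == '%')
        {
            r.fmt ~= '%';                    // '%%': one '%' is written
            ++i;
            continue;
        }
        string spec = "s";                   // default specifier
        if (i < s.length && s[i] == '{')
        {
            immutable end = skipNested(s, i, '{', '}');
            spec = s[i + 1 .. end - 1];      // text between the braces
            i = end;
        }
        immutable start = i;
        if (i < s.length && s[i] == '(')
            i = skipNested(s, i, '(', ')');  // Expression, parentheses kept
        else
            while (i < s.length && (isAlphaNum(s[i]) || s[i] == '_'))
                ++i;                         // Identifier
        if (i == start || (s[start] != '(' && !isAlpha(s[start]) && s[start] != '_'))
            throw new Exception("'%' must be followed by an Argument");
        r.fmt ~= '%';
        r.fmt ~= spec;
        r.args ~= s[start .. i];
    }
    return r;
}

// The first example from the Description, checked at compile time via CTFE.
enum ex = lower("I ate %apples and %{d}bananas totalling %(apples + bananas) fruit.");
static assert(ex.fmt == "I ate %s and %d totalling %s fruit.");
static assert(ex.args == ["apples", "bananas", "(apples + bananas)"]);

void main() {}
```

Because the function is evaluated at compile time in the `static assert`s, an exception thrown for input that does not fit the grammar surfaces as a compile error, which loosely mirrors the behaviour described above.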

### Limitations

Interpolated string formats cannot be mixed with conventional elements:

```
writefln(i"making %bread using %d ingredients", 6); // error, %d is not a valid element
```

Interpolated strings won't work with `*` format specifications that require extra arguments.
This will produce a runtime error with `writefln` and undefined behavior with
`printf`, because the arguments won't line up with the formats. The compiler does not check
the formats for validity.
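
The `*` case can be reproduced with a conventional call. In the sketch below, the second call is the hand-written form of what a hypothetical `i"%{*d}value"` would become; that literal is an illustration and not an example taken from the proposal.

```d
import std.exception : assertThrown;
import std.format : format, FormatException;

void main()
{
    // Conventional call: '*' takes the field width from an extra argument.
    assert(format("%*d", 5, 42) == "   42");

    // Lowered form of the hypothetical i"%{*d}value": the only argument is
    // consumed as the width, leaving nothing for %d to format.
    assertThrown!FormatException(format("%*d", 42));
}
```

With `printf` the same mismatch would not be diagnosed at all, which is why the text above calls it undefined behavior.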

No attempt is made to check that the format specification is compatible with the argument type.
Making such checks would require that detailed knowledge of `printf` and `writef` be hardwired
into the core language, as well as knowledge of which formatting function is being called.


## Breaking Changes and Deprecations

Since the interpolated string is a new token, no existing code is broken.

## Reference

## Copyright & License
Copyright (c) 2019 by the D Language Foundation

Licensed under [Creative Commons Zero 1.0](https://creativecommons.org/publicdomain/zero/1.0/legalcode.txt)

## Reviews