String Interpolation #165

Merged
merged 1 commit into from
Dec 11, 2019

WalterBright
Member

No description provided.

@wilzbach
Member

Did you consider @adamdruppe's proposal on this?
http://dpldocs.info/this-week-in-d/Blog.Posted_2019_05_13.html

tl;dr: custom tuple-like struct which allows functions to customize output (think writeln or SQL statements).

@andre2007

@WalterBright is my understanding correct that this proposal only works for direct stdout but assigning to e.g. a string variable is not possible?
This would be a big limitation.

@WalterBright
Member Author

This would be a big limitation.

I'm not so sure about that. I use format strings all the time, and almost never use anything but a literal. If it does become a serious problem, this could be supported:

enum s = "hello %bar";
format(i"" ~ s);

But it doesn't really matter if interpolated strings don't fit every niche. They just have to fit the most used ones, as there's a fallback - use the current method.

@WalterBright
Member Author

BTW, the goal for this design was to minimize the amount of typing a user has to do, and make it usable for printf as well as writef.

@WalterBright
Member Author

Did you consider @adamdruppe's proposal on this?

Not specifically, though I knew people were working and thinking about it. Adam should do it as a proper DIP. I've amended the DIP to add references to Adam's and Jason's work.

@WalterBright
Member Author

This would be a big limitation.

Thinking about this a bit more, the end result of an InterpolatedString is a tuple expression. Anything you can do with a tuple expression you can do with an InterpolatedString.

@andre2007

Yes, none of my mid-size D applications write directly to stdout; they use library functions, e.g. for adding colors. Also, if you want to build an HTTP server (HTTP response body), string interpolation would be really useful.
A generic solution that works for stdout but also for assigning to a string variable or passing a function argument would be really nice.

Also, in the other languages I am aware of, string interpolation is not limited to stdout.

@WalterBright
Member Author

I think there's a misunderstanding here. None of this DIP restricts it to stdout.

@andre2007

Yes, it was a misunderstanding. You may add an example (string variable assignment, string argument assignment) to this DIP to make it clear for other readers. Thanks.

@atilaneves

I don't understand how "// error, %d is not a valid element" is the case, given the rule in the DIP for when Element is Character.

@marler8997
Contributor

My related PR: dlang/dmd#7988

@mdparker mdparker mentioned this pull request Sep 26, 2019
@marler8997
Contributor

My comment here addresses my concerns with lowering interpolated strings to format string tuples. dlang/dmd#7988 (comment)

Lowering the interpolated string to a tuple of strings and expressions seems more versatile.

@anon17

anon17 commented Sep 30, 2019

Pull 7988 mentions HTML generation as a use case for string interpolation; this article describes such a vulnerability in Firefox.

@adamdruppe
Contributor

Tuple-based interpolation can do HTML generation more sanely (though I think HTML strings are mistakes anyway, but that's another thing) because, as tuples, the types are available and then the function could, in theory at least, do some introspection and proper encoding based on that information.

But indeed I would be generally skeptical; it's just that tuples with type information, as in my proposal tweak, make it possible to actually do it right.
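
A minimal sketch of that idea, written in Python only for illustration. The `Interp` wrapper and `render_html` function are hypothetical names, not part of any proposal in this thread; the point is that a consumer receiving separate pieces can tell literal text from interpolated values and encode only the latter.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Interp:
    """Hypothetical marker for a value that came from an interpolated expression."""
    value: object


def render_html(*parts):
    """Join the parts, HTML-escaping interpolated values but not literal text."""
    out = []
    for part in parts:
        if isinstance(part, Interp):
            out.append(escape(str(part.value)))  # untrusted: encode it
        else:
            out.append(part)  # literal written by the programmer
    return "".join(out)


user_input = "<script>alert(1)</script>"
page = render_html("<p>Hello, ", Interp(user_input), "!</p>")
print(page)
```

A single pre-built format string would give the consumer no such distinction, because the literal text and the values are already merged by the time it sees them.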

@ntrel
Contributor

ntrel commented Oct 15, 2019

No attempt is made to check that the format specification is compatible with the argument type. Making such checks would require that detailed knowledge of printf and writef be hardwired into the core language

This is false: the language could lower an interpolated string to a Phobos function that checks the format string at compile time, e.g. format!"format_string"(args).

@boxed

boxed commented Oct 15, 2019

Did you look at how Swift does string interpolation? The Wikipedia reference here points to a vastly simplified Swift example that doesn't mention how it handles escaping, which it does much more smoothly than any system I've seen before.

If the `Element` is:

* `Character`, it is written to the output string.
* `'%%'`, a '%' is written to the output string.
Contributor

'%%', a '%' is written to the output string.

Wouldn't it be better if %% stayed as %% in the resulting format string? Otherwise one would need to put %%%% in the interpolated string to get a % in the output of writef or printf.

Member

How would you write a % to the output otherwise?

Contributor

If the %% becomes a % while the interpolated string is transformed, the format string will contain an isolated %, which is an error in a format string. A double percent in the interpolated string has to stay a double percent in the format string, or else you would have to write four % characters.
The transformation is
interpolated string => format string => output
Example:
writef(i"Percent %{d}value %%") becomes
writef("Percent %d %", value), which is an error (or at least undefined behaviour for printf).
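
The same effect can be reproduced with Python's printf-style `%` operator, used here only as an analogy: it is not D's writef, but it follows the same convention that `%%` in a format string yields one `%`. The variable names are made up for the sketch.

```python
value = 42

# '%%' is kept in the format string: the formatter emits a single '%'.
kept = "Percent %d %%" % value

# '%%' was already collapsed to '%' before formatting:
# the lone trailing '%' is an incomplete format specifier.
try:
    collapsed = "Percent %d %" % value
except ValueError as exc:
    collapsed = f"error: {exc}"

print(kept)
print(collapsed)
```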

becomes:
```
printf("I ate %s and %d totalling %s fruit.\n", apples, bananas, apples + bananas);
```
Contributor

should they all be d's?

@TurkeyMan
Contributor

TurkeyMan commented Oct 15, 2019

@WalterBright

I'm not so sure about that. I use format strings all the time, and almost never use anything but a literal.

Interpolated strings are most useful when the string is long... and in my experience, where I often generate shim or binding functions, those strings are always synthesised. They're rarely literals... so this feature is absolutely something I've wanted for 10 years, but it won't actually suit my one common use case.

mixin(i"" ~ generatedCode) doesn't feel super satisfying; it looks more like a workaround to me. Maybe there's a better suggestion?

@SMietzner

SMietzner commented Oct 15, 2019

I think the better way of interpolation rules would be:

  • a % is always followed directly by a format string argument OR opening curly brace
  • curly braces are mandatory for arguments

e.g.:
writefln(i"I ate %{apples} and %d{bananas} totalling %d{apples + bananas} fruit.");

This would

  • keep existing format rules / syntax as it is right now
  • eliminate parentheses since %( x + y) could be some code snippet, but %{ x + y } could not (with respect to the use of interpolated strings in mixins)

@SMietzner

The much better way of interpolation rules would be:

  • a % is always followed directly by an opening curly brace
  • the format specifier is an optional second argument, separated by a comma

e.g.:

writefln(i"I ate %{apples} and %{bananas, "d"} totalling %{apples + bananas, "d"} fruit.");

This would

  • keep readability high
  • minimize ambiguity with respect to the use of interpolated strings in mixins:
    %( x + y) could be some code snippet, but %{ x + y } could not
  • make sense since the argument is actually required and the specifier is actually optional
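
A rough sketch of how such a rule could be lowered to a format string plus a list of argument expressions, written in Python for illustration. The regular expression and the `lower` function are hypothetical, defaulting to `s` when no specifier is given is an assumption, and the expressions are kept as source text rather than evaluated.

```python
import re

# Hypothetical grammar taken from the rule above: %{expr} or %{expr, "spec"}.
_ELEMENT = re.compile(r'%\{\s*([^{}]+?)\s*(?:,\s*"([^"]*)")?\s*\}')


def lower(istring):
    """Return (format_string, expression_sources) for an interpolated string."""
    exprs = []

    def replace(match):
        exprs.append(match.group(1))
        # Assumption: a missing specifier falls back to 's'.
        return "%" + (match.group(2) or "s")

    return _ELEMENT.sub(replace, istring), exprs


source = 'I ate %{apples} and %{bananas, "d"} totalling %{apples + bananas, "d"} fruit.'
fmt, exprs = lower(source)
print(fmt)
print(exprs)
```

Because every element starts with `%{`, an ordinary `%d` or `%%` elsewhere in the string passes through this sketch untouched.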

@boxed

boxed commented Oct 15, 2019

Look at Swift! They looked at what other languages did and made something nicer. There is no special interpolated string: \(expr) is used. Standard escaping rules apply but one can also do #"\#(str)"#. The number of # before and after the string modifies the escape sequence inside. You can always write literals as the literal text you want and still be able to do string interpolation.

@SMietzner

Look at Swift! They looked at what other languages did and made something nicer. There is no special interpolated string: \(expr) is used. Standard escaping rules apply but one can also do #"\#(str)"#. The number of # before and after the string modifies the escape sequence inside. You can always write literals as the literal text you want and still be able to do string interpolation.

So how is formatting handled in Swift's case? Format specifiers?

@boxed

boxed commented Oct 15, 2019

Well, they punted a bit on that, I'm afraid, so people use function calls, operator overloading or string conversion operators. So here is a good opportunity to one-up Swift!

@adamdruppe
Contributor

I still say we should just do the tuple of structs thing. That's a very simple rule and by far the most flexible - it doesn't even have to yield strings!

@AndrejMitrovic
Contributor

Why use % in the string instead of something like $ which is more common in other languages and doesn't conflict with the existing usage of %?

It's very confusing to see something like %set in a format string and then wonder: is this %s followed by the string "et", or is it just trying to refer to the set variable? With $set, it's immediately obvious because it stands out.

@dejlek

dejlek commented Oct 16, 2019

Why the percent or dollar? If the string is interpolated, then the parser should know that everything inside braces should be evaluated, so writefln(i"I ate {apples} and {bananas} totalling {apples + bananas} fruit."); should work. If a developer wants braces in the INTERPOLATED string, then (s)he should double them. Look at Python's formatted strings as proof that this works and has been widely accepted by an enormous community. Reference: https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals
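
For reference, this is how the Python behaviour described above looks in practice. The variable values are made up for the sketch.

```python
apples, bananas = 2, 3

# Everything inside single braces is evaluated as an expression.
sentence = f"I ate {apples} and {bananas} totalling {apples + bananas} fruit."

# Doubled braces produce literal braces in the result.
literal = f"{{apples}} is {apples}"

print(sentence)
print(literal)
```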

@marler8997
Contributor

Having to double curly braces in an interpolated string would make it awkward to use for code generation. Curly braces work great for python because they don't use curly braces to delimit code blocks. I also hardly see dynamic code generation in python so it wouldn't matter anyway, however, in D, I think using string interpolation inside mixins would be a major use case.
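
A small Python sketch of the awkwardness in question, generating a C-like function with an f-string. The function being generated is made up for the example; note that every structural brace of the generated code has to be doubled.

```python
name = "add"
body = "return a + b;"

# The braces of the generated block must be written as '{{' and '}}',
# while the interpolated parts use single braces.
generated = f"int {name}(int a, int b) {{ {body} }}"

print(generated)
```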

@dejlek

dejlek commented Oct 18, 2019

Why would you use the i-strings for code generation when you have q{} ones? I would also argue that code generation is not what a typical D programmer does. We need string interpolation for hundreds of other, different reasons. And if people still insist, then I would agree to use $ instead of %, as $ is used in other languages and we are more familiar with that style.

@adamdruppe
Contributor

adamdruppe commented Oct 18, 2019 via email

@dejlek

dejlek commented Oct 18, 2019

I think we should have a survey here asking D developers on the forum to participate and tell us how often they generate complex blocks of D code. Yes, I am aware of the fact that developers want string mixins, but the real question is how often people do this compared to the "regular" use case where you just want to output some meaningful text... I see people constantly talking about D being easy to prototype stuff in, and for this to be true, easy, simple string interpolation without the need to type some extra special characters all the time (% or $) is very much needed.

@marler8997
Contributor

marler8997 commented Oct 18, 2019

@dejlek I implemented interpolated strings in this PR dlang/dmd#7988 and in the description I show an example of how it can be used in code generation.

string generateFunction(string attributes, string returnType, string name, string args, string body)
{
    import std.conv : text;
    return text(iq{
        $(attributes) $(returnType) $(name)($(args))
        {
            $(body)
        }
    });
}
mixin(generateFunction("pragma(inline)", "int", "add", "int a, int b", "return a + b;"));
assert(100 == add(25, 75));

Without interpolated strings, you can't insert dynamic content in the middle of a q{} string, which makes them virtually useless for code generation. This is because if the content of the q{} string was all static, then you probably wouldn't need a mixin in the first place :)

Here's a real example that is in phobos as well: https://github.com/marler8997/interpolated_strings/blob/master/phobos_example.d

@gordrs

gordrs commented Nov 2, 2019

Might I suggest considering how C# does its string interpolation? It ends up being very clean and very quick to write.

function($"This string has {value} within it");

@mdparker mdparker merged commit c4a84cf into dlang:master Dec 11, 2019
@aberba

aberba commented Jul 13, 2020

string name;
string sentence = si"My name is ${name}";

si = string Interpolation

@baryluk

baryluk commented May 15, 2021

Honestly, I like the way Python 3 formatted strings work:

print(f"You have {value} apples. And all fruits: {apples + bananas}. A type / format can be specified too: {s:10s}, {sqrt(value):.3f}.")
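
A runnable version of the format-specifier part of that line, with made-up values for the variables:

```python
from math import sqrt

s = "kiwi"
value = 2

# ':10s' pads the string to a width of 10 (left-aligned by default for strings).
padded = f"[{s:10s}]"

# ':.3f' formats a float with three digits after the decimal point.
rounded = f"{sqrt(value):.3f}"

print(padded)
print(rounded)
```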

(Note that this is also very similar to C#, with a difference that format specifiers at the end are specified differently).

(Programming language Nim, also uses this syntax, with almost exactly same syntax, modifiers, formatters and other details, like Python).

And my favourite new form, from Python 3.8, for debugging and quick value / expression dumps:

print(f"Debugging some values: {a+b=}")

This will display, for example, Debugging some values: a+b=7. The text before the = is copied to the output literally (including any leading or trailing spaces), and any spaces before and after the = are copied literally too. So f"value is {a + b = :.3f}" can, for example, display: value is a + b = 3.145
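
A short runnable check of that behaviour (requires Python 3.8 or later). The variable names and values are made up for the sketch.

```python
a, b = 3, 4

# The expression text and '=' are copied to the output, followed by the value.
debug = f"Debugging some values: {a+b=}"

x, y = 2.0, 1.125

# Spaces around '=' are preserved, and a format specifier can follow.
spaced = f"value is {x + y = :.3f}"

print(debug)
print(spaced)
```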

Honestly, there is no need for special characters like % or $ to indicate the start of the value. (I wrote a string interpolation library for D back in 2011, and I used ${...} (with mandatory {}), but now I wouldn't do that.) If I were to choose a special character anyway, I would select $ again: % feels visually more cluttered, and $ is closer to what many other languages do (PHP, JavaScript, Shell / Bash, Dart, R). Ruby is one of the exceptions: it uses "my value is #{value + 2}", which I think has the same visual clutter as %. Django Templates use double curly braces: "my value is {{ value + 2 }}".

@boxed

boxed commented May 16, 2021

@baryluk the Python way has to have two escaping methods. And you can't escape from them. It's not very powerful and a big mess. Compare to Swift, which has one single special character, \. If you want to use it literally, you write your string like #"foo"#, and now that's like Python's raw strings, except there is still an escape sequence available: \#. And if you want to write the literal \#? Just add more #. It's much more elegant.

@baryluk

baryluk commented May 16, 2021

@baryluk the python way has to have two escaping methods. And you can't escape from them.

What do you mean? It is easy and clear to use. Never had issues with escaping in Python.

@boxed

boxed commented May 16, 2021

@baryluk it's clear. Sure. But easy? Or good? More doubtful. R-strings exist because you want to escape from the escaping. But there is no such thing for f strings. {} in an f string always has to be escaped to {{}}. In Swift there's just strings. No r-strings, no f-strings. All the python use cases are handled with one elegant system.
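
A small Python sketch of the two mechanisms being contrasted. The pattern text is made up for the example; it shows that the r prefix switches off backslash escapes, while braces in an f-string still have to be doubled even when the two prefixes are combined.

```python
# Raw string: the backslash is kept, so this is two characters, not a newline.
raw = r"\n"

n = 5

# Combined raw f-string: backslashes are literal, but '{' and '}' still need
# doubling to appear literally, and '{n}' is still interpolated.
pattern = rf"\d{{2}} x{n}"

print(len(raw))
print(pattern)
```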

@rempas

rempas commented May 27, 2021

I really don't understand why we should use a character to specify that the string needs formatting, or why we should put a special character before the format. The most logical way to do it is like this:

string name = "John";
int age = 19;

writeln("My name is {name} and I am {age} years old");

This is super simple and readable. Just add a curly bracket in the string and put the expression there. And of course we can use "\{" to escape a bracket. I don't get why we must complicate things...

@boxed

boxed commented May 27, 2021

@Godnyx You just complicated things. If your suggestion were implemented, old code would break. And you'd have two escape characters: { and \.

As opposed to in Swift where there's just one escape character: \. And if you want a raw string because you want to write \ you can make a string with a single newline like this: #"\\n"#. You can add more # to alter the single escape sequence to contain more and more #. It's super elegant.

@adamdruppe
Contributor

Yeah that would break tons of normal strings.

@baryluk

baryluk commented May 28, 2021

After a bit of consideration and reading various DIPs and proposals on this topic, I now agree that both $ident and $(expr) are the best solutions for D in terms of syntax (I am not sure about standard and custom format specifiers, but that is a secondary concern). The use of \( ... ) seems interesting, and could be integrated into existing strings without breaking existing software, but it precludes a lot of opportunities and is visually a bit cluttered. In long strings (especially for code generation), int $name(int $var); for example is so much cleaner than int \(name)(int \(var)); and easier to follow / review. And having a new "string" type that carries additional metadata (like Adam's proposal and the Alexi & John proposal) is crucial for success and for extensibility beyond simple writefln IO or string formatting, as well as for being able to operate without extra allocations.
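
For a feel of the $ident style on the code-generation example above, Python's standard `string.Template` happens to use the same `$identifier` placeholder form. This is only an analogy for readability, not the D proposal itself, and the substituted names are made up.

```python
from string import Template

# '$name' and '$var' end at the first character that cannot be part of an
# identifier, so the surrounding parentheses need no escaping.
decl = Template("int $name(int $var);").substitute(name="add", var="x")

# A literal dollar sign is written as '$$'.
price = Template("cost: $$5").substitute()

print(decl)
print(price)
```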
