-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add string interpolation from nimboost #6507
Add string interpolation from nimboost #6507
Conversation
The module isn't that large, I think it's fine to put its code into |
@bluenote10 |
lib/pure/strinterp.nim
Outdated
else: | ||
result = $chr(ord(if lowerCase: 'a' else: 'A') + v - 10) | ||
|
||
proc intToStr(n: SomeNumber, radix = 10, len = 0, fill = ' ', lowerCase = false): string = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this function be implemented in terms of the existing functions for converting an int to a string? It seems to me that this reinvents a lot of the code that we already have causing duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a closer look, there is not as much duplication as it seems. The part that can be rewritten is line 43 to 49, but it would be slightly less efficient, because it would create several temporary strings and requires two passes for the case conversion (toHex
only allows to produce upper case). Note that there is probably not much room for simplification for the sign placement logic at the bottom because it depends on whether "zero filling" is desired. I have a few smaller things though which makes the code slightly more concise, which I'll push.
lib/pure/strinterp.nim
Outdated
prefix[0] = '-' | ||
result = prefix & result | ||
|
||
proc alignStr(s: string, len: int, fill = ' ', trunc = false): string = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me this function seems to do too much, but maybe it makes the implementation easier. Might be worth considering splitting this up into a truncate
and indent
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we do have an indent
in the strutils module already, could it be used?
https://nim-lang.org/docs/strutils.html#indent,string,Natural,string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would be possible, but from the perspective of the macro it is nicer to have just one function to call which does all expected behavior at once, instead of having to build a more complex AST with chaining function calls. That's why I though it helps to rename these functions.
lib/pure/strinterp.nim
Outdated
|
||
proc floatToStr(v: SomeNumber, len = 0, prec = 0, sep = '.', fill = ' ', scientific = false): string = | ||
## Converts ``v`` to string with precision == ``prec``. If result's length | ||
## is lesser then ``len``, it aligns result to the right with ``fill`` char. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/then/than/
# nimboost's richstring | ||
# ----------------------------------------------------------------------------- | ||
|
||
proc parseIntFmt(fmtp: string): tuple[maxLen: int, fillChar: char] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to see some comments with examples of what this parses.
Same for the functions below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added some docstring, but note that this is not the best place to document the behavior for users, because these are only the internal helper functions. From a user perspective this is just about getting the traditional printf like behavior: the pad symbol of normal formatters like %5d
are a plain space, and %05d
is a special case, where the pad symbol becomes a zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bluenote10 This isn't for users, it's for the programmer. It's nice to see an example of what this procedure parses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you happy with the comments I have added? Or anything else to review? (I didn't move it into strutils now, as wished by Araq)
lib/pure/strinterp.nim
Outdated
|
||
macro fmt*(fmt: static[string]): untyped = | ||
## String interpolation macro with scala-like format specifiers. | ||
## Knows about: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: An empty line is required for a newline to be present in the generated docs, currently Knows about
will be on the same line as the above sentence.
Argh, that's a bit annoying. Please see the "outdated" comments too. |
@Yardanico Good point, I have added Anatoly now. Using @dom96 Yes in terms of the length it is not an issue. My only concern was that it comes with a large number of helper functions, which are only relevant to the macro and it would maybe be nice to have them in their own scope. |
@bluenote10 maybe also change 2012 -> 2017 ? |
It would be nice, but I don't think it's necessary. I still vote to put this in the strutils module. |
That's not how string interpolation should work in Nim. :-/ Nim is not about cryptic 1 letter mini languages. For example, to format toHex this should be used format"as Hex ${x.toHex}" We have the power of |
It does make sense from the perspective:
Making the implementation purely concat-based misses the point of string interpolation in my opinion. Doesn't this mean that all formatting has to go into the interpolated expression? I rarely use an interpolator without a formatter. I don't care about things like the A few common use cases and how I would currently have to approach them without a formatting mini language: # Output goal:
# Iteration: 1 MSE: 39194.898
# Iteration: 2 MSE: 26129.932
# Iteration: 3 MSE: 17419.955
# Iteration: 4 MSE: 11613.303
# Iteration: 5 MSE: 7742.202
var mse = 58792.347
for iter in 0 ..< 10:
mse = mse / 1.5
echo fmt"Iteration: ${iter+1}%5d MSE: ${mse}%12.3f"
# vs
echo fmt"Iteration: ${align($(iter+1), 5)} MSE: ${formatFloat(mse, ffDecimal, 3).align(12)}"
# Output goal:
# README.txt 0.002 MB
# LICENSE 0.005 MB
# movie.mp4 11.952 MB
let files = @[
("README.txt", 1925),
("LICENSE", 5203),
("movie.mp4", 12532921)
]
for file, fileSize in files.items():
echo fmt"${file}%-20s ${filesize.float / (1 shl 20)}%12.3f MB"
# vs
echo fmt"""${file & repeat(" ", 20 - len(file))} ${formatFloat(filesize.float / (1 shl 20), ffDecimal, 3).align(12)} MB""" These are minimal examples, real world use cases contain a whole bunch of variables, making conciseness a major factor. Having to do the formatting explicitly is about as much pain that even a formatting freak like me would avoid it. And the barrier to applying nice formatting has an immediate impact on the quality of all text output produced in a language. For instance, if you look at the visual quality of log messages of various projects they reflect this formatting barrier, i.e., For some time I was also experimenting with the ideas Jehan proposed here. They are nice from a design point of view, but pulling the formatting into the expression will always have a trade-off problem between conciseness and name space. I.e., in order to make the formatting syntax concise, a lot of common symbol names have to be used and even than it is less concise than the printf formatters. Conversely, using less conflicting symbols for the formatting makes the formatting cumbersome. My conclusion was that it is much easier to go for the printf-like syntax, because it covers such a broad range of use cases in a super concise way. If you absolutely don't want that in Nim we should discuss that what we can offer in terms of formatting instead, but I think the above examples show that purely concat-based string interpolation is an almost useless feature. |
That said, this is fine with me if you move it into its own stdlib module. I understand other programmers also expect to have this toolbox available. |
46073b2
to
b8e1e02
Compare
b8e1e02
to
1369d2b
Compare
Is there anything left here you want me to adjust? I hope everyone is happy as it is now. |
As I said, IMO sizes and times should be supported too and your alignments ignore unicode. |
Oh sorry, I thought your comments were meant for future iterations. I have implemented support for unicode formatting now, and I came up with a solution for "sizes" formatters. A few design decisions:
Are you happy with it? Would you mind moving the time formatters into a second PR? This would require a bit more thought. Oh, and I think there is still a Windows specific issue I'll have to fix... |
I don't agree with these design decisions. :-) The problem is that single letter format specifiers do not scale, so the first step is to allow multiple characters here:
With this optional terminating
|
Okay I've been thinking about his for a while, but I can't come up with something that is not ambiguous. My basic problems are: What is I've been thinking about doubling/escaping the characters following the closing bracket, and doubling/escaping of the terminal |
Parsed as "<x_parsed_as_size>"
"<x_parsed_as_size>{y}"
"${x}$ size $$"
We should patch interpolatedFragments for consistency.
Well I don't know. We could accept for now what we already have, but that potential "$ marks the end" rule cannot be reintroduced without breaking things so we better get it right this time. |
I think that the perfect is the enemy of the good. This seems to me a nice macro that supports a variety of common use cases |
A few more thoughts:
Overall I'm not yet sure if the language based on a unique terminal symbol is really an improvement in those 99% standard use cases. Also I don't think string interpolation is so much about being able to format any type in any imaginable way, but more about the basic problem of assembling a nicely formatted string from a large number of basic types like strings+ints+floats. My current proposal would be to keep the current |
That's biased. Times and sizes are often more common than plain ints and floats which have little meaning of their own. |
I would suggest a compromise: a simple formatter that handles named variables, left padding, and knows only strings, ints and floats. This covers 90% of common use cases and can be used without having to look up the minilanguage syntax. |
lib/pure/strinterp.nim
Outdated
of "Z": result.divisor = -7 | ||
of "Y": result.divisor = -8 | ||
else: | ||
quit "Illegal float format suffix: " & remainder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raise a ValueError here instead.
👍 to what @FedericoCeratto said. I actually just got an idea after looking at the Python f-string implementation. Something that would be extensible: type
MyObject = object
x: int
proc format(f: MyObject, pattern: string): string =
## Custom implementation for formatting of MyObject
return repeatChart(" ", pattern.parseInt()) & f.x
let t = getGmTime(getTime())
let c = MyObject(x: 42)
echo(f"Hello. It is currently {t:m past H}. My cool custom object's value\n{c:4}")
# Hello. It is currently 38 past 9. My cool custom object's value
# 42 This would allow for some very powerful extensibility and significantly simplify the formatter. The rule is that everything after the first |
@dom96 I like it very much. It saves typing and yet is "obvious" enough. Really nice. |
I'll see what I can do. But since this requires a complete rewrite it's probably best to have another PR once it's ready. I remember having a closer look at Python f-strings before and concluding that they have some drawbacks which made the Scala formatters superior. Fortunately, I can't remember exactly what was the issue :). |
If we do decide to choose my idea, and I'd love to hear other people's opinions, then I think it'd be best to move your implementation into a Nimble package (sorry!). |
There are already three string interpolation libraries on Nimble, what's the point of having a forth one? The whole point was to have one in the stdlib, everyone agreed to that in the RFC. |
By that I mean: if we decide to add the formatter I describe in my previous comment into the stdlib then your current implementation should go into Nimble. |
Yeah, that's fine. In fact this already comes from "nimboost" and I'll probably just push back the changes/improvements I made. I think an issue I had with a Python f-string approach is that I'm not sure if the formatter string validation can happen at compile time or at runtime only. For Python this doesn't matter, but it is just super annoying if your string interpolation fails at runtime just because of a stupid formatter mistake. I think for built-in types it is mandatory to do the formatter validation at compile time to avoid running into runtime errors all the time. The current implementation achieves this. I think my problem with implementing it was that I couldn't figure out how to build the AST depending on the type of the Note that your
There was another syntactical issue with f-strings as well, but I still can't remember what it was. |
Formatters could choose between compile time and runtime by accepting static strings. |
Hm, I might be confused, but the |
var v = n.int64 | ||
let s = v.int64 < 0 | ||
if s: | ||
v = v * -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This overflows for low(int64)
|
As discussed in #5600, this PR adds @vegansk nice string interpolation implementation from nimboost to the standard library.
To discuss: Should this live in a separate module or in
strutils
? If separate, what's a good name?In nimboost, the implementation relies on 4 modules:
richstring
,parsers
,formatters
, andlimits
. I minified the implementation by usingparseInt
instead of nimboost'sstrToInt
. This gets rid of the requirements onlimits
andparsers
, but I'm not sure if usingstrToInt
would have any benefits. The remaining things fit reasonably into one module. Should we consider toinclude
it fromstrutils
so that users don't need an extraimport
for string interpolation while keeping the implementation clean and separated?CC @Yardanico