user defined quote literals, eg: -12.3E+4022'dec128 ; -128'i8 is transformed as `i8("-128")` #228

timotheecour · 2020-05-19T04:14:11Z

summary

builtin literals (eg -128'i8) become regular user defined literals (UDL) that the parser represents as a literal-call expression i8("-128"), with AST nkLitCall(nkIdent("i8"), nkStrLit("-128")), which has the same semantics as nkCall
the parser becomes lazy and makes no attempt at parsing the string into a numerical type (parsing is deferred until actually needed inside the sem phase, if at all)
new UDLs (bigint, rational, dec128, f80 80-bit FP) become possible as library code, see examples below
existing literal kinds nkCharLit .. nkFloat64Lit get replaced by a single nkLit kind, and TNode is simplified as follows:

    # of nkCharLit..nkUInt64Lit:
    #  intVal*: BiggestInt
    # of nkFloatLit..nkFloat64Lit:
    #  floatVal*: BiggestFloat
    of nkLit: # new
      value*: uint64 # represents a float or int depending on `typ`, cast as uint64
    of nkStrLit..nkTripleStrLit:
      strVal*: string
...

(or if nkFloat128Lit is still needed, to array[2,uint64] instead of uint64)

all operations involving nkLit involve casting value from uint64 to the appropriate type as specified by typ : PType (then casting back to uint64)
IMO it's possible to do all this without introducing breaking changes, but this can be discussed separately
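
A minimal sketch of the "single nkLit kind + cast" idea, outside the compiler. `LitType`, `LitNode`, `newLit`, `intVal` and `floatVal` are hypothetical names used only for illustration (the real node would carry `typ: PType`); the point is just that one `uint64` field can hold either an int or a float, with the type tag deciding how to read it back:

```nim
type
  LitType = enum    # hypothetical stand-in for the compiler's `typ: PType`
    ltInt64, ltFloat64
  LitNode = object  # hypothetical, much reduced stand-in for an `nkLit` node
    typ: LitType
    value: uint64   # raw bits; their meaning depends on `typ`

proc newLit(x: int64): LitNode =
  # store the integer's bit pattern unchanged
  LitNode(typ: ltInt64, value: cast[uint64](x))

proc newLit(x: float64): LitNode =
  # store the IEEE 754 bit pattern of the float
  LitNode(typ: ltFloat64, value: cast[uint64](x))

proc intVal(n: LitNode): int64 =
  assert n.typ == ltInt64
  cast[int64](n.value)

proc floatVal(n: LitNode): float64 =
  assert n.typ == ltFloat64
  cast[float64](n.value)

let a = newLit(-128'i64)
let b = newLit(1.5)
echo a.intVal, " ", b.floatVal
```

Every accessor casts out of `value` and every constructor casts back in, which is the cost this design trades for having a single literal kind.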

details

nim parser transforms all quote literals (eg 123'i8) into literal-call expressions as follows:

let a = -0x12e4567'bar
# parser transforms into:
bar("-0x12e4567") # AST: nkLitCall(nkIdent("bar"), nkStrLit("-0x12e4567"))

the string literal is all characters preceding ' that are in some set (eg: numbers + - + letters; precise set TBD); eg

[-12'i8] => [i8("-12")]
a=-123e-12'f64 => a=f64("-123e-12")

the parsing of the string literal (eg -123e-12) is delayed until it's needed (eg for cgen, or vm)
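
To illustrate what the receiving side of such a literal-call could look like, here is a rough sketch written as ordinary Nim. `Bar` and the body of `bar` are assumptions made up for the example; the only thing taken from the proposal is that `bar` receives the literal's source text (sign and `0x` prefix included) and does the parsing itself:

```nim
import std/strutils

type Bar = distinct int  # hypothetical user type behind the `'bar` suffix

proc bar(lit: string): Bar =
  ## Receives the literal's source text; parsing happens here, not in the parser.
  let neg = lit.startsWith("-")
  let digits = if neg: lit[1 .. ^1] else: lit
  # parseHexInt accepts an optional "0x" prefix
  let v = if digits.startsWith("0x"): parseHexInt(digits) else: parseInt(digits)
  Bar(if neg: -v else: v)

# What the parser would produce for `-0x12e4567'bar` under this proposal:
let a = bar("-0x12e4567")
echo int(a)
```

If `bar` is never called (eg the code sits in a disabled `when` branch), the string is never parsed at all.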

benefits

remove the edge case where T.low can't be used as a literal for signed types T, eg -128'i8 (see timotheecour/Nim#125, "-128'i8 gives: Error: number out of range: '128'i8'"), because it'd be parsed as i8("-128")
repr (and runnableExamples rendering etc) would preserve original source code formatting (refs Nim#8871, "runnableExamples should preserve source code doc comments, strings, and (maybe) formatting"), in particular binary/octal/1_000_000; it'd also make things easier for nimpretty and nim doc
parsing becomes lazier, leading to potentially faster compile times in case some large chunk of code is statically disabled, eg via when defined cpp: let a = 12.3'f32 => 12.3 won't need to be parsed into a float
no more redundancy between the type (eg tyInt32) and the literal (eg nkInt32Lit) since we now just have the type + a single kind nkLit; AST is simplified, user macros and compiler code have fewer TNodeKind kinds to deal with
parser backward compatibility when new literals are introduced
suppose we implement this new literal handling in 1.3.7; then any nim version after that will be backward compatible using since, eg:

since 1.3.9: # time when 80 bit float literals are introduced
  let a = 1.2'f80 # this would break nim < 1.3.7 (parser error) but not nim 1.3.7

the "generalized" literal handling could also be backported to older nim (eg 1.2.2) using a simple hack: turn unrecognized literals (eg 1.2'f80) into an error PNode, but not a parser error, so that since 1.3.7: would work and not give parser error

enables user defined quote literals
everything becomes user defined, so dec128 (https://forum.nim-lang.org/t/6310#38884) can be written via:

let a = -12.3E+4022'dec128 # calls dec128("-12.3E+4022"), returning a `Decimal128`

the builtin literals are not special builtins anymore, and require symbols defined in system.nim, eg:

# system.nim
proc i8*(a: string): int8 # but we can hardcode these as `builtinI8` instead of `i8` if needed
proc f32*(a: string): float32 # these don't even have to be magic, but can be
# etc

with -0x12e4567'bar, if bar isn't defined in scope, it gives a regular CT error (bar not defined)
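
As a sketch of why the T.low edge case goes away once the sign is part of the string, here is what a non-magic `i8` could do. It is named `i8lit` here and is purely hypothetical (not the actual system.nim code); the error message only mimics the existing one:

```nim
import std/strutils

proc i8lit(a: string): int8 =
  ## Hypothetical stand-in for the proposed `system.i8`.
  let v = parseInt(a)  # the sign is part of the string, so "-128" parses as -128
  if v < int(low(int8)) or v > int(high(int8)):
    raise newException(ValueError, "number out of range: '" & a & "'i8'")
  int8(v)

echo i8lit("-128")
```

The range check sees -128 as a whole, instead of first rejecting 128 and then negating it.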

examples

all these types can be implemented as library solution and preserve nice native looking syntax, and also, would render as numerical types, not strings (pending updating syntax highlighters, including github linguist, as evidenced by ugly highlighting in this post)

80 bit float

let a = -1.2'f80

bigint

let a = -123456789'bigint # instead of bigint"-123456789"

decimal128

let a = -12.3E+4022'dec128

rational numbers

let a = 12/3'rational # or some other syntax if `/` is not in valid set of literals

complex numbers

let a = 1.2+3.2i'c # or some other syntax

symbolic math

let expr = diff(x^2's + y^2's, x's) # symbolic differentiation wrt symbolic variable x; half baked idea here

note

since it's user defined, module-scoped aliases are possible, eg if a module deals a lot with rationals it can write:

template r(a: string): untyped = rational(a)
let a = 12/3'r * -4/5'r
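
For what it's worth, the alias already works today if the calls are spelled out by hand. In this sketch `rational` is a hypothetical library proc (built on std/rationals) that accepts "num/den" or "num"; the parser transformation is assumed, so the literal-calls are written explicitly:

```nim
import std/[rationals, strutils]

proc rational(a: string): Rational[int] =
  ## Hypothetical library proc behind a `'rational` suffix.
  let parts = a.split('/')
  let den = if parts.len == 2: parseInt(parts[1]) else: 1
  initRational(parseInt(parts[0]), den)  # initRational reduces the fraction

template r(a: string): untyped = rational(a)

# `12/3'r * -4/5'r` would become, after the parser transformation:
let a = r("12/3") * r("-4/5")
echo a
```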

I originally suggested an initial concept of this in RFC: represent all litterals as string compilerdev#7 but then realized it could be generalized to support arbitrary user defined literals and simplify the AST thanks to parser transformation, so that builtin literals (eg 123'i8) are no longer builtin and naturally extend to other user defined literals
literals without quote (quote as in 1'i8) can be handled uniformly as literals with quote by a fake literal-call, eg:

const a = 1.2e12 => openLitteral("1.2e12")
const b = 1234 => openLitteral("1234")

openLitteral preserves the same semantics as const b = 1234, in that the type is not bound but kept open so that this remains valid:

const b = 1234
let b2: seq[float32] = @[b, 1.32]

VM

likewise for VM: TFullReg could be simplified as:

  TFullReg* = object
    case kind*: TRegisterKind
    of rkNone: nil
    # of rkInt: intVal*: BiggestInt
    # of rkFloat: floatVal*: BiggestFloat
    of rkLit:
      value*: uint64 #
      typ*: PType
...

with following benefits:

avoid need to represent all the integer types (including int8 etc) as int128 (wasteful)
make RT semantics match CT semantics, eg for float32 (see VM: float32 values incorrect, resulting in differences between RT and CT Nim#12884)
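
A small sketch of the float32 point, under the assumption that a register keeps the value's own 32-bit pattern in the `uint64` slot instead of widening it to BiggestFloat. `storeF32` and `loadF32` are hypothetical helper names, not VM code:

```nim
proc storeF32(x: float32): uint64 =
  # keep the float32 bit pattern as-is in the low 32 bits
  uint64(cast[uint32](x))

proc loadF32(bits: uint64): float32 =
  # reinterpret the low 32 bits as a float32 again
  cast[float32](uint32(bits))

let x = 0.1'f32
let reg = storeF32(x)
echo loadF32(reg) == x
```

Since the value never passes through a wider float type, what comes back is bit-for-bit the float32 that went in.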


Araq · 2020-05-19T06:31:05Z

Pretty nice but please work out how backwards compat can work. Having to patch existing macros is not acceptable. Been there, done that, it's expensive and a disaster for the ecosystem.

JohnAD · 2021-01-24T18:27:34Z

It is my intent to have a PR for this placed before the end of February 2021. Perhaps sooner as I already have it working; I just need to write more test cases to confirm it handles more border conditions and errors gracefully.

However my solution takes a somewhat different tack: rather than combine all numeric literals into a single token for later parsing, my solution leaves the current literals as-is and adds a new one: tkStrNumPrefixLit. (I'm not convinced that is a good name. I may shorten it to tkStrNumLit.) The tokenizer only assigns this token when it peeks and sees an identifier adjacent to the number. The tokenizer scans the identifier on the next pass.

In the parser, when the new token is seen it attempts to run the dotExpr function to resolve it. Essentially turning:

var a = 1234.56E7m     # or 1234.56E7'm

into

var a = "1234.56E7".m
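
The rewritten form is already valid Nim through method-call syntax, so it can be tried without any compiler change. In this sketch `Money` is only a placeholder (a distinct float, not a real decimal type) and `m` is a hypothetical suffix proc:

```nim
import std/strutils

type Money = distinct float  # placeholder only; a real decimal type would go here

proc m(s: string): Money =
  ## Hypothetical suffix proc: receives the number's source text.
  Money(parseFloat(s))

var a = "1234.56E7".m
echo float(a)
```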

My motive is to let the IEEE 754 decimal library (#308) I'm writing use this as well. Specifically, "m" will be the suffix (a convention used in C# and a few other languages). One of my goals is to have this fully convert/resolve at compile time.

The f32, f64, u8, etc suffixes will still be built-in to the lexer and will take priority over any proc/func/template.

Because this adds a token, other files such as docgen.nim etc will also need updating, but those updates will be minimal. These changes do not modify the AST structure in any way, so macros should behave the same.

I will also make updates to documentation.

Admittedly, my solution is not as "pure" and comprehensive as yours. But it could be an intermediate place to make it more pure with another later PR.

Araq · 2021-03-24T12:29:27Z

nim-lang/Nim#17489 will soon be merged. This RFC has been implemented.

timotheecour mentioned this issue May 19, 2020

RFC: represent all litterals as string nim-lang/compilerdev#7

Open

This was referenced Jun 1, 2020

fix #14522, litterals > int64.high are now uint64 nim-lang/Nim#14530

Closed

User defined integer literals #216

Closed

timotheecour mentioned this issue Jun 17, 2020

Add bigints to standard library nim-lang/Nim#14696

Closed

timotheecour mentioned this issue Nov 19, 2020

repr produces invalid code nim-lang/Nim#14598

Open

Araq mentioned this issue Jan 24, 2021

Add decimal to standard library #308

Open

JohnAD mentioned this issue Feb 12, 2021

[superseded] Adding support for user-defined number suffixes nim-lang/Nim#17020

Closed

Araq closed this as completed Mar 24, 2021

timotheecour added the implemented label Mar 24, 2021
