universal conversion operator (cf D's a.to!T) (std.conv.to) #84

Closed
timotheecour opened this issue Mar 28, 2018 · 41 comments

@timotheecour
Member

In D, std.conv.to allows universal type conversions; it's used very frequently and simplifies code a lot.

Is there anything similar in Nim? If not, could we add it?

import std.conv:to;
auto a1="123".to!int;
auto a2="123".to!double;
auto a3=123.to!string;

// custom conversion:
struct A{
  auto opCast(T)() if(is(T==float) || is(T==double)) { return 1.0; }
}
auto a5=A.init.to!double;

Instead, in Nim I see parseInt, parseBool, etc., which is:

  • not as scalable (introduces tons of symbols)
  • not as searchable
  • not as elegant as D's to.
  • not generic (doesn't work in generic code)

proposed design:

see nim-lang/Nim#7488

related

@skilchen

The good thing about Nim is: you can implement such a universal conversion library yourself and make it available for others to try it out. Asking others to write something like that for you is probably a waste of time.

@timotheecour
Member Author

I mainly submitted this bug report for these reasons:

  • in case I missed an alternative way of doing it? looks like no, doesn't yet exist in nim

  • in case someone else is already working on this, they can link against this issue

  • suggesting this as an alternative design to current way of doing things (parseInt, parseBool etc) based on experience in another language

  • discuss design

  • would a well designed library be considered for integration in the Nim repo? or would that be best done in a separate package

If no one has given it a try, I guess I could give it a try

@skilchen

some interesting stuff from the Nim Manual:

@andreaferretti

andreaferretti commented Mar 28, 2018

This is analogous to your example; there is no need to support this specially in the language

type A = object

proc to(a: A, x: typedesc[float32]): float32 = 1.0
proc to(a: A, x: typedesc[float64]): float64 = 1.0

let a = A()
echo a.to(float32)

@Araq
Member

Araq commented Mar 28, 2018

Sounds useful:

import strutils, json

template to(source: string; dest: typedesc[bool]): untyped = parseBool(source)
template to(source: string; dest: typedesc[int]): untyped = parseInt(source)
template to(source: string; dest: typedesc[float]): untyped = parseFloat(source)
template to(source: string; dest: typedesc[JsonNode]): untyped = parseJson(source)

echo "yes".to(bool)
echo "0034".to(int)

@timotheecour
Member Author

@Araq
why "yes".to(bool) instead of "yes".to[bool]?

https://nim-lang.org/docs/marshal.html uses to[T] for example.
maybe one advantage is that a.to[T] forces T to be a type, whereas a.to(T) would not, and for this use case types are all that we need

@andreaferretti

@timotheecour One reason is that to[T] uses generic polymorphism, but in this case we want ad-hoc polymorphism (overloading). This allows us to write many different versions of to, one for each type we want to allow, with possibly different implementations.

Moreover, the signature of to (typedesc[...]) also guarantees that we can only pass a type
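
The overloading approach described above can be sketched as follows. This is a minimal illustration, not a proposed API: the `to` overloads and the `convertAll` helper are hypothetical names, and only `parseInt`, `parseFloat` and `parseBool` come from strutils. It also shows that typedesc overloads remain usable from generic code.

```nim
import std/strutils

# Hypothetical overloads: one `to` per target type (ad-hoc polymorphism).
# The typedesc parameter guarantees that only a type can be passed.
proc to(s: string, dest: typedesc[int]): int = parseInt(s)
proc to(s: string, dest: typedesc[float]): float = parseFloat(s)
proc to(s: string, dest: typedesc[bool]): bool = parseBool(s)

# Hypothetical generic helper: the overload is picked when T is instantiated.
proc convertAll[T](items: openArray[string], dest: typedesc[T]): seq[T] =
  for item in items:
    result.add to(item, T)

doAssert convertAll(["1", "2", "3"], int) == @[1, 2, 3]
doAssert convertAll(["yes", "no"], bool) == @[true, false]
```

Each overload can have a completely different body, which is the point made about ad-hoc polymorphism.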

@GULPF
Member

GULPF commented Mar 30, 2018

Additionally, "yes".to[bool] doesn't actually work because of #3502.

@PMunch

PMunch commented Apr 2, 2018

Relating to the linked issue the streams module could do with the same treatment for its read procedures
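
As a rough sketch of what "the same treatment" for streams might look like: `readAs` below is a hypothetical name, layered over the existing `readInt32`, `readFloat64` and `readChar` procs of the streams module. It is an assumption about the shape of such an API, not something the module provides.

```nim
import std/streams

# Hypothetical typedesc-based front end for the existing read procs.
proc readAs(s: Stream, dest: typedesc[int32]): int32 = s.readInt32()
proc readAs(s: Stream, dest: typedesc[float64]): float64 = s.readFloat64()
proc readAs(s: Stream, dest: typedesc[char]): char = s.readChar()

let strm = newStringStream()
strm.write(int32(7))   # binary write of a 4-byte integer
strm.write(2.5)        # binary write of a float64
strm.setPosition(0)

doAssert strm.readAs(int32) == 7
doAssert strm.readAs(float64) == 2.5
strm.close()
```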

@PMunch

PMunch commented Apr 2, 2018

I really like this idea by the way. Converting to string is easy enough in Nim with $, having an equally easy "common converter" syntax for other types would be neat

@timotheecour
Member Author

should this be in a nimble package or in the standard library?

  • Advantage of nimble package: (eg package conv.nimble)
    easier to evolve as less tied to nim distribution; eg can have breaking changes while it's in alpha

  • Advantage of the standard library (eg, in conv.nim)
    allows standard library itself to depend on it; eg, could make parseBool an alias of to(bool) and mark parseBool as deprecated
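
The deprecation path mentioned in the second bullet could look roughly like this. It is only a sketch: `parseBoolCompat` is a hypothetical stand-in for an old-style name, and the `to` overload is not an existing standard library symbol.

```nim
import std/strutils

# Hypothetical new entry point, forwarding to the existing strutils proc.
proc to(s: string, dest: typedesc[bool]): bool = strutils.parseBool(s)

# Hypothetical old-style name kept as a thin alias; callers get a
# compile-time deprecation warning pointing them to `to(bool)`.
proc parseBoolCompat(s: string): bool {.deprecated: "use s.to(bool) instead".} =
  to(s, bool)

doAssert "true".to(bool)
doAssert not "off".to(bool)
```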

@ghost

ghost commented Apr 2, 2018

Also if someone needs a little bit of sugar:

import strutils, json

template genTo(srcTyp, destTyp: typedesc, body: untyped): untyped {.dirty.} = 
  template to(source: srcTyp; dest: typedesc[destTyp]): untyped = 
    body

genTo(string, bool): parseBool(source)
genTo(string, int): parseInt(source)
genTo(string, float): parseFloat(source)
genTo(string, JsonNode): parseJson(source)
genTo(int, float): float(source)

echo "yes".to(bool)
echo "0034".to(int)
echo 5.to(float)

@data-man

data-man commented Apr 2, 2018

genTo(string, XmlNode): parseXml(source)
genTo(string, DateTime): parse(source, "yyyy-MM-dd HH:mm:sszzz")

But for DateTime not very good.

@ghost

ghost commented Apr 2, 2018

@data-man why is it not good for DateTime? :)

@data-man

data-man commented Apr 2, 2018

@Yardanico
Errors for strings without a time & timezone.

@ghost

ghost commented Apr 2, 2018

@data-man that's a problem with a format string you've used to parse though, change it (remove zzz and it will work without timezone)

@data-man

data-man commented Apr 2, 2018

@Yardanico
And if the time is also missing? :)
Maybe using scanf will be better:

if scanf(source, "$i-$i-$i", y, m, d):
...
elif scanf(source, "$i-$i-$i$s$i:$i:$i", y, m, d, hh, mm, ss):
...
elif scanf(source, "$i-$i-$i$s$i:$i:$i.$i$s$w", y, m, d, hh, mm, ss, ms, tz):
...
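
An alternative to scanf for the same problem is to try several format strings in order, most specific first. The sketch below assumes that `times.parse` raises a `ValueError` subtype on a mismatch; `toDateTime` and the list of formats are hypothetical, and dates without a timezone are interpreted as UTC here purely for illustration.

```nim
import std/times

# Hypothetical helper: try each format in turn until one matches.
proc toDateTime(source: string): DateTime =
  const formats = [
    "yyyy-MM-dd HH:mm:sszzz",  # date, time and UTC offset
    "yyyy-MM-dd HH:mm:ss",     # date and time
    "yyyy-MM-dd"               # date only
  ]
  for f in formats:
    try:
      return parse(source, f, utc())
    except ValueError:
      discard  # this format did not match, try the next one
  raise newException(ValueError, "unsupported date format: " & source)

doAssert toDateTime("2018-04-02").year == 2018
doAssert toDateTime("2018-04-02 10:20:30").hour == 10
```

The cost of this approach is one failed parse per non-matching format, which is usually acceptable for a convenience conversion.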

timotheecour referenced this issue in timotheecour/Nim Apr 3, 2018
@timotheecour
Member Author

timotheecour commented Apr 3, 2018

Please look at my 1st attempt: nim-lang/Nim#7488 and design principles I outlined there

@timotheecour
Member Author

@andreaferretti

One reason is that to[T] uses generic polymorphism, but in this case we want ad-hoc polymorphism (overloading). This allows us to write many different versions of to, one for each type we want to allow, with possibly different implementations

to[T] can also be overloaded, eg with to[T:int], to[T:int|string] or even to[T:isConceptFoo],

I've created an issue https://github.com/nim-lang/Nim/issues/7517 ([RFC] guidelines for when to use typedesc vs generics) to discuss this particular aspect more generally as it seems to keep popping up in different contexts

@andreaferretti

Well, yes, there is an overlap, but I would never use [T:int] as a type argument, it seems an abuse to use type bounds for concrete types

@yglukhov
Member

yglukhov commented Apr 6, 2018

@andreaferretti, why not? That's just how partial specialization works in C++.

@timotheecour
Member Author

timotheecour commented Apr 6, 2018

/cc @yglukhov @andreaferretti

Well, yes, there is an overlap, but I would never use [T:int] as a type argument, it seems an abuse to use type bounds for concrete types

@andreaferretti, why not? That's just how partial specialization works in C++.

It's only weird if you're used to Java or other languages with limited metaprogramming support

@andreaferretti

Actually, the languages I am used most are Scala and Nim and the latter has nice support for this case in the form of typedesc parameters :-)

@andreaferretti

I mean, I know it can be used this way, it just looks ugly compared to just passing the type as a parameter

@Bulat-Ziganshin

Can we start with a description of what is supposed to be implemented under this generic name? If it's supposed to only parse strings, then "123".parse(bool) indeed looks clearer. If it's supposed to handle other input types, the question is why we need a generic name for it. And maybe a longer name such as convertTo would be enough for such a rarely useful feature?

If we don't yet have a justification for a generic conversion function, I propose to put this idea on hold.

And anyway, using parse instead of to or convertTo looks more readable to me (i.e. the intention is clearer), and using parse(bool) instead of parseBool looks more systematic, so I support migrating to the parse name.

As @Araq mentioned, with parse we are also clearer about the exception list: it can raise a parsing exception and nothing else.

@Bulat-Ziganshin

Bulat-Ziganshin commented Jul 18, 2018

On second thought, I think I found the culprit: convertTo should be used only for the cases when we have the same data under different clothes. Examples include array/seq/openArray/list/tree/set, or list[(K,V)]==hashtable[K,V]==tree[K,V], or int/float/complex/gmp_int.

It's debatable whether string representation is the same data as container or so, especially since we can (de)serialize using XML, or pretty-printing, or with raw binary data. So I think that we should separate those two questions - strings should be processed with parse, while types holding the same data should be converted with convertTo or so.
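
To make the "same data under different clothes" idea concrete, here is a small sketch. `convertTo` is a hypothetical name taken from the comment above; `toHashSet` and `toTable` are existing procs from the sets and tables modules. The values are unchanged, only the container differs.

```nim
import std/[sets, tables]

# Hypothetical container-to-container conversions; no parsing involved.
proc convertTo[T](xs: openArray[T], dest: typedesc[HashSet[T]]): HashSet[T] =
  toHashSet(xs)

proc convertTo[K, V](pairs: openArray[(K, V)],
                     dest: typedesc[Table[K, V]]): Table[K, V] =
  toTable(pairs)

let unique = convertTo([1, 2, 2, 3], HashSet[int])
doAssert unique.len == 3

let lookup = convertTo({"a": 1, "b": 2}, Table[string, int])
doAssert lookup["b"] == 2
```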

@andreaferretti

My vote is for parse(bool), parse(int), and so on

@kaushalmodi

kaushalmodi commented Sep 10, 2018

I don't like parse.. does it mean parseTo or parseFrom? I wouldn't mind 2 extra chars for clarity.

I like the originally proposed to.


Update: I'd like to not allocate a special "parse" just for "from string" case. I don't think that is too special of a case. String is just another type.

A "to" works for all type representation conversions.

@andreaferretti

Actually it is not ambiguous. You parse a string to get an int, you render or print an int to get a string (similarly with booleans and so on). The operation of converting an int, a bool etc. into a string is just not called parsing

@kaushalmodi

OK, maybe I got confused.. I thought this issue was about converting any type to any other type, as I show in that table linked on my scripter.co page referenced in the first comment of this thread.

If parse implies only converting "from string", then that's just a subset of the whole proposal.

@kaushalmodi

The operation of converting an int, a bool etc. into a string is just not called parsing

Are we talking about the same thing?

That's why I like the original to :)

@Bulat-Ziganshin

Bulat-Ziganshin commented Sep 10, 2018

@kaushalmodi look at https://github.com/nim-lang/Nim/issues/7430#issuecomment-406054435 :

I think that we should separate those two questions - strings should be processed with parse, while types holding the same data should be converted with convertTo or so.

@kaushalmodi

@Bulat-Ziganshin Thanks, I missed that comment. That explains my confusion. I'm not so sure about distinction between "parse" and "convert". I don't see "from string" as a special case.. it's just another type.

@Bulat-Ziganshin

Bulat-Ziganshin commented Sep 10, 2018

@kaushalmodi make sure that you have read https://github.com/nim-lang/Nim/issues/7430#issuecomment-406049952 and entire https://github.com/nim-lang/Nim/issues/7430#issuecomment-406054435

Let's elaborate: you can convert one type to another type in a number of ways. For example, you can convert from string by parsing it as XML, JSON, Nim constant, asm constant and so on. So my point is that parse should be the reverse of $ and nothing else. Since $ always returns a string, parse can parse only a string (or its equivalent such as seq[char]). So, the parameter type in parse(int) always specifies the result type. Practically speaking, there is no need to specify the source type because we know it from the parameter.

While convertTo should be a sort of operation that reformats a container while keeping the same values. So, convertTo may convert a string into an array/sequence of chars, but it can't convert the string into an integer.

This makes parsing and converting different operations. While each parse and convertTo specialization is implemented in ad-hoc manner, usually by the package providing corresponding types, the language spec should provide these strict recommendations about functionality of the operations, and reject any contributions to standard libraries violating them.
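
The "parse is the reverse of $" rule can be stated as a round-trip property. The sketch below uses hypothetical `parse` overloads built on strutils; it only illustrates the contract and is not an existing standard library API.

```nim
import std/strutils

# Hypothetical overloads: the typedesc argument names the result type,
# the source is always a string.
proc parse(s: string, dest: typedesc[int]): int = parseInt(s)
proc parse(s: string, dest: typedesc[bool]): bool = parseBool(s)

# Round trip: parsing the output of `$` gives back the original value.
for n in [-5, 0, 42]:
  doAssert parse($n, int) == n
for b in [true, false]:
  doAssert parse($b, bool) == b
```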

@kcvinker

The good thing about Nim is: you can implement such a universal conversion library yourself and make it available for others to try it out. Asking others to write something like that for you is probably a waste of time.

Luckily the OP didn't get a chance to hear this from this forum -- "Looking in internet to find a help in coding is waste. Try to create your own programming language from scratch."

@Araq
Member

Araq commented Sep 14, 2018

Luckily the OP didn't get a chance to hear this from this forum -- "Looking in internet to find a help in coding is waste. Try to create your own programming language from scratch."

Creating a Nimble package is not the same as "create your own programming language".

@metagn
Contributor

metagn commented Apr 28, 2020

Reiterating my comment on a PR:

In line with var T becoming more important recently thanks to sugar.dup finally being implemented (though this is not compatible with dup), I think the overload should be of (var T, S) where S is the old type and T is the new type. Like so:

import strutils

proc to(x: var float, str: string) = x = parseFloat(str)

var f: float
to(f, "1.0")

This is nice because we can have a generic as operator that calls this similar to in.

template `as`[T](toConvert: untyped, _: typedesc[T]): T =
  var converted: T
  to(converted, toConvert)
  converted

# with previous code block:
doAssert ("1.0" as float) == 1.0

We can also do things like get information from the var we're converting to, though I don't know if this is a good idea (maybe good for case objects):

type Obj = object
  changeSign: bool
  num: int

proc to(obj: var Obj, n: int) =
  if obj.changeSign:
    obj.num = -n
  else:
    obj.num = n

var obj = Obj(changeSign: true)
to(obj, 5)
echo obj.num # -5

Sidenote: I don't think a name like to is good for a non-operator proc, it's a tad too short, I thought of something like coerce or convert but to still sounds better. It can be an operator, but I would suggest if it is going to be an operator it should just be a popular library and not in the standard library itself, something like ::= or %= like json.

@timotheecour
Member Author

timotheecour commented Apr 28, 2020

Here's my proposal:

  • there's a unique to template (outplace):
template to*[S](a: S, T: typedesc): T =
  var ret: T
  mixin from
  from(ret, a)
  ret
  • the real implementation is deferred to from (inplace) which can be overloaded:
proc from[T, S](result: var T, a: S) =
  ...

code looks/reads naturally whether you use outplace (to) or inplace (from), eg:

# outplace form
echo s.bar.to(JsonNode)
echo s.bar.to BsonNode # obviously, also works without ()

# inplace form; eg when you want to reuse object to avoid allocations
var ret: Foo
foo.from(JsonNode)
  • the main drawback of using a (universal) generic conversion function is that, in many languages, it can be hard to find which generic is selected in some code.

This is one of the main design goals of nim-lang/Nim#12076, which allows you to retrieve the symbol (if any) that would be selected after sigmatch (or nil if no match):

foo.from(bar) # which `from` was picked? where is it declared?
inspect foo.from(bar) # this gives you the symbol selected after sigmatch (or prints where it's declared)

note that nimsuggest can't help here, since it can't know what T was instantiated with.

special case: converting to string and parsing from string

these are so common they deserve a special case (but to/from could call these)

note

  • your suggestion to(obj, 5) doesn't read well, in particular in UFCS/MCS form: obj.to(5) doesn't quite make sense since arguments should be reversed; that's why I'm naming the inplace form as from: obj.from(bar)

a as T doesn't carry its weight for keyword use

  • I don't think a as T carries its weight for the keyword as. a.to T is just as simple without using a reserved keyword and in many cases more convenient in expressions where you need parentheses to disambiguate, eg:
a or fun(1) as bool # no idea what grouping is used
a or (fun(1) as bool) # clear but increases nesting
a or fun(1).to(bool) # clear and doesn't increase nesting

There could be other uses for as that wouldn't be possible to do without a keyword, and I have some in mind I can describe.

operator vs to

It can be an operator

IMO operators are overused and should usually be reserved for cases where operator precedence makes a difference, eg by removing 1 level of parenthesis in the expression; it shouldn't be used for cases where a regular proc could be used in its place.

eg: in nim-lang/Nim#13023 (wrapnil operator) the initial idea s.maybe.foo[12].bar.name[] was improved by araq's suggestion of using an operator; this allows simplifying to: ?.s.foo[12].bar.name

@metagn
Contributor

metagn commented Apr 28, 2020

As github syntax highlighting shows first thing in your post from is a keyword too. I can't think of a word to describe this operation (the one I wrote as to and you as from). I don't want an operator either. become? emplace?

Other languages don't have operators like as at such a low precedence like Nim does. In Nim it's equivalent to in and ==, in Rust and Java they're immediately below unary operators (in Nim, equivalent to 9.5). Edit: Sorry, as is not cast, Groovy's as is overloadable and at the same precedence with Nim (is/of/in/comparison). Though I still think it could benefit from a higher precedence. Not that hard to change, just 1 line to another, I won't start an RFC just make a small comment here: Same as .. (level 6) makes sense, below math & concat and above bool ops/comparison. Also a or fun(1) as bool correctly becomes a or (fun(1) as bool) and I don't see how that's not clear, all other languages do this (including Groovy).

If all else fails, this should be in a package, so people can choose whether they want an operator or a keyword. Other packages can provide support with a simple export.

Update: The best name I can come up with for the proc is "absorb", or something along those lines. What's most preferred is a name that also works for #191, which I was taking into consideration before.

There could be other uses for as that wouldn't be possible to do without a keyword, and I have some in mind I can describe.

I trust you but I can't think of these, if your ideas of as are incompatible with a (untyped, typedesc) overload then please tell me.

@timotheecour
Member Author

timotheecour commented May 7, 2020

@hlaaftana

As github syntax highlighting shows first thing in your post from is a keyword too

when true:# D20200506T234004
  import strutils
  proc `from`[T, S](result: var T, a: S) =
    when S is string:
      when T is int: result = parseInt a
      elif T is float: result = parseFloat a
      else: doAssert false
    else: doAssert false

  template to*[S](a: S, T: typedesc): untyped =
    var ret: T
    # mixin `from`
    ret.from a
    ret

  template `as`*[S](a: S, T: typedesc): untyped =
    var ret: T
    ret.from a
    ret

  proc main()=
    var ret: int
    ret.from "12" # that's the from syntax that will be most often used
    `from`(ret, "12") # works too, useful in templates to "bind early"
    # ret from "12" # works pending https://github.com/nim-lang/Nim/pull/14241
    doAssert ret == 12
    doAssert "13".to(int) == 13
    doAssert "13.2".to(float) == 13.2
    let b = "13".to int
    doAssert b == 13
    doAssert "13.3".as(float) == 13.3
    ## only difference between to and as:
    doAssert "13.3" as float == 13.3 # possible
    # doAssert "13.4" to float == 13.4 # not possible
  main()

Also a or fun(1) as bool correctly becomes a or (fun(1) as bool) and I don't see how that's not clear, all other languages do this (including Groovy).

well you have to "know" the precedence table and that it's not interpreted as (a or fun(1)) as bool whereas a or fun(1).as bool would be more obvious, but I guess that just boils down to RTFM, so fine I guess.

I trust you but I can't think of these, if your ideas of as are incompatible with a (untyped, typedesc) overload then please tell me.

one of them was #163 but I'm now thinking this could be done with a pragma instead; other uses I had in mind could also be done with a pragma. What I'd really like is for alias to become a keyword, for this: alias foo = system.echo (better syntax for nim-lang/Nim#11992) but that's a separate topic.

If all else fails, this should be in a package

then let's make sure it doesn't fail :). #191 aside, $ is useful because it's the standard stringification operator, making it useful in generic code. Anything that's package specific would defeat its purpose.

one last thing: to reset or not to reset

that wasn't really discussed yet; IMO the thing that's overloaded should not reset:

# everything is defined in terms of `addFrom`
proc addFrom*[T, S](result: var T, a: S) = .. # this is what gets overloaded
  # no resetting of result

template `from`*[T, S](result: var T, a: S) = # single `from` template, doesn't get overloaded
  result.reset # resets
  result.addFrom(a)

template `as`*[S](a: S, T: typedesc): untyped = # single `as` template, doesn't get overloaded
  var ret: T
  ret.addFrom a
  ret

from and as are then just syntax sugar over addFrom, which is the only thing that needs to be overloaded.
User code can choose between addFrom and from depending on whether it needs a reset or not. There are many use cases for not resetting, eg merging a json/bson object with some new data, or appending to a string or stream etc.

Long story short

  • a.addFrom b for the thing that's overloaded (reuses buffer; returns void)
  • a.from b resets a and calls addFrom (reuses buffer; returns void)
  • a.as T (which can also be written a as T): works with rvalues; returns a value; can't reuse a buffer

This gives a simple design where there's only 1 proc to overload, and you get from and as variants "for free"
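
The three-layer design summarised above can be tried out with ordinary identifiers. In this sketch `resetFrom` and `convertAs` are hypothetical stand-ins for the proposed `from` and `as` (chosen to avoid keyword quoting), and the two `addFrom` overloads are illustrative only.

```nim
# The only thing that gets overloaded: appends, never resets.
proc addFrom(dest: var string, src: int) =
  dest.add $src

proc addFrom(dest: var seq[int], src: openArray[int]) =
  for x in src:
    dest.add x

# Stand-in for `from`: reset the destination, then delegate to addFrom.
template resetFrom(dest, src: untyped) =
  reset(dest)
  addFrom(dest, src)

# Stand-in for `as`: build a fresh value of type T and return it.
template convertAs(src: untyped, T: typedesc): untyped =
  var ret: T
  addFrom(ret, src)
  ret

var buf = "n="
addFrom(buf, 1)          # appends, keeps existing content
doAssert buf == "n=1"
resetFrom(buf, 7)        # discards existing content first
doAssert buf == "7"
doAssert convertAs(5, string) == "5"
doAssert convertAs([1, 2], seq[int]) == @[1, 2]
```

Adding support for a new pair of types then means writing one more `addFrom` overload; the other two forms pick it up automatically.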

@github-actions

This RFC is stale because it has been open for 1095 days with no activity. Contribute a fix or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jun 19, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 19, 2023