Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow defining custom literal syntax for creating collections #625

Open
3 of 5 tasks
rmunn opened this issue Nov 24, 2017 · 30 comments
Open
3 of 5 tasks

Allow defining custom literal syntax for creating collections #625

rmunn opened this issue Nov 24, 2017 · 30 comments

Comments

@rmunn
Copy link

rmunn commented Nov 24, 2017

I propose we allow custom literal syntax to be defined for creating user-defined collections. E.g., the syntax [ ... ], which currently is reserved for creating lists, could be declared to build an array, or a mathematical vector, or a PersistentVector from FSharpx.Collections, as long as the appropriate type (and how to construct the literal) can be determined at compile-time.

The existing way of approaching this problem in F# is to use [| ... |] syntax for arrays, and to use something like Vector.ofList [ 1; 2; 3 ] for other types, which tends to be uglier and/or harder to type than plain [ 1; 2; 3 ].

Rationale

In #619 (comment), @piaste made an interesting observation:

I think one major reason people currently use Lists by default, even when they don't need a linked list at all and it just hurts performance, is that it has the cleanest literal syntax available in F#.

Notably, both Fable-React and Websharper go with lists. In their HTML DSLs, every element is followed by two collections (attributes and inner HTML) and even using a basic array would start to look 'noisy' really fast (div [||] [| foo |] versus div [] [ foo ]) ...

This triggered a discussion which let to @dsyme suggesting the following as a possible approach:

Have the [ ... ] syntax resolved in a type-directed way (e.g. "if the target type is known to be a collection C with a builder pattern or IEnumerable constructor, then treat the syntax as if it is constructing a C, otherwise treat it as a regular F# list")) ...

Many people seemed to like the idea, so I've created this issue to track that particular suggestion.

Pros and Cons

The advantages of making this adjustment to F# are that many user-defined data types will appear a lot "cleaner" to use. For example, if someone adds a Relaxed Radix-Balanced vector type to FSharpx.Collections, creating one would be as simple as:

let vec : RRBVector<_> = [ 1; 2; 3 ]

Then if the primary constructor for RRBVector takes an array, then the [ 1; 2; 3 ] syntax would be turned into an array at compile-time and then passed to the RRBVector constructor at runtime. (In many cases, the type of vec would be determinable by type inference and it wouldn't be necessary to specify its type explicitly like this).

The disadvantages of making this adjustment to F# are twofold: first, there would be a significant amount of effort involved. Second, the [ ... ] syntax would become more "generic" and more cases like value restriction could be created. E.g., let coll = [] would now not only need to determine the type of the contents of coll, but also the type of the collection itself, before that code could compile.

This would be a breaking change to F#.

Extra information

Estimated cost (XS, S, M, L, XL, XXL): L

Related suggestions: #49 is a subset of this idea, and #619 is the discussion that led to this idea.

Affidavit (please submit!)

Please tick this by placing a cross in the box:

  • This is not a question (e.g. like one you might ask on stackoverflow) and I have searched stackoverflow for discussions of this issue
  • I have searched both open and closed suggestions on this site and believe this is not a duplicate
  • This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it.

Please tick all that apply:

  • This is not a breaking change to the F# language design
  • I or my company would be willing to help implement and/or test this
@dsyme
Copy link
Collaborator

dsyme commented Nov 24, 2017

A few comments:

  • There are disadvantages to making the interpretation of literal syntax more generic - the programmer can easily get more confused about inferred types, and error messages can be harder to interpret. If you did a 👍 or ❤️ then have a think about this - it's easy to upvote and then experience bad error messages later.

  • I believe it would not be a breaking change if the default type is an F# list. But error messages would become worse

  • The syntax a :: b should either be deprecated or made collection-generic as part of this

  • The library function a @ b should either be deprecated or made collection-generic as part of this

  • It would make sense to assess other non-generic literal syntaxes as part of this, e.g. ref, ``!`

  • The "builder" patterns used in System.Collections.Immutable should really be considered

  • An unlisted advantage is that better syntactic symmetry is achieved for arrays e,.g. arr.[1] and int[] and array [1;2] and [1;2] (type-directed)

  • In multiple places it would be nicer in F# if type annotations could be given simply by a use of a non-generic type name, assuming no constructor was available, i.e. let vec = RRBVector [ 1; 2; 3 ]. This also applies to record types.

@alfonsogarciacaro
Copy link

alfonsogarciacaro commented Nov 26, 2017

For reference, when designing the Fable.React bindings I chose to use list because it's much more idiomatic in F#, but it would be more performant to use arrays as we have to convert the list into arrays before sending them to React EDIT: see #625 (comment)

Saying this, I acknowledge this change could harm readability. I've already found the int[] notation to be problematic for new F# programmers in workshops.

@scrwtp
Copy link

scrwtp commented Nov 26, 2017

How about adapting the computation expression builder mechanism (and syntax) to be used as programmable collection literals? This is what seq already does (on a syntactic level), albeit with some overhead in the form of yield, e.g.

let example1 = seq { yield 1; yield 2; yield 3 }

Would it be possible to either extend the existing builder mechanism to implicitly assume yields in each line (under some reasonable conditions), or otherwise introduce a new kind of builder with a set of methods strictly targeting collection construction? So that we can write something like this:

let old1 = [ 1; 2; 3]
let new1 = list { 1; 2; 3 }

let old2 = [| 1; 2; 3 |]
let new2 = array { 1; 2; 3 }

let old3 = Map.ofList [ 1, "fst"; 2, "snd" ]
let old3 = map { 1, "fst"; 2, "snd" }

and then perhaps make an "unqualified" { ... } to be a generic builder to be resolved to a concrete collection builder if it's used in a context where its type can be unambiguously inferred, e.g.

List.map ((+) 1) {1; 2; 3}

The benefits of this approach:

  • no need to touch the existing collection literals (and invalidate everything that was written about F# to this day in the process),
  • the existing literals can be treated as special case short-hands for their respective builders,
  • it's a uniform way of adding new collection types and could be used just as well to add System.Collections.Immutable to F# core or have users create their own custom collections,
  • it's cohesive with the existing syntax for builders.

@xperiandri
Copy link

xperiandri commented Nov 26, 2017

When talking about [] syntax for collections I assume that :: and @ will also be defined for them.

@rmunn
Copy link
Author

rmunn commented Nov 27, 2017

When defining :: and @ for custom types, the question of semantics comes up. There are two ways that the :: operator could be seen:

  1. x :: xs could mean "add x to the front of the xs collection, no matter whether that's efficient". Or:
  2. x :: xs could mean "add x to the xs collection in whatever way is most efficient for the way the xs data structure is represented."

This distinction comes up in cases like the PersistentVector data type, where adding an element at the end of the vector is O(1), but adding it at the start of the vector is O(N).

Likewise, the :: pattern-match operator could be read one of two ways:

  1. The x :: xs pattern could mean "The first element of the collection is given the name x, and the rest of the collection minus the first element is given the name xs." Or:
  2. The x :: xs pattern could mean "One element of the collection, whichever is most efficient to separate out, is given the name x, and the rest of the collection minus that element is given the name xs."

Again, with the PersistentVector data structure, it's O(1) to remove the last element, but O(N) to remove the first element. So an implementation of PersistentVector would prefer to write an x :: xs match pattern that would let x be the last item of the list, and xs be all but the last element. This is the approach taken by Clojure's conj function, where (conj coll x) means "Add x to the collection as efficiently as possible": (conj [1 2 3] 4) produces [1 2 3 4] since Clojure's vectors append efficiently, but (conj '(1 2 3) 4) produces '(4 1 2 3) since Clojure's lists prepend efficiently.

Or there's a third approach possible, which would be to say that :: can be defined in one of two ways (but each collection type would choose only one way to define it);

  1. If the collection adds and removes items from the front most efficiently, :: is defined as 'T -> 'T coll -> 'T coll. Prepending an item looks like x :: xs, and appending a single item does not have a defined operator. Also, the pattern match looks like x :: xs to strip one item off the front of the collection.
  2. If the collection adds and removes items from the front most efficiently, :: is defined as 'T coll -> 'T -> 'T coll. Appending an item looks like xs :: x, and prepending a single item does not have a defined operator. Also, the pattern match looks like xs :: x to strip one item off the back of the collection.

For collections that do not maintain order, then the :: operator and pattern match would be x :: xs, and would have the semantics of "give me one item from the collection, as efficiently as possible, and I don't care which."

The advantages and disadvantages of these three approaches would be:

  1. x :: xs always means the same thing. Advantage: consistency. Disadvantage: performance trap.
  2. x :: xs means inserting as efficiently as possible, whatever that means for that collection. Advantage: performance. Disadvantage: semantics trap. 1 :: [2; 3; 4] could produce [2; 3; 4; 1] if that's how that collection works, and the compiler can't catch that.
  3. x :: xs means prepend, xs :: x means append, and collection types must define just one of them. Advantage: performance and compile-time-safe semantics. Disadvantage: is it legal F# syntax to allow both (::) : 'T -> 'T coll -> 'T coll and (::) : 'T coll -> 'T -> 'T coll operators? Or would the syntax have to be changed to allow this? (Or I suppose the latter operator could be defined as a different operator, such as :::, and collections are required to define at least one of those two).

The question of semantics for @ is less important, because there are no cases (as far as I know) where flipping the order of the arguments would improve efficiency as dramatically as O(1) vs. O(N) for collections, so there's no need to consider alternate semantics for the @ operator.

@alfonsogarciacaro
Copy link

Sorry, I just realized that my comment above was not entirely correct. Actually the situation is a bit more complicated, because in Fable.React bindings the first list (attributes) compiles to a JS object and the second compiles to spread arguments. So something like:

div [
  Width 60.
  Height 40.
] [str "foo"; str "bar"]

In JS becomes:

React.createElement("div", {width: 60, height: 40}, "foo", "bar")

When possible this transformations are done in compile time, but in other cases this happens at runtime. There's some performance penalty but using arrays probably wouldn't improve the situation drastically.

In conclusion, a change like this wouldn't necessarily benefit Fable.React bindings.

@piaste
Copy link

piaste commented Nov 27, 2017

@dsyme

In multiple places it would be nicer in F# if type annotations could be given simply by a use of a non-generic type name, assuming no constructor was available, i.e. let vec = RRBVector [ 1; 2; 3 ]. This also applies to record types.

If this suggestion were implemented, I think this part would not require any new type annotation syntax - we could simply provide a trivial function (constructor or not)

let inline RRBVector (x : RRBVector<_>) = x

and let type inference do the rest.

@rmunn

The question of semantics for @ is less important, because there are no cases (as far as I know) where flipping the order of the arguments would improve efficiency as dramatically as O(1) vs. O(N) for collections, so there's no need to consider alternate semantics for the @ operator.

I think it's exactly the same issue as ::, actually. Often enough, when you do a @ b, especially recursively, one collection is large and one is small, so it makes a difference which one becomes the 'starting' collection.

@rmunn
Copy link
Author

rmunn commented Nov 27, 2017

@piaste - But in a @ b, which one is going to be large and which one will be small? The :: operator is asymmetrical, so you can tell at compile-time which parameter is large (the collection) and which one is small (the single item). With a @ b, you can't tell at compile-time whether the user is going to append a small list at the end of a large one, or append a large list at the end of a small one. Even in recursive cases, you can't tell if the user will be doing the equivalent of a fold or a foldBack, so you can't know ahead of time which side of the parameter will be large.

Yes, there are collections (such as linked lists) where it matters which side is the large side. But since you can't know ahead of time which will be which, the best you can do is document that a @ b always produces the collection ab, and not the collection ba (for collections in which order matters).

@xperiandri
Copy link

Or there's a third approach possible, which would be to say that :: can be defined in one of two ways (but each collection type would choose only one way to define it);

  1. If the collection adds and removes items from the front most efficiently, :: is defined as 'T -> 'T coll -> 'T coll. Prepending an item looks like x :: xs, and appending a single item does not have a defined operator. Also, the pattern match looks like x :: xs to strip one item off the front of the collection.
  2. If the collection adds and removes items from the front most efficiently, :: is defined as 'T coll -> 'T -> 'T coll. Appending an item looks like xs :: x, and prepending a single item does not have a defined operator. Also, the pattern match looks like xs :: x to strip one item off the back of the collection.

I like this one if possible

@dhwed
Copy link

dhwed commented Nov 27, 2017

@rmunn For your third approach, what about the associativity?

:: is right associative, so 1 :: 2 :: [] is read as 1 :: (2 :: []). Your new operator for collections that add and remove from the end most efficiently should be instead left associative, so that [] :: 1 :: 2 is read as ([] :: 1) :: 2

I'm no expert, but I doubt that it is possible for one operator to be either right or left associative depending on the types of the arguments, as it would require the typechecking to happen before the parsing?

You could use right associativity for both cases even though it isn't ideal - there is precedent for incorrect associativity with the <| operator anyway. But I would definitely suggest a separate <:: operator for this.

@dhwed
Copy link

dhwed commented Nov 27, 2017

If we're considering altering the type of [ ... ] literals based on type inference, then we should consider the same change for numeric literals.

If [1; 2] is a list here:

let l = [1; 2]

and an array here:

let a: int[] = [1; 2]

Then there is a symmetry in 12 being an int here:

let i = 12

and a byte here:

let b: byte = 12

@Rickasaurus
Copy link

It seems like we're talking some pretty major changes here. What would be the impact on backwards compatibility?

@rmunn
Copy link
Author

rmunn commented Nov 28, 2017

As @dsyme pointed out, this might not be a breaking change if the default type for the [ ... ] syntax (in the absence of other type annotations) was an F# list. However, I'm not yet ready to edit my original description to remove the "This is a breaking change" line, because the details have yet to be worked out, and some variants of this proposal could end up being breaking changes.

But there would certainly be an impact on documentation. Pretty much everything that's been written about F# lists would have to be revised to say that the [ ... ] syntax defaults to creating a linked list, but would create an array (or other data type) if used in a given context.

And having syntax that changes based on what context it's used in is one of the biggest negatives of this proposal. Perl's "scalar context" vs. "list context", and how the context can change the nature of what's passed in, causes LOTS of confusion (and thus, bugs) in Perl code. The story is better for F# than for Perl because the compiler would catch most errors of this type since they'd be type errors. (Perl is dynamically typed, so the compiler won't catch those errors for you in Perl). But the biggest danger arising from this proposal, as I see it, isn't backwards-incompatibility, it's the potential for confusion. Any time you see the [ ... ] syntax in F# code, you would have to stop and figure out what type that really is. (Though IDEs might be able to help with that.)

@0x53A
Copy link
Contributor

0x53A commented Nov 28, 2017

I really like this proposal.

Thinking about possible breaks / bugs I would vote to disallow using [ ] for seq.

The reason is eager vs lazy, and possible multiple-enumeration.

Consider:

// dummy, pretend this is actually expensive.
let getDataFromDatabase() = [1;2;3]

let data = [
    let d = getDataFromDatabase()
    printfn "Got data from db"
    for x in d ->
        printfn "Got %A" x
        x
]

data
|> Seq.iter (fun x -> ())


data
|> Seq.iter (fun x -> ())

with this proposal, and depending on the actual implementation, it might infer data as seq, and so change from eager to lazy. from once-enumerated to twice-enumerated.

@ReedCopsey
Copy link

ReedCopsey commented Nov 28, 2017

I would vote to disallow using [ ] for seq.

I do agree that it'd make sense to keep this eager - it'd keep one more piece of complexity out of the equation.

I would rather it "implement seq via list" vs "disallow seq". This would work fine since a list is a seq. Not all seq implementations are lazy evaluated - and there's nothing saying this one could/should be, as it's a new feature.

You could potentially even inferring the type as list if it's ambiguous/could work either way, which would make data be a int list in your scenario, but would still allow for let data : int seq = [ to be a seq (backed using a list as the implementation).

That'd keep it more closely aligned to existing code (where it's always list), but not have a surprising type to an end user, as they'd expect a seq.

@0x53A
Copy link
Contributor

0x53A commented Nov 28, 2017

I think "disallow" was the wrong term to use.

I would rather it "implement seq via list" vs "disallow seq".

is basically what I meant.

The code I posted compiles today, so it needs to also compile tomorrow. But it should stay list, not become seq.


More concrete, I would disallow this:

let doIter (s:seq _) = ...
doIter [ 1;2;3 ]

(type error)

and allow this but make sure this is list-typed, NOT seq-typed:

let doIter (s:#seq _) = ...
doIter [ 1;2;3 ]

@ReedCopsey
Copy link

More concrete, I would disallow this:

That's where I disagree - I would still allow that. I would just make the [ 1;2;3 ] evaluate as a list, or be a seq backed by a list still. Really, if you just always "implement seq via list` - I don't see any disadvantages, since you eliminate the issues with eager vs lazy.

@0x53A
Copy link
Contributor

0x53A commented Nov 29, 2017

My gut feeling is that in most cases F# values explicit over implicit.

And having a binding typed as seq but actually internally be a list without any visible indicators feels wrong in that regard somehow. You can always cast it ([ 1;2;3 ] :> seq _).

But I don't have too strong an opinion on this, my main goal was to give attention to lazy vs eager and multiple-enumeration.

@ReedCopsey
Copy link

@0x53A There is no "seq" type - a seq always has to be implemented by some type under the hood. What disadvantage would using list have over any other type? It's really just an implementation detail at that point.

@0x53A
Copy link
Contributor

0x53A commented Nov 29, 2017

It's really just an implementation detail at that point.

If you use the seq builder, it generates a lazy state-machine under the hood.

let s1 = seq { printfn "hello"; yield 1 } // lazy, can print hello multiple times if iterated multiple times (or never, if not iterated)
let s2 : int seq = [ printfn "hello"; yield 1 ] // ???
let a : int array = [ printfn "hello"; yield 1 ] // eager, prints hello once
let l : int list = [ printfn "hello"; yield 1 ] // eager, prints hello once

If [] is just a general collection initializer syntax, I would expect s2 to have the same behavior as seq {}.

@alfonsogarciacaro
Copy link

It's true that seq (or IEnumerable) just means something can be iterated. But I think most .NET programmers, particularly since LINQ, associate seq with a lazy collection that shouldn't be iterated more than once. So I'm with @0x53A in that I'd try to prevent confusion in this regard. Actually in the discussions about HKT I usually say one of the advantages of not having them when dealing with collections, for example, is the intent is more explicit whether you're manipulating lazy collections (Seq module) vs eager ones (List/Array).

@dsyme
Copy link
Collaborator

dsyme commented Dec 1, 2017

But I think most .NET programmers, particularly since LINQ, associate seq with a lazy collection that shouldn't be iterated more than once.

I don't think that's the case. seq { ... } can always be iterated multiple times, but will be recomputed each time.

@alfonsogarciacaro
Copy link

Yes, sorry. What I wanted to say is iterating a lazy IEnumerable more than once is not recommended as it's difficult to know how costly is to recompute the sequence (for example it may access a database).

@Rickasaurus
Copy link

Rickasaurus commented Jun 11, 2018

This is a pretty neat idea. My only concern is that we might see breakage for existing places in the code where type inference depends on these.

How would you add support for this to new data structures? Would it only be able to apply to things in F# Core?

@xperiandri
Copy link

xperiandri commented Sep 24, 2018

The syntax a :: b should either be deprecated or made collection-generic as part of this
The library function a @ b should either be deprecated or made collection-generic as part of this

Let's introduce an interface or duck typing so that :: work for all collection implementing instance or extension method Prepend and @ for all implementing Append. That will work both for concatenation with a single element and with a collection.

@xperiandri
Copy link

@dsyme is there any considerations for F# 5 about this?

@xperiandri
Copy link

As a new major version of F# comes, we could possibly switch to System.Collections.Immutable as primary collections but for backward compatibility move old collections out to a separate package.
Also unifying collection initialization syntax for all collection types.
Do you consider this option in general?

@Happypig375
Copy link
Contributor

@xperiandri
Breaking changes are not accepted

@dsyme
Copy link
Collaborator

dsyme commented Apr 13, 2023

There's relevant discussion going on here: fsharp/fslang-design#528 (comment)

@davidglassborow
Copy link

C# unifying collection literal syntax - dotnet/csharplang#5354

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests