Renaming openarray #312

Open
mratsim opened this issue Jan 11, 2021 · 17 comments

Comments

@mratsim
Collaborator

mratsim commented Jan 11, 2021

openArray is soon to be a first-class type in Nim (#178).

The name is inherited from Pascal, Modula 2, Oberon.

It corresponds to the following concepts in other languages:

  • C++ span
  • D slice
  • Go slice
  • Rust slice and Range (and MemRange)
  • Swift ArraySlice
  • Zig Slice

Given that view types are becoming first-class, I propose that we also gradually change the name openarray to something less foreign to developers who did not grow up with a Wirth language (Niklaus Wirth designed Pascal, Modula, and Oberon).

I think the new name should:

  • make it easier to map to the mental model of views:
    • access to unowned data
    • no copy
    • careful about data invalidation (which the borrow checker does for us)
  • limit questions about "what is an openarray?" and "What are they for anyway?"

As the type names Slice and Range are already taken,
we can't use those. That leaves span like C++, ArraySlice like Swift, or coming up with our own.

Name

I propose the name ArrayView, which is more descriptive than span and more likely to be grasped by users coming from higher-level languages than C++.
It also reuses the views narrative in the doc and {.experimental:"views".} pragma or compiler flag.
In terms of typing it's the same as today.

Procedures

Furthermore, I suggest we rename toOpenArray(container, start, stop) to toView(container, start, stop) and add an overload toView(container) that views the whole array or seq.
The first will significantly improve their use in low-level libraries that routinely deal with either C code (ptr + length) or buffers from memory, streams, IO, protocols, cryptography, ...
The second covers the very common use case where we want to pass a no-copy object to delegate processing, for example across a channel.
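As a rough sketch of what the two overloads could look like today, they can be approximated with templates that forward to the existing toOpenArray. The name toView is only the one proposed above, not an existing API, and the total helper is made up for the illustration:

```nim
# Hypothetical aliases: `toView` is the proposed name, not an existing API.
template toView(c, first, last: untyped): untyped =
  # Forwards to the existing magic; `last` is inclusive, as with toOpenArray.
  toOpenArray(c, first, last)

template toView(c: untyped): untyped =
  # Whole-container view. `c` is expanded more than once, so pass a plain variable.
  toOpenArray(c, c.low, c.high)

# Hypothetical consumer that accepts any contiguous run of ints without copying.
proc total(xs: openArray[int]): int =
  for x in xs:
    result += x

let data = @[1, 2, 3, 4, 5]
doAssert total(toView(data, 1, 3)) == 9   # elements 2, 3, 4
doAssert total(toView(data)) == 15        # the whole seq
```

This only covers the argument-passing case that openArray already supports; it does not make the view storable in a variable.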

Stretch goal

Stretch goal (emphasis mine): backward compatibility is a significant concern.

Very often, especially during Advent of Code season, people, even experienced ones, use the slicing syntax a[start ..< stop] instead of a.toOpenArray(start, stop-1). This leads to extra allocation and GC pressure. The toOpenArray syntax is very long to type and needs an easy-to-forget -1 in most cases.
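A minimal illustration of the difference, assuming a small helper named sum (hypothetical, written only for this example) that takes an openArray:

```nim
# Hypothetical helper: sums any contiguous run of ints, whatever the container.
proc sum(xs: openArray[int]): int =
  for x in xs:
    result += x

let a = @[10, 20, 30, 40, 50]
let start = 1
let stop = 4

# Slicing syntax: builds a new seq[int] holding a copy of the elements.
let copied = a[start ..< stop]
doAssert copied == @[20, 30, 40]
doAssert sum(copied) == 90

# toOpenArray: no copy, but the upper bound is inclusive, hence the `stop - 1`.
doAssert sum(a.toOpenArray(start, stop - 1)) == 90
```

Both calls compute the same result; only the first one allocates an intermediate seq.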

Now that the compiler has an escape analysis, we should consider having the slicing syntax produce a view by default. We can have a[start..<stop].clone() to produce the old seq[T] if needed.

Note: we can also consider having slice return a view if the variable doesn't escape
and a seq if it does escape but slicing would require a special signature for that.
That said, the capability would be really interesting beyond slicing:

  • Futures and Flowvars are in many cases consumed in the proc that allocated them,
    having the capability to decide the allocation (stack or heap) can significantly reduce overhead. (measured 2x on Weave overhead-bound benchmarks compared to an optimized memory pool, an order of magnitude compared to malloc or the GC)
  • Tasks, Generators and closure iterators. If an iterator or generator is created and consumed in its scope, it can be stack allocated. This would significantly help the C compiler, potentially leading to constant folding and "disappearing closure iterators" (https://godbolt.org/g/26viuZ)
  • That capability is called in C++ "Heap-allocation elision", though I think a dedicated RFC is necessary.

Trivia

toOpenArray is called slice in the compiler (with the compiler magic mSlice)

@Clyybber

We will probably need two kinds of slices. One that can be copied and one that cannot be copied.

@c-blake

c-blake commented Jan 11, 2021

When the length gets me down, I just do

template toOA(x, y, z): untyped = toOpenArray(x, y, z)

Like the template above one could probably type View[T] = openArray[T].

I will say that if you want to call it toView then I think the type should be called View not ArrayView just for internal consistency in Nim. (or toArrayView which is exactly as long as toOpenArray...)

I think (Array)View/to(Array)View could exist alongside openArray/toOpenArray for a long time, but I'm also very unsure that this is worth the confusion that might also arise from such name duplication for users. I can already see the "What is the diff between toArrayView and toOpenArray?" questions. There are several such Wirth-isms (e.g. ord/chr, though Python picked that one up, too). So, I think this is probably a tough judgement call. If Slice were untaken in Nim it might be easier.

@Araq
Member

Araq commented Jan 11, 2021

So ... you want to rename it from a long name to an equally long name? And for what exactly, the users who cannot read a basic manual? There is no evidence that newcomers have trouble picking up OpenArray as long as it happens to be part of the learning material they use.

@ZoomRmc

ZoomRmc commented Jan 12, 2021

I'm not sure about renaming but I fully agree slicing should produce a view by default.

Regarding a rename: if something needs to be picked, it should be short and clear. Off the top of my head: Frame could work. It's clear that it's a limited view, it's short, and it's not awkward to type or pronounce, as in "framed view", "frame that seq", etc. On a semantic level I think it suits better than Slice, with which we're mostly stuck for historical reasons: slicing an object means physically removing a part of it, which is not happening with a view, and by this time most slices in other languages are reference types.

@mratsim
Collaborator Author

mratsim commented Jan 12, 2021

If we want short we can also reuse span from C++ but both slice and view are more descriptive than openarray.

We can always defer to learning material but we can also help people by using easier names.

Regarding frame I've never used it in that context so I can't comment.

@Araq
Member

Araq commented Jan 13, 2021

Given the current design problem with first-class openArrays, we might indeed need to rename openArray. We need both an immutable and a mutable variant, and the planned var openArray proved to be insufficient:

proc `[]`(t: Table; key: K): var V

var x: Table[openArray] # but now `[]` only offers a single level of mutability!

So I propose View (immutable view) and MView, as the M prefix for "Mutable" is already used for mitems.

@HugoP707

I don't like the idea of renaming it. However, I really want an overload for toOpenArray(container) that views the whole array or seq.

@c-blake

c-blake commented Jan 13, 2021

I like short names (more than most) and am ok with View/toView and MView, but if folks care more about backward compat. then the existing openArray could pair with mOpenArray. There is enough Nim code out there using openArray (EDIT: at least 30% of nimble packages) that we may never want to deprecate that name. We may never have to, but "rename" to me suggests eventual deprecation.

@ZoomRmc

ZoomRmc commented Jan 13, 2021

Shouldn't we better reserve View and MView for concepts, as these seem like more general terms, applicable to generic containers?

@c-blake

c-blake commented Jan 13, 2021

I may be wrong, but I think if we could get -d:v1 below to work the bike-shedding about what name to use could maybe be reduced (i.e. C++ folk could use Span and MSpan since they're used to that and others could use whatever):

# template toView(x,y,z): untyped = toOpenArray(x,y,z) # just works
let x = [0, 1]
when defined(v1): # Error: type expected {or id expected if (View)}
  template View = openArray
  proc foo[T](x: View[T]): int = discard
  echo foo(x)

when defined(v2): # C error: too many args to function
  type View[T] = openArray[T]
  proc foo[T](x: View[T]): int = discard
  echo foo(x)     # gen code calls foo_mangle(x_mangle, 2)

when defined(v3): # works, but may not inherit magic properties
  type View[T] = concept a
    a.len is Ordinal
    a[0] is T
    for x in a: x is T
  proc foo[T](x: View[T]): int = discard
  echo foo(x)

I think v2 may be a bug to be reported? and might lose magical properties of openArray (and some forthcoming mOpenArray or MopenArray or whatever).

v3 works, but seems unlikely to inherit any magical properties. Maybe someone more steeped in the order of events in the compiler knows that even v1 will lose magical properties?

These questions also relate to symbol aliasing questions/PRs, linked to just for reference.

@juancarlospaco
Contributor

I am neutral.

...but please consider that a ton of code uses openArray, so renaming/removing it will break a lot of code. 😐

@Araq
Member

Araq commented Jan 14, 2021

openArray remains and needs to remain. We need the distinction between View and openArray: var openArray has an []= operator but var View does not! It'll be complex... :-(

@alaviss

alaviss commented Mar 25, 2021

Will the notion of distinct openArray ever be introduced? It could be useful for slicing a string so that echo for example will print the string instead of an array.

@mratsim
Collaborator Author

mratsim commented Mar 26, 2021

> Will the notion of distinct openArray ever be introduced? It could be useful for slicing a string so that echo for example will print the string instead of an array.

I use this construct instead

proc foo[T: not char](oa: openarray[T]) =
  discard # body omitted in the original comment

@alaviss

alaviss commented Mar 28, 2021

I'm not sure how that relates to the usage I was referring to...

@mratsim
Collaborator Author

mratsim commented Mar 28, 2021

It prevents openarray[T] from matching against a string.
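A small sketch of that behaviour, under the assumption that a string argument would otherwise bind T to char. The proc name countItems and its body are made up for the illustration:

```nim
# Hypothetical proc: the `not char` constraint excludes openArray[char],
# which is what a string argument would otherwise convert to.
proc countItems[T: not char](oa: openArray[T]): int =
  oa.len

doAssert countItems([1, 2, 3]) == 3
doAssert countItems(@[1.0, 2.0]) == 2

# A string no longer matches, so this call is rejected at compile time.
doAssert not compiles(countItems("abc"))
```

This keeps strings out of the generic overload, but it does not by itself give string slices their own printing behaviour.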

@alaviss

alaviss commented Mar 28, 2021

I was referring to this usage:

type StringView = distinct openArray[char]

proc `[]`(s: string, range: Slice[int]): StringView

proc print[T](oa: openArray[T])
proc print(sv: StringView)

print("string"[0..2]) # prints "str"
print(['s', 't', 'r']) # prints "['s', 't', 'r']"

8 participants