Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Array iterator classes private to hide implementation from client #193

Open
wants to merge 28 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
e6691a1
Initial creation
pmetras Jan 9, 2022
84535ad
Remove RFC number
pmetras Jan 9, 2022
42b09a7
Update RFC to apply API change to all collection classes
pmetras Jan 19, 2022
5f5d79f
Line break on column 80
pmetras Jan 19, 2022
a52eb5f
List signature changes in collections
pmetras Jan 19, 2022
73099ee
https://github.com/ponylang/rfcs/pull/193#discussion_r787306884
pmetras Jan 19, 2022
873c378
https://github.com/ponylang/rfcs/pull/193#discussion_r787309484
pmetras Jan 19, 2022
b0fd354
https://github.com/ponylang/rfcs/pull/193#discussion_r787298485
pmetras Jan 19, 2022
f492370
https://github.com/ponylang/rfcs/pull/193#discussion_r787310073
pmetras Jan 19, 2022
62a7b03
https://github.com/ponylang/rfcs/pull/193#discussion_r787311214
pmetras Jan 19, 2022
48fc8bd
https://github.com/ponylang/rfcs/pull/193#discussion_r787302750
pmetras Jan 19, 2022
255c487
https://github.com/ponylang/rfcs/pull/193#discussion_r787303413
pmetras Jan 19, 2022
869eb82
Include changes from Github comments
pmetras Feb 10, 2022
2bfd088
Merge branch 'ponylang:main' into rfc-array
pmetras Feb 10, 2022
7bf7641
Generate HTML
pmetras Feb 23, 2022
06018fc
Merge branch 'rfc-array' of https://github.com/pmetras/rfcs into rfc-…
pmetras Feb 23, 2022
e60bb5a
Move to docs
pmetras Feb 23, 2022
16ec9ef
Set theme jekyll-theme-slate
pmetras Feb 23, 2022
5b91ce2
Merge branch 'ponylang:main' into rfc-array
pmetras Feb 24, 2022
7fda29d
Rename to index.html
pmetras Feb 24, 2022
4363ab0
Merge branch 'rfc-array' of https://github.com/pmetras/rfcs into rfc-…
pmetras Feb 24, 2022
5dbe1ca
Rename to index.html
pmetras Feb 24, 2022
68b7695
With Front Matter
pmetras Feb 24, 2022
c0da56d
Added Front Matter
pmetras Feb 24, 2022
b539018
Change layout to default
pmetras Feb 24, 2022
44103dd
Improve Front Matter
pmetras Feb 24, 2022
d61d656
Add rendered page and updates
pmetras Feb 24, 2022
18f8d49
Use Github rendering instead of pandoc
pmetras Feb 24, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
366 changes: 366 additions & 0 deletions text/0000-array-private-classes.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,366 @@
- Feature Name: array_private_classes
- Start Date: 2022-01-08
- RFC PR: (leave this empty)
- Pony Issue: (leave this empty)

Rendered RFC: https://github.com/pmetras/rfcs/blob/rfc-array/text/0000-array-private-classes.md

# Summary

Collection classes should not expose internal classes through iterators functions.

Following information hiding design principle, the builtin classes of collection
data structures must not be made visible through iterator functions like `ArrayKeys`,
`ArrayValues` and `ArrayPairs`. These classes can be made private as they are
only used as return types for `Array` functions `keys`, `values` and `pairs`.
The return values for these functions are changed to the more general interface
`Iterator`.
Comment on lines +12 to +17
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without "Following information hiding design principle"

being removed. I can't support this RFC. There's no "design principle" that I want to codify into Pony via RFC as a precedent for making changes.


A new interface `CollectionIterator` is defined to allow for rewindable iterators,
like it is the case for `Array` `values`.

This design principle is applied to the other collection classes that expose
internals too like:
Comment on lines +22 to +23
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, no, I am not in favor of anything in here that I am in favor of based on an abstract design principle. I can't support this language being included.


* `List`
* `Map`
* persistent `Map`
* `Vec`
* persistent `Vec`
* `Set`
* `Itertools`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

itertools seems out of place to me here. I see no internals of itertools being exposed. Iter is the only public type in itertools.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed this below. There are references to ArrayValues within itertools which would need to change. Having itertools in this list is still out of place though as this list is for "other collection classes that expose internals too" which itertools is neither "[an]other collection class" nor does it "expose internals too".


This is a breaking change for collections' client code that use now internal
classes but a search on Github repositories shows that the impact should be
limited.

# Motivation

This change brings:

- Applying the design principle of
[hiding implementation details](https://en.wikipedia.org/wiki/Information_hiding)
but offer a general and stable interface. Returning interfaces instead of concrete
classes allows changing the implementation. Usually, one must return the most
general type that fullfils the contract of the function (in the case of the
functions discussed in this RFC, iteration).
- Collections' functions `keys`, `values` and `pairs` definitions are made more
general. Iterators implementation details are not public. Internal classes used
by implementation like `*Keys`, `*Values` and `*Pairs` are now
[opaque data types](https://en.wikipedia.org/wiki/Opaque_data_type). Generally,
when using these collection classes, clients are not interested by the iterators
implementation, but by the types these iterators return and that is provided by
the generic parameters.
- The generic return signature of these 3 iterating functions is simpler to
understand for clients of collection classes.
- Reduces the number of public classes in the standard library by hiding 18
specialised classes (iterators implementations) of which 3 are from the
`builtin` module.
- The interface `CollectionIterator` is added to create rewindable iterators
(can be re-start from first value).

This change remains compatible with the existing code base but for client code
that is directly using the classes `*Keys`, `*Values` and `*Pairs`. A search on
Github shows that the impact is very limited.

To quote Antoine de Saint-Exupéry:

> Perfection is achieved, not when there is nothing more to add, but when there
> is nothing left to take away.
Comment on lines +66 to +69
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To quote from the Pony Philosophy:

Performance

Runtime speed is more important than everything except correctness. If performance must be sacrificed for correctness, try to come up with a new way to do things. The faster the program can get stuff done, the better. This is more important than anything except a correct result.

Simplicity

Simplicity can be sacrificed for performance. It is more important for the interface to be simple than the implementation. The faster the programmer can get stuff done, the better. It’s ok to make things a bit harder on the programmer to improve performance, but it’s more important to make things easier on the programmer than it is to make things easier on the language/runtime.

If making this change to abstract types has a performance impact than it is not inline with our driving philosophy which places performance at a higher priority than simplicity.


Currently the API makes a guarantee that the return value of each of these functions
is a specific concrete class with a specific public name. That particular guarantee
does not actually serve any practical needs of users (except, arguably, the virtual
calls caveat that was mentioned on [Github discussion](https://github.com/ponylang/rfcs/pull/193#issuecomment-1009733064)). If it is not helpful to users in practice,
it can be removed. And if it can be removed, then it should be removed, in the
name of taking another step toward stabilizing the Pony standard library's API.
Comment on lines +71 to +76
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This linked issue comment is in line with my previous review comment concerning performance taking priority over simplicity in the Pony Philosophy. I would like to see some evidence that the compiler as it is today devirtualize the calls and as such we get both performance (no worse than today) and simplicity (improved by this RFC).


It's worth mentioning that if this change was being debated in reverse (the
current standard library had minimal interfaces for the return values, and
somebody proposed to make them public concrete classes instead, because they
found some tangible benefit in doing so), then it would not be a breaking change
at all, because the new proposed type would be a subtype of the existing return
types.

It'll always be easier to get more specific with these return types than more
general. So we should default to offering the most general return type that is
still useful to users, and ratchet down to more specific subtypes as needed (as
tangible use cases for more specific return types are discovered). Ratcheting down
will always be possible without a breaking change, but ratcheting up will be a
breaking change each time. It's best to make such "ratchet up" breaking changes
like this RFC before we reach Pony 1.0.0 as part of a general effort to make Pony
and its standard library stable.
Comment on lines +85 to +92
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not agree with this as motivation. It is always easier to start with many concrete implementations/returns which are then used to derive a generic implementation/return and trying to be generic before concrete presents more room for instability.


# Detailed design

Iterating functions in collections `keys`, `values` and `pairs` are changed to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your example they return a CollectionIterator, not an Iterator.

return `Iterator` and the classes that implement these iterators are made private.
Here are the full implementation of these functions for the `Array` class (changes
in other collection classes are identical).

As the function `values` of class `Array` uses an iterator with a `rewind` function
that is not part of the `Iterator` interface, a new interface `CollectionIterator`
is added to enable creation of rewindable iterators.

```pony
fun keys(): CollectionIterator[USize]^ =>
"""
Return an iterator over the indices in the array.
"""
_ArrayKeys[A, this->Array[A]](this)

fun values(): CollectionIterator[this->A]^ =>
"""
Return an iterator over the values in the array.
"""
_ArrayValues[A, this->Array[A]](this)

fun pairs(): CollectionIterator[(USize, this->A)]^ =>
"""
Return an iterator over the (index, value) pairs in the array.
"""
_ArrayPairs[A, this->Array[A]](this)
```

Note: To remain consistent with `Array` behaviour, functions `keys` and `pairs`
will return a `CollectionIterator`.

```pony
interface CollectionIterator[A] is Iterator[A]
"""
A `CollectionIterator` is an iterator that can be rewinded, that is start
again from first item. The data structure being iterated on can't change the
order it return iterated items.
"""
fun has_next(): Bool
"""
Return `true` when function `next` can be called to get next iteration item.
"""

fun ref next(): A ?
"""
Return the next item of the iteration or an error in case there are no other
items. A previous call to `has_next` check if we can continue iteration.
"""
Comment on lines +135 to +144
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be my personal preference or lack of knowledge on the topic, but is there a reason to repeat these methods which are found on Iterator?

If these methods must be repeated I would prefer we add tests which check that the properties of CollectionIterator in the stdlib match Iterator. That is, the elements from Implementing Iterators hold for "implementing CollectionIterators" just as well.


fun ref rewind(): Iterator[A]^
"""
Start the iterator over again from the beginning.
"""
```

The code of the standard library is adapted to remove use of these now private
classes, mainly in tests. Here are the files that must be changed:
Comment on lines +152 to +153
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What private classes? Isn't this RFC about making now public classes into private classes?


* `packages/builtin/array.pony` as shown above
* `packages/itertools/iter.pony` in function `cycle`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that in source cycle has a type annotation to ArrayValues[A!, Array[A!]]. This would indeed need to change, but does clarify by previous question/concern of where itertools needs to change.

* `packages/collections/heap.pony` in function `values`
* `packages/collection/builtin/_test.pony` in class `_TestArrayValuesRewind`
* `packages/collections/list.pony`
* `packages/collections/map.pony`
* `packages/collections/persistent/map.pony`
* `packages/collections/persistent/vec.pony`
* `packages/collections/set.pony`
* `test/libponyc/util.cc` to change the name of the class to `_ArrayValues`

## Detailed changes

In order to judge how the API becomes simpler to understand for clients of the
collections classes, here are the changes in the functions' signatures. The `-`
line shows the old signature while the `+` one is the new:

```pony
// Array
- fun keys(): ArrayKeys[A, this->Array[A]]^ =>
+ fun keys(): Iterator[USize]^ =>
- fun values(): ArrayValues[A, this->Array[A]]^ =>
+ fun values(): CollectionIterator[this->A]^ =>
- fun pairs(): ArrayPairs[A, this->Array[A]]^ =>
+ fun pairs(): Iterator[(USize, this->A)]^ =>

// Heap
- fun values(): ArrayValues[A, this->Array[A]]^ =>
+ fun values(): Iterator[this->A]^ =>

// List
- fun nodes(): ListNodes[A, this->ListNode[A]]^ =>
+ fun nodes(): Iterator[this->ListNode[A]]^ =>
- fun rnodes(): ListNodes[A, this->ListNode[A]]^ =>
+ fun rnodes(): Iterator[this->ListNode[A]]^ =>
- fun values(): ListValues[A, this->ListNode[A]]^ =>
+ fun values(): Iterator[this->A]^ =>
- fun rvalues(): ListValues[A, this->ListNode[A]]^ =>
+ fun rvalues(): Iterator[this->A]^ =>

// Map
- fun keys(): MapKeys[K, V, H, this->HashMap[K, V, H]]^ =>
+ fun keys(): Iterator[this->K]^ =>
- fun values(): MapValues[K, V, H, this->HashMap[K, V, H]]^ =>
+ fun values(): Iterator[this->V]^ =>
- fun pairs(): MapPairs[K, V, H, this->HashMap[K, V, H]]^ =>
+ fun pairs(): Iterator[(this->K, this->V)]^ =>

// Persistent Map
- fun val keys(): MapKeys[K, V, H] =>
+ fun val keys(): Iterator[K] =>
- fun val values(): MapValues[K, V, H] =>
+ fun val values(): Iterator[V] =>
- fun val pairs(): MapPairs[K, V, H] =>
+ fun val pairs(): Iterator[(K, V)] =>

// Persistent Vec
- fun val keys(): VecKeys[A]^ =>
+ fun val keys(): Iterator[USize]^ =>
- fun val values(): VecValues[A]^ =>
+ fun val values(): Iterator[A]^ =>
- fun val pairs(): VecPairs[A]^ =>
+ fun val pairs(): Iterator[(USize, A)]^ =>

// Set
- fun values(): SetValues[A, H, this->HashSet[A, H]]^ =>
+ fun values(): Iterator[this->A]^ =>
```

For instance, in `Array`

```pony
- fun keys(): ArrayKeys[A, this->Array[A]]^ =>
+ fun keys(): Iterator[USize]^ =>
```

now the client user knows now that she gets an iterator over `USize`.

Another more complex example with `Map`,

```pony
- fun keys(): MapKeys[K, V, H, this->HashMap[K, V, H]]^ =>
+ fun keys(): Iterator[this->K]^ =>
- fun values(): MapValues[K, V, H, this->HashMap[K, V, H]]^ =>
+ fun values(): Iterator[this->V]^ =>
```

we see that we iterate over keys (`K` type) or values (`V`). The initial signature
gave complex generic classes that the user has to find in the documentation to
understand that they are iterators, and then dive deeper to understand on what
they iterate.
Comment on lines +242 to +245
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think these new signatures are more readily understood.


# How We Teach This

This change keeps the code compatible in the vast majority of cases. When client
classes are defining objects of these now private types, the reason is usually
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This RFC would be making them private types, but as of now they are now public types.

to get access to the function `rewind` that was not defined in `Iterator`. By
adding the interface `CollectionIterator`, client code can easily be adapted,
replacing `ArrayValues[A]` by `CollectionIterator[A]`.

Also, client code generally uses these functions to iterate on the returned types
and does not try to access the iterator directly but is interested by the iterated
items. When client code refers to the iterator type, that's generally useless and
the code can be rewritten to be made shorter and more future proof.
Comment on lines +257 to +258
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is referring to a type useless? Code that is rewritten to update a referenced type is neither more nor less "useless" than before.


A [search on Github Pony code](https://github.com/search?q=%22ArrayValues%22+language%3APony&type=code)
finds 24 files using the class `ArrayValues`, of which 6 are copies of `array.pony` file.

For instance, in
[xml2xpath.pony](https://github.com/redvers/pony-libxml2/blob/bbca5d98d48854bfec2c6ee110220873ecc4df34/pony-libxml2/xml2xpath.pony#L41),
the code can be changed from

```pony
fun values(): ArrayValues[Xml2node, this->Array[Xml2node]]^ ? =>
if (allocated) then
ArrayValues[Xml2node, this->Array[Xml2node]](nodearray)
else
error
end
```

to

```pony
fun values(): CollectionIterator[Xml2node]^ ? =>
if (allocated) then
nodearray.values()
else
error
end
```

In this sample, the developer was not really concerned by the type of the iterator
but that the `values` function must return an `CollectionIterator` over `Xml2node`.
Comment on lines +287 to +288
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again you cannot speak on behalf of the mindset of the developer about whether or not Red was "not really concerned" about an aspect of his code.

You splice your changes into your explanation here in a way that suggests Red was only concerned that values returns a yet unavailable CollectionIterator type. Which gets at my comment just above that you cannot speak on behalf of the mindset of the developer here.

The new version makes the code simpler to understand.

This change in `array.pony` and other collections will break such code but it
can be easily adapted to use the new API. And it will make the standard library
easier to learn by reducing the number of public types.
Comment on lines +291 to +293
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree that reducing the number of public types makes it initially easier to learn Pony. However, it the changes suggested here result in a performance decrease I cannot support these changes. My reason being that it takes only slightly longer to learn what ArrayValues, ArrayKeys and ArrayPairs return then have that knowledge "banked" for future use of Array.


# How We Test This

Pony tests must continue to pass. No additional tests are need as after review
the existing coverage in Pony standard library tests is sufficient
Comment on lines +297 to +298
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see tests added that guarantee all CollectorIterators follow the rules for implementing Iterators. They "should" but as part of duplicating the methods of Iterators test to ensure the absence of future regressions need to be added.


# Drawbacks

Will break any existing code that uses any of the classes that are currently
public and will be made private by this RFC.

# Alternatives

1. Stay as is. Continue the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we continue the conversation of keeping this as is? If no change is needed then no conversation is needed? Are you suggesting other changes which should be pulled into this RFC from Zulip?

[discussion on Zulip](https://ponylang.zulipchat.com/#narrow/stream/189959-RFCs/topic/Make.20Array.20iterators.20private).

2. Update the existing concrete classes to include `rewind` where needed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this an alternative?


3. Instead of defining a new type of iterator with the `CollectionIterator` interface,
we can consider only the rewindable part of it in an `Rewindable` interface:

```pony
interface Rewindable[A]
fun ref rewind(): A^
"""
Rewind the type `A`.
"""
```

Then one can create a rewindable iterator by combining these two interfaces into
a intersection type:

```pony
type CollectionIterator[A] is (Iterator[A] & Rewindable[Iterator[A]])
```
Comment on lines +312 to +328
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my preference, aka how I would write my code, but I have had conversations which swayed me against this. I do not, as I sit here now, recall the reasoning that swayed me, but I am leaving a review comment here to voice my position on this alternative.


Perhaps, a more general interface name instead of `Rewindable` would be `Resetable`
to define the traits of a type that can be reset to its initial state, as in the
case of iterators `reset` == `rewind`.

The rewindable type can be used with other types than `Iterator`, like a data
structure that would implement a rewindable property. This alternative was
[put aside](https://github.com/ponylang/rfcs/pull/193#discussion_r780793165) to
prevent name colisions in `builtin` with user-named types.

## Comparison of alternatives

1. Do nothing: return public concrete classes.
2. Update the existing return classes and add `rewind`.
3. Add a `CollectionIterator` interface for collections.
4. Create intersection type by defining a `Rewindable` interface and combine
with `Iterator`.
5. Change `Iterator` to add `rewind`.

| Justification / Alternative # | 1 | 2 | 3 | 4 | 5 |
|:------------------------------------------|:---:|:---:|:---:|:---:|:---:|
| a. Simpler stdlib API | --- | --- | + | +++ | +++ |
| b. Simpler function signatures | --- | --- | + | +++ | +++ |
| c. Stay compatible with existing stdlib | +++ | +++ | + | +++ | + |
| d. Evolutive | --- | + | +++ | +++ | + |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes for the sake of changes is not evolution -- there must be positive or negative selection which lead microevolution into macroevolution. Put another way, if all the suggested changes have negative effects than it is still "evolution" to do nothing as we do not bring those negative effects forward with us.

Side note: I am a career "full stack biologist" who has done work ranging from cancer to conservation to COVID-19 -- Pony is the project that I volunteer my time with when not working on the "evolving nature" of biological problems.

| e. No impact on compiler | +++ | +++ | - | +++ | - |
| f. No impact on performance | +++ | +++ | +++ | +++ | +++ |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you benchmarked these changes to ensure no performance impact?

| g. Limit stdlib pollution with interfaces | +++ | +++ | + | + | +++ |
| g. Limit stdlib pollution with classes | --- | --- | +++ | ++ | +++ |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is "g" repeated?


# Unresolved questions

- Must analyze how `Range` and `Reverse` would be impacted if defined as
`CollectionIterator`. Particularly, impact on existing client code.
- Possible candidates to be analyzed: `StringBytes`, `StringRunes`, `ByteSeqIter`
and probably others will be found in stdlib code.
- Understand why the intersection type proposal is not considered as it seems to
provide a better flexibility and is easier to understand.
Comment on lines +359 to +366
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of these are questions.