Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First crack at new List API, intended to replace "array-as-vec" operations on arrays #13030

Closed

Conversation

dlongnecke-cray
Copy link
Contributor

@dlongnecke-cray dlongnecke-cray commented May 15, 2019

This PR is intended to propose (and not implement - not quite yet) an API for a new List type, modeled in the spirit of the Python list. It is being written to address the concerns raised in #9452, and will partially resolve #12411 (the other half being the design of a Deque type).

I know that @mppf in particular has raised the idea of limiting the number of List operations that move elements within the list. Moving list elements is a problem because it invalidates any references to List elements held by tasks that were taken before the move occurred.

There are three general safety issues to worry about in the design of this List type that can affect the API (and implementation):

  • Parallel safety: Since Chapel is a highly concurrent language, this is a justified concern. Modifications of a List should be parallel safe in general.
  • Reference invalidation: Any time list elements are moved around, references to List elements are potentially invalidated. IE, if we have a hold a reference to an element at the end of a List, and then insert at the front of the List, that reference is no longer semantically valid (it now points do a different element).
  • Iterator invalidation: Appends to the end of a well implemented List don't invalidate freestanding references, but they can potentially (?) invalidate iterators. This may be an unfounded concern (solved by parallel safety).

I will leave this PR open after we resolve the questions raised in #12411, and use it after we decide on the direction of the implementation.


Michael proposed a implementation that would avoid invalidating references on resize and maintain O(1) indexing https://github.com/chapel-lang/chapel/pull/12791/files#r275528150.

While I'm planning on using Vass's implementation used in VectorList #12791 to get myself started, it seems like a good idea to adopt a more performant indexing strategy in the long run.

If anyone else has any implementation questions/ideas/concerns, please feel free (I could use the answers as well!).


One avenue of debate is whether List should provide methods such as:

  • insert
  • remove
  • pop

Even pop can be considered to invalidate references in a way (what if the element returned is dropped on the floor?). So if we really wanted to limit the API in the name of safety, we could omit the pop method too.

But...how far will we go with this line of reasoning before a List object ceases to be useful at all? I feel personally that if we omit the methods insert and remove, our List ADT is really nothing more than a glorified stack.

Just my two cents. Please critique away!

Copy link
Member

@lydia-duncan lydia-duncan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are all the API questions/suggestions I have

modules/standard/Lists.chpl Show resolved Hide resolved
modules/standard/Lists.chpl Outdated Show resolved Hide resolved

:arg other: A List, the elements of which are appended to this List.
*/
proc extend(other: List(this._etype)) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be able to get away with using that method wrapper here as well. If not, we could try using the generic version of List and having checks in the body of extend to ensure the appropriate behavior is maintained.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true, might that be the more appropriate solution here?

Copy link
Member

@lydia-duncan lydia-duncan May 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I'd want to look at the generated documentation with the method call before making a decision

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on that, waiting on the filesystem...

Is not currently parallel safe.

The following operations may trigger a move:
- insert
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to link to the definitions of these functions elsewhere in the documentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if I would repeat the information in the module documentation, but I think it is appropriate here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be sure, the module documentation is in no way final at this point, and it contains a lot of the ramblings and musings that I put in the OP. I hope to have it sorted out after we get the API and semantics of List finalized.

But yes, that is a really good reminder, thanks!

/*
Insert an element at a given position in this List, shifting all elements
following that index one to the right. Note that `a.insert(a.size, x)`
is equivalent to `a.append(x)`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Append or extend?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to link to the functions you are mentioning, both here and later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

modules/standard/Lists.chpl Outdated Show resolved Hide resolved
:return: An element from this List.

:throws IllegalArgumentError: If the given index is out of bounds.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we documented this/these/writeThis functions in other places?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like builtins would be documented in an obvious part of the Chapel Docs (I know we do here https://chapel-lang.org/docs/primers/specialMethods.html, but haven't checked the manual itself).

modules/standard/Lists.chpl Outdated Show resolved Hide resolved
modules/Makefile Outdated Show resolved Hide resolved
@dlongnecke-cray
Copy link
Contributor Author

Additional food for thought: #12411 (comment)

Of these options, I'd figure we're best targeting the "most general" List first, but feel free to advocate another option instead.

modules/standard/Lists.chpl Outdated Show resolved Hide resolved
modules/standard/Lists.chpl Outdated Show resolved Hide resolved
modules/standard/Lists.chpl Outdated Show resolved Hide resolved
modules/standard/Lists.chpl Outdated Show resolved Hide resolved
modules/standard/Lists.chpl Outdated Show resolved Hide resolved
modules/standard/Lists.chpl Outdated Show resolved Hide resolved
modules/standard/Lists.chpl Outdated Show resolved Hide resolved
modules/standard/Lists.chpl Outdated Show resolved Hide resolved

:arg x: The element to remove.

:throws IllegalArgumentError: If there is no such item.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably use a different Error

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a ..note about it invalidating references

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Python API throws a ValueError, which is consistent with our use of IllegalArgumentError. Do you have an idea for a different error type?

modules/standard/Lists.chpl Outdated Show resolved Hide resolved
modules/standard/Lists.chpl Outdated Show resolved Hide resolved
@mppf
Copy link
Member

mppf commented May 16, 2019

The "truncated" API that you first suggested without insert or remove is pretty much a Stack ADT, so we could provide a Stack type as well? Or, given your new stance on the matter, we can just stick with the Python approach and use List as a superset of stack-like operations.

A Stack could just have a List and offer a restricted set of methods. But I think it's reasonable to proceed with List with parallel-safety warnings on certain operations.

For example, I'm not so sure the reverse method is necessary, but I am in favor of a toString method whereas @mppf is not (believing writeThis to be sufficient).

reverse seems O.K. to me to keep but I also don't know of a compelling reason to have it. We could certainly leave it out for now.

You (@mppf) think toString is unnecessary, but do you also think toArray is unnecessary as well? I carried over isEmpty from the Chapel string, and toArray from @vasslitvinov VectorList implementation.

I think toArray (in some form) is worthwhile.

Should we perhaps provide a concat function instead of overloading +? Or should we just punt on it completely?

You already have firstList.extend(secondList). I don't think we need a separate concat or + function.

Since there is a precedent established for using a param flag to enable parallel safety, I feel like opting to use parSafe is a reasonable choice.

Sounds good to me.

The big question (assuming no objectors here) is whether parSafe should be true or false by default.

I don't think we have to decide this right now.

well as iterators to List elements produced by tasks before the move
occurred.
*/
class List {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It must be a record if you want to overload = or supply a copy initializer

@dlongnecke-cray
Copy link
Contributor Author

Should the List type support a deep copy on assignment (IE, an overload of the = operator along with making it a record) or not?

IE, when I assign L2 to L1, the contents of L1 are now a copy of the elements contained in L2.

It seems to me that this would go against the behavior expected by Python users, where lists are assigned by reference and copy.deepCopy or some other operation must be called to get a deep copy of the list.

Thoughts?

@lydia-duncan
Copy link
Member

I'm more curious to see if there are any operations that you (@lydia-duncan) feel are unnecessary in the API, and could be removed.

Or, to the contrary, is there something you feel every good list type should have that we don't?

Nothing jumps out at me at the moment, either way, but I do have a vague feeling that there might be something in the back of my mind. I'll let you know if anything comes of that :)

@mppf
Copy link
Member

mppf commented May 16, 2019

Should the List type support a deep copy on assignment (IE, an overload of the = operator along with making it a record) or not?

I think it should deep copy, because that's how arrays work in Chapel, and I think we're trying to make something array-like but more flexible (in terms of adding elements etc).

@dlongnecke-cray
Copy link
Contributor Author

So it sounds like @bradcray was strongly in favor of making List a record as well, so unless anyone objects that will be the standard for List and all containers added going forward.

@bradcray
Copy link
Member

I haven't read through all 78 comments to check, but a quick search suggests not: Has anyone asked the question as to whether we'd prefer upper-case vs. lower-case names for these types? (list vs. List)

Lower-case makes it feel a bit more first-class to me and reduces the amount of Shift-Typing required. It would also permit us to call the module List rather than Lists if that was preferable.

(This obviously relates a bit to #6698).

@dlongnecke-cray
Copy link
Contributor Author

It seems to me that the type name should be List rather than list, because the latter implies to me that the type is a "built-in" type in the same way that string is. However I think List is intended to be a "standard library" container, and so naming it in a manner consistent with all the other containers we have so far (whatever that might be/end up being) is a good idea.

I have no personal preference either way, so I guess this ultimately depends on what is decided on for #6698.

@bradcray
Copy link
Member

Other libraries use lowercase, though, don't they? (e.g., C++ STL?). I think there are aspects of libraries that are important and core enough to be as though they are as good as built-in (writeln and writef come to mind) and expect that list will eventually feel similarly core, especially given how popular push_back() has become for arrays. That's not to say that it has to be lowercase, but I don't think we should think of it as being in the ghetto in any way just because it's not a reserved word or something the compiler needs to deal with.

@dlongnecke-cray
Copy link
Contributor Author

I think that it ultimately comes down to what the style guide ends up saying about the naming convention for records and classes.

I don't envision the string type to be a record or class, but rather a core primitive type baked into the language itself. When I see a List, however, I think to myself "ok, this is the standard library implementation of something that I very well could implement myself". In that respect I'd expect it (along with every other container provided by the STL) to abide by the same naming conventions used by all classes and records in the standard library.

As for whether the naming convention for classes and records should be some_object or SomeObject or someObject, that doesn't particularly matter to me as long as it's consistent.

If it's the need for consistency (IE #6698 itself) you are debating, then that's another can of worms.

@bradcray
Copy link
Member

The style guide doesn't exist, though, and I don't think we should wait for it to be written. In a sense, we're debating some of the seeds of the style guide here and now...

@dlongnecke-cray
Copy link
Contributor Author

If we want to live in a world similar to C++ STL where snake_case_is_the_norm_everywhere, I'm totally OK with that too, as long as everything in the STL is internally consistent (which snake case for all types certainly would be).

Maybe I should take a poll?

@bradcray
Copy link
Member

I definitely think it's a decision that's bigger than those of us paying attention to this issue, but I didn't mean to imply snake_cast. Rather I was envisioning camelCase. In #6698, I think Michael suggested that a potential deciding point might be that value-based data types could be camelCase and reference-/class-based data types could be PascalCase, and I think that is an intriguing notion that has resonated pretty well with me over the past 12 hours or so.

@dlongnecke-cray
Copy link
Contributor Author

dlongnecke-cray commented May 17, 2019

Oh.

That never (edit, to clarify: the notion of using camelCase for records) even crossed my mind. To me using different naming conventions recordCase vs ClassCase seems like an abomination, but that's probably because I've never seen it before...

@dlongnecke-cray
Copy link
Contributor Author

So it seems like most questions/concerns about the List API have been resolved, with the exception of:

  • I'm not sure why @mppf wants to use the in intent for insert and append? Wouldn't that make an unnecessary copy of the element to be appended?
  • It is undecided whether to call the type List or list.

As for the latter, I will proceed with List, and can change the name later if we decide otherwise.

@mppf
Copy link
Member

mppf commented May 20, 2019

I'm not sure why @mppf wants to use the in intent for insert and append? Wouldn't that make an unnecessary copy of the element to be appended?

We resolved several issues with array-as-vec storing owned elements by using the in intent on push_back. At the same time, the in intent is appropriate here because it creates the copy if the argument is another variable but not if it's the result of a function call. The important point is that the value must then be "move"d into the List storage.

The obvious alternative is to use = in the implementation of append however that has the problem that it assumes that the elements were default initialized, which I'd like us to get away from, so that we can support Lists storing elements with no default (such as non-nilable class instances).

@dlongnecke-cray
Copy link
Contributor Author

dlongnecke-cray commented May 20, 2019

What kind of issues (if you could point me towards a thread) were array-as-vec having that warranted use of the in intent?

More importantly, why does = assume that elements are default initialized? Does that mean that all containers will have to be designed around avoiding use of the = operator? I'd much rather the semantics of the = operator be reworked to avoid making such assumptions.

My worry here is (in addition to not understanding these edge cases that well): what kind of hoops are Chapel users going to have to jump through to write their own containers? Being forced to use a move because nilability checks require you to avoid = seems pretty scary to me.

I'm worried about heading into a world where a user has to use in and avoid using = when implementing (IE, a type with the routine) append for reasons that aren't immediately self evident.

It might be that these are both just temporary workarounds and not intended to be a permanent state of affairs?

@mppf
Copy link
Member

mppf commented May 21, 2019

@dlongnecke-cray - we can talk offline about the history of in intent with push_back.

More importantly, why does = assume that elements are default initialized?

It does not. It assumes that the LHS and RHS elements are initialized at all. If it does not, when a user writes =, then they have to contend with the possibility that their data structure stores anything at all. Random garbage data.

The overriding concern of the initializers effort is that user code should not have access to un-initialized memory. (With exceptions for certain low-level things, of course).

Does that mean that all containers will have to be designed around avoiding use of the = operator in what is otherwise a fundamental use pattern?

Container authors have to choose between:

  • using = and insisting that a default initializer exist for the contained type
  • using = and using some other initializer to initialize the elements
  • using in intent and managing which parts of the underlying buffer are initialized and which are not
  • building atop of another container that already supports one of these choices

In practice I suspect that once we get list to keep track of what in the underlying buffer is initialized, user-written containers will simply build upon list.

Being required to use a move because nilability requires one to avoid = seems pretty scary to me.

We should talk more about why this seems scary to you but I don't think it's relevant to the topic of this page (the List API). In particular, I think we can and should document somewhere how a user can achieve a "move" and find a way to express it in some reasonable way. However I don't think that part is necessary for progress on this list task.

Lastly, please note that this is not a problem unique to Chapel. C++ allows types that have no default initializer. These can still be stored in a std::vector. A C++ vector manages which elements in the buffer are initialized.

@dlongnecke-cray
Copy link
Contributor Author

Sure we can talk about why avoiding = seems scary to me offline. And you're right, lack of an idiomatic move isn't a blocker.

I don't really have any API related questions left, now. The rest pertain to the implementation, other than use of the in intent.

You're calling it list instead of List, does that mean you have a preference for the former?

@dlongnecke-cray
Copy link
Contributor Author

Thanks for all the help, everyone. I'll be closing this now!

@LouisJenkinsCS
Copy link
Member

Parallel safety: Since Chapel is a highly concurrent language, this is a justified concern. Modifications of a List should be parallel safe in general.

By parallel safety, is it intended to be a non-blocking data structure or a synchronized one?

@LouisJenkinsCS
Copy link
Member

Also is there any intention on supporting reductions of lists like this? I think that a naive optionally-synchronized list is fine so long as you can concurrently reduce multiple instances of them.

var l = new LinkedList(int);
forall i in 1..100 with (+ reduce l) {
  l.push_back(i);
}

This would be desirable for bulk insertion, and faster than non-blocking when a lot of data is being merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

deque/list design: API
6 participants