Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should set.remove() return the removed element? #15819

Closed
dlongnecke-cray opened this issue Jun 11, 2020 · 24 comments
Closed

Should set.remove() return the removed element? #15819

dlongnecke-cray opened this issue Jun 11, 2020 · 24 comments

Comments

@dlongnecke-cray
Copy link
Contributor

Currently set.remove() does not return the removed element at all, instead choosing to drop it on the floor. This choice was inspired by Python, where set.remove also returns None instead of the removed value.

Some of my reasoning for why Python's set.remove might return None instead of the removed element:

  • In Python, all types have reference semantics
  • Because of this, all set values are required to be immutable to ensure correctness

Chapel has value types (records) and reference types (classes) while Python only has reference types. In Chapel values added to a set are not required to be immutable. To achieve immutability without this requirement we make a copy of any element added to a set and only allow access to the set element by const ref.

In addition to copying added elements, because users can overload the == operator for their types, this can create a scenario where two instances of a type (such as a record) evaluate as equal but contain significant differences.

While one might argue that this is not overloading the == operator in good faith, that this can happen in practice suggests that set.remove() should return the removed value, -OR- at the very least we should add another method that returns the removed value such as removeAndReturn.

@mppf
Copy link
Member

mppf commented Jun 11, 2020

@dlongnecke-cray - I'm not sure if you expressed an opinion for the answer to "Should set.remove() return the removed element?" but my opinion is that the answer is "yes, it should".

@dlongnecke-cray
Copy link
Contributor Author

dlongnecke-cray commented Jun 11, 2020

Oh, good point.

I think set.remove() should return the removed element as well.

Do you know what happens to returned values that are not assigned in a multilocale context? Are they still potentially communicated across locales? This might be a separate issue at heart, but it might be reason enough to justify adding a method to set such as .drop() that removes the value without returning it.

@mppf
Copy link
Member

mppf commented Jun 11, 2020

Well, if set.remove contains an on statement, then yes, removing it on a different locale from the set would introduce communication. But most likely if somebody cares about performance the don't want the communication related to the on statement either...

@dlongnecke-cray
Copy link
Contributor Author

dlongnecke-cray commented Jun 11, 2020

I wrote the last sentence of my question wrong, edited for clarity ><.

@bradcray
Copy link
Member

What will set.remove() return if it doesn't find the element?

@dlongnecke-cray
Copy link
Contributor Author

dlongnecke-cray commented Jun 12, 2020

Well historically (based on what we've done for other such methods), we'd have it halt. An alternative is to make it throw, though I think we've generally disapproved of such an approach. If we want to break precedent we use an out parameter and return a bool...

@bradcray
Copy link
Member

That makes me think I might prefer to stick with the current interface (which, as I understand it, returns true or false depending on whether or not it was found?).

On my first quick read of this issue, it seemed odd to me that in saying mySet.remove(10) I'd care to have 10 returned back to me. I think I understand your OP's point to be that since (1) sets look for the element using == and (2) a given == may permit some sort of approximate match, (3) the value that's remove()d might be different in an interesting way than the thing I passed in. And while I can see that's the case, I'm pretty sure I'd rather get true or false back instead of taking a halt.

If we either used the out argument you propose (or maybe cleaner, returned a tuple), we'd need to have some way of creating a dummy value of eltType for any type used to create sets which seems iffy to me (and potentially expensive depending on the type).

If we had option types, we could have it return an option type value of course.

We could also consider having remove() remain as it is today and have a fetchRemove() or somesuch that would return the value in one of these forms if/when we wanted to add it.

@dlongnecke-cray
Copy link
Contributor Author

dlongnecke-cray commented Jun 12, 2020

I would prefer to introduce a method like .drop, which will retain the current semantics of .remove(), and then update .remove() to halt if the element is not contained in the set.

Though I like the above names a bit more than fetchRemove, your idea would avoid breaking existing code, so it's probably better.

@bradcray
Copy link
Member

I worry about having something as fundamental as set.remove() halting in a parallel language. It seems like there'd be no way to safely remove elements in a parallel context because you'd never know whether someone else would slightly beat you to removing them.

Is there a precedent for .remove() returning the value in other value-based languages like C++?

@dlongnecke-cray
Copy link
Contributor Author

It seems like there'd be no way to safely remove elements in a parallel context because you'd never know whether someone else would slightly beat you to removing them.

I view it as a problem also shared by list.pop and map.remove. It seems like parSafe=true should handle a data race nicely?

Is there a precedent for .remove() returning the value in other value-based languages like C++?

It looks like the C++ unordered_set type has erase, which returns 1 if the element was removed, so basically the exact same semantics as set.remove has today.

@bradcray
Copy link
Member

It seems like parSafe=true should handle a data race nicely?

What does "nicely" mean if the routine halts, though, when the value you thought was there isn't because someone else just removed it?

I agree map.remove seems like it should share a similar design. list.pop() feels a little different because multiple things could pop simultaneously and both succeed... until the list becomes empty, I suppose.

@dlongnecke-cray
Copy link
Contributor Author

Part of me wants to say that I feel like a more general locking mechanism for containers in the spirit of #12306 could help solve this problem and others.

We could also revisit the idea of having set.remove throw an exception instead of halting once again (it seems like exceptions were designed to solve this sort of problem).

@mppf
Copy link
Member

mppf commented Jun 12, 2020

We could also revisit the idea of having set.remove throw an exception instead of halting once again (it seems like exceptions were designed to solve this sort of problem).

Halting for this seems untenable to me. Throwing would be fine, in my opinion.

If we had option types, we could have it return an option type value of course.

I'm pretty sure we could get an option type together in short order, if we really wanted to.

I agree map.remove seems like it should share a similar design.

I think the problem is worse for map.remove because with a map you'll remove based on the key but might want to get the value back.

@bradcray
Copy link
Member

I'd still be inclined to leave remove() as-is on the basis of (a) not breaking existing code, (b) C++'s precedence and then adding a removeFetch() or remove(val, returnVal=true) or something overload if/when someone really wants this feature.

@dlongnecke-cray
Copy link
Contributor Author

If we want to throw then there are a whole lot of other cases where we currently halt that we might want to revisit.

I'm fine with leaving things as is for now but I do think for set to be feature complete we should offer some variant of removeFetch, remove... (as Brad suggests) in the future.

@bradcray
Copy link
Member

Some notes from a Set deep-dive meeting today:

  • people seemed to like supporting both a throwing routine that returned the value and a non-throwing routine that returned a bool, where the main decision was what to call each one
  • we also discussed trying to sketch out what an ultimate option-type-based approach might look like so that we could avoid precluding our ability to support it cleanly/nicely later if that became of interest

Names for throwing routine that returns the value:

  • remove()
  • removeFetch()
  • fetchRemove()
  • ...others?...

Names for routine that just returns a bool:

  • remove()
  • discard()
  • ...others...?

Noted that:

  • Python uses remove() for return bool, throw if not there
  • Swift uses remove() for returning option value
  • Rust uses remove() for returning a bool

@dlongnecke-cray
Copy link
Contributor Author

dlongnecke-cray commented Oct 28, 2021

I prefer remove() or take() for the option that returns the value. I agreed with Brad about disliking take() at first, but the name has grown on me because I think the verb strongly describes the action taking place, moreso than remove(). (I can see how one might confuse that with saying "this set method takes() an item", though.) I am ok with discard() for the bool version that does not throw but would really love drop().

For the future version that may or may not return an option type I'd be ok with maybeRemove or maybeTake (on the assumption that we will name the option type Maybe).

proc ref remove(x: eltType): eltType throws;
proc ref drop(x: eltType): bool;
proc ref maybeRemove(x: eltType): Maybe(eltType); // Somewhere, in a galaxy one major version away

@bradcray
Copy link
Member

For the future version that may or may not return an option type I'd be ok with maybeRemove or maybeTake (on the assumption that we will name the option type Maybe).

<crazy idea>Maybe we permit identifiers to end with ? and take the stylistic approach that routines that return option types end in ?, so remove?() for the option type version</crazy idea>

@dlongnecke-cray
Copy link
Contributor Author

That is a really beautiful idea, and one of the reasons why it appeals to me is that I've always imagined nilability to just be a very focused application of option types. My secret hope is that one day we could unify the two so that nilability is nothing more than syntactic sugar for Maybe(SomeClass), but I don't know how feasible that would be.

@dlongnecke-cray
Copy link
Contributor Author

dlongnecke-cray commented Oct 29, 2021

I do worry that we might be overburdening the meaning/disambiguation of the ? token if we were to do that, which is IMO the biggest reason why that may not be a good idea. Another reason is it's essentially relying on convention (I mistakenly interpreted your idea as meaning a proc name suffixed with ? would automatically have its return type transformed into an option type), and I thought we were trying to get away from that and relying more on language patterns (the last discussion I recall being about e.g. sync lock identifiers suffixed with $).

@bradcray
Copy link
Member

relying on convention..., and I thought we were trying to get away from that and relying more on language patterns (the last discussion I recall being about e.g. sync lock identifiers suffixed with $).

I don't think we're anti-convention in our standard libraries (which is why we're putting in effort trying to unify on capitalization conventions, argument names, etc.). The issue with $ was that it was a convention designed to protect users from deadlocks by making reads/writes of synchronization variables, which was a red flag that maybe the language design should not have made that so invisible that the convention was necessary to understand the code (where the current requirement to use a .readXY()/.writeXY() variant addresses that).

In contrast, in proposing this, I'm trying to come up with a convention which would pack some attractive options into a restricted namespace without requiring someone's favorite version of the routine to become to verbose / ugly (e.g., haltingRemove(), throwingRemove(), or optionRemove()). I don't think this is actually workable (because ! already has too many meanings to be included in user identifiers), but I sometimes wish we could have:

  • mySet.remove!(x) be the halting remove
  • mySet.remove() be the throwing remove, and
  • mySet.remove?() be the option-type remove.

@mppf
Copy link
Member

mppf commented Oct 29, 2021

I like the idea of mySet.remove?() but I think we might have parser challenges with it if applied broadly (i.e. any identifier can end in ?), because it could mean two three things:

I am not saying these are impossible to support at the same time - just that it is not already obvious to me how hard this would be in the parser.

The remove?() makes sense to me as a way to indicate it returns an option type. But I don't think that a halting version should be in the API design at all as an alternative, so I wouldn't want remove!().

Also, I am worried that combining with optional chaining (see #17577) will be confusing. Say you wanted to call a method on the removed element if there was one, with optional chaining. You would write mySet.remove?()?.someMethod() which is maybe too much...

@bradcray
Copy link
Member

because it could mean two three things:

Good point. I wasn't thinking carefully and was only thinking of ? being supported as a type query type of symbol and forgetting about its (still relatively new) role as a nilable class type specifier.

@lydia-duncan
Copy link
Member

Given the decision on #18649 to keep remove as is and support a different method for removing and returning, I'm closing this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants