Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maturing out the package (Path to DataStructures.jl v1.0) #479

Open
3 tasks
oxinabox opened this issue Jan 9, 2019 · 17 comments
Open
3 tasks

Maturing out the package (Path to DataStructures.jl v1.0) #479

oxinabox opened this issue Jan 9, 2019 · 17 comments
Milestone

Comments

@oxinabox
Copy link
Member

oxinabox commented Jan 9, 2019

What do we need to do before we would be happy to tag DataStructures.jl v1.0?
I mean, we are one of the most common dependencies in the package ecosystem,
and julia 1.0 is out.

My thoughts;

It hurts people when we break things, and while we are good at deprecating,
getting on the the "strict" side of semver is something we need to do for the sake of the ecosystem

@oxinabox
Copy link
Member Author

oxinabox commented Aug 28, 2020

People keep asking when will DataStructures 1.0 come out.
The answer is as soon as we completely audit the package to make sure there is nothing that needs to be removed/corrected to follow modern julia conventions.
And have actioned those corrections.
A lot of the code in the package is pretty old, and follows e.g. conventions from julia 0.3.
We need to identify and correct these.

I will post a list of files at the bottom of this post.
Then people can go through this and use the checklist to make sure everything is good, one file at a time.

How to audit a file to check if 1.0 ready

  • Answer the question of should we even keep this in the package. E.g. no data structure that somone just kind of made up (like the now removed `Classifier Collection). If ours is is a worse version of something in another package it should also go.
  • Type Names Correct/Sensible: for example make sure we don't have a type called MagicVectors that is a subtype of vector. That should be called MagicVector (singular).
  • No constructors taking types as positional arguments. For example there should not be MagicVector(::Type{T}) where T, instead there should just be MagicVector{T}().
  • No functions that duplicate constructors. For example there should not be a function magic_vector(xs), only MagicVector(xs).
  • No functions that are improperly cased and pretend to be constructors.
    • E.g. no MagicVector(xs) = Vector(xs) .* "Magic", that should just be a normal function magic_vector(xs) = Vector(xs) .* "Magic"
    • E.g. no MagicVector(xs)=Vector{Magic}(xs), that should be a const type alias const MagicVector=Vector{Magic}
  • No exported functions that could be just done with overloads of Base. E.g. don't need top, peak, head all of which act the same as first but on a particular data structure.
  •  No internal helpers exported. If something isn't useful to the end user it should not be exported. In general The vast majority of the public API should be constructor or overloads of Base functions. Thus any export should be looked at closely.
  • Correct return types push! and empty! and delete! should return the collection. (this is relatively minor as it would be a bug if they didn't so it isn't breaking for use to fix it later.)
  • No defining methods in ways that disagree with Base: for example because Base has insert!(xs, index, value) we should not have no instert!(xs, val) (push! is for that). Conversely we should not have push!(xs, key, value) ( we dio both of these right now in various trees, and in accumulators respectively)

All these things are things that have changed during the life-time of this package.

List of files that need to be checked

(Checking each will also require checking the top-level file for relevant exports)

  • accumulator.jl
  • default_dict.jl
  • robin_dict.jl
  • dict_support.jl

  • trie.jl
  • balanced_tree.jl
  • circ_deque.jl
  • circular_buffer.jl
  • dibit_vector.jl

  • disjoint_set.jl
  • fenwick.jl
  • int_set.jl
  • sparse_int_set.jl

  • list.jl
  • multi_dict.jl
  • mutable_list.jl

  • deque.jl
  • queue.jl
  • stack.jl

  • sorted_dict.jl
  • sorted_multi_dict.jl
  • sorted_set.jl
  • tokens.jl
  • tokens2.jl

  • priorityqueue.jl
  • heaps.jl
  • heaps/arrays_as_heaps.jl
  • heaps/binary_heap.jl
  • heaps/minmax_heap.jl
  • heaps/mutable_binary_heap.jl

  • container_loops.jl
  • delegate.jl

Many of these should be easy, because they have kind of already been done.
E.g. Heaps were done for 0.18; robindict was created after julia 1.0 and so should follow conventions.
But lets get them all and check them off using the list criteria listed above.

Higher level process:

0.19 will be the last release before 1.0.
0.19 will be the same as 1.0 but with deprecations.
1.0 will come out the same day as 0.19

We will stick to making patch releases on 0.18 with deprecations while we can.
But once we can't we will swap master to 0.19-DEV, and say on that until we can release it and 1.0.
That might take a 6 or more months; depends how many (if any) people are about to spend time helping with the audit.
If necessary we will have a backport branch for 0.18 bug-fixes

Once @eulerkochy's GSOC work is complete we will not accept any new data structures until after 1.0 released.

If anyone want to help out, please feel strongly encouraged to do so.

@nickrobinson251
Copy link
Member

nickrobinson251 commented Aug 30, 2020

No exported functions that could be just done with overloads of Base

How about reverse_iter? Can this just be Iterators.reverse? The only issue i see is that reverse_iter(s::Stack) returns a DequeIterator, not a Iterators.Reverse, but idk if Iterator's API is really supposed to guarantee Iterators.reverse(::T) -> Iterators.Reverse{T} or not. (edit: no it's not, that's why we have reverse, rather than just Reverse, so we don't have to guarantee returning a Reverse)

@nickrobinson251
Copy link
Member

nickrobinson251 commented Sep 4, 2020

  • No exported functions that could be just done with overloads of Base.

  • No defining methods in ways that disagree with Base

I think this suggests we should make a call on which side this falls: #296

@nickrobinson251 nickrobinson251 added this to the 1.0 milestone Sep 4, 2020
jordancluts added a commit to jordancluts/DataStructures.jl that referenced this issue Apr 27, 2021
I was looking over the Queue method per JuliaCollections#479 and while I didn't find any issues I noticed that during printing at the REPL the backing Deque is printed inside. Decided to try to write a small show method to hide the backing type to pretty it up.
@oxinabox oxinabox mentioned this issue Nov 4, 2021
@StephenVavasis
Copy link
Contributor

Could anyone explain in more detail the above-mentioned recommendation regarding the insert! function? In particular, does the recommendation mean that insert!(ss, k), where ss is a SortedSet and k is a new key, should be deprecated? Note that push! does not replicate this functionality because insert!(ss,k) returns a bool to indicate whether the item was already present and a semitoken indexing the newly inserted item, whereas push!(ss,k) returns the container.

@oxinabox
Copy link
Member Author

Could anyone explain in more detail the above-mentioned recommendation regarding the insert! function?

Base defines insert! as:

insert!(a::Vector, index::Integer, item)

Insert an item into a at the given index. index is the index of item in the resulting a.

and it returns the collection.

In particular, does the recommendation mean that insert!(ss, k), where ss is a SortedSet and k is a new key, should be deprecated?

yes, it doesn't match the description of the Base function.

Note that push! does not replicate this functionality because insert!(ss,k) returns a bool to indicate whether the item was already present and a semitoken indexing the newly inserted item, whereas push!(ss,k) returns the container.

Yeah, so in that case we can't deprecate it for push!.
We would need to make some other plan.
Like st_push! or something? or a submodule that defines a bunch of function names for things that return subtokens?
Maybe a case could be made for push!(::Type(SemiToken),::SortedSet, x) or something?

@StephenVavasis
Copy link
Contributor

Here is a more serious conflict with Base: according to the Base docs, pop!(s) is supposed to return the last item in s if s is an ordered collection. However, pop! on the sorted containers currently returns the first item. Which is the lesser of two evils, a breaking change for the package or doing the opposite of the Base documentation?

@StephenVavasis
Copy link
Contributor

Maybe deprecate pop!(s) for sorted containers to popfirst!(s), and introduce new functionality poplast!(s) (the latter not present in Base)? Then, several years in the future, deprecate poplast!(s) to just pop!(s)?

@oxinabox
Copy link
Member Author

Maybe deprecate pop!(s) for sorted containers to popfirst!(s), and introduce new functionality poplast!(s) (the latter not present in Base)? Then, several years in the future, deprecate poplast!(s) to just pop!(s)?

yeah this, sounds right.
but I would argue for putting a deprecation warning in poplast! in 0.19 as it is defined saying this will just be pop! in 1.0, which will be released not years in the future but rather days.

@StephenVavasis
Copy link
Contributor

Doesn't this approach for deprecating pop!(s) mean that everyone who upgrades from 0.18 to 1.0, skipping 0.19, will encounter a breaking change that is also silent?

With regard to renaming insert!: since insert! currently has a different API for each of the three containers, how about if we choose the three names ss_push!, sd_push! and smd_push! ?

@oxinabox
Copy link
Member Author

Doesn't this approach for deprecating pop!(s) mean that everyone who upgrades from 0.18 to 1.0, skipping 0.19, will encounter a breaking change that is also silent?

yes, that is what they get for skipping versions. But we will write release notes etc.

With regard to renaming insert!: since insert! currently has a different API for each of the three containers, how about if we choose the three names ss_push!, sd_push! and smd_push! ?

Lets split this off to another issue or a PR and we can think about our options.

@StephenVavasis
Copy link
Contributor

OK. I am currently working on a major overhaul of the sorted containers to address a number of open issues at once. I will include these new names in my forthcoming PR; and then you or another reviewer can comment on them.

@StephenVavasis
Copy link
Contributor

With regard to attaching proper meanings to push!, pop!, and insert!: Kevin Squire @kmsquire wrote a long message on discourse about how these functions are somewhat muddled in Base:

https://discourse.julialang.org/t/taking-push-and-pop-seriously/34326

He proposed a more principled way to implement them in Base, but, as far as I can tell, did not get any traction for this proposal. Presumably this is because it is now too late to make changes to Base. If we try to follow Base conventions in the DataStructures package, then we will end up with the same muddled semantics.

I'd also like to point out open issue #403 which is basically a request for clearer semantics for these functions.

@Suavesito-Olimpiada
Copy link

One thing I've found is that sometimes I would like to use special ordering that I do not want to define with isless, maybe because it would be type piracy or simply does not make sense outside the context of the algorithm implemented. Would it be possible for the data structures that need an ordering function to always accept something like lt for sort?

@StephenVavasis
Copy link
Contributor

This is possible-- the sorted containers are parameterized by an order function that you can customize. Please see the documentation for an example in which strings are ordered by case-insensitive comparison.

@stevengj
Copy link

stevengj commented Aug 30, 2024

I don't understand why you're setting such a high bar on 1.0, which has caused this issue to stay open for five years.

Lots of packages depend on DataStructures.jl right now, and by using pre-1.0 version numbers means every non-bugfix released is treated as "breaking" by the package system whether or not it is backwards compatible. This is painful for downstream developers as well as users.

Version numbers are cheap. Can you just make the next minor release 1.0? If you ever want a breaking change you can just mark the package as 2.0 in the future.

@oxinabox
Copy link
Member Author

oxinabox commented Sep 2, 2024

I don't understand why you're setting such a high bar on 1.0, which has caused this issue to stay open for five years.

When this was open @ChrisRackauckas had gotten made at us for releasing 0.18 of DataStructures.
So we said no more breaking releases til we can get to 1.0.
We haven't made a a 0.y "minor release" since this issue has been open AFAIK.
Since then have just been backporting stuff to #release-0.18, and tagging 0.18.x releases, and working with #master as the branch where we prepare to do all this stuff.
But yes, when we said that I had more time, and we had more people actively working on the package.
So was hoping to do it in much less time.

The current #master branch is breaking and does have a few good improvements.
I would not be opposed to cutting a 1.0 branch from that. (+ delete any deprecations)
It would decrease my work, because wouldn't have to keep backporting bug fixes.

But it would mean a lot of packages would need to go and update.
On the other hand, we do have better tooling like CompatHelper now to ease that pain.

@ChrisRackauckas
Copy link
Contributor

You went from v0.12 to v0.18 in the span of 2 years (some of those shouldn't've been marked as breaking probably?), and now this one is 5 years in the making. There's a middle ground.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants