Sortedcontainerupdates #787

StephenVavasis · 2022-01-20T21:42:20Z

Major update to sorted containers.

Address issue #239: Dict constructors on typed data but without explicit type declarations should no longer return {Any,Any} items because collect is now used when explicit types are omitted

Address issue #479: Deprecate insert!, pop!(s), introduce popfirst!, poplast!, ss_push!, sd_push!, smd_push!

Address issue #528: push! documentation improved

Address issue #667: Docs extensively cleaned up and much greater use of documenter.jl

Address issue #669: Expanded use of tokens including a:b notation for tokens

Address issue #671: Add reverse iteration

Address issue #742: New union, merge, symdiff, setdiff, intersect algorithms for sorted containers to improve performance

Address issue #726: Clarify docs on ss_push!, sd_push!, and sdm_push! (formerly insert!)

Address issue #767: Implement firstindex, lastindex, reverse for sorted container iteration

Address issue #781: Use collect on an intermediate step
to infer types in untyped construction of sorted containers

Also, this commit cleans up the constructors and adds two new iteration-helper functions inclusive_key and exclusive_key that were useful in my own work.

Address issue 239: Dict constructors on typed data but without explicit type declarations should no longer return {Any,Any} items because collect is now used when explicit types are omitted Address issue 479: Deprecate insert!, pop!(s), introduce popfirst!, poplast!, ss_push!, sd_push!, smd_push! Address issue 528: push! documentation improved Address issue 667: Docs extensively cleaned up and much greater use of documenter.jl Address issue 669: Expanded use of tokens including a:b notation for tokens Address issue 671: Add reverse iteration Address issue 742: New union, merge, symdiff, setdiff, intersect algorithms for sorted containers to improve performance Address issue 726: Clarify docs on insert! (now deprecated) Address issue 767: Implement firstindex, lastindex, reverse for sorted container iteration Address issue 781: Use collect on an intermediate step to infer types in untyped construction of sorted containers Also, this commit cleans up the constructors and adds two new iteration-helper functions inclusive_key and exclusive_key that were useful in my own work.

StephenVavasis · 2022-01-20T21:58:42Z

The following test from the new testset, which is meant to exercise a new show method, is failing on some platforms:

    my_assert(remove_spaces(repr(MIME{Symbol("text/plain")}(), factors5)) ==
              "SortedMultiDict{Char,Int64,ForwardOrdering}" *
              "(ForwardOrdering(),['a'=>1,'b'=>2,'c'=>3])")

Here, remove_spaces is a locally written function-- see below. So apparently on some platforms, show works differently? Anyone know the differences? Or is there a better way to write a unit test for show?

function remove_spaces(s::String)
    b = Vector{UInt8}()
    for c in s
        if !isspace(c)
            push!(b,UInt8(c))
        end
    end
    String(b)
end

StephenVavasis · 2022-01-21T03:16:09Z

The reason the test is failing is: In Julia 1.0, printing the vector [1=>'a',2=>'b'] prepends the output with Pair{Int64,Char}. Prepending this eltype-declaration does not happen in Julia 1.7. Apparently I need to think of a more robust unit test for a show method.

…uccessful parse

StephenVavasis · 2022-01-21T16:20:19Z

I replaced the failing unit test for show with another that does not require precisely a particular string, and now all tests are successful.

oxinabox

sorry I have not finished reviewing this.
This broadly speaking looks good and if all the existing tests still pass then it should be good

src/DataStructures.jl

oxinabox · 2022-04-18T11:36:50Z

src/sorted_container_iteration.jl

+
+const SDMContainer = Union{SortedDict, SortedMultiDict}
+const SortedContainer = Union{SDMContainer, SortedSet}
+const Token = Tuple{SortedContainer, IntSemiToken}


This is a surprising definition for me

Are you asking why Token is implemented as a tuple rather than a struct? This is a legacy issue. In pre-1.0 Julia, any struct that contained an immutable object had itself to be heap allocated. However, in pre-1.0 Julia, operations on tuples containing an immutable object didn't always allocate. In particular, deref((mysorteddict, semitok)) did not heap-allocate its 2-tuple argument, whereas a hypothetical deref(Token(mysorteddict, semitok)), where Token is the constructor of a hypothetical struct, would have allocated.

More my surprise was that a token included both the container and the indexing object.
I would have expected the token to just be the indexing object (but that is defined as a semitoken)
is the idea that the token is enough to fully retrieve the stored object?
But semitoken is only half of what is expected?

I think full tokens are useful, e.g., deref(token_firstindex(s)) as opposed to deref((s,firstindex(s))) where the user needs to be consistent with the argument s. So if it is OK with you, I would like to leave these.

oxinabox · 2022-04-18T11:39:24Z

src/sorted_container_iteration.jl

+    token_firstindex(m::SortedContainer)
+
+
+Return the token of


again this definition of token feels weird to me

I would like to leave these. See other comment.

oxinabox · 2022-04-18T11:42:44Z

src/sorted_container_iteration.jl

+function deref(ii::SortedDictToken)
+    has_data(ii)
+    deref_nocheck(ii)
+end
+
+@inline function deref_nocheck(ii::SortedMultiDictToken)
+    @inbounds kdrec = ii[1].bt.data[ii[2].address]
+    return kdrec.k => kdrec.d
+end


Perhaps we should be implementing these deref and deref_nocheck pairs, using Base.@propagate_inbounds and Base.@boundscheck.

Then when we want deref_nocheck(k) we could just write @inbounds deref(k)

similar for the other foo/foo_nocheck pairs?

I tried to implement this, but it is not working properly, so perhaps I am misunderstanding the rules. Here is the new definition of deref:

Base.@propagate_inbounds function deref(ii::SortedDictToken) Base.@boundscheck has_data(ii) @inbounds kdrec = ii[1].bt.data[ii[2].address] return Pair(kdrec.k, kdrec.d) end

My intention is that if a deref operation is prefixed by @inbounds, then has_data will not be invoked. But this is not happening:

julia> s = SortedDict{Int,String}() SortedDict{Int64, String, Base.Order.ForwardOrdering}() julia> deref(token_firstindex(s)) ERROR: BoundsError Stacktrace: [1] has_data @ c:\Users\vavasis\OneDrive - University of Waterloo\Julia_pkg_devdir\DataStructures.jl\src\sorted_container_iteration.jl:442 [inlined] [2] deref(ii::Tuple{SortedDict{Int64, String, Base.Order.ForwardOrdering}, DataStructures.Tokens.IntSemiToken}) @ DataStructures c:\Users\vavasis\OneDrive - University of Waterloo\Julia_pkg_devdir\DataStructures.jl\src\sorted_container_iteration.jl:242 [3] top-level scope @ REPL[24]:1 julia> @inbounds deref(token_firstindex(s)) ERROR: BoundsError Stacktrace: [1] has_data @ c:\Users\vavasis\OneDrive - University of Waterloo\Julia_pkg_devdir\DataStructures.jl\src\sorted_container_iteration.jl:442 [inlined] [2] deref(ii::Tuple{SortedDict{Int64, String, Base.Order.ForwardOrdering}, DataStructures.Tokens.IntSemiToken}) @ DataStructures c:\Users\vavasis\OneDrive - University of Waterloo\Julia_pkg_devdir\DataStructures.jl\src\sorted_container_iteration.jl:242 [3] top-level scope @ REPL[25]:1

OK, nvm, I figured this out too. Apparently @inbounds is not necessarily active if it is called directly from the REPL as opposed to its being called from a compiled function.

src/sorted_container_iteration.jl

oxinabox

some further progress, sorry this is taking so long

oxinabox · 2022-04-19T08:10:39Z

docs/Project.toml

 Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
-
-[compat]
-Documenter = "0.23"


Probably we should keep this line, so when/if Documenter makes a new breaking release we don't get it automatically.

OK, I put it back. I'm not sure how it got deleted.

oxinabox · 2022-04-19T08:50:02Z

docs/src/sorted_containers.md

 is similar to SortedDict except that each key can be associated with
 multiple values. The key=>value pairs in a SortedMultiDict are stored
 according to the sorted order for keys, and key=>value pairs with the
 same key are stored in order of insertion.

 The containers internally use a 2-3 tree, which is a kind of balanced
-tree and is described in many elementary data structure textbooks.
+tree and is described in data structure textbooks.


maybe we should mention that it is backed by a single array, rather than linked-list style?

oxinabox · 2022-04-19T08:52:11Z

docs/src/sorted_containers.md

+is not a bits-type.
+If code profiling indicates that statements using tokens are allocating memory,
+then it may be advisable to rewrite the application code using semitokens
+more than tokens.


Suggested change

more than tokens.

rather than tokens.

oxinabox · 2022-04-19T08:54:17Z

docs/src/sorted_containers.md

-    searchsortedfirst(sc,k)
+```@docs
+Base.first(sc::SortedContainer)
+```


Is there a reason there are in seperate @docs blocks?
rather than

```@docs deref_key(token::Token) ... Base.first(sc::SortedContainer) ```

I think that works doesn't it?

I didn't know about the other syntax for documenter, but now that you told me, I have fixed it.

oxinabox · 2022-04-19T08:55:49Z

docs/src/sorted_containers.md

+ss_push!(ss::SortedSet, k)
 ```

 ```@docs
-pop!(ss::SortedSet, k)
+sd_push!(sd::SortedDict, p::Pair)
 ```

 ```@docs
-pop!(ss::SortedSet)
+smd_push!(smd::SortedMultiDict, p::Pair)
 ```


It might be nice to put the Base functions before our special new ones?

oxinabox · 2022-04-19T09:05:03Z

src/DataStructures.jl

+    import Base.+
+    import Base.-
+    import Base.isequal
+    export sd_push!, ss_push!, smd_push!, poplast!


Rather than all 3 sd_push!, ss_push!, smd_push! is there a single name we can overload for these that refers to what it returns, rather than hungarian notation for it's first argument?
Like push_and_get_token! or something? (is that what they are doing?)

Done. The new function is called push_return_token!. Since it is a single function name, I unified the APIs by making the return variables Bool,IntSemiToken. The Bool return variable is redundant for SortedMultiDict because push always creates a new record for SortedMultiDict (i.e., the first return variable is always true for a SortedMultiDict).

oxinabox · 2022-04-19T09:12:15Z

src/sorted_container_iteration.jl

+
+const SDMContainer = Union{SortedDict, SortedMultiDict}
+const SortedContainer = Union{SDMContainer, SortedSet}
+const Token = Tuple{SortedContainer, IntSemiToken}


More my surprise was that a token included both the container and the indexing object.
I would have expected the token to just be the indexing object (but that is defined as a semitoken)
is the idea that the token is enough to fully retrieve the stored object?
But semitoken is only half of what is expected?

oxinabox · 2022-04-19T09:14:14Z

src/sorted_container_iteration.jl

+the first entry of the sorted container `m`, or the past-end token
+if the container is empty.  Time: O(log *n*)
+"""  
+token_firstindex(m::SortedContainer) = (m, firstindex(m))


do we need this to be exported?
It seems like if someone needs it is is a trivial and obvious definition.

See other comment about token_firstindex.

oxinabox · 2022-04-19T09:14:34Z

src/sorted_container_iteration.jl

+if the container is empty.  Time: O(log *n*)
+"""
+Base.lastindex(m::SortedContainer) = endof(m)
+endof(m::SortedContainer) = IntSemiToken(endloc(m.bt))


as per startof do we need this? or can we just use lastindex?

I have deprecated startof and endof in favor of firstindex and lastindex.

oxinabox · 2022-04-19T09:15:02Z

src/sorted_container_iteration.jl

+last entry of the sorted container `m`, or the before-start semitoken
+if the container is empty.  Time: O(log *n*)
+"""
+token_lastindex(m::SortedContainer) = (m, lastindex(m))


as per token_firstindex do we need to export this?
It is a trivial and obvious definition

See other comment about token_firstindex

oxinabox · 2022-04-19T09:31:39Z

src/sorted_container_iteration.jl

+
+These five invocations of `in` use the index structure of the sorted
+container and test equality based on the order object of the keys
+rather than `isequal`. Therefore, these five are all faster than


should we comment here that if isequal is defined consistently with isless etc this should have same behavour?

The issue here is not whether isless and isequal are consistent. Rather, the issue is whether the user has defined an ordering (such as case-insensitive for strings) for the container different from isequal and isless.

oxinabox · 2022-04-19T09:32:29Z

src/sorted_container_iteration.jl

+
+(:)(t1::Token, t2::Token) = _colon(t1,t2)
+
+function _colon(t1::Token, t2::Token)


why not define this as : ?

The definition function Base.(:)(a,b) gives a syntax error (noted above). However, function (:)(a,b) works, so I have fixed this (but I still have the issue with import noted above).

OK, I figured this out.

oxinabox

Nice work.
I assume that the logic is at least as correct as before as all previous tests still pass.

Some mostly stylistic comments to resolve

oxinabox · 2022-04-19T09:32:59Z

src/sorted_container_iteration.jl

+(t1::Token, numstep::Integer) = 
+    numstep >= 0 ? stepforward(t1, numstep) : stepback(t1, -numstep)
+


this easily fits on 1 line, no?

Suggested change

+(t1::Token, numstep::Integer) =

numstep >= 0 ? stepforward(t1, numstep) : stepback(t1, -numstep)

+(t1::Token, numstep::Integer) = numstep >= 0 ? stepforward(t1, numstep) : stepback(t1, -numstep)

oxinabox · 2022-04-19T09:33:14Z

src/sorted_container_iteration.jl

+-(t1::Token, numstep::Integer) = 
+    numstep >= 0 ? stepback(t1, numstep) : stepforward(t1, -numstep)


this easily fits on 1 line, no?

Suggested change

-(t1::Token, numstep::Integer) =

numstep >= 0 ? stepback(t1, numstep) : stepforward(t1, -numstep)

-(t1::Token, numstep::Integer) = numstep >= 0 ? stepback(t1, numstep) : stepforward(t1, -numstep)

oxinabox · 2022-04-19T09:33:45Z

src/sorted_container_iteration.jl

+function stepforward(t1::Token, numstep::Integer)
+    m = t1[1]
+    j = t1[2].address
+    !(j in m.bt.useddatacells) && throw(BoundsError())


what do you think of

Suggested change

!(j in m.bt.useddatacells) && throw(BoundsError())

(j in m.bt.useddatacells) || throw(BoundsError())

I can make this change, but I have used the idiom !(correct) && throw(...) (rather than correct || throw(...)) throughout my code. Do you recommend that I change them all.

oxinabox · 2022-04-19T09:38:05Z

src/sorted_dict.jl

-
-"""
-    in(p, sc)
+@deprecate insert!(m::SortedDict, k, d) sd_push!(m::SortedDict, k=>d)


can we move this into src/deprecations.jl so we remember to delete it in next release?

oxinabox · 2022-04-19T09:40:23Z

test/test_sorted_containers.jl

+using Base: Ordering, Forward, Reverse, ForwardOrdering,
+            ReverseOrdering


this fits on one line

Suggested change

using Base: Ordering, Forward, Reverse, ForwardOrdering,

ReverseOrdering

using Base: Ordering, Forward, Reverse, ForwardOrdering, ReverseOrdering

oxinabox · 2022-04-19T09:40:48Z

test/test_sorted_containers.jl

 import Base.lt
-import Base.ForwardOrdering
-import Base.ReverseOrdering
-import DataStructures.IntSemiToken
+import DataStructures.eq


We should make the imports into usings i think

oxinabox · 2022-04-19T09:41:24Z

test/test_sorted_containers.jl

-import DataStructures.eq
+using Base: Ordering, Forward, Reverse, ForwardOrdering,
+            ReverseOrdering
+using DataStructures: IntSemiToken
 import Base.lt


isn't this called Base.lessthan now days?

I checked ordering.jl in Base, and I still see lt. Where did you see lessthan?

oxinabox · 2022-04-19T09:41:48Z

test/test_sorted_containers.jl


 struct CaseInsensitive <: Ordering
 end

 lt(::CaseInsensitive, a, b) = isless(lowercase(a), lowercase(b))
 eq(::CaseInsensitive, a, b) = isequal(lowercase(a), lowercase(b))

+


Suggested change

# for testing custom ordering

oxinabox · 2022-04-19T09:43:41Z

test/test_sorted_containers.jl

+function remove_spaces(s::String)
+    b = Vector{UInt8}()
+    for c in s
+        if !isspace(c)
+            push!(b,UInt8(c))
+        end
+    end
+    String(b)
+end
+


Suggested change

function remove_spaces(s::String)

b = Vector{UInt8}()

for c in s

if !isspace(c)

push!(b,UInt8(c))

end

end

String(b)

end

remove_spaces(s::String) = replace(s, re"\s+"=>"")

I found out that it is actually r"\s+" rather than re"\s+", but aside from that, this is done.

oxinabox · 2022-04-19T09:45:06Z

test/test_sorted_containers.jl

@@ -1765,3 +2093,4 @@ end
        end
    end
 end
+


can we label the new tests with a comment including the URL of the github issues that they prove is solved?

Renamed push_return_token! to push_return_semitoken!, which accurately describes the operation. Documented the fact that values(s) is not currently writeable.

StephenVavasis · 2022-07-06T22:18:10Z

@oxinabox I believe I have addressed all of your comments in this PR. In the last commit that I just made, I have added comments to the test sets to indicate where the various issues are tested. Let me know if you need anything else before you can merge this PR.

oxinabox · 2022-07-14T16:14:23Z

src/DataStructures.jl

@@ -39,12 +42,14 @@ module DataStructures
    export SortedDict, SortedMultiDict, SortedSet
    export SDToken, SDSemiToken, SMDToken, SMDSemiToken
    export SetToken, SetSemiToken
-    export startof
+    export startof, endof ## both are deprecated


We don't need to explictly export deprecated things @deprecate does that for us

Suggested change

export startof, endof ## both are deprecated

oxinabox

Sounds good.
You have merge rights.
Merge when you are happy

added some print statements to debug CI failures

4b8c108

StephenVavasis force-pushed the sortedcontainerupdates branch from c9bffaf to 4b8c108 Compare January 21, 2022 03:02

Changed test for exact output from show to more forgiving test of s…

95c2478

…uccessful parse

This was referenced Apr 9, 2022

Setindex broken #799

Closed

Improved Deque Documentation + docstrings #800

Merged

Add missing argument to insert! docstring #802

Closed

StephenVavasis requested a review from oxinabox April 15, 2022 12:22

oxinabox reviewed Apr 18, 2022

View reviewed changes

oxinabox reviewed Apr 19, 2022

View reviewed changes

Addressing round 1 of comments from oxinabox

2aa37f4

StephenVavasis mentioned this pull request May 9, 2022

undesirable output when showing empty set in the REPL JuliaLang/julia#45229

Open

StephenVavasis added 2 commits May 19, 2022 09:28

Two fixes:

879703e

Renamed push_return_token! to push_return_semitoken!, which accurately describes the operation. Documented the fact that values(s) is not currently writeable.

Previous commit was premature - had bugs

f540e4e

StephenVavasis mentioned this pull request May 19, 2022

How to efficiently modify all values of a SortedDict? #812

Open

StephenVavasis mentioned this pull request Jun 8, 2022

Updating a SortedMultiDict #815

Closed

updated testset with comments to refer to issues that have been resolved

1c63d2a

oxinabox reviewed Jul 14, 2022

View reviewed changes

oxinabox approved these changes Jul 14, 2022

View reviewed changes

Remove explicit export of startof and endof

ebf63ca

StephenVavasis merged commit 7d2e3c6 into JuliaCollections:master Jul 15, 2022

This was referenced Jul 17, 2022

Seeking advice: SortedDict constructors #781

Closed

missing some generic interfaces for sorted containers #767

Closed


		(:)(t1::Token, t2::Token) = _colon(t1,t2)

		function _colon(t1::Token, t2::Token)

		+(t1::Token, numstep::Integer) =
		numstep >= 0 ? stepforward(t1, numstep) : stepback(t1, -numstep)

		-(t1::Token, numstep::Integer) =
		numstep >= 0 ? stepback(t1, numstep) : stepforward(t1, -numstep)

	-(t1::Token, numstep::Integer) =
	numstep >= 0 ? stepback(t1, numstep) : stepforward(t1, -numstep)
	-(t1::Token, numstep::Integer) = numstep >= 0 ? stepback(t1, numstep) : stepforward(t1, -numstep)

	!(j in m.bt.useddatacells) && throw(BoundsError())
	(j in m.bt.useddatacells) \|\| throw(BoundsError())

		using Base: Ordering, Forward, Reverse, ForwardOrdering,
		ReverseOrdering

@@ @@ -1765,3 +2093,4 @@ end @@
                       end
                   end
               end

Sortedcontainerupdates #787

Sortedcontainerupdates #787

Conversation

StephenVavasis commented Jan 20, 2022 • edited Loading

StephenVavasis commented Jan 20, 2022

StephenVavasis commented Jan 21, 2022

StephenVavasis commented Jan 21, 2022 • edited Loading

oxinabox left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

oxinabox left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

oxinabox left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

StephenVavasis commented Jul 6, 2022

Choose a reason for hiding this comment

oxinabox left a comment

Choose a reason for hiding this comment

StephenVavasis commented Jan 20, 2022 •

edited

Loading

StephenVavasis commented Jan 21, 2022 •

edited

Loading