Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

search algorithm doesn't take duplicate keys into account #16

Closed
Mamonaku opened this issue Dec 31, 2017 · 7 comments
Closed

search algorithm doesn't take duplicate keys into account #16

Mamonaku opened this issue Dec 31, 2017 · 7 comments

Comments

@Mamonaku
Copy link

Mamonaku commented Dec 31, 2017

Hello,
I think the algorithm :

fileprivate func search(for newElement: Element) -> Match<Index> {
    guard !isEmpty else { return .notFound(insertAt: endIndex) }
    var left = startIndex
    var right = index(before: endIndex)
    while left <= right {
        let dist = distance(from: left, to: right)
        let mid = index(left, offsetBy: dist/2)
        let candidate = self[mid]

        if areInIncreasingOrder(candidate, newElement) {
            left = index(after: mid)
        } else if areInIncreasingOrder(newElement, candidate) {
            right = index(before: mid)
        } else {
            // If neither element comes before the other, they _must_ be
            // equal, per the strict ordering requirement of `areInIncreasingOrder`.
            return .found(at: mid)
        }
    }
    // Not found. left is the index where this element should be placed if it were inserted.
    return .notFound(insertAt: left)
}

Doesn't take into account the case where duplicate keys are present. In particular, in calculating the mid point, it'll give the index value of the first occurence of newElement == candidate, not the smallest index.

As a result, enumerating all keys verifying k == 0 (for example), will fail. One has to find the lowest value for which areInIncreasingOrder(candidate, newElement) == true and areInIncreasingOrder(newElement, candidate + 1) == false.

If I find a quick solution, I'll let you know.

@ole
Copy link
Owner

ole commented Dec 31, 2017

@Mamonaku Thanks for reporting. I'm not sure I agree that this should be considered a bug. Can you provide the code of an actual algorithm that fails but shouldn't?

@Mamonaku
Copy link
Author

Mamonaku commented Jan 2, 2018

Hello @ole ,
this should do. I ran it on a playground, with the latest version of your code:

var sorted = SortedArray<Int>();

NSLog("--- Start SortedArray ----")
for i in 1..<1000 {
    sorted.insert(Int(arc4random()));
    if i%100 == 0 {
        sorted.insert(0)
    }
}
NSLog("--- End ----")

NSLog("\(sorted.search(for: 0))") ;
NSLog("\(sorted[0])") ;
NSLog("\(sorted[8])") ;
NSLog("\(sorted[6])") ;

output is

found(6)
0
0
0

The problem is that if I want to enumerate multiple keys (which should be possible because a Sorted Array does store multiple keys, so access ought to be implemented), they I can't find them anymore.

hope this helps.
-M

@ole
Copy link
Owner

ole commented Jan 2, 2018

@Mamonaku Thanks for the example. Sorry, I still don't understand what you're trying to do.

I want to enumerate multiple keys (which should be possible because a Sorted Array does store multiple keys, so access ought to be implemented)

What do you mean by this? What's the concrete problem you're trying to solve?

As far as I can tell, the example you posted above doesn't represent an algorithm that fails because of SortedArray's behavior — it just demonstrates how the type works, but not what you're trying to achieve.

As a side note, the SortedArray.search(for:) method is fileprivate and therefore not part of the official API. Furthermore, its behavior is clearly documented:

///   If the array contains multiple elements that are equal to `newElement`,
///   there is no guarantee which of these is found.

(I'm not saying the functionality you're looking for is wrong or that it shouldn't exist. I'm just trying to understand.)

@Mamonaku
Copy link
Author

Mamonaku commented Jan 2, 2018

@ole ,
ok... sorted.index(of:0) if this is preferable. Well, if one can't find where all the objects are, what's the point of storing them in the said data structure?

I was looking for a storage solution, where objects are keyed by categories, so that multiple objects of the same category could be enumerated, through a sequence protocol conformance. I think it would be a good addition to your code, but that's just me.

I went for a BTree eventually, with duplicate key support, but I found it overkill (although works fine).

@ole
Copy link
Owner

ole commented Jan 3, 2018

Thanks, now it makes sense to me. index(of:) does indeed guarantee that it find the first element, so this behaving differently is definitely a bug.

@ole
Copy link
Owner

ole commented Jan 3, 2018

@Mamonaku I opened #18 to fix the index(of:) behavior.

@Mamonaku
Copy link
Author

Mamonaku commented Jan 3, 2018

Thanks!

@ole ole closed this as completed in #18 Jan 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants