-
Notifications
You must be signed in to change notification settings - Fork 305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Counted Set #132
base: main
Are you sure you want to change the base?
Counted Set #132
Changes from all commits
7ec883e
88524bf
4375f7c
e43e7d2
0c5fd75
a379f8c
1c567ea
89fd08f
e77d82c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,105 @@ | ||||||
//===----------------------------------------------------------------------===// | ||||||
// | ||||||
// This source file is part of the Swift Collections open source project | ||||||
// | ||||||
// Copyright (c) 2021 Apple Inc. and the Swift project authors | ||||||
// Licensed under Apache License v2.0 with Runtime Library Exception | ||||||
// | ||||||
// See https://swift.org/LICENSE.txt for license information | ||||||
// | ||||||
//===----------------------------------------------------------------------===// | ||||||
|
||||||
extension CountedSet: Collection { | ||||||
/// The position of an element in a counted set. | ||||||
@frozen | ||||||
public struct Index: Comparable { | ||||||
/// An index in the underlying dictionary storage of a counted set. | ||||||
/// | ||||||
/// This is used to distinguish between indices that point to elements with | ||||||
/// different values. | ||||||
@usableFromInline | ||||||
let storageIndex: Dictionary<Element, UInt>.Index | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: Please follow The Leading Underscore Rule when naming non-public members and types. (Throughout this PR) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, sorry. In my defense, that was discouraged for normal (that is, not Standard Library) Swift code until around three months ago. I actually didn’t realize the guidance changed until I tried to pull it just now. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Interesting -- where was it discouraged? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, I can’t find it now. Maybe I was thinking of Google’s Swift style guide? Never mind then. |
||||||
|
||||||
/// The relative position of the element. | ||||||
/// | ||||||
/// This doesn't actually correspond to a distinct part of memory. Instead, | ||||||
/// it is used to distinguish between indices that point to elements of the | ||||||
/// same value. | ||||||
/// | ||||||
/// For example, the first index pointing to a value has an index of 0, the | ||||||
/// second index pointing to that value will have an index of 1, and so on. | ||||||
/// | ||||||
/// When a counted set is subscripted with an index, the stored multiplicity | ||||||
/// is retrieved using the `storageIndex` and compared with the `position` | ||||||
/// to determine the index's validity. | ||||||
@usableFromInline | ||||||
let position: UInt | ||||||
|
||||||
@inlinable | ||||||
public static func < ( | ||||||
lhs: CountedSet<Element>.Index, | ||||||
rhs: CountedSet<Element>.Index | ||||||
) -> Bool { | ||||||
guard lhs.storageIndex != rhs.storageIndex else { | ||||||
return lhs.storageIndex < rhs.storageIndex | ||||||
} | ||||||
return lhs.position < rhs.position | ||||||
} | ||||||
|
||||||
@usableFromInline | ||||||
init(storageIndex: Dictionary<Element, UInt>.Index, position: UInt) { | ||||||
self.storageIndex = storageIndex | ||||||
self.position = position | ||||||
} | ||||||
} | ||||||
|
||||||
@inlinable | ||||||
public func index(after i: Index) -> Index { | ||||||
guard i.position + 1 < rawValue[i.storageIndex].value else { | ||||||
return Index( | ||||||
storageIndex: rawValue.index(after: i.storageIndex), | ||||||
position: 0 | ||||||
) | ||||||
} | ||||||
|
||||||
return Index(storageIndex: i.storageIndex, position: i.position + 1) | ||||||
} | ||||||
|
||||||
@inlinable | ||||||
public subscript(position: Index) -> Element { | ||||||
let keyPair = rawValue[position.storageIndex] | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: it is usually more readable to bind the key-value pair to separate variables, as in
Suggested change
|
||||||
precondition( | ||||||
position.position < keyPair.value, | ||||||
"Attempting to access CountedSet elements using an invalid index" | ||||||
) | ||||||
return keyPair.key | ||||||
} | ||||||
|
||||||
/// The number of elements in the set. | ||||||
/// | ||||||
/// - Complexity: O(*k*), where *k* is the number of unique elements in the | ||||||
/// set. | ||||||
@inlinable | ||||||
public var count: Int { | ||||||
Int(rawValue.values.reduce(.zero, +)) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is fine per the Collection protocol, but it seems like a missed opportunity. What if we kept a running count of items in this set, updating it on every mutation? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Storing the count would mean this is no longer an implicit data structure. I’m not really sure that’s a worthwhile tradeoff. Is there precedent for doing that with similar data structures? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh definitely -- I don't think having to store one extra integer matters much in this case -- the hash table in the underlying dictionary already comes with way too much memory overhead to call this a truly implicit data structure. 🙈 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was thinking it was an implicit data structure for a dictionary, but okay then. |
||||||
} | ||||||
|
||||||
@inlinable | ||||||
public var startIndex: Index { | ||||||
Index(storageIndex: rawValue.startIndex, position: 0) | ||||||
} | ||||||
|
||||||
@inlinable | ||||||
public var endIndex: Index { | ||||||
Index(storageIndex: rawValue.endIndex, position: 0) | ||||||
} | ||||||
|
||||||
/// A value equal to the number of unique elements in the set. | ||||||
/// | ||||||
/// - Complexity: O(*k*), where *k* is the number of unique elements in the | ||||||
/// set. | ||||||
@inlinable | ||||||
public var underestimatedCount: Int { | ||||||
rawValue.count | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is neat, but if we decide to add a running total instead of using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
If we change it to reference By the way, would it be possible to add a more efficient conditional implementation of Swift Algorithms’ Unique methods for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This is a good idea in general, and
The way to do this right now would require Algorithms to import Collections, which would be a bad idea. However, Swift does have support for cross-module overlays, which are modules that automatically get loaded whenever some module A and module B are both imported to a source file. (Apple's SDKs use this to add, e.g., MapKit-related functionality to SwiftUI when a file imports both of these modules.) If this functionality was cleaned up & promoted to a full-blown language feature, including SwiftPM support for defining cross-import modules, then we could use that to provide uniquing overloads that take CountedSets. Another alternative is to define a set of additional collection protocols in a separate package, and have swift-collections and swift-algorithms both depend on that. This would let swift-algorithms provide generic overloads for specific algorithms to speed them up when possible. I don't think we need to do this for CountedMultiset right now, but it's something to think about for later. (FWIW, I think we'll eventually want to define some sort of a dictionary protocol at least, and possibly a UniqueCollection protocol.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, the easiest way is to simply provide the uniquing methods directly on the counted multiset type, without importing Algorithms. This would work, but perhaps it would be even better to provide a (I'm not sure we need to do this though -- we are already exposing the storage dictionary, after all, so folks can simply access |
||||||
} | ||||||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
//===----------------------------------------------------------------------===// | ||
// | ||
// This source file is part of the Swift Collections open source project | ||
// | ||
// Copyright (c) 2021 Apple Inc. and the Swift project authors | ||
// Licensed under Apache License v2.0 with Runtime Library Exception | ||
// | ||
// See https://swift.org/LICENSE.txt for license information | ||
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
extension CountedSet: ExpressibleByDictionaryLiteral { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see the utility of this, but I do expect There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Given that protocol literally uses a
|
||
/// Creates a counted set initialized with a dictionary literal. | ||
/// | ||
/// Do not call this initializer directly. It is called by the compiler to | ||
/// handle dictionary literals. To use a dictionary literal as the initial | ||
/// value of a counted set, enclose a comma-separated list of key-value pairs | ||
/// in square brackets. | ||
/// | ||
/// For example, the code sample below creates a counted set with string keys. | ||
/// | ||
/// let countriesOfOrigin = ["BR": 2, "GH": 1, "JP": 5] | ||
/// print(countriesOfOrigin) | ||
/// // Prints "["BR", "BR", "JP", "JP", "JP", "JP", "JP", "GH"]" | ||
/// | ||
/// - Parameter elements: The element-multiplicity pairs that will make up the | ||
/// new counted set. | ||
/// - Precondition: Each element must be unique. | ||
@inlinable | ||
public init(dictionaryLiteral elements: (Element, UInt)...) { | ||
_storage = RawValue( | ||
uniqueKeysWithValues: elements.lazy.filter { $0.1 > .zero } | ||
) | ||
} | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
//===----------------------------------------------------------------------===// | ||
// | ||
// This source file is part of the Swift Collections open source project | ||
// | ||
// Copyright (c) 2021 Apple Inc. and the Swift project authors | ||
// Licensed under Apache License v2.0 with Runtime Library Exception | ||
// | ||
// See https://swift.org/LICENSE.txt for license information | ||
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
extension CountedSet: Sequence { | ||
/// Returns an iterator over the elements of this collection. | ||
/// | ||
/// - Complexity: O(1) | ||
/// - Remark: A type-erased wrapper is used instead of an opaque type purely | ||
/// to preserve compatibility with older versions of Swift. | ||
@inlinable | ||
public func makeIterator() -> AnyIterator<Element> { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We'll need to implement a custom I recommend coding the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, I figured as much. While I couldn’t see any downside to using I don’t suppose you could explain why that’s frowned upon here? As the comment notes, I’d have used an opaque type if I was allowed. Is there some benefit to making a dedicated type? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, it's a performance issue. Like
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ( |
||
AnyIterator( | ||
_storage | ||
.lazy | ||
.map { (key, value) -> [Repeated<Element>] in | ||
let repetitions = value.quotientAndRemainder( | ||
dividingBy: UInt(Int.max) | ||
) | ||
|
||
var result = [Repeated<Element>]( | ||
repeating: repeatElement(key, count: .max), | ||
count: Int(repetitions.quotient) | ||
) | ||
result.append(repeatElement(key, count: Int(repetitions.remainder))) | ||
|
||
return result | ||
} | ||
.joined() | ||
.joined() | ||
.makeIterator() | ||
) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We must not use a capital S here, as "multiset" is a single word.
I'm not sure I like this module name. What other type do you expect will under this module? It may make sense to just call this something boring like
CountedMultisetModule
-- although I do hope we will find a better name.(Note that I don't know if we'll ever have a multiset that keeps duplicate items in a flat hash table -- that construct seems more likely to be based on a HAMT, and so it would probably fit better in a module dedicated to that family of types, such as the one @msteindorfer is working on in #31. We'll probably also have a
SortedMultiset
that will be based onComparable
instead ofHashable
.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seemed to be some interest in signed multisets, specifically. I don’t entirely see the benefit myself, but after spending too much time agonizing over the module name I decided to just pick that and change it later if necessary.
I’m not a big fan of the “Module” suffix, but if that’s necessary it’s necessary.