Counted Set #132

Draft
wants to merge 9 commits into
base: main

@Saklad5 Saklad5 commented Dec 4, 2021

Description

A new implicit data structure has been added: an unsigned multiset. This has been discussed extensively on the Swift Forums, most notably here and here.

Detailed Design

Since this is an implicit data structure, I have introduced conformance to RawRepresentable, where the RawValue is Dictionary<Element, UInt>. I have used UInt instead of Int to make it clearer that this is not a signed multiset, which may be added in the future.

A potential point of controversy: I have implemented CountedSet.underestimatedCount as the number of unique elements, since that is indeed guaranteed to be less than or equal to its cardinality.

CountedSet implements both Collection and SetAlgebra, among other protocols.

This is not quite ready for release yet, in my estimation, but the contribution guidelines encourage early pull requests, and the core functionality has been implemented.

Work needs to be done on the Codable implementation: in particular, it should be possible to decode a flat list of objects into a CountedSet, as one of the intended uses of the structure is to act as an unordered Array.
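
As a rough illustration of the flat-list decoding mentioned above, the sketch below reads a JSON array of elements and tallies them into a dictionary of multiplicities. `FlatListCounts` and `decodeCounts(fromJSON:)` are hypothetical names used only for this example; this is not the Codable implementation in the PR.

```swift
import Foundation

// Hypothetical stand-in for a counted set; not the PR's actual type.
struct FlatListCounts<Element: Hashable & Decodable>: Decodable {
  // Maps each unique element to its multiplicity.
  var counts: [Element: UInt] = [:]

  init(from decoder: Decoder) throws {
    // Treat the payload as a flat (unkeyed) list of elements.
    var container = try decoder.unkeyedContainer()
    while !container.isAtEnd {
      let element = try container.decode(Element.self)
      counts[element, default: 0] += 1
    }
  }
}

// Hypothetical convenience wrapper for trying the decoder on a JSON string.
func decodeCounts(fromJSON json: String) throws -> [String: UInt] {
  try JSONDecoder().decode(FlatListCounts<String>.self, from: Data(json.utf8)).counts
}
```

Decoding a flat list this way discards the order of the input, which fits the "unordered Array" use case described above.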

Representations for Mirror, String, and debugging also need to be customized.

Additional API may be warranted as well, since a lot of functionality (like the count of a given element) is currently accessible only by reading the underlying dictionary directly.

/// An unordered, counted multiset.
@frozen
public struct CountedSet<Element: Hashable>: RawRepresentable {
  // Allows internal setter to be referenced from inlined code
  @usableFromInline
  internal var _storage = RawValue()

  @inlinable @inline(__always)
  public var rawValue: [Element: UInt] { _storage }

  @inlinable
  public var isEmpty: Bool { rawValue.isEmpty }

  /// Creates an empty counted set with preallocated space for at least the
  /// specified number of unique elements.
  ///
  /// - Parameter minimumCapacity: The minimum number of elements that the
  ///   newly created counted set should be able to store without reallocating
  ///   its storage buffer.
  @inlinable
  public init(minimumCapacity: Int) {
    self._storage = .init(minimumCapacity: minimumCapacity)
  }

  @inlinable
  public init?(rawValue: [Element: UInt]) {
    guard rawValue.values.allSatisfy({ $0 > .zero }) else { return nil }
    _storage = rawValue
  }

Documentation

I’ve taken care to document all new API at the symbol-level, particularly where it may be confusing to users. However, some of it may benefit from additional usage examples and guidance.

I have not updated the Documentation folder guides yet.

Testing

There is complete test coverage for the Sequence and SetAlgebra implementations, including dedicated tests for each axiom mandated by SetAlgebra. Coverage for other elements is more spotty, but does exist.

Performance

I have not implemented performance benchmarks yet.

Source Impact

This is a purely additive change, so it does not break any existing API.

Checklist

  • I've read the Contribution Guidelines
  • My contributions are licensed under the Swift license.
  • I've followed the coding style of the rest of the project.
  • I've added tests covering all new code paths my change adds to the project (to the extent possible).
  • I've added benchmarks covering new functionality (if appropriate).
  • I've verified that my change does not break any existing tests or introduce unexpected benchmark regressions.
  • I've updated the documentation (if appropriate).

This module will contain CountedSet and potentially other multiset implementations.
Similar to other collections in the package, CountedSet is backed by an internal _storage property, and exposes an initializer for preallocating storage.
Sequence conformance is implemented by iterating over the stored elements, repeating according to each stored count.
Collection conformance is implemented by using the index of an element in the underlying storage along with a position to distinguish between copies of the same element.
The implementation requires all elements to be unique and all multiplicities to be positive.
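
The Sequence behaviour described in this commit message could be sketched roughly as follows. `MultisetSketch` and its `counts` property are hypothetical names, signed counts are used for brevity, and the PR's real iterator may be structured differently.

```swift
// Hypothetical sketch of iterating a dictionary-backed multiset.
struct MultisetSketch<Element: Hashable>: Sequence {
  // Assumed invariant: every stored multiplicity is positive.
  var counts: [Element: Int] = [:]

  func makeIterator() -> AnyIterator<Element> {
    // Visit each unique element and repeat it according to its stored count.
    let repeated = counts.lazy.flatMap { repeatElement($0.key, count: $0.value) }
    return AnyIterator(repeated.makeIterator())
  }
}
```

Because the underlying dictionary is unordered, the order in which distinct elements appear is unspecified; only the number of copies of each element is guaranteed.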

Tests have been added for initializing empty CountedSets with and without a minimum capacity.
Using unsigned integers for multiplicity of an element in a counted set makes it much clearer that negative values are not allowed.

Initializing a counted set with a dictionary literal can now be done with any given multiplicity. Negative values are now barred by the compiler, and multiplicities of zero can simply be discarded.

This change necessitated a slight tweak to the way iteration is performed: since Swift.repeatElement(_:count:) requires Int counts, multiple calls are used for multiplicities larger than Int.max.
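
One way to picture the "multiple calls" approach is a helper that splits an unsigned multiplicity into Int-sized pieces, each of which can then be passed to `repeatElement(_:count:)`. The function name `repeatCounts(for:)` is an assumption made for this sketch and does not appear in the PR.

```swift
// Hypothetical helper: splits a UInt multiplicity into chunks that fit in Int,
// because Swift.repeatElement(_:count:) takes an Int count.
func repeatCounts(for multiplicity: UInt) -> [Int] {
  let limit = UInt(Int.max)
  var remaining = multiplicity
  var chunks: [Int] = []
  while remaining > 0 {
    // Take as much as a single Int can hold, then continue with the rest.
    let chunk = min(remaining, limit)
    chunks.append(Int(chunk))
    remaining -= chunk
  }
  return chunks
}
```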

CountedSet.count will trap on cardinalities larger than Int.max, as with most Sequences.
The custom implementation of count adds the multiplicities of the elements together, which is far more efficient than the default approach of iterating through the set manually.

underestimatedCount has also been customized: it exposes the count of the underlying dictionary, which is guaranteed to be less than or equal to the cardinality of the counted set itself. This may be used to avoid trapping for sets with very large multiplicities.
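
A minimal sketch of the two properties as described, assuming a dictionary of unsigned multiplicities (`CountSketch` and `counts` are hypothetical names):

```swift
// Hypothetical sketch of the count-related properties.
struct CountSketch<Element: Hashable> {
  // Assumed invariant: every stored multiplicity is positive.
  var counts: [Element: UInt] = [:]

  // Cardinality: the sum of all multiplicities.
  // Traps if the total does not fit in Int.
  var count: Int { Int(counts.values.reduce(0, +)) }

  // Number of unique elements; never larger than `count`
  // as long as the invariant above holds.
  var underestimatedCount: Int { counts.count }
}
```
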
Each requirement has been implemented according to the mathematical definitions, complete with careful documentation to ensure users understand how that may diverge from expectations.

Because Swift.Dictionary does not specify time complexity for every method being used, part of the implementation cannot either.

Tests have been added to exercise all of the SetAlgebra implementation, including each of the axioms required by SetAlgebra.
This operation adds the multiplicities of each element together.

A test has been added to cover it.
@Saklad5 Saklad5 requested a review from lorentey as a code owner December 4, 2021 18:53
@Saklad5 Saklad5 marked this pull request as draft December 4, 2021 18:54
Member

@lorentey lorentey left a comment

This looks like a good approach, thanks!

I did an initial review pass and added a number of comments that we should address before merging.

  • As mentioned below, I think CountedMultiset would be a better name for this type, Foundation's precedent notwithstanding.

  • As mentioned below, I'm vetoing the SetAlgebra conformance (sorry!) -- we'll need to go through the features of that protocol and individually consider each one, adapting/removing ones that don't make sense in a multiset. This means that we'll lose all the functionality that SetAlgebra provides by default -- which isn't a big deal, as the default implementations are pretty bad anyway.

  • A proper multiset should provide a direct operation to query the multiplicity of a particular member. I.e., we need a generalization of contains that returns an integer instead of a boolean value.

  • Like the standard Set, I expect CountedMultiset to conform to Hashable, ExpressibleByArrayLiteral and CustomStringConvertible, and to conditionally conform to Codable.

    I'm not sure what format would be best for the description of this type -- but given that this is a counted multiset, perhaps it makes sense to use a dictionary-like printout instead of repeating duplicates, unlike a dupes-preserving SortedMultiset. I recommend investigating existing precedents in other languages.

  • We do not need the RawRepresentable conformance; it seems far more straightforward to expose the underlying dictionary through a custom property, like with OrderedSet.

  • Operations that return/initialize multisets (such as union, intersection, subtracting etc) need to document where the representative values in their results come from. (E.g., an initializer that takes a sequence could specify that representative values are the first instances that appear in the input, while union could say that for values that appear in both inputs, the representative value is taken from the first set (a.k.a. self).)

//
//===----------------------------------------------------------------------===//

extension CountedSet: SetAlgebra {
Member

Unfortunately I don't think it's feasible to conform this type to SetAlgebra -- that protocol isn't designed to support multisets, and I expect that some algorithms that are generic over it will be ill-prepared to work on them. 😞

That said, I think it does make sense to provide at least some of the SetAlgebra operations; union/formUnion/intersection/formIntersection/subtracting/subtract in particular seem like desirable operations, with the semantics implemented here.

We will need to manually provide subtracting and subtract, as we'll no longer be able to rely on SetAlgebra implementing them for us. (And also because SetAlgebra implements them using intersection & symmetricDifference, which is very wasteful.)

We'll also need to provide explicit implementations for the full suite of isSubset(of:) etc predicates, as well as overloads for all of these that take a generic sequence instead of a CountedSet.
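
For illustration, a hand-written version of these operations over a dictionary of counts might look roughly like this. `BagSketch` is a hypothetical type, and the semantics shown (per-element difference clamped at zero, subset meaning "no multiplicity exceeds the other's") are the usual multiset definitions, assumed here rather than taken from the PR.

```swift
// Hypothetical sketch of manual set-style operations on a multiset.
struct BagSketch<Element: Hashable> {
  // Assumed invariant: every stored multiplicity is positive.
  var counts: [Element: Int] = [:]

  // Removes up to `other`'s multiplicity of each element,
  // dropping entries whose multiplicity would reach zero.
  mutating func subtract(_ other: BagSketch) {
    for (element, removed) in other.counts {
      guard let current = counts[element] else { continue }
      counts[element] = current > removed ? current - removed : nil
    }
  }

  func subtracting(_ other: BagSketch) -> BagSketch {
    var result = self
    result.subtract(other)
    return result
  }

  // True if no element occurs more often here than in `other`.
  func isSubset(of other: BagSketch) -> Bool {
    counts.allSatisfy { $0.value <= (other.counts[$0.key] ?? 0) }
  }
}
```

The generic-sequence overloads mentioned above could first tally the sequence into a dictionary of counts and then reuse the same logic.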

Author

Painful as it is, I think you’re right about SetAlgebra. The actual requirements of the protocol clearly assume an uncounted set, and all the documented warnings in the world won’t change that.

Member

Yeah, SetAlgebra isn't the best protocol in the stdlib -- it supports the standard unordered Set and OptionSet, and sadly that's about it; other set-like things tend to have a difficult time conforming to it, and we can't really define interesting generic algorithms over it. (At least, not very efficiently.)

(There is still a huge benefit in having it in the stdlib, in that it standardizes names for the most common set operations. 😀)

Author

I think it’s a good example of premature specialization: it adds a lot of requirements that should be attached to Set directly, including ExpressibleByArrayLiteral, for no discernible reason.

Still, it’s not the most frustrating protocol of the bunch. I’d say SignedNumeric deserves that title.

Anyway, unless someone thinks it is worth having MultisetAlgebra as a protocol, I’ll just remove it. Maybe a future major version of Swift can fix SetAlgebra.

@inlinable
@discardableResult
public mutating func insert(_ newMember: __owned Element)
-> (inserted: Bool, memberAfterInsert: Element) {
Member

Given that we aren't going to be constrained by SetAlgebra's questionable design choices, we should change the signature of this method to return something that makes more practical sense, such as the index of the newly inserted item.

(OrderedSet.insert is one precedent for this.)

Author

I think I have a better suggestion: returning the new multiplicity of the inserted element.

Member

@lorentey lorentey Dec 10, 2021

That sounds good! We may need to provide both though, because people may need the index to figure out the exact variant that's in the set. (This is relevant for element types where equal values can be still distinguished.)

This can always be reconstructed using firstIndex(of:), but returning it directly in insert helps eliminate an extra hash table lookup.

(Oh, this reminds me, CountedMultiset will need to implement the (underscored) _customIndexOfEquatableElement and _customLastIndexOfEquatableElement requirements of Collection to speed up firstIndex(of:)/lastIndex(of:). The OrderedSet codebase has an example on how to do this, as well as other tricks in the same vein.)
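
A minimal sketch of the "return the new multiplicity" idea discussed in this thread, using a hypothetical `InsertSketch` type with signed counts:

```swift
// Hypothetical sketch: insert reports the element's multiplicity afterwards.
struct InsertSketch<Element: Hashable> {
  var counts: [Element: Int] = [:]

  // Adds one copy of `newMember` and returns its multiplicity after insertion.
  @discardableResult
  mutating func insert(_ newMember: Element) -> Int {
    let updated = (counts[newMember] ?? 0) + 1
    counts[newMember] = updated
    return updated
  }
}
```

If the index is wanted as well, `Dictionary.index(forKey:)` can recover it from the storage, at the cost of the extra hash table lookup mentioned above.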

Author

I literally didn’t know those existed. I’m guessing they have default implementations so most people don’t know about them?

/// This is used to distinguish between indices that point to elements with
/// different values.
@usableFromInline
let storageIndex: Dictionary<Element, UInt>.Index
Member

Nit: Please follow The Leading Underscore Rule when naming non-public members and types.

(Throughout this PR)

Author

Right, sorry. In my defense, that was discouraged for normal (that is, not Standard Library) Swift code until around three months ago.

I actually didn’t realize the guidance changed until I tried to pull it just now.

Member

Interesting -- where was it discouraged?

Author

Hmm, I can’t find it now. Maybe I was thinking of Google’s Swift style guide? Never mind then.

/// set.
@inlinable
public var count: Int {
Int(rawValue.values.reduce(.zero, +))
Member

This is fine per the Collection protocol, but it seems like a missed opportunity.

What if we kept a running count of items in this set, updating it on every mutation?

Author

Storing the count would mean this is no longer an implicit data structure. I’m not really sure that’s a worthwhile tradeoff.

Is there precedent for doing that with similar data structures?

Member

Oh definitely -- Set and Dictionary do it, for example, to prevent having to scan their bitmap to figure out their element count.

I don't think having to store one extra integer matters much in this case -- the hash table in the underlying dictionary already comes with way too much memory overhead to call this a truly implicit data structure. 🙈
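
The running-count idea could be sketched as follows, with every mutation adjusting a stored total so that `count` becomes a constant-time read. `RunningCountSketch` is a hypothetical type, not code from the PR.

```swift
// Hypothetical sketch of a multiset that maintains its cardinality eagerly.
struct RunningCountSketch<Element: Hashable> {
  private(set) var counts: [Element: Int] = [:]
  // Running total of all multiplicities, updated by every mutation.
  private(set) var count = 0

  mutating func insert(_ element: Element) {
    counts[element, default: 0] += 1
    count += 1
  }

  // Removes one copy of `element`; returns false if it was not present.
  @discardableResult
  mutating func remove(_ element: Element) -> Bool {
    guard let current = counts[element] else { return false }
    counts[element] = current > 1 ? current - 1 : nil
    count -= 1
    return true
  }
}
```

The tradeoff is one extra stored integer in exchange for not summing the dictionary's values on every access.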

Author

I was thinking it was an implicit data structure for a dictionary, but okay then.

/// set.
@inlinable
public var underestimatedCount: Int {
rawValue.count
Member

This is neat, but if we decide to add a running total instead of using reduce above, then we should return it here, too.

Author

underestimatedCount has always been incredibly unclear to me, in terms of its intended role in Collections. I don’t believe anything actually says it should be equivalent to count in such cases, so I figured it’d be a useful way to express the (less computationally expensive) number of unique elements.

If we change it to reference count, we should add another property with this value.

By the way, would it be possible to add a more efficient conditional implementation of Swift Algorithms’ Unique methods for CountedSet, in case anyone ends up using them together? I’m guessing the answer is no, but it’d be a nice-to-have.

Member

> underestimatedCount has always been incredibly unclear to me, in terms of its intended role in Collections. I don’t believe anything actually says it should be equivalent to count in such cases, so I figured it’d be a useful way to express the (less computationally expensive) number of unique elements.

This is a good idea in general, and Collection does allow this! underestimatedCount is typically only used in contexts that are generic over Sequence, but that doesn't mean it's a bad idea to make it diverge from count -- it can still be quite beneficial.

> By the way, would it be possible to add a more efficient conditional implementation of Swift Algorithms’ Unique methods for CountedSet, in case anyone ends up using them together? I’m guessing the answer is no, but it’d be a nice-to-have.

The way to do this right now would require Algorithms to import Collections, which would be a bad idea.

However, Swift does have support for cross-module overlays, which are modules that automatically get loaded whenever some module A and module B are both imported to a source file. (Apple's SDKs use this to add, e.g., MapKit-related functionality to SwiftUI when a file imports both of these modules.) If this functionality was cleaned up & promoted to a full-blown language feature, including SwiftPM support for defining cross-import modules, then we could use that to provide uniquing overloads that take CountedSets.

Another alternative is to define a set of additional collection protocols in a separate package, and have swift-collections and swift-algorithms both depend on that. This would let swift-algorithms provide generic overloads for specific algorithms to speed them up when possible.

I don't think we need to do this for CountedMultiset right now, but it's something to think about for later. (FWIW, I think we'll eventually want to define some sort of a dictionary protocol at least, and possibly a UniqueCollection protocol.)

Member

Oh, the easiest way is to simply provide the uniquing methods directly on the counted multiset type, without importing Algorithms. This would work, but perhaps it would be even better to provide a uniqueMembers view instead, exposing the underlying dictionary's keys view.

(I'm not sure we need to do this though -- we are already exposing the storage dictionary, after all, so folks can simply access keys directly.)

internal var _storage = RawValue()

@inlinable @inline(__always)
public var rawValue: [Element: UInt] { _storage }
Member

We should not have a rawValue property -- I think it would be better to use some other name, such as OrderedSet's elements.

Swift practice generally dislikes the use of UInt for anything other than bit patterns, so while UInt makes sense from a validation/performance standpoint, for usability/consistency we should probably stick to using Int for these counters.

(E.g., note that count returns a signed Int, not an unsigned one.)

Author

I originally used a Dictionary<Element, Int> as a backing store, but decided to change it purely to eliminate confusion if a signed multiset was ever added. No one can look at a UInt and think it can hold a negative value.

Author

Any thoughts about whether that is sufficient justification for using UInt?

Member

I don't disagree with that, but I feel using UInt here would be diverging too much from established API design precedents. The dice have fallen this way long ago, and it's too late to change our minds about it.

(FWIW, I do like that in Swift we tend not to do arithmetic on UInt values, or deal with signed/unsigned conversions in regular code.)

public struct CountedSet<Element: Hashable>: RawRepresentable {
// Allows internal setter to be referenced from inlined code
@usableFromInline
internal var _storage = RawValue()
Member

Please spell out the underlying type here, or use an explicit typealias. Do not initialize the stored property here.

internal var _storage: [Element: UInt]

}

@inlinable
public init?(rawValue: [Element: UInt]) {
Member

I think we should replace this with a simple unlabeled initializer that takes an [Element: Int] value.

I'm not sure the initializer needs to be failable -- it seems fine to simply filter out items with invalid counts. (Although items with negative counts could be worth signaling somehow...)

Given that I'm vetoing the SetAlgebra conformance, we'll definitely need to add additional initializer(s) to replace SetAlgebra's generic init.
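
To make the suggestion concrete, here is a rough sketch of a non-failable dictionary initializer that filters out non-positive counts, alongside a generic sequence initializer of the kind SetAlgebra would otherwise have supplied. `InitSketch` is a hypothetical type; silently dropping invalid entries is just one of the options raised above.

```swift
// Hypothetical sketch of the initializers discussed in this thread.
struct InitSketch<Element: Hashable> {
  var counts: [Element: Int]

  // Keeps only entries with a positive count; zero and negative
  // counts are discarded without signaling an error.
  init(_ dictionary: [Element: Int]) {
    counts = dictionary.filter { $0.value > 0 }
  }

  // Tallies the occurrences of each element in an arbitrary sequence.
  init<S: Sequence>(_ elements: S) where S.Element == Element {
    counts = [:]
    for element in elements {
      counts[element, default: 0] += 1
    }
  }
}
```

Note that the filtering initializer is not value-preserving: the dictionary read back from `counts` may differ from the one passed in.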

Author

This is the initializer required by RawRepresentable: filtering out items would mean that the input wasn’t used as a rawValue, which would violate the protocol.

I don’t intend for this to be used often; it’s mainly important for deserialization and similar tasks.

Member

I meant to say I think we should get rid of the RawRepresentable conformance! (It comes with interoperability overtones and we don't need any of the functionality it provides.)

Author

I disagree about removing RawRepresentable, but I’ll acquiesce.

In that case, do we even need a dictionary initializer? I definitely don’t think a nonfailable unlabeled initializer is appropriate, as it would not be value preserving.

Author

@Saklad5 Saklad5 Dec 12, 2021

I know I said I’d acquiesce, but upon reflection a lot of the desired functionality is replicating RawRepresentable, particularly with regards to serialization and equality-checking. If I dropped RawRepresentable, I’d just have to replicate its implementations explicitly.

Moreover, CountedMultiset has a similar role to OptionSet: most operations are actually calling a different operation on the raw value, and it implements set operations backed by a non-set type. I’d argue that’s a significant precedent.

RawRepresentable is an extremely simple protocol, and literally every part of it is relevant. Moreover, guaranteeing lossless conversion to and from a dictionary makes it much easier for users to implement exotic forms of serialization.

You keep saying it isn’t needed, but I don’t see how that justifies its exclusion. Is there some undesired effect of conformance that I’m unaware of?

Note that CountedMultiset could expose the underlying value as both rawValue and elements (they both point to _storage anyway), and add other initializers. For instance, I think there should be an unsafe variant of the dictionary initializer that performs no checks on the dictionary whatsoever, as a performance optimization.

/// multiplicity is decremented by one.
/// - Parameter member: An element to remove from the set.
/// - Complexity: O(*k*), where *k* is the number of unique elements in the
/// set, if the multiplicity of the given element was one. Otherwise, O(1).
Member

Nit: Hm, this complexity guarantee doesn't look quite right to me -- ignoring non-unique storage, removing a single item from a dictionary is expected to take O(1) hashing/equality checks on average, as long as Element properly implements hashing.

It looks like the O(k) part comes from the Dictionary.removeValue(forKey:) docs, which seem to be bogus -- they aren't consistent with how we document the complexity of other dictionary operations.

(Strictly speaking, removeValue(forKey:) is indeed an O(n) operation, but so is Dictionary.subscript(key:).getter, Set.contains(_:) etc. -- hash tables have terrible worst-case performance. The same reasoning that allows us to say that updating a dictionary value is expected to be an "O(1)" operation also applies to removals. There is a bit of handwaving involved.)

I recommend changing this (and every other primitive operation, from contains(_:) to insert(_:)) to say something like:

/// - Complexity: This operation is expected to perform O(1) hashing/equality checks on
///    average, provided that `Element` implements high-quality hashing.

(Note the careful wording -- Equatable doesn't guarantee that == takes O(1) time, so we cannot promise anything about how long dictionary operations will take in real time -- we can only put an upper bound on how many times they are expected to compare items. And of course random hash collisions make the worst case performance linear, so we need to talk about "expected" number of operations "on average". 😛)

Author

I spent a while pondering the time complexity here, and ultimately decided it was better to underpromise than overpromise. You wouldn’t want to do this in the middle of a loop, for instance.

Member

@lorentey lorentey Dec 10, 2021

It's perfectly fine to remove items from a Dictionary in a loop! Removals are slower than insertions, but only by a (small) constant factor -- the two operations have exactly the same algorithmic complexity.

(In fact, removals can sometimes win by never having to resize the hash table -- in the current implementation at least. Shrinking dictionaries on removals is still a wishlist item for the stdlib.)

Author

@Saklad5 Saklad5 Dec 10, 2021

Well, if Dictionary.removeValue(forKey:) has a documented time complexity of O(n), I’m going to assume a future non-breaking update can actually make it O(n).



private let x: CountedSet<Character> = ["a": 1, "b": 2, "c": 3, "d": 4]
private let y: CountedSet<Character> = ["e", "f", "a", "f"]
Member

Instead of hardwiring these particular values, these tests should be using the combinatorial testing features provided in the _CollectionsTestSupport package to exhaustively exercise the implementations over many different combinations of inputs, generated automatically.

Author

I was pretty sure something like that existed, but decided to get the tests hammered out for an initial pull request before looking into it.

Author

Saklad5 commented Dec 9, 2021

Thank you for the detailed feedback, I really appreciate it!

Author

Saklad5 commented Dec 12, 2021

> A proper multiset should provide a direct operation to query the multiplicity of a particular member. I.e., we need a generalization of contains that returns an integer instead of a boolean value.

I’ve come up with a few different candidates for how this should be expressed, and I’d like feedback on which to use.

A. CountedMultiset.multiplicity(of:)
B. CountedMultiset.count(of:)
C. CountedMultiset.subscript(_:)

I’m favoring A, personally.
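
For comparison, candidates A and C might look roughly like this on a dictionary-backed type; candidate B would be the same method as A under the name `count(of:)`. `QuerySketch` is a hypothetical type used only to show the call shapes.

```swift
// Hypothetical sketch of the multiplicity query under discussion.
struct QuerySketch<Element: Hashable> {
  var counts: [Element: Int] = [:]

  // Candidate A: a named query; absent elements have multiplicity 0.
  func multiplicity(of element: Element) -> Int {
    counts[element] ?? 0
  }

  // Candidate C: the same query spelled as a subscript.
  subscript(element: Element) -> Int {
    multiplicity(of: element)
  }

  // `contains` is then the Boolean special case of the query.
  func contains(_ element: Element) -> Bool {
    multiplicity(of: element) > 0
  }
}
```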

Contributor

xwu commented Dec 12, 2021

It's nice to use the same term consistently where possible. Since this type is being called a counted multiset, it makes sense to me that we should be able to ask about the count of something.

Author

Saklad5 commented Dec 12, 2021

> It's nice to use the same term consistently where possible. Since this type is being called a counted multiset, it makes sense to me that we should be able to ask about the count of something.

That’s true, but Swift usually uses count to mean cardinality, and that’s not what this would be. I’m concerned it could be easily confused with CountedMultiset.count, the sum of all multiplicities and the cardinality of the collection.

You raise an excellent point about the name, though: I was using CountedSet mainly because Foundation’s (terrible) implementation does. @lorentey said it should be named CountedMultiset instead, but now that I think about it that seems redundant at best and confusing at worst.

I don’t think there is any precedent for the name “counted multiset”, and even “counted set” doesn’t seem to show up outside the context of Foundation and derivations upon it. The most common term in academia and other programming languages, by far, is “multiset”, followed by “bag”.

@lorentey: Instead of CountedSet or CountedMultiset, how about simply Multiset? That seems a lot clearer, and lends itself naturally to proposed variations like SortedMultiset or even SignedMultiset.

LePips commented Oct 14, 2024

Hello all 👋 I've recently been on the LeetCode grind and multisets/multi-set-counts are frequently used, so a lack of this data structure is fresh on the mind.

Is there any interest in continuing the work in this PR? Having read all previous forum posts, issue threads, and comments on this review, I feel confident about taking on the rest of the work required.

Some response to/support of comments made in reviews:

  • renaming to MultiSet. I consider Counted_ to be a bit redundant.
  • use count(of:) for element counts. Since Swift 6.0 fixed the type checker issue so that count(where:) could [finally] be implemented, similar API is now allowed and doesn't use un-Swift-like synonyms. This obviously would depend on this package having the Swift 6.0 toolchain as minimum when this is shipped.
  • remove the RawRepresentable conformance and use exposed elements for stronger typing.

Would there also be interest in an alternate implementation that stores the elements instead of purely keeping the count? I've seen comments here and there on other language implementations that some people do advocate an [Int: [Element]] equivalent. Using count(of:) or multiset[element].count are the easy APIs to expose to mirror this implementation's main purpose.

This would of course bring other implementation questions of the set operations and enumeration. I have some ideas but will defer to an actual implementation if interested.
