BitSet API Review #143
Comments
Can you provide your reasoning behind these suggestions?
I don't think moving the collection conformance to a view would be reasonable for We can certainly precompute I don't see a reason to precompute
Sure, it can be. But I see nothing about
What for?
For what reason?
Good catch, thanks.
Yep, creating a
Thanks! |
I don't understand the benchmark in the original API review. Does the input size refer to the number of elements, or did both sets contain a single element of increasing value?
This suggestion was partly in response to feedback on the discardable results of the
Some tools don't handle the spaces properly. |
We have not yet done the work of setting up benchmarks for these two data structures -- that's one reason they're still on a branch. (The other is the lack of documentation.) The benchmark results in the original proposal do not reflect the current implementation.
Yep, this is a reasonable suggestion, for SetAlgebra. We'll need to quantify the performance benefits of eliminating the return values; if the hypothesis about the perf improvement holds true, and it's significant enough to be worth the effort of going through S-E, then we should consider this for addition to SetAlgebra. Meanwhile, provided that benchmarks indicate a benefit we cannot achieve by other means (such as making the relevant code inlinable), we can add this (The existence of
Which tools? |
What do you think about (Consider that |
What's your opinion on Should |
We currently have a These two types provide two different perspectives on the same underlying data structure. Ideally it ought to be possible to treat them as views of each other, making it easy to switch between the two sets of APIs as needed. However, The goal of the extra invariant is to allow better memory management for One way to unify the invariants of these two types would be to give up on Allowing two equal bitsets to differ in the size of their storage can be somewhat problematic though, as |
What about naming? I'm pretty sure bitset & bitarray work best as single words (e.g., C++ has |
Unfortunately, it's not obvious whether to use a conditional
I've seen reports on the forums, but I can't remember the details. Mishandling of spaces is a recurring issue. https://github.com/apple/swift/issues?q=is:issue+label:bug+in:title+spaces
It will be more useful when we have
This reminds me of your SR-7648 feature (which needs an I'll think about the other questions. |
Instead of having two top-level types, can one of them become a nested type? For example, move
let count = 256 * 256 //-> 65536
var value = BitArray(repeating: false, count: count)
value.nonzeroBits.isEmpty //-> true
value.nonzeroBits.insert(count - 1) //-> (inserted: true, memberAfterInsert: 65535)
value.nonzeroBits.insert(count) //-> Fatal error: out of bounds.
On second thought, I don't like the heterogeneous comparison operators, because they complicate generic code. Therefore, I'm against generic operations on
Other libraries use camel case: |
I'm not sure I follow how heterogeneous comparison operators reflect the ergonomics of inserting sequences of bits represented by fixed-width integers of varying bit widths? |
My argument was badly worded, but I was trying to say that we should wait for a more general solution. I've just noticed that the generic |
(On providing a labeled subscript in
The
Apologies, but my position is going to be quite rude: We are in the year 2022, and this has been a solved problem for many, many decades. Tools that can't handle spaces in file (or directory) names are broken, and they need to be fixed. There are no valid excuses. Please do file a bug if the human readable file names in this package cause any concrete issues in Swift tooling, so that those tools can be fixed quickly. (The spaces are going to stay, though.)
There are multiple implementations for big integers, we just don't have any in the Standard Library.
I don't think so! All signed binary integer types are expected to implement a two's complement representation (or at least to pretend that they do so in their conversion methods); I think it would be very much desirable to make full use of this fact when converting between integers and bit sequences.
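To illustrate the two's complement point, here is a minimal sketch. The helper `bitPattern(of:width:)` is a hypothetical name made up for this example and is not part of the package; it only shows how a signed value maps onto a bit sequence when the standard library's shift semantics are relied on.

```swift
// Hypothetical helper, not part of the package: returns the low `width` bits
// of `value`, least significant bit first.
func bitPattern<I: BinaryInteger>(of value: I, width: Int) -> [Bool] {
    precondition(width >= 0, "width must be non-negative")
    return (0..<width).map { shift in
        // `>>` on BinaryInteger is a smart shift. For signed types it is an
        // arithmetic shift, so positions beyond the type's own bit width
        // repeat the sign bit (two's complement sign extension).
        (value >> shift) & 1 == 1
    }
}

// Bit patterns of a negative and a non-negative value.
let minusTwo = bitPattern(of: Int8(-2), width: 8)
let five = bitPattern(of: UInt8(5), width: 4)
```

Because the sign bit is repeated, asking for more bits than the source type holds gives the same pattern a wider signed integer would have, which is the property the comment above wants conversions to use.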
Oh, yes! We definitely need that -- it's the primitive operation. (We also badly need that as a
So here is my thinking about that: from my personal experience, I find But it feels weird to rigidly prefer one or the other -- these are really just two sides of the same coin. Even though I think
I'm not sure I get this argument. I'm not holding my breath for implicit integer widening, and I don't really see how it would relate to this type -- I would not expect integer widening to allow implicit conversions between signed/unsigned integer types, for example. I feel a bit ambivalent about the generic methods on BitSet because they seem useful, but they are weirdly lopsided -- they do allow more convenient use when inserting/removing items and when calling The natural alternative would be for (If we did end up with |
That's fair. Most data structure references also prefer "bit vector", "bit array" and "bit set", although "bitmap" is usually spelled as a single word. |
Update: on reflection, I found an important use case for member-subscript operations in
extension BitSet {
public subscript(member member: Int) -> Bool { get set }
public subscript(members bounds: Range<Int>) -> Slice<BitSet> { get }
public subscript<R: RangeExpression>(members bounds: R) -> Slice<BitSet> where R.Bound == Int { get }
}
Counter-intuitively, the individual member lookup/assignment operation However, as
/// Accesses the contiguous subrange of the collection’s elements that are
/// contained within a specific integer range.
///
/// let bits: BitSet = [2, 5, 6, 8, 9]
/// let a = bits[3..<7] // [5, 6]
/// let b = bits[4...] // [5, 6, 8, 9]
/// let c = bits[..<8] // [2, 5, 6]
///
/// This enables you to easily find the closest set member to any integer
/// value.
///
/// let firstMemberNotLessThanFive = bits[5...].first // Optional(6)
/// let lastMemberBelowFive = bits[..<5].last // Optional(2)
///
/// - Complexity: Equivalent to two invocations of `index(after:)`.
public subscript(members bounds: Range<Int>) -> Slice<BitSet> { ... }
(I'm vacillating on whether we should have a variant of |
I think the Boolean subscript looks better with a
bits[contains: member] = true
The other subscripts could have an
let lastMemberBelowFive = bits[intersection: ..<5].last
|
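The nearest-member queries that the range subscripts above are meant to enable can be sketched on a single machine word. This is only an illustration under stated assumptions: `TinyBitmap`, `firstMember(notLessThan:)` and `lastMember(below:)` are hypothetical names, the type holds members in 0..<64 only, and nothing here reflects how the package's `BitSet` is actually implemented.

```swift
// Hypothetical single-word bitmap holding members in 0..<64.
struct TinyBitmap {
    var word: UInt64 = 0

    init(_ members: [Int]) {
        for member in members {
            precondition((0..<64).contains(member), "member out of range")
            word |= UInt64(1) << member
        }
    }

    // Smallest member that is >= `lower`, or nil if there is none.
    func firstMember(notLessThan lower: Int) -> Int? {
        guard lower < 64 else { return nil }
        // Clear every bit below `lower`, then find the lowest remaining bit.
        let masked = word & (~UInt64(0) << max(lower, 0))
        return masked == 0 ? nil : masked.trailingZeroBitCount
    }

    // Largest member that is < `upper`, or nil if there is none.
    func lastMember(below upper: Int) -> Int? {
        guard upper > 0 else { return nil }
        // Keep only the bits below `upper`, then find the highest remaining bit.
        let masked = upper >= 64 ? word : word & ((UInt64(1) << upper) - 1)
        return masked == 0 ? nil : 63 - masked.leadingZeroBitCount
    }
}

let tiny = TinyBitmap([2, 5, 6, 8, 9])
let next = tiny.firstMember(notLessThan: 7)
let previous = tiny.lastMember(below: 5)
```

Each query is a mask followed by a bit count, which is why a slice-returning subscript can offer these lookups cheaply.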
... and now Keeping a precalculated count is important for some use cases though, so I added a separate |
You've missed a couple of doc fixes -- the
/// let firstMemberNotLessThanFive = bits[5...].first // Optional(6)
/// let lastMemberBelowFive = bits[..<5].last // Optional(2)
However, if the |
For the
/// Bit arrays are encoded as an unkeyed container of `UInt` values,
/// representing the total number of bits in the array, followed by
/// UInt-sized pieces of the underlying bitmap.

/// Bit sets are encoded as an unkeyed container of `UInt` values,
/// representing pieces of the underlying bitmap.
this will make archives incompatible between 32-bit and 64-bit platforms. |
"Multisets" or "bags" are sometimes called "counted sets", so is there a better name than |
D'oh. Very nice catch, thank you! The right thing to do would be to switch to consistently using (The obvious thing to do would be to encode bit arrays as strings of the form (Edit:) We could also use a compact(ish) string encoding such as (generalized) base64 -- that would have equal overhead whether the serialization target is a text or binary format. However, I don't know how I feel about the idea of adding a base64 encoder/decoder to this package. (Encoding bitmaps as integer values has rather high storage (and time) overhead if the output is a text format like JSON. Then again, I expect compression would fix storage overhead, and the time overhead of decimal conversion might be eclipsed by the need to parse syntax, at least on the decoding side.) |
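One way to picture a platform-independent encoding is sketched below. It assumes fixed 64-bit words, which is a guess made for this example: the type named in the comment above did not survive extraction. `PortableBitmap` is a hypothetical stand-in, not the package's `BitSet` or `BitArray`.

```swift
import Foundation

// Hypothetical stand-in for a bit set's storage; not the package's BitSet.
struct PortableBitmap: Codable, Equatable {
    // Membership flags packed into fixed-width words, least significant bit
    // first. Using UInt64 (an assumption) keeps the archive layout identical
    // on 32-bit and 64-bit platforms, unlike the platform-sized UInt.
    var words: [UInt64] = []

    init(members: [Int]) {
        for member in members {
            precondition(member >= 0, "members must be non-negative")
            let (wordIndex, bit) = (member / 64, member % 64)
            if wordIndex >= words.count {
                let missing = wordIndex - words.count + 1
                words.append(contentsOf: repeatElement(0, count: missing))
            }
            words[wordIndex] |= UInt64(1) << bit
        }
    }

    // Decodes an unkeyed container of UInt64 words.
    init(from decoder: Decoder) throws {
        var container = try decoder.unkeyedContainer()
        while !container.isAtEnd {
            words.append(try container.decode(UInt64.self))
        }
    }

    // Encodes the words one by one into an unkeyed container.
    func encode(to encoder: Encoder) throws {
        var container = encoder.unkeyedContainer()
        for word in words {
            try container.encode(word)
        }
    }
}

// Round trip through JSON.
let original = PortableBitmap(members: [2, 5])
let archived = try! JSONEncoder().encode(original)
let restored = try! JSONDecoder().decode(PortableBitmap.self, from: archived)
```

The decimal-integer overhead for text formats mentioned above still applies to this sketch; it addresses only the word-size mismatch.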
This is a fair point, but I don't think it'd be a true source of confusion in actual practice. There is no such thing as a "multi bit set" ("bit multiset"? "bitbag"?), and the way the type is nested under (To nitpick, "counted set" and "multiset" aren't interchangeable terms outside abstract mathematics. "Counted set" refers to a particular implementation of a multiset where duplicates aren't preserved -- so it is a more specific term.) That said, suggestions for naming alternatives would be welcome. (Note: brevity is a virtue.) |
The premise here is unacceptable! 😄 If a Note: this isn't an explicit Standard Library requirement for the The general convention I'm enforcing is that if a (A (weaker) corollary of this rule is that collections with stridable indices ought to be random-access collections. We will see if this corollary survives the addition of a tree-based List type.)
I really do not think so. I would be open to allowing easier conversions between BitArray's integer indices and BitSet's opaque indices, but using the same type would be a nonstarter for me. These are two very different collection types, even though they share the same underlying data structure. However, I do not have much affection for the The range-expression-taking I decided to add the |
This would make it possible to encode with
I dislike the way Base64 reorders the bits. For example, mouse-over the diagram in Fast Base64 encoding with SSE vectorization.
I can't think of a better name than |
That is only the default implementation when users specify an index type that meets the constraints but don't opt for custom behavior. A type can perfectly well implement I do agree this tends to be very confusing to end users though :) The same thing can be said for the bugs that arise when users assume indices begin at 0. |
|
Yes, but having equivalent codable representations for distinct types is at best a nice-to-have feature, never a requirement. In the case of
I think those would be very interesting experiments to do outside of this package!
I don't really see your point here -- if the stdlib's default implementations only work correctly in cases that satisfy some condition that does not derive from stated requirements, then for all intents and purposes the stdlib does assume that those conditions will hold. These assumptions need to be clearly documented. (Another case like this is the fact that
Maybe! This package isn't ABI stable, but previously we added (The presence of the attribute doesn't carry any semantic promise that we won't change the memory layout in future releases -- it just lets these developers get a calling convention that matches the package's normal behavior.)
Hm, this is a good question. For In
Hm, the MinimalEncoder/Decoder facility ought to catch such type mismatches. We'll need to take a look at what's going on. |
That's not what I'm saying. When certain conditions are met regarding the |
Sure. |
Ah, it's all good! This is on a Very well spotted! 👍 |
Can this issue be closed now? I'll take a closer look at the implementation when BitCollections are reviewed on the forums. |
Sure! |
I missed the GSoC 2021 topic on the forums, so I'd like to post my feedback here instead.
I'm working on similar types (SetOfUInt8 and SetOfUInt16), so I have a few suggestions for BitSet.
Could the collection APIs be moved to a separate type?
This method would pre-compute the startIndex and index(before: endIndex) into stored properties of the view.
Could the membership APIs also be combined as a custom subscript?
Could the _Word type have OptionSet conformance, for some default implementations?
Should the spaces in source file names be removed?
(There's also an inconsistency with the BItSet+Invariants file name.)
Could some of the tests be extracted into a SetAlgebra conformance checker?
(I haven't reviewed the tests, but I think line 115 is incorrect.)