Conversation

@jmschonfeld (Contributor) commented Nov 9, 2023

This implements the API described by SE-0270 Add Collection Operations on Noncontiguous Elements. Specifically, this adds the RangeSet and DiscontiguousSlice types and associated APIs to the stdlib.
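For readers unfamiliar with the proposal, a rough usage sketch of the new types (the `indices(where:)` entry point and `RangeSet` subscript below reflect the API surface as landed in this PR; treat the exact names as assumptions):

```swift
var numbers = [10, 20, 30, 40, 50, 60]

// A RangeSet collects the (possibly discontiguous) positions of matches.
let bigIndices = numbers.indices(where: { $0 > 20 })

// Subscripting with a RangeSet yields a DiscontiguousSlice view of the base.
let big = numbers[bigIndices]
print(Array(big)) // [30, 40, 50, 60]
```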

@jmschonfeld (Contributor Author)

@swift-ci please smoke test

1 similar comment

@jmschonfeld (Contributor Author)

@swift-ci please smoke test

@jmschonfeld (Contributor Author)

@swift-ci please smoke test

@jmschonfeld (Contributor Author)

@swift-ci please build toolchain macOS platform

@jmschonfeld (Contributor Author)

@swift-ci please smoke test

3 similar comments

@jmschonfeld (Contributor Author)

@swift-ci please smoke test

@jmschonfeld (Contributor Author)

@swift-ci please smoke test

@jmschonfeld (Contributor Author)

@swift-ci please smoke test

@jmschonfeld (Contributor Author)

@swift-ci please smoke test macOS platform

@jmschonfeld jmschonfeld marked this pull request as ready for review November 28, 2023 20:16
@jmschonfeld jmschonfeld requested review from a team, hborla and xedin as code owners November 28, 2023 20:16
  hasher.combine(element)
  count += 1
}
hasher.combine(count) // discriminator
Contributor
What does including count actually get us here?

jmschonfeld (Contributor Author)
@lorentey please correct me if I'm wrong - I believe we added this while discussing this type and comparing it to existing collection types. I believe this is a discriminator for cases where you may have a Hashable parent type such as the following:

struct Parent<Base: Collection>: Hashable where Base.Element: Hashable {
    let slice1: DiscontiguousSlice<Base>
    let slice2: DiscontiguousSlice<Base>
}

Using the count of the slice as a discriminator ensures that two unequal parents such as Parent(slice1: [1], slice2: [2, 3]) and Parent(slice1: [1, 2], slice2: [3]) produce different hash values, resulting in a better hash function.
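A toy model of the collision being avoided (hypothetical helper names, standalone Swift, not the stdlib implementation):

```swift
// Hash a list of integer slices by feeding raw elements only (no delimiting).
// The two unequal parents then feed the hasher the identical stream 1, 2, 3.
func undelimitedHash(_ slices: [[Int]]) -> Int {
  var hasher = Hasher()
  for slice in slices {
    for element in slice { hasher.combine(element) }
  }
  return hasher.finalize()
}

// Prefixing each slice's contribution with its count keeps the streams distinct.
func delimitedHash(_ slices: [[Int]]) -> Int {
  var hasher = Hasher()
  for slice in slices {
    hasher.combine(slice.count)
    for element in slice { hasher.combine(element) }
  }
  return hasher.finalize()
}

// undelimitedHash([[1], [2, 3]]) == undelimitedHash([[1, 2], [3]]) -- always collides
// delimitedHash([[1], [2, 3]])   != delimitedHash([[1, 2], [3]])   -- distinct streams
```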

@lorentey (Member) Nov 29, 2023
Yes -- variable-sized values must use a self-delimiting hash encoding, to prevent (accidental or deliberate) hash collisions for aggregate types that include such values as components, such as @jmschonfeld's struct Parent.

On a second look, I believe combining the count as the last step isn't quite good enough, though -- we'll need to either swallow the cost of retrieving the count in a separate pass and combining it up front, or we need to insert discriminators before each item and a different delimiter at the end:

public func hash(into hasher: inout Hasher) {
  for element in self {
    hasher.combine(1 as UInt8) // discriminator
    hasher.combine(element)
  }
  hasher.combine(0 as UInt8) // delimiter
}

I think retrieving the count up front is generally going to be cheaper. I'll submit a suggestion implementing this in a separate thread.

The expectation is that we would in theory be able to reconstruct/decode the hashed value if we are given its hash encoding, even if some unrelated bytes got appended at the end of it. (If we want to delimit the encoding using a count, then the count needs to appear first, as it wouldn't be distinguishable from actual elements later.)
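A sketch of the count-first variant being suggested, in the same method-fragment shape as the code above (assumes count is cheap enough to retrieve in a separate pass):

```swift
public func hash(into hasher: inout Hasher) {
  // Combining the count first makes the encoding self-delimiting: a decoder
  // reading front-to-back knows where this value's contribution ends, even
  // if unrelated bytes are appended afterward.
  hasher.combine(count)
  for element in self {
    hasher.combine(element)
  }
}
```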

Contributor
This seems like we're taking it too far; you cannot actually prevent all collisions, only make a reasonable best-effort. I buy the rationale as far as using some delimiter, but it's not at all clear to me that we gain anything by using count instead of an arbitrarily chosen delimiter byte.

Member
Getting hashing right is an important detail, because it has security implications. Playing loose with the rules can lead to exploitable vulnerabilities; we should not let the Standard Library go down that path.

For Set/Dictionary to work as promised, we must avoid allowing repeatable hash collisions. Two distinct pieces of input data must not ever reliably hash to the same value.

When hashing general purpose container types, care must be taken to make sure the encoding remains robust even when concatenated with arbitrary other encodings. Delimiting the hash encoding with a specific bit pattern usually doesn't work in this case, as we cannot prevent the element type from emitting the same pattern. (Unlike with special-purpose collections where the Element type is known, such as String.) The two approaches above (counting elements or combining per-instance discriminators) are the two basic techniques we can choose from.

Hashing the count of items requires two passes in this case; using per-instance discriminators would feed significantly more data to the hasher. Of these two options, the former seems preferable.

Member
(The current implementation is great! It explicitly feeds the count to the hasher before hashing any of the items. Accordingly, this can be marked resolved, unless anyone has an objection.)

@jmschonfeld (Contributor Author)

@swift-ci please smoke test

@lorentey (Member) left a comment
I made lots of notes, but most of them are tiny formatting nits with suggestions.

for range in inversion.ranges {
  result.append(contentsOf: self[range])
}
self = result
@lorentey (Member) Nov 29, 2023
Observation: This implementation will not work for noncopyable elements. The most general RRC.removeAll(where:) implementation shares this problem, though, so I don't think it is a real objection, even if we end up generalizing the existing protocols for noncopyable support.

@jmschonfeld (Contributor Author) Nov 30, 2023
Ah, good point. I'd be happy to update it to something compatible, but I can't think of a good way to remove multiple subranges at once otherwise, because any mutation to the collection will invalidate the remaining subrange indices we have yet to process.

Member
I don't think this is implementable in a reasonable way without self-assignment. The question is: would it be better to reformulate this entry point as a nonmutating func removingSubranges API? That would be a more honest description of how it works, and it would directly translate to a consuming algorithm.

Update: I see we already have a removingSubranges API on Collection, but that returns a DiscontiguousSlice, and it would clash with this one. Bummer! (I can't judge if we need to keep both...)

jmschonfeld (Contributor Author)
Yeah, as you mentioned, the current API under review has both, but the removing variant returns a DiscontiguousSlice in constant time, wrapping the base collection rather than creating a copy and removing elements from it.
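A quick sketch of the behavioral difference between the two entry points (assuming the indices(where:), removingSubranges(_:), and removeSubranges(_:) names under review):

```swift
var numbers = [10, 20, 30, 40, 50]
let smallIndices = numbers.indices(where: { $0 < 30 })

// Nonmutating: a constant-time DiscontiguousSlice view over the base;
// no elements are copied or moved.
let remaining = numbers.removingSubranges(smallIndices)
print(Array(remaining)) // [30, 40, 50]

// Mutating: rebuilds the collection in place from the inverted ranges.
numbers.removeSubranges(smallIndices)
print(numbers) // [30, 40, 50]
```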

@jmschonfeld (Contributor Author)

@swift-ci please smoke test

2 similar comments

@jmschonfeld (Contributor Author)

@swift-ci please smoke test

@jmschonfeld (Contributor Author)

@swift-ci please smoke test

@jmschonfeld (Contributor Author)

@swift-ci please build toolchain macOS Platform

@jmschonfeld (Contributor Author)

@swift-ci please smoke test Linux platform

@jmschonfeld jmschonfeld requested a review from lorentey December 13, 2023 17:39
@lorentey (Member) left a comment
I did another review pass -- this still looks good. I added some new notes, incl. some API-level observations.

Because we aren't entirely sure if we'll be able to keep things opaque forever, I recommend exposing ABI entry points for at least the most important high-level helper functions. (If we do end up wishing to make more things @inlinable later, not having the symbols exposed would be extremely painful.)

@jmschonfeld (Contributor Author)

@swift-ci please smoke test

@jmschonfeld (Contributor Author)

@swift-ci please smoke test macOS platform

@jmschonfeld (Contributor Author)

@swift-ci please smoke test

@jmschonfeld (Contributor Author)

Rebased to pick up the new ABI checker tests and added the symbols to fix the macOS test failure

@jmschonfeld (Contributor Author)

@swift-ci please smoke test

@jmschonfeld (Contributor Author)

@swift-ci please smoke test

@available(SwiftStdlib 5.11, *)
extension DiscontiguousSlice.Index: CustomStringConvertible {
  public var description: String {
    "<base: \(String(reflecting: base)), rangeOffset: \(_rangeOffset)>"
Member
Observation: this will be quite unreadable when the reflection of base produces a structured multi-line string. I'm not sure what we can practically do about that here -- it is a solvable problem, but it requires adding machinery for context-sensitive string embedding that would not be appropriate to implement in this PR.

@lorentey (Member) left a comment
Looks good!

The big caveat is that this is an opaque Collection type, a big deviation from most previous stdlib additions. (The closest existing analogue is probably CollectionDifference.) I have precious little working experience with these, so I can't fully foresee what problems we'll need to prepare for. However, I think the current code does as good a job as can be expected.

My second worry is about test coverage, especially of corner cases. The test-to-code ratio is pretty low -- the 586 lines of new tests are unlikely to cover all of the new functionality. (Especially as these are all concrete test cases.) Without exhaustive tests, we do not know whether this will actually work as intended.

We should probably have some combinatorial tests to validate the consistency of the protocol conformances. Collection is a very tricky protocol, and this feature is adding multiple new implementations of it, some quite subtle.

The standard checkHashable/checkCollection/checkBidirectionalCollection/checkMutableCollection/etc. routines implement some exhaustive checks to try to catch typical implementation issues. They only validate a subset of our semantic requirements, but they are great at checking things we'd not usually think to explicitly test for.

The StdlibUnittest modules also include some test collection implementations (Minimal*Collection) that are intended to exercise hidden corners in collection algorithms and lazy transformations like DiscontiguousSlice. (They implement protocol requirements with no shortcuts (sometimes in the most awkward/unhelpful manner), and they come with extra hooks to e.g. verify that expected calls do occur.)
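To make the corner-case concern concrete, here are a few of the cheap invariant checks worth covering, written as plain assertions (the StdlibUnittest check* harnesses and Minimal*Collection types mentioned above would do this exhaustively in-tree; the entry-point names are assumed from the API under review):

```swift
let numbers = [10, 20, 30, 40]

// An empty RangeSet removes nothing.
assert(Array(numbers.removingSubranges(RangeSet())) == numbers)

// A RangeSet covering every index removes everything.
let all = numbers.indices(where: { _ in true })
assert(Array(numbers.removingSubranges(all)).isEmpty)

// The matched slice and the removing slice partition the base collection.
let twenties = numbers.indices(where: { $0.isMultiple(of: 20) })
assert(Array(numbers[twenties]) + Array(numbers.removingSubranges(twenties))
         == [20, 40, 10, 30])
```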

@jmschonfeld jmschonfeld merged commit 2404013 into swiftlang:main Jan 9, 2024
@jmschonfeld jmschonfeld deleted the rangeset-revival branch January 10, 2024 16:47
Catfish-Man pushed a commit to Catfish-Man/swift that referenced this pull request Jan 19, 2024
…ang#69766)

* Adds RangeSet/DiscontiguousSlice to the stdlib

* Remove redundant DiscontiguousSlice.Index: Comparable conformance

* Attempt to fix embedded build

* Attempt to fix macOS test failures

* Fix Constaints/members.swift failure on linux

* Add exceptions to ABI/source checker to fix macOS tests

* Fix incremental dependency test failure

* Remove inlining/unfreeze implementation for future improvements

* Simplify indices(where:) implementation

* Address review feedback

* Add test for underscored, public slice members

* Address feedback on inlining, hashing, and initializing with unordered arrays

* Fix ABI checker issues

* Remove MutableCollection extension for DiscontiguousSlice

* Make insertion return a discardable Bool

* Fix ABI checker tests

* Fix other ABI checker tests due to dropping MutableCollection subscript