From f468a3f46c067f3a865f7167dcaef7037d5253c7 Mon Sep 17 00:00:00 2001
From: Nate Cook
Date: Tue, 26 Apr 2022 07:35:12 -0500
Subject: [PATCH 1/2] Add a section describing 'find empty' behavior

---
 .../Evolution/StringProcessingAlgorithms.md | 41 ++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/Documentation/Evolution/StringProcessingAlgorithms.md b/Documentation/Evolution/StringProcessingAlgorithms.md
index 001ce1fec..5d49eaa8c 100644
--- a/Documentation/Evolution/StringProcessingAlgorithms.md
+++ b/Documentation/Evolution/StringProcessingAlgorithms.md
@@ -1028,11 +1028,50 @@ extension RangeReplaceableCollection where Element: Equatable {
 [SE-0346]: https://github.com/apple/swift-evolution/blob/main/proposals/0346-light-weight-same-type-syntax.md
 [stdlib-pitch]: https://forums.swift.org/t/pitch-primary-associated-types-in-the-standard-library/56426
 
+#### Searching for empty strings and matches
+
+Empty matches and inputs are an important edge case for several of the algorithms proposed above. For example, what is the result of `"123".firstRange(of: /[a-z]*/)`? How do you split a collection separated by an empty collection, as in `"1234".split(separator: "")`? For the Swift standard library, this is a new consideration, as current algorithms are `Element`-based and cannot be passed an empty input.
+
+Languages and libraries are nearly unanimous about finding the location of an empty string, with Ruby, Python, C#, Java, JavaScript, etc., finding an empty string at each index in the target. Notably, Foundation's `NSString.range(of:)` does _not_ find an empty string at all.
+
+The methods proposed here follow the consensus behavior, which makes sense if you think of `a.firstRange(of: b)` as returning the first subrange `r` where `a[r] == b`. If a regex can match an empty substring, like `/[a-z]*/`, the behavior is the same.
+
+```swift
+let hello = "Hello"
+let emptyRange = hello.firstRange(of: "")
+// emptyRange is equivalent to '0..<0' (integer ranges shown for readability)
+```
+
+Because searching again at the same index would yield that same empty string, we advance one position after finding an empty string or matching an empty pattern when finding all ranges. This yields the position of every valid index in the string.
+
+```swift
+let allRanges = hello.ranges(of: "")
+// allRanges is equivalent to '[0..<0, 1..<1, 2..<2, 3..<3, 4..<4, 5..<5]'
+```
+
+Splitting with an empty separator (or a pattern that matches an empty string) uses this same behavior, resulting in a collection of single-element substrings. Interestingly, a couple of languages make different choices here. C# returns the original string instead of its parts, and Python rejects an empty separator (though it permits regexes that match empty strings).
+
+```swift
+let parts = hello.split(separator: "")
+// parts == ["H", "e", "l", "l", "o"]
+
+let moreParts = hello.split(separator: "", omittingEmptySubsequences: false)
+// moreParts == ["", "H", "e", "l", "l", "o", ""]
+```
+
+Finally, searching for an empty string within an empty string yield, as you might imagine, the empty string:
+
+```swift
+let empty = ""
+let range = empty.firstRange(of: empty)
+// empty == empty[range]
+```
+
 ## Alternatives considered
 
 ### Extend `Sequence` instead of `Collection`
 
-Most of the proposed algorithms are necessarily on `Collection` due to the use of indices or mutation. `Sequence` does not support multi-pass iteration, so even `trimPrefix` would problematic on `Sequence` because it needs to look 1 `Element` ahead to know when to stop trimming.
+Most of the proposed algorithms are necessarily on `Collection` due to the use of indices or mutation. `Sequence` does not support multi-pass iteration, so even `trimmingPrefix` would be problematic on `Sequence` because it needs to look one `Element` ahead to know when to stop trimming and would need to return a wrapper for the in-progress iterator instead of a subsequence.
 
 ### Cross-proposal API naming consistency
 

From 212cd743bb51cf5afc18e24f649124952f40f18a Mon Sep 17 00:00:00 2001
From: Nate Cook
Date: Tue, 26 Apr 2022 08:29:51 -0500
Subject: [PATCH 2/2] Update Documentation/Evolution/StringProcessingAlgorithms.md

Co-authored-by: Michael Ilseman
---
 Documentation/Evolution/StringProcessingAlgorithms.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/Evolution/StringProcessingAlgorithms.md b/Documentation/Evolution/StringProcessingAlgorithms.md
index 5d49eaa8c..365fc4c87 100644
--- a/Documentation/Evolution/StringProcessingAlgorithms.md
+++ b/Documentation/Evolution/StringProcessingAlgorithms.md
@@ -1059,7 +1059,7 @@ let moreParts = hello.split(separator: "", omittingEmptySubsequences: false)
 // moreParts == ["", "H", "e", "l", "l", "o", ""]
 ```
 
-Finally, searching for an empty string within an empty string yield, as you might imagine, the empty string:
+Finally, searching for an empty string within an empty string yields, as you might imagine, the empty string:
 
 ```swift
 let empty = ""