I had a need for a data structure recently and couldn't find any good package implementations. I think it might be a nice addition to this package.
Essentially, the problem was that I needed to generate some Unicode data tables. This required a structure that allowed me to model the entire 21-bit space of Unicode code points, and essentially "paint" regions with data I parsed from the Unicode files. The second part was that I had to optimise the data. For example, of the dozen or so Bidi_Class values, I only needed data to group scalars into 4 categories. Certain code points would be disallowed by earlier stages of the process and could be skipped, and I wanted to combine data sets from different files in order to reduce lookups. At the same time, Unicode scalars are sparse; most of that 21-bit space is unassigned.
Anyway, I had a brief look at the various tree candidates and nothing really suited me. I don't really care so much about lookup time, since the emphasis was on creating something that could support flexible editing operations. I eventually made a simple sort of number-line, which, well, paints regions of a comparable space with a value:
    var line = SegmentedLine<Int, String?>(bounds: 0..<100, value: nil)

    // After setting values <5 to "small" and values >10 to "large",
    // the gap is left with its previous value, "medium".
    line.set(0..<20, to: "medium")
    line.set(0..<5, to: "small")
    line.set(10..<60, to: "large")
    print(line)
    // | [0..<5]: "small" | [5..<10]: "medium" | [10..<60]: "large" | [60..<100]: nil |

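
To make the "painting" behaviour concrete, here is a minimal sketch of how such a `set` operation could work. `PaintedLine` is a hypothetical, simplified stand-in written for this illustration; it is not the `SegmentedLine` implementation and makes no attempt at efficiency. It assumes segments are kept sorted and always tile the bounds exactly.

```swift
// Hypothetical, simplified stand-in for SegmentedLine (illustration only).
struct PaintedLine<Bound: Comparable, Value> {
  // Invariant: segments are non-empty, sorted, and tile the bounds with no gaps.
  private(set) var segments: [(range: Range<Bound>, value: Value)]

  init(bounds: Range<Bound>, value: Value) {
    precondition(!bounds.isEmpty, "bounds must not be empty")
    segments = [(bounds, value)]
  }

  // Paints `target` with `newValue`, splitting any segment it only partly covers.
  mutating func set(_ target: Range<Bound>, to newValue: Value) {
    let lower = segments.first!.range.lowerBound
    let upper = segments.last!.range.upperBound
    precondition(lower <= target.lowerBound && target.upperBound <= upper, "out of bounds")
    guard !target.isEmpty else { return }

    var result: [(range: Range<Bound>, value: Value)] = []
    var painted = false
    for segment in segments {
      // Keep the part of this segment that lies before the target.
      if segment.range.lowerBound < target.lowerBound {
        let end = min(segment.range.upperBound, target.lowerBound)
        result.append((segment.range.lowerBound..<end, segment.value))
      }
      // Emit the painted region once, at the first segment that overlaps it.
      if !painted && segment.range.upperBound > target.lowerBound {
        result.append((target, newValue))
        painted = true
      }
      // Keep the part of this segment that lies after the target.
      if segment.range.upperBound > target.upperBound {
        let start = max(segment.range.lowerBound, target.upperBound)
        result.append((start..<segment.range.upperBound, segment.value))
      }
    }
    segments = result
  }
}

var line = PaintedLine<Int, String?>(bounds: 0..<100, value: nil)
line.set(0..<20, to: "medium")
line.set(0..<5, to: "small")
line.set(10..<60, to: "large")
for (range, value) in line.segments {
  print(range, value ?? "nil")
}
```

Run against the same three `set` calls as the example above, this sketch ends up with the same four segments: `0..<5`, `5..<10`, `10..<60` and `60..<100`. Rebuilding the whole array on every write is only for clarity; a real implementation would presumably locate the affected segments and splice in place.
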
    let string = "Bob is feeling great"

    // Create a SegmentedLine for the collection's contents.
    // Start by setting a font attribute over the entire string.
    var tags = SegmentedLine(
      bounds: string.startIndex..<string.endIndex,
      value: [Font.custom("Comic Sans")] as [Any]
    )

    // Set each word to a different color.
    // Use 'modify' to append the attribute, but only for the region
    // we're modifying.
    for word: Substring in string.split(separator: " ") {
      tags.modify(word.startIndex..<word.endIndex) { attributes in
        attributes.append(Color.random())
      }
    }

    // Check the result.
    // - ✅ Every segment still contains the font attribute.
    // - ✅ Each word also contains its individual color attribute.
    for (range, attributes) in tags.segments {
      print(#""\#(string[range])""#, "-", attributes)
    }
    // "Bob" [Font.custom("Comic Sans"), Color.orange]
    // " " [Font.custom("Comic Sans")]
    // "is" [Font.custom("Comic Sans"), Color.green]
    // " " [Font.custom("Comic Sans")]
    // "feeling" [Font.custom("Comic Sans"), Color.pink]
    // " " [Font.custom("Comic Sans")]
    // "great" [Font.custom("Comic Sans"), Color.yellow]

    // ℹ️ Imagine we have a complex SegmentedLine with lots of small segments
    //    capturing granular details, and we'd like to simplify it.

    enum ComplexData {
      case categoryA, categoryB, categoryC // ...
    }

    let complexLine: SegmentedLine<Int, ComplexData> = // ...
    print(complexLine)
    // | [0..<2]: categoryA | [2..<4]: categoryB | [4..<12]: categoryC | ...

    // 1️⃣ Perhaps we can map these to a smaller number of states.

    enum SimplifiedData {
      case valid, invalid
    }

    var simplifiedLine = complexLine.mapValues { complex in
      SimplifiedData(validating: complex)
    }
    print(simplifiedLine)
    // | [0..<2]: valid | [2..<4]: valid | [4..<12]: valid | ...

    // 2️⃣ Notice that we have lots of segments for boundaries which
    //    are no longer important. 'combineSegments' can clean them up.

    simplifiedLine.combineSegments()
    print(simplifiedLine)
    // | [0..<2000]: valid | [2000..<2024]: invalid | [2024..<2056]: valid | ...

It has been super helpful to have this thing. In particular, to do something like reduce the Bidi_Class values, there are 2 dead-simple operations: one to map the elements, and one to perform a kind of restartable left-fold that gathers elements into larger regions. Like this. I actually generate an indexed static data table straight from the SegmentedLine. It's pretty nifty, I think.
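
As a rough sketch of that two-step reduction (map, then a restartable left-fold), the following works on a plain array of segments rather than on `SegmentedLine` itself. Everything here is an assumption made for illustration: the `Segment` type, the free functions, the `SimpleClass` grouping, and the sample ranges, which are made up and are not real Unicode data.

```swift
// Hypothetical segment type: a half-open range of code points plus a value.
struct Segment<Value> {
  var range: Range<UInt32>
  var value: Value
}

// Step 1: map every value, leaving the segment boundaries untouched.
func mapValues<T, U>(_ segments: [Segment<T>], _ transform: (T) -> U) -> [Segment<U>] {
  segments.map { Segment(range: $0.range, value: transform($0.value)) }
}

// Step 2: a restartable left fold. Each segment is folded into the previous
// result while `shouldMerge` agrees; otherwise the fold restarts from it.
func combineSegments<T>(
  _ segments: [Segment<T>],
  while shouldMerge: (_ accumulated: T, _ next: T) -> Bool
) -> [Segment<T>] {
  var result: [Segment<T>] = []
  for segment in segments {
    if let last = result.last,
       last.range.upperBound == segment.range.lowerBound,
       shouldMerge(last.value, segment.value) {
      // Extend the accumulated segment to cover this one as well.
      result[result.count - 1].range = last.range.lowerBound..<segment.range.upperBound
    } else {
      // Restart the fold with this segment as the new accumulator.
      result.append(segment)
    }
  }
  return result
}

enum DetailedClass { case l, r, al, en, an }   // a handful of Bidi_Class-like values
enum SimpleClass { case ltr, rtl, number }     // hypothetical reduced grouping

// Made-up ranges, chosen only to show the merge; not real Unicode assignments.
let detailed: [Segment<DetailedClass>] = [
  Segment(range: 0..<10, value: .l),
  Segment(range: 10..<20, value: .r),
  Segment(range: 20..<30, value: .al),
  Segment(range: 30..<40, value: .en),
  Segment(range: 40..<50, value: .an),
  Segment(range: 50..<60, value: .l),
]

let simplified = mapValues(detailed) { (value: DetailedClass) -> SimpleClass in
  switch value {
  case .l: return .ltr
  case .r, .al: return .rtl
  case .en, .an: return .number
  }
}
let combined = combineSegments(simplified) { $0 == $1 }

for segment in combined {
  print(segment.range, segment.value)
}
```

After mapping, the `r`/`al` and `en`/`an` neighbours carry equal values, so the fold merges them and the six input segments collapse to four. Passing the merge condition as a closure is one way to get the "restartable" behaviour; equality is just the simplest condition to demonstrate.
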
It's nothing particularly groundbreaking, but it has been surprisingly effective and, as I said, I couldn't find any good package solution to solve this. I think it's an interesting design space to think of operations that this kind of structure could make easier.
@lorentey I added a modify operation and some better examples to the original post, along with a link to the finished implementation (now called SegmentedLine). There are tests in the package, too, of course.
Implementation:
https://github.com/karwa/swift-url/blob/df08ccec114350c4bb845d28c5d5850c20521cca/Sources/UnicodeDataStructures/Shared/SegmentedLine.swift