-
Notifications
You must be signed in to change notification settings - Fork 10.5k
DataProtocol and new inline Data #20225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DataProtocol and new inline Data #20225
Conversation
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@airspeedswift this is what we are expecting for ContiguousCollection
. Let me know if we need to adjust this. I presume our work will just be predicated on approval of your proposals for this. I just wanted to get this churning on CI first and iron out any performance issues before hand.
stdlib/public/core/String.swift
Outdated
return try body(UnsafeRawBufferPointer(start: ptr, count: len)) | ||
} | ||
} | ||
extension UnsafeBufferPointer: _HasContiguousBytes {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@milseman I presume the _HasContiguousBytes needs to remain for ABI stability. But the functionality was moved to ContiguousCollection
. Do you foresee any issue with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(We discussed this a bit yesterday — the primary use here for _HasContiguousBytes
is as an existential, so, ContiguousCollection
can't be used directly. However, it might be possible to parent ContiguousCollection
on something like this which exposes the functionality anyway.)
stdlib/public/core/Sequence.swift
Outdated
@@ -548,6 +548,10 @@ public protocol Sequence { | |||
__consuming func _copyContents( | |||
initializing ptr: UnsafeMutableBufferPointer<Element> | |||
) -> (Iterator,UnsafeMutableBufferPointer<Element>.Index) | |||
|
|||
func withUnsafeBytesIfSupported<T>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@airspeedswift Again this is our presumption of spelling for the Sequence
methods
@@ -142,7 +163,7 @@ func sampleData(_ type: SampleKind) -> Data { | |||
} | |||
|
|||
func benchmark_AccessBytes(_ N: Int, _ data: Data) { | |||
for _ in 1...10000*N { | |||
for _ in 0..<10000*N { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eeckstein another surprising performance hit, I noticed some really heavy traffic on ClosedRange<Int>.count
(it seems O(N)?!) Changing to Range<Int>
reduced the benchmark chatter.
return CollectionOfOne(self) | ||
} | ||
|
||
public var _borrowedBytes: (buffer: UnsafeRawBufferPointer, owner: Any)? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@airspeedswift this is a potential optimization point we would really like to have but is currently completely removable.
} | ||
|
||
public var _borrowedBytes: (buffer: UnsafeRawBufferPointer, owner: Any)? { | ||
return withUnsafeBytes { (buffer) -> (buffer: UnsafeRawBufferPointer, owner: Any) in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This, of course, pending approval from the stdlib for pulling tricks like this. The alternative is to move _borrowedBytes
down into Sequence
or something, which would really give us what we want: a way to get partial ownership of the underlying buffer without copying.
|
||
|
||
extension Data : MutableDataProtocol, MutableContiguousCollection { | ||
// -- ContiguousCollection ------------------------------------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a style thing I think we can likely get rid of dividers like these from my initial revision
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok will do
public protocol DataProtocol : RandomAccessCollection where Element == UInt8, SubSequence : DataProtocol { | ||
// -- Core Requirements ---------------------------------------------------- | ||
// FIXME: Remove in favor of opaque type on `regions`. | ||
associatedtype RegionSequence: Sequence where RegionSequence.Element : DataProtocol & ContiguousCollection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty big FIXME that we're going to need to address. We might want to file a Radar to track this separately internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Filed 45739467)
func copyBytes<DestinationType, R: RangeExpression>(to: UnsafeMutableBufferPointer<DestinationType>, from: R) -> Int where R.Bound == Index | ||
|
||
|
||
var _borrowedBytes: (buffer: UnsafeRawBufferPointer, owner: Any)? { get } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we end up with this here, we should add a reminder to actually document this and what it does.
@@ -932,6 +932,8 @@ extension ContiguousArray: RangeReplaceableCollection { | |||
} | |||
} | |||
|
|||
extension ContiguousArray : ContiguousCollection {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should EmptyCollection
be ContiguousCollection
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It likely can't hurt, especially if we're doing this for CollectionOfOne
.
} | ||
} | ||
|
||
extension CollectionOfOne : DataProtocol where Element == UInt8 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should EmptyCollection where Element == UInt8
be DataProtocol
?
@swift-ci Please test |
@swift-ci Please benchmark |
Build comment file:Build failed before running benchmark. |
Build failed |
@swift-ci Please test |
@swift-ci Please benchmark |
Build failed |
Build failed |
Build comment file:Build failed before running benchmark. |
@swift-ci Please test |
Build failed |
Build failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I'm just focussing on whether the new API's are type safe.
let concreteRange = range.relative(to: self) | ||
let slice = self[concreteRange] | ||
|
||
return ptr.withMemoryRebound(to: UInt8.self) { buffer -> Int in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, binding or rebinding the memory type is a huge red flag--I
only expect to see it a workaround for some legacy API. In this case,
you want to view a typed buffer as a sequence of bytes, so
ptr.withUnsafeBytes()
is what you want. Why won't that work?
|
||
@discardableResult | ||
public func copyBytes<R: RangeExpression>(to ptr: UnsafeMutableRawBufferPointer, from range: R) -> Int where R.Bound == Index { | ||
let bound = ptr.bindMemory(to: UInt8.self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't figure out why you want to bind memory here, but I can assure
you that it leads to undefined behavior.
_ body: (UnsafeBufferPointer<UInt8>) throws -> R | ||
) rethrows -> R { | ||
% if mutable: | ||
return try body(UnsafeBufferPointer<UInt8>(self.bindMemory(to: UInt8.self))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes! This API is outright incompatible with Swift's type safety. If it was legal to do this, then we would have no reason for UnsafeRawPointer
.
@atrick Please ignore my comments above — I realize where the deficiency in my mental model is. I must have previously missed the following paragraphs on typed memory from https://developer.apple.com/documentation/swift/unsaferawpointer:
Specifically, the fact that binding is not temporary is our issue here. I'll push out fixes tomorrow, but we might have an issue with conforming some pointer types to |
Build comment file:Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
Code size: -swiftlibs
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
Looks like CI benchmarks are showing the same conditional conformance devirtualization issues we've been seeing locally. Good to have a baseline for Andy's fixes |
@atrick Thanks for all of your hard work on this stuff! Looks like we're hitting a lot of unfortunate edge cases. |
@swift-ci Please test macOS |
Build failed |
^That's an lldb test failure, expecting certain lldb output when breakpoints are set. Likely due to the fact that Data's layout has changed. Unfortunately, running those same commands segfaults lldb on my machine so I can't even tell what the output should be. @jrose-apple Who might we be able to turn to to ask about this? Running through what lldb is doing in |
Looks like we'll need to make Not sure how possible this will be given that we no longer have a stored property for the length, and I'm not sure we can switch on Swift enums from C++. |
@dcci, who owns the summary providers these days? And how is this supposed to work across OS releases? (which I guess won't be a problem after Foundation lands this work, but…) |
For now the plan is to take out the data provider here and XFAIL the test if necessary so we're unblocked; we can figure out how to add this back in later (if at all) considering that the layout of |
8dd8dd9
to
d030354
Compare
@swift-ci please smoke test |
Please test with following pull request: @swift-ci Please test |
Build failed |
@swift-ci Please benchmark |
Build comment file:Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
Code size: -swiftlibs
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
(Now that tests have passed on this and apple/swift-lldb#1134, I plan on merging them in parallel once we have approval.) |
//===--- Collection Conformances ------------------------------------------===// | ||
|
||
// FIXME: When possible, expand conformance to `where Element : Trivial`. | ||
extension Array : ContiguousBytes where Element == UInt8 { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given this will become ABI and can't be loosened later, are you sure you want to add this? You can use withContiguousStorageIfAvailable
instead in this case, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I don't mind adding it, just thinking you might want to reserve flexibility for yourself)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot use withContiguousStorageIfAvailable because this is being used as a protocol to identify it as a contiguous DataProtocol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@airspeedswift Is this because the conditional conformance produces a symbol name based on the associated type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@itaiferber Right. And you can only have one conditional conformance. I can envision ABI-stable ways to loosen conditional conformances in the future, but we don't have any right now. cc @DougGregor
@phausler but you have an init from a general sequence of UInt8, and so can have Array follow that path then fast-path into withContiguousStorageIfAvailable, no? Might amount to much the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@airspeedswift Sure, the conditional conformance limitation I'm aware of. In the future, we're almost certainly going to need an ABI-stable way to loosen these anyway; even if not, we're willing to live with this for now.
We're also not using this just in init<S : Sequence>
— it's important for general consumers of DataProtocol
who rely on ContiguousBytes
for the associatedtype
requirement.
@itaiferber Should the title and initial comment be updated? ( |
|
@benrimmington I've updated the PR title and comment. |
This is an implementation of DataProtocol and a new smaller and leaner struct Data. The major change here is that we now promise that all Data's are contiguous. In order to enforce this requisite we added a new protocol to define contiguous collections (aptly named ContiguousCollection). This protocol is a draft of the work in progress proposed API for the standard library. Additionally it offers a direct byte accessor if safe on sequence to unify the sequence initialization to allow for fast paths.
Data itself now is represented in 16 bytes(on 64 bit platforms). This allows it to be stored in existentials without needing to heap allocate. Additionally during this refactor to shrink the representation we were able to make similar wins to the small string optimization so that we can store Data's on the stack when under a platform limit. This makes small Data structures never have to heap allocate. For example:
Data()
has zero heap allocations andData([0xd, 0xa])
wont heap allocate for the storage of the data either since small representations can be stored completely on the stack efficiently. Larger data or slices however will still have a reference type storing the backing allocation as they did before. Our hope is that this will reduce the traffic on malloc which will reduce memory fragmentation and be considerably more efficient for small buffers.