[SR-3342] Investigate Different Bridged Collection Allocation Layouts #45930
Labels: good first issue, improvement, performance, standard library
Additional Detail from JIRA
md5: 6a7560206774e41295068639d7aa9105
Issue Description:
(Note: all of the arguments found below also apply to Dictionary and Set; they have the same design.)
The current design of an Array bridged to NSArray involves three potential allocations in the non-verbatim bridged case (e.g. `Array<Int> as NSArray`):
A: The Native Array storage (includes count/capacity metadata)
B: The Bridged Array storage
C: _SwiftDeferredNSArray, a class that just stores pointers to A and B
Today when we bridge an Array we end up with this layout:
design 1:
On first access of C as an NSArray, B is allocated and populated with the bridged contents of A. The idea behind this design is that an API may request that an NSArray be constructed and then never access it. In that case, deferring the construction of B saves a lot of work.
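To make the deferral concrete, here is a minimal pure-Swift model of design 1. Every type name in it (`NativeStorage`, `BridgedStorage`, `DeferredBridge`, `Box`) is a hypothetical stand-in, not the real standard library implementation, and it ignores thread safety, which the real code has to handle atomically.

```swift
// Hypothetical stand-ins; none of these are the real standard library types.
final class Box {                       // stands in for a bridged element (e.g. an NSNumber)
    let value: Int
    init(_ value: Int) { self.value = value }
}

final class NativeStorage {             // A: native elements (count/capacity live here)
    let elements: [Int]
    init(_ elements: [Int]) { self.elements = elements }
}

final class BridgedStorage {            // B: one object per native element
    let objects: [AnyObject]
    init(_ objects: [AnyObject]) { self.objects = objects }
}

final class DeferredBridge {            // C: plays the role of _SwiftDeferredNSArray
    let native: NativeStorage
    private(set) var bridged: BridgedStorage?   // nil until the first access

    init(native: NativeStorage) { self.native = native }

    // Not thread-safe; this sketch only shows the laziness.
    func object(at index: Int) -> AnyObject {
        if bridged == nil {
            bridged = BridgedStorage(native.elements.map { Box($0) as AnyObject })
        }
        return bridged!.objects[index]
    }
}

let c = DeferredBridge(native: NativeStorage([1, 2, 3]))
let untouched = (c.bridged == nil)      // bridging alone allocates only C
let second = c.object(at: 1) as! Box    // first access allocates and fills B
```

If `object(at:)` is never called, B is never allocated, which is exactly the case design 1 optimizes for.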
That said: can we eliminate one of these allocations? I think we can all agree A and B should be separate to avoid massive space wastage on never-bridged arrays. The question is whether we should fold C into A or B, producing:
design 2:
Or
design 3:
The standard library team concluded design 2 is incredibly dubious, as it has many significant disadvantages:
Requires Array operations to get bogged down in atomically invalidating B.
Leaks B, as it lies hidden in every native Array regardless of how many "bridged" Arrays are left.
Exposes A to objc_setAssociatedObject, which means the compiler can't optimize deinits of Arrays in the same way (they can have arbitrary hidden side-effects).
Even so, this design has one advantage: it makes Array verbatim bridgeable, so bridging an Array of Arrays is a no-op. It's still not toll-free, because B must be constructed and populated on first access.
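A rough model of design 2 shows where the first two costs come from. `NativeStorageWithCache` and `Box` are hypothetical names, and the plain optional stands in for what would have to be an atomically updated pointer to B.

```swift
// Hypothetical stand-ins; this is a sketch, not how the standard library is written.
final class Box {                       // stands in for a bridged element
    let value: Int
    init(_ value: Int) { self.value = value }
}

final class NativeStorageWithCache {    // A with C folded in
    private(set) var elements: [Int]
    private(set) var bridgedCache: [AnyObject]?   // hidden B, carried by every native array

    init(_ elements: [Int]) { self.elements = elements }

    // Every native mutation now has to invalidate B.
    // In a real implementation this would need to be atomic.
    func append(_ x: Int) {
        elements.append(x)
        bridgedCache = nil
    }

    func bridgedObject(at index: Int) -> AnyObject {
        if bridgedCache == nil {
            bridgedCache = elements.map { Box($0) as AnyObject }
        }
        return bridgedCache![index]
    }
}

let a = NativeStorageWithCache([1, 2])
_ = a.bridgedObject(at: 0)                    // B is built and cached inside A
let cachedBefore = (a.bridgedCache != nil)
a.append(3)                                   // a plain Swift mutation must drop B
let cachedAfter = (a.bridgedCache != nil)
```

Note that between the first bridged access and the next mutation, B stays alive for as long as A does, even if nobody holds a bridged reference any more.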
Design 3, however, is legitimately interesting. It avoids an extra allocation at the cost of always making the (much larger) B allocation. The initialization of B can still be deferred by storing an atomic flag and running a CAS loop on it (currently the pointer to B is used as this flag). Basically this design could be worth it if it turns out most bridged Arrays are actually used, making deferred allocation a waste.
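A sketch of design 3 under the same caveats: `EagerBridgedStorage`, `NativeStorage`, and `Box` are hypothetical names, and an `NSLock` plus a `Bool` stand in for the atomic flag and CAS loop, since this is only meant to show the shape of the trade-off.

```swift
import Foundation

// Hypothetical stand-ins; a lock replaces the atomic flag + CAS loop for simplicity.
final class Box {                       // stands in for a bridged element
    let value: Int
    init(_ value: Int) { self.value = value }
}

final class NativeStorage {             // A: native elements
    let elements: [Int]
    init(_ elements: [Int]) { self.elements = elements }
}

final class EagerBridgedStorage {       // B with C folded in
    let native: NativeStorage           // pointer back to A
    private var slots: [AnyObject?]     // allocated up front, one slot per element
    private var isPopulated = false     // separate flag; B's pointer can no longer serve as one
    private let lock = NSLock()

    init(native: NativeStorage) {
        self.native = native
        self.slots = Array(repeating: nil, count: native.elements.count)
    }

    var populated: Bool {
        lock.lock()
        defer { lock.unlock() }
        return isPopulated
    }

    func object(at index: Int) -> AnyObject {
        lock.lock()
        defer { lock.unlock() }
        if !isPopulated {               // population is still deferred to first access
            for (i, x) in native.elements.enumerated() { slots[i] = Box(x) }
            isPopulated = true
        }
        return slots[index]!
    }
}

let b = EagerBridgedStorage(native: NativeStorage([10, 20, 30]))
let populatedBefore = b.populated       // slots exist, but nothing has been bridged yet
let middle = b.object(at: 1) as! Box    // first access fills every slot
let populatedAfter = b.populated
```

Compared with design 1 this saves the separate C object, but the slot buffer is paid for on every bridge, whether or not anyone ever reads from it.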
I suspect this isn't true, and Design 1 is actually the best one. A Dictionary containing Arrays is a good example of something that produces lots of bridged arrays that will probably never be accessed.
As such the Swift team has no intent to work on this. But this is a great issue for a Swift community member to look into! I'm happy to mentor anyone who wants to investigate.