-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Snapshots on Apple Silicon devices #424
Comments
One possible (bad) solution could be to allow for snapshot variants based on architecture. Unfortunately, this would mean recording multiple times which would almost certainly become a nightmare. |
this and a few other open issues seem to be duplicates of #313 but knowing of more scenarios leading to the same error is great for prioritizing. I've also recently upgraded to an M1 and can confirm that like in the original issue, this doesn't happen on iOS 12, and on iOS 13+ the difference image has pixels off most usually by 1, which isn't visible in the image itself (looks plain black) but post-filtering with a simple Threshold tool will reveal those pixels edit: our team has a mix of Intel and Apple Silicon so the "near future" is right now 😅 |
I have implemented a custom assertSnapshot in the referenced PR above for anyone needing to work in Silicon/Intel mixed teams. You can find it here on line 22. While I don't think it's a good solution, I don't know what a better alternative would be. |
In my case snapshots began rendering identically under Apple Silicon after I added 'Host Application' for unit test targets. I had to use Host App because Xcode 12.4 running under Rosetta can't launch unit tests without it. Before that I considered adding precision:0.99 to asserts, but it's also not a good solution. In my case there were differences in rounded corners clipping |
Hey @voidless. Interesting to know about the "Host Application" fixing the issue. Unfortunately, the "Host Application" cannot be always used, so I'm afraid that's not an option for a lot of us. |
Can you give an example? We use host apps generated by cocoapods, they only have almost empty appdelegate.m, main.storyboard files and info.plist and are linked with the corresponding framework. |
I guess that could work, but I wonder if there's an easier way not to add all that noisy and cumbersome infrastructure (even if Cocoapods automates part of that). Also, AFAIK, you cannot have a host app for Swift Packages... |
We have this issue only when rendering shadows. We were having this issue on a view that has a drop shadow. After removing the drop shadow from the view here, it works the same for both M1 and Intel. Still could not really understand why nor how to fix it yet 😞 UPDATE: |
We also have a team with a mix of Intel and M1, plus our CI is now Intel but it is bound to be M1 someday. We were already using Host Application, we tried different iOS versions, tried with and without Rosetta both on Xcode.app and Simulator.app without any luck. Having different snapshots for each architecture would not work for us, as people with an M1 can't generate snapshots for Intel and viceversa, and that would be bound to fail in CI. We opted to temporarily (until we are all using arm64) lower the |
We also have the same problem when using shadowOpacity. tests are failing, we also changed the precious but not working |
I submitted a pull request adding a fuzzy compare implemented in objective-c which is 30 times faster than the vanilla matcher under worst-case conditions (all pixels compared.) This matcher will report failure if any pixel component absolute difference exceeds a user provided maximum -OR- if the average pixel component absolute difference exceeds a user provided maximum. Only CGImage, NSImage, and UIImage are supported but I will be adding fuzzy compare for other matchers soon. |
I'm not 100% sure, but in our case we get differences in rendering some custom icon even when using host apps. |
I wonder if any kind of solution that employs imprecise matching would result in loads of "modified" snapshots generated every time they're re-recorded on a "mismatching" platform: as far as I understand, there's currently no way to avoid recording a new variant of a snapshot if the new version of the snapshot still fuzzy matches the original: To re-record automatically, we delete the snapshots and run the tests - there's no chance in this case that fuzzy matching would prevent recording of new versions... OTOH, when "re-recording" manually, I can pass Overall, how do you people handle recording of imprecisely matching snapshots in general? |
Any update on this? Do you have a branch you can refer to for this work? |
Is there a chance that this can be addressed somehow? @stephencelis |
I've found that forcing the snapshots to be taken in sRGB seems to work. I do this in the device descriptions: public static func iPhoneSe(_ orientation: ViewImageConfig.Orientation)
-> UITraitCollection {
let base: [UITraitCollection] = [
.init(displayGamut: .SRGB),
.init(forceTouchCapability: .available),
.init(layoutDirection: .leftToRight),
.init(preferredContentSizeCategory: .medium),
.init(userInterfaceIdiom: .phone)
]
...
public static func iPhone8Plus(_ orientation: ViewImageConfig.Orientation)
-> UITraitCollection {
let base: [UITraitCollection] = [
.init(displayGamut: .SRGB),
.init(forceTouchCapability: .available),
.init(layoutDirection: .leftToRight),
.init(preferredContentSizeCategory: .medium),
.init(userInterfaceIdiom: .phone)
]
... Has anyone else tried this? May not be ideal for the final fix, but might be a clue. |
I tried setting just the |
We tried to use less extension CALayer {
static func swizzleShadow() {
swizzle(original: #selector(getter: shadowOpacity), modified: #selector(_swizzled_shadowOpacity))
swizzle(original: #selector(getter: shadowRadius), modified: #selector(_swizzled_shadowRadius))
swizzle(original: #selector(getter: shadowColor), modified: #selector(_swizzled_shadowColor))
swizzle(original: #selector(getter: shadowOffset), modified: #selector(_swizzled_shadowOffset))
swizzle(original: #selector(getter: shadowPath), modified: #selector(_swizzled_shadowPath))
}
private static func swizzle(original: Selector, modified: Selector) {
let originalMethod = class_getInstanceMethod(self, original)!
let swizzledMethod = class_getInstanceMethod(self, modified)!
method_exchangeImplementations(originalMethod, swizzledMethod)
}
@objc func _swizzled_shadowOpacity() -> Float { .zero }
@objc func _swizzled_shadowRadius() -> CGFloat { .zero }
@objc func _swizzled_shadowColor() -> CGColor? { nil }
@objc func _swizzled_shadowOffset() -> CGSize { .zero }
@objc func _swizzled_shadowPath() -> CGPath? { nil }
} Important: as we work in a framework, we had to create a class that acts as the final class MyPrincipalClass: NSObject {
override init() {
CALayer.swizzleShadow()
}
} Note that after declaring the principal class, in order for it to work, you should add it to the <?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
...
...
<key>NSPrincipalClass</key>
<string>NameOfTheTestBundle.MyPrincipalClass</string>
</dict>
</plist> |
I've found a fix for this. It's not ideal, but it seemingly works for the moment: Run both Xcode and the Simulator in Rosetta. This has been sufficient for our >1,000 snapshot tests at GitHub, across app targets and frameworks/modules. |
I tried this and some snapshots still fail for me. :( |
Yep, no luck for us either. It would be nice with some recognition about this issue from the repo owners. |
My company created a script that downloads any failed snapshots from a CI build to your local working copy and places them in the correct directories. This kind of script is very useful for many reasons, but in this scenario you could have a workflow where:
This way you only have one set of snapshots all with Apple Silicon as the source of truth. As more developers move their machines to arm64 they can record locally, but I've found even in that scenario its still useful to have the option to download from CI. |
I can only recommend taking a look at this, that's a fuzzy comparator by @JWStaiert that properly accounts image nature of the snapshots: https://github.com/JWStaiert/SnapshotTestingEx. (See #424 (comment) for the details). As far as I can recall, the thing is that, by default, the precision is meant to account the number of different "pixels", not the difference in the "pixel values". The fuzzy comparator changes that for the better. We're successfully using it together with the patch (see #424 (comment)) that provides a solution for avoiding overwriting the snapshots if they're fuzzy matching. |
Gonna describe my current situation, and what I am facing, no idea if someone has said some of these that I am going to mention sorry for repetition if is the case, I have been reading a lot of stuff lately on how to fix it. Current scenario: in the project I work for, CI Intel github actions, the iOS team is changing to M1 and I was the first one getting it (being the last one would saved me many headackes 😄), all the tests recorded are recorded with Intel machines, ~10% failing in the M1, and the new tests recorded with M1, fails in CI. What I tried:
Looks like mainly tests with shadows Tried out the modify the Hope it helps someone, cheers! |
We're also having this problem as we start transitioning developer devices to Apple Silicon, while some of us are still going to be on Intel for a while longer. There are really two issues at play here:
Any solutions involving standardising where these things happen is difficult to achieve - some developers will be on Intel by choice, some are on M1. Our CI currently only supports Intel and we have no ETA on Apple Silicon support. That leaves us with needing to be able to verify snapshots accurately on Intel machines. Of course, if snapshots need to always be verifiable on Intel, that means we cannot generate snapshots on Apple Silicon machines, which makes life very difficult for developers on M1s as they cannot generate new or updated snapshots unless we can find some kind of automated way (like a Github Action) that re-generates new and updated snapshots on an Intel executor and re-commits them. This is currently an option we're considering but I'd prefer to avoid this. I think the best solution to this problem is a revised image diffing strategy that can account for minor differences between the two architectures. We have tried adjusting the precision with mixed results - sometimes it makes a test pass, sometimes it does not. Additionally, it seems that changing the precision has a serious impact on performance - I've seen a fairly big test with 10 snapshots take 10x longer by changing the precision from 1.0 to 0.99 - it took over 40 seconds! I'd like to try the SnapshotTestingEx solution above but unfortunately it looks like this needs a patched version of SnapshotTesting - it would be great to get some movement on this. I'm curious if @stephencelis or @mbrandonw have run into this issue in their own apps and if they've found a solution. |
We've tried changing the testname so you record different images based on your devices architecture. This code can be placed in a wrapper for the snapshot testing so it's always applied or applied based on an var testID = identifier
Architecture.isArm64 || Architecture.isRosettaEmulated {
testID.append("-arm64")
}
assertSnapshot(matching: view, testName: testID) import Foundation
@objc(PSTArchitecture) public class Architecture: NSObject {
/// Check if process runs under Rosetta 2.
/// https://developer.apple.com/forums/thread/652667?answerId=618217022&page=1#622923022
/// https://developer.apple.com/documentation/apple-silicon/about-the-rosetta-translation-environment
@objc public class var isRosettaEmulated: Bool {
// Issue is specific to Simulator, not real devices
#if targetEnvironment(simulator)
return processIsTranslated() == EMULATED_EXECUTION
#else
return false
#endif
}
/// Returns true on M1 macs and false on intel machines or M1 macs using Rosetta
public static var isArm64: Bool {
#if arch(arm64)
return true
#else
return false
#endif
}
}
fileprivate let NATIVE_EXECUTION = Int32(0)
fileprivate let EMULATED_EXECUTION = Int32(1)
fileprivate let UNKNOWN_EXECUTION = -Int32(1)
private func processIsTranslated() -> Int32 {
let key = "sysctl.proc_translated"
var ret = Int32(0)
var size: Int = 0
sysctlbyname(key, nil, &size, nil, 0)
let result = sysctlbyname(key, &ret, &size, nil, 0)
if result == -1 {
if errno == ENOENT {
return 0
}
return -1
}
return ret
} |
For me the solution was to exclude ARM64 architecture for simulator by using |
@lukeredpath I combined the subpixel thresholding into the PR 👍 |
We have basically used @pimms fork with great success, it is still a wonder to me why it is not merged yet, because it does get the job done, without a performance penalty. |
One alternative solution, maybe useful for somebody. The best way to work around this issue for me was to avoid supporting both M1 and intel, and instead perform the snapshot testing in CI using exclusively M1 machines |
@acecilia This does indeed would solve the issue. However, it is not feasible for the time being, because our CI/CD is running on Github, which is Intel-Based for the foreseeable future. We have switched to a self-hosted runner for performance reason, but AWS is not giving access to M1-Based Mac-Minis to the general public yet, we're still stuck with a Intel-based machine for now. |
@choulepoka I see. Yes, in my case the machines are self hosted, so I could just replace intel with M1 machines without having to wait for third party support |
@choulepoka @acecilia self-hosted runners are not supported on M1 hardware yet. |
It's easy to get out of rosetta and run the builds natively by just doing |
Any idea why this is happening? I.e. why does the same simulator produce different snaps based on whatever the current hardware is? Is it due to differences in the color space actively selected? I.e. if all Macs have their display set to Generic RGB Profile, does this help? If not, doesn't this represent a bug in iOS simulator that Apple should fix? I cannot understand why the simulation would differ on this detail based on the host hardware. |
It sounds like all bets are off when rendering across architectures as they can theoretically go through entirely different rendering pipelines. |
There now is a prerelease of the runner software with Apple Silicon support: https://github.com/actions/runner/releases/tag/v2.292.0 |
I was unable to find it. Can you provide us with a pointer? |
was this it? - #481 |
#490
490 makes some changes to the API, possibly not optimally, to allow external extensions to SnapshotTesting, and updates the docs to point to my SnapshotTesingEx package where the matching routines are implemented.
… On Jun 15, 2022, at 6:31 AM, Mark Turner ***@***.***> wrote:
I was unable to find it. Can you provide us with a pointer?
was this it? - #481
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you were mentioned.
|
Testing with a empty screen with navigation bar and bottom bar with some icons, the image diffs between our M1 Macs and our Intel ones are on the SFSymbols buttons we use on the botton bar about 2~3 pixels on each one. Setting the I'll be following this discussion to see if we can figure out a better solution |
#628 should resolve this issue with a new perceptual difference calculation. The small differences in Intel/Apple Silicon hardware accelerated anti-aliasing/shadows/blur rendering are under the 2% DeltaE value which is nearly imperceivable by the human eye. Using a |
Note even with perceptual precision, if you re-record snapshots that were originally recorded on an Intel device on an Apple Silicon device, you will often get updated snapshots (with imperceptible differences) in your git diff, which is not ideal because then it's not clear whether the differences are due to actual changes or just the different chips. Is there any fix for that? |
@simondelphia isn't this true of any test that uses precision? I'd think failing tests should be your guide on whether or not to re-record, not git diffs. No? |
When you have Currently whenever someone makes a PR with snapshots and they're on Intel, we've been re-recording them on Apple Silicon to ensure a consistent standard (because most of our contributors are using Apple Silicon). And it's just cleaner to enforce a standard because of this issue. Even though the CI uses Intel devices, it's not a problem there because of the precision parameters and |
Also sometimes we get snapshot tests that pass even though they shouldn't, which I suppose is because the perceptual precision is below 1 (necessary for the CI), so ultimately we have to re-record frequently. |
Did you see a noticeable drop-off in performance (test suite taking significantly longer) after applying this workaround Ari? |
No, not at all when we measured it! To be honest, I think in some cases it might improve the performance because there's almost no overhead in rendering the shadows. On the other hand, I don't think any increase/drop-off in performance should be noticeable. Edit: tested with simulator, not in real devices/device farm. |
It seems that Apple Silicon machines generate snapshots that are slightly different from the ones generated on x86 machines. The problem is very similar to the one where the very same simulator generates different images depending on the OS version.
The issues persist when running the simulator using Rosetta.
Taking into account that it’ll be quite common in the near future to have a team of developers with Intel and Apple Silicon architectures… What’s the best way to handle it?
Thanks
The text was updated successfully, but these errors were encountered: