ARKit is quite good at tracking images, but it struggles to disambiguate similar compositions. Core ML can help fill in the gaps.
ARKit is a powerful tool that allows developers to create Augmented Reality apps. It comes loaded with image detection and tracking functionality, which allows apps to "anchor" virtual content contextually on to real-world surfaces.
For the best experience, image detection should be robust across lighting conditions, orientation, and other printing/reproduction irregularities. ARKit prioritizes this stronger, uninterrupted tracking experience over fine disambiguation between tracking images. Consequently, ARKit is fairly "lenient" when it comes to image detection.
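For context, ARKit's image detection is driven by a set of reference images registered with the session configuration. A minimal sketch of that setup, assuming your ARSCNView is called sceneView and your reference images live in a hypothetical AR Resource Group named "PlayingCards" (neither name comes from this project):

import ARKit

// Sketch only: register reference images so ARKit will emit ARImageAnchors for them.
// "PlayingCards" is a placeholder AR Resource Group name in Assets.xcassets.
let configuration = ARWorldTrackingConfiguration()
configuration.detectionImages = ARReferenceImage.referenceImages(
    inGroupNamed: "PlayingCards", bundle: nil)
sceneView.session.run(configuration)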
Consider an application where a different AR experience is triggered by each playing card. (Perhaps we learn the story of the different Queens and their path to royalty.)
Unfortunately, ARKit considers the Queen of Clubs and the Queen of Diamonds to be compositionally too similar to track separately.
This ambiguity makes it impossible to build the above experience using ARKit alone. (Both queens will be recognized on each card.) By inspection, however, these two images should be easy for a machine to differentiate. Their colors and compositions are plenty different.
Core ML can be employed to help disambiguate the playing cards using a simple image classifier. Differentiating a few static compositions is a trivial task compared to what machine learning is capable of. Using Create ML, Custom Vision, Watson, or any other drag-and-drop service capable of generating a .mlmodel file, you can build a robust image classifier with as few as 5 training images per classification.
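For example, with Create ML on macOS, training such a classifier from a folder of labeled subfolders takes only a few lines. This is a sketch, not part of the project, and the paths are placeholders:

import CreateML
import Foundation

// Each subfolder of TrainingImages is a label, e.g. "QueenOfClubs", "QueenOfDiamonds".
let trainingData = MLImageClassifier.DataSource.labeledDirectories(
    at: URL(fileURLWithPath: "/path/to/TrainingImages"))
let classifier = try MLImageClassifier(trainingData: trainingData)
try classifier.write(to: URL(fileURLWithPath: "/path/to/PlayingCards.mlmodel"))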
The general workflow for employing Core ML alongside ARKit is simple (see the sketch after this list):
- ARKit informs you that it has detected a reference image coming in from the camera
- Grab a snapshot of this real-world object
- Feed it into your machine learning classifier
- Use the results to show the correct content
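A rough sketch of these steps using Vision, assuming an ARSCNView and an MLModel. The helper name classifyCurrentFrame is illustrative, not part of this project, and it skips the cropping and deskewing that MLRecognizer performs:

import ARKit
import Vision

// Sketch only: classify the current camera frame against an arbitrary Core ML model.
func classifyCurrentFrame(in sceneView: ARSCNView,
                          using mlModel: MLModel,
                          completion: @escaping (String?) -> Void) {
    // Steps 1 & 2: ARKit detected something; grab the raw camera image for the current frame.
    guard let pixelBuffer = sceneView.session.currentFrame?.capturedImage,
          let visionModel = try? VNCoreMLModel(for: mlModel) else {
        completion(nil)
        return
    }

    // Step 3: feed it into the classifier via Vision.
    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        // Step 4: hand back the top classification so the caller can show the right content.
        let top = (request.results as? [VNClassificationObservation])?.first
        DispatchQueue.main.async { completion(top?.identifier) }
    }
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    DispatchQueue.global(qos: .userInitiated).async {
        try? handler.perform([request])
    }
}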
While the high-level approach isn't complicated, the low-level execution is more difficult.
This project serves as an example and a host for utility methods to make it easier to incorporate Core ML into your ARKit applications.
The tricky functionality is abstracted behind a simple MLRecognizer class. Instantiate it with a reference to your MLModel and your ARSCNView.
lazy var recognizer = MLRecognizer(
    model: PlayingCards().model,
    sceneView: sceneView
)
Then, use the classify method to receive a classification for a given ARImageAnchor:

func classify(imageAnchor: ARImageAnchor, completion: @escaping (Result<String>) -> Void)
That's it! Go build something cool.
See ARSceneViewController for an example implementation.
In the ARSCNViewDelegate renderer(_:didAdd:for:) callback, we forward the image anchor to the MLRecognizer to be snapshotted, cropped, deskewed, and classified.
func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
    guard let imageAnchor = anchor as? ARImageAnchor else { return }

    // send off anchor to be snapshotted, cropped, deskewed, and classified
    recognizer.classify(imageAnchor: imageAnchor) { [weak self] result in
        if case .success(let classification) = result {
            // update app with classification
            self?.attachLabel(classification, to: node)
        }
    }
}
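The attachLabel(_:to:) helper belongs to the example controller and isn't shown here. One possible minimal version, not necessarily the project's implementation, would simply float an SCNText node above the detected card from inside your view controller:

private func attachLabel(_ title: String, to node: SCNNode) {
    // Sketch only: render the classification as small 3D text above the card.
    let text = SCNText(string: title, extrusionDepth: 0.5)
    text.font = UIFont.systemFont(ofSize: 10)
    let textNode = SCNNode(geometry: text)
    textNode.scale = SCNVector3(0.002, 0.002, 0.002) // SCNText units are huge by default
    textNode.position = SCNVector3(0, 0.01, 0)       // hover just above the anchor's node
    node.addChildNode(textNode)
}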
ARKit+Utilities contains a number of useful utilities for capturing and cropping images from ARKit scenes.
extension ARSCNView {
    /**
     Functionally equivalent to `SCNView`'s `snapshot()`,
     except only including the raw camera image,
     not any virtual geometry that may be in the scene.
     */
    public func capturedImage() -> UIImage?

    /**
     Returns a cropped and deskewed image of the raw camera image of a given `ARImageAnchor`,
     not including any virtual geometry that may be in the scene.
     */
    public func capturedImage(from anchor: ARImageAnchor) -> UIImage?
}
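For instance, assuming a sceneView and a detected imageAnchor as in the delegate callback above, the cropped image can be handed to whatever pipeline you like:

// Sketch: grab the cropped, deskewed card image for your own processing.
if let cardImage = sceneView.capturedImage(from: imageAnchor) {
    print(cardImage.size) // e.g. inspect it, save it, or run your own Vision request
}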
Add the files located in the Library folder to your project!