# Add support for face tracking #9078
### Describe the project you are working on
Based on discussion by Mux, Malcolm, iFire, Lyuma and Saracan at the 9 Feb 2024 XR meeting, and experimentation in the preceding weeks.
The ability to track facial expressions has become a common feature in the animation world. It is often implemented by recording the player's face with a camera and analysing the video feed to animate a virtual character, usually producing recorded animations that can be played back in game or rendered out to a movie.
This technology is now also being adopted in the XR space, and this is where we find our primary use case for Godot. Real-time facial analysis can drive the player's avatar, which is often presented as a mirror image of the player, or broadcast over the network in social applications such as VRChat, Meta Horizons, Apple's FaceTime, etc., so that the player's facial expression and mouth movement, synced with the player's voice, are presented to other participants.
This animation data is often captured as a number of named blend shapes; however, the number and definition of these blend shapes vary widely between implementations.
A popular standard called Unified Expressions has arisen within the industry, which defines a very complete set of blend shapes.
Its documentation provides an overview of these blend shapes alongside mappings to the data provided by a number of common facial tracker solutions.
The idea is to introduce support for this standard.
### Describe the problem or limitation you are having in your project
Godot does not currently have support for facial trackers.
### Describe the feature / enhancement and how it helps to overcome the problem or limitation
As mentioned above, the goal of this proposal is to provide an implementation of the Unified Expressions standard so that it is easy both to obtain data from various sources, and to apply it to 3D assets whose facial blend shapes are set up according to this standard.
The main goal here is to make a solution that is platform agnostic. The developer of a Godot game requiring this feature has no pre-existing knowledge of the hardware used by the end user, nor should they be required to write multiple implementations just to cater to the different hardware that is out there. By using the Unified Expressions standard we can circumvent this issue.
As the primary use cases for this implementation are found in XR applications, the plan is to add the required functionality to the XRServer. The design described below caters to non-XR applications of this technology as well, since the XRServer is always available and this solution does not require an active XR interface.
Note also that phone-based facial tracking for applications such as Snapchat falls under the AR umbrella and is thus considered an XR application.
### Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams
This proposal will divide the solution into two distinct parts that work together.
### Providers
A new `RefCounted` subclass will be added called `XRFaceTrackingProvider`. This class will hold an array of floats containing all blend shapes as defined in the Unified Expressions standard, and we'll define an enum with all the required entries (practical additions may be made during development). The class contains access methods to get and set blend shape values, and an `update` method. This `update` method should use the GDVIRTUAL pattern so it can be implemented in GDScript or through GDExtension.

Implementations of this class can be created for each face tracking device. For instance, in the OpenXR vendor plugin we may implement the Meta facial tracker extension. As part of this extension we will implement an `XRMetaFaceTrackerProvider` subclass. In the implementation of the `_update` function we will fetch all blend shape data from the Meta extension and populate all the blend shapes, performing any needed conversion/remapping of the incoming data to the Unified Expressions standard.

When the Meta face tracker extension is activated we will instantiate the provider class and register it with the `XRServer`, providing a usage path in a similar fashion as we do with `XRPositionalTracker`s. This usage path will identify what source the face tracking relates to; taking example from OpenXR, the usage path in this case would be `/user/head`. The assumption here is that our player only has one head and will only use one face tracker at a time, so we can find the active face tracker by this usage path.

In `XRServer::_process` we will call the `update` function of all registered providers (or should we call this function `_process` as well?).

By making the usage path a string, we leave the door open for other usage paths to be introduced in the future, adding flexibility to this system. One such use case that was discussed was a multiplayer setup where a face tracker tracks multiple faces.
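As context for the provider pattern above, here is a hedged sketch of what such a class could look like. This is not the actual Godot API: the enum is heavily abbreviated (the real Unified Expressions set has many more entries), plain `virtual` stands in for the GDVIRTUAL pattern, and the Meta data fetch is faked with a constant.

```cpp
#include <array>

// Illustrative sketch only; in Godot this would extend RefCounted and use
// Godot's own types and binding macros.
class XRFaceTrackingProvider {
public:
	// A few sample entries from the Unified Expressions blend shape set.
	enum BlendShape {
		FT_EYE_LOOK_OUT_RIGHT,
		FT_EYE_LOOK_IN_RIGHT,
		FT_JAW_OPEN,
		FT_MOUTH_CLOSED,
		FT_MAX, // total number of blend shapes
	};

	float get_blend_shape(BlendShape p_shape) const {
		return blend_shapes[p_shape];
	}

	void set_blend_shape(BlendShape p_shape, float p_value) {
		blend_shapes[p_shape] = p_value;
	}

	// Would use the GDVIRTUAL pattern so GDScript and GDExtension can
	// override it; a plain virtual stands in for that here.
	virtual void update() {}

	virtual ~XRFaceTrackingProvider() = default;

private:
	std::array<float, FT_MAX> blend_shapes = {};
};

// A hypothetical vendor implementation that fills the array each frame.
class XRMetaFaceTrackerProvider : public XRFaceTrackingProvider {
public:
	void update() override {
		// Fetch from the Meta extension and remap to Unified Expressions;
		// a constant stands in for the real data here.
		set_blend_shape(FT_JAW_OPEN, 0.5f);
	}
};
```

A consumer would then only ever deal with the base class and the Unified Expressions enum, regardless of which vendor provider filled in the data.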
The implementation in `XRServer` thus can be fairly simple. `XRServer` simply maintains a named dictionary of `Ref<XRFaceTrackingProvider>` instances and implements the functions:

- `void add_face_tracker_provider(p_path, p_provider)`: Adds a new face tracker.
- `void remove_face_tracker_provider(p_provider)`: Removes this provider.
- `PackedStringArray get_face_tracker_paths()`: Retrieves a list of registered paths.
- `Ref<XRFaceTrackingProvider> get_face_tracker_provider(p_path)`: Gets a face tracker provider by path.

We also add a `face_tracker_provider_added` and a `face_tracker_provider_removed` signal to the `XRServer` that notify the world of any change to our registered providers.

### Consumers
Now that we are obtaining face tracking data in a standardised way and have made it accessible from our server, we need something that consumes this information.
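Before looking at the consumer itself, the server-side bookkeeping described in the previous section could look roughly as follows. This is a hedged sketch, not the actual Godot API: standard C++ containers and callbacks stand in for Godot's `Ref<>`, `PackedStringArray` and signals, and the class name is hypothetical.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Minimal stand-in for XRFaceTrackingProvider (see the provider section).
struct FaceTrackingProvider {
	virtual void update() {}
	virtual ~FaceTrackingProvider() = default;
};

// Hypothetical sketch of the XRServer face tracker registry.
class XRServerSketch {
public:
	void add_face_tracker_provider(const std::string &p_path,
			std::shared_ptr<FaceTrackingProvider> p_provider) {
		providers[p_path] = std::move(p_provider);
		if (face_tracker_provider_added) face_tracker_provider_added(p_path);
	}

	void remove_face_tracker_provider(const std::shared_ptr<FaceTrackingProvider> &p_provider) {
		for (auto it = providers.begin(); it != providers.end(); ++it) {
			if (it->second == p_provider) {
				const std::string path = it->first;
				providers.erase(it);
				if (face_tracker_provider_removed) face_tracker_provider_removed(path);
				return;
			}
		}
	}

	std::vector<std::string> get_face_tracker_paths() const {
		std::vector<std::string> paths;
		for (const auto &entry : providers) paths.push_back(entry.first);
		return paths;
	}

	std::shared_ptr<FaceTrackingProvider> get_face_tracker_provider(const std::string &p_path) const {
		auto it = providers.find(p_path);
		return it == providers.end() ? nullptr : it->second;
	}

	// Would be called from XRServer::_process(): update every provider.
	void process() {
		for (auto &entry : providers) entry.second->update();
	}

	// Callbacks standing in for the *_added / *_removed signals.
	std::function<void(const std::string &)> face_tracker_provider_added;
	std::function<void(const std::string &)> face_tracker_provider_removed;

private:
	std::map<std::string, std::shared_ptr<FaceTrackingProvider>> providers;
};
```

A vendor plugin would register its provider under a usage path such as `/user/head`, and consumers would look it up by that same path.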
As part of this proposal we will implement one such consumer, a node called `XRFaceTracker`. This node will have two properties:

- `provider`, which holds a provider path for the provider we will consume (see `tracker` on `XRNode3D` for implementation details).
- `target`, which holds a node path to the asset that holds the face mesh on which we'll manipulate the blend shapes (MeshInstance3D or its related skeleton?).

When our node enters the tree we will attempt to obtain the provider related to the path set in our `provider` property. We will also connect to our `XRServer` signals so we can react to the provider being removed or added.

We will also examine the target node when our node enters the tree and attempt to map its blend shapes to our Unified Expressions entries. This process applies a fuzzy matching algorithm or a common-names lookup to match blend shapes in our asset to our Unified Expressions blend shapes. Two rules are important here:
- Assets can be prepared to work with raw data from multiple face tracking sources, which can result in multiple asset blend shapes matching one Unified Expressions blend shape. In this case we should pick the best match.
- For the same reason, assets may provide related blend shapes both split and combined (e.g. BrowInnerUpLeft, BrowInnerUpRight and BrowInnerUp); in this case only the split blend shapes must be assigned and the combined blend shape ignored.
There may be other such rules that we will discover as we implement this.
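To make the split-versus-combined rule concrete, here is a hedged sketch of that part of the mapping pass. It is deliberately simplified: exact-name lookup stands in for the fuzzy or common-names matching described above, only the BrowInnerUp family is handled, and the function name is hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Maps Unified Expressions names to indices in the asset's blend shape
// list, preferring split shapes over their combined equivalent.
std::map<std::string, int> map_blend_shapes(const std::vector<std::string> &asset_shapes) {
	std::map<std::string, int> mapping;

	// Exact-name lookup; a real implementation would match fuzzily.
	auto find = [&](const std::string &name) {
		for (int i = 0; i < (int)asset_shapes.size(); i++) {
			if (asset_shapes[i] == name) return i;
		}
		return -1;
	};

	int left = find("BrowInnerUpLeft");
	int right = find("BrowInnerUpRight");
	int combined = find("BrowInnerUp");

	if (left >= 0 && right >= 0) {
		// Both split shapes exist: assign them, ignore the combined shape.
		mapping["BrowInnerUpLeft"] = left;
		mapping["BrowInnerUpRight"] = right;
	} else if (combined >= 0) {
		// Fall back to the combined shape when the split pair is absent.
		mapping["BrowInnerUp"] = combined;
	}
	return mapping;
}
```

Given an asset exposing all three shapes, only the split pair ends up in the mapping; an asset with just BrowInnerUp falls back to the combined shape.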
In our `_process` function we will apply the relevant blend shape values stored in our provider to our asset, animating it.

### If this enhancement will not be used often, can it be worked around with a few lines of script?
This is integral logic to the XR system, while some of it will exist in script or in plugins, the core feature has to be added to the XRServer.
### Is there a reason why this should be core and not an add-on in the asset library?
This is integral logic to the XR system, while some of it will exist in script or in plugins, the core feature has to be added to the XRServer.