Add support for face tracking #9078

Closed
BastiaanOlij opened this issue Feb 12, 2024 · 1 comment

@BastiaanOlij

Describe the project you are working on

Based on discussion by Mux, Malcolm, iFire, Lyuma and Saracan, 9 Feb 2024 XR meeting and experimentation in preceding weeks.

The ability to track facial expressions has become a common feature in the animation world. It is often implemented by recording the player's face with a camera and analysing the video feed to animate a virtual character, typically producing recorded animations that can be played back in game or rendered out to a movie.

This technology is now also being adopted in the XR space, and this is where we find our primary use case for Godot. Real-time facial analysis can drive the player's avatar, which is often presented as a mirror image of the player, or broadcast over the network in social applications such as VRChat, Meta Horizons, Apple's FaceTime, etc., so that the player's facial expression and mouth movement, synced up with the player's voice, are presented to other participants.

This animation data is often captured as a number of named blend shapes; however, the number and definition of these blend shapes vary widely between implementations.

A popular standard called Unified Expressions has arisen within the industry, defining a very complete set of blend shapes.
It also provides an overview of these blend shapes alongside mappings to the data provided by a number of common facial tracking solutions.

The idea is to introduce support for this standard.

Describe the problem or limitation you are having in your project

Godot does not currently have support for facial trackers.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

As mentioned above, the goal of this proposal is to provide an implementation of the Unified Expressions standard so that it is easy both to obtain data from various sources and to apply it to 3D assets that are set up with facial blend shapes supported by this standard.
The main goal here is a solution that is platform agnostic. The developer of a Godot game requiring this feature has no pre-existing knowledge of the hardware used by the end user, nor should they be required to write multiple implementations just to cater to the different hardware out there. By using the Unified Expressions standard we can circumvent this issue.

As the primary use cases for this implementation are XR applications, the plan is to add the required functionality to the XRServer. The design described below caters to non-XR applications of this technology as well, as the XRServer is always available and this solution does not require an active XR interface.

Note also that phone-based facial tracking for applications such as Snapchat falls under the AR umbrella and is thus considered an XR application.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

This proposal will divide the solution into two distinct parts that work together.

Providers
A new RefCounted subclass will be added called XRFaceTrackingProvider. This class will hold an array of floats containing all blend shapes as defined in the Unified Expressions standard. We'll define an enum with all the required entries. For reference, this is our class definition as we currently envision it (practical additions may be made during development):

class XRFaceTrackingProvider : public RefCounted {
public:
	// Enum of blend shapes as defined in the Unified Expressions standard.
	enum BlendShapeEntry {
		FT_EYELOOKUPRIGHT,
		FT_EYELOOKDOWNRIGHT,
		FT_EYELOOKINRIGHT,
		…
		FT_TONGUELEFT,
		FT_TONGUEROLL,
		FT_MAX
	};

	// Matching blend shape names from the Unified Expressions standard.
	const String blend_shape_names[FT_MAX] = {
		"EyeLookUpRight",
		"EyeLookDownRight",
		"EyeLookInRight",
		…
		"TongueLeft",
		"TongueRoll"
	};

	float get_blend_shape(BlendShapeEntry p_blend_shape) const;
	void set_blend_shape(BlendShapeEntry p_blend_shape, float p_value);

protected:
	static void _bind_methods();
	void update();
	GDVIRTUAL0(_update);

private:
	float blend_shape_values[FT_MAX];
};

VARIANT_ENUM_CAST(XRFaceTrackingProvider::BlendShapeEntry)

The class contains access methods to get and set blend shape values, and an update method. This update method should use the GDVIRTUAL pattern so it can be implemented in GDScript or through GDExtension.
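As a minimal sketch (not final code), the update method could dispatch to the virtual using Godot's existing GDVIRTUAL macros:

void XRFaceTrackingProvider::_bind_methods() {
	ClassDB::bind_method(D_METHOD("get_blend_shape", "blend_shape"), &XRFaceTrackingProvider::get_blend_shape);
	ClassDB::bind_method(D_METHOD("set_blend_shape", "blend_shape", "value"), &XRFaceTrackingProvider::set_blend_shape);

	// Expose the virtual so it can be overridden from GDScript or GDExtension.
	GDVIRTUAL_BIND(_update);
}

void XRFaceTrackingProvider::update() {
	// Call the script/extension implementation of _update, if any.
	GDVIRTUAL_CALL(_update);
}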

Implementations of this class can be created for each face tracking device. For instance, in the OpenXR vendor plugin we may implement the Meta facial tracking extension. As part of this extension we will implement an XRMetaFaceTrackerProvider subclass. In the implementation of the _update function we will fetch all blend shape data from the Meta extension and populate all the blend shapes, performing any needed conversion/remapping of the incoming data to the Unified Expressions standard.

Note that this can result in blend shape values being duplicated, or averages being stored, in situations where there are related blend shapes such as BrowInnerUpLeft, BrowInnerUpRight and BrowInnerUp.
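A rough sketch of such an _update implementation (the meta_get_weight() helper and the FT_BROWINNERUP* entry names are placeholders; the XR_FACE_EXPRESSION_*_FB constants come from the OpenXR XR_FB_face_tracking extension):

void XRMetaFaceTrackerProvider::_update() {
	// meta_get_weight() is a hypothetical helper that reads one weight from
	// the data returned by xrGetFaceExpressionWeightsFB.
	float inner_brow_left = meta_get_weight(XR_FACE_EXPRESSION_INNER_BROW_RAISER_L_FB);
	float inner_brow_right = meta_get_weight(XR_FACE_EXPRESSION_INNER_BROW_RAISER_R_FB);

	set_blend_shape(FT_BROWINNERUPLEFT, inner_brow_left);
	set_blend_shape(FT_BROWINNERUPRIGHT, inner_brow_right);

	// Related combined entries can be duplicated or averaged from the splits.
	set_blend_shape(FT_BROWINNERUP, (inner_brow_left + inner_brow_right) * 0.5f);

	// ... and so on for the remaining Unified Expressions entries.
}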

When the Meta face tracking extension is activated we will instantiate the provider class and register it with the XRServer, providing a usage path in a similar fashion to XRPositionalTrackers. This usage path identifies which source the face tracking relates to; taking our example from OpenXR, the usage path in this case would be /user/head. The assumption here is that our player only has one head and will only use one face tracker at a time, so we can find the active face tracker by this usage path.
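Using the XRServer functions proposed further below, registration could look roughly like this:

// When the Meta face tracking extension activates successfully:
Ref<XRMetaFaceTrackerProvider> provider;
provider.instantiate();

// Register under the OpenXR-style usage path for the user's head.
XRServer::get_singleton()->add_face_tracker_provider("/user/head", provider);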

In XRServer::_process we will call the update function of all registered providers (or should we call this function _process as well?).
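Assuming the providers are stored in a HashMap keyed by usage path (see below), that update pass could be as simple as:

void XRServer::_process() {
	// ... existing tracker processing ...

	// Update all registered face tracking providers. XRServer is assumed to
	// have access to the protected update() method (e.g. as a friend class).
	for (KeyValue<StringName, Ref<XRFaceTrackingProvider>> &E : face_tracker_providers) {
		E.value->update();
	}
}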

By making the usage path a string, we leave open the door for other usage paths to be introduced in the future and add flexibility to this system. One such use case that was discussed was a multiplayer setup where a face tracker tracks multiple faces.

The implementation in XRServer can thus be fairly simple. XRServer simply maintains a dictionary mapping usage paths to Ref<XRFaceTrackingProvider> instances and implements the following functions:
void add_face_tracker_provider(p_path, p_provider): Adds a new face tracker provider.
void remove_face_tracker_provider(p_provider): Removes this provider.
PackedStringArray get_face_tracker_paths(): Retrieves a list of registered paths.
Ref<XRFaceTrackingProvider> get_face_tracker_provider(p_path): Gets a face tracker provider by path.

We also add face_tracker_provider_added and face_tracker_provider_removed signals to the XRServer that notify the world of any change to our registered providers.
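A minimal sketch of the server side, assuming a HashMap member for storage (the exact container and error handling are implementation details):

void XRServer::add_face_tracker_provider(const StringName &p_path, const Ref<XRFaceTrackingProvider> &p_provider) {
	ERR_FAIL_COND(p_provider.is_null());

	face_tracker_providers[p_path] = p_provider;
	emit_signal(SNAME("face_tracker_provider_added"), p_path);
}

void XRServer::remove_face_tracker_provider(const Ref<XRFaceTrackingProvider> &p_provider) {
	for (const KeyValue<StringName, Ref<XRFaceTrackingProvider>> &E : face_tracker_providers) {
		if (E.value == p_provider) {
			emit_signal(SNAME("face_tracker_provider_removed"), E.key);
			face_tracker_providers.erase(E.key);
			return;
		}
	}
}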

Consumers
Now that we are obtaining face tracking data in a standardised way and have made it accessible from our server, we need something that consumes this information.

As part of this proposal we will implement one such consumer, a node called XRFaceTracker. This node will have two properties:
provider, which holds the path of the provider we will consume (see tracker on XRNode3D for implementation details).
target, which holds a node path to the asset that holds the face mesh on which we'll manipulate the blend shapes (MeshInstance3D or its related skeleton?).

When our node enters the tree we will attempt to obtain the provider for the path set in our provider property. We will also connect to the XRServer signals so we can react to the provider being removed or added.
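As an illustration, the node could be structured roughly like this (helper names such as _provider_added are illustrative, not final API):

class XRFaceTracker : public Node {
	GDCLASS(XRFaceTracker, Node);

	String provider; // Usage path of the provider to consume, e.g. "/user/head".
	NodePath target; // Path to the node that holds the face mesh.

	Ref<XRFaceTrackingProvider> active_provider;

protected:
	void _notification(int p_what) {
		if (p_what == NOTIFICATION_ENTER_TREE) {
			XRServer *xr_server = XRServer::get_singleton();

			// Attempt to obtain the provider for our configured path.
			active_provider = xr_server->get_face_tracker_provider(provider);

			// React to providers appearing or disappearing at runtime.
			xr_server->connect("face_tracker_provider_added", callable_mp(this, &XRFaceTracker::_provider_added));
			xr_server->connect("face_tracker_provider_removed", callable_mp(this, &XRFaceTracker::_provider_removed));
		}
	}

	void _provider_added(const StringName &p_path);
	void _provider_removed(const StringName &p_path);
};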

We will also examine the target node when our node enters the tree and attempt to map its blend shapes to our Unified Expressions entries. This process applies a fuzzy matching algorithm or a common-names lookup to match blend shapes in our asset to our Unified Expressions blend shapes. Two rules are important here (see the sketch after this list):
Assets can be prepared to work with raw data from multiple face tracking sources, which can result in multiple blend shapes being found that match one of our Unified Expressions blend shapes. In this case we should pick the best match.
For the same reason, assets may provide related blend shapes both split and combined (e.g. BrowInnerUpLeft, BrowInnerUpRight and BrowInnerUp); in this case only the split blend shapes must be assigned and the combined blend shape ignored.
There may be other such rules that we will discover as we implement this.
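A sketch of how the mapping could honour these rules (find_blend_shape_by_name() is the existing MeshInstance3D method; is_combined_shape(), has_split_shapes() and mesh_blend_shape_map are hypothetical):

// Build a map from Unified Expressions entries to blend shape indices on the mesh.
void XRFaceTracker::map_blend_shapes(MeshInstance3D *p_mesh) {
	ERR_FAIL_COND(active_provider.is_null());

	for (int i = 0; i < XRFaceTrackingProvider::FT_MAX; i++) {
		const String &ue_name = active_provider->blend_shape_names[i];

		// Rule 2: skip a combined shape (e.g. "BrowInnerUp") when its split
		// counterparts exist on the mesh; the split shapes win.
		if (is_combined_shape(ue_name) && has_split_shapes(p_mesh, ue_name)) {
			mesh_blend_shape_map[i] = -1;
			continue;
		}

		// Rule 1: several asset blend shapes may match one entry; a real
		// implementation would score all candidates (fuzzy or common-name
		// matching) and pick the best. Here we only try an exact match.
		mesh_blend_shape_map[i] = p_mesh->find_blend_shape_by_name(ue_name);
	}
}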

In our _process function we will apply the relevant blend shape values stored in our provider to our target asset, animating it.
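A sketch of that apply step, using MeshInstance3D's existing set_blend_shape_value() and the hypothetical mapping built above:

// Called every frame (e.g. from NOTIFICATION_PROCESS).
void XRFaceTracker::apply_blend_shapes() {
	MeshInstance3D *mesh = Object::cast_to<MeshInstance3D>(get_node_or_null(target));
	if (active_provider.is_null() || mesh == nullptr) {
		return;
	}

	// Copy each tracked value onto the matching blend shape of the asset.
	for (int i = 0; i < XRFaceTrackingProvider::FT_MAX; i++) {
		int mesh_idx = mesh_blend_shape_map[i];
		if (mesh_idx >= 0) {
			mesh->set_blend_shape_value(mesh_idx, active_provider->get_blend_shape(XRFaceTrackingProvider::BlendShapeEntry(i)));
		}
	}
}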

If this enhancement will not be used often, can it be worked around with a few lines of script?

This is integral logic to the XR system; while some of it will exist in script or in plugins, the core feature has to be added to the XRServer.

Is there a reason why this should be core and not an add-on in the asset library?

This is integral logic to the XR system; while some of it will exist in script or in plugins, the core feature has to be added to the XRServer.

@BastiaanOlij BastiaanOlij added this to the 4.x milestone Feb 12, 2024
@Calinou Calinou changed the title Godot Face Tracking Add support for face tracking Feb 12, 2024
@akien-mga akien-mga modified the milestones: 4.x, 4.3 Feb 19, 2024
@akien-mga
Member

Implemented by godotengine/godot#88312.
