GEDDnet: A Network for Gaze Estimation with Dilation and Decomposition

Dilated Convolution

We use dilated convolutions to capture high-level features at high resolution from eye images. We replace some regular convolutional layers and max-pooling layers of a VGG16 network with dilated convolutional layers that use different dilation rates.
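As a rough illustration of why dilation helps, the sketch below computes the receptive field of a stack of stride-1 convolutions along one axis. The kernel size and the dilation rates are illustrative assumptions, not the values used in GEDDnet, and `receptive_field` is a hypothetical helper that is not part of this repository.

```python
def receptive_field(kernel_size, dilation_rates):
    """Receptive field (in input pixels, along one axis) of stacked
    stride-1 convolutions, one layer per entry in dilation_rates."""
    field = 1
    for rate in dilation_rates:
        # A dilated kernel spans (kernel_size - 1) * rate + 1 input pixels,
        # so each layer widens the field by (kernel_size - 1) * rate.
        field += (kernel_size - 1) * rate
    return field


# Illustrative rates only (assumption): three 3x3 layers.
plain = receptive_field(3, [1, 1, 1])    # regular convolutions
dilated = receptive_field(3, [1, 2, 4])  # increasing dilation rates
print(plain, dilated)
```

With the same number of layers and no pooling, the dilated stack covers a wider region of the eye image while the feature map keeps its spatial resolution.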

Gaze Decomposition

We propose gaze decomposition for appearance-based gaze estimation, which decomposes the gaze estimate into the sum of a subject-independent term estimated from the input image by a deep convolutional network and a subject-dependent bias term.

During training, both the weights of the deep network and the bias terms are estimated. During testing, if no calibration data is available, the bias term can be set to zero. Otherwise, the bias term can be estimated from images of the subject gazing at different gaze targets. The proposed gaze decomposition method enables low-complexity calibration, i.e., calibration using data collected while subjects view only one or a few gaze targets, with only a small number of images per gaze target.
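In symbols (notation introduced here for illustration), the estimate for subject $i$ can be written as $\hat{g} = f(I) + b_i$, where $f(I)$ is the network output for image $I$ and $b_i$ is the subject's bias. The sketch below shows one plausible way to calibrate: it assumes the bias is taken as the mean residual between the known target angles and the network outputs over the calibration images. That averaging rule is an assumption, not necessarily the procedure used in this repository, and `estimate_bias` and `apply_bias` are hypothetical helpers.

```python
def estimate_bias(network_outputs, target_angles):
    """Mean residual (pitch, yaw) between known gaze targets and the
    subject-independent network outputs. Returns zero bias when no
    calibration data is available."""
    n = len(network_outputs)
    if n == 0:
        return (0.0, 0.0)
    pitch = sum(t[0] - o[0] for o, t in zip(network_outputs, target_angles)) / n
    yaw = sum(t[1] - o[1] for o, t in zip(network_outputs, target_angles)) / n
    return (pitch, yaw)


def apply_bias(network_output, bias):
    """Final gaze estimate: subject-independent term plus subject bias."""
    return (network_output[0] + bias[0], network_output[1] + bias[1])


# Made-up calibration data in radians, pitch first.
outputs = [(0.10, 0.20), (0.30, -0.10)]
targets = [(0.15, 0.18), (0.35, -0.12)]
bias = estimate_bias(outputs, targets)
calibrated = apply_bias((0.00, 0.05), bias)
```

Because the bias is a single (pitch, yaw) offset per subject, a handful of images at one gaze target is enough to form an estimate under this rule.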

Setup

1. Prerequisites

TensorFlow == 1.15

python == 3.7

opencv

2. Datasets

Preprocess the dataset so that it contains:

(1) A 120$\times$120 face image: face_img

(2) Two 80$\times$120 eye images: left_eye_img and right_eye_img

(3) Pitch and yaw gaze angles in radians: eye_angle. Pitch comes first, then yaw.

(4) An integer index for each subject: subject_index. When the images of a subject are flipped horizontally, the index changes to subject_index + total_num_subject.

In train.py, dataset['face_img'] should have shape $N \times 120 \times 120$, dataset['eye_img'] should have shape $N \times 80 \times 120$, dataset['eye_angle'] should have shape $N \times 2$, and dataset['subject_index'] should have shape $N \times 1$, where $N$ is the number of samples.
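A quick sanity check of these shapes before training can catch preprocessing mistakes early. The sketch below is a hypothetical helper that is not part of this repository. It assumes the dictionary keys match the names listed above and works on plain shape tuples, so it makes no assumption about how the arrays are stored.

```python
def expected_shapes(n):
    """Shapes described above for a dataset with n samples."""
    return {
        "face_img": (n, 120, 120),
        "eye_img": (n, 80, 120),
        "eye_angle": (n, 2),      # pitch first, then yaw
        "subject_index": (n, 1),
    }


def shape_problems(shapes):
    """Return a list of human-readable problems; empty if all shapes match.
    `shapes` maps each key to its shape tuple."""
    face = shapes.get("face_img")
    if not face:
        return ["face_img: missing"]
    n = face[0]  # take the sample count from the face images
    problems = []
    for key, want in expected_shapes(n).items():
        got = shapes.get(key)
        if got is None:
            problems.append(f"{key}: missing")
        elif tuple(got) != want:
            problems.append(f"{key}: expected {want}, got {tuple(got)}")
    return problems
```

If the dataset entries expose a `.shape` attribute (as NumPy arrays do), the check could be called as `shape_problems({k: v.shape for k, v in dataset.items()})`.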

3. Online Data Augmentation

During training, PreProcess.py performs online data augmentation, including random horizontal flipping, rotation, and cropping. The face_img is cropped from 120$\times$120 to 96$\times$96, and the eye_img is cropped from 80$\times$120 to 64$\times$96. The subject_index changes to subject_index + total_num_subject if the image is flipped horizontally.
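The cropping and index bookkeeping can be sketched in a few lines. This is a simplified illustration, not the code in PreProcess.py: it uses nested lists instead of tensors, omits rotation, assumes crop offsets are drawn uniformly, and all function names are hypothetical.

```python
import random


def random_crop_offsets(in_hw, out_hw, rng):
    """Pick a top-left corner so an out_hw window fits inside in_hw."""
    top = rng.randint(0, in_hw[0] - out_hw[0])
    left = rng.randint(0, in_hw[1] - out_hw[1])
    return top, left


def crop(img, top, left, out_hw):
    """Crop a row-major image (list of rows) to out_hw = (height, width)."""
    return [row[left:left + out_hw[1]] for row in img[top:top + out_hw[0]]]


def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def augmented_subject_index(subject_index, flipped, total_num_subject):
    """Flipped images are treated as a separate subject with a shifted index."""
    return subject_index + total_num_subject if flipped else subject_index


rng = random.Random(0)
face = [[0] * 120 for _ in range(120)]  # stand-in for a 120x120 face image
flipped = rng.random() < 0.5
if flipped:
    face = flip_horizontal(face)
top, left = random_crop_offsets((120, 120), (96, 96), rng)
face_crop = crop(face, top, left, (96, 96))
index = augmented_subject_index(3, flipped, total_num_subject=15)
```

The same `random_crop_offsets` and `crop` calls apply to the eye images with `(80, 120)` and `(64, 96)` as the input and output sizes.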

4. Training and Testing

For training, simply run:

cd code
python train.py --num_subject *total_num_subject_ignoring_horizontal_flipping*

For inference, run:

cd code
python infer.py

Note that a trained model (data/models) and an example camera matrix (data/camera_matrix.mat) are provided.

Bibtex

@article{chen2022towards,
 title={Towards High Performance Low Complexity Calibration in Appearance Based Gaze Estimation}, 
 author={Chen, Zhaokang and Shi, Bertram},
 journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
 year={2022},
 volume={},
 number={},
 pages={1-1},
 publisher={IEEE},
 doi={10.1109/TPAMI.2022.3148386}}
