Robust facial feature tracking under varying face pose and facial expression
Pattern Recognition 40 (2007) 3195–3208
www.elsevier.com/locate/pr
Yan Tong (a), Yang Wang (b), Zhiwei Zhu (c), Qiang Ji (a, *)
(a) Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180-3590, USA
(b) National ICT Australia, Eveleigh, NSW 1430, Australia
(c) Sarnoff Corporation, Princeton, NJ 08543-5300, USA
Received 22 October 2006; received in revised form 23 February 2007; accepted 28 February 2007
Abstract
This paper presents a hierarchical multi-state pose-dependent approach for facial feature detection and tracking under varying facial expression and face pose. For effective and efficient representation of feature points, a hybrid representation that integrates Gabor wavelets and gray-level profiles is proposed. To model the spatial relations among feature points, a hierarchical statistical face shape model is proposed to characterize both the global shape of the human face and the local structural details of each facial component. Furthermore, multi-state local shape models are introduced to deal with shape variations of some facial components under different facial expressions. During detection and tracking, both facial component states and feature point positions, constrained by the hierarchical face shape model, are dynamically estimated using a switching hypothesized measurements (SHM) model. Experimental results demonstrate that the proposed method accurately and robustly tracks facial features in real time under different facial expressions and face poses.
© 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Keywords: Facial feature detection and tracking; Active shape model; Face pose estimation
1. Introduction
The face plays an essential role in human communication. It is the main source of information for discriminating and identifying people, for interpreting what has been said through lipreading, and for understanding a person's emotions and intentions from facial expressions. Facial feature points are the prominent landmarks surrounding the facial components: eyebrows, eyes, nose, and mouth. They encode critical information about both facial expression and head movement, so facial feature motion can be defined as a combination of rigid head motion and nonrigid facial deformation. Accurate localization and tracking of facial features are important in applications such as vision-based human-machine interaction, face-based human identification, animation, and entertainment. Generally, facial feature tracking techniques can be classified into two categories: model-free and model-based tracking algorithms. The model-free tracking algorithms [1–7] are general-purpose point trackers that use no prior knowledge of the object.
* Corresponding author. Tel.: +1 518 2766440. E-mail address: qji@ecse.rpi.edu (Q. Ji).
0031-3203/$30.00
doi:10.1016/j.patcog.2007.02.021
Each facial feature point is usually tracked by performing a local search for the best matching position, around which the appearance is most similar to that in the initial frame. However, model-free methods are susceptible to the tracking errors that inevitably arise from the aperture problem, noise, and occlusion. Model-based methods, on the other hand, focus on explicitly modeling the shape of the objects. Recently, extensive work has focused on the shape representation of deformable objects, such as active contour models (Snakes) [8], the deformable template method [9], the active shape model (ASM) [10], the active appearance model (AAM) [11], the direct appearance model (DAM) [12], elastic bunch graph matching (EBGM) [13], morphable models [14], and active blobs [15]. Although model-based methods exploit prior knowledge of the face to achieve effective tracking, they rest on common assumptions, e.g., a nearly frontal view and moderate facial expression changes, and tend to fail under the large pose variations or facial deformations encountered in real-world applications.
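The model-free local search described above can be illustrated with a short sketch. It assumes grayscale frames stored as NumPy arrays and uses exhaustive normalized cross-correlation (NCC) over a square search window; the function name `track_point` and the default patch and search sizes are hypothetical choices, not taken from the cited trackers [1–7], which differ in their matching criteria and search strategies.

```python
import numpy as np


def track_point(ref_frame, cur_frame, point, half_patch=5, half_search=8):
    """Hypothetical model-free tracker for a single feature point.

    The template is the patch around `point` (row, col) in the reference
    (initial) frame; the new position is the candidate in `cur_frame` whose
    patch has the highest normalized cross-correlation with the template.
    """
    r, c = point
    tmpl = ref_frame[r - half_patch:r + half_patch + 1,
                     c - half_patch:c + half_patch + 1].astype(float)
    tmpl = (tmpl - tmpl.mean()) / (tmpl.std() + 1e-8)  # zero mean, unit variance

    best_score, best_pos = -np.inf, point
    for dr in range(-half_search, half_search + 1):
        for dc in range(-half_search, half_search + 1):
            rr, cc = r + dr, c + dc
            if rr - half_patch < 0 or cc - half_patch < 0:
                continue  # candidate patch would leave the image (top/left)
            patch = cur_frame[rr - half_patch:rr + half_patch + 1,
                              cc - half_patch:cc + half_patch + 1].astype(float)
            if patch.shape != tmpl.shape:
                continue  # candidate patch would leave the image (bottom/right)
            patch = (patch - patch.mean()) / (patch.std() + 1e-8)
            score = float((tmpl * patch).mean())  # NCC in [-1, 1]
            if score > best_score:
                best_score, best_pos = score, (rr, cc)
    return best_pos, best_score
```

Because each point is matched independently, nothing constrains the result to a plausible face shape: on a low-texture patch or along an edge, many candidates score almost equally (the aperture problem), which is the weakness that model-based methods address.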
Consequently, accurate and efficient tracking of facial feature points under varying facial expression and face pose remains challenging. The difficulty arises from several sources of variability: nonrigid face shape deformations caused by facial expression change, the nonlinear face transformation resulting from pose variations, and illumination changes in real-world conditions. Tracking mouth and eye motion in image sequences is especially difficult, since these facial components are highly deformable, vary in both shape and color, and are subject to occlusion.

Fig. 1. The flowchart of the automatic facial feature tracking system based on the multi-state hierarchical shape model. [Flowchart. Preprocessing of the input database: face detection, eye detection, Gabor transform. Offline training: hybrid facial feature point representation (Gabor jet samples for each fiducial point; gray-level gradient for each contour point) and multi-state hierarchical shape model (global shape model for the whole face; local multi-state shape model for each facial component). Facial feature tracking: project the mean shapes of the global and local models using the face pose estimated in the previous frame; search the global feature points based on the global shape model; search each facial feature point of each facial component under the different state assumptions of its local multi-state shape model; estimate the state of each facial component and its feature point positions by an SHM model; estimate the 3D face pose from the tracked global facial feature points.]
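To make the decomposition of feature motion into rigid head motion and nonrigid deformation concrete, the following sketch projects a 3D landmark set to the image under an assumed weak-perspective camera parameterized by yaw, pitch, and roll. The camera model, the Euler-angle convention, and the names `rotation_matrix` and `project_shape` are illustrative assumptions, not the pose estimation technique used in this paper. The sketch shows why pose variation changes the 2D face shape nonlinearly: the projected inter-ocular distance shrinks with the cosine of the yaw angle, while a point in front of the face plane, such as the nose tip, shifts sideways with its sine.

```python
import numpy as np


def rotation_matrix(yaw, pitch, roll):
    """R = Rz(roll) @ Rx(pitch) @ Ry(yaw), angles in radians (assumed convention)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
    rz = np.array([[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]])
    return rz @ rx @ ry


def project_shape(shape3d, deformation, yaw=0.0, pitch=0.0, roll=0.0,
                  scale=1.0, t=(0.0, 0.0)):
    """Weak-perspective projection of (rigid shape + nonrigid deformation).

    shape3d, deformation: (N, 3) arrays. The deformation holds the
    expression-driven offsets; the rotation and scale/translation are the
    rigid head motion. Returns an (N, 2) array of image coordinates.
    """
    pts = shape3d + deformation                 # nonrigid part
    rotated = pts @ rotation_matrix(yaw, pitch, roll).T  # rigid rotation
    return scale * rotated[:, :2] + np.asarray(t)        # drop depth, scale, shift
```

For example, with eye corners at (±30, 0, 0) and a nose tip at (0, 25, 20), a 30° yaw reduces the projected eye-corner distance from 60 to about 52 while moving the nose tip 10 units sideways, so a single 2D shape model trained on frontal faces no longer fits well.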
In this paper, a multi-state pose-dependent hierarchical shape model is presented for facial feature tracking under varying face pose and facial expression. The flowchart in Fig. 1 summarizes our method. Based on the ASM, a two-level hierarchical face shape model is proposed to simultaneously characterize the global shape of a human face and the local structural details of each facial component. Multi-state local shape models are further introduced to deal with shape variations of facial components. To compensate for face shape deformation caused by face pose changes, a robust 3D pose estimation technique is introduced, and the hierarchical face shape model is corrected based on the estimated face pose to improve the effectiveness of the shape constraints under different poses. Gabor wavelet jets and gray-level profiles are combined to represent the feature points in an effective and efficient way. Both the states of the facial components and the positions of the feature points are dynamically estimated by a multi-modal tracking approach.
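As a rough illustration of how an ASM-based shape model constrains feature points, the sketch below builds a PCA point distribution model and clamps each shape parameter b_i to ±k√λ_i, the usual ASM convention. The same class could serve as the global model over all points and as one local model per facial component. The `select_state` function is a hypothetical and much simpler stand-in for the multi-state idea: it picks the local model (e.g., closed vs. open mouth) whose constrained fit leaves the smallest residual, whereas the paper estimates component states and point positions jointly with the SHM model. All names and defaults (`var_kept`, `k`) are assumptions.

```python
import numpy as np


class PCAShapeModel:
    """ASM-style point distribution model: x ~= mean + P @ b, |b_i| <= k * sqrt(lambda_i)."""

    def __init__(self, shapes, var_kept=0.98, k=3.0):
        # shapes: (M, 2N) array of aligned training shapes laid out as (x1, y1, ..., xN, yN).
        self.mean = shapes.mean(axis=0)
        cov = np.cov(shapes - self.mean, rowvar=False)
        evals, evecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
        evals, evecs = np.clip(evals[::-1], 0.0, None), evecs[:, ::-1]
        # Keep the fewest modes that explain `var_kept` of the total variance.
        t = int(np.searchsorted(np.cumsum(evals) / evals.sum(), var_kept)) + 1
        self.P, self.lam, self.k = evecs[:, :t], evals[:t], k

    def constrain(self, x):
        """Project x onto the model, clamp each b_i, and return (plausible shape, residual)."""
        b = self.P.T @ (x - self.mean)
        limit = self.k * np.sqrt(self.lam)
        fitted = self.mean + self.P @ np.clip(b, -limit, limit)
        return fitted, float(np.linalg.norm(x - fitted))


def select_state(models, x):
    """Hypothetical state choice: the local model whose constrained fit leaves the smallest residual."""
    fits = {state: model.constrain(x) for state, model in models.items()}
    best = min(fits, key=lambda state: fits[state][1])
    return best, fits[best][0]
```

For example, with one mouth model trained on closed-mouth shapes and another on open-mouth shapes, `select_state(models, observed)` returns the state label and the corresponding plausible shape. A real tracker would also use the states in previous frames rather than deciding each frame in isolation.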
The rest of the paper is organized as follows. Section 2 reviews related work on model-based facial feature tracking. Section 3 presents our facial feature tracking algorithm, including the hierarchical multi-state pose-dependent face shape model, the hybrid feature representation, and the proposed multi-modal facial feature tracking algorithm. Section 4 discusses the experimental results. Section 5 concludes the paper with a summary and a discussion of future research.
2. Related work
2.1. Facial feature tracking in nearly frontal view
Extensive recent work in facial component detection and
tracking has utilized the shape representation of deformable
objects, where the facial component shape is represented by a
set of facial feature points.
Wiskott et al. [13] present the EBGM method, which locates facial features using object-adapted graphs. The local information of feature points is represented by Gabor wavelets, and the geometry of the human face is encoded by the edges of the graph. The facial features are extracted by maximizing the similarity between the novel image and the model graphs.
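The Gabor-jet representation used by EBGM can be sketched as follows. The kernels follow the form commonly cited for EBGM, ψ_j(x) = (k_j²/σ²) exp(−k_j²‖x‖²/(2σ²)) [exp(i k_j·x) − exp(−σ²/2)], with five frequencies k_v = 2^(−(v+2)/2) π, eight orientations φ_μ = μπ/8, and σ = 2π; these settings, the kernel size, and the function names are assumptions for illustration. A jet is the vector of all 40 complex responses at one pixel, and two jets are compared with the normalized dot product of their magnitudes. The graph matching step itself is not shown.

```python
import numpy as np


def gabor_kernels(size=65, n_freq=5, n_orient=8, sigma=2 * np.pi):
    """Complex Gabor kernels in the EBGM form (default: 5 frequencies x 8 orientations)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    kernels = []
    for v in range(n_freq):
        k = 2 ** (-(v + 2) / 2) * np.pi  # k_max = pi/2, spacing 1/sqrt(2)
        for mu in range(n_orient):
            phi = mu * np.pi / n_orient
            kx, ky = k * np.cos(phi), k * np.sin(phi)
            envelope = (k ** 2 / sigma ** 2) * np.exp(-k ** 2 * (x ** 2 + y ** 2) / (2 * sigma ** 2))
            # Subtracting exp(-sigma^2 / 2) removes the DC response of the (untruncated) kernel.
            carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)
            kernels.append(envelope * carrier)
    return np.array(kernels)


def gabor_jet(image, point, kernels):
    """Jet: responses of all kernels at one pixel (row, col), i.e. convolution evaluated there."""
    half = kernels.shape[-1] // 2
    r, c = point
    patch = image[r - half:r + half + 1, c - half:c + half + 1].astype(float)
    # Convolution at the centre is sum_d I(p + d) * psi(-d); flipping both axes gives psi(-d).
    return np.array([np.sum(patch * kern[::-1, ::-1]) for kern in kernels])


def jet_similarity(j1, j2):
    """Magnitude-based similarity: normalized dot product of |J|, in [0, 1]."""
    a1, a2 = np.abs(j1), np.abs(j2)
    return float(a1 @ a2 / np.sqrt((a1 @ a1) * (a2 @ a2)))
```

Because the similarity normalizes the magnitude vectors, it is unaffected by a uniform change of image contrast, and, to the extent the truncated kernels remain DC-free, it is also insensitive to brightness offsets.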
Recently, statistical models have been widely employed in facial analysis. The ASM [10], proposed by Cootes et al., is a popular statistical approach to representing deformable objects, where shapes are represented by a set of feature points. Feature points are searched using gray-level profiles, and principal component analysis (PCA) is applied to analyze the modes of shape variation, so that the object shape can deform only in the specific ways found in the training data. Robust parameter estimation and Gabor wavelets have also been employed in ASM
to improve the robustness and accuracy of feature poi