# Introduction

## Human face recognition

Human face recognition is a subarea of object recognition that aims to identify a face in a scene or in still images. It is a very complex, high-dimensional problem due to the nature of digital images. Face recognition benefits many fields, such as computer security and video compression. Two approaches are commonly used in face recognition: video-based and still-image-based. Since the 1980s, image-based recognition has been more dominant than the video-based approach. A few recent studies have taken advantage of video scenes, since they provide more dynamic characteristics of the human face that help the recognition process. Frame sequences also provide more features for 3D representation and high-resolution images. Moreover, in video-based recognition the prediction accuracy can be improved using the frame sequence.

Fig.1 Temporal HMM graph.

Motivated by speaker adaptation, this paper presents an adaptive Hidden Markov Model to recognize human faces from frame sequences. The proposed model trains an HMM on the training data and then improves the recognition continually using the test data. Figure 1 illustrates the following:

• An HMM is used to learn the temporal dynamics during the training process
• The temporal features of a test sequence are then analyzed over time by the HMM of each subject
• The likelihoods are then compared to obtain the identity of the test video sequence

One advantage of the proposed idea is that the model can capture dynamical characteristics.
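The identity-decision step above can be sketched as follows. This is an illustrative example only: the log-likelihood values and subject names are invented stand-ins for what each subject's trained HMM would actually return for a test sequence.

```python
# Hypothetical per-subject log-likelihoods of one test sequence, as would be
# produced by scoring the sequence against each subject's trained HMM
# (illustrative values, not real results).
log_likelihoods = {"subject_1": -412.7, "subject_2": -398.2, "subject_3": -455.0}

# The predicted identity is the subject whose HMM assigns the highest
# likelihood to the test sequence.
identity = max(log_likelihoods, key=log_likelihoods.get)
print(identity)  # subject_2
```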

## Hidden Markov Model (HMM)

A Hidden Markov Model is a graphical model suited to representing sequential data. An HMM consists of an initial state distribution $\pi$, unobserved states $q_t$, a transition matrix $A$, and an emission distribution $B$, and is characterized by $\lambda=(A,B,\pi)$:
Fig.2 HMM graph.

Given $N$ states $S = \{S_1, S_2, \ldots, S_N\}$, let $q_t$ denote the state at time $t$.

$A$ is the transition matrix, where $a_{ij}$ is the $(i,j)$ entry of $A$:

$a_{ij}=P(q_t=S_j|q_{t-1}=S_i)$ where $1\leq i,j \leq N$

$B$ is the set of observation pdfs, $B=\{b_i(O)\}$:

$b_i(O)=\sum_{k=1}^M c_{ik} N(O,\mu_{ik},U_{ik})$ where $1\leq i \leq N$

where $c_{ik}$ is the mixture coefficient of the $k$-th mixture component of $S_i$,

$M$ is the number of components in the Gaussian mixture model,

$\mu_{ik}$ is the mean vector and $U_{ik}$ is the covariance matrix,

and the initial state distribution is $\pi_i=P(q_1=S_i)$ where $1\leq i \leq N$.
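To make the role of $\lambda=(A,B,\pi)$ concrete, the sketch below computes $P(O|\lambda)$ with the standard scaled forward algorithm. It is a minimal illustration, not the paper's implementation: it assumes a single diagonal Gaussian per state (i.e. $M=1$) and toy parameter values.

```python
import numpy as np

def forward_loglik(O, pi, A, mu, var):
    """Log-likelihood log P(O | lambda) via the scaled forward algorithm.

    O:   (T, d) observation sequence
    pi:  (N,)   initial state distribution pi_i
    A:   (N, N) transition matrix, A[i, j] = P(q_t = S_j | q_{t-1} = S_i)
    mu:  (N, d) per-state Gaussian means (single component, M = 1)
    var: (N, d) per-state diagonal variances
    """
    def emit(o):
        # Diagonal-Gaussian densities b_i(o) for every state i at once.
        logp = -0.5 * (np.log(2 * np.pi * var) + (o - mu) ** 2 / var).sum(axis=1)
        return np.exp(logp)

    alpha = pi * emit(O[0])          # alpha_1(i) = pi_i * b_i(O_1)
    loglik = 0.0
    for t in range(1, len(O)):
        # Rescale alpha to avoid underflow; accumulate log of scale factors.
        c = alpha.sum()
        loglik += np.log(c)
        # alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(O_t)
        alpha = (alpha / c) @ A * emit(O[t])
    return loglik + np.log(alpha.sum())

# Toy example: 2 states, 1-D observations (illustrative values).
O = np.array([[0.0], [0.1], [1.2]])
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
mu = np.array([[0.0], [1.0]])
var = np.ones((2, 1))
print(forward_loglik(O, pi, A, mu, var))
```

In recognition, this quantity would be evaluated once per subject model and the largest value would decide the identity.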

# Features extraction

In computer vision, common approaches to feature extraction include pixel values, eigen-coefficients, and the DCT. These approaches reduce the dimensionality and let us solve the problem in feature space; without this step the problem would be computationally intractable. In this study, Principal Component Analysis (PCA) was used to represent the images with low-dimensional features. An eigen-analysis was performed to produce new feature vectors projected into the eigenspace by computing the covariance matrix and its eigenvectors and eigenvalues. The feature extraction procedure was as follows. The given face database contains $T$ images for each subject, with a total of $L$ subjects:

$F_l = \{ f_{l,1}, f_{l,2}, f_{l,3}, \ldots, f_{l,T} \}$

$\, 1 \leq l \leq L$

$F_l$ is a tuple of the $T$ training face images of subject $l$. The images in this dataset contain only the face portion of the subjects. Several eigenvectors, $\{V_1, V_2, \ldots, V_d\}$, are obtained by performing eigen-analysis on the $L \times T$ training samples. The feature vector of each image, $e_{l,t}$, is then generated by projecting the training image onto the obtained eigenvectors. The set of all projected training images (feature vectors) is then used as observations to train the HMM.
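The projection step can be sketched in a few lines of numpy. Random data stands in for the real face images, and the sizes (60 images of 16x16 pixels, $d=12$ retained eigenvectors) are illustrative choices, not the paper's exact configuration.

```python
import numpy as np

# Illustrative stand-in for the L*T training face images: each row is a
# flattened face image (random data in place of real pixels).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 256))   # L*T = 60 images of 16x16 = 256 pixels

# Eigen-analysis of the training set.
mean_face = X.mean(axis=0)
Xc = X - mean_face
# The right singular vectors of the centered data are the eigenvectors
# of its covariance matrix.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
d = 12                            # number of retained eigenvectors
V = Vt[:d].T                      # (256, d) eigenvector basis

# Feature vectors e_{l,t}: projection of each image onto the eigenspace.
E = Xc @ V                        # (60, d) observation vectors for the HMMs
print(E.shape)  # (60, 12)
```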

# Temporal HMM

Each subject $l$ is modeled by a fully connected HMM consisting of $N$ states and observed variables $O$. Training starts by initializing the HMM $\lambda=(A,B,\pi)$. The observation vectors are then separated by vector quantization into $N$ classes, which are used to form an initial estimate of the probability density function $B$. The maximum-likelihood estimate of $P(O|\lambda)$ is then computed iteratively using the EM algorithm, as defined below:

The probability of initial state is $\pi_i=\frac{P(O,q_1=i|\lambda)}{P(O|\lambda)}$

The transition matrix $a_{ij}=\frac {\sum_{t=1}^T P(O,q_{t-1}=i, q_{t}=j|\lambda)}{\sum_{t=1}^T P(O,q_{t-1}=i|\lambda)}$

The mixture coefficient $c_{ik}=\frac {\sum_{t=1}^T P(q_{t}=i, m_{q,t}=k|O,\lambda)}{\sum_{t=1}^T \sum_{k=1}^M P(q_{t}=i,m_{q,t}=k|O,\lambda)}$

The mean vector $\mu_{ik}=\frac {\sum_{t=1}^T O_t\, P(q_{t}=i, m_{q,t}=k|O,\lambda)}{\sum_{t=1}^T P(q_{t}=i,m_{q,t}=k|O,\lambda)}$

The covariance $U_{ik}=(1-\alpha)C_e+\alpha\frac {\sum_{t=1}^T (O_t-\mu_{ik})(O_t-\mu_{ik})^T P(q_{t}=i, m_{q,t}=k|O,\lambda)}{\sum_{t=1}^T P(q_{t}=i,m_{q,t}=k|O,\lambda)}$, where $m_{q,t}$ denotes the mixture component of state $q$ at time $t$. At recognition time, the identity of a test sequence is the subject $k$ whose model maximizes the likelihood: $P(O|\lambda_k)=\max_l P(O|\lambda_l)$.
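The mean re-estimation above is a posterior-weighted average of the observations. The sketch below illustrates just that M-step update with toy values; the posteriors `gamma` are invented stand-ins for the quantities $P(q_t=i, m_{q,t}=k|O,\lambda)$ that EM's E-step would compute (shown for a single mixture component).

```python
import numpy as np

# Toy E-step output: gamma[t, i] stands in for P(q_t = i, m = k | O, lambda)
# for one mixture component (illustrative values only).
rng = np.random.default_rng(1)
O = rng.normal(size=(5, 3))                # T = 5 observations, d = 3
gamma = rng.dirichlet(np.ones(2), size=5)  # posteriors over N = 2 states

# M-step mean update: mu_i = sum_t gamma[t, i] * O_t / sum_t gamma[t, i],
# the posterior-weighted average of the observations for each state.
mu = (gamma.T @ O) / gamma.sum(axis=0)[:, None]   # (2, 3)
print(mu.shape)  # (2, 3)
```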

Motivated by speaker-dependent speech recognition, this paper proposes an adaptive HMM that continues to train the HMM during the recognition process. At the recognition step, after a test sequence is recognized as one subject, that same sequence is used to update the HMM of that subject. Two questions have to be addressed in this scenario: first, on what basis we justify using the current sequence to update the model for successive iterations, and second, how to perform that update with the HMM. This adaptive learning approach is therefore challenging, as the correctness and value of the new information must be estimated. The proposed model computes the difference between the estimated likelihood and a predefined threshold determined through experiments. The model then uses the EM algorithm to iteratively compute the maximum a posteriori (MAP) estimate, which is used to adapt the HMM given the previous model $\lambda_{old}$ and the observation vectors $O$. Note that the covariance is not updated, but the mean is updated as follows.

$\mu_{ik}=(1-\beta)\mu^{old}_{ik}+\beta\frac {\sum_{t=1}^T O_t P(q_{t}=i, m_{qt}=k|O,\lambda)}{\sum_{t=1}^T P(q_{t}=i,m_{qt}=k|O,\lambda)}$

where $\beta$ is a weighting factor between zero and one that sets the tendency towards the new value of the mean. Decreasing this parameter keeps each new $\mu$ closer to the previous value. In this work, the authors set $\beta$ to 0.3.
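The interpolation in the adaptation formula reduces to a convex combination of the old mean and the new posterior-weighted mean. A minimal sketch, with invented mean vectors standing in for $\mu^{old}_{ik}$ and the newly estimated term:

```python
import numpy as np

beta = 0.3                        # weighting factor used in this work

# Previous mean and the new posterior-weighted mean estimated from the
# recognized test sequence (illustrative values).
mu_old = np.array([1.0, 2.0])
mu_new = np.array([2.0, 4.0])

# Adapted mean: with small beta, the result stays close to the old value.
mu_adapted = (1 - beta) * mu_old + beta * mu_new
print(mu_adapted)  # [1.3 2.6]
```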

# Model Evaluation

The proposed model was tested on three datasets: Task, Task-new, and Mobo [1]. The Task database consists of videos of 21 subjects while they read and type on a computer, while the Task-new dataset contains videos of 11 subjects under different lighting and camera settings. The Mobo dataset consists of 24 videos per subject in different walking positions. The video frames were cropped manually to 16x16 pixels for the Task database and to 48x48 pixels for the Mobo database. For each subject, 150 frames were used for training and 150 for testing; the frames' locations and lengths were chosen randomly from each user's video. To evaluate the performance of the proposed model, it was compared with a baseline image-based recognition algorithm and with the temporal HMM.

## The Baseline algorithm

The baseline algorithm in this study is individual PCA (IPCA), a commonly used image-based face recognition algorithm. Applying the baseline algorithm to the Task database yielded a 9.9% error rate with 12 eigenvectors. On the Mobo dataset, the baseline algorithm yielded a 2.4% error rate using 7 eigenvectors.
