\draft

# Introduction

## Human face recognition

Human face recognition is a subarea of object recognition that aims to identify a person's face in a scene or in still images. Face recognition benefits many fields, such as computer security and video compression. Two approaches are commonly used in face recognition: video-based and still-image-based. Since the 1980s, the image-based approach has been more dominant than the video-based approach. A few recent studies have taken advantage of video, as it captures more of the dynamic characteristics of the human face, which helps the recognition process. Video also provides more features for 3D representation and high-resolution images. In addition, in video-based recognition the prediction accuracy can be improved using the frame sequence. Motivated by speaker adaptation, this paper presents an adaptive Hidden Markov Model to recognise human faces from frame sequences. The proposed model trains an HMM on the training data and then continually improves recognition using the test data.

## Hidden Markov Model (HMM)

A Hidden Markov Model is a graphical model that is suitable for representing sequential data. An HMM consists of the initial state probabilities $\pi_i$, the unobserved states $q_t$, the transition probability matrix $A$, and the emission matrix $B$. An HMM is characterized by $\lambda=(A,B,\pi)$:

Fig. 1: HMM graph.

Given $N$ states $S=\{S_1,S_2,\ldots,S_N\}$ and $q_t$ the state at time $t$:

$A$ is the transition matrix, where $a_{ij}$ is the $(i,j)$ entry of $A$:

$a_{ij}=P(q_t=S_j|q_{t-1}=S_i)$ where $1\leq i,j \leq N$

$B$ is the observation pdf, $B=\{b_i(O)\}$:

$b_i(O)=\sum_{k=1}^M c_{ik} N(O,\mu_{ik},U_{ik})$ where $1\leq i \leq N$

where $c_{ik}$ is the mixture coefficient for the $k$-th mixture component of $S_i$,

$M$ is the number of components in the Gaussian mixture model,

$\mu_{ik}$ is the mean vector and $U_{ik}$ is the covariance matrix.

The initial state probability is $\pi_i=P(q_1=S_i)$ where $1\leq i \leq N$.
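To make the role of $\lambda=(A,B,\pi)$ concrete, the following is a minimal sketch of how $\log P(O|\lambda)$ could be evaluated with the standard scaled forward recursion. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the covariance $U_{ik}$ is assumed diagonal to keep the code short, and the toy parameter values are made up.

```python
import math

def gaussian_diag(o, mean, var):
    """N(o; mean, diag(var)). A diagonal covariance is an assumption of this sketch."""
    log_p = 0.0
    for x, m, v in zip(o, mean, var):
        log_p -= 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return math.exp(log_p)

def emission(o, weights, means, variances):
    """b_i(O) = sum_k c_ik * N(O; mu_ik, U_ik) for a single state i."""
    return sum(c * gaussian_diag(o, m, v)
               for c, m, v in zip(weights, means, variances))

def log_likelihood(obs, pi, A, mix):
    """log P(O | lambda) by the scaled forward recursion.

    mix[i] is a tuple (weights, means, variances) describing state i.
    """
    n = len(pi)
    alpha = [pi[i] * emission(obs[0], *mix[i]) for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission.
            alpha = [sum(alpha[i] * A[i][j] for i in range(n))
                     * emission(obs[t], *mix[j]) for j in range(n)]
        scale = sum(alpha)            # rescale to avoid numerical underflow
        alpha = [a / scale for a in alpha]
        log_p += math.log(scale)
    return log_p

# Toy two-state model with one-dimensional observations (illustrative values).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
mix = [
    ([1.0], [[0.0]], [[1.0]]),                    # state 1: one component
    ([0.5, 0.5], [[4.0], [6.0]], [[1.0], [1.0]]),  # state 2: two components
]
obs = [[0.1], [0.3], [4.2], [5.9]]
print(log_likelihood(obs, pi, A, mix))
```

Working in the log domain with per-step rescaling is a common design choice for long frame sequences, where the raw product of probabilities would otherwise underflow.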

# Feature extraction

In computer vision, several approaches are commonly used for feature extraction, such as pixel values, eigen-coefficients, and DCT. In this study, Principal Component Analysis (PCA) was used to represent the images as low-dimensional features. Eigenanalysis was performed to produce new feature vectors projected into the eigenspace by computing the covariance, eigenvectors, and eigenvalues. The feature extraction procedure was applied to each of the $T$ images of each of the $L$ subjects to generate the corresponding feature vectors $e_{l,t}$. The mean vector $\mu$ and the covariance matrix $C_e$ were computed over all the feature vectors.

$F_l=\{f_{l,1},f_{l,2},f_{l,3},\ldots,f_{l,T}\}$

$O_l=\{e_{l,1},e_{l,2},e_{l,3},\ldots,e_{l,T}\}$
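The mapping from a frame $f_{l,t}$ to a feature vector $e_{l,t}$ can be sketched as follows. This is a small standard-library illustration, not the paper's code: the helper names are hypothetical, the frames are toy two-pixel vectors, and the eigenvectors are found by power iteration with deflation, which is swapped in here only to avoid a linear algebra dependency.

```python
import math

def mean_vector(rows):
    """Mean of the flattened frames."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance(rows, mu):
    """Covariance matrix of the frames around mu."""
    d, n = len(mu), len(rows)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        c = [x - m for x, m in zip(r, mu)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += c[a] * c[b] / n
    return cov

def top_eigenvectors(cov, k, iters=500):
    """Leading k eigenvectors by power iteration with deflation (assumed method)."""
    d = len(cov)
    work = [row[:] for row in cov]
    vectors = []
    for _ in range(k):
        v = [1.0 + 0.1 * a for a in range(d)]    # fixed, non-symmetric start
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(work[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:                      # no variance left to explain
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(work[a][b] * v[b] for b in range(d))
                  for a in range(d))
        vectors.append(v)
        for a in range(d):                       # deflate the component just found
            for b in range(d):
                work[a][b] -= lam * v[a] * v[b]
    return vectors

def project(frame, mu, vectors):
    """e = projection of the centred frame onto the eigenvectors."""
    centred = [x - m for x, m in zip(frame, mu)]
    return [sum(c * e for c, e in zip(centred, vec)) for vec in vectors]

# Toy "frames" with two pixels each, all lying on the line y = x.
frames = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mu = mean_vector(frames)
eigvecs = top_eigenvectors(covariance(frames, mu), k=1)
features = [project(f, mu, eigvecs) for f in frames]
print(features)
```

For real 16x16 or 48x48 frames, a numerical library's eigendecomposition would normally replace the power iteration; the projection step stays the same.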

# Temporal HMM

Each subject $l$ is modeled by a fully connected HMM consisting of $N$ states and observed variables $O$. Training starts by initializing the HMM $\lambda=(A,B,\pi)$. The observation vectors are then separated into $N$ classes using vector quantization, and these classes are used to form an initial estimate of the probability density function $B$. The maximum likelihood estimate, which maximizes $P(O|\lambda)$, is then computed iteratively using the EM algorithm, with the re-estimation formulas defined below:

The initial state probability is $\pi_i=\frac{P(O,q_1=i|\lambda)}{P(O|\lambda)}$

The transition matrix $a_{ij}=\frac {\sum_{t=2}^T P(O,q_{t-1}=i, q_{t}=j|\lambda)}{\sum_{t=2}^T P(O,q_{t-1}=i|\lambda)}$

The mixture coefficient $c_{ik}=\frac {\sum_{t=1}^T P(q_{t}=i, m_{qt}=k|O,\lambda)}{\sum_{t=1}^T \sum_{k=1}^M P(q_{t}=i,m_{qt}=k|O,\lambda)}$

The mean vector $\mu_{ik}=\frac {\sum_{t=1}^T O_t P(q_{t}=i, m_{qt}=k|O,\lambda)}{\sum_{t=1}^T P(q_{t}=i,m_{qt}=k|O,\lambda)}$

The covariance $U_{ik}=(1-\alpha)C_e+\alpha\frac {\sum_{t=1}^T (O_t-\mu_{ik})(O_t-\mu_{ik})^T P(q_{t}=i, m_{qt}=k|O,\lambda)}{\sum_{t=1}^T P(q_{t}=i,m_{qt}=k|O,\lambda)}$
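The covariance update blends the global covariance $C_e$ with a posterior-weighted scatter matrix. The sketch below illustrates that blend for one state and mixture component. The function name and the argument `gamma` (standing for the posteriors $P(q_t=i, m_{qt}=k|O,\lambda)$, assumed to be already computed by the E-step) are hypothetical, and the numbers are toy values.

```python
def smoothed_covariance(obs, gamma, mu, C_e, alpha):
    """U = (1 - alpha) * C_e + alpha * (posterior-weighted scatter around mu).

    obs   : list of observation vectors O_t
    gamma : list of posteriors for one (state, component) pair, one per frame
    mu    : mean vector mu_ik of that component
    C_e   : global covariance matrix of all feature vectors
    """
    d = len(mu)
    total = sum(gamma)
    scatter = [[0.0] * d for _ in range(d)]
    for o, g in zip(obs, gamma):
        diff = [x - m for x, m in zip(o, mu)]
        for a in range(d):
            for b in range(d):
                scatter[a][b] += g * diff[a] * diff[b] / total
    return [[(1.0 - alpha) * C_e[a][b] + alpha * scatter[a][b]
             for b in range(d)] for a in range(d)]

# Toy one-dimensional case: scatter is 1.0, global covariance is 4.0.
U = smoothed_covariance(obs=[[1.0], [3.0]], gamma=[0.5, 0.5],
                        mu=[2.0], C_e=[[4.0]], alpha=0.5)
print(U)
```

With $\alpha=0$ the result reduces to $C_e$, and with $\alpha=1$ to the plain weighted scatter. A plausible reading, not stated in the source, is that intermediate values keep $U_{ik}$ well conditioned when few frames are assigned to a component.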

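During recognition, a test sequence $O$ is assigned to the subject $k$ whose model yields the highest likelihood:

$P(O|\lambda_k)=\max_l P(O|\lambda_l)$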
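As a small illustration of this decision rule, the sketch below picks the subject with the highest score. The function name, the subject labels, and the log-likelihood values are hypothetical; log-likelihoods are used because the maximum is unchanged by the logarithm.

```python
def recognise(log_likelihoods):
    """Return the subject whose model gives the highest log P(O | lambda_l)."""
    return max(log_likelihoods, key=log_likelihoods.get)

# Hypothetical scores of one test sequence under three subject models.
scores = {"subject_1": -120.5, "subject_2": -98.2, "subject_3": -101.0}
best = recognise(scores)
print(best)
```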

Motivated by speaker-dependent speech recognition, this paper proposes an adaptive HMM that continues to train the HMM during the recognition process. Adaptive learning is challenging because the correctness and the value of the new information must be estimated. The proposed model computes the difference between the estimated likelihood and a predefined threshold determined through experiments. The model then uses the EM algorithm to iteratively compute the maximum a posteriori (MAP) estimate, which is used to adapt the HMM given the initial model $\lambda_{old}$ and the observation vectors $O$. Note that the covariance is not updated, but the mean is updated as follows:

$\mu_{ik}=(1-\beta)\mu^{old}_{ik}+\beta\frac {\sum_{t=1}^T O_t P(q_{t}=i, m_{qt}=k|O,\lambda)}{\sum_{t=1}^T P(q_{t}=i,m_{qt}=k|O,\lambda)}$
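The mean update interpolates between the old mean and the posterior-weighted mean of the new observations. The sketch below shows that interpolation together with a simple gate on the likelihood. The function names and `gamma` (the posteriors $P(q_t=i, m_{qt}=k|O,\lambda)$, assumed to be available) are hypothetical, and the direction of the threshold comparison is an assumption, since the text only says that a likelihood difference against a threshold is computed.

```python
def adapt_mean(mu_old, obs, gamma, beta):
    """mu = (1 - beta) * mu_old + beta * (posterior-weighted mean of obs)."""
    total = sum(gamma)
    d = len(mu_old)
    weighted = [sum(g * o[a] for o, g in zip(obs, gamma)) / total
                for a in range(d)]
    return [(1.0 - beta) * m + beta * w for m, w in zip(mu_old, weighted)]

def maybe_adapt(mu_old, obs, gamma, beta, log_lik, threshold):
    """Adapt only if the likelihood clears the threshold (assumed gating rule)."""
    if log_lik - threshold <= 0.0:
        return list(mu_old)          # evidence too weak: keep the old mean
    return adapt_mean(mu_old, obs, gamma, beta)

# Toy one-dimensional example: the weighted mean of the new data is 3.0.
new_mu = adapt_mean([0.0], [[2.0], [4.0]], [1.0, 1.0], beta=0.5)
print(new_mu)
```

A small $\beta$ keeps the adapted model close to the trained one, which limits the damage if a misrecognised sequence is used for adaptation.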

# Model Evaluation

The proposed model was tested on three datasets: Task, Task-new, and Mobo [1]. The Task database consists of videos of 21 subjects recorded while they were reading and typing on a computer, while the Task-new dataset contains videos of 11 subjects under different lighting and camera settings. The Mobo data consists of 24 videos for each subject, recorded in different walking positions. The video frames were cropped manually to 16x16 pixels in the Task database and to 48x48 pixels in the Mobo database. For each subject, 150 frames were used for training and 150 for testing. The frames' location and length were chosen at random from each subject's video. To evaluate its performance, the proposed model was compared with a baseline image-recognition algorithm and with the temporal HMM.

## The baseline algorithm

The baseline algorithm in this study is individual PCA (IPCA), a commonly used image-based face recognition algorithm. Applying the baseline algorithm to the Task database yielded a 9.9% error rate with 12 eigenvectors. On the Mobo dataset, the baseline algorithm yielded a 2.4% error rate using 7 eigenvectors.