contributions on Video-Based Face Recognition Using Adaptive Hidden Markov Models


Video-based face recognition

Early methods for video-based face recognition were mostly based on still-to-still techniques, which select a good frame and then perform some related processing on it. More recent work has moved toward spatio-temporal representations. Most existing systems address video-based face recognition using the following steps:

1. Detect a face (face detection)

2. Track the face over time (face tracking). Sometimes it is necessary to select frames that contain frontal faces and/or valuable cues.

3. When a frame satisfying certain criteria (e.g. size, pose or illumination) is acquired, perform recognition (face recognition).

Figure 1 shows the process.

Fig.1 Abstraction of the process of video-based face recognition
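The three steps above can be sketched as a simple loop. This is only an illustrative outline, not the implementation of any particular system: the callables detect_face, track_face, is_usable and recognize are hypothetical placeholders supplied by the caller, and a frame can be any object those callables understand.

```python
def recognize_stream(frames, detect_face, track_face, is_usable, recognize):
    """Run detect -> track -> recognize over a sequence of frames.

    All four callables are hypothetical placeholders:
      detect_face(frame)      -> face crop, or None if no face is found
      track_face(face, frame) -> updated face crop, or None if the track is lost
      is_usable(face)         -> True if size/pose/illumination criteria are met
      recognize(face)         -> subject label
    Returns a list of (frame_index, label) pairs.
    """
    results = []
    face = None
    for index, frame in enumerate(frames):
        # Steps 1 and 2: detect when there is no active track, otherwise follow it.
        face = detect_face(frame) if face is None else track_face(face, frame)
        if face is None:
            continue  # no face or track lost; detection is retried on the next frame
        # Step 3: recognize only when the frame satisfies the criteria.
        if is_usable(face):
            results.append((index, recognize(face)))
    return results
```

Passing the stages in as functions keeps the sketch independent of any specific detector, tracker or recognizer.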

Online and offline video-based face recognition

  • The authors of the current paper investigate two application scenarios. The first is recognizing a human face from a video sequence in an online fashion, where it is not known when the subject will leave or when another subject will come in. In this case, the recognition result up to the current frame is needed immediately. Many online face recognition and verification systems belong to this kind of application, which is called online video. The second scenario is processing video content offline, such as indexing meeting records or analyzing surveillance videos, where what matters is the recognition result after all the frames of a sequence have been captured. This scenario is called offline video. Both scenarios are illustrated in Figure 2.
Fig.2 Online and offline video

For online video, a face tracking program can track human faces and crop the face region for recognition. Face tracking also makes it possible to know whether the current frame and the previous frames belong to the same subject. The idea from the original paper (majority voting) can then be used to determine which subject has been recognized most often among all the previous frames. Based on this, a decision is made on whether or not the current frame should be used to update the eigenspace.
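A minimal sketch of the online decision might look like the following. The exact update rule is not spelled out here, so the sketch assumes one plausible reading: the current frame is used for the update only when its own label agrees with the running majority. The class name OnlineVoter and its methods are hypothetical, and the eigenspace update itself is left out.

```python
from collections import Counter


class OnlineVoter:
    """Running majority vote over the per-frame labels of one tracked subject."""

    def __init__(self):
        self.votes = Counter()

    def reset(self):
        """Call when the tracker reports that a different subject has appeared."""
        self.votes.clear()

    def observe(self, label):
        """Record the label of the current frame.

        Returns (majority_label, use_for_update). The second value is an
        assumed rule: update only if this frame agrees with the majority.
        On ties, the label seen first wins.
        """
        self.votes[label] += 1
        majority, _ = self.votes.most_common(1)[0]
        return majority, label == majority
```

Only the vote counts are kept between frames, so no earlier frame has to be stored.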

For offline video, updating based on majority voting can still be applied while processing frames one by one. However, as shown in the third row of Figure 2, once recognition of a sequence is complete, all the frames in that sequence can be used to update the eigenspace of the most recognized subject. This is not feasible for online video because it would require storing all the previous frames of a sequence.
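The offline variant can be sketched in the same spirit. The function name select_offline_update and its input format, a list of (frame, label) pairs for one finished sequence, are assumptions made for illustration; the eigenspace update itself is again omitted.

```python
from collections import Counter


def select_offline_update(sequence):
    """Pick the subject and frames to use after a whole sequence is recognized.

    `sequence` is a list of (frame, label) pairs for one stored sequence.
    Returns (majority_label, frames), where `frames` holds every frame of the
    sequence, including those whose own label disagreed with the majority.
    Returns (None, []) for an empty sequence.
    """
    if not sequence:
        return None, []
    labels = [label for _, label in sequence]
    majority, _ = Counter(labels).most_common(1)[0]
    frames = [frame for frame, _ in sequence]
    return majority, frames
```

Unlike the frame-by-frame case, this needs the full sequence in memory, which is the storage cost mentioned above.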


Notes

  • There are some issues associated with video-based face recognition systems. One of them is resolution. A chief advantage of video over still frames is the potential for better face recognition performance. However, videos are usually of low resolution and contain faces mostly in non-frontal poses. The use of 3D face models has been suggested as a way to compensate for low resolution, poor contrast and non-frontal pose.