Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation

From statwiki
Revision as of 20:58, 13 November 2015 by X435liu

Introduction

Human body pose estimation, specifically the localization of human joints in monocular RGB images, remains a very challenging task in computer vision. Recent approaches to this problem fall into two broad categories: traditional deformable part models and deep-learning-based discriminative models. Traditional models aggregate hand-crafted low-level features and then use a standard classifier or a higher-level generative model to detect the pose, which requires the features to be both sufficiently sensitive and invariant to deformations. Deep learning approaches instead learn a set of low- and high-level features from data, and these features are more tolerant to such variations. However, it is difficult to incorporate prior knowledge about the structure of the human body into these learned models.

This paper proposes a new hybrid architecture that consists of a deep Convolutional Network Part-Detector and a part-based Spatial-Model. The authors report that combining the two components and training them jointly significantly outperforms existing state-of-the-art models on the task of human body pose estimation.
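To make the two-stage idea concrete, the following is a minimal sketch and not the paper's formulation. It assumes the part detector has already produced one score heatmap per joint, and it stands in for the spatial model with a toy prior: each joint's score at a location is multiplied by the score of a related joint at its expected displacement. The names `refine`, `argmax2d`, and `expected_offset`, as well as the multiplicative re-scoring rule, are hypothetical choices made for illustration.

```python
# Hypothetical sketch: per-joint heatmaps (standing in for the output of a
# ConvNet part detector) are re-scored by a toy pairwise spatial prior.
# This is an illustrative simplification, not the model from the paper.


def argmax2d(grid):
    """Return (row, col) of the largest value in a 2D list."""
    best = max((v, r, c) for r, row in enumerate(grid) for c, v in enumerate(row))
    return best[1], best[2]


def refine(heatmaps, expected_offset):
    """Re-score each joint's heatmap using related joints' heatmaps.

    heatmaps: dict mapping joint name -> 2D list of detector scores.
    expected_offset: dict mapping (a, b) -> (dy, dx), an assumed typical
        displacement from joint a to joint b (a made-up prior).
    """
    refined = {}
    for a, heat_a in heatmaps.items():
        rows, cols = len(heat_a), len(heat_a[0])
        out = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                score = heat_a[r][c]
                for (src, dst), (dy, dx) in expected_offset.items():
                    if src != a:
                        continue
                    rr, cc = r + dy, c + dx
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    # Support from joint dst at its expected location;
                    # locations whose partner falls off the grid score 0.
                    score *= heatmaps[dst][rr][cc] if inside else 0.0
                out[r][c] = score
        refined[a] = out
    return refined
```

Under these assumptions, a strong but isolated false-positive response for one joint is suppressed when no related joint responds at the expected displacement, which is the kind of structural prior a detector working on local evidence alone cannot express.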

Model

Convolutional Network Part-Detector

Higher-Level Spatial-Model

Unified Model

Results