Dynamic Routing Between Capsules STAT946


Presented by

Yang, Tong (Richard)

Introduction

Hinton's Critiques of CNNs

What is wrong with "standard" neural nets?

  • They have too few levels of structure:
    • Neurons, Layers, Whole Nets
  • We need to group neurons in each layer into "capsules" that do a lot of internal computation and then output a compact result.
    • A capsule is inspired by a mini-column.

What does a capsule represent?

  • Each capsule represents the presence and the instantiation parameters of a multi-dimensional entity of the type that the capsule detects.
  • In the visual pathway, for example, a capsule detects a particular type of object or object-part.
  • A capsule outputs two things:
    • 1. The probability that an object of that type is present.
    • 2. The generalized pose of the object which includes position, orientation, scale, deformation, velocity, color etc.

Capsules do coincidence filtering

  • A typical capsule receives multi-dimensional prediction vectors from capsules in the layer below and looks for a tight cluster of predictions.
  • If it finds a tight cluster, it outputs:
    • 1. A high probability that an entity of its type exists in its domain.
    • 2. The center of gravity of the cluster, which is the generalized pose of that entity.
  • This is very good at filtering out noise because high-dimensional coincidences do not happen by chance.
    • It is much better at this than ordinary "neurons".
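
The claim that high-dimensional coincidences are rare can be checked with a small simulation. The sketch below is only an illustration under assumed conditions (predictions drawn uniformly from the unit cube, agreement defined as every coordinate matching within a tolerance of 0.1); the function name `agreement_rate` and all parameter values are hypothetical and not part of the original notes.

```python
import random

def agreement_rate(dim, tol=0.1, trials=20000, seed=0):
    """Estimate how often two random points in [0, 1]^dim agree
    to within `tol` on every coordinate."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        a = [rng.random() for _ in range(dim)]
        b = [rng.random() for _ in range(dim)]
        if all(abs(x - y) < tol for x, y in zip(a, b)):
            hits += 1
    return hits / trials

# Chance agreement rate for 1-, 2- and 4-dimensional predictions
rates = {d: agreement_rate(d) for d in (1, 2, 4)}
for d, r in rates.items():
    print(d, r)
```

Under these assumptions the per-coordinate chance of agreement is 1 - 0.9^2 = 0.19, so the chance of agreeing in d dimensions is 0.19^d, which shrinks quickly as d grows. A tight cluster of high-dimensional predictions is therefore strong evidence rather than noise.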

Mathematical Representations

Capsule

\begin{align} s_j = \sum_{i}c_{ij}\hat{u}_{j|i} \end{align}

where the prediction vector from capsule i in the layer below for capsule j is

\begin{align} \hat{u}_{j|i} = W_{ij}u_i \end{align}
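
The two equations above can be sketched in plain Python for a single higher-level capsule j. This is a minimal illustration, not a reference implementation: the helper names `matvec` and `capsule_input` and the toy values are assumptions made for the example.

```python
def matvec(W, u):
    """Multiply matrix W (a list of rows) by vector u."""
    return [sum(w * x for w, x in zip(row, u)) for row in W]

def capsule_input(W_j, u, c_j):
    """Compute s_j = sum_i c_ij * (W_ij u_i) for one capsule j.

    W_j[i] is the matrix W_ij, u[i] is the output u_i of lower capsule i,
    and c_j[i] is the coupling coefficient c_ij.
    """
    # Prediction vectors u_hat_{j|i} = W_ij u_i
    u_hat = [matvec(W_ij, u_i) for W_ij, u_i in zip(W_j, u)]
    dim = len(u_hat[0])
    # Weighted sum of the prediction vectors
    return [sum(c * uh[d] for c, uh in zip(c_j, u_hat)) for d in range(dim)]

# Toy values (assumed): two lower capsules with 2-D outputs
u = [[1.0, 0.0], [0.0, 2.0]]
W_j = [
    [[1.0, 0.0], [0.0, 1.0]],  # W_1j: identity
    [[0.0, 1.0], [1.0, 0.0]],  # W_2j: swaps the two coordinates
]
c_j = [0.5, 0.5]
s_j = capsule_input(W_j, u, c_j)
print(s_j)  # [1.5, 0.0]
```

Here the prediction vectors are [1, 0] and [2, 0], so the equally weighted sum is [1.5, 0].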

Two Key Features of Capsule Network

Squashing

\begin{align} v_j = \frac{||s_j||^2}{1+||s_j||^2}\frac{s_j}{||s_j||} \end{align}
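
A short sketch of the squashing function follows. It keeps the direction of s_j and maps its length into the range [0, 1), so short vectors shrink toward zero and long vectors approach unit length. The zero-vector guard and the helper `length` are assumptions added for the example; the formula itself leaves that case undefined.

```python
import math

def squash(s):
    """Apply v = (|s|^2 / (1 + |s|^2)) * (s / |s|) to a vector s."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        # Assumed convention: the zero vector maps to itself
        return [0.0 for _ in s]
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / norm
    return [scale * x for x in s]

def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

short = squash([0.1, 0.0])    # length 0.01 / 1.01, close to 0
long_ = squash([10.0, 0.0])   # length 100 / 101, close to 1
print(length(short), length(long_))
```

Because the output length always lies below 1, it can be read as the probability that the entity detected by the capsule is present, while the direction carries the pose.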

Routing By Agreement

\begin{align} c_{ij} = \frac{\exp(b_{ij})}{\sum_{k}\exp(b_{ik})} \end{align}
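
The routing loop can be sketched as below. The softmax step is the equation above. The logit update (adding the dot product of each prediction vector with the current capsule output to b_ij) follows the procedure described in the capsule paper and is not spelled out in these notes. The function names, the three-iteration default, and the toy predictions are assumptions made for illustration.

```python
import math

def softmax(b):
    """Softmax over a list of logits, shifted by the max for stability."""
    m = max(b)
    exps = [math.exp(x - m) for x in b]
    total = sum(exps)
    return [e / total for e in exps]

def squash(s):
    """Shrink a vector so that its length lies in [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]

def route(u_hat, num_iters=3):
    """Route by agreement; u_hat[i][j] is the prediction from lower
    capsule i for upper capsule j. Returns the couplings c and outputs v
    from the last iteration."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits start at zero
    for _ in range(num_iters):
        # Each lower capsule distributes its output over the upper capsules
        c = [softmax(b_i) for b_i in b]
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit where a prediction agrees with the output
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return c, v

# Toy predictions (assumed): all three lower capsules agree about upper
# capsule 0 and disagree about upper capsule 1.
u_hat = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
c, v = route(u_hat)
```

In this toy case the coupling coefficients toward upper capsule 0 rise above 0.5 for every lower capsule, and the output of capsule 0 is longer than that of capsule 1, which is the agreement effect the routing is meant to produce.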

Empirical Results

MNIST

MultiMNIST