Dynamic Routing Between Capsules STAT946
Presented by
Yang, Tong (Richard)
Introduction
Hinton's Critiques of CNNs
What is wrong with "standard" neural nets?
- They have too few levels of structure: neurons, layers, and whole nets.
- We need to group neurons in each layer into "capsules" that do a lot of internal computation and then output a compact result.
- A capsule is inspired by the cortical mini-column.
What does a capsule represent?
- Each capsule represents the presence and the instantiation parameters of a multi-dimensional entity of the type that the capsule detects.
- In the visual pathway, for example, a capsule detects a particular type of object or object-part.
- A capsule outputs two things:
- 1. The probability that an object of that type is present.
- 2. The generalized pose of the object, which includes position, orientation, scale, deformation, velocity, color, etc.
Capsules do coincidence filtering
- A typical capsule receives multi-dimensional prediction vectors from capsules in the layer below and looks for a tight cluster of predictions.
- If it finds a tight cluster, it outputs:
- 1. A high probability that an entity of its type exists in its domain.
- 2. The center of gravity of the cluster, which is the generalized pose of that entity.
- This is very good at filtering out noise, because high-dimensional coincidences do not happen by chance.
- This makes a capsule a much better detector than a normal "neuron".
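The slides do not give a formula for "looking for a tight cluster." The sketch below is a toy illustration of coincidence filtering and is not the paper's mechanism (in the paper, routing by agreement plays this role). It scores a set of prediction vectors with a hypothetical rule, exp(-spread / tolerance). Here `spread` is the RMS distance from the mean prediction, and the names `coincidence_filter` and `tolerance` are assumptions for illustration. The mean serves as the "center of gravity," that is, the pose.

```python
import math
import random

def mean_vector(vectors):
    """Component-wise mean: the 'center of gravity' of the predictions."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def spread(vectors, center):
    """Root-mean-square Euclidean distance of the vectors from the center."""
    sq = [sum((x - y) ** 2 for x, y in zip(v, center)) for v in vectors]
    return math.sqrt(sum(sq) / len(vectors))

def coincidence_filter(predictions, tolerance=0.5):
    """Toy filter with a hypothetical scoring rule (not taken from the paper).

    Returns (score, center): score is near 1 for a tight cluster and decays
    toward 0 as the spread grows; center is the mean prediction (the pose).
    """
    center = mean_vector(predictions)
    score = math.exp(-spread(predictions, center) / tolerance)
    return score, center

random.seed(0)
dim = 8
pose = [random.uniform(-1, 1) for _ in range(dim)]
# Five lower-level capsules that agree on the pose, up to small noise.
agreeing = [[p + random.gauss(0, 0.05) for p in pose] for _ in range(5)]
# Five predictions drawn independently at random.
unrelated = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
score_agree, center = coincidence_filter(agreeing)
score_random, _ = coincidence_filter(unrelated)
print(score_agree, score_random)
```

In 8 dimensions, independent random predictions almost never land close together. The agreeing set therefore scores far higher than the unrelated one, which is the intuition behind the claim that high-dimensional coincidences do not happen by chance.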
Mathematical Representations
Capsule
\begin{align} s_j = \sum_{i}c_{ij}\hat{u}_{j|i} \end{align}
where
\begin{align} \hat{u}_{j|i} = W_{ij}u_i \end{align}
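The two equations above can be sketched in plain Python. This sketch assumes the following conventions: `u[i]` is the output vector of lower capsule i, `W[i][j]` is the transformation matrix from capsule i to upper capsule j (stored as a list of rows), and `c[i][j]` are coupling coefficients that are simply given here (in the network they come from routing). The helper names are hypothetical.

```python
def mat_vec(W, u):
    """Return W u for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, u)) for row in W]

def prediction_vectors(W, u):
    """u_hat[i][j] = W[i][j] u[i]: what lower capsule i predicts for upper capsule j."""
    return [[mat_vec(W_ij, u[i]) for W_ij in W[i]] for i in range(len(u))]

def total_input(c, u_hat, j):
    """s_j = sum_i c[i][j] * u_hat[i][j]."""
    dim = len(u_hat[0][j])
    return [sum(c[i][j] * u_hat[i][j][k] for i in range(len(u_hat))) for k in range(dim)]

# Two lower capsules (2-D outputs) feeding one upper capsule (3-D pose).
u = [[1.0, 0.0], [0.0, 2.0]]
W = [
    [[[1, 0], [0, 1], [1, 1]]],  # W[0][0]: 3x2 matrix
    [[[2, 0], [0, 1], [0, 0]]],  # W[1][0]: 3x2 matrix
]
c = [[0.5], [0.5]]
u_hat = prediction_vectors(W, u)
s_0 = total_input(c, u_hat, 0)
print(s_0)  # [0.5, 1.0, 0.5]
```

Because each `W[i][j]` may map to a different dimension than `u[i]`, lower and upper capsules do not need to share a pose size.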
Two Key Features of Capsule Networks
Squashing
\begin{align} v_j = \frac{||s_j||^2}{1+||s_j||^2}\frac{s_j}{||s_j||} \end{align}
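The squashing equation keeps the direction of s_j and maps its length into [0, 1). That lets one output vector carry both things a capsule reports: its length is the presence probability and its orientation is the pose. A minimal sketch follows. The small `eps` added to the norm is an assumption to avoid dividing by zero when s_j = 0, and `length` is a hypothetical helper.

```python
import math

def squash(s, eps=1e-9):
    """v = (|s|^2 / (1 + |s|^2)) * (s / |s|); eps guards against division by zero."""
    sq_norm = sum(x * x for x in s)
    scale = sq_norm / (1.0 + sq_norm) / (math.sqrt(sq_norm) + eps)
    return [scale * x for x in s]

def length(v):
    """Euclidean length, read as the probability that the entity is present."""
    return math.sqrt(sum(x * x for x in v))

long_v = squash([3.0, 4.0])   # |s| = 5, so |v| is approximately 25/26
short_v = squash([0.1, 0.0])  # |s| = 0.1, so |v| is approximately 0.01/1.01
print(length(long_v), length(short_v))
```

Long input vectors end up with length just under 1, and short ones shrink toward 0. In both cases the direction of s is unchanged.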
Routing By Agreement
\begin{align} c_{ij} = \frac{\exp(b_{ij})}{\sum_{k}\exp(b_{ik})} \end{align}
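The slide shows only the softmax that turns the logits b_ij into coupling coefficients c_ij. The sketch below places that softmax inside the iterative loop described in the original paper (Sabour, Frosst and Hinton, 2017), as I understand it. In that loop, b_ij starts at 0, and after each pass it is increased by the agreement \hat{u}_{j|i} \cdot v_j. The default of 3 iterations and the names `dynamic_routing`, `softmax`, and `dot` are assumptions for this plain-Python illustration, which favors readability over speed.

```python
import math

def softmax(row):
    """c_ij = exp(b_ij) / sum_k exp(b_ik), computed stably by subtracting the max."""
    m = max(row)
    exps = [math.exp(b - m) for b in row]
    total = sum(exps)
    return [e / total for e in exps]

def squash(s, eps=1e-9):
    """Squashing non-linearity; eps (an assumption) avoids division by zero."""
    sq = sum(x * x for x in s)
    scale = sq / (1.0 + sq) / (math.sqrt(sq) + eps)
    return [scale * x for x in s]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dynamic_routing(u_hat, num_iterations=3):
    """Routing by agreement over prediction vectors u_hat[i][j].

    Returns (v, c): upper-capsule outputs v[j] and the coupling
    coefficients c[i][j] used to compute them in the last iteration.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # logits start at 0: uniform coupling
    for _ in range(num_iterations):
        # Softmax over upper capsules j, separately for each lower capsule i.
        c = [softmax(b[i]) for i in range(n_in)]
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][k] for i in range(n_in)) for k in range(dim)]
            v.append(squash(s_j))
        # Agreement: raise b_ij when prediction u_hat[i][j] points along v[j].
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += dot(u_hat[i][j], v[j])
    return v, c

# Three lower capsules, two upper capsules, 2-D poses.
u_hat = [
    [[1.0, 0.0], [0.0, 1.0]],   # lower capsule 0
    [[1.0, 0.0], [0.0, -1.0]],  # lower capsule 1
    [[1.0, 0.0], [1.0, 0.0]],   # lower capsule 2
]
v, c = dynamic_routing(u_hat)
print(c[0], v)
```

In this example, all three lower capsules agree about upper capsule 0, while their predictions for upper capsule 1 conflict. Routing therefore shifts each lower capsule's coupling toward upper capsule 0, and v[0] ends up longer (more probable) than v[1].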