Dynamic Routing Between Capsules STAT946
Presented by
Yang, Tong (Richard)
Introduction
Hinton's Critiques of CNNs
Four arguments against pooling
- It is a bad fit to the psychology of shape perception: It does not explain why we assign intrinsic coordinate frames to objects and why they have such huge effects.
- It solves the wrong problem: We want equivariance, not invariance. Disentangling rather than discarding.
- It fails to use the underlying linear structure: It does not make use of the natural linear manifold that perfectly handles the largest source of variance in images (changes in viewpoint).
- Pooling is a poor way to do dynamic routing: We need to route each part of the input to the neurons that know how to deal with it. Finding the best routing is equivalent to parsing the image.
Mathematical Representations
Capsule
\begin{align} s_j = \sum_{i}c_{ij}\hat{u}_{j|i} \end{align}
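where the coupling coefficients $c_{ij}$ are determined by the routing process, $u_i$ is the output of capsule $i$ in the layer below, and $W_{ij}$ is a learned weight matrix that maps $u_i$ to the prediction vector $\hat{u}_{j|i}$ for capsule $j$: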
\begin{align} \hat{u}_{j|i} = W_{ij}u_i \end{align}
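The two equations above can be sketched with NumPy as follows. This is a minimal illustration, not the paper's implementation; the array layout (one weight matrix per pair of lower capsule `i` and upper capsule `j`, stored as `W[i, j]`) and the function names `predictions` and `total_input` are assumptions chosen for readability.

```python
import numpy as np

def predictions(W, u):
    """Prediction vectors u_hat[i, j] = W[i, j] @ u[i].

    W: (num_in, num_out, out_dim, in_dim), one matrix W_ij per (i, j) pair (assumed layout)
    u: (num_in, in_dim), outputs of the lower-level capsules
    returns: (num_in, num_out, out_dim)
    """
    return np.einsum("ijdk,ik->ijd", W, u)

def total_input(c, u_hat):
    """Total input s[j] = sum_i c[i, j] * u_hat[i, j].

    c: (num_in, num_out) coupling coefficients
    returns: (num_out, out_dim)
    """
    return np.einsum("ij,ijd->jd", c, u_hat)
```

For example, with identity weight matrices every prediction equals the lower capsule's output, and uniform coupling coefficients simply average the predictions across inputs for each upper capsule.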
Two Key Features of Capsule Networks
Squashing
\begin{align} v_j = \frac{||s_j||^2}{1+||s_j||^2}\frac{s_j}{||s_j||} \end{align}
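The squashing non-linearity keeps the direction of $s_j$ but maps its length into $[0, 1)$: short vectors shrink to almost zero length and long vectors to a length just below 1, so the length of $v_j$ can be read as the probability that the entity represented by capsule $j$ is present. A minimal NumPy sketch follows; the small `eps` term is an added assumption that avoids division by zero when $s_j = 0$.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """v = (|s|^2 / (1 + |s|^2)) * s / |s|, computed along `axis`."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)        # length factor in [0, 1)
    return scale * s / np.sqrt(sq_norm + eps)  # eps guards the zero vector
```

For instance, `squash(np.array([3.0, 4.0]))` keeps the direction `[0.6, 0.8]` and has length $25/26 \approx 0.96$, while a vector of length 0.1 is squashed to a length of about 0.0099.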
Routing by Agreement
\begin{align} c_{ij} = \frac{\exp(b_{ij})}{\sum_{k}\exp(b_{ik})} \end{align}
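The softmax above turns the logits $b_{ij}$ into coupling coefficients that sum to 1 over the upper-level capsules $j$. In the original paper the logits start at zero and are refined iteratively: each iteration computes $c_{ij}$, $s_j$ and $v_j$, then increases $b_{ij}$ by the agreement $\hat{u}_{j|i} \cdot v_j$. The NumPy sketch below follows that procedure under assumed array shapes; `squash` is redefined so the block is self-contained, the function names are illustrative, and the default of three iterations matches what the paper reports using.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis):
    z = x - np.max(x, axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def dynamic_routing(u_hat, num_iters=3):
    """Routing by agreement.

    u_hat: (num_in, num_out, dim) prediction vectors u_hat_{j|i}
    returns: v (num_out, dim) output capsules, c (num_in, num_out) final couplings
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))               # logits b_ij start at zero
    for _ in range(num_iters):
        c = softmax(b, axis=1)                    # c_ij: softmax over parents j
        s = np.einsum("ij,ijd->jd", c, u_hat)     # s_j = sum_i c_ij u_hat_{j|i}
        v = squash(s)                             # v_j = squash(s_j)
        b = b + np.einsum("ijd,jd->ij", u_hat, v) # agreement u_hat_{j|i} . v_j
    return v, c
```

When the lower capsules' predictions for one parent point the same way while their predictions for another parent conflict, the routing shifts coupling toward the parent they agree on, and that parent's output vector ends up longer.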