Dynamic Routing Between Capsules STAT946

Revision as of 21:28, 1 April 2018

Presented by

Yang, Tong (Richard)

Introduction

Hinton's Critiques of CNNs

Four arguments against pooling

  • It is a bad fit to the psychology of shape perception: It does not explain why we assign intrinsic coordinate frames to objects and why they have such huge effects.
  • It solves the wrong problem: We want equivariance, not invariance. The goal is to disentangle information rather than discard it.
  • It fails to use the underlying linear structure: It does not make use of the natural linear manifold that perfectly handles the largest source of variance in images.
  • Pooling is a poor way to do dynamic routing: We need to route each part of the input to the neurons that know how to deal with it. Finding the best routing is equivalent to parsing the image.

Mathematical Representations

Capsule

\begin{align} s_j = \sum_{i}c_{ij}\hat{u}_{j|i} \end{align}

where

\begin{align} \hat{u}_{j|i} = W_{ij}u_i \end{align}
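
The two equations above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the helper names `mat_vec` and `capsule_input`, the nested-list layout, and the toy numbers are all assumptions made for the example. The coupling coefficients c_ij are taken as given here.

```python
def mat_vec(W, u):
    """Multiply matrix W (a list of rows) by vector u."""
    return [sum(w * x for w, x in zip(row, u)) for row in W]


def capsule_input(W_j, u, c_j):
    """Total input s_j = sum_i c_ij * u_hat_{j|i} for one parent capsule j.

    W_j[i] is the weight matrix W_ij, u[i] is the output of child
    capsule i, and c_j[i] is the coupling coefficient c_ij.
    """
    # Prediction vectors: u_hat_{j|i} = W_ij u_i
    u_hat = [mat_vec(W_ij, u_i) for W_ij, u_i in zip(W_j, u)]
    dim = len(u_hat[0])
    # Weighted sum of the prediction vectors, one component at a time
    return [sum(c * uh[d] for c, uh in zip(c_j, u_hat)) for d in range(dim)]


# Toy example (hypothetical values): two 2-D child capsules, one 2-D parent.
u = [[1.0, 0.0], [0.0, 2.0]]
W_j = [
    [[1.0, 0.0], [0.0, 1.0]],  # W_0j: identity
    [[0.0, 1.0], [1.0, 0.0]],  # W_1j: swaps the two components
]
c_j = [0.5, 0.5]
s_j = capsule_input(W_j, u, c_j)
```

In this toy case the two prediction vectors are [1, 0] and [2, 0], so the parent's total input is their equally weighted average.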

Two Key Features of Capsule Network

Squashing

\begin{align} v_j = \frac{||s_j||^2}{1+||s_j||^2}\frac{s_j}{||s_j||} \end{align}
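
A minimal sketch of the squashing function above, in plain Python. The small constant `eps` is an assumption added here to avoid dividing by zero for an all-zero input; it is not part of the formula. The helper `length` is also hypothetical, included only to inspect the result.

```python
import math


def squash(s, eps=1e-9):
    """Scale vector s so its length lies in [0, 1) and its direction is kept."""
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    # ||s||^2 / (1 + ||s||^2) sets the new length; dividing by ||s||
    # normalises the direction. eps (an assumption) guards the zero vector.
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)
    return [scale * x for x in s]


def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


short_vec = squash([0.1, 0.0])    # short input: shrunk to nearly zero length
long_vec = squash([30.0, 40.0])   # long input: length just below 1
```

Short vectors are pushed toward zero length and long vectors toward unit length, so the length of the output can be read as a value between 0 and 1.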

Routing By Agreement

\begin{align} c_{ij} = \frac{\exp(b_{ij})}{\sum_{k}\exp(b_{ik})} \end{align}
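
The softmax above can be placed inside a small routing loop, sketched below in plain Python. Two things here are not stated on this page and should be read as assumptions: the logit update b_ij ← b_ij + û_{j|i} · v_j, which follows the routing procedure in the original paper, and the default of three iterations. The function names, the `eps` guard, and the toy prediction vectors are hypothetical.

```python
import math


def softmax(b_i):
    """Coupling coefficients c_ij for one child i, from its logits b_ij."""
    m = max(b_i)  # subtract the max for numerical stability
    exps = [math.exp(b - m) for b in b_i]
    total = sum(exps)
    return [e / total for e in exps]


def squash(s, eps=1e-9):
    """Squashing nonlinearity; eps (an assumption) guards the zero vector."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / (math.sqrt(norm_sq) + eps)
    return [scale * x for x in s]


def route(u_hat, num_iters=3):
    """Routing by agreement. u_hat[i][j] is child i's prediction for parent j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits start at zero
    for _ in range(num_iters):
        c = [softmax(b_i) for b_i in b]
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Agreement step (assumed update): raise b_ij when the prediction
        # u_hat[i][j] points the same way as the parent output v[j].
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(a * z for a, z in zip(u_hat[i][j], v[j]))
    return c, v


# Toy example: all three children agree on parent 0, but disagree on parent 1.
u_hat = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
c, v = route(u_hat)
```

In this toy setup the children's predictions for parent 0 coincide, so after a few iterations each child routes most of its output to parent 0 and that parent's output vector is the longer of the two.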

Empirical Results

MNIST

MultiMNIST