Dynamic Routing Between Capsules STAT946

From statwiki
Revision as of 22:48, 1 April 2018

Presented by

Yang, Tong (Richard)

Introduction

Hinton's Critiques of CNNs

Four arguments against pooling

  • It is a bad fit to the psychology of shape perception: It does not explain why we assign intrinsic coordinate frames to objects and why they have such huge effects.
  • It solves the wrong problem: We want equivariance, not invariance. Disentangling rather than discarding.
  • It fails to use the underlying linear structure: It does not make use of the natural linear manifold that perfectly handles the largest source of variance in images.
  • Pooling is a poor way to do dynamic routing: We need to route each part of the input to the neurons that know how to deal with it. Finding the best routing is equivalent to parsing the image.

Equivariance

  • Without the sub-sampling, convolutional neural nets give "place-coded" equivariance for discrete translations.
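The difference between equivariance and invariance can be illustrated with a small sketch. It assumes a 1-D signal, a hypothetical three-tap kernel, and wrap-around (circular) boundaries so that the shift property holds exactly; none of these choices come from the presentation itself. Shifting the input shifts the feature map by the same amount (place-coded equivariance), whereas a global max over the feature map discards the position entirely (invariance).

```python
def circular_correlate(signal, kernel):
    """Cross-correlate a 1-D signal with a kernel, wrapping around at the edges."""
    n = len(signal)
    return [
        sum(kernel[k] * signal[(i + k) % n] for k in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, steps):
    """Cyclically shift a 1-D signal to the right by `steps` positions."""
    n = len(signal)
    return [signal[(i - steps) % n] for i in range(n)]


# Illustrative values only (assumptions, not data from the source).
signal = [0, 0, 1, 2, 1, 0, 0, 0]
kernel = [1, 0, -1]

features = circular_correlate(signal, kernel)
shifted_features = circular_correlate(shift(signal, 3), kernel)

# Equivariance: the feature map moves with the input.
equivariant = shifted_features == shift(features, 3)

# Invariance: global max pooling gives the same value wherever the pattern is,
# so the information about position is thrown away.
invariant_after_pooling = max(shifted_features) == max(features)
```

Here `equivariant` and `invariant_after_pooling` are both true, while `shifted_features` itself differs from `features`: the unpooled representation still encodes where the pattern is.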

Two types of equivariance

Place-coded equivariance

If a low-level part moves to a very different position it will be represented by a different capsule.

Rate-coded equivariance

If a part only moves a small distance it will be represented by the same capsule but the pose outputs of the capsule will change.

Higher-level capsules have bigger domains so low-level place-coded equivariance gets converted into high-level rate-coded equivariance.

Mathematical Representations

Capsule

\begin{align} s_j = \sum_{i}c_{ij}\hat{u}_{j|i} \end{align}

where

\begin{align} \hat{u}_{j|i} = W_{ij}u_i \end{align}
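As a minimal sketch of the two equations above, the following pure-Python code computes the prediction vectors \hat{u}_{j|i} and the weighted sums s_j. Here u_i is read as the output of lower-level capsule i, W_{ij} as a weight matrix, and c_{ij} as a coupling coefficient. The nested-list layout, the function names, and all numeric values are assumptions made for illustration.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def prediction_vectors(W, u):
    """Return u_hat with u_hat[i][j] = W[i][j] @ u[i]."""
    return [[mat_vec(W[i][j], u[i]) for j in range(len(W[i]))]
            for i in range(len(u))]


def total_input(c, u_hat, j):
    """Return s_j = sum_i c[i][j] * u_hat[i][j]."""
    dim = len(u_hat[0][j])
    return [sum(c[i][j] * u_hat[i][j][d] for i in range(len(u_hat)))
            for d in range(dim)]


# Two lower-level capsules, two higher-level capsules, 2-D vectors
# (illustrative values only).
u = [[1.0, 0.0], [0.0, 2.0]]
W = [
    [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]],  # W[0][0], W[0][1]
    [[[2.0, 0.0], [0.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]]],  # W[1][0], W[1][1]
]
c = [[0.5, 0.5], [0.5, 0.5]]  # uniform coupling coefficients

u_hat = prediction_vectors(W, u)
s = [total_input(c, u_hat, j) for j in range(2)]
# s[0] is [0.5, 2.0] and s[1] is [1.0, 1.5]
```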

Two Key Features of Capsule Network

Squashing

\begin{align} v_j = \frac{||s_j||^2}{1+||s_j||^2}\frac{s_j}{||s_j||} \end{align}
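The squashing function keeps the direction of s_j and maps its length into the interval [0, 1): short vectors shrink towards zero and long vectors approach unit length. A minimal sketch follows; the small `eps` term is an assumption added here to avoid division by zero and is not part of the formula above.

```python
import math


def squash(s, eps=1e-9):
    """Scale vector s to length ||s||^2 / (1 + ||s||^2), keeping its direction.

    eps is an illustrative safeguard against division by zero (an assumption,
    not part of the formula).
    """
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)
    return [scale * x for x in s]


def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


long_vector = squash([3.0, 4.0])    # ||s|| = 5, so the length is about 25/26
short_vector = squash([0.1, 0.0])   # ||s|| = 0.1, so the length is about 0.01/1.01
zero_vector = squash([0.0, 0.0])    # stays at zero
```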

Routing By Agreement

\begin{align} c_{ij} = \frac{\exp(b_{ij})}{\sum_{k}\exp(b_{ik})} \end{align}
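The softmax above is taken over the higher-level capsules k for each lower-level capsule i, so the coupling coefficients of one lower-level capsule sum to 1. The sketch below places it inside an iterative routing loop. The logit update b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j and the default of three iterations follow the routing procedure in the original paper and are not stated on this page; the function names, the `eps` safeguard, and the numeric values are assumptions for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(b - m) for b in logits]
    total = sum(exps)
    return [e / total for e in exps]


def squash(s, eps=1e-9):
    """Squashing non-linearity; eps is an illustrative guard against 0/0."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / (math.sqrt(norm_sq) + eps)
    return [scale * x for x in s]


def route(u_hat, num_iterations=3):
    """Routing by agreement over prediction vectors u_hat[i][j].

    Returns the coupling coefficients used in the final iteration and the
    output vectors v[j] of the higher-level capsules.
    """
    num_in, num_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * num_out for _ in range(num_in)]  # routing logits start at 0
    for _ in range(num_iterations):
        c = [softmax(row) for row in b]  # c[i][j], softmax over j
        v = []
        for j in range(num_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(num_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Agreement: raise b[i][j] when prediction u_hat[i][j] aligns with v[j].
        for i in range(num_in):
            for j in range(num_out):
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return c, v


# Both lower-level capsules agree on output capsule 0 and disagree on 1
# (illustrative values only).
u_hat = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
]
c, v = route(u_hat)
```

In this example the predictions for capsule 0 agree, so routing shifts coupling towards it, while the opposing predictions for capsule 1 cancel and its output stays at zero length.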

Empirical Results

MNIST

MultiMNIST