Dynamic Routing Between Capsules STAT946

From statwiki

What is wrong with "standard" neural nets?

    • They have too few levels of structure: neurons, layers, and whole nets.
    • We need to group the neurons in each layer into "capsules" that do a lot of internal computation and then output a compact result. A capsule is inspired by a mini-column.

What does a capsule represent?

    • Each capsule represents the presence and the instantiation parameters of a multi-dimensional entity of the type that the capsule detects.
    • In the visual pathway, for example, a capsule detects a particular type of object or object part.
    • A capsule outputs two things: (1) the probability that an object of that type is present, and (2) the generalized pose of the object, which includes position, orientation, scale, deformation, velocity, color, etc.

Capsules do coincidence filtering

    • A typical capsule receives multi-dimensional prediction vectors from capsules in the layer below and looks for a tight cluster of predictions.
    • If it finds a tight cluster, it outputs (1) a high probability that an entity of its type exists in its domain, and (2) the center of gravity of the cluster, which is the generalized pose of that entity.
    • This is very good at filtering out noise, because high-dimensional coincidences do not happen by chance. In this respect capsules are much better than ordinary "neurons".
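The coincidence-filtering idea can be illustrated with a toy score. This is a hypothetical illustration, not the routing procedure used in the paper. It takes the mean of the prediction vectors as the cluster's center of gravity. It then turns their mean squared distance from that center into a score in (0, 1], using an assumed Gaussian width sigma. The function name cluster_summary and the value sigma = 0.1 are illustrative choices.

```python
import math

def cluster_summary(predictions, sigma=0.1):
    """Toy coincidence filter (hypothetical; not the paper's algorithm).

    Returns (score, center): center is the mean of the prediction vectors,
    and score is close to 1 only when all predictions lie near that center.
    """
    n = len(predictions)
    dim = len(predictions[0])
    # Center of gravity of the predictions (the "pose" of the cluster).
    center = [sum(p[d] for p in predictions) / n for d in range(dim)]
    # Mean squared distance of the predictions from the center.
    spread = sum(
        sum((p[d] - center[d]) ** 2 for d in range(dim)) for p in predictions
    ) / n
    # Gaussian-style score: tight clusters score near 1, scattered ones near 0.
    score = math.exp(-spread / (2 * sigma ** 2))
    return score, center

# Three 3-D predictions that nearly coincide, and three that do not.
tight = [[1.0, 2.0, 0.5], [1.02, 1.98, 0.51], [0.99, 2.01, 0.49]]
scattered = [[1.0, 2.0, 0.5], [-0.7, 0.3, 1.9], [2.4, -1.1, 0.0]]

print(cluster_summary(tight))      # high score, center near (1.0, 2.0, 0.5)
print(cluster_summary(scattered))  # score close to zero
```

In the capsule network itself, agreement is measured inside the routing-by-agreement loop rather than with a fixed width like sigma.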



Revision as of 22:28, 1 April 2018

Presented by

Yang, Tong (Richard)

Introduction

Hinton's Critiques on CNN

Four arguments against pooling

    • It is a bad fit to the psychology of shape perception: It does not explain why we assign intrinsic coordinate frames to objects and why they have such huge effects.
    • It solves the wrong problem: We want equivariance, not invariance. Disentangling rather than discarding.
    • It fails to use the underlying linear structure: It does not make use of the natural linear manifold that perfectly handles the largest source of variance in images (changes in viewpoint).
    • Pooling is a poor way to do dynamic routing: We need to route each part of the input to the neurons that know how to deal with it. Finding the best routing is equivalent to parsing the image.
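The invariance-versus-equivariance point can be made concrete with a toy 1-D example. It assumes non-overlapping max pooling with a window of 4. It uses the position of the strongest activation as a crude stand-in for pose. The helpers max_pool and argmax are hypothetical names used only for illustration.

```python
def max_pool(signal, window):
    """Non-overlapping 1-D max pooling: keeps only the largest value per window."""
    return [max(signal[i:i + window]) for i in range(0, len(signal), window)]

def argmax(signal):
    """Index of the largest activation, a crude stand-in for a 'pose' (position)."""
    return max(range(len(signal)), key=lambda i: signal[i])

# The same feature detected at two different positions inside one pooling window.
a = [0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
b = [0.0, 0.0, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0]

# Invariance: pooling maps both inputs to the same output, discarding the position.
print(max_pool(a, 4), max_pool(b, 4))  # [0.9, 0.0] [0.9, 0.0]
# Equivariance: a pose-like output changes together with the input.
print(argmax(a), argmax(b))            # 1 3
```

Pooling gives identical outputs for both inputs, so where the feature sat is lost. The pose-like quantity moves with the feature, which is the kind of information capsules aim to keep.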

Mathematical Representations

Capsule

\begin{align} s_j = \sum_{i}c_{ij}\hat{u}_{j|i} \end{align}

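where s_j is the total input to capsule j, the c_{ij} are coupling coefficients determined by the routing process, and the prediction vector \hat{u}_{j|i} is obtained by multiplying the output u_i of capsule i in the layer below by a weight matrix W_{ij}: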

\begin{align} \hat{u}_{j|i} = W_{ij}u_i \end{align}
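Here is a minimal pure-Python sketch of the two equations above for a single higher-level capsule j. The function names, the list-of-lists matrix layout and all numbers are illustrative assumptions. The coupling coefficients are fixed by hand here, whereas in the network they come from routing by agreement.

```python
def matvec(W, u):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, u)) for row in W]

def total_input(c_j, W_j, u):
    """s_j = sum_i c_ij * u_hat_{j|i}, with u_hat_{j|i} = W_ij u_i.

    c_j[i]: coupling coefficient c_ij
    W_j[i]: weight matrix W_ij (rows = output dim, columns = input dim)
    u[i]:   output vector u_i of lower-level capsule i
    """
    predictions = [matvec(W, u_i) for W, u_i in zip(W_j, u)]  # u_hat_{j|i}
    dim = len(predictions[0])
    return [sum(c * p[d] for c, p in zip(c_j, predictions)) for d in range(dim)]

# Two lower-level capsules with 2-D outputs feeding one capsule with a 3-D pose.
u = [[1.0, 0.0], [0.0, 1.0]]
W_j = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # W_1j
    [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]],  # W_2j
]
c_j = [0.5, 0.5]
print(total_input(c_j, W_j, u))  # [0.5, 1.0, 0.5]
```

The weight matrices map each lower capsule's own pose into a prediction of the higher capsule's pose. s_j is then simply a weighted vote over those predictions.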

Two Key Features of Capsule Networks

Squashing

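\begin{align} v_j = \frac{||s_j||^2}{1+||s_j||^2}\frac{s_j}{||s_j||} \end{align}

where v_j is the vector output of capsule j and s_j is its total input. Short vectors are shrunk to almost zero length and long vectors to a length slightly below 1, so the length of v_j can be interpreted as the probability that the entity represented by capsule j is present.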
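Below is a pure-Python sketch of the squashing function, with one short and one long test vector. The small constant eps is an assumption added here to avoid dividing by zero when s_j = 0; it is not part of the formula.

```python
import math

def squash(s, eps=1e-9):
    """v = (|s|^2 / (1 + |s|^2)) * (s / |s|); eps guards against |s| = 0."""
    sq_norm = sum(x * x for x in s)
    norm = math.sqrt(sq_norm)
    scale = sq_norm / (1.0 + sq_norm) / (norm + eps)
    return [scale * x for x in s]

def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

print(round(length(squash([0.1, 0.0])), 4))  # short input -> 0.0099
print(round(length(squash([3.0, 4.0])), 4))  # |s| = 5 -> 25/26, printed as 0.9615
```

Only the length changes. The direction of v_j is the same as that of s_j, so the pose information carried by the orientation is preserved.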

Routing By Agreement

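\begin{align} c_{ij} = \frac{\exp(b_{ij})}{\sum_{k}\exp(b_{ik})} \end{align}

where the logits b_{ij} are the log prior probabilities that capsule i should be coupled to capsule j. In the routing procedure they start at zero. After each routing iteration, b_{ij} is increased by the agreement \hat{u}_{j|i} \cdot v_j, so lower-level capsules send more of their output to higher-level capsules whose outputs agree with their predictions.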
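Combining the capsule, squashing and coupling equations gives an iterative routing-by-agreement loop. The following is a minimal sketch of that loop. The list-based data layout, the function names and the default of 3 iterations are assumptions made for illustration; it is not a reference implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def squash(s, eps=1e-9):
    """v = (|s|^2 / (1 + |s|^2)) * s / |s|; eps guards against |s| = 0."""
    sq_norm = sum(x * x for x in s)
    scale = sq_norm / (1.0 + sq_norm) / (math.sqrt(sq_norm) + eps)
    return [scale * x for x in s]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def route(u_hat, num_iters=3):
    """Routing by agreement from one capsule layer to the next.

    u_hat[i][j] is the prediction vector u_hat_{j|i} from lower capsule i
    for upper capsule j. Returns (v, c): the upper-capsule outputs v_j and
    the coupling coefficients c_ij used in the final iteration.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # logits b_ij start at zero
    for _ in range(num_iters):
        c = [softmax(b_i) for b_i in b]  # c_ij: softmax over j, for each i
        s = [[sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
             for j in range(n_out)]      # s_j = sum_i c_ij * u_hat_{j|i}
        v = [squash(s_j) for s_j in s]   # v_j = squash(s_j)
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += dot(u_hat[i][j], v[j])  # agreement raises b_ij
    return v, c

# Three lower capsules: they all agree on upper capsule 0, while their
# predictions for upper capsule 1 point in different directions and cancel.
u_hat = [
    [[1.0, 1.0], [1.0, -1.0]],
    [[1.0, 1.0], [-1.0, 1.0]],
    [[1.0, 1.0], [0.0, 0.0]],
]
v, c = route(u_hat, num_iters=3)
print([round(math.sqrt(dot(v_j, v_j)), 3) for v_j in v])  # v_0 long, v_1 zero length
print([round(x, 3) for x in c[0]])  # capsule 0's couplings shift toward upper capsule 0
```

With each iteration the coupling coefficients move toward the upper capsule whose output agrees with the predictions. This is the dynamic routing that replaces pooling.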

Empirical Results

MNIST

MultiMNIST