# From Machine Learning to Machine Reasoning

## Revision as of 20:57, 5 November 2015

## Introduction

Learning and reasoning are both essential abilities associated with intelligence, and both machine learning and machine reasoning have received considerable attention over the short history of computer science. The statistical nature of machine learning is now well understood, but the ideas behind machine reasoning are much more elusive. Converting ordinary data into a set of logical rules proves to be very challenging: searching the discrete space of symbolic formulas leads to combinatorial explosion (cite). Algorithms for probabilistic inference (cite) still suffer from unfavourable computational properties (cite). Tractable algorithms for logical and probabilistic inference do exist, but they come at the price of reduced expressive capabilities.

Humans display neither of these limitations.

The ability to reason is not the same as the ability to make logical inferences. The way humans reason suggests the existence of a middle layer that is already a form of reasoning but not yet formal or logical. Such informal reasoning is attractive because we hope to avoid the computational complexity associated with combinatorial searches in the vast space of discrete logic propositions.

Deep learning and multi-task learning show that auxiliary tasks can be leveraged to help solve a task of interest. This idea can be interpreted as a rudimentary form of reasoning.

## Auxiliary Tasks

To see why an auxiliary task can be relevant, consider the task of identifying a person from face images. It remains expensive to collect and label millions of images representing the face of each subject with a good variety of positions and contexts. However, it is easier to collect training data for the slightly different task of telling whether two faces in images represent the same person or not (cite): two faces in the same picture are likely to belong to two different people; two faces in successive video frames are likely to belong to the same person. These two tasks have much in common: image analysis primitives, feature extractors, and part recognizers trained on the auxiliary task can help solve the original task.

The figure below illustrates a transfer learning strategy involving three trainable models. The preprocessor P computes a compact face representation from the image, the comparator D compares the representations of two images, and the classifier C labels the face. We first assemble two instances of the preprocessor P and one comparator D and train this model with abundant labels for the auxiliary task. Then we assemble another instance of P with the classifier C and train the resulting model using a limited number of labelled examples from the original task.
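The assembly of the three models can be sketched in a few lines of Python. This is only an illustration of the wiring, not the method used in the cited work: the class names, the layer sizes, the linear-plus-tanh preprocessor, the distance-based comparator, and the choice to share one set of weights between all instances of P are assumptions made for this example, and the training loops are omitted.

```python
import math
import random


class Preprocessor:
    """P: maps a raw image vector to a compact representation (hypothetical linear map + tanh)."""

    def __init__(self, n_in, n_out, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(n_out)]

    def __call__(self, x):
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.w]


class Comparator:
    """D: scores how likely two representations show the same person (1.0 = identical)."""

    def __call__(self, r1, r2):
        dist2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
        return math.exp(-dist2)


class Classifier:
    """C: maps a representation to a person label (index of the highest linear score)."""

    def __init__(self, n_in, n_classes, seed=1):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(n_classes)]

    def __call__(self, r):
        scores = [sum(w * v for w, v in zip(row, r)) for row in self.w]
        return max(range(len(scores)), key=scores.__getitem__)


def assemble_auxiliary(p, d):
    """Auxiliary task model: two uses of P feeding one comparator D."""
    return lambda x1, x2: d(p(x1), p(x2))


def assemble_original(p, c):
    """Original task model: the same P feeding the classifier C."""
    return lambda x: c(p(x))


# The same Preprocessor object appears in both assembled models, so anything
# learned on the auxiliary task would carry over to the original task.
p = Preprocessor(n_in=8, n_out=4)
same_person = assemble_auxiliary(p, Comparator())
identify = assemble_original(p, Classifier(n_in=4, n_classes=3))
```

In a real system one would first fit `same_person` on the abundant auxiliary labels, then fit `identify` on the few labelled examples of the original task, reusing the weights already stored in `p`.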

## Reasoning Revisited

Little attention has been paid to the rules that describe how to assemble trainable models that perform specific tasks. However, these composition rules play an extremely important role, as they describe algebraic manipulations that let us combine previously acquired knowledge in order to create a model that addresses a new task.

We now draw a bold parallel: "algebraic manipulation of previously acquired knowledge in order to answer a new question" is a plausible definition of the word "reasoning".

Composition rules can be described with very different levels of sophistication. For instance, graph transformer networks (depicted in the figure below) (cite) construct specific recognition and training models for each input image using graph transduction algorithms. The specification of the graph transducers should then be viewed as a description of the composition rules.

## Probabilistic Models

Graphical models describe the factorization of joint probability distributions into elementary conditional distributions with specific independence assumptions. The probabilistic rules then induce an algebraic structure on the space of conditional probability distributions, describing relations in an arbitrary set of random variables.
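As a small illustration of such a factorization, the sketch below builds a joint distribution over three binary variables arranged in a chain A → B → C, so that P(A, B, C) = P(A) P(B | A) P(C | B). The chain structure and all probability values are hypothetical and chosen only for this example. The usual probabilistic rules (product, marginalization, conditioning) then act as the algebraic operations that derive new distributions from the elementary ones.

```python
from itertools import product

# Hypothetical elementary distributions for the chain A -> B -> C (binary variables).
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # indexed as [a][b]
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # indexed as [b][c]


def joint(a, b, c):
    """Product rule: the joint is the product of the elementary conditionals."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


def marginal_c(c):
    """Marginalization: sum the joint over the variables we do not care about."""
    return sum(joint(a, b, c) for a, b in product((0, 1), repeat=2))


def cond_c_given_ab(c, a, b):
    """Conditioning: P(C = c | A = a, B = b) computed from the joint."""
    return joint(a, b, c) / sum(joint(a, b, cc) for cc in (0, 1))
```

Because of the independence assumption built into the chain, `cond_c_given_ab(c, a, b)` does not depend on `a`: once B is known, A carries no further information about C.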