Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness

From statwiki

Revision as of 02:17, 20 November 2018

Presented by

  • Hudson Ash
  • Stephen Kingston
  • Richard Zhang
  • Alexandre Xiao
  • Ziqiu Zhu

Optical Flow

Optical flow is the apparent motion of brightness patterns across objects, surfaces and edges in a video. In layman's terms, it tracks the change in position of pixels between two frames caused by the movement of the object or the camera, and it does this on the basis of two assumptions:

1. Pixel intensities do not change rapidly between frames (brightness constancy).

2. Groups of pixels move together (motion smoothness).

Both of these assumptions are grounded in the real world. First, the time between two consecutive frames of a video is so minuscule that it is extremely improbable for the intensity of a pixel to change completely, even if its location has changed. Second, pixels do not teleport: the assumption that groups of pixels move together implies that there is spatial coherence and that the image motion of objects changes gradually over time, creating motion smoothness.

Given these assumptions, imagine a frame (a 2D image) with a pixel at position [math](x,y)[/math] at some time [math]t[/math]; in a later frame, at time [math]t + \Delta t[/math], the pixel is at position [math](x + \Delta x, y + \Delta y)[/math].

Then by the first assumption, the intensity of the pixel at time [math]t[/math] is the same as the intensity of the pixel at time [math]t + \Delta t[/math]:

[math]I(x+\Delta x,y+\Delta y,t+\Delta t) = I(x,y,t)[/math]

Using Taylor series, we get:

[math]I(x+\Delta x,y+\Delta y,t+\Delta t) = I(x,y,t) + \frac{\partial I}{\partial x}\Delta x+\frac{\partial I}{\partial y}\Delta y+\frac{\partial I}{\partial t}\Delta t+\text{higher-order terms}[/math]

Neglecting the higher-order terms, it follows from these two equations that:

[math]\frac{\partial I}{\partial x}\Delta x+\frac{\partial I}{\partial y}\Delta y+\frac{\partial I}{\partial t}\Delta t = 0[/math]

which, after dividing through by [math]\Delta t[/math], results in

[math]\frac{\partial I}{\partial x}V_x+\frac{\partial I}{\partial y}V_y+\frac{\partial I}{\partial t} = 0[/math]
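This constraint can be checked numerically. The sketch below (an illustration, not part of the paper: the image pattern, velocities, and grid sizes are all made up for the demo) translates a smooth synthetic brightness pattern by a known flow, estimates [math]I_x[/math], [math]I_y[/math] and [math]I_t[/math] with finite differences, and confirms that the residual [math]I_x V_x + I_y V_y + I_t[/math] is close to zero away from the wrap-around borders:

```python
import numpy as np

h, w = 64, 64
y, x = np.mgrid[0:h, 0:w].astype(float)
vx, vy = 0.5, -0.3  # true optical flow in pixels per frame (chosen for the demo)

def frame(t):
    # Smooth brightness pattern moving rigidly with velocity (vx, vy).
    return np.sin(0.2 * (x - vx * t)) + np.cos(0.15 * (y - vy * t))

I0, I1 = frame(0), frame(1)

# Central spatial differences on the first frame, forward temporal difference.
Ix = (np.roll(I0, -1, axis=1) - np.roll(I0, 1, axis=1)) / 2.0
Iy = (np.roll(I0, -1, axis=0) - np.roll(I0, 1, axis=0)) / 2.0
It = I1 - I0

# The flow-constraint residual; np.roll wraps at the borders, so only
# interior pixels are meaningful.
residual = Ix * vx + Iy * vy + It
print(np.abs(residual[2:-2, 2:-2]).max())  # small; limited by discretization error
```

The residual is not exactly zero because the derivatives are approximated by finite differences and the Taylor expansion drops higher-order terms.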

where [math]V_x,V_y[/math] are the [math]x[/math] and [math]y[/math] components of the velocity, or optical flow, of [math]I(x,y,t)[/math], and [math]\tfrac{\partial I}{\partial x}[/math], [math]\tfrac{\partial I}{\partial y}[/math] and [math]\tfrac{\partial I}{\partial t}[/math] are the derivatives of the image at [math](x,y,t)[/math] in the corresponding directions. In what follows, these derivatives are abbreviated [math]I_x[/math], [math]I_y[/math] and [math]I_t[/math].

This is a single equation in two unknowns, [math]V_x[/math] and [math]V_y[/math], and cannot be solved as such. This is known as the aperture problem of optical flow algorithms. To find the optical flow, another set of equations is needed, given by some additional constraint; all optical flow methods introduce additional conditions for estimating the actual flow.
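One classical choice of additional constraint (the Lucas–Kanade assumption, shown here only as a baseline illustration, not as the method of the summarized paper) is that the flow is constant over a small window. Stacking the constraint [math]I_x V_x + I_y V_y = -I_t[/math] for every pixel in the window gives an over-determined linear system that can be solved by least squares. All values below (pattern, velocities, window location) are synthetic choices for the demo:

```python
import numpy as np

h, w = 40, 40
y, x = np.mgrid[0:h, 0:w].astype(float)
vx_true, vy_true = 0.4, -0.2  # ground-truth flow for the synthetic sequence

def frame(t):
    # Smooth textured pattern translating with velocity (vx_true, vy_true).
    return (np.sin(0.3 * (x - vx_true * t) + 0.2 * (y - vy_true * t))
            + np.cos(0.17 * (y - vy_true * t)))

I0, I1 = frame(0), frame(1)
Ix = (np.roll(I0, -1, axis=1) - np.roll(I0, 1, axis=1)) / 2.0
Iy = (np.roll(I0, -1, axis=0) - np.roll(I0, 1, axis=0)) / 2.0
It = I1 - I0

# One interior window; each pixel contributes one row of A v = b.
win = (slice(15, 25), slice(15, 25))
A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)
b = -It[win].ravel()
v, *_ = np.linalg.lstsq(A, b, rcond=None)
print(v)  # approximately (vx_true, vy_true)
```

The window must contain gradients in more than one direction for [math]A[/math] to have full rank; on a uniform or purely striped patch the system stays degenerate, which is exactly the aperture problem again.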

Problem & Motivation

The current mainstream approach to solving optical flow problems, albeit widely successful, relies on supervised learning with convolutional neural networks (convnets). The inherent challenge with these supervised approaches lies in the ground-truth flow: gathering provable measurements of the target variable for the training and testing datasets. Directly obtaining the motion-field ground truth is not possible, so segmentation ground-truthing is generally used instead. Since segmentation ground-truthing isn't always automated, it requires laborious labelling of items in the video, sometimes done manually with ground-truth labelling software; as the training and test datasets grow larger, the ground-truthing becomes ever more laborious.

In the case of the KITTI dataset, a collection of images captured while driving cars around a mid-sized city in Germany, accurate segmentation ground truth for the training and testing data is obtained using high-end laser scanners, as well as a GPS localization device installed on top of the cars.

Traditional Methods (Horn and Schunck)

-Under Construction-