# Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness

## Presented by

• Hudson Ash
• Stephen Kingston
• Richard Zhang
• Alexandre Xiao
• Ziqiu Zhu

## Optical Flow

Optical flow is the apparent motion of brightness patterns of objects, surfaces and edges in a video. In layman's terms, it tracks how pixels change position between two frames because the objects or the camera moved. Most optical flow methods rest on two assumptions:

1. Pixel intensities do not change rapidly between frames (brightness constancy).

2. Groups of pixels move together (motion smoothness).

Both assumptions are grounded in how real scenes behave. First, the time between two consecutive video frames is so short that the intensity of a pixel is very unlikely to change drastically, even if its location has changed. Second, pixels do not teleport. The assumption that groups of pixels move together implies spatial coherence, and that the image motion of objects changes gradually over time, which gives motion smoothness.
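The two assumptions can be turned into penalties on a candidate flow field. The sketch below is a minimal illustration, assuming grayscale NumPy frames and a per-pixel displacement $(u, v)$ (the $(\Delta x, \Delta y)$ of the derivation below); the function names are hypothetical, and this is not the loss of any specific method. The first function measures how badly brightness constancy is violated when `frame2` is sampled at the displaced positions, and the second measures how much neighboring flow values disagree.

```python
import numpy as np

def brightness_constancy_error(frame1, frame2, u, v):
    """Mean |I2(x + u, y + v) - I1(x, y)| over all pixels.

    Uses nearest-neighbor sampling and clamps displaced positions to the image border.
    """
    h, w = frame1.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x2 = np.clip(np.rint(xs + u).astype(int), 0, w - 1)
    y2 = np.clip(np.rint(ys + v).astype(int), 0, h - 1)
    return float(np.mean(np.abs(frame2[y2, x2] - frame1)))

def smoothness_error(u, v):
    """Mean absolute difference between vertically and horizontally adjacent flow values."""
    total = 0.0
    for component in (u, v):
        total += np.mean(np.abs(np.diff(component, axis=0)))  # vertical neighbors
        total += np.mean(np.abs(np.diff(component, axis=1)))  # horizontal neighbors
    return float(total)

# Usage: frame2 is frame1 shifted 2 pixels to the right.
rng = np.random.default_rng(0)
frame1 = rng.random((16, 16))
frame2 = np.roll(frame1, 2, axis=1)
zero = np.zeros_like(frame1)
true_u = np.full_like(frame1, 2.0)

print(brightness_constancy_error(frame1, frame2, true_u, zero))  # small: only the clamped border columns mismatch
print(brightness_constancy_error(frame1, frame2, zero, zero))    # large: each pixel is compared with an unrelated value
print(smoothness_error(true_u, zero))                             # 0.0 for a constant flow field
```

A flow field that follows the true motion scores low on both penalties; minimizing a weighted sum of such terms is one way to estimate flow without ground-truth labels.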

Given these assumptions, consider a video frame (a 2D image) with a pixel at position $(x,y)$ at time $t$, and suppose that in a later frame the pixel has moved to position $(x + \Delta x, y + \Delta y)$ at time $t + \Delta t$.

By the first assumption, the intensity of the pixel at time $t$ is the same as its intensity at time $t + \Delta t$:

$I(x+\Delta x,y+\Delta y,t+\Delta t) = I(x,y,t)$

Expanding the left-hand side as a first-order Taylor series around $(x,y,t)$ gives:

$I(x+\Delta x,y+\Delta y,t+\Delta t) \approx I(x,y,t) + \frac{\partial I}{\partial x}\Delta x+\frac{\partial I}{\partial y}\Delta y+\frac{\partial I}{\partial t}\Delta t$

where the higher-order terms have been ignored.

From the two equations, it follows that:

$\frac{\partial I}{\partial x}\Delta x+\frac{\partial I}{\partial y}\Delta y+\frac{\partial I}{\partial t}\Delta t = 0$

Dividing by $\Delta t$ and writing $V_x = \Delta x/\Delta t$ and $V_y = \Delta y/\Delta t$ gives

$\frac{\partial I}{\partial x}V_x+\frac{\partial I}{\partial y}V_y+\frac{\partial I}{\partial t} = 0$

where $V_x,V_y$ are the $x$ and $y$ components of the velocity (displacement over time) or optical flow of $I(x,y,t)$ and $\tfrac{\partial I}{\partial x}$, $\tfrac{\partial I}{\partial y}$, and $\tfrac{\partial I}{\partial t}$ are the derivatives of the image at $(x,y,t)$ in the corresponding directions.

This can be rewritten as:

$I_xV_x+I_yV_y=-I_t$

or

$\nabla I^T\cdot\vec{V} = -I_t$

where $\nabla I = (I_x, I_y)^T$ is the spatial image gradient and $\vec{V} = (V_x, V_y)^T$ is the flow vector.
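To make the constraint concrete, the sketch below estimates $I_x$, $I_y$ and $I_t$ with finite differences on a synthetic pattern that translates by a known $(V_x, V_y)$ per frame, then checks that $I_xV_x + I_yV_y + I_t$ is close to zero. The pattern, the velocities and the helper names are illustrative assumptions. The residual is only approximately zero, because the Taylor expansion drops higher-order terms and the finite differences add their own error.

```python
import numpy as np

def translating_pattern(xs, ys, t, vx, vy):
    # Smooth synthetic image that moves with constant velocity (vx, vy) in pixels per frame.
    return np.sin(0.3 * (xs - vx * t)) + np.cos(0.2 * (ys - vy * t))

def image_derivatives(frame0, frame1):
    """Finite-difference estimates of I_x, I_y and I_t from two consecutive frames."""
    avg = 0.5 * (frame0 + frame1)  # spatial derivatives taken midway between the frames
    i_y, i_x = np.gradient(avg)    # np.gradient returns d/d(row) = d/dy first, then d/d(col) = d/dx
    i_t = frame1 - frame0          # forward difference in time (Delta t = 1 frame)
    return i_x, i_y, i_t

ys, xs = np.mgrid[0:32, 0:32].astype(float)
vx, vy = 0.2, 0.1
f0 = translating_pattern(xs, ys, 0.0, vx, vy)
f1 = translating_pattern(xs, ys, 1.0, vx, vy)
i_x, i_y, i_t = image_derivatives(f0, f1)

# Residual of I_x V_x + I_y V_y + I_t = 0 for the true velocity and for a wrong guess (static scene).
res_true = i_x * vx + i_y * vy + i_t
res_zero = i_t
print(np.abs(res_true).mean(), np.abs(res_zero).mean())
```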

Since the constraint $I_xV_x+I_yV_y=-I_t$ is a single equation in two unknowns, $V_x$ and $V_y$, it cannot determine the flow at a pixel on its own: only the component of the flow along the image gradient is constrained. This is known as the aperture problem of optical flow algorithms. Solving for the full flow requires an additional set of constraints, which is where the second assumption comes in.
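The aperture problem can be seen at a single pixel. The sketch below, an illustration with a hypothetical helper name, returns the minimum-norm solution of $I_xV_x + I_yV_y = -I_t$, i.e. the normal flow $-I_t \nabla I / \lVert \nabla I \rVert^2$. For an edge whose intensity varies only along $x$, a true motion of $(1, 1)$ produces the same constraint as $(1, 0)$, so only the $x$-component is recovered.

```python
import numpy as np

def normal_flow(i_x, i_y, i_t, eps=1e-12):
    """Minimum-norm (V_x, V_y) satisfying I_x*V_x + I_y*V_y = -I_t at one pixel.

    Only the component along the gradient (I_x, I_y) is determined; adding any
    multiple of the perpendicular direction (-I_y, I_x) leaves the constraint satisfied.
    """
    grad = np.array([i_x, i_y], dtype=float)
    g2 = grad @ grad
    if g2 < eps:
        return np.zeros(2)  # flat region: the constraint carries no information
    return -i_t * grad / g2

# Edge whose intensity varies only along x, actually moving with velocity (1, 1).
true_v = np.array([1.0, 1.0])
i_x, i_y = 2.0, 0.0
i_t = -(i_x * true_v[0] + i_y * true_v[1])  # temporal derivative consistent with the true motion

v_n = normal_flow(i_x, i_y, i_t)
print(v_n)  # → [1. 0.]: the y-component of the motion is invisible at this pixel

# Every vector v_n + s * (-I_y, I_x) satisfies the same single equation.
for s in (-1.0, 0.5, 3.0):
    candidate = v_n + s * np.array([-i_y, i_x])
    print(s, i_x * candidate[0] + i_y * candidate[1] + i_t)
```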

Traditional approaches to the optical flow problem were mostly differential (gradient-based) methods. The method of Horn and Schunck (1981), one of the first approaches to optical flow estimation, is one of the most famous examples. Without going into the math, Horn and Schunck built constraints from the spatio-temporal derivatives of image brightness. Their method addresses the aperture problem by adding a smoothness condition: the optical flow field is assumed to vary smoothly across the entire image.
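The Horn and Schunck idea can be sketched as an iterative update that alternates between smoothing the current flow and correcting it toward the brightness constancy constraint: $V_x \leftarrow \bar V_x - I_x \frac{I_x\bar V_x + I_y\bar V_y + I_t}{\alpha^2 + I_x^2 + I_y^2}$, and likewise $V_y \leftarrow \bar V_y - I_y \frac{I_x\bar V_x + I_y\bar V_y + I_t}{\alpha^2 + I_x^2 + I_y^2}$, where $\bar V$ is a local average of the flow and $\alpha$ weights the smoothness term. The code is a simplified illustration, not the original implementation: it assumes a 4-neighbor average instead of Horn and Schunck's weighted 3×3 average, simple finite differences for the derivatives, and hypothetical function names and defaults.

```python
import numpy as np

def neighbor_mean(f):
    """Average of the 4 direct neighbors, replicating values at the image border.

    Simplification: Horn and Schunck use a weighted 3x3 average instead.
    """
    p = np.pad(f, 1, mode="edge")
    return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])

def horn_schunck(frame0, frame1, alpha=1.0, n_iter=200):
    frame0 = np.asarray(frame0, dtype=float)
    frame1 = np.asarray(frame1, dtype=float)
    i_y, i_x = np.gradient(0.5 * (frame0 + frame1))  # spatial derivatives (rows = y, cols = x)
    i_t = frame1 - frame0                             # temporal derivative
    u = np.zeros_like(frame0)
    v = np.zeros_like(frame0)
    denom = alpha**2 + i_x**2 + i_y**2
    for _ in range(n_iter):
        u_bar, v_bar = neighbor_mean(u), neighbor_mean(v)
        # Brightness constancy residual at the smoothed flow, scaled by the denominator.
        r = (i_x * u_bar + i_y * v_bar + i_t) / denom
        u = u_bar - i_x * r
        v = v_bar - i_y * r
    return u, v

# Usage: a smooth pattern translated by (0.2, 0.1) pixels between the frames.
ys, xs = np.mgrid[0:32, 0:32].astype(float)
f0 = 2.0 * (np.sin(0.3 * xs) + np.cos(0.2 * ys))
f1 = 2.0 * (np.sin(0.3 * (xs - 0.2)) + np.cos(0.2 * (ys - 0.1)))
u, v = horn_schunck(f0, f1, alpha=1.0, n_iter=1000)
print(u.mean(), v.mean())  # expected to be close to the true (0.2, 0.1)
```

A larger `alpha` yields smoother flow fields; because the same smoothness weight applies everywhere, the estimate is also smoothed across object boundaries.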

Horn and Schunck propose a method based on first-order derivatives and add a smoothness condition on the flow vectors to the general constraints. They assume that object motion in a sequence is rigid and approximately constant, so that the pixels in a neighborhood of such an object have similar velocities, which therefore change smoothly over space and time. However, this condition is unrealistic in many cases and yields poor results [14], because the flow field is discontinuous, especially at the boundaries between different objects, so the estimates in these regions are incorrect. Results are also poor in sequences containing multiple objects that each move differently.

It wasn't until 2015 that FlowNet [Dosovitskiy et al., 2015] was proposed as the first approach to use a deep neural network for end-to-end optical flow estimation.

## Problem & Motivation

Deep learning approaches to optical flow, albeit widely successful, have mostly relied on supervised learning with convolutional neural networks (convnets). The inherent challenge with these supervised approaches lies in the ground-truth flow, that is, obtaining reliable measurements of the target variable for the training and test datasets. Directly obtaining the ground-truth motion field is not possible, so segmentation ground truth is generally used instead. Segmentation ground truth is the classification of every pixel in an image. Since segmentation ground truthing is not always automated, it requires laborious labeling of the objects in the video, sometimes manually with ground-truth labeling software. The larger the training and test datasets become, the more laborious the segmentation ground truthing becomes.

In the case of the KITTI dataset, a collection of images captured from a car driving around a mid-sized city in Germany, accurate ground truth for the training and test data is obtained using laser scanners together with a GPS localization device mounted on the roof of the car.

The paper "Back to Basics: Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness" by Jason J. Yu, Adam W. Harley and Konstantinos G. Derpanis presents an unsupervised approach that sidesteps the ground-truth requirement of supervised optical flow methods.