# Introduction

This paper builds off of ideas from PointNet (Qi et al., 2017). The name PointNet is derived from the network's input - a point cloud. A point cloud is a set of three dimensional points that each have coordinates [math] (x,y,z) [/math]. These coordinates usually represent the surface of an object. For example, a point cloud describing the shape of a torus is shown below.

Point cloud torus

Processing point clouds is important in applications such as autonomous driving where point clouds are collected from an onboard LiDAR sensor. These point clouds can then be used for object detection. However, point clouds are challenging to process because:

1. They are unordered. If [math] N [/math] is the number of points in a point cloud, then there are [math] N! [/math] permutations that the point cloud can be represented.
2. The spatial arrangement of the points contains useful information, thus it needs to be encoded.
3. The function processing the point cloud needs to be invariant to transformations such as rotation and translations of all points.

Previously, typical point cloud processing methods handled the challenges of point clouds by transforming the data with a 3D voxel grid or by representing the point cloud with multiple 2D images. When PointNet was introduced, it was novel because it directly took points as its input. PointNet++ improves on PointNet by using a hierarchical method to better capture local structures of the point cloud.

Examples of point clouds and their associated task. Classification (left), part segmentation (centre), scene segmentation (right)

# Review of PointNet

The PointNet architecture is shown below. The core idea of the network is to apply a symmetric function on transformed points. Points are transformed to handle invariance with point cloud transformations, and this transformation is represented by the multi-layer perception network and T-Net. The symmetric function solves the challenge imposed by having unordered points; a symmetric function will produce the same value no matter the order of the input. This symmetric function is represented by the max pool layer.

PointNet architecture. The blue highlighted region is when it is used for classification, and the beige highlighted region is when it is used for segmentation.

# PointNet++

## Method

PointNet++ architecture
]

### Grouping Layer

Example of the two ways to perform grouping
]

# Sources

1. Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space, 2017

2. Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, 2017