Test accuracy: 0.9783
Class | Point Cloud | Predicted Class | Interpretation |
---|---|---|---|
Chair | | Chair | |
Vase | | Vase | |
Lamp | | Lamp | |
Chair | | Chair | |
Vase | | Vase | |
Lamp | | Lamp | |
Chair | | Lamp | The model fails on this fancy-looking chair, likely because it has not seen such a form for a chair before. |
Vase | | Lamp | The model clearly fails, as the given vase looks a little like a lamp. |
Lamp | | Vase | The model clearly fails, as the given lamp looks like a vase. |
Test accuracy: 0.9086
Class | Point Cloud | Predicted Point Cloud | Interpretation | Prediction Accuracy |
---|---|---|---|---|
Chair | | | | 0.93375 |
Chair | | | Because the chair is quite shallow (almost flat, with a bend), the model is not able to differentiate between the seat and the armrest, which causes the poor performance. | 0.73875 |
Chair | | | | 0.9325 |
Chair | | | | 0.91 |
Chair | | | | 0.895 |
Chair | | | | 0.98625 |
Chair | | | In this case, the stem of the chair is quite continuous, making it hard to segment the chair into its base and the beginning of the seat, so the model performs poorly. | 0.7475 |
Chair | | | | 0.9575 |
During evaluation, for each point cloud, we sample a random 3D rotation (within a bounded range, so that the resulting viewpoint remains plausible), and then make a prediction (cls/seg).
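As a concrete illustration, the rotation sampling could look like the following minimal sketch, assuming pytorch3d is available; the function name `random_rotate` and the ±60° bound are illustrative, not the exact values used in our experiments.

```python
import torch
from pytorch3d.transforms import euler_angles_to_matrix

def random_rotate(points, max_angle=torch.pi / 3):
    # points: (N, 3). Sample Euler angles uniformly in [-max_angle, max_angle]
    # so the rotated viewpoint stays within a plausible range.
    angles = (torch.rand(3) * 2 - 1) * max_angle
    R = euler_angles_to_matrix(angles, convention="XYZ")  # (3, 3)
    return points @ R.T
```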
Class | Point Cloud | Predicted Class | Rotated Point Cloud | Predicted Class |
---|---|---|---|---|
Chair | | Chair | | Lamp |
Chair | | Chair | | Lamp |
Vase | | Vase | | Lamp |
Vase | | Vase | | Lamp |
Lamp | | Lamp | | Vase |
Lamp | | Lamp | | Lamp |
Accuracy: 0.2655
The interesting thing we learn from this experiment is that in the training dataset, two of these three classes, i.e., chairs and vases, are almost always upright. The model seems to learn this upright orientation as a property of chairs and vases, and as soon as objects are rotated, it tends to predict them as lamps, a class it has seen in more varied orientations. This could also be due to the number of chair samples exceeding the other two categories, since chairs often contain strong vertical and horizontal features (corners, etc.) that disappear under rotation.
Class | Point Cloud | Segmented Point Cloud | Accuracy | Rotated Point Cloud | Segmented Point Cloud | Accuracy |
---|---|---|---|---|---|---|
Chair | | | 0.95375 | | | 0.44375 |
Chair | | | 0.9125 | | | 0.2675 |
Chair | | | 0.94375 | | | 0.32125 |
Chair | | | 0.8075 | | | 0.32875 |
Chair | | | 0.89625 | | | 0.32875 |
The interesting thing we notice from this experiment is that the segmentation network implicitly learns to segment sections of the chair based on the heights (z-values, perhaps) of the points. This is why the model performs very poorly as soon as we rotate the point cloud. Implementing input transforms (e.g., a learned alignment network like PointNet's T-Net, or rotation augmentation during training) would likely help make the model robust to rotations.
During evaluation, for each point cloud, we change the number of points sampled per point cloud from the default 800 to 100, and then make a prediction (cls/seg).
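A minimal sketch of such random downsampling (the function name `downsample` and the default `n_out` are illustrative):

```python
import torch

def downsample(points, n_out=100):
    # points: (N, 3). Keep a uniformly random subset of n_out points.
    idx = torch.randperm(points.shape[0], device=points.device)[:n_out]
    return points[idx]
```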
Class | Point Cloud | Predicted Class | Downsampled Point Cloud (N = 80) | Predicted Class |
---|---|---|---|---|
Chair | | Chair | | Chair |
Chair | | Chair | | Chair |
Vase | | Lamp | | Lamp |
Lamp | | Lamp | | Lamp |
Vase | | Vase | | Lamp |
Vase | | Vase | | Lamp |
Accuracy: 0.7576
For the classification task, the downsampled model does not perform as poorly; it fails only on objects whose shapes are genuinely ambiguous, such as the vases above.
Class | Point Cloud | Segmented Point Cloud | Accuracy | Downsampled Point Cloud (N = 80) | Segmented Point Cloud | Accuracy |
---|---|---|---|---|---|---|
Chair | | | 0.38625 | | | 0.30 |
Chair | | | 0.9925 | | | 0.975 |
Chair | | | 0.96375 | | | 0.8375 |
Chair | | | 0.9775 | | | 0.775 |
Chair | | | 0.92375 | | | 0.7875 |
Accuracy: 0.7075
We see that upon downsampling, the decrease in the model's performance is not as large as before. This is because in most of the point clouds, the majority of points are concentrated in the seat, so when points are randomly sampled, they tend to come from the seat. The horizontal-slab features the model has learned therefore still work well, resulting in a high accuracy.
Processing each batch: For the PointNet++ implementation, we start by randomly sampling some points in our point cloud. These points are sampled from a uniform distribution for now (more intelligent sampling, such as the farthest point sampling used in the original PointNet++, could be substituted). We then use `knn_gather` to find the k-nearest neighbors of those sampled points in the parent point cloud. Next, we find the centroid of each of those clusters and compute the x, y, and z offsets of each point in a cluster from its centroid. The current implementation uses these offsets as locality features. This process constructs one set abstraction.
We perform two such set abstractions before feeding the obtained features into our original PointNet architecture, as sketched below.
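For concreteness, here is a minimal sketch of one set abstraction under the assumptions above, using pytorch3d's `knn_points`/`knn_gather`; `n_centers` and `k` are illustrative hyperparameters, not the exact values from our runs.

```python
import torch
from pytorch3d.ops import knn_points, knn_gather

def set_abstraction(points, n_centers=128, k=32):
    # points: (B, N, 3). Uniformly sample n_centers query points
    # (farthest point sampling would be the more principled choice).
    B, N, _ = points.shape
    idx = torch.randint(0, N, (B, n_centers), device=points.device)
    centers = torch.gather(points, 1, idx.unsqueeze(-1).expand(-1, -1, 3))  # (B, n_centers, 3)
    # Group each sampled point with its k nearest neighbors in the parent cloud.
    _, nn_idx, _ = knn_points(centers, points, K=k)
    neighbors = knn_gather(points, nn_idx)             # (B, n_centers, k, 3)
    # Locality features: per-point offsets from each cluster's centroid.
    centroid = neighbors.mean(dim=2, keepdim=True)     # (B, n_centers, 1, 3)
    local_feats = neighbors - centroid                 # (B, n_centers, k, 3)
    return centers, local_feats
```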
Using this architecture, we are able to push our classification accuracy to 0.9884.
Below are visualizations of some renderings from PointNet and PointNet++ for the classification task.
Class | Point Cloud | PointNet Predicted Class | Point Cloud | PointNet++ Predicted Class |
---|---|---|---|---|
Chair | | Chair | | Chair |
Chair | | Chair | | Chair |
Lamp | | Lamp | | Lamp |
Lamp | | Lamp | | Lamp |
Shown below is the overall result of prediction by both PointNet and PointNet++, corresponding to the above visualizations.
To implement a Dynamic Graph CNN (DGCNN) for point cloud classification, we first need an edge convolution operator. Taking inspiration from the paper, we implement it as follows:
```python
from pytorch3d.ops import knn_points

def edge_conv(self, x, k):
    # input : (B, 3, N)
    x = torch.transpose(x, 1, 2)              # (B, N, 3)
    dists, idxs, nn = knn_points(x, x, K=k)   # idxs: (B, N, K)
    # Batch indices for advanced indexing; no flattened-index offset is
    # needed, since each batch element is indexed separately.
    batch_idxs = torch.arange(x.shape[0], device=x.device).view(-1, 1, 1)
    batch_idxs = batch_idxs.repeat(1, x.shape[1], k)   # (B, N, K)
    feature = x[batch_idxs, idxs, :]                   # (B, N, K, 3)
    # Repeat each point K times so it pairs with each of its neighbors.
    x = x.unsqueeze(2).repeat(1, 1, k, 1)              # (B, N, K, 3)
    # Edge feature: concatenate (neighbor - point) with the point itself.
    out = torch.cat((feature - x, x), dim=3).permute(0, 3, 1, 2)
    # output : (B, 6, N, K)
    return out
```
As we can see, this edge convolution operator uses the k-nearest-neighbor approach to learn information about each point's neighbors in a 'graph'. Since a point cloud has no explicit connectivity (unlike a mesh), these edges are constructed on the fly and their features learned through this convolution. We insert this convolution between each layer of the PointNet and otherwise use the vanilla PointNet architecture; a sketch of how the resulting edge features can be consumed follows below.
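For illustration, the (B, 6, N, K) edge features can be fed to a shared pointwise MLP and max-pooled over the K neighbors, in line with the DGCNN paper's design; `EdgeConvBlock` and its channel sizes are hypothetical names, not the exact layers in our codebase.

```python
import torch
import torch.nn as nn

class EdgeConvBlock(nn.Module):
    def __init__(self, in_ch=6, out_ch=64):
        super().__init__()
        # A 1x1 convolution acts as a shared MLP over every (point, neighbor) pair.
        self.mlp = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
        )

    def forward(self, edge_feats):
        # edge_feats: (B, 6, N, K) from edge_conv.
        # Max-pool over the K neighbors to get per-point features (B, out_ch, N).
        return self.mlp(edge_feats).max(dim=-1).values
```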
The implementation for this can be found in the codebase, but since I ran into CUDA out-of-memory errors as soon as the batch size exceeded 1, I could not train this model in the given time. The training pipeline itself works, though.