
16-825 Assignment 5: Point Cloud Processing

Name: Jaskaran Singh Sodhi

Andrew ID: jsodhi

Late Days Used: 0


1. Classification Model

Test accuracy: 0.9783

| Class | Point Cloud | Predicted Class | Interpretation |
| --- | --- | --- | --- |
| Chair | cls_chair_success | Chair | |
| Vase | cls_vase_success | Vase | |
| Lamp | cls_lamp_success | Lamp | |
| Chair | cls_chair_success | Chair | |
| Vase | cls_vase_success | Vase | |
| Lamp | cls_lamp_success | Lamp | |
| Chair | cls_vase_failure | Lamp | The model fails on this unusually styled chair, likely because it has not seen such a form for a chair before. |
| Vase | cls_vase_failure | Lamp | The model fails because this vase looks somewhat like a lamp. |
| Lamp | cls_lamp_failure | Vase | The model fails because this lamp looks like a vase. |
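For reference, the classifier follows the usual PointNet recipe: a shared per-point MLP, a permutation-invariant max-pool, and a small classification head. The sketch below is illustrative only; layer sizes and the head are assumptions, and the trained model in the codebase may differ.

```python
import torch
import torch.nn as nn

class PointNetCls(nn.Module):
    """Minimal PointNet-style classifier sketch (illustrative, not the exact codebase model)."""

    def __init__(self, num_classes=3):
        super().__init__()
        # Shared MLP applied to every point independently (1x1 convolutions).
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        # Classification head on the global, order-invariant feature.
        self.head = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):                   # x: (B, N, 3)
        feat = self.mlp(x.transpose(1, 2))  # (B, 1024, N) per-point features
        glob = feat.max(dim=2).values       # max over points: (B, 1024)
        return self.head(glob)              # (B, num_classes) logits
```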

Interesting Observations






2. Segmentation Model

Test accuracy: 0.9086

| Class | Point Cloud | Predicted Point Cloud | Interpretation | Prediction Accuracy |
| --- | --- | --- | --- | --- |
| Chair | seg_chair_success | seg_chair_success | | 0.93375 |
| Chair | seg_chair_success | seg_chair_success | Because this chair is quite shallow (almost flat, with a bend), the model cannot differentiate between the seat and the armrest, which causes the poor performance. | 0.73875 |
| Chair | seg_chair_success | seg_chair_success | | 0.9325 |
| Chair | seg_chair_success | seg_chair_success | | 0.91 |
| Chair | seg_chair_success | seg_chair_success | | 0.895 |
| Chair | seg_chair_success | seg_chair_success | | 0.98625 |
| Chair | seg_chair_success | seg_chair_success | The stem of this chair is quite continuous, making it hard to segment the base from the beginning of the seat, so the model performs poorly. | 0.7475 |
| Chair | seg_chair_success | seg_chair_success | | 0.9575 |
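The per-cloud accuracies above are point-wise agreement between the predicted and ground-truth part labels. A minimal sketch of that metric (the function name is illustrative):

```python
import torch

def seg_accuracy(pred_labels, gt_labels):
    """Point-wise segmentation accuracy for one cloud.

    pred_labels, gt_labels: (N,) integer part labels.
    Returns the fraction of points labeled correctly.
    """
    return (pred_labels == gt_labels).float().mean().item()
```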

Interesting Observations






3. Robustness Analysis

Experiment 1: Rotated Point Clouds

During evaluation, for each point cloud, we sample a random 3D rotation (bounded to a limited angular range so the viewpoint stays plausible), apply it, and then make a prediction (cls/seg).
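A minimal sketch of this perturbation, sampling bounded Euler angles and rotating the cloud (the function name and the 45° default bound are illustrative; the actual range used may differ):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def random_rotate(points, max_deg=45.0, rng=None):
    """Apply a random 3D rotation with each Euler angle bounded by max_deg.

    points: (N, 3) array. Returns the rotated (N, 3) array.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Bounded angles keep the rotated viewpoint plausible.
    angles = rng.uniform(-max_deg, max_deg, size=3)
    R = Rotation.from_euler("xyz", angles, degrees=True).as_matrix()
    return points @ R.T
```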

Task: Classification

| Class | Point Cloud | Predicted Class | Rotated Point Cloud | Predicted Class |
| --- | --- | --- | --- | --- |
| Chair | ablation1_chair | Chair | ablation1_chair_rot | Lamp |
| Chair | ablation1_chair | Chair | ablation1_chair_rot | Lamp |
| Vase | ablation1_vase | Vase | ablation1_vase_rot | Lamp |
| Vase | ablation1_vase | Vase | ablation1_vase_rot | Lamp |
| Lamp | ablation1_lamp | Lamp | ablation1_lamp_rot | Vase |
| Lamp | ablation1_lamp | Lamp | ablation1_lamp_rot | Lamp |

Accuracy: 0.2655

Interpretation

The interesting thing we learn from this experiment is that in the training dataset, two of these three classes, chairs and vases, are almost always upright. The model implicitly learns this upright orientation, and as soon as objects are rotated it tends to predict them as lamps, the class whose training examples appear in the most varied orientations. This could also be because chairs outnumber the other two categories, and chairs contain many axis-aligned features (vertical and horizontal surfaces, corners, etc.) that rotation destroys.

Task: Segmentation

| Class | Point Cloud | Segmented Point Cloud | Accuracy | Rotated Point Cloud | Segmented Point Cloud | Accuracy |
| --- | --- | --- | --- | --- | --- | --- |
| Chair | ablation2_chair | ablation2_chair_rot | 0.95375 | ablation2_chair | ablation2_chair_rot | 0.44375 |
| Chair | ablation2_chair | ablation2_chair_rot | 0.9125 | ablation2_chair | ablation2_chair_rot | 0.2675 |
| Chair | ablation2_chair | ablation2_chair_rot | 0.94375 | ablation2_chair | ablation2_chair_rot | 0.32125 |
| Chair | ablation2_chair | ablation2_chair_rot | 0.8075 | ablation2_chair | ablation2_chair_rot | 0.32875 |
| Chair | ablation2_chair | ablation2_chair_rot | 0.89625 | ablation2_chair | ablation2_chair_rot | 0.32875 |

Interpretation

The interesting thing we notice from this experiment is that, implicitly, the segmentation network learns to segment sections of the chair based on the heights (z-values) of the points. This is why the model performs very poorly as soon as we rotate the point cloud. Adding rotation augmentation during training, or an input transform (such as PointNet's T-Net), could help make the model robust to rotations.

Experiment 2: Sampled Points per Object

During evaluation, for each point cloud, we change the number of points sampled per point cloud from the default 800 to 100, and then make a prediction (cls/seg).
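A sketch of the downsampling step, keeping a random subset of the points without replacement (the function name is illustrative):

```python
import torch

def downsample(points, n=100, generator=None):
    """Randomly keep n of the N points in a (N, 3) cloud, without replacement."""
    idx = torch.randperm(points.shape[0], generator=generator)[:n]
    return points[idx]
```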

Task: Classification

| Class | Point Cloud | Predicted Class | Downsampled Point Cloud (N = 80) | Predicted Class |
| --- | --- | --- | --- | --- |
| Chair | ablation3_chair | Chair | ablation3_chair_rot | Chair |
| Chair | ablation3_chair | Chair | ablation3_chair_rot | Chair |
| Vase | ablation3_vase | Lamp | ablation3_vase_rot | Lamp |
| Lamp | ablation3_vase | Lamp | ablation3_vase_rot | Lamp |
| Vase | ablation3_lamp | Vase | ablation3_lamp_rot | Lamp |
| Vase | ablation3_lamp | Vase | ablation3_lamp_rot | Lamp |

Accuracy: 0.7576

Interpretation

For the classification task, the model's performance does not degrade nearly as much under downsampling; it fails in only one instance, where the object's shape is genuinely ambiguous.

Task: Segmentation

| Class | Point Cloud | Segmented Point Cloud | Accuracy | Downsampled Point Cloud (N = 80) | Segmented Point Cloud | Accuracy |
| --- | --- | --- | --- | --- | --- | --- |
| Chair | ablation4_chair | ablation4_chair_rot | 0.38625 | ablation4_chair | ablation4_chair_rot | 0.30 |
| Chair | ablation4_chair | ablation4_chair_rot | 0.9925 | ablation4_chair | ablation4_chair_rot | 0.975 |
| Chair | ablation4_chair | ablation4_chair_rot | 0.96375 | ablation4_chair | ablation4_chair_rot | 0.8375 |
| Chair | ablation4_chair | ablation4_chair_rot | 0.9775 | ablation4_chair | ablation4_chair_rot | 0.775 |
| Chair | ablation4_chair | ablation4_chair_rot | 0.92375 | ablation4_chair | ablation4_chair_rot | 0.7875 |

Accuracy: 0.7075

Interpretation

We see that upon downsampling, the decrease in the model's performance is smaller than under rotation. In most of the point clouds, the majority of points are concentrated in the seat, so randomly sampled points tend to land on the seat. The horizontal-slab features the model has learned therefore still apply, resulting in high accuracy.






4. Expressive Architectures

4.1 PointNet++

Processing each batch: for the PointNet++ implementation, we start by randomly sampling some points in the point cloud. These points are sampled from a uniform distribution for now (more intelligent sampling, such as farthest point sampling, could be used). We then use knn_gather to find the k nearest neighbors of each sampled point in the parent point cloud. Next, we compute the centroid of each of these clusters and find the x, y, and z offsets of every point in a cluster from its centroid. The current implementation uses these offsets as locality features. This process constructs one set abstraction.

We perform 2 such set abstractions before feeding the obtained features into our original point cloud architecture.
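The steps above can be sketched roughly as follows. This snippet uses `torch.cdist` and `torch.topk` in place of knn_gather so that it is self-contained, and the sampling counts are illustrative; the actual implementation in the codebase may differ.

```python
import torch

def set_abstraction(x, m, k):
    """One set abstraction sketch: sample m centers uniformly at random,
    gather each center's k nearest neighbors, and return each neighbor's
    offset from its cluster centroid as the locality feature.

    x: (B, N, 3) point cloud.
    """
    B, N, _ = x.shape
    # Uniform random sampling of m center points per cloud.
    idx = torch.randint(N, (B, m), device=x.device)
    centers = torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, 3))     # (B, m, 3)
    # k nearest neighbors of each center (stand-in for knn_gather).
    d = torch.cdist(centers, x)                                           # (B, m, N)
    nn_idx = d.topk(k, dim=2, largest=False).indices                      # (B, m, K)
    gather_idx = nn_idx.unsqueeze(-1).expand(-1, -1, -1, 3)
    clusters = torch.gather(x.unsqueeze(1).expand(-1, m, -1, -1), 2, gather_idx)  # (B, m, K, 3)
    # Offsets from each cluster's centroid are the locality features.
    centroids = clusters.mean(dim=2, keepdim=True)                        # (B, m, 1, 3)
    return centers, clusters - centroids                                  # offsets: (B, m, K, 3)
```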

Using this architecture, we are able to push our classification accuracy to 0.9884.

Below are the visualizations of some renderings from PointNet and PointNet++, for the classification task.

| Class | Point Cloud | PointNet Predicted Class | Point Cloud | PointNet++ Predicted Class |
| --- | --- | --- | --- | --- |
| Chair | ablation3_chair | Chair | ablation3_chair_rot | Chair |
| Chair | ablation3_chair | Chair | ablation3_chair_rot | Chair |
| Lamp | ablation3_lamp | Lamp | ablation3_lamp_rot | Lamp |
| Lamp | ablation3_lamp | Lamp | ablation3_lamp_rot | Lamp |

Shown below is the overall result of prediction by both PointNet and PointNet++, corresponding to the above visualizations.

(figure: overall prediction results for PointNet and PointNet++)

4.2 DGCNN

To implement a Dynamic Graph CNN for point cloud classification, we first need an edge convolution operator. Taking inspiration from the paper, we do this as follows:

from pytorch3d.ops import knn_points, knn_gather

def edge_conv(self, x, k):

    # input : (B, 3, N)
    x = torch.transpose(x, 1, 2) # (B, N, 3)
    dists, idxs, nn = knn_points(x, x, K=k) # idxs : (B, N, K)

    # gather each point's k nearest neighbors
    feature = knn_gather(x, idxs) # (B, N, K, 3)

    # repeat each center point K times so it can be paired with its neighbors
    x = x.unsqueeze(2).expand(-1, -1, k, -1) # (B, N, K, 3)

    # edge feature: (neighbor - center, center), then move channels first
    out = torch.cat((feature - x, x), dim=3).permute(0, 3, 1, 2)

    # output : (B, 6, N, K)
    return out

As we can see, this edge convolution operator uses the k-nearest-neighbor approach to build a local 'graph' around each point. Since a point cloud has no explicit connectivity (unlike a mesh), these edges are constructed on the fly, and the edge features are learned through this convolution. We insert this convolution between each layer of the PointNet, and otherwise use the vanilla PointNet architecture.
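To turn the (B, 6, N, K) edge features into per-point features, one option is a shared MLP followed by a max over the K neighbors, which is the aggregation the DGCNN paper uses. The sketch below is illustrative rather than the codebase's exact layer; the 64-d output width is an assumption.

```python
import torch
import torch.nn as nn

# Shared MLP lifting each 6-d edge feature to 64-d (1x1 convolution).
conv = nn.Sequential(nn.Conv2d(6, 64, 1), nn.ReLU())

def edge_conv_layer(edge_feats):
    # edge_feats: (B, 6, N, K) edge features for each point and its K neighbors
    lifted = conv(edge_feats)        # (B, 64, N, K)
    return lifted.max(dim=3).values  # aggregate over neighbors: (B, 64, N)
```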

The implementation for this can be found in the codebase, but since I hit a CUDA out-of-memory issue as soon as I went above batch size = 1, I could not train this model in the given time. The training pipeline works, though.
