Test accuracy: 0.9783
Class | Point Cloud | Predicted Class | Interpretation |
---|---|---|---|
Chair | | Chair | |
Vase | | Vase | |
Lamp | | Lamp | |
Chair | | Chair | |
Vase | | Vase | |
Lamp | | Lamp | |
Chair | | Lamp | The model fails on this fancy-looking chair, likely because it has not seen such a form for a chair before. |
Vase | | Lamp | The model clearly fails, as the given vase looks a little like a lamp. |
Lamp | | Vase | The model clearly fails, as the given lamp looks like a vase. |
Test accuracy: 0.9086
Class | Point Cloud | Predicted Point Cloud | Interpretation | Prediction Accuracy |
---|---|---|---|---|
Chair | | | | 0.93375 |
Chair | | | Because the chair is quite shallow (almost flat, with a bend), the model is not able to differentiate between the seat and the armrest, which causes the poor performance. | 0.73875 |
Chair | | | | 0.9325 |
Chair | | | | 0.91 |
Chair | | | | 0.895 |
Chair | | | | 0.98625 |
Chair | | | In this case, the stem of the chair is quite continuous, making it hard to segment the chair into its base and the beginning of the seat, so the model performs poorly. | 0.7475 |
Chair | | | | 0.9575 |
During evaluation, for each point cloud, we sample a random 3D rotation (within a bounded range, so that the resulting viewpoint remains plausible), and then make a prediction (cls/seg).
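As a concrete illustration, the rotation sampling could look like the following minimal sketch, assuming pytorch3d is available; the function name `random_rotate` and the ±60° bound are illustrative, not the exact values used in our experiments.

```python
import torch
from pytorch3d.transforms import euler_angles_to_matrix

def random_rotate(points, max_angle=torch.pi / 3):
    # points: (N, 3). Sample Euler angles uniformly in [-max_angle, max_angle]
    # so the rotated viewpoint stays within a plausible range.
    angles = (torch.rand(3) * 2 - 1) * max_angle
    R = euler_angles_to_matrix(angles, convention="XYZ")  # (3, 3)
    return points @ R.T
```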
Class | Point Cloud | Predicted Class | Rotated Point Cloud | Predicted Class |
---|---|---|---|---|
Chair | | Chair | | Lamp |
Chair | | Chair | | Lamp |
Vase | | Vase | | Lamp |
Vase | | Vase | | Lamp |
Lamp | | Lamp | | Vase |
Lamp | | Lamp | | Lamp |
Accuracy: 0.2655
The interesting thing we learn from this experiment is that in the training dataset, two of these three classes, i.e., chairs and vases, are almost always upright. The model seems to learn this upright orientation as a property of chairs and vases, and as soon as objects are rotated, it tends to predict them as lamps, a class it has seen in more varied orientations. This could also be due to the number of chair samples exceeding the other two categories, since chairs often contain strong vertical and horizontal features (corners, etc.) that disappear under rotation.
Class | Point Cloud | Segmented Point Cloud | Accuracy | Rotated Point Cloud | Segmented Point Cloud | Accuracy |
---|---|---|---|---|---|---|
Chair | | | 0.95375 | | | 0.44375 |
Chair | | | 0.9125 | | | 0.2675 |
Chair | | | 0.94375 | | | 0.32125 |
Chair | | | 0.8075 | | | 0.32875 |
Chair | | | 0.89625 | | | 0.32875 |
The interesting thing we notice from this experiment is that the segmentation network implicitly learns to segment sections of the chair based on the heights (z-values, perhaps) of the points. This is why the model performs very poorly as soon as we rotate the point cloud. Implementing input transforms (e.g., a learned alignment network like PointNet's T-Net, or rotation augmentation during training) would likely help make the model robust to rotations.
During evaluation, for each point cloud, we change the number of points sampled per point cloud from the default 800 to 100, and then make a prediction (cls/seg).
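A minimal sketch of such random downsampling (the function name `downsample` and the default `n_out` are illustrative):

```python
import torch

def downsample(points, n_out=100):
    # points: (N, 3). Keep a uniformly random subset of n_out points.
    idx = torch.randperm(points.shape[0], device=points.device)[:n_out]
    return points[idx]
```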
Class | Point Cloud | Predicted Class | Downsampled Point Cloud (N = 80) | Predicted Class |
---|---|---|---|---|
Chair | | Chair | | Chair |
Chair | | Chair | | Chair |
Vase | | Lamp | | Lamp |
Lamp | | Lamp | | Lamp |
Vase | | Vase | | Lamp |
Vase | | Vase | | Lamp |
Accuracy: 0.7576
For the classification task, the downsampled model does not perform as poorly; it fails only on objects whose shapes are genuinely ambiguous, such as the vases above.
Class | Point Cloud | Segmented Point Cloud | Accuracy | Downsampled Point Cloud (N = 80) | Segmented Point Cloud | Accuracy |
---|---|---|---|---|---|---|
Chair | | | 0.38625 | | | 0.30 |
Chair | | | 0.9925 | | | 0.975 |
Chair | | | 0.96375 | | | 0.8375 |
Chair | | | 0.9775 | | | 0.775 |
Chair | | | 0.92375 | | | 0.7875 |
Accuracy: 0.7075
We see that upon downsampling, the decrease in the model's performance is not as large as before. This is because in most of the point clouds, the majority of points are concentrated in the seat, so when points are randomly sampled, they tend to come from the seat. The horizontal-slab features the model has learned therefore still work well, resulting in a high accuracy.
Processing each batch: For the PointNet++ implementation, we start by randomly sampling some points in our point cloud. These points are sampled from a uniform distribution for now (more intelligent sampling, such as the farthest point sampling used in the original PointNet++, could be substituted). We then use `knn_gather` to find the k-nearest neighbors of those sampled points in the parent point cloud. Next, we find the centroid of each of those clusters and compute the x, y, and z offsets of each point in a cluster from its centroid. The current implementation uses these offsets as locality features. This process constructs one set abstraction.
We perform two such set abstractions before feeding the obtained features into our original PointNet architecture, as sketched below.
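For concreteness, here is a minimal sketch of one set abstraction under the assumptions above, using pytorch3d's `knn_points`/`knn_gather`; `n_centers` and `k` are illustrative hyperparameters, not the exact values from our runs.

```python
import torch
from pytorch3d.ops import knn_points, knn_gather

def set_abstraction(points, n_centers=128, k=32):
    # points: (B, N, 3). Uniformly sample n_centers query points
    # (farthest point sampling would be the more principled choice).
    B, N, _ = points.shape
    idx = torch.randint(0, N, (B, n_centers), device=points.device)
    centers = torch.gather(points, 1, idx.unsqueeze(-1).expand(-1, -1, 3))  # (B, n_centers, 3)
    # Group each sampled point with its k nearest neighbors in the parent cloud.
    _, nn_idx, _ = knn_points(centers, points, K=k)
    neighbors = knn_gather(points, nn_idx)             # (B, n_centers, k, 3)
    # Locality features: per-point offsets from each cluster's centroid.
    centroid = neighbors.mean(dim=2, keepdim=True)     # (B, n_centers, 1, 3)
    local_feats = neighbors - centroid                 # (B, n_centers, k, 3)
    return centers, local_feats
```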
Using this architecture, we are able to push our classification accuracy to 0.9884.
Below are visualizations of some renderings from PointNet and PointNet++ for the classification task.
Class | Point Cloud | PointNet Predicted Class | Point Cloud | PointNet++ Predicted Class |
---|---|---|---|---|
Chair | | Chair | | Chair |
Chair | | Chair | | Chair |
Lamp | | Lamp | | Lamp |
Lamp | | Lamp | | Lamp |
Shown below is the overall result of prediction by both PointNet and PointNet++, corresponding to the above visualizations.
To implement a Dynamic Graph CNN (DGCNN) for point cloud classification, we first need an edge convolution operator. Taking inspiration from the paper, we implement it as follows:
```python
from pytorch3d.ops import knn_points

def edge_conv(self, x, k):
    # input : (B, 3, N)
    x = torch.transpose(x, 1, 2)              # (B, N, 3)
    dists, idxs, nn = knn_points(x, x, K=k)   # idxs: (B, N, K)
    # Batch indices for advanced indexing; no flattened-index offset is
    # needed, since each batch element is indexed separately.
    batch_idxs = torch.arange(x.shape[0], device=x.device).view(-1, 1, 1)
    batch_idxs = batch_idxs.repeat(1, x.shape[1], k)   # (B, N, K)
    feature = x[batch_idxs, idxs, :]                   # (B, N, K, 3)
    # Repeat each point K times so it pairs with each of its neighbors.
    x = x.unsqueeze(2).repeat(1, 1, k, 1)              # (B, N, K, 3)
    # Edge feature: concatenate (neighbor - point) with the point itself.
    out = torch.cat((feature - x, x), dim=3).permute(0, 3, 1, 2)
    # output : (B, 6, N, K)
    return out
```
As we can see, this edge convolution operator uses the k-nearest-neighbor approach to learn information about each point's neighbors in a 'graph'. Since a point cloud has no explicit connectivity (unlike a mesh), these edges are constructed on the fly and their features learned through this convolution. We insert this convolution between each layer of the PointNet and otherwise use the vanilla PointNet architecture; a sketch of how the resulting edge features can be consumed follows below.
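For illustration, the (B, 6, N, K) edge features can be fed to a shared pointwise MLP and max-pooled over the K neighbors, in line with the DGCNN paper's design; `EdgeConvBlock` and its channel sizes are hypothetical names, not the exact layers in our codebase.

```python
import torch
import torch.nn as nn

class EdgeConvBlock(nn.Module):
    def __init__(self, in_ch=6, out_ch=64):
        super().__init__()
        # A 1x1 convolution acts as a shared MLP over every (point, neighbor) pair.
        self.mlp = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
        )

    def forward(self, edge_feats):
        # edge_feats: (B, 6, N, K) from edge_conv.
        # Max-pool over the K neighbors to get per-point features (B, out_ch, N).
        return self.mlp(edge_feats).max(dim=-1).values
```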
The implementation for this can be found in the codebase, but since I ran into CUDA out-of-memory errors as soon as the batch size exceeded 1, I could not train this model in the given time. The training pipeline itself works, though.