ICRA 2026 · Vienna, Austria

Contrastive Learning on 3D Point Clouds for Robotic Geometric Defect Detection

Alexander Tarvo¹, Yusen Wan¹, Xu Chen¹

¹MACS Lab, University of Washington

Anomaly localization by COSARAD on the Real3D-AD dataset. Color intensity denotes point-wise anomaly probability; ground-truth masks are shown in light blue with defects in red.

Abstract

Robotic quality inspection is emerging as a key enabler in intelligent manufacturing, allowing robots to transcend human limitations in endurance, consistency, and access to complex structures. Most existing approaches emphasize 2D image-based surface defect detection, yet they often overlook geometric defects, which are more prevalent and challenging in industrial inspection.

We formulate geometric defect detection as anomaly detection in 3D point clouds and propose a framework that integrates contrastive learning with spatially aware comparisons of local geometries. We partition point-cloud surfaces into patches and use contrastive learning to train a neural feature extractor that captures rich geometric representations. An anomaly detection algorithm then identifies defects by comparing patch-level features in a spatially consistent manner. On the Real3D-AD benchmark, our method reaches a mean object-level AUROC of 0.901, establishing a new state of the art and demonstrating the potential of robotic inspection to detect subtle geometric anomalies.

Method Overview

The COSARAD pipeline. Training: a contrastive extractor learns discriminative patch features from labeled point clouds. Inference: a test object is compared against templates via spatially aware, location-specific memory banks.

Our framework operates in two stages. During training, we construct triplets of surface patches from labeled point clouds and learn a feature extractor to distinguish anomalous patches from defect-free ones. During inference, a test object is registered against several defect-free templates in a common coordinate frame, and each test patch is scored only against reference patches drawn from the same location.

Contrastive feature extractor

A PointNet++ backbone, trained with a triplet loss, learns patch representations in which a "good" patch sits close to another good patch from the same location and far from an anomalous one. Because our triplets are spatially fixed, standard mining schemes do not apply; instead, we use a memory-efficient EMA-based miner with yield control, together with strong geometric augmentation, to keep training informative and robust to object orientation.
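The triplet objective described above can be illustrated with a minimal sketch. It assumes the standard margin-based triplet loss with Euclidean distance; the margin value, the function names, and the toy 2-D features are illustrative assumptions rather than details from the paper, and the EMA-based miner is not modeled here.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    anchor   : feature of a good patch
    positive : feature of another good patch from the same location
    negative : feature of an anomalous patch
    margin   : hypothetical value, not taken from the paper
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)

# Toy 2-D features standing in for learned patch embeddings.
good_a = [0.0, 0.0]
good_b = [0.3, 0.4]   # distance 0.5 from good_a
defect = [3.0, 4.0]   # distance 5.0 from good_a

# Well-separated triplet: the loss is zero, so it carries no training signal.
easy = triplet_loss(good_a, good_b, defect)

# Roles swapped to mimic a badly embedded triplet: the loss is large.
hard = triplet_loss(good_a, defect, good_b)
```

Triplets such as `easy` contribute nothing to the gradient, which is why some form of mining for informative triplets is usually needed alongside this loss.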

Spatially aware patch comparison

Instead of storing every template patch in a single global memory bank, COSARAD maintains many small, location-specific banks and compares each test patch only against template patches from the same location. This spatially aware comparison prevents a class of false negatives: a part feature that is perfectly valid in one place would otherwise go undetected when it appears in the wrong location. Comparing against multiple templates further makes the method robust to the manufacturing tolerances and measurement noise present in real point clouds.
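A small sketch can make the difference between a global bank and location-specific banks concrete. The location ids, the toy features, and the nearest-neighbor distance used as a stand-in score are all assumptions for illustration; the actual scoring is described in the next section.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_banks(templates):
    """Group template patch features by location id (one small bank per location)."""
    banks = defaultdict(list)
    for template in templates:
        for location, feature in template.items():
            banks[location].append(feature)
    return dict(banks)

def nearest_distance(feature, references):
    """Distance from a feature to its closest reference feature."""
    return min(euclidean(feature, ref) for ref in references)

def local_scores(test, banks):
    """Score each test patch only against template patches from the same location."""
    return {loc: nearest_distance(f, banks[loc]) for loc, f in test.items()}

def global_scores(test, banks):
    """Score each test patch against every template patch, ignoring location."""
    everything = [ref for bank in banks.values() for ref in bank]
    return {loc: nearest_distance(f, everything) for loc, f in test.items()}

# Two defect-free templates; "top" and "side" are hypothetical location ids.
templates = [
    {"top": [1.0, 0.0], "side": [0.0, 1.0]},
    {"top": [0.9, 0.1], "side": [0.1, 0.9]},
]
banks = build_banks(templates)

# The test object carries the valid "top" geometry at the "side" location.
test_object = {"top": [1.0, 0.0], "side": [1.0, 0.0]}

by_location = local_scores(test_object, banks)
by_global = global_scores(test_object, banks)
```

In this toy case the global bank gives the misplaced patch a distance of zero because the same geometry exists elsewhere on the templates, while the location-specific bank assigns it a large distance.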

Statistical anomaly scoring

For each patch, the anomaly score is Cohen's effect size between the distribution of test-to-template distances and the distribution of template-to-template distances, which calibrates each score by the natural geometric variability around that location. Per-patch scores are then aggregated into an object score and diffused into a dense per-point anomaly map for precise localization.
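As a hedged illustration of the per-patch score, the sketch below computes an effect size between the two distance samples. It assumes the common pooled-standard-deviation form of Cohen's d and Euclidean feature distances; the paper may use a different variant, and the aggregation and diffusion steps are not modeled. All names and toy features are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, variance

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cohens_d(sample_a, sample_b):
    """Cohen's d with pooled standard deviation (assumed variant).

    d = (mean_a - mean_b) / sqrt(((n_a-1)*var_a + (n_b-1)*var_b) / (n_a+n_b-2))
    Each sample needs at least two values, and the pooled variance must be non-zero.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled_var)

def patch_score(test_feature, template_features):
    """Effect size of test-to-template distances versus template-to-template distances."""
    test_to_template = [euclidean(test_feature, t) for t in template_features]
    template_to_template = [euclidean(a, b)
                            for a, b in combinations(template_features, 2)]
    return cohens_d(test_to_template, template_to_template)

# Features of the same location taken from three defect-free templates (toy values).
template_features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2]]

normal_score = patch_score([0.05, 0.05], template_features)  # within template spread
defect_score = patch_score([1.0, 1.0], template_features)    # far outside the spread
```

Because the difference in means is divided by the pooled spread, the same absolute deviation yields a lower score at locations where the templates themselves vary a lot, which is the calibration effect described above.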

Experimental Results

COSARAD sets a new state of the art on the Real3D-AD and Anomaly-ShapeNet benchmarks. On Real3D-AD, it reaches a mean object-level AUROC of 90.1% (vs. 82.9% for the strongest prior method, PointCore) and a mean point-level AUROC of 95.1%. On Anomaly-ShapeNet, it reaches 91.3% object-level and 96.1% point-level AUROC. We evaluate with 10-fold cross-validation over disjoint object classes, so the extractor never sees the class it is tested on. The model trains on a single consumer GPU (RTX 5090, 32 GB).
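For readers less familiar with the metric, the sketch below shows one standard way to compute AUROC from anomaly scores and binary labels, using its pairwise-ranking definition. The labels and scores are made-up toy values, not results from the paper, and the function name is hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random anomalous sample outscores
    a random normal one, with ties counted as half a win.

    labels : 1 for anomalous, 0 for normal
    scores : anomaly scores, higher means more anomalous
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy object-level example: three normal objects and two defective ones.
labels = [0, 0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.30]
value = auroc(labels, scores)  # 4 of the 6 (defective, normal) pairs are ranked correctly
```

The same function applies at the point level by passing per-point labels and per-point anomaly scores instead of per-object ones.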

Per-object accuracy on Real3D-AD, as object-level / point-level AUROC (%). "–" denotes a value not reported.

Object     Reg3D-AD    PO3AD     Simple3D    MC3D-AD     PointCore   COSARAD (ours)
Airplane   71.6/63.1   80.4/–    76.5/88.1   85.0/62.8   66.0/60.8   95.6/98.1
Candybar   82.7/72.2   78.5/–    85.1/96.2   77.8/73.6   97.6/76.0   100.0/98.6
Car        69.7/71.8   65.4/–    98.1/99.2   74.9/81.9   86.6/70.6   92.2/98.0
Chicken    85.2/67.6   68.6/–    82.6/86.1   71.5/64.0   84.1/78.0   92.1/90.2
Diamond    90.0/83.5   80.1/–    100/99.0    95.5/94.2   96.3/81.0   100.0/97.9
Duck       58.4/50.3   82.0/–    78.7/96.6   83.1/82.2   68.4/71.2   98.0/98.1
Fish       91.9/85.2   85.9/–    91.2/99.6   91.2/90.6   99.2/78.2   97.6/96.6
Gemstone   41.7/54.5   69.3/–    70.4/97.3   56.0/45.8   53.4/51.5   97.6/97.0
Seahorse   76.2/81.7   75.6/–    93.0/94.2   90.1/95.0   97.3/84.1   63.4/91.7
Shell      35.8/81.1   80.0/–    85.1/97.6   51.5/47.1   86.1/78.1   77.8/89.8
Starfish   50.6/71.7   75.8/–    69.5/85.8   76.6/69.0   65.2/73.6   82.3/94.1
Toffees    68.5/75.9   77.1/–    88.9/96.8   78.3/93.4   92.9/74.5   84.7/91.4
Mean       70.4/70.5   76.7/–    80.4/92.3   78.2/76.8   82.9/73.1   90.1/95.1

Box plots of object- and point-level AUROC on Anomaly-ShapeNet, comparing COSARAD against the recent baselines PO3AD, Simple3D, and MC3D-AD.

Ablations confirm that both innovations (the contrastive feature extractor and the spatially aware patch comparison) matter: replacing the learned features with handcrafted FPFH descriptors drops the Real3D-AD object-level AUROC from 90.1% to 70.6%, and replacing the location-specific banks with a single global PatchCore bank drops it to 81.3%.

Data

The data used to train and evaluate COSARAD is available on Hugging Face: huggingface.co/datasets/alextarvo/cosarad. It builds on the public Real3D-AD and Anomaly-ShapeNet benchmarks.

BibTeX

@inproceedings{tarvo2026contrastive,
  title={Contrastive Learning on 3D Point Clouds for Robotic Geometric Defect Detection},
  author={Tarvo, Alexander and Wan, Yusen and Chen, Xu},
  booktitle={2026 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2026},
  address={Vienna, Austria},
  organization={IEEE}
}