ICRA 2026 · Vienna, Austria

Contrastive Learning on 3D Point Clouds for Robotic Geometric Defect Detection

¹MACS Lab, University of Washington

Anomaly localization by COSARAD on the Real3D-AD dataset. Heatmaps show point-wise anomaly probability as color intensity and sit above the ground-truth masks, which are shown in light blue with defects in red.

Abstract

Robotic quality inspection is emerging as a key enabler in intelligent manufacturing, allowing robots to transcend human limitations in endurance, consistency, and access to complex structures. Most existing approaches emphasize 2D image-based surface defect detection, yet they often overlook geometric defects, which are more prevalent and challenging in industrial inspection.

We formulate geometric defect detection as anomaly detection in 3D point clouds and propose a framework that integrates contrastive learning with spatially aware comparisons of local geometries. We partition point-cloud surfaces into patches and use contrastive learning to train a neural feature extractor that captures rich geometric representations. An anomaly detection algorithm then identifies defects by comparing patch-level features in a spatially consistent manner. On the Real3D-AD benchmark, our method reaches a mean object-level AUROC of 90.1%, establishing a new state of the art and demonstrating the potential of robotic inspection to detect subtle geometric anomalies.

Method Overview

Our framework operates in two stages. During training, we construct triplets of surface patches from labeled point clouds and learn a feature extractor to distinguish anomalous patches from defect-free ones. During inference, a test object is registered against several defect-free templates in a common coordinate frame, and each test patch is scored only against reference patches drawn from the same location.

Contrastive feature extractor

A PointNet++ backbone, trained with a triplet loss, learns patch representations in which a "good" patch sits close to another good patch from the same location and far from an anomalous one. Because our triplets are spatially fixed, standard mining schemes do not apply; instead, we use a memory-efficient miner based on an exponential moving average (EMA) with yield control, together with strong geometric augmentation, to keep training informative and robust to object orientation.
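As a minimal sketch of the triplet objective described above, the snippet below evaluates a standard margin-based triplet loss on plain Python lists. The Euclidean distance, the margin value, and all function and variable names are illustrative assumptions; the text states only that a triplet loss is used, not its exact form.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss: max(0, d(a, p) - d(a, n) + margin).

    anchor, positive: features of defect-free patches from the same location.
    negative: feature of an anomalous patch from that location.
    margin: hypothetical value chosen for illustration only.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy 2-D features (hypothetical numbers).
anchor = [0.0, 0.0]
positive = [0.0, 0.1]
easy_negative = [3.0, 4.0]  # far from the anchor, so the triplet is already satisfied
hard_negative = [0.0, 0.5]  # close to the anchor, so the triplet still violates the margin

loss_easy = triplet_loss(anchor, positive, easy_negative)
loss_hard = triplet_loss(anchor, positive, hard_negative)
print(loss_easy, loss_hard)
```

The easy triplet contributes zero loss while the hard one does not, which illustrates why a miner is useful: only triplets that still violate the margin carry a training signal.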

Spatially aware patch comparison

Instead of storing every template patch in a single global memory bank, COSARAD maintains many small, location-specific banks and compares each test patch only against template patches from the same location. This spatially aware comparison prevents a class of false negatives: a part feature that is perfectly valid in one place would otherwise go undetected when it appears in the wrong location. Comparing against multiple templates also makes the method robust to the manufacturing tolerances and measurement noise present in real point clouds.
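The difference between a global bank and location-specific banks can be illustrated with a small sketch. It assumes each registered template is represented as a mapping from a location identifier to one feature vector; the identifiers "rim" and "base", the feature values, and the function names are hypothetical and not taken from the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_location_banks(templates):
    """Group template features by location: one small bank per location."""
    banks = {}
    for template in templates:
        for location_id, feature in template.items():
            banks.setdefault(location_id, []).append(feature)
    return banks

def local_distances(banks, location_id, test_feature):
    """Distances to template patches from the same location only."""
    return [euclidean(test_feature, f) for f in banks.get(location_id, [])]

def global_distances(banks, test_feature):
    """Distances to every template patch, ignoring location (global-bank baseline)."""
    return [euclidean(test_feature, f) for bank in banks.values() for f in bank]

# Two defect-free templates, each with a "rim" patch and a "base" patch.
templates = [
    {"rim": [1.0, 0.0], "base": [0.0, 1.0]},
    {"rim": [0.9, 0.1], "base": [0.1, 0.9]},
]
banks = build_location_banks(templates)

# A test patch found at the base that looks exactly like a valid rim patch.
misplaced = [1.0, 0.0]
global_min = min(global_distances(banks, misplaced))
local_min = min(local_distances(banks, "base", misplaced))
print(global_min, local_min)
```

In this toy case the global bank finds a perfect match (distance 0) and would miss the defect, while the location-specific bank reports a large distance because no base patch looks like a rim.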

Statistical anomaly scoring

For each patch, the anomaly score is Cohen's effect size between the test-to-template distance distribution and the template-to-template distances, calibrating each score by the natural geometric variability around that location. Per-patch scores are aggregated into an object score and diffused into a dense per-point anomaly map for precise localization.
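A minimal sketch of such an effect-size score follows, assuming the common pooled-standard-deviation form of Cohen's d. The exact variant used by COSARAD is not specified here, and the small `eps` stabilizer, the function name, and the sample distances are illustrative assumptions.

```python
import math
from statistics import mean, variance

def cohens_d(test_to_template, template_to_template, eps=1e-12):
    """Cohen's d between two distance samples, using the pooled sample variance.

    test_to_template: distances from the test patch to each template patch.
    template_to_template: pairwise distances among template patches at that location.
    Each sample needs at least two values; eps (an assumption) avoids division by zero.
    """
    n1, n2 = len(test_to_template), len(template_to_template)
    pooled_var = (
        (n1 - 1) * variance(test_to_template)
        + (n2 - 1) * variance(template_to_template)
    ) / (n1 + n2 - 2)
    return (mean(test_to_template) - mean(template_to_template)) / math.sqrt(pooled_var + eps)

# Natural variability among templates at one location (hypothetical distances).
template_spread = [0.9, 1.0, 1.1]

score_normal = cohens_d([0.9, 1.0, 1.1], template_spread)  # test patch behaves like a template
score_defect = cohens_d([2.9, 3.0, 3.1], template_spread)  # test patch is far from all templates
print(score_normal, score_defect)
```

Because the difference in means is divided by the local spread, the same absolute distance yields a lower score in a region where templates naturally vary a lot and a higher score where they agree closely.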

Experimental Results

COSARAD sets a new state of the art on the Real3D-AD and Anomaly-ShapeNet benchmarks. On Real3D-AD it reaches a mean object-level AUROC of 90.1% (vs. 82.9% for the strongest prior method, PointCore) and a point-level AUROC of 95.1%. On Anomaly-ShapeNet it reaches 91.3% object-level and 96.1% point-level AUROC. We evaluate with 10-fold cross-validation over disjoint object classes, so the extractor never sees the class it is tested on. The model trains on a single consumer GPU (RTX 5090, 32 GB).
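For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal one. The sketch below computes it by direct pairwise comparison; it is an illustrative reference implementation with hypothetical scores, not the evaluation code behind the reported numbers.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (anomalous, normal) pairs ranked correctly.

    scores: anomaly scores, higher means more anomalous.
    labels: 1 for anomalous, 0 for normal. Ties count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one anomalous and one normal sample")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return correct / (len(positives) * len(negatives))

# Four objects: two normal, two anomalous (hypothetical scores).
scores = [0.10, 0.40, 0.35, 0.80]
labels = [0, 0, 1, 1]
value = auroc(scores, labels)
print(value)
```

Object-level AUROC applies this to one score per object, while point-level AUROC applies it to one score per point against the ground-truth mask.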

Per-object accuracy on Real3D-AD, as object-level / point-level AUROC (%). Best in **bold**, second best underlined; "–" denotes a value not reported.
Object	Reg3D-AD	PO3AD	Simple3D	MC3D-AD	PointCore	COSARAD (ours)
Airplane	71.6/63.1	80.4/–	76.5/88.1	85.0/62.8	66.0/60.8	95.6/98.1
Candybar	82.7/72.2	78.5/–	85.1/96.2	77.8/73.6	97.6/76.0	100.0/98.6
Car	69.7/71.8	65.4/–	98.1/99.2	74.9/81.9	86.6/70.6	92.2/98.0
Chicken	85.2/67.6	68.6/–	82.6/86.1	71.5/64.0	84.1/78.0	92.1/90.2
Diamond	90.0/83.5	80.1/–	100.0/99.0	95.5/94.2	96.3/81.0	100.0/97.9
Duck	58.4/50.3	82.0/–	78.7/96.6	83.1/82.2	68.4/71.2	98.0/98.1
Fish	91.9/85.2	85.9/–	91.2/99.6	91.2/90.6	99.2/78.2	97.6/96.6
Gemstone	41.7/54.5	69.3/–	70.4/97.3	56.0/45.8	53.4/51.5	97.6/97.0
Seahorse	76.2/81.7	75.6/–	93.0/94.2	90.1/95.0	97.3/84.1	63.4/91.7
Shell	35.8/81.1	80.0/–	85.1/97.6	51.5/47.1	86.1/78.1	77.8/89.8
Starfish	50.6/71.7	75.8/–	69.5/85.8	76.6/69.0	65.2/73.6	82.3/94.1
Toffees	68.5/75.9	77.1/–	88.9/96.8	78.3/93.4	92.9/74.5	84.7/91.4
Mean	70.4/70.5	76.7/–	80.4/92.3	78.2/76.8	82.9/73.1	90.1/95.1

Object- and point-level AUROC on Anomaly-ShapeNet, shown as box plots comparing COSARAD against recent baselines (PO3AD, Simple3D, MC3D-AD).

Ablations confirm that both innovations, the contrastive feature extractor and the spatially aware patch comparison, matter: replacing the learned features with handcrafted FPFH descriptors drops the Real3D-AD object-level AUROC from 90.1% to 70.6%, and replacing the location-specific banks with a single global PatchCore bank drops it to 81.3%.

Data

The data used to train and evaluate COSARAD is available on Hugging Face: huggingface.co/datasets/alextarvo/cosarad. It builds on the public Real3D-AD and Anomaly-ShapeNet benchmarks.

BibTeX

@inproceedings{tarvo2026contrastive,
  title={Contrastive Learning on 3D Point Clouds for Robotic Geometric Defect Detection},
  author={Tarvo, Alexander and Yusen, Wan and Xu, Chen},
  booktitle={2026 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2026},
  address={Vienna, Austria},
  organization={IEEE}
}