Ran An / Xi’an Jiaotong University; Xi’an; PR China; 710049;School of Mechanical Engineering
Jiafeng Tang / Xi’an Jiaotong University;School of Mechanical Engineering
Zhibin Zhao / Xi’an Jiaotong University;School of Mechanical Engineering
Xuefeng Chen / State Key Laboratory for Manufacturing Systems Engineering Xi’an Jiaotong University
Few-shot anomaly detection (FSAD) aims to identify anomalies using models trained on only a few samples, a task that is particularly challenging in real-world scenarios because of domain shifts caused by variations in lighting conditions, object pose, and other environmental factors. Recently, large pre-trained vision-language models such as CLIP have shown promise for FSAD. However, most existing approaches rely on manually designed prompts to capture anomaly semantics, which are susceptible to environmental interference and labor-intensive to design. To address this, we propose a cross-domain CLIP for anomaly detection (CDADCLIP) that adapts CLIP to FSAD under domain shift. CDADCLIP incorporates domain-invariant learnable prompts into CLIP to model normal and abnormal semantics. Furthermore, a Hybrid Semantic Fusion (HSF) module enhances anomaly detection performance by integrating region-level information with global features. Experimental results on the AeBAD-S dataset with domain shift demonstrate the superior performance of our method compared with existing state-of-the-art methods.
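As a rough illustration of the general idea of scoring anomalies with paired normal and abnormal prompt embeddings, and of combining region-level with global evidence, a minimal sketch is given below. It is not the CDADCLIP or HSF implementation: the function names, the two-way softmax with temperature 0.07, the max-over-patches pooling, and the weighted average controlled by `alpha` are all assumptions chosen for illustration, and plain Python lists stand in for embeddings that would in practice come from the CLIP encoders and the learned prompts.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def abnormal_probability(feature, normal_prompt, abnormal_prompt, temperature=0.07):
    """Two-way softmax over similarities to the normal and abnormal prompts.

    `temperature` is an assumed hyperparameter, not a value from the paper.
    """
    s_normal = cosine(feature, normal_prompt) / temperature
    s_abnormal = cosine(feature, abnormal_prompt) / temperature
    m = max(s_normal, s_abnormal)  # subtract the max for numerical stability
    e_normal = math.exp(s_normal - m)
    e_abnormal = math.exp(s_abnormal - m)
    return e_abnormal / (e_normal + e_abnormal)


def image_anomaly_score(global_feature, patch_features,
                        normal_prompt, abnormal_prompt, alpha=0.5):
    """Combine a global score with a region-level score (hypothetical fusion).

    Returns the fused image-level score and the per-patch scores, which can
    be reshaped into an anomaly map.
    """
    global_score = abnormal_probability(global_feature, normal_prompt, abnormal_prompt)
    patch_scores = [
        abnormal_probability(p, normal_prompt, abnormal_prompt)
        for p in patch_features
    ]
    region_score = max(patch_scores)  # the most anomalous region dominates
    fused = alpha * global_score + (1.0 - alpha) * region_score
    return fused, patch_scores


# Toy usage with 2-D stand-ins for prompt and image embeddings.
normal_prompt = [1.0, 0.0]
abnormal_prompt = [0.0, 1.0]
score, patch_scores = image_anomaly_score(
    global_feature=[1.0, 0.0],
    patch_features=[[1.0, 0.0], [0.0, 1.0]],
    normal_prompt=normal_prompt,
    abnormal_prompt=abnormal_prompt,
)
```

In this toy case the global feature looks normal while one patch aligns with the abnormal prompt, so the fused score sits between the two; this is the kind of situation in which region-level information can reveal a defect that a global feature alone would miss.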