RaP-ProtoViT: Efficient Dual-Head Transformers for Robust Gastric Endoscopy Classification and Generalizable Clinical Deployment
Submission No.: 164
Access: conference attendees only
Updated: 2025-12-23 13:29:05
Abstract
We introduce RaP-ProtoViT, an end-to-end dual-head vision transformer for 8-class gastrointestinal (GI) endoscopy image classification on Kvasir-v2. A margin head (ArcFace/AM-Softmax) enforces angular separation between classes, while a prototype head aggregates the top-k token–prototype similarities over M trainable prototypes per class; a lightweight input-adaptive MLP fuses the two heads' outputs. A leakage-aware data pipeline (pHash deduplication followed by GroupKFold splitting) prevents near-duplicate bleed-over between folds. Training uses AdamW with sharpness-aware minimization (SAM), cosine warm-up, DropPath, label smoothing, and SWA, followed by post-hoc temperature scaling; a two-stage hyperparameter optimization (HPO) scheme (MOTPE+ASHA → qEHVI) selects operating points under Latency@224 ≤ 200 ms and memory constraints. On Kvasir-v2 the model attains 99.1% accuracy, Macro-F1 = 0.991, Macro-AUPRC = 0.997, AUROC = 0.998, and expected calibration error (ECE) ≈ 0.9%, with per-class F1 tightly clustered in 0.988–0.994 and stable folds (±0.2 pp accuracy, ±0.002 Macro-F1). Ablations show that margin-only and prototype-only variants reduce Macro-F1 to 0.967 and 0.975 and raise ECE to 2.8% and 2.2%, respectively; removing adaptive fusion drops Macro-F1 to 0.984. The proposed HPO converges 2–3× faster and yields better final Macro-F1, AUPRC, and ECE than Bayesian TPE or Random+ASHA. The prototype head provides localized, intrinsically interpretable evidence that complements the margin head's discriminative power, all within a single-model deployment footprint. By advancing robust, interpretable, and computationally efficient AI for gastric endoscopy, our approach can improve early detection of gastrointestinal disease and enable reliable clinical deployment across diverse healthcare settings.
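The dual-head design described above is concrete enough to sketch. Below is a minimal PyTorch sketch, based solely on our reading of the abstract, of a margin head, a top-k prototype head, and an input-adaptive fusion gate. All module names, the use of the [CLS] feature as the gate input, and the hyperparameter values (s, m, M = 5, k = 4, hidden width) are our assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MarginHead(nn.Module):
    """ArcFace-style head: scaled cosine logits with an additive angular
    margin applied to the target class during training (sketch)."""
    def __init__(self, dim, num_classes, s=30.0, m=0.3):  # s, m: assumed values
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, dim))
        self.s, self.m = s, m

    def forward(self, feat, labels=None):
        # Cosine similarity between normalized features and class weights.
        cos = F.linear(F.normalize(feat), F.normalize(self.weight))  # [B, C]
        if labels is None:               # inference: plain scaled cosines
            return self.s * cos
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        margin_cos = torch.cos(theta + self.m)        # margin on target class
        onehot = F.one_hot(labels, cos.size(1)).float()
        return self.s * (onehot * margin_cos + (1 - onehot) * cos)


class PrototypeHead(nn.Module):
    """M trainable prototypes per class; each prototype scores an image by
    the mean of its top-k cosine similarities to the patch tokens, and the
    class evidence is the sum over that class's prototypes (sketch)."""
    def __init__(self, dim, num_classes, m_protos=5, top_k=4):
        super().__init__()
        self.protos = nn.Parameter(torch.randn(num_classes, m_protos, dim))
        self.top_k = top_k

    def forward(self, tokens):                          # tokens: [B, N, D]
        t = F.normalize(tokens, dim=-1)
        p = F.normalize(self.protos, dim=-1)            # [C, M, D]
        sim = torch.einsum("bnd,cmd->bcmn", t, p)       # [B, C, M, N]
        topk = sim.topk(self.top_k, dim=-1).values      # [B, C, M, k]
        return topk.mean(dim=-1).sum(dim=-1)            # [B, C]


class AdaptiveFusion(nn.Module):
    """Lightweight input-adaptive MLP: predicts per-image mixing weights
    for the two heads' logits from the [CLS] feature (sketch)."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(),
            nn.Linear(hidden, 2), nn.Softmax(dim=-1))

    def forward(self, cls_feat, margin_logits, proto_logits):
        w = self.gate(cls_feat)                         # [B, 2] per-image weights
        return w[:, :1] * margin_logits + w[:, 1:] * proto_logits
```

Because the prototype head scores each class from specific high-similarity tokens, the top-k indices in `sim` can be mapped back to image patches, which is what makes this head's evidence localizable.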
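The leakage-aware pipeline can likewise be made concrete. A minimal sketch, assuming the `imagehash` and scikit-learn packages and exact pHash matching as the grouping rule; the abstract does not specify the actual near-duplicate criterion (e.g., a Hamming-distance threshold would also merge near, not just exact, duplicates).

```python
import imagehash                        # assumed dependency
from PIL import Image
from sklearn.model_selection import GroupKFold


def phash_groups(paths, hash_size=8):
    """Map each image to a duplicate-group id via its perceptual hash,
    so that duplicates never straddle a train/validation boundary."""
    seen, groups = {}, []
    for p in paths:
        h = str(imagehash.phash(Image.open(p), hash_size=hash_size))
        groups.append(seen.setdefault(h, len(seen)))
    return groups


# Usage sketch: GroupKFold keeps each duplicate group inside one fold.
# groups = phash_groups(image_paths)
# for train_idx, val_idx in GroupKFold(n_splits=5).split(image_paths,
#                                                        labels, groups):
#     ...
```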
Keywords
Endoscopy classification, vision transformer, prototype learning, hyperparameter optimization.
Authors
Khosro Rezaee
Meybod University
Mohamadreza Khosravi
Shiraz University of Medical Sciences
Ali Rachini
Holy Spirit University of Kaslik
Zakaria Che Muda
INTI International University (INTI-IU)