Yulong Xing / China Academy of Railway Sciences Corporation Limited
Leilei Zhang / Moutai Institute
Yunfei Shao / University of Science and Technology Beijing
Jianping Xuan / Huazhong University of Science and Technology
Intelligent Operation and Maintenance (IO&M) of high-end equipment faces challenges such as scarce annotated data, weak generalization, and insufficient integration of domain knowledge. To address these issues, we propose an Industrial Large Model (ILM) that unifies time-frequency visual representations of sensor signals with textual maintenance knowledge within a decoder-based large language model framework. ILM uses a Vision Transformer as its visual encoder to extract semantic embeddings from spectrograms generated from the sensor signals; these embeddings are then projected and merged into a prompt template together with the domain text. Through prompt engineering, ILM fuses multimodal information to achieve flexible and explainable fault diagnosis without complex multi-stage training. Experiments on benchmark datasets and real industrial datasets show that ILM achieves higher accuracy, robustness, and reasoning ability than baseline methods, highlighting its potential as a next-generation intelligent IO&M solution.
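The data flow described above (signal, spectrogram, visual embeddings, projection, insertion into a prompt) can be illustrated with a minimal NumPy sketch. It is not the ILM implementation: the function names, the patch size, the embedding widths, and the random matrices `W_enc` and `W_proj` are all assumptions made for illustration. In particular, a single `tanh` linear layer stands in for the Vision Transformer encoder, and random arrays stand in for the language model's text-token embeddings.

```python
import numpy as np

def spectrogram(signal, n_fft=64, hop=32):
    """Log-magnitude STFT spectrogram, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)
    return np.log1p(mag).T

def patchify(image, patch=8):
    """Split a 2-D array into non-overlapping, flattened patches (ViT-style)."""
    h, w = image.shape
    h, w = h - h % patch, w - w % patch  # crop to a multiple of the patch size
    tiles = image[:h, :w].reshape(h // patch, patch, w // patch, patch)
    return tiles.swapaxes(1, 2).reshape(-1, patch * patch)

def build_prompt_sequence(signal, text_before, text_after, W_enc, W_proj):
    """Place projected visual tokens between two runs of text embeddings."""
    patches = patchify(spectrogram(signal))   # (n_patches, 64)
    visual = np.tanh(patches @ W_enc)         # stand-in for the visual encoder
    visual_tokens = visual @ W_proj           # projection to the LLM width
    return np.concatenate([text_before, visual_tokens, text_after], axis=0)

# Hypothetical usage with synthetic data (all sizes are illustrative).
rng = np.random.default_rng(0)
t = np.arange(1024) / 1024.0
signal = np.sin(2 * np.pi * 50 * t) + 0.1 * rng.standard_normal(1024)

d_vis, d_llm = 32, 16
W_enc = rng.standard_normal((64, d_vis)) * 0.1
W_proj = rng.standard_normal((d_vis, d_llm)) * 0.1
text_before = rng.standard_normal((5, d_llm))  # e.g. instruction tokens
text_after = rng.standard_normal((3, d_llm))   # e.g. domain-knowledge tokens

seq = build_prompt_sequence(signal, text_before, text_after, W_enc, W_proj)
print(seq.shape)
```

In this sketch the visual tokens occupy a fixed slot in the prompt template, so the decoder sees one sequence that mixes both modalities. Where that slot sits and how many tokens it holds are illustrative choices here, not details taken from the paper.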