Data-driven tool condition monitoring (TCM) has gained increasing attention due to its high accuracy and adaptability in complex machining environments. Nevertheless, existing methods are often restricted to one-dimensional signal features or a single form of image encoding, so the information in the raw signals is underused and wear characteristics are weakly represented and incompletely extracted. To address these limitations, this paper proposes a multimodal image encoding strategy and a dual-stream cross-domain fusion network. Specifically, time series are transformed into RGB images via Recurrence Plots (RP), the Short-Time Fourier Transform (STFT), and Gramian Angular Fields (GAF), which capture nonlinear dynamics, time-frequency patterns, and trend information, respectively. A dual-stream architecture integrating EfficientNet and a Gated Recurrent Unit (GRU) is then constructed. Experimental results show that the method achieves high accuracy and versatility, providing a practical solution for real-time TCM.
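
As an illustration of the encoding step, the following minimal sketch maps a one-dimensional signal to a three-channel image using only NumPy. It is not the authors' implementation: the function names, the recurrence threshold `eps_frac`, the window and hop lengths, the nearest-neighbour resizing, and above all the choice to stack RP, STFT, and GAF as the R, G, and B channels of one image are assumptions made here for illustration, since the text does not specify how the three encodings are combined.

```python
import numpy as np

def _minmax(a):
    """Scale an array to [0, 1]; a constant array maps to zeros."""
    a = np.asarray(a, dtype=float)
    span = a.max() - a.min()
    return np.zeros_like(a) if span == 0 else (a - a.min()) / span

def _resize_nearest(img, size):
    """Nearest-neighbour resize of a 2-D array to (size, size)."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[np.ix_(rows, cols)]

def recurrence_plot(x, eps_frac=0.1):
    """Binary recurrence matrix: 1 where |x_i - x_j| is within the threshold.
    eps_frac (fraction of the largest pairwise distance) is an assumed value."""
    d = np.abs(x[:, None] - x[None, :])
    return (d <= eps_frac * d.max()).astype(float)

def stft_magnitude(x, win=32, hop=8):
    """Magnitude spectrogram from Hann-windowed frames, shape (freq, time).
    win and hop are assumed values."""
    window = np.hanning(win)
    starts = range(0, len(x) - win + 1, hop)
    frames = np.stack([x[s:s + win] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def gramian_angular_field(x):
    """Gramian angular summation field: cos(phi_i + phi_j), phi = arccos(x)
    after rescaling x to [-1, 1]."""
    x = 2.0 * _minmax(x) - 1.0
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    return np.cos(phi[:, None] + phi[None, :])

def encode_rgb(x, size=64):
    """Stack RP, log-STFT, and GAF as the three channels of one image
    (assumed layout), each resized to size x size and scaled to [0, 1]."""
    x = np.asarray(x, dtype=float)
    channels = [
        recurrence_plot(x),
        np.log1p(stft_magnitude(x)),  # log compression for dynamic range
        gramian_angular_field(x),
    ]
    return np.stack([_minmax(_resize_nearest(c, size)) for c in channels], axis=-1)

if __name__ == "__main__":
    # Synthetic stand-in for a sensor signal; real TCM data would replace this.
    signal = np.sin(np.linspace(0, 8 * np.pi, 256))
    image = encode_rgb(signal, size=64)
    print(image.shape)
```

The resulting array has shape `(size, size, 3)` with values in [0, 1], which is the layout an image backbone such as EfficientNet typically expects after conversion to a tensor. In this sketch the raw sequence is left untouched, so it could still be passed to a recurrent branch.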