Description
High-energy physics experiments routinely perform petabyte-scale file transfers across distributed grid sites while simultaneously streaming data for interactive analysis, making traffic-type differentiation critical for network orchestration, bandwidth forecasting, and responsiveness to operational demands. We present a machine learning–based traffic classification system that requires no payload inspection and operates directly on raw packet headers. At its core is the Workflow Identification Window (WIW) abstraction, which groups packets from multiple flows into short temporal sequences, preserving timing gaps and directionality. These sequences are fed into deep neural models such as CNNs and LSTMs, removing the need for manual feature engineering and allowing automatic discovery of discriminative patterns. Using traffic collected between Fermilab and U.S. storage sites as our use case, our system achieves over 94% accuracy in controlled tests and maintains 84% accuracy on previously unseen traces, demonstrating that tightly spaced bursts in early flow phases provide stable classification signals.
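To make the WIW abstraction concrete, the sketch below groups header-level packet records from multiple flows into fixed-duration windows and converts each window into a sequence of (inter-arrival gap, direction, size) triples, the kind of input a CNN or LSTM could consume. All names and parameters (`Packet`, `build_wiw`, the 0.5 s window, the sequence length of 32) are illustrative assumptions, not the system's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Packet:
    ts: float       # arrival timestamp in seconds
    flow_id: int    # e.g. a hash of the 5-tuple (assumed)
    direction: int  # +1 = outbound, -1 = inbound
    size: int       # header-reported packet length in bytes

def build_wiw(packets: List[Packet],
              window_s: float = 0.5,
              seq_len: int = 32) -> List[List[Tuple[float, int, int]]]:
    """Group packets from all flows into fixed-duration windows.

    Each window becomes a sequence of (inter-arrival gap, direction,
    size) triples, padded or truncated to seq_len, preserving the
    timing gaps and directionality the abstract describes.
    """
    packets = sorted(packets, key=lambda p: p.ts)
    windows: List[List[Packet]] = []
    if not packets:
        return []
    start = packets[0].ts
    current: List[Packet] = []
    for p in packets:
        if p.ts - start >= window_s:   # close the current window
            windows.append(current)
            current = []
            start = p.ts
        current.append(p)
    if current:
        windows.append(current)

    sequences = []
    for w in windows:
        seq = []
        prev_ts = w[0].ts
        for p in w[:seq_len]:          # truncate long windows
            seq.append((p.ts - prev_ts, p.direction, p.size))
            prev_ts = p.ts
        seq += [(0.0, 0, 0)] * (seq_len - len(seq))  # pad short ones
        sequences.append(seq)
    return sequences
```

Each fixed-length sequence can then be stacked into a tensor and fed to a sequence model without any hand-crafted flow statistics, which is the point of the abstraction.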
For HEP operations, this capability brings multiple benefits: first, it enables predictive bandwidth allocation; second, it supports early traffic shaping for bulk transfers; third, it ensures responsive prioritization of streaming sessions; and fourth, it lays the foundation for self-driving network services in which workflows dynamically trigger adaptive QoS and routing. We use the CMS experiment at the LHC as a use case, concentrating on U.S. sites connected via ESnet, whose High-Touch service provides the packet-level visibility needed to capture the fine-grained timing and flow patterns on which our classification approach relies. For these CMS sites, classification is the first step toward more efficient data distribution and improved analysis responsiveness, and the approach is potentially applicable to other HEP projects and beyond. In the long term, it aims to improve HEP-wide data movement performance by reducing transfer delays, balancing bandwidth utilization across distributed sites, and providing real-time workflow visibility to guide experiment planning and resource optimization.