Description
The research and education community relies on robust networks to access the vast amounts of data generated by its scientific experiments. The underlying infrastructure connects a few hundred sites across the world, which require reliable and efficient transfers of ever-larger datasets. These activities demand proactive network management, in which potentially severe issues are predicted and circumvented before they affect data exchanges. Our ongoing research focuses on leveraging both machine learning (ML) and deep learning (DL) methods to identify patterns associated with network anomalies, predict key performance metrics, and explore the interconnectivity of paths across the networks.
We explore a diverse set of ML/DL models, spanning strategies suited to time-series analysis, anomaly detection, and predictive modeling, and we continually adjust and refine our techniques. The goal is to detect subtle indicators of network instability or degradation that could disrupt scientific workflows. Furthermore, we seek to localize problems to specific clusters, routers, or router-to-router links. This capability could serve not only to inform site administrators of current network health but also to guide upgrades and resource allocation in future network planning.
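The localization goal can be illustrated with a minimal counting sketch. It assumes, hypothetically, that each monitored path is available as an ordered list of router hops (for example from traceroute) and that some upstream detector has already flagged which paths are anomalous. The function name `score_links`, the input format, and the example data are illustrative only and are not the method used in this work. Each directed router-to-router link is scored by the fraction of the paths crossing it that were flagged.

```python
from collections import defaultdict

def score_links(paths, anomalous):
    """Rank directed router-to-router links by the share of anomalous paths using them.

    paths: dict of path id -> ordered list of router names (hypothetical input format)
    anomalous: set of path ids flagged by some upstream detector (assumed given)
    """
    total = defaultdict(int)  # number of paths traversing each link
    bad = defaultdict(int)    # number of anomalous paths traversing each link
    for pid, hops in paths.items():
        # consecutive hops form links; the set counts each link once per path
        for link in set(zip(hops, hops[1:])):
            total[link] += 1
            if pid in anomalous:
                bad[link] += 1
    scores = {link: bad[link] / total[link] for link in total}
    # highest anomalous fraction first; ties broken by more anomalous paths
    return sorted(scores.items(), key=lambda kv: (-kv[1], -bad[kv[0]]))

# Hypothetical topology: three paths through a shared router "B"
paths = {
    "p1": ["A", "B", "C"],
    "p2": ["A", "B", "D"],
    "p3": ["E", "B", "C"],
}
ranked = score_links(paths, anomalous={"p1", "p3"})
print(ranked[0])  # prints (('B', 'C'), 1.0)
```

Links carried by many degraded paths and few healthy ones rise to the top; here `("B", "C")` is used by both flagged paths and by no healthy one. This heuristic ignores per-hop measurements and routing changes, which a practical system would need to account for.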
In this presentation we will share our experiments with candidate ML/DL techniques, including ensemble learning and unsupervised models that may capture the complexities inherent in network data. We will also discuss some of the many challenges we have encountered, the model architectures we selected, and the results we achieved.
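As a concrete, deliberately simple illustration of combining unsupervised models, the sketch below runs a voting ensemble of three detectors over a throughput time series: a trailing z-score, a robust median/MAD score, and a relative-drop rule. The detectors, their thresholds, the window size, and the sample data are all assumptions chosen for illustration, not the models used in this work. A point is flagged only when at least two detectors agree.

```python
from statistics import mean, median, pstdev

# Illustrative detectors and thresholds (assumptions, not the authors' models).

def zscore_flag(x, window, k=3.0):
    """Flag x if it lies more than k standard deviations from the window mean."""
    mu, sd = mean(window), pstdev(window)
    return x != mu if sd == 0 else abs(x - mu) / sd > k

def mad_flag(x, window, k=3.5):
    """Robust variant: distance from the median in units of scaled MAD."""
    med = median(window)
    mad = median([abs(v - med) for v in window])
    # 1.4826 makes MAD comparable to the standard deviation for normal data
    return x != med if mad == 0 else abs(x - med) / (1.4826 * mad) > k

def drop_flag(x, window, frac=0.5):
    """Degradation rule: x fell below (1 - frac) times the recent median."""
    return x < (1 - frac) * median(window)

DETECTORS = (zscore_flag, mad_flag, drop_flag)

def ensemble_flags(series, w=10, min_votes=2):
    """Return one boolean per point; True when at least min_votes detectors fire."""
    flags = []
    for i, x in enumerate(series):
        if i < w:
            flags.append(False)  # not enough history for a trailing window
            continue
        window = series[i - w:i]  # trailing window excludes the current point
        votes = sum(detect(x, window) for detect in DETECTORS)
        flags.append(votes >= min_votes)
    return flags

# Hypothetical throughput samples (Gb/s) with one sharp drop at index 11
throughput = [9.8, 10.1, 9.9, 10.0, 10.2, 9.7, 10.0, 10.1, 9.9, 10.0,
              10.0, 3.1, 9.9]
flags = ensemble_flags(throughput)
print([i for i, f in enumerate(flags) if f])  # prints [11]
```

Requiring agreement tends to suppress alarms raised by only one detector; learned models could take the place of these hand-written rules within the same voting scheme.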