Description
As the processing demands of modern high-energy physics (HEP) experiments grow, traditional monitoring tools struggle in high-throughput production environments: they are cumbersome to deploy and offer only coarse-grained observability. JobLens is a lightweight, one-click-deployable data collector designed to deliver fine-grained, job-level observability for HEP workloads. Its architecture centers on three core innovations: (1) eBPF-based kernel instrumentation that enables near-zero-overhead, dynamic tracing of process lifecycles and system calls without kernel modifications; (2) a highly configurable plugin architecture with asynchronous, double-buffered pipelines that export metrics to diverse backends (Elasticsearch, Prometheus, Kafka) while keeping average CPU overhead under 5%; and (3) a Lua-scripted rule engine that dynamically registers monitoring policies to automatically detect and track specific job categories in HTCondor-managed HEP clusters. This script-driven automation removes the need for manual per-job configuration: operators define custom matching rules (by experiment, user group, or resource template) that are evaluated at runtime to instantiate per-job collectors. Design analysis and preliminary benchmarks indicate support for more than 200 concurrent jobs on a single worker node, with a target 99th-percentile collection latency below one second. Comprehensive validation at production scale across HEP experiment workflows is currently underway.
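As a rough illustration of how rule matching and a double-buffered export path might fit together, the following minimal sketch may help. It is not the JobLens implementation: the real rule engine is Lua-scripted, and every name here (`DoubleBuffer`, `matches`, `export`, the job attributes, and the sample fields) is a hypothetical stand-in written in Python. The sketch assumes a single exporter that finishes with one batch before requesting the next.

```python
import threading


class DoubleBuffer:
    """Two preallocated buffers: collectors fill the active one while
    a single exporter drains the other (hypothetical sketch)."""

    def __init__(self):
        self._buffers = ([], [])
        self._active = 0
        self._lock = threading.Lock()

    def push(self, sample):
        # Called by collectors; the lock is held only for the append.
        with self._lock:
            self._buffers[self._active].append(sample)

    def swap(self):
        # Called by the exporter: flip buffers and return the filled one.
        # The buffer that becomes active held the previous batch, which
        # the exporter is assumed to have finished with, so it is cleared.
        with self._lock:
            full = self._buffers[self._active]
            self._active ^= 1
            self._buffers[self._active].clear()
        return full


def matches(rule, job):
    """A rule is a dict of required job attributes; all must match."""
    return all(job.get(key) == value for key, value in rule.items())


def export(batch, sink):
    """Stand-in for a backend plugin: copies the batch into `sink`."""
    sink.extend(batch)


# Hypothetical rules and jobs, keyed by experiment or user group.
rules = [{"experiment": "exp_a"}, {"user_group": "prod"}]
jobs = [
    {"id": 1, "experiment": "exp_a", "user_group": "dev"},
    {"id": 2, "experiment": "exp_b", "user_group": "dev"},
    {"id": 3, "experiment": "exp_b", "user_group": "prod"},
]

buf = DoubleBuffer()
for job in jobs:
    # Only jobs matched by at least one rule get a sample recorded.
    if any(matches(rule, job) for rule in rules):
        buf.push({"job": job["id"], "cpu_ms": 0})

sink = []
export(buf.swap(), sink)
```

In this sketch, collectors never wait on a slow backend because they only contend for a short lock around `push`, and the exporter works on the swapped-out buffer without holding the lock. An asynchronous variant would run the `swap` and `export` calls on a separate thread at a fixed interval.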