Triton Inference Server is now available at UC AF (https://github.com/triton-inference-server/server/tree/main/deploy/k8s-onprem)
- configured with load balancing and autoscaling (maximum replicas currently set to 3, though we have more than 70 GPUs)
- the model registry is an S3 bucket (currently on the SSL cluster)
- reachable in-cluster via Kubernetes ClusterIP services
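For reference, the autoscaling cap can be raised by editing the HorizontalPodAutoscaler. A minimal sketch of what such an HPA might look like (the deployment name, namespace, and CPU-based metric here are assumptions for illustration; the actual k8s-onprem deployment may scale on a different metric):

```yaml
# Hypothetical HPA sketch: names and the scaling metric are illustrative,
# not taken from the actual deployment.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: triton-inference-server
  namespace: triton
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: triton-inference-server
  minReplicas: 1
  maxReplicas: 3        # current cap mentioned above; raise to use more of the 70+ GPUs
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```

Since the service is exposed via ClusterIPs, clients inside the cluster would address it by its service DNS name (e.g. `<service>.<namespace>.svc.cluster.local`) rather than an external address.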