# Offloading Metric
The offloading metric is a customizable infrastructure metric that the Beamlit controller watches to trigger model offloading when the metric crosses a configured threshold.
Currently, Beamlit supports two ways to retrieve metrics:
- metrics from a self-managed Prometheus server, evaluated through a PromQL query.
- metrics from the Kubernetes metrics-server.
## Overview
The offloading metric is configured via the following parameters in the ModelDeployment custom resource:
```yaml
spec:
  # ...
  offloadingConfig:
    remoteBackend:
      host: my-model-on-another-cluster:80
      scheme: http
    behavior:
      percentage: 50
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50
```
where:

- `remoteBackend` is the reference to the remote cluster/backend that your traffic will be offloaded to
- `behavior` is the percentage of requests to offload to the remote backend when the offloading metric reaches its threshold
- `metrics` is the offloading metric, based on which the controller decides whether to trigger traffic offloading
## Set up metric using Prometheus
### Prerequisites
- A Prometheus server, either running in your Kubernetes cluster or reachable in your network via URL without authentication.
- The Beamlit controller set up to monitor your Prometheus service, by adding the corresponding configuration in the controller's `values.yaml`:
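A rough sketch of that configuration is shown below; the key names are assumptions for illustration, so check the Beamlit controller chart's `values.yaml` reference for the exact structure. The idea is to point the controller at your Prometheus server's URL:

```yaml
# Hypothetical sketch: key names are assumptions, not the chart's exact schema.
config:
  metricInformer:
    prometheus:
      # In-cluster service URL of your Prometheus server (illustrative)
      address: http://prometheus-server.monitoring.svc.cluster.local:9090
```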
### Metric overview
The Beamlit controller can use any metric stored in the Prometheus database, or any computation over such metrics expressed in PromQL.
Metrics are specified as an External metric (using the Kubernetes MetricSpec format). For example:
```yaml
offloadingConfig:
  remoteBackend:
    host: my-model-on-another-cluster:80
    scheme: http
  behavior:
    percentage: 50
  metrics:
    - type: External
      external:
        metric:
          name: my_custom_metric_in_prom # Metric name, or PromQL query
          selector:
            # Any label you want to match in your metric.
            # This only works for a single metric (not a PromQL query).
            matchLabels:
              my_label: "value"
        target:
          type: Value
          value: 10
```
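Since the metric `name` field also accepts a PromQL query, you can offload based on a computed value such as a request rate. A minimal sketch, assuming a hypothetical `http_requests_total` counter exposed by your model (the metric and label names are illustrative):

```yaml
offloadingConfig:
  # remoteBackend and behavior omitted for brevity (same as above)
  metrics:
    - type: External
      external:
        metric:
          # A PromQL expression instead of a plain metric name.
          # No selector here: selectors only apply to single metrics, not queries.
          name: sum(rate(http_requests_total{job="my-model"}[1m]))
        target:
          type: Value
          value: 100
```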
## Set up metric using Kubernetes metrics-server
### Prerequisites
- The Kubernetes metrics-server must be installed on your cluster.
### Metric overview
The Beamlit controller can use any metric that is compatible with the Kubernetes HorizontalPodAutoscaler (HPA) and accessible through metrics-server.
Metrics are defined using the HPA format for Resource, Pods, Object, and External metrics. For example, with Resource metrics:
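```yaml
offloadingConfig:
  # remoteBackend and behavior omitted for brevity (same as in the Overview)
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```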
This will trigger offloading when average CPU utilization exceeds 50%.