# Offloading Metric
The offloading metric is a customizable infrastructure metric that the Beamlit controller watches to trigger model offloading when the metric crosses a configured threshold.
Currently, Beamlit supports two ways to retrieve metrics:
- metrics from a self-managed Prometheus server, evaluated through a PromQL query.
- metrics from the Kubernetes metrics-server.
## Overview
The offloading metric is configured via the following parameters in the ModelDeployment custom resource:
```yaml
spec:
  # ...
  offloadingConfig:
    remoteBackend:
      host: my-model-on-another-cluster:80
      scheme: http
    behavior:
      percentage: 50
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50
```
where:

- `remoteBackend` is the reference to the remote cluster/backend that your traffic will be offloaded to
- `behavior` is the percentage of requests to offload to the remote backend when the offloading metric reaches its threshold
- `metrics` is the offloading metric, based on which the controller decides whether to trigger traffic offloading
## Set up metric using Prometheus
### Prerequisites
- A Prometheus server, either running in your Kubernetes cluster or reachable in your network via URL without authentication.
- The Beamlit controller set up to monitor your Prometheus service, by adding the corresponding configuration in the controller's `values.yaml`:
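A rough sketch of that configuration is shown below; the key names are assumptions for illustration, so check the Beamlit controller chart's `values.yaml` reference for the exact structure. The idea is to point the controller at your Prometheus server's URL:

```yaml
# Hypothetical sketch: key names are assumptions, not the chart's exact schema.
config:
  metricInformer:
    prometheus:
      # In-cluster service URL of your Prometheus server (illustrative)
      address: http://prometheus-server.monitoring.svc.cluster.local:9090
```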
### Metric overview
The Beamlit controller can use any metric stored in the Prometheus database, or any computation over such metrics expressed in PromQL.
Metrics are specified as an External metric (using the Kubernetes MetricSpec format). For example:
```yaml
offloadingConfig:
  remoteBackend:
    host: my-model-on-another-cluster:80
    scheme: http
  behavior:
    percentage: 50
  metrics:
    - type: External
      external:
        metric:
          name: my_custom_metric_in_prom # Metric name, or PromQL query
          selector:
            # Any label you want to match in your metric.
            # This only works for a single metric (not a PromQL query).
            matchLabels:
              my_label: "value"
        target:
          type: Value
          value: 10
```
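Since the metric `name` field also accepts a PromQL query, you can offload based on a computed value such as a request rate. A minimal sketch, assuming a hypothetical `http_requests_total` counter exposed by your model (the metric and label names are illustrative):

```yaml
offloadingConfig:
  # remoteBackend and behavior omitted for brevity (same as above)
  metrics:
    - type: External
      external:
        metric:
          # A PromQL expression instead of a plain metric name.
          # No selector here: selectors only apply to single metrics, not queries.
          name: sum(rate(http_requests_total{job="my-model"}[1m]))
        target:
          type: Value
          value: 100
```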
## Set up metric using Kubernetes metrics-server
### Prerequisites
- The Kubernetes metrics-server must be installed on your cluster.
### Metric overview
The Beamlit controller can use any metric that is compatible with the Kubernetes HorizontalPodAutoscaler (HPA) and accessible through metrics-server.
Metrics are defined using the HPA format for Resource, Pods, Object, and External metrics. For example, with Resource metrics:
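```yaml
offloadingConfig:
  # remoteBackend and behavior omitted for brevity (same as in the Overview)
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```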
This will trigger offloading when average CPU utilization exceeds 50%.