Skip to content

Offload your model to any destination

One key thing with Beamlit Controller, is that you can use it without any Beamlit subscription. Basically, if you'are already running models replicas spread accross multiple clusters, Beamlit Controller can help you manage traffic and offload your models.

In the following tutorial will show you how to setup this architecture:

flowchart TD
subgraph s1["Kubernetes A"]
        n1["Llama 3"]
        n2["API"]
        n3["Beamlit Gateway"]
  end
subgraph s2["Kubernetes B"]
        n4["Llama 3"]
  end
    n2 -- Call Llama3 --> n3
    n3 --> n1
    n3 -- Oflload --> n4

Requirements

  • Two Kubernetes clusters
  • Helm (version 3.8.0 or later is recommended).
  • Beamlit controller installed on the first cluster (See Getting Started for that)

Let's dive in!

Lets assume you have a model deployment in your Kubernetes cluster. For testing purposes, this is a simple PHP-Apache deployment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache

Deploy this on the second cluster too, but make it reachable from the first cluster.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    run: php-apache

Now, you want to offload your deployment on the first cluster. Just create a ModelDeployment resource.

apiVersion: deployment.beamlit.com/v1alpha1
kind: ModelDeployment
metadata:
  name: my-model
spec:
  model: my-model
  environment: production
  modelSourceRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
    namespace: default
  serviceRef:
    name: php-apache
    namespace: default
    targetPort: 80
  offloadingConfig:
    remoteBackend:
      host: my-model-on-another-cluster:80
      scheme: http
    behavior:
      percentage: 50
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50

Further reading

  • Check the CRD definition for more details on the ModelDeployment resource and availables fields in spec.offloadingConfig.remoteBackend.