Bring your own models. Inference stays inside.

Limit Platform is model-agnostic. It serves customer-selected open models, customer-trained models, vendor-supplied models, or hybrids of these. Inference runs on customer hardware. Weights, prompts, and outputs never leave the perimeter.

01 Overview

The platform is model-agnostic. Customers choose.

There is no single right model for every regulated AI workload. Some need open foundation models; some need customer-fine-tuned models; some need small specialist models per use case; some need ensembles. Limit Runtime serves all of them without locking the customer into a particular vendor or format.

What Limit Runtime owns is the inference loop itself: serving, governance, observability, evidence. The model is a pluggable component. The customer decides which model runs on their hardware, against their data, for their use case.

02 Capabilities

Serving, governance, evidence.

Multi-format model serving

Limit Runtime is compatible with common open formats (transformer-family weights, GGUF, ONNX) and with vLLM-compatible serving backends. Hybrid deployments combine multiple model backends behind a unified inference interface.

Inference-level governance

Every inference call passes through identity, authorization, sensitive-data handling, and policy enforcement before it reaches a model. Outputs are governed before they reach downstream consumers.
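The ordering described above can be sketched as a chain of checks that runs before the model is called, followed by a pass over the output. This is a minimal illustration only, not Limit Runtime's API: `Request`, `Denied`, `ALLOWED_ROLES`, `MAX_PROMPT_CHARS`, the SSN pattern, and `governed_call` are all hypothetical names chosen for the example.

```python
import re
from dataclasses import dataclass, replace
from typing import Callable, Optional


class Denied(Exception):
    """Raised when a request fails a governance check."""


@dataclass(frozen=True)
class Request:
    user: Optional[str]  # None means the caller is unauthenticated
    role: str
    prompt: str


# Hypothetical policy values, assumed for illustration only.
ALLOWED_ROLES = {"analyst", "auditor"}
MAX_PROMPT_CHARS = 4000
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def check_identity(req: Request) -> Request:
    if not req.user:
        raise Denied("unauthenticated caller")
    return req


def check_authorization(req: Request) -> Request:
    if req.role not in ALLOWED_ROLES:
        raise Denied(f"role {req.role!r} may not call this model")
    return req


def handle_sensitive_data(req: Request) -> Request:
    # Redact before the prompt reaches the model.
    return replace(req, prompt=SSN_PATTERN.sub("[REDACTED]", req.prompt))


def enforce_policy(req: Request) -> Request:
    if len(req.prompt) > MAX_PROMPT_CHARS:
        raise Denied("prompt exceeds policy limit")
    return req


PRE_CHECKS = [check_identity, check_authorization, handle_sensitive_data, enforce_policy]


def governed_call(req: Request, model: Callable[[str], str]) -> str:
    """Run every pre-check in order, call the model, then govern the output."""
    for check in PRE_CHECKS:
        req = check(req)
    output = model(req.prompt)
    return SSN_PATTERN.sub("[REDACTED]", output)
```

In this sketch a failed check raises before the model is ever invoked, so a denied request never produces a completion.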

Per-inference evidence

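Every model invocation lands in the evidence store with the prompt the model saw, the model version, the inference parameters, the output, and the downstream action. Reproducibility is bounded by the determinism of the underlying model: a recorded call can be replayed exactly only to the extent that the model and its sampling settings are deterministic.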
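One way to picture a per-inference record is a small immutable structure holding the five fields listed above, plus a content digest so later tampering is detectable. The schema, the `EvidenceRecord` name, and the list-backed store are assumptions made for this sketch; the actual evidence store format is not described here.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class EvidenceRecord:
    prompt: str                 # the prompt the model actually saw
    model_version: str
    parameters: Dict[str, Any]  # inference parameters, e.g. temperature
    output: str
    downstream_action: str

    def to_json(self) -> str:
        # Sorted keys give a stable serialization for hashing.
        return json.dumps(asdict(self), sort_keys=True)

    def digest(self) -> str:
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


def append_record(store: List[dict], record: EvidenceRecord) -> dict:
    """Append the record and its digest to a hypothetical in-memory store."""
    entry = {"digest": record.digest(), "record": json.loads(record.to_json())}
    store.append(entry)
    return entry
```

Two records with identical fields hash to the same digest, while changing any field, such as the temperature, changes it. The digest covers what was recorded; it does not make a non-deterministic model reproducible.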

Hybrid and ensemble support

Customers route workloads to different models based on use case, sensitivity, latency, or cost. The platform unifies the inference interface; the routing policy lives in customer-controlled configuration.
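As a rough illustration of routing that lives in configuration, the sketch below evaluates an ordered list of rules against a request's attributes and sends the prompt to the first matching backend through one `infer` function. The rule format, the backend names, and the stub backends are hypothetical and stand in for whatever the customer configures.

```python
from typing import Callable, Dict, List

# Hypothetical customer-controlled routing policy: first matching rule wins.
ROUTING_POLICY: List[dict] = [
    {"when": {"sensitivity": "high"}, "backend": "local-finetuned"},
    {"when": {"use_case": "summarization"}, "backend": "small-specialist"},
    {"when": {}, "backend": "open-foundation"},  # empty condition = default
]

# Stub backends; each real backend would wrap a different model server.
BACKENDS: Dict[str, Callable[[str], str]] = {
    "local-finetuned": lambda prompt: f"[local-finetuned] {prompt}",
    "small-specialist": lambda prompt: f"[small-specialist] {prompt}",
    "open-foundation": lambda prompt: f"[open-foundation] {prompt}",
}


def select_backend(attrs: Dict[str, str], policy: List[dict] = ROUTING_POLICY) -> str:
    """Return the backend name of the first rule whose conditions all match."""
    for rule in policy:
        if all(attrs.get(key) == value for key, value in rule["when"].items()):
            return rule["backend"]
    raise LookupError("no routing rule matched")


def infer(prompt: str, attrs: Dict[str, str]) -> str:
    """Single inference entry point; the policy decides which backend runs."""
    return BACKENDS[select_backend(attrs)](prompt)
```

Because rules are ordered, a high-sensitivity summarization request in this example goes to the fine-tuned local model rather than the specialist. Changing that behavior means editing the policy, not the calling code.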

03 Inside the perimeter

Weights and inference stay on customer hardware.

Model serving deploys inside the customer perimeter on customer compute. Model weights are stored in the customer-controlled storage layer. Inference happens locally; no prompt, completion, or model parameter leaves the perimeter.

For customers operating in air-gapped environments, the inference loop functions fully without any external dependency. Model updates are introduced through customer-controlled artifact pipelines.

Model formats and backends: Transformer-family weights, GGUF, ONNX, and vLLM-compatible serving. Customer choice of backend.
Where weights live: In customer-controlled storage. Encrypted with customer keys. Never copied off the perimeter.
Where inference runs: On customer compute, alongside the data. No off-perimeter call required.
Hybrid routing: Customer-controlled routing across multiple model backends per use case.

Part of Limit Platform

This capability is one layer of the operating system. Every application uses it. Every customer perimeter that runs Limit Platform gets it. See the full architecture →

Related capabilities

Compute and jobs →
Sensitive-data handling →

Start a conversation

See it in your environment.

Walk us through the systems already in place. We'll show you how this layer fits.

Get in touch → All platform capabilities →
