Bring your own models. Inference stays inside.
Limit Platform is model-agnostic: it serves customer-selected open models, customer-trained models, vendor-supplied models, or hybrids. Inference runs on customer hardware. Weights, prompts, and outputs never leave the perimeter.
The platform is model-agnostic. Customers choose.
There is no single right model for every regulated AI workload. Some need open foundation models; some need customer-fine-tuned models; some need small specialist models per use case; some need ensembles. Limit Runtime serves all of them without locking the customer into a particular vendor or format.
What Limit Runtime owns is the inference loop itself: serving, governance, observability, evidence. The model is a pluggable component. The customer decides which model runs on their hardware, against their data, for their use case.
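As a rough illustration of what "pluggable" can mean here, the sketch below puts two stand-in backends behind one interface. The names (`ModelBackend`, `EchoBackend`, `UppercaseBackend`, `InferenceLoop`) are hypothetical and are not Limit Runtime APIs; the backends are toy functions rather than real models.

```python
from dataclasses import dataclass
from typing import Protocol


class ModelBackend(Protocol):
    """Hypothetical interface: any backend exposing generate() can be plugged in."""

    name: str

    def generate(self, prompt: str) -> str: ...


@dataclass
class EchoBackend:
    """Toy stand-in for a model: returns the prompt unchanged."""

    name: str = "echo"

    def generate(self, prompt: str) -> str:
        return prompt


@dataclass
class UppercaseBackend:
    """Second toy stand-in: returns the prompt in upper case."""

    name: str = "upper"

    def generate(self, prompt: str) -> str:
        return prompt.upper()


class InferenceLoop:
    """The loop owns the call path; the backend is a swappable component."""

    def __init__(self, backend: ModelBackend) -> None:
        self.backend = backend

    def infer(self, prompt: str) -> dict:
        output = self.backend.generate(prompt)
        return {"backend": self.backend.name, "output": output}


loop = InferenceLoop(EchoBackend())
first = loop.infer("hello")
loop.backend = UppercaseBackend()  # swap the model without changing the loop
second = loop.infer("hello")
```

The point of the design is that the calling code sees the same `infer` call regardless of which backend sits behind it.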
Serving, governance, evidence.
- 01
Multi-format model serving
Compatibility with common open formats (transformer-family weights, GGUF, ONNX) and with vLLM-compatible serving. Hybrid deployments combine multiple model backends behind a unified inference interface.
- 02
Inference-level governance
Every inference call passes through identity, authorization, sensitive-data handling, and policy enforcement before it reaches a model. Outputs are governed before they reach downstream consumers.
- 03
Per-inference evidence
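Every model invocation lands in the evidence store with the prompt the model saw, the model version, the inference parameters, the output, and the downstream action. Reproducibility is bounded by the determinism of the underlying model: a replay with the same inputs and parameters matches the recorded output only to the extent that the model itself is deterministic.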
- 04
Hybrid and ensemble support
Customers route workloads to different models based on use case, sensitivity, latency, or cost. The platform unifies the inference interface; the routing policy lives in customer-controlled configuration.
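The governance and evidence steps described above can be sketched as a single call path. This is a minimal sketch under stated assumptions: the function names, the role table, the e-mail redaction rule, the `toy_model` placeholder, and the in-memory `EVIDENCE_STORE` list are all hypothetical, not the platform's actual interfaces.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone

EVIDENCE_STORE: list = []  # hypothetical in-memory stand-in for an evidence store
ALLOWED_ROLES = {"analyst", "reviewer"}  # assumed authorization table
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # assumed sensitive-data rule


@dataclass
class EvidenceRecord:
    caller: str
    prompt_seen_by_model: str
    model_version: str
    parameters: dict
    output: str
    downstream_action: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def toy_model(prompt: str, temperature: float) -> str:
    """Placeholder for a real model call."""
    return f"summary of: {prompt}"


def governed_infer(caller: str, role: str, prompt: str) -> str:
    # 1. Identity and authorization are checked before the model is touched.
    if role not in ALLOWED_ROLES:
        raise PermissionError(f"{caller} is not authorized for inference")
    # 2. Sensitive-data handling: redact before the prompt reaches the model.
    safe_prompt = EMAIL.sub("[REDACTED_EMAIL]", prompt)
    # 3. Model invocation with explicit, recorded parameters.
    params = {"temperature": 0.0}
    output = toy_model(safe_prompt, **params)
    # 4. Output governance before downstream consumers see the result.
    governed_output = EMAIL.sub("[REDACTED_EMAIL]", output)
    # 5. Evidence: record what the model actually saw and produced.
    EVIDENCE_STORE.append(
        EvidenceRecord(caller, safe_prompt, "toy-model-v1", params,
                       governed_output, "returned_to_caller")
    )
    return governed_output


result = governed_infer("alice", "analyst", "Contact bob@example.com about Q3")
```

Note that the record stores the redacted prompt, since that is the text the model saw, rather than the caller's raw input.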
Weights and inference stay on customer hardware.
Model serving deploys inside the customer perimeter on customer compute. Model weights are stored in the customer-controlled storage layer. Inference happens locally; no prompt, completion, or model parameter leaves the perimeter.
For customers operating in air-gapped environments, the inference loop functions fully without any external dependency. Model updates are introduced through customer-controlled artifact pipelines.
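One plausible step in a customer-controlled artifact pipeline is checking a model file against an approved digest before it is admitted, which needs no network access. The sketch below is an assumption about how such a check might look, not a description of the platform's pipeline; `sha256_of` and `admit_artifact` are hypothetical helper names, and the "weights" are placeholder bytes.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def admit_artifact(path: Path, expected_sha256: str) -> bool:
    """Admit the artifact only if its digest matches the approved value."""
    return sha256_of(path) == expected_sha256


with tempfile.TemporaryDirectory() as workdir:
    artifact = Path(workdir) / "model.bin"
    artifact.write_bytes(b"placeholder weights")
    approved_digest = hashlib.sha256(b"placeholder weights").hexdigest()
    accepted = admit_artifact(artifact, approved_digest)
    rejected = admit_artifact(artifact, "0" * 64)  # wrong digest is refused
```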
- Model formats: Transformer-family weights, GGUF, and ONNX, plus vLLM-compatible serving. Customer choice of backend.
- Where weights live: In customer-controlled storage. Encrypted with customer keys. Never copied off the perimeter.
- Where inference runs: On customer compute, alongside the data. No off-perimeter call required.
- Hybrid routing: Customer-controlled routing across multiple model backends per use case.
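Customer-controlled routing can be pictured as a small rule table evaluated in order. The sketch below assumes a hypothetical configuration shape (`ROUTING_POLICY`) and made-up backend names; the actual policy format is not specified in this text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    use_case: str
    sensitivity: str  # e.g. "high" or "low"
    max_latency_ms: int


# Hypothetical customer-controlled policy: the first matching rule wins.
ROUTING_POLICY = [
    {"when": {"sensitivity": "high"}, "backend": "local-finetuned-model"},
    {"when": {"use_case": "classification"}, "backend": "small-specialist-onnx"},
    {"when": {"max_latency_ms_below": 200}, "backend": "quantized-gguf-model"},
]
DEFAULT_BACKEND = "general-open-model"


def matches(conditions: dict, workload: Workload) -> bool:
    """Return True only if every condition in the rule holds for the workload."""
    for key, expected in conditions.items():
        if key == "max_latency_ms_below":
            if not workload.max_latency_ms < expected:
                return False
        elif getattr(workload, key) != expected:
            return False
    return True


def route(workload: Workload) -> str:
    """Pick a backend name for the workload, falling back to the default."""
    for rule in ROUTING_POLICY:
        if matches(rule["when"], workload):
            return rule["backend"]
    return DEFAULT_BACKEND
```

Because the rules are plain data, changing where a workload goes means editing configuration rather than application code.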
This capability is one layer of the operating system. Every application uses it. Every customer perimeter that runs Limit Platform gets it. See the full architecture →
See it in your environment.
Walk us through the systems already in place. We'll show you how this layer fits.