AI
Our LLMaaS Gateway (Large Language Models as a Service) provides high-performance access to a curated selection of current open-weight language models. Inference runs entirely on our Swiss-hosted GPU infrastructure — your prompts, embeddings, and generated responses never leave Switzerland.
Available Models
Currently available in production via the gateway:
Top Models
- MiniMax-M2.7
- Deepseek v3.2
- Qwen3.6-35B-A3B
- Gemma4
Other available Models
- apertus-70b
- apertus-8b
- bge-reranker
- deepseekr1-670b
- gpt-oss-120b
- kimi-k2
- llama4-maverick
- qwen3-vl-235b
- qwen3-embedding-4b
- qwen3-reranker-4b
- voxtral-4b-tts-2603
- whisper-large-v3-turbo
More top-tier models are in the evaluation phase and will be added soon. All models are addressed using the same provider/model format (e.g. ew/minimax27), so switching models is typically a one-line change.
OpenAI-Compatible API
The gateway exposes an OpenAI-compatible REST interface — existing code using the OpenAI SDK (Python, Node, Go, …) can be pointed at our endpoints with no changes to application logic:
POST /v1/chat/completions— chat and reasoning requests, including streaming and tool callingPOST /v1/embeddings— vector embeddings for RAG, semantic search, classificationPOST /v1/rerank— re-ranking of search results for higher hit qualityGET /v1/models— list of all currently available models
→ Full interface specification in the API Reference.
Virtual Keys & Governance
The gateway supports virtual keys (prefix sk-bf-...) for fine-grained access control, model routing, and per-team / per-project / per-use-case usage tracking. Self-service management of virtual keys will soon be available in the Cloud Service Portal — until then, keys are issued on request through our support team.
Typical Use Cases
- RAG pipelines — document search with embeddings + rerank, context-aware answer generation
- Code assistance — internal developer tooling, code review, and refactoring suggestions
- Classification & extraction — structured data extraction from emails, reports, tickets
- Agents & automation — tool-calling-enabled workflows with controlled write access
- Multilingual content — translation and localisation with a focus on German-speaking markets
Early Adopter Access
Would you like to evaluate LLMaaS now for internal pilot projects? The gateway is currently being opened gradually to selected early adopters.
Request access
Contact our support team to receive credentials, an API key, and tailored model recommendations for your use case.