Model roster
Beamdesk routes AI requests across 15 production models from six families: Claude, GPT, Gemini, GLM, Kimi, and MiniMax. Each persona can specify a preferred model family and a fallback chain. Routing balances latency, cost, and capability per request.
Claude
Opus 4.7, Sonnet 4.6, Haiku 4.3 — best for complex reasoning, long context, and strict instruction following.
GPT
GPT-5.5, GPT-4.5, GPT-4o — strong at web search, code generation, and agentic tool chaining.
Gemini
3.1 Pro, 3.1 Flash, 1.5 Pro — excellent at deep research, Google Search grounding, and multimodal tasks.
GLM
GLM-5, GLM-4 — native Chinese support, strong math reasoning, and cost-effective generation.
Kimi
K2.5 Instruct, K2.5 Vision — best at design-to-code, visual QA, and OCR.
MiniMax
M2.7, M2.5 — ultra-fast batch operations, translations, and boilerplate generation.
Routing logic
Default routing uses a tiered fallback chain: Haiku for quick responses, Sonnet for standard requests, and Opus for complex tasks requiring deep reasoning. Personas can override this per-channel or per-intent.
Persona model configuration:
Routing tier: auto // haiku → sonnet → opus
Fallback family: claude
Timeout ms: 30000
Max output tokens: 8192
Per-channel overrides:
email: { tier: 'opus', fallback: 'claude' }
chat_widget: { tier: 'sonnet', fallback: 'gpt' }
voice: { tier: 'sonnet', fallback: 'claude' }
Per-intent routing:
'pricing_question': { model: 'gpt-5.5' }
'technical_debug': { model: 'claude-opus' }
'refund_request': { model: 'claude-opus', consensus: true }
When a model times out or errors, routing falls back to the next model in the chain. If the fallback chain is exhausted, the request fails with a clear error. You can configure timeouts and max output tokens per channel.
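The fallback behaviour described above can be sketched roughly as follows. This is an illustration only, not Beamdesk's routing code: `ChainEntry`, `withTimeout`, and `routeWithFallback` are hypothetical names, and each model call is stood in for by a plain async function.

```typescript
// Hypothetical types for illustration; none of these names come from the Beamdesk SDK.
type ModelCall = (prompt: string) => Promise<string>;

interface ChainEntry {
  name: string; // e.g. 'haiku', 'sonnet', 'opus'
  call: ModelCall;
}

// Reject if the given promise does not settle within timeoutMs.
async function withTimeout<T>(promise: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    // Always clear the timer so it does not keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Try each model in order and return the first successful answer.
async function routeWithFallback(
  chain: ChainEntry[],
  prompt: string,
  timeoutMs = 30000, // mirrors "Timeout ms: 30000" in the config above
): Promise<{ model: string; output: string }> {
  const failures: string[] = [];
  for (const entry of chain) {
    try {
      const output = await withTimeout(entry.call(prompt), timeoutMs);
      return { model: entry.name, output };
    } catch (err) {
      // A timeout or error moves routing on to the next model in the chain.
      failures.push(`${entry.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  // Chain exhausted: fail with an error that lists every attempt.
  throw new Error(`All models in the fallback chain failed (${failures.join('; ')})`);
}
```

Collecting the per-model failure messages is a design choice in this sketch. It is one way to make the final error "clear" when every model in the chain has failed.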
Per-persona preferences
Each persona can specify a model family preference, a routing tier, and a consensus mode. This lets you use fast models for low-risk queries and expensive models for high-risk decisions.
Persona: Billing
System prompt:
You handle refunds, invoices, and subscription changes.
Always verify policy before approving exceptions.
Model settings:
primary_family: claude
fallback_family: gpt
tier: opus
consensus_mode: auto
Consensus triggers:
- refund_amount >= 50
- policy_exception
- customer_risk_level: high
When consensus is enabled, routing runs:
1. Primary model generates draft
2. Secondary model verifies draft
3. If disagreement: third model breaks tie
consensus_mode: auto enables consensus for high-risk decisions detected by guardrails. Manual mode forces consensus for every request. Disabled mode skips consensus entirely.
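As a rough sketch of how the three modes could decide whether a request goes through consensus, consider the function below. The `RequestSignals` shape and the `needsConsensus` name are hypothetical, and the sketch assumes the Billing persona's triggers are combined with OR, which the text above does not state.

```typescript
type ConsensusMode = 'auto' | 'manual' | 'disabled';

// Hypothetical request signals, mirroring the Billing persona's consensus triggers.
interface RequestSignals {
  refundAmount?: number;
  policyException?: boolean;
  customerRiskLevel?: 'low' | 'medium' | 'high';
}

function needsConsensus(
  mode: ConsensusMode,
  signals: RequestSignals,
  refundThreshold = 50, // from "refund_amount >= 50"
): boolean {
  if (mode === 'disabled') return false; // skip consensus entirely
  if (mode === 'manual') return true; // force consensus for every request
  // 'auto': assumed here to fire when any single trigger matches.
  return (
    (signals.refundAmount ?? 0) >= refundThreshold ||
    signals.policyException === true ||
    signals.customerRiskLevel === 'high'
  );
}
```

Under these assumptions, a 20-unit refund for a low-risk customer skips consensus in auto mode, while the same request flagged as a policy exception does not.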
Cost transparency
Beamdesk adds zero markup to model costs. You pay the provider rate directly: Claude, GPT, Gemini, GLM, Kimi, and MiniMax are all billed at their published per-token rates. Usage breakdown is available in workspace billing.
Approximate costs per 1K tokens
| Model | Input | Output |
|---|---|---|
| Claude Opus | $0.005 | $0.025 |
| Claude Sonnet | $0.003 | $0.015 |
| GPT-5.5 | $0.003 | $0.015 |
| Gemini 3.1 Pro | $0.001 | $0.006 |
| MiniMax M2.7 | $0.0002 | $0.001 |
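To make the per-1K-token arithmetic concrete, here is a small estimator that uses the approximate rates from the table above. The `estimateCostUsd` helper is hypothetical and is not part of the Beamdesk SDK. Actual charges follow each provider's published rates.

```typescript
// Approximate rates copied from the table above, in USD per 1K tokens.
const RATES_PER_1K: Record<string, { input: number; output: number }> = {
  'Claude Opus': { input: 0.005, output: 0.025 },
  'Claude Sonnet': { input: 0.003, output: 0.015 },
  'GPT-5.5': { input: 0.003, output: 0.015 },
  'Gemini 3.1 Pro': { input: 0.001, output: 0.006 },
  'MiniMax M2.7': { input: 0.0002, output: 0.001 },
};

// Hypothetical helper: estimated cost of one request in USD.
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const rate = RATES_PER_1K[model];
  if (!rate) throw new Error(`No rate listed for model: ${model}`);
  return (inputTokens / 1000) * rate.input + (outputTokens / 1000) * rate.output;
}
```

For example, a Claude Sonnet request with 2,000 input tokens and 500 output tokens comes to 2 × $0.003 + 0.5 × $0.015, or about $0.0135.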
Consensus mode
Consensus mode runs multiple models on the same request and compares outputs. Use consensus for high-stakes decisions: refunds, policy exceptions, security questions, and account changes. The tradeoff is higher cost and latency for higher reliability.
Consensus configuration:
enabled: true
quorum: 2 // 2/3 models must agree
timeout_ms: 60000
Model selection for consensus:
- primary: persona preferred model
- verifier: different family (avoid echo chamber)
- tiebreaker: opus or gpt-5.5
Example refund flow:
1. Draft model: Claude Opus suggests refund
2. Verifier: GPT-5.5 checks against policy
3. Tiebreaker: Gemini searches for similar cases
4. Decision: block (policy violation found)
Result:
- Action: abstain
- Citations: [policy_kb_123, case_456]
- Reason: "Refund exceeds 30-day window without override"
Consensus adds 2-3x cost and latency to the requests it runs on. Use it selectively on high-risk queries identified by guardrails, not for every interaction.
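The `quorum: 2` setting can be read as a simple vote count across the draft, verifier, and tiebreaker models. The sketch below is one possible interpretation, not Beamdesk's implementation. The `Verdict` type and the `tallyConsensus` function are hypothetical, and returning `'abstain'` when no verdict reaches quorum is an assumption.

```typescript
type Verdict = 'approve' | 'block';

// Hypothetical quorum check: a verdict wins once at least `quorum` models agree on it.
// Assumes quorum is more than half the number of voters, so at most one verdict can win.
function tallyConsensus(verdicts: Verdict[], quorum = 2): Verdict | 'abstain' {
  const counts: Record<Verdict, number> = { approve: 0, block: 0 };
  for (const v of verdicts) counts[v] += 1;
  if (counts.approve >= quorum) return 'approve';
  if (counts.block >= quorum) return 'block';
  return 'abstain'; // no verdict reached quorum
}
```

With the refund flow above, the draft model votes `'approve'` while the verifier and tiebreaker vote `'block'`, so `tallyConsensus(['approve', 'block', 'block'])` returns `'block'`.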
SDK configuration
import { Beamdesk } from '@beamdesk/sdk';

const beam = new Beamdesk({
  apiKey: process.env.BEAMDESK_API_KEY!,
  baseUrl: 'https://beamdesk.preview.softblaze.net',
});

// Create persona with model preferences
await beam.personas.create({
  name: 'Support',
  systemPrompt: 'You resolve...',
  modelConfig: {
    primaryFamily: 'claude',
    fallbackFamily: 'gpt',
    tier: 'auto', // haiku → sonnet → opus
    consensusMode: 'auto',
    consensusTriggers: [
      { type: 'refund_amount', threshold: 50 },
      { type: 'policy_exception' },
      { type: 'customer_risk', threshold: 'high' },
    ],
    perChannel: {
      email: { tier: 'opus', fallback: 'claude' },
      chat_widget: { tier: 'sonnet', fallback: 'gpt' },
    },
  },
});