One key, every frontier model — the fastest way to try them all.
- Trying frontier models from multiple vendors without juggling separate keys
- Teams that want one invoice and one rate-limit budget across providers
- A/B testing the same prompt across Claude, GPT, Gemini, and Grok
Read the Connect OpenRouter guide ›
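As a sketch of the one-key pattern: OpenRouter exposes an OpenAI-compatible chat completions endpoint, so fanning the same prompt out to several vendors is just the same request body with a different model slug. The slugs below are illustrative assumptions, not guaranteed current IDs; check OpenRouter's model list before use.

```python
# Sketch: fan one prompt out to several models through a single OpenRouter key.
# Request shape follows OpenRouter's OpenAI-compatible chat completions API;
# the model slugs are ASSUMED examples and may need updating.
import json

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

MODELS = [
    "anthropic/claude-sonnet-4",  # assumed slug
    "openai/gpt-4o",              # assumed slug
    "google/gemini-2.5-pro",      # assumed slug
    "x-ai/grok-4",                # assumed slug
]

def build_requests(prompt: str, api_key: str) -> list[dict]:
    """One request spec per model -- same prompt, same key, same endpoint."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return [
        {
            "url": OPENROUTER_URL,
            "headers": headers,
            "body": json.dumps({
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            }),
        }
        for model in MODELS
    ]
```

Send each body with any HTTP client (`requests`, `httpx`, `curl`) and diff the responses side by side.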
Claude Opus, Sonnet, and Haiku straight from Anthropic.
- Long-context analysis up to 200K tokens per request
- Complex multi-step reasoning with tools
- Writing tasks that need nuance — ad copy, email sequences, brand voice
Read the Connect Anthropic (Claude) guide ›
Enterprise-grade RAG and retrieval — Command R and Command R+.
- Retrieval-augmented generation with strong citation output
- Enterprise deployments that need a stricter data-handling contract
- Multilingual workloads — Command R is trained on 10+ high-resource languages
Read the Connect Cohere guide ›
Low-cost reasoning per token — strong chain-of-thought at a fraction of Claude's price.
- Bulk reasoning tasks where per-token cost dominates
- Batch analysis of campaign data, logs, or transcripts
- Prototyping agent loops before switching to a pricier model for production
Read the Connect DeepSeek guide ›
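When per-token cost dominates, a back-of-envelope estimate is worth running before picking a model. The prices below are hypothetical placeholders, not quoted rates; substitute the figures from each provider's pricing page.

```python
# Back-of-envelope cost estimate for a batch reasoning job.
# Prices are HYPOTHETICAL placeholders (USD per 1M tokens) -- check the
# providers' pricing pages before relying on these numbers.
PRICE_PER_MTOK = {
    "deepseek-reasoner": {"in": 0.55, "out": 2.19},    # placeholder
    "claude-opus":       {"in": 15.00, "out": 75.00},  # placeholder
}

def batch_cost(model: str, docs: int, in_tok: int, out_tok: int) -> float:
    """Total USD for `docs` documents at `in_tok` input / `out_tok` output tokens each."""
    p = PRICE_PER_MTOK[model]
    return docs * (in_tok * p["in"] + out_tok * p["out"]) / 1_000_000

# e.g. 10k transcripts at ~4k tokens in / ~500 out each:
# compare batch_cost("deepseek-reasoner", 10_000, 4_000, 500)
# against batch_cost("claude-opus", 10_000, 4_000, 500)
```

The same arithmetic also tells you when the gap is small enough that the pricier model's quality is worth it.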
200K context with optional thinking mode — strong tool use at a lower price than Opus.
- Cost-effective 200K-context workloads
- Chain-of-thought reasoning with optional thinking mode
- Strong tool-use performance at a lower price than Claude Opus
Read the Connect GLM (Z.AI) guide ›
1M-token context from Google AI Studio — Gemini 2.5 Pro, Flash, and Flash-Lite.
- 1M-token context — ingesting whole creative libraries, transcripts, or log dumps in one call
- Multimodal inputs (image + text) at large scale
- Cost-effective flash tiers for classification and routing at volume
Read the Connect Google Gemini guide ›
European data residency — Mistral Large, Medium, and Devstral.
- EU-based teams that need data processed within European infrastructure
- Workloads where you may later self-host a matching open-weight model
- Code tasks — Devstral is tuned for developer workflows
Read the Connect Mistral guide ›
GPT-5, GPT-4o, and o4-mini direct from OpenAI.
- Teams with an existing OpenAI org and monthly spend commitment
- 400K-context workloads that exceed Claude's 200K
- o4-mini for cheap reasoning-style tasks
Read the Connect OpenAI guide ›
Online-aware Sonar models with live citations — for research, not agents.
- Real-time web research with citations baked into every answer
- Questions about today's events, current prices, breaking news
- Competitor monitoring and market intel that needs fresh sources
Read the Connect Perplexity guide ›
Open-weight Llama and Qwen on fast serverless infra.
- Open-weight-first teams that may later self-host
- Llama 3.1 70B or 405B at higher throughput than most managed providers
- Qwen 2.5 for multilingual workloads with strong Chinese performance
Read the Connect Together AI guide ›
Grok 4 with real-time X/Twitter access.
- Real-time social signal — brand mentions, trending topics, influencer activity on X
- Questions about news or events that broke in the last few hours
- General-purpose reasoning at GPT-5-ish quality with a different flavor
Read the Connect xAI (Grok) guide ›