AI Provider Setup · BYOK
Open-weight Llama and Qwen on fast serverless infra.
Together AI serves open-weight frontier models — Llama 3.1, Qwen 2.5 — on hosted serverless GPUs. Good pick when you want open-weight characteristics (reproducibility, eventual self-hosting) without running GPUs yourself.
Key name: `TOGETHER_API_KEY`. Sign up at together.ai. New accounts receive a small amount of trial credit.
Open api.together.xyz/settings/billing and add a card. Trial credit runs out fast with frontier models — add billing before you rely on it.
Navigate to api.together.xyz/settings/api-keys. You can create multiple keys per account.
Click Create new key, name it Admaxxer, and create. The key is shown once — copy it immediately.
Open Settings then AI Providers, find the Together AI row, paste the key, and click Connect.
meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo is our recommended pick — great price-performance. Use the 405B variant when you need max quality, or Qwen 2.5 for multilingual.
Confirm the key is fresh, billing is active, and the model id matches Together's exact casing (they are case-sensitive, e.g. Meta-Llama not meta-llama).
Once you've pasted your key into /settings/ai-providers, hit Test. Admaxxer makes a single no-cost call against the provider's /v1/models (or equivalent) endpoint and reports the result.
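Under the hood, the test is equivalent to listing the provider's model catalog yourself. A minimal sketch in Python (stdlib only); the base URL `https://api.together.xyz/v1` and the env-var name `TOGETHER_API_KEY` are assumptions here, not part of Admaxxer itself:

```python
import json
import os
import urllib.error
import urllib.request

BASE_URL = "https://api.together.xyz/v1"  # assumed OpenAI-compatible base URL


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build an authenticated GET request for the Together API."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def check_key(api_key: str) -> bool:
    """Return True if the key can list models (the same kind of call Test makes)."""
    try:
        with urllib.request.urlopen(build_request("/models", api_key), timeout=10) as resp:
            return resp.status == 200 and bool(json.load(resp))
    except urllib.error.HTTPError as err:
        # 401 typically means the key is wrong or revoked.
        print(f"Key check failed: HTTP {err.code}")
        return False


if __name__ == "__main__":
    print("valid" if check_key(os.environ["TOGETHER_API_KEY"]) else "invalid")
```

If this prints `invalid` with a 401, regenerating the key usually resolves it.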
These are the pinned model IDs Admaxxer tests against. The model picker in chat will show these plus any live-catalog entries the provider returns.
| Model ID | Display name | Context | Tools | Best for |
|---|---|---|---|---|
| meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | Llama 3.1 70B Instruct Turbo (recommended) | 128K | Yes | Best price-performance on Together. Strong reasoning, tool use, and speed. |
| meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | Llama 3.1 405B Instruct Turbo | 128K | Yes | Largest Llama Together hosts. Use when you need maximum quality; expect higher latency. |
| Qwen/Qwen2.5-72B-Instruct-Turbo | Qwen 2.5 72B Instruct Turbo | 128K | Yes | Strongest non-Llama open-weight option. Great for multilingual (esp. Chinese) workloads. |
- **Key rejected.** When it happens: on Connect if the key is wrong or revoked. Fix: regenerate at api.together.xyz/settings/api-keys and paste the new key.
- **Out of credit.** When it happens: when trial credit runs out and no card is on file. Fix: add billing at api.together.xyz/settings/billing.
- **Rate limited.** When it happens: under concurrent load; per-model limits vary. Fix: space out parallel requests. Together publishes per-model limits in their docs.
- **Model not found.** When it happens: if the model id casing is wrong (e.g. meta-llama vs Meta-Llama). Fix: copy model ids exactly as they appear in the Together catalog; they are case-sensitive.
- **Model overloaded.** When it happens: on 405B during peak demand; very large models get queued. Fix: retry with backoff, or drop to the 70B tier for lower latency.
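"Retry with backoff" can be as simple as capped exponential delays with jitter. A sketch (the set of statuses treated as retryable is an assumption; only transient errors should be retried):

```python
import random
import time
import urllib.error
import urllib.request

RETRYABLE = {429, 500, 502, 503}  # assumed transient; 401/404 will not fix themselves


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list:
    """Exponential delays base * 2**i, capped at `cap` seconds."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]


def fetch_with_retry(req: urllib.request.Request, attempts: int = 5):
    """Retry transient HTTP failures with capped exponential backoff and full jitter."""
    for delay in backoff_delays(attempts):
        try:
            return urllib.request.urlopen(req, timeout=60)
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE:
                raise  # e.g. bad key or wrong model id: retrying won't help
            time.sleep(delay * random.random())  # full jitter spreads concurrent retries
    raise TimeoutError("exhausted retries; consider dropping to the 70B tier")
```

The jitter matters under concurrent load: without it, parallel workers retry in lockstep and hit the same per-model limit again.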