Groq
Ultra-fast inference with Groq LPU™ for real-time AI applications.
⚡ Fastest Inference: Groq's LPU can achieve 500+ tokens/second for real-time applications.
Features
- ✅ Ultra-low latency inference
- ✅ Chat Completions
- ✅ Llama 3.1 & 3.2 models
- ✅ Mixtral models
- ✅ Function Calling
- ✅ JSON Mode
- ✅ Streaming responses
Setup
1. Add your API key to Secrets
In the ProtectMyAPI dashboard:
- Go to your app → Secrets
- Add a secret named `GROQ_API_KEY`
- Paste your Groq API key as the value
2. Create an endpoint
Create an endpoint with:
- Name: Groq Chat
- Slug: `groq-chat`
- Target URL: `https://api.groq.com/openai/v1/chat/completions`
- Method: POST
- Auth Type: Bearer
- Auth Value: `{{GROQ_API_KEY}}`
Code Examples
```swift
import ProtectMyAPI

// Fast chat completion
let response = try await ProtectMyAPI.shared.request(
    endpoint: "groq-chat",
    method: .POST,
    body: [
        "model": "llama-3.1-70b-versatile",
        "messages": [
            ["role": "user", "content": "Explain quantum computing in simple terms"]
        ],
        "temperature": 0.7
    ]
)
```
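The request above returns an OpenAI-compatible chat-completion payload (Groq mirrors that schema, as the `/openai/v1` target URL suggests). A minimal sketch of extracting the assistant's text with `Codable` — the `ChatCompletion` types are hypothetical helpers, not part of the ProtectMyAPI SDK, and `responseData` stands in for however the SDK exposes the raw body:

```swift
import Foundation

// Hypothetical Codable types matching the OpenAI-compatible response shape.
struct ChatCompletion: Decodable {
    struct Choice: Decodable {
        struct Message: Decodable {
            let role: String
            let content: String
        }
        let message: Message
    }
    let choices: [Choice]
}

// Assuming `responseData` is the raw response body as Data:
let completion = try JSONDecoder().decode(ChatCompletion.self, from: responseData)
let text = completion.choices.first?.message.content ?? ""
```

Adjust the property names to match the SDK's actual response type; the `choices[0].message.content` path itself is fixed by the OpenAI-compatible schema.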
```swift
// Streaming for real-time responses
try await ProtectMyAPI.shared.stream(
    endpoint: "groq-chat",
    method: .POST,
    body: [
        "model": "llama-3.1-8b-instant",
        "messages": [
            ["role": "user", "content": "Write a poem about coding"]
        ],
        "stream": true
    ]
) { chunk in
    print(chunk, terminator: "")
}
```
```swift
// JSON Mode (the prompt must mention JSON when using json_object)
let jsonResponse = try await ProtectMyAPI.shared.request(
    endpoint: "groq-chat",
    method: .POST,
    body: [
        "model": "llama-3.1-70b-versatile",
        "messages": [
            ["role": "user", "content": "List 3 planets with their number of moons as JSON"]
        ],
        "response_format": ["type": "json_object"]
    ]
)
```
Models
| Model | Speed | Use Case |
|---|---|---|
| llama-3.1-8b-instant | Fastest | Simple tasks, real-time |
| llama-3.1-70b-versatile | Fast | Complex reasoning |
| llama-3.2-1b-preview | Ultra-fast | Edge/mobile |
| llama-3.2-3b-preview | Very fast | Balanced |
| mixtral-8x7b-32768 | Fast | Long context |
| gemma2-9b-it | Fast | Instruction following |
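When latency matters, it can help to select the model per task instead of hardcoding one. A small sketch of that idea — `preferredModel` and `TaskKind` are hypothetical helpers, and the tiers simply mirror the table above:

```swift
enum TaskKind {
    case realtimeChat, complexReasoning, longContext, edge
}

// Hypothetical mapping from task type to the models in the table above.
func preferredModel(for task: TaskKind) -> String {
    switch task {
    case .realtimeChat:     return "llama-3.1-8b-instant"
    case .complexReasoning: return "llama-3.1-70b-versatile"
    case .longContext:      return "mixtral-8x7b-32768"
    case .edge:             return "llama-3.2-1b-preview"
    }
}
```

The returned string can be passed straight into the `"model"` field of the request body.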
Performance Optimization
Groq is ideal for:
- Real-time chat - Sub-second responses
- Interactive coding - Instant completions
- Voice assistants - Low-latency for TTS
- Gaming - Dynamic NPC dialogue
```swift
// Optimize for speed
let fastResponse = try await ProtectMyAPI.shared.request(
    endpoint: "groq-chat",
    method: .POST,
    body: [
        "model": "llama-3.1-8b-instant", // Fastest model
        "messages": [
            ["role": "user", "content": prompt]
        ],
        "max_tokens": 100, // Limit output length
        "temperature": 0 // Deterministic output
    ]
)
```
Comparison with OpenAI
| Feature | Groq | OpenAI |
|---|---|---|
| Latency | ~100ms | ~500ms |
| Throughput | 500+ tok/s | ~80 tok/s |
| Models | Open source | Proprietary |
| Cost | Very competitive | Higher |
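Figures like these vary with model, prompt length, and load, so it is worth measuring in your own app. A rough sketch of timing a request and estimating throughput — it assumes the response exposes the completion text as `response.text` (adjust to the SDK's actual API), and approximates token count from character length since exact counts come back in the response's `usage` field:

```swift
import Foundation

let start = Date()
let response = try await ProtectMyAPI.shared.request(
    endpoint: "groq-chat",
    method: .POST,
    body: [
        "model": "llama-3.1-8b-instant",
        "messages": [["role": "user", "content": "Say hello"]]
    ]
)
let elapsed = Date().timeIntervalSince(start)

// Very rough estimate: ~4 characters per token.
let approxTokens = Double(response.text.count) / 4.0
print("latency: \(elapsed)s, ~\(approxTokens / elapsed) tok/s")
```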