Groq

Ultra-fast inference with Groq LPU™ for real-time AI applications.

Fastest Inference: Groq’s LPU can achieve 500+ tokens/second for real-time applications

Features

  • ✅ Ultra-low latency inference
  • ✅ Chat Completions
  • ✅ Llama 3.1 & 3.2 models
  • ✅ Mixtral models
  • ✅ Function Calling
  • ✅ JSON Mode
  • ✅ Streaming responses

Setup

1. Add your API key to Secrets

In the ProtectMyAPI dashboard:

  1. Go to your app → Secrets
  2. Add a secret named GROQ_API_KEY
  3. Paste your Groq API key as the value

2. Create an endpoint

Create an endpoint with:

  • Name: Groq Chat
  • Slug: groq-chat
  • Target URL: https://api.groq.com/openai/v1/chat/completions
  • Method: POST
  • Auth Type: Bearer
  • Auth Value: {{GROQ_API_KEY}}
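To confirm the key works before routing through the proxy, you can call the target URL directly. A sketch (substitute your actual key for `$GROQ_API_KEY`); this mirrors what the proxy does when it injects the `Authorization` header from the `{{GROQ_API_KEY}}` secret:

```shell
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-instant",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
```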

Code Examples

import ProtectMyAPI
 
// Fast chat completion
let response = try await ProtectMyAPI.shared.request(
    endpoint: "groq-chat",
    method: .POST,
    body: [
        "model": "llama-3.1-70b-versatile",
        "messages": [
            ["role": "user", "content": "Explain quantum computing in simple terms"]
        ],
        "temperature": 0.7
    ]
)
 
// Streaming for real-time responses
try await ProtectMyAPI.shared.stream(
    endpoint: "groq-chat",
    method: .POST,
    body: [
        "model": "llama-3.1-8b-instant",
        "messages": [
            ["role": "user", "content": "Write a poem about coding"]
        ],
        "stream": true
    ]
) { chunk in
    print(chunk, terminator: "")
}
 
// JSON Mode
let jsonResponse = try await ProtectMyAPI.shared.request(
    endpoint: "groq-chat",
    method: .POST,
    body: [
        "model": "llama-3.1-70b-versatile",
        "messages": [
            ["role": "user", "content": "List 3 planets with their number of moons"]
        ],
        "response_format": ["type": "json_object"]
    ]
)
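The response body follows the OpenAI-compatible chat completions schema, so the JSON-mode output can be pulled out with `Codable`. A minimal sketch; the struct names are illustrative and the sample payload is abbreviated:

```swift
import Foundation

// Minimal slice of the OpenAI-compatible chat completions response
struct ChatCompletion: Decodable {
    struct Choice: Decodable {
        struct Message: Decodable { let content: String }
        let message: Message
    }
    let choices: [Choice]
}

// Abbreviated sample payload, shaped like a JSON-mode response
let sample = """
{"choices":[{"message":{"content":"{\\"planets\\":[{\\"name\\":\\"Mars\\",\\"moons\\":2}]}"}}]}
""".data(using: .utf8)!

let completion = try JSONDecoder().decode(ChatCompletion.self, from: sample)
// The model's JSON string lives in choices[0].message.content;
// decode it a second time into your own model type.
print(completion.choices[0].message.content)
```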

Models

| Model | Speed | Use Case |
| --- | --- | --- |
| llama-3.1-8b-instant | Fastest | Simple tasks, real-time |
| llama-3.1-70b-versatile | Fast | Complex reasoning |
| llama-3.2-1b-preview | Ultra-fast | Edge/mobile |
| llama-3.2-3b-preview | Very fast | Balanced |
| mixtral-8x7b-32768 | Fast | Long context |
| gemma2-9b-it | Fast | Instruction following |
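If you switch models per task, a small helper keeps the model IDs from the table in one place. A sketch; the enum and its cases are illustrative, not part of the ProtectMyAPI SDK:

```swift
// Illustrative mapping from use case to the Groq model IDs above
enum GroqTask {
    case realtime, reasoning, edge, balanced, longContext, instruction

    var model: String {
        switch self {
        case .realtime:    return "llama-3.1-8b-instant"
        case .reasoning:   return "llama-3.1-70b-versatile"
        case .edge:        return "llama-3.2-1b-preview"
        case .balanced:    return "llama-3.2-3b-preview"
        case .longContext: return "mixtral-8x7b-32768"
        case .instruction: return "gemma2-9b-it"
        }
    }
}
```

Pass `GroqTask.realtime.model` (for example) as the `"model"` field in the request body.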

Performance Optimization

Groq is ideal for:

  1. Real-time chat - Sub-second responses
  2. Interactive coding - Instant completions
  3. Voice assistants - Low-latency responses for TTS pipelines
  4. Gaming - Dynamic NPC dialogue

// Optimize for speed
let fastResponse = try await ProtectMyAPI.shared.request(
    endpoint: "groq-chat",
    method: .POST,
    body: [
        "model": "llama-3.1-8b-instant", // Fastest model
        "messages": [
            ["role": "user", "content": prompt]
        ],
        "max_tokens": 100, // Limit output
        "temperature": 0 // Deterministic
    ]
)

Comparison with OpenAI

| Feature | Groq | OpenAI |
| --- | --- | --- |
| Latency | ~100ms | ~500ms |
| Throughput | 500+ tok/s | ~80 tok/s |
| Models | Open source | Proprietary |
| Cost | Very competitive | Higher |
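Latency varies with model, prompt length, and network, so it is worth measuring from your own app. A sketch using `ContinuousClock` (Swift 5.7+); `work` stands in for any async request, such as the ProtectMyAPI calls above:

```swift
// Time any async operation, e.g. a chat completion request
func measure<T>(_ work: () async throws -> T) async rethrows -> (value: T, ms: Double) {
    let clock = ContinuousClock()
    let start = clock.now
    let value = try await work()
    let elapsed = clock.now - start
    // Duration exposes (seconds, attoseconds); convert to milliseconds
    let ms = Double(elapsed.components.seconds) * 1000
           + Double(elapsed.components.attoseconds) / 1e15
    return (value, ms)
}
```

Usage: `let (response, ms) = try await measure { try await ProtectMyAPI.shared.request(...) }`, then log `ms` per model to compare.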