Groq

Ultra-fast inference with Groq LPU™ for real-time AI applications.

Fastest Inference: Groq’s LPU can achieve 500+ tokens/second for real-time applications

Features

  • ✅ Ultra-low latency inference
  • ✅ Chat Completions
  • ✅ Llama 3.1 & 3.2 models
  • ✅ Mixtral models
  • ✅ Function Calling
  • ✅ JSON Mode
  • ✅ Streaming responses

Setup

1. Add your API key to Secrets

In the ProtectMyAPI dashboard:

  1. Go to your app → Secrets
  2. Add a secret named GROQ_API_KEY
  3. Paste your Groq API key as the value

2. Create an endpoint

Create an endpoint with:

  • Name: Groq Chat
  • Slug: groq-chat
  • Target URL: https://api.groq.com/openai/v1/chat/completions
  • Method: POST
  • Auth Type: Bearer
  • Auth Value: {{GROQ_API_KEY}}
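To confirm the key works before routing through the proxy, you can call the target URL directly. A sketch (substitute your actual key for `$GROQ_API_KEY`); this mirrors what the proxy does when it injects the `Authorization` header from the `{{GROQ_API_KEY}}` secret:

```shell
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-instant",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
```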

Code Examples

import ProtectMyAPI
 
// Fast chat completion
let response = try await ProtectMyAPI.shared.request(
    endpoint: "groq-chat",
    method: .POST,
    body: [
        "model": "llama-3.1-70b-versatile",
        "messages": [
            ["role": "user", "content": "Explain quantum computing in simple terms"]
        ],
        "temperature": 0.7
    ]
)
 
// Streaming for real-time responses
try await ProtectMyAPI.shared.stream(
    endpoint: "groq-chat",
    method: .POST,
    body: [
        "model": "llama-3.1-8b-instant",
        "messages": [
            ["role": "user", "content": "Write a poem about coding"]
        ],
        "stream": true
    ]
) { chunk in
    print(chunk, terminator: "")
}
 
// JSON Mode
let jsonResponse = try await ProtectMyAPI.shared.request(
    endpoint: "groq-chat",
    method: .POST,
    body: [
        "model": "llama-3.1-70b-versatile",
        "messages": [
            ["role": "user", "content": "List 3 planets with their number of moons"]
        ],
        "response_format": ["type": "json_object"]
    ]
)
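The response body follows the OpenAI-compatible chat completions schema, so the JSON-mode output can be pulled out with `Codable`. A minimal sketch; the struct names are illustrative and the sample payload is abbreviated:

```swift
import Foundation

// Minimal slice of the OpenAI-compatible chat completions response
struct ChatCompletion: Decodable {
    struct Choice: Decodable {
        struct Message: Decodable { let content: String }
        let message: Message
    }
    let choices: [Choice]
}

// Abbreviated sample payload, shaped like a JSON-mode response
let sample = """
{"choices":[{"message":{"content":"{\\"planets\\":[{\\"name\\":\\"Mars\\",\\"moons\\":2}]}"}}]}
""".data(using: .utf8)!

let completion = try JSONDecoder().decode(ChatCompletion.self, from: sample)
// The model's JSON string lives in choices[0].message.content;
// decode it a second time into your own model type.
print(completion.choices[0].message.content)
```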

Models

| Model | Speed | Use Case |
| --- | --- | --- |
| llama-3.1-8b-instant | Fastest | Simple tasks, real-time |
| llama-3.1-70b-versatile | Fast | Complex reasoning |
| llama-3.2-1b-preview | Ultra-fast | Edge/mobile |
| llama-3.2-3b-preview | Very fast | Balanced |
| mixtral-8x7b-32768 | Fast | Long context |
| gemma2-9b-it | Fast | Instruction following |
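If you switch models per task, a small helper keeps the model IDs from the table in one place. A sketch; the enum and its cases are illustrative, not part of the ProtectMyAPI SDK:

```swift
// Illustrative mapping from use case to the Groq model IDs above
enum GroqTask {
    case realtime, reasoning, edge, balanced, longContext, instruction

    var model: String {
        switch self {
        case .realtime:    return "llama-3.1-8b-instant"
        case .reasoning:   return "llama-3.1-70b-versatile"
        case .edge:        return "llama-3.2-1b-preview"
        case .balanced:    return "llama-3.2-3b-preview"
        case .longContext: return "mixtral-8x7b-32768"
        case .instruction: return "gemma2-9b-it"
        }
    }
}
```

Pass `GroqTask.realtime.model` (for example) as the `"model"` field in the request body.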

Performance Optimization

Groq is ideal for:

  1. Real-time chat - Sub-second responses
  2. Interactive coding - Instant completions
  3. Voice assistants - Low-latency responses for TTS pipelines
  4. Gaming - Dynamic NPC dialogue

// Optimize for speed
let fastResponse = try await ProtectMyAPI.shared.request(
    endpoint: "groq-chat",
    method: .POST,
    body: [
        "model": "llama-3.1-8b-instant", // Fastest model
        "messages": [
            ["role": "user", "content": prompt]
        ],
        "max_tokens": 100, // Limit output
        "temperature": 0 // Deterministic
    ]
)

Comparison with OpenAI

| Feature | Groq | OpenAI |
| --- | --- | --- |
| Latency | ~100ms | ~500ms |
| Throughput | 500+ tok/s | ~80 tok/s |
| Models | Open source | Proprietary |
| Cost | Very competitive | Higher |
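Latency varies with model, prompt length, and network, so it is worth measuring from your own app. A sketch using `ContinuousClock` (Swift 5.7+); `work` stands in for any async request, such as the ProtectMyAPI calls above:

```swift
// Time any async operation, e.g. a chat completion request
func measure<T>(_ work: () async throws -> T) async rethrows -> (value: T, ms: Double) {
    let clock = ContinuousClock()
    let start = clock.now
    let value = try await work()
    let elapsed = clock.now - start
    // Duration exposes (seconds, attoseconds); convert to milliseconds
    let ms = Double(elapsed.components.seconds) * 1000
           + Double(elapsed.components.attoseconds) / 1e15
    return (value, ms)
}
```

Usage: `let (response, ms) = try await measure { try await ProtectMyAPI.shared.request(...) }`, then log `ms` per model to compare.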