# Fireworks AI 🎆

Ultra-fast inference for LLMs and image models with function calling support. ⚡

What you can do: lightning-fast chat completions, function calling, JSON mode, DeepSeek R1 reasoning, image generation, and streaming - optimized for production workloads.
## Setup

Add your Fireworks AI API key in the ProtectMyAPI Dashboard.
## Chat Completions

### Basic Chat
```swift
let fireworks = ProtectMyAPI.fireworksService()

let response = try await fireworks.createChatCompletion(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
        messages: [
            .system("You are a helpful assistant."),
            .user("Explain quantum computing in simple terms")
        ]
    )
)

print(response.choices.first?.message.content ?? "")
```

### With Parameters

```swift
let response = try await fireworks.createChatCompletion(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
        messages: [.user("Write a creative story")],
        temperature: 0.7,
        maxTokens: 2000,
        topP: 0.9,
        topK: 40,
        presencePenalty: 0.1,
        frequencyPenalty: 0.1
    )
)
```

### Streaming

```swift
for try await chunk in fireworks.createChatCompletionStream(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
        messages: [.user("Write a detailed analysis of AI trends")]
    )
) {
    print(chunk.choices.first?.delta?.content ?? "", terminator: "")
}
```

## DeepSeek R1 Reasoning

Advanced reasoning model for complex problems:
```swift
let response = try await fireworks.createChatCompletion(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/deepseek-r1",
        messages: [
            .user("""
            Solve this step by step:
            A train travels from A to B at 60 mph, and returns at 40 mph.
            What is the average speed for the round trip?
            """)
        ],
        temperature: 0.1 // Lower for reasoning
    )
)

// DeepSeek R1 shows reasoning in <think> tags
print(response.choices.first?.message.content ?? "")
```

## Function Calling
```swift
let response = try await fireworks.createChatCompletion(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/firefunction-v2",
        messages: [
            .user("What's the weather like in San Francisco?")
        ],
        tools: [
            FireworksTool(
                type: "function",
                function: FireworksFunction(
                    name: "get_weather",
                    description: "Get current weather for a location",
                    parameters: [
                        "type": "object",
                        "properties": [
                            "location": [
                                "type": "string",
                                "description": "City name"
                            ],
                            "unit": [
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"]
                            ]
                        ],
                        "required": ["location"]
                    ]
                )
            )
        ],
        toolChoice: "auto"
    )
)

// Check whether the model requested a function call
if let toolCall = response.choices.first?.message.toolCalls?.first {
    print("Function: \(toolCall.function.name)")
    print("Arguments: \(toolCall.function.arguments)")
}
```

## JSON Mode

Get structured JSON responses:
```swift
let response = try await fireworks.createChatCompletion(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
        messages: [
            .system("You are a helpful assistant that outputs JSON."),
            .user("List 3 popular programming languages with their main use cases")
        ],
        responseFormat: FireworksResponseFormat(type: "json_object")
    )
)

// The message content is a JSON string
if let json = response.choices.first?.message.content {
    print(json)
}
```

## Image Generation

Generate images with Stable Diffusion:
```swift
let image = try await fireworks.createImage(
    request: FireworksImageRequest(
        model: "accounts/fireworks/models/stable-diffusion-xl-1024-v1-0",
        prompt: "A majestic mountain landscape at golden hour, photorealistic",
        negativePrompt: "blurry, low quality, distorted",
        width: 1024,
        height: 1024,
        steps: 30,
        guidanceScale: 7.5,
        seed: 42 // For reproducibility
    )
)

// image.data contains base64-encoded images
for img in image.data {
    if let b64 = img.b64Json, let data = Data(base64Encoded: b64) {
        // Use `data` (e.g. UIImage(data:) or write it to disk)
    }
}
```

## Available Models
### Chat Models

| Model | Context | Best For |
|---|---|---|
| llama-v3p1-405b-instruct | 128K | Highest quality |
| llama-v3p1-70b-instruct | 128K | Best balance |
| llama-v3p1-8b-instruct | 128K | Fast responses |
| mixtral-8x22b-instruct | 65K | MoE efficiency |
| qwen2-72b-instruct | 32K | Multilingual |
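The full model IDs all share the `accounts/fireworks/models/` prefix, which is easy to mistype. A small namespace for the chat model IDs from the table keeps them in one place (the `FireworksModel` enum itself is illustrative, not part of the SDK):

```swift
// Illustrative helper: central place for the full Fireworks model paths
enum FireworksModel {
    static let llama405B = "accounts/fireworks/models/llama-v3p1-405b-instruct"
    static let llama70B  = "accounts/fireworks/models/llama-v3p1-70b-instruct"
    static let llama8B   = "accounts/fireworks/models/llama-v3p1-8b-instruct"
    static let mixtral   = "accounts/fireworks/models/mixtral-8x22b-instruct"
    static let qwen72B   = "accounts/fireworks/models/qwen2-72b-instruct"
}
```

Used as, for example, `FireworksChatRequest(model: FireworksModel.llama70B, ...)`.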
### Reasoning Models

| Model | Description |
|---|---|
| deepseek-r1 | Full reasoning with chain-of-thought |
| deepseek-r1-distill-llama-70b | Distilled, faster reasoning |
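As noted in the reasoning example above, DeepSeek R1 wraps its chain-of-thought in `<think>` tags before the final answer. If you want to show only the answer (or log the reasoning separately), a minimal sketch for splitting the two, assuming at most one well-formed `<think>...</think>` block:

```swift
import Foundation

// Split a DeepSeek R1 completion into (reasoning, answer).
// Assumes a single optional <think>...</think> block before the answer.
func splitReasoning(_ content: String) -> (reasoning: String?, answer: String) {
    guard let open = content.range(of: "<think>"),
          let close = content.range(of: "</think>"),
          open.upperBound <= close.lowerBound else {
        return (nil, content) // no reasoning block found
    }
    let reasoning = String(content[open.upperBound..<close.lowerBound])
    let answer = String(content[close.upperBound...])
    return (reasoning.trimmingCharacters(in: .whitespacesAndNewlines),
            answer.trimmingCharacters(in: .whitespacesAndNewlines))
}
```

Usage: `let (thoughts, answer) = splitReasoning(response.choices.first?.message.content ?? "")`.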
### Function Calling Models

| Model | Description |
|---|---|
| firefunction-v2 | Optimized for function calling |
| firefunction-v1 | Original function model |
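A tool call's `arguments` field arrives as a JSON string, so it has to be decoded before dispatching to your own code. A hedged sketch using `Codable` (the `WeatherArgs` struct mirrors the `get_weather` schema from the function-calling example; `getWeather` is a hypothetical local function):

```swift
import Foundation

// Mirrors the get_weather parameter schema shown earlier
struct WeatherArgs: Decodable {
    let location: String
    let unit: String? // optional; schema allows "celsius" or "fahrenheit"
}

// Decode the raw arguments string the model produced
func decodeWeatherArgs(_ arguments: String) throws -> WeatherArgs {
    try JSONDecoder().decode(WeatherArgs.self, from: Data(arguments.utf8))
}

// Usage, assuming `toolCall` from the function-calling example:
// let args = try decodeWeatherArgs(toolCall.function.arguments)
// let report = try await getWeather(location: args.location,
//                                   unit: args.unit ?? "celsius")
```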
### Image Models

| Model | Description |
|---|---|
| stable-diffusion-xl-1024-v1-0 | SDXL base |
| playground-v2-1024px-aesthetic | Aesthetic focus |
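Since the image endpoint returns base64 payloads (see the image-generation example above), persisting a result is just a decode-and-write. A minimal sketch; the `png` extension is an assumption, so adjust it to the format your request produces:

```swift
import Foundation

// Decode a base64 image payload and write it to a temporary file.
// Returns the file URL, or nil if the base64 string is invalid.
func saveImage(b64: String, name: String) throws -> URL? {
    guard let data = Data(base64Encoded: b64) else { return nil }
    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent(name)
        .appendingPathExtension("png")
    try data.write(to: url)
    return url
}
```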
## Performance Features

### Speculative Decoding

For even faster inference:
```swift
let response = try await fireworks.createChatCompletion(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
        messages: [.user("Hello!")],
        speculativeDecoding: true // Enable speculative decoding
    )
)
```

## Pricing
Fireworks offers competitive pay-per-token pricing:

- Llama 70B: ~$0.90 per million tokens
- Llama 8B: ~$0.20 per million tokens
- DeepSeek R1: ~$3.00 per million tokens
- Images: ~$0.025 per image

Check the Fireworks pricing page for current rates.
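With per-million-token rates, a rough cost estimate is one multiplication. A tiny sketch (the rates above are approximate and change, so treat the numbers as illustrative):

```swift
// Rough cost estimate: tokens used × dollars per million tokens
func estimateCost(tokens: Int, dollarsPerMillion: Double) -> Double {
    Double(tokens) / 1_000_000 * dollarsPerMillion
}

// e.g. a 10,000-token exchange on Llama 70B at ~$0.90/M:
// estimateCost(tokens: 10_000, dollarsPerMillion: 0.90) ≈ $0.009
```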