# Fireworks AI 🎆

Ultra-fast inference for LLMs and image models with function calling support. ⚡

What you can do: lightning-fast chat completions, function calling, JSON mode, DeepSeek R1 reasoning, image generation, and streaming - optimized for production workloads.
## Setup

Add your Fireworks AI API key in the ProtectMyAPI Dashboard.
## Chat Completions

### Basic Chat
```swift
let fireworks = ProtectMyAPI.fireworksService()

let response = try await fireworks.createChatCompletion(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
        messages: [
            .system("You are a helpful assistant."),
            .user("Explain quantum computing in simple terms")
        ]
    )
)

print(response.choices.first?.message.content ?? "")
```

### With Parameters

```swift
let response = try await fireworks.createChatCompletion(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
        messages: [.user("Write a creative story")],
        temperature: 0.7,
        maxTokens: 2000,
        topP: 0.9,
        topK: 40,
        presencePenalty: 0.1,
        frequencyPenalty: 0.1
    )
)
```

### Streaming

```swift
for try await chunk in fireworks.createChatCompletionStream(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
        messages: [.user("Write a detailed analysis of AI trends")]
    )
) {
    print(chunk.choices.first?.delta?.content ?? "", terminator: "")
}
```

## DeepSeek R1 Reasoning

Advanced reasoning model for complex problems:
```swift
let response = try await fireworks.createChatCompletion(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/deepseek-r1",
        messages: [
            .user("""
            Solve this step by step:
            A train travels from A to B at 60 mph, and returns at 40 mph.
            What is the average speed for the round trip?
            """)
        ],
        temperature: 0.1 // Lower for reasoning
    )
)

// DeepSeek R1 shows reasoning in <think> tags
print(response.choices.first?.message.content ?? "")
```

## Function Calling
```swift
let response = try await fireworks.createChatCompletion(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/firefunction-v2",
        messages: [
            .user("What's the weather like in San Francisco?")
        ],
        tools: [
            FireworksTool(
                type: "function",
                function: FireworksFunction(
                    name: "get_weather",
                    description: "Get current weather for a location",
                    parameters: [
                        "type": "object",
                        "properties": [
                            "location": [
                                "type": "string",
                                "description": "City name"
                            ],
                            "unit": [
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"]
                            ]
                        ],
                        "required": ["location"]
                    ]
                )
            )
        ],
        toolChoice: "auto"
    )
)

// Check whether the model requested a function call
if let toolCall = response.choices.first?.message.toolCalls?.first {
    print("Function: \(toolCall.function.name)")
    print("Arguments: \(toolCall.function.arguments)")
}
```

## JSON Mode

Get structured JSON responses:
```swift
let response = try await fireworks.createChatCompletion(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
        messages: [
            .system("You are a helpful assistant that outputs JSON."),
            .user("List 3 popular programming languages with their main use cases")
        ],
        responseFormat: FireworksResponseFormat(type: "json_object")
    )
)

// The message content is a JSON string
if let json = response.choices.first?.message.content {
    print(json)
}
```

## Image Generation

Generate images with Stable Diffusion:
```swift
let image = try await fireworks.createImage(
    request: FireworksImageRequest(
        model: "accounts/fireworks/models/stable-diffusion-xl-1024-v1-0",
        prompt: "A majestic mountain landscape at golden hour, photorealistic",
        negativePrompt: "blurry, low quality, distorted",
        width: 1024,
        height: 1024,
        steps: 30,
        guidanceScale: 7.5,
        seed: 42 // For reproducibility
    )
)

// image.data contains base64-encoded images
for img in image.data {
    if let b64 = img.b64Json, let data = Data(base64Encoded: b64) {
        // Use `data` (e.g. UIImage(data:) or write it to disk)
    }
}
```

## Available Models
### Chat Models

| Model | Context | Best For |
|---|---|---|
| llama-v3p1-405b-instruct | 128K | Highest quality |
| llama-v3p1-70b-instruct | 128K | Best balance |
| llama-v3p1-8b-instruct | 128K | Fast responses |
| mixtral-8x22b-instruct | 65K | MoE efficiency |
| qwen2-72b-instruct | 32K | Multilingual |
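The full model IDs all share the `accounts/fireworks/models/` prefix, which is easy to mistype. A small namespace for the chat model IDs from the table keeps them in one place (the `FireworksModel` enum itself is illustrative, not part of the SDK):

```swift
// Illustrative helper: central place for the full Fireworks model paths
enum FireworksModel {
    static let llama405B = "accounts/fireworks/models/llama-v3p1-405b-instruct"
    static let llama70B  = "accounts/fireworks/models/llama-v3p1-70b-instruct"
    static let llama8B   = "accounts/fireworks/models/llama-v3p1-8b-instruct"
    static let mixtral   = "accounts/fireworks/models/mixtral-8x22b-instruct"
    static let qwen72B   = "accounts/fireworks/models/qwen2-72b-instruct"
}
```

Used as, for example, `FireworksChatRequest(model: FireworksModel.llama70B, ...)`.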
### Reasoning Models

| Model | Description |
|---|---|
| deepseek-r1 | Full reasoning with chain-of-thought |
| deepseek-r1-distill-llama-70b | Distilled, faster reasoning |
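As noted in the reasoning example above, DeepSeek R1 wraps its chain-of-thought in `<think>` tags before the final answer. If you want to show only the answer (or log the reasoning separately), a minimal sketch for splitting the two, assuming at most one well-formed `<think>...</think>` block:

```swift
import Foundation

// Split a DeepSeek R1 completion into (reasoning, answer).
// Assumes a single optional <think>...</think> block before the answer.
func splitReasoning(_ content: String) -> (reasoning: String?, answer: String) {
    guard let open = content.range(of: "<think>"),
          let close = content.range(of: "</think>"),
          open.upperBound <= close.lowerBound else {
        return (nil, content) // no reasoning block found
    }
    let reasoning = String(content[open.upperBound..<close.lowerBound])
    let answer = String(content[close.upperBound...])
    return (reasoning.trimmingCharacters(in: .whitespacesAndNewlines),
            answer.trimmingCharacters(in: .whitespacesAndNewlines))
}
```

Usage: `let (thoughts, answer) = splitReasoning(response.choices.first?.message.content ?? "")`.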
### Function Calling Models

| Model | Description |
|---|---|
| firefunction-v2 | Optimized for function calling |
| firefunction-v1 | Original function model |
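A tool call's `arguments` field arrives as a JSON string, so it has to be decoded before dispatching to your own code. A hedged sketch using `Codable` (the `WeatherArgs` struct mirrors the `get_weather` schema from the function-calling example; `getWeather` is a hypothetical local function):

```swift
import Foundation

// Mirrors the get_weather parameter schema shown earlier
struct WeatherArgs: Decodable {
    let location: String
    let unit: String? // optional; schema allows "celsius" or "fahrenheit"
}

// Decode the raw arguments string the model produced
func decodeWeatherArgs(_ arguments: String) throws -> WeatherArgs {
    try JSONDecoder().decode(WeatherArgs.self, from: Data(arguments.utf8))
}

// Usage, assuming `toolCall` from the function-calling example:
// let args = try decodeWeatherArgs(toolCall.function.arguments)
// let report = try await getWeather(location: args.location,
//                                   unit: args.unit ?? "celsius")
```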
### Image Models

| Model | Description |
|---|---|
| stable-diffusion-xl-1024-v1-0 | SDXL base |
| playground-v2-1024px-aesthetic | Aesthetic focus |
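Since the image endpoint returns base64 payloads (see the image-generation example above), persisting a result is just a decode-and-write. A minimal sketch; the `png` extension is an assumption, so adjust it to the format your request produces:

```swift
import Foundation

// Decode a base64 image payload and write it to a temporary file.
// Returns the file URL, or nil if the base64 string is invalid.
func saveImage(b64: String, name: String) throws -> URL? {
    guard let data = Data(base64Encoded: b64) else { return nil }
    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent(name)
        .appendingPathExtension("png")
    try data.write(to: url)
    return url
}
```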
## Performance Features

### Speculative Decoding

For even faster inference:
```swift
let response = try await fireworks.createChatCompletion(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
        messages: [.user("Hello!")],
        speculativeDecoding: true // Enable speculative decoding
    )
)
```

## Pricing
Fireworks offers competitive pay-per-token pricing:

- Llama 70B: ~$0.90 per million tokens
- Llama 8B: ~$0.20 per million tokens
- DeepSeek R1: ~$3.00 per million tokens
- Images: ~$0.025 per image

Check the Fireworks pricing page for current rates.
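With per-million-token rates, a rough cost estimate is one multiplication. A tiny sketch (the rates above are approximate and change, so treat the numbers as illustrative):

```swift
// Rough cost estimate: tokens used × dollars per million tokens
func estimateCost(tokens: Int, dollarsPerMillion: Double) -> Double {
    Double(tokens) / 1_000_000 * dollarsPerMillion
}

// e.g. a 10,000-token exchange on Llama 70B at ~$0.90/M:
// estimateCost(tokens: 10_000, dollarsPerMillion: 0.90) ≈ $0.009
```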