
Fireworks AI 🎆

Ultra-fast inference for LLMs and image models with function calling support.

What you can do: Lightning-fast chat completions, function calling, JSON mode, DeepSeek R1 reasoning, image generation, and streaming, all optimized for production workloads.

Setup

Add your Fireworks AI API key in the ProtectMyAPI Dashboard.


Chat Completions

Basic Chat

let fireworks = ProtectMyAPI.fireworksService()
 
let response = try await fireworks.createChatCompletion(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
        messages: [
            .system("You are a helpful assistant."),
            .user("Explain quantum computing in simple terms")
        ]
    )
)
 
print(response.choices.first?.message.content ?? "")

With Parameters

let response = try await fireworks.createChatCompletion(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
        messages: [.user("Write a creative story")],
        temperature: 0.7,
        maxTokens: 2000,
        topP: 0.9,
        topK: 40,
        presencePenalty: 0.1,
        frequencyPenalty: 0.1
    )
)

Streaming

for try await chunk in fireworks.createChatCompletionStream(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
        messages: [.user("Write a detailed analysis of AI trends")]
    )
) {
    print(chunk.choices.first?.delta?.content ?? "", terminator: "")
}
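If you need the complete text after streaming finishes (for example, to cache or post-process it), you can accumulate the deltas as they arrive. A minimal sketch using the same stream call shown above:

```swift
// Accumulate streamed deltas into the full response text
var fullText = ""
for try await chunk in fireworks.createChatCompletionStream(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/llama-v3p1-8b-instruct",
        messages: [.user("Summarize the benefits of streaming APIs")]
    )
) {
    let delta = chunk.choices.first?.delta?.content ?? ""
    fullText += delta
    print(delta, terminator: "")   // render incrementally
}
// fullText now holds the complete response
```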

DeepSeek R1 Reasoning

DeepSeek R1 is an advanced reasoning model for complex, multi-step problems:

let response = try await fireworks.createChatCompletion(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/deepseek-r1",
        messages: [
            .user("""
                Solve this step by step:
                A train travels from A to B at 60 mph, and returns at 40 mph.
                What is the average speed for the round trip?
            """)
        ],
        temperature: 0.1 // Lower for reasoning
    )
)
 
// DeepSeek R1 shows reasoning in <think> tags
print(response.choices.first?.message.content ?? "")
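Because the chain-of-thought arrives inline inside `<think>…</think>` tags, you may want to separate it from the final answer before display. A minimal sketch using plain string scanning; the `splitReasoning` helper below is illustrative, not part of the SDK:

```swift
import Foundation

/// Splits DeepSeek R1 output into its reasoning and final answer.
/// Assumes a single leading <think>…</think> block, as R1 emits.
func splitReasoning(_ content: String) -> (reasoning: String?, answer: String) {
    guard let start = content.range(of: "<think>"),
          let end = content.range(of: "</think>") else {
        return (nil, content)   // no reasoning block present
    }
    let reasoning = String(content[start.upperBound..<end.lowerBound])
    let answer = String(content[end.upperBound...])
    return (reasoning.trimmingCharacters(in: .whitespacesAndNewlines),
            answer.trimmingCharacters(in: .whitespacesAndNewlines))
}

let (reasoning, answer) = splitReasoning(response.choices.first?.message.content ?? "")
print("Answer:", answer)
```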

Function Calling

let response = try await fireworks.createChatCompletion(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/firefunction-v2",
        messages: [
            .user("What's the weather like in San Francisco?")
        ],
        tools: [
            FireworksTool(
                type: "function",
                function: FireworksFunction(
                    name: "get_weather",
                    description: "Get current weather for a location",
                    parameters: [
                        "type": "object",
                        "properties": [
                            "location": [
                                "type": "string",
                                "description": "City name"
                            ],
                            "unit": [
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"]
                            ]
                        ],
                        "required": ["location"]
                    ]
                )
            )
        ],
        toolChoice: "auto"
    )
)
 
// Check if function was called
if let toolCall = response.choices.first?.message.toolCalls?.first {
    print("Function: \(toolCall.function.name)")
    print("Arguments: \(toolCall.function.arguments)")
}
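Once a tool call comes back, the typical next step is to decode its JSON arguments, run your function, and feed the result into a follow-up request. A minimal sketch; `WeatherArgs` and `lookupWeather` are illustrative names, not part of the SDK:

```swift
import Foundation

// The arguments arrive as a JSON string; decode them into a typed struct.
struct WeatherArgs: Decodable {
    let location: String
    let unit: String?
}

if let toolCall = response.choices.first?.message.toolCalls?.first,
   toolCall.function.name == "get_weather",
   let argsData = toolCall.function.arguments.data(using: .utf8) {
    let args = try JSONDecoder().decode(WeatherArgs.self, from: argsData)
    // lookupWeather(_:unit:) stands in for your own weather implementation.
    let report = lookupWeather(args.location, unit: args.unit ?? "celsius")
    // To finish the exchange, append the tool result to the conversation
    // (replaying the assistant message that carried the tool call) and
    // call createChatCompletion again; the exact tool-role message
    // constructor depends on the SDK's message API.
    print(report)
}
```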

JSON Mode

Get structured JSON responses:

let response = try await fireworks.createChatCompletion(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
        messages: [
            .system("You are a helpful assistant that outputs JSON."),
            .user("List 3 popular programming languages with their main use cases")
        ],
        responseFormat: FireworksResponseFormat(type: "json_object")
    )
)
 
// Parse JSON response
if let json = response.choices.first?.message.content {
    print(json)
}
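Rather than printing the raw string, you can decode it directly into a `Codable` type. A sketch assuming the model follows the shape your prompt asks for; the `Language` types below are illustrative, not part of the SDK:

```swift
import Foundation

struct Language: Decodable {
    let name: String
    let useCase: String
}

struct LanguageList: Decodable {
    let languages: [Language]
}

if let json = response.choices.first?.message.content,
   let data = json.data(using: .utf8),
   let list = try? JSONDecoder().decode(LanguageList.self, from: data) {
    for lang in list.languages {
        print("\(lang.name): \(lang.useCase)")
    }
}
```

Note that JSON mode guarantees syntactically valid JSON, not a specific schema, so spell out the keys you intend to decode in your prompt.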

Image Generation

Generate images with Stable Diffusion:

let image = try await fireworks.createImage(
    request: FireworksImageRequest(
        model: "accounts/fireworks/models/stable-diffusion-xl-1024-v1-0",
        prompt: "A majestic mountain landscape at golden hour, photorealistic",
        negativePrompt: "blurry, low quality, distorted",
        width: 1024,
        height: 1024,
        steps: 30,
        guidanceScale: 7.5,
        seed: 42 // For reproducibility
    )
)
 
// image.data contains base64-encoded images
for img in image.data {
    guard let b64 = img.b64Json,
          let data = Data(base64Encoded: b64) else { continue }
    // Use the decoded image data, e.g. write it to disk
    try data.write(to: URL(fileURLWithPath: "landscape.png"))
}

Available Models

Chat Models

| Model | Context | Best For |
|-------|---------|----------|
| llama-v3p1-405b-instruct | 128K | Highest quality |
| llama-v3p1-70b-instruct | 128K | Best balance |
| llama-v3p1-8b-instruct | 128K | Fast responses |
| mixtral-8x22b-instruct | 65K | MoE efficiency |
| qwen2-72b-instruct | 32K | Multilingual |

Reasoning Models

| Model | Description |
|-------|-------------|
| deepseek-r1 | Full reasoning with chain-of-thought |
| deepseek-r1-distill-llama-70b | Distilled, faster reasoning |

Function Calling Models

| Model | Description |
|-------|-------------|
| firefunction-v2 | Optimized for function calling |
| firefunction-v1 | Original function model |

Image Models

| Model | Description |
|-------|-------------|
| stable-diffusion-xl-1024-v1-0 | SDXL base |
| playground-v2-1024px-aesthetic | Aesthetic focus |

Performance Features

Speculative Decoding

For even faster inference:

let response = try await fireworks.createChatCompletion(
    request: FireworksChatRequest(
        model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
        messages: [.user("Hello!")],
        speculativeDecoding: true // Enable speculative decoding
    )
)

Pricing

Fireworks offers competitive pay-per-token pricing:

  • Llama 70B: ~$0.90 per million tokens
  • Llama 8B: ~$0.20 per million tokens
  • DeepSeek R1: ~$3.00 per million tokens
  • Images: ~$0.025 per image

Check their pricing page for current rates.
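For rough budgeting, the per-million-token rates above translate into per-request costs like this (a sketch using the approximate rates listed, which will drift over time):

```swift
/// Estimated cost in USD for a request, given a $/1M-token rate.
/// Rates are the approximate figures above and will change.
func estimatedCost(tokens: Int, ratePerMillion: Double) -> Double {
    Double(tokens) / 1_000_000 * ratePerMillion
}

// e.g. a 3,000-token exchange on Llama 70B at ~$0.90 per million tokens:
let cost = estimatedCost(tokens: 3_000, ratePerMillion: 0.90)
print(String(format: "$%.4f", cost)) // about $0.0027
```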