# Together AI 🤝

Access 100+ open-source models with a unified API. 🚀

What you can do: chat completions, embeddings, image generation, code models, and streaming — all with open-source models at competitive prices.
## Setup

Add your Together AI API key in the ProtectMyAPI Dashboard.
## Chat Completions

### Basic Chat

```swift
let together = ProtectMyAPI.togetherService()
let response = try await together.createChatCompletion(
    request: TogetherChatRequest(
        model: "meta-llama/Llama-3.1-70B-Instruct-Turbo",
        messages: [
            .system("You are a helpful assistant."),
            .user("Explain microservices architecture")
        ]
    )
)
print(response.choices.first?.message?.content ?? "")
```

### With Parameters
```swift
let response = try await together.createChatCompletion(
    request: TogetherChatRequest(
        model: "meta-llama/Llama-3.1-70B-Instruct-Turbo",
        messages: [.user("Write a creative story")],
        temperature: 0.8,
        maxTokens: 2000,
        topP: 0.95,
        topK: 50,
        repetitionPenalty: 1.1
    )
)
```

### Streaming
```swift
for try await chunk in together.createChatCompletionStream(
    request: TogetherChatRequest(
        model: "meta-llama/Llama-3.1-70B-Instruct-Turbo",
        messages: [.user("Write a long essay about AI")]
    )
) {
    print(chunk.choices.first?.delta?.content ?? "", terminator: "")
}
```
## Embeddings

Create vector embeddings for text:
```swift
let embeddings = try await together.createEmbeddings(
    request: TogetherEmbeddingsRequest(
        model: "togethercomputer/m2-bert-80M-8k-retrieval",
        input: [
            "The quick brown fox jumps over the lazy dog",
            "Machine learning is a subset of artificial intelligence"
        ]
    )
)

for embedding in embeddings.data {
    print("Vector dimensions: \(embedding.embedding.count)")
}
```
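Once you have the vectors, a common next step is comparing them. Here is a minimal cosine-similarity helper in plain Swift; it assumes each embedding is exposed as `[Double]` (an assumption about the SDK's response type):

```swift
import Foundation

/// Cosine similarity between two equal-length vectors:
/// 1.0 means identical direction, 0 means orthogonal (unrelated).
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "Vectors must have the same dimension")
    var dot = 0.0, normA = 0.0, normB = 0.0
    for i in 0..<a.count {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (sqrt(normA) * sqrt(normB))
}
```

For the two input sentences above, you could then score their semantic similarity with `cosineSimilarity(embeddings.data[0].embedding, embeddings.data[1].embedding)`.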
## Image Generation

Generate images with FLUX and other models:
```swift
let image = try await together.createImage(
    request: TogetherImageRequest(
        model: "black-forest-labs/FLUX.1-schnell-Free",
        prompt: "A futuristic city at sunset, cyberpunk style",
        width: 1024,
        height: 1024,
        steps: 20,
        n: 1
    )
)

// `b64Json` contains the base64-encoded image; unwrap it safely
if let imageData = image.data.first,
   let base64 = imageData.b64Json,
   let data = Data(base64Encoded: base64) {
    // `data` now holds the decoded image bytes
}
```
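To persist the result, the decoded bytes can be written straight to disk. A small self-contained sketch — the helper name is ours, not part of the SDK:

```swift
import Foundation

/// Decodes a base64 payload and writes it to a file in the temporary directory.
/// Returns the file URL, or nil if the payload is not valid base64 or the write fails.
func saveBase64Image(_ base64: String, filename: String) -> URL? {
    guard let data = Data(base64Encoded: base64) else { return nil }
    let url = FileManager.default.temporaryDirectory.appendingPathComponent(filename)
    do {
        try data.write(to: url)
        return url
    } catch {
        return nil
    }
}
```

With the response above, usage would look like `saveBase64Image(imageData.b64Json ?? "", filename: "city.png")`.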
## Code Models

For code generation and analysis:
```swift
let response = try await together.createChatCompletion(
    request: TogetherChatRequest(
        model: "codellama/CodeLlama-70b-Instruct-hf",
        messages: [
            .system("You are an expert programmer."),
            .user("Write a Swift function to implement binary search")
        ],
        temperature: 0.1 // Lower values give more deterministic code
    )
)
```

## Popular Models
### Chat Models

| Model | Parameters | Best For |
|---|---|---|
| meta-llama/Llama-3.1-405B-Instruct-Turbo | 405B | Highest quality |
| meta-llama/Llama-3.1-70B-Instruct-Turbo | 70B | Best balance |
| meta-llama/Llama-3.1-8B-Instruct-Turbo | 8B | Fast responses |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | 8x22B | MoE efficiency |
| Qwen/Qwen2-72B-Instruct | 72B | Multilingual |
### Code Models

| Model | Description |
|---|---|
| codellama/CodeLlama-70b-Instruct-hf | Best code generation |
| deepseek-ai/deepseek-coder-33b-instruct | Code analysis |
| WizardLM/WizardCoder-Python-34B-V1.0 | Python specialist |
### Image Models

| Model | Description |
|---|---|
| black-forest-labs/FLUX.1-schnell-Free | Free tier, fast |
| black-forest-labs/FLUX.1-schnell | Faster generation |
| stabilityai/stable-diffusion-xl-base-1.0 | SDXL base |
### Embedding Models

| Model | Dimensions |
|---|---|
| togethercomputer/m2-bert-80M-8k-retrieval | 768 |
| BAAI/bge-large-en-v1.5 | 1024 |
| sentence-transformers/msmarco-bert-base-dot-v5 | 768 |
## JSON Mode

Get structured JSON responses:
```swift
let response = try await together.createChatCompletion(
    request: TogetherChatRequest(
        model: "meta-llama/Llama-3.1-70B-Instruct-Turbo",
        messages: [
            .system("Return JSON only"),
            .user("List 3 programming languages with their use cases")
        ],
        responseFormat: .json
    )
)
```
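The content string that comes back is JSON, so it can be decoded with `Codable`. A sketch, assuming the model honors the requested shape — the `ProgrammingLanguage` struct here is illustrative, not an SDK type:

```swift
import Foundation

// Illustrative shape for the model's JSON output; adjust keys to match
// whatever structure you instruct the model to produce.
struct ProgrammingLanguage: Codable {
    let name: String
    let useCase: String
}

/// Parses the model's JSON output into typed values; returns [] on malformed JSON.
func parseLanguages(from content: String) -> [ProgrammingLanguage] {
    guard let data = content.data(using: .utf8) else { return [] }
    return (try? JSONDecoder().decode([ProgrammingLanguage].self, from: data)) ?? []
}
```

Usage: `let languages = parseLanguages(from: response.choices.first?.message?.content ?? "")`. Always handle the empty-array case — even in JSON mode, models can occasionally emit output that does not match your expected schema.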
## Pricing

Together AI offers competitive pricing for open-source models. Generally:
- Inference: $0.10-0.90 per million tokens (varies by model)
- Images: $0.003-0.025 per image
- Embeddings: $0.008 per million tokens
Check their pricing page for current rates.