ElevenLabs 🔊

Create natural-sounding voices with AI text-to-speech, voice cloning, and audio processing.

What you can do: Text-to-speech (29 languages), speech-to-speech voice conversion, instant voice cloning, sound effects generation, and audio isolation.

Setup

Add your ElevenLabs API key in the ProtectMyAPI Dashboard.

Text to Speech

Basic TTS

let elevenlabs = ProtectMyAPI.elevenLabsService()
 
let audio = try await elevenlabs.textToSpeech(
    text: "Hello! Welcome to my app. How can I help you today?",
    voiceId: "EXAVITQu4vr4xnSDxMaL", // Sarah
    modelId: .multilingualV2
)
 
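// Play the audio (requires `import AVFoundation`). Keep a strong reference
// to the player, otherwise playback stops when it is deallocated.
let player = try AVAudioPlayer(data: audio)
player.play()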

val elevenlabs = ProtectMyAPIAI.elevenLabsService()
 
val audio = elevenlabs.textToSpeech(
    text = "Hello! Welcome to my app. How can I help you today?",
    voiceId = "EXAVITQu4vr4xnSDxMaL", // Sarah
    modelId = ElevenLabsModel.MULTILINGUAL_V2
)
 
// Play the audio
mediaPlayer.setDataSource(audio)
mediaPlayer.prepare()
mediaPlayer.start()

final elevenlabs = ProtectMyAPIAI.elevenLabsService();
 
final audio = await elevenlabs.textToSpeech(
  text: "Hello! Welcome to my app. How can I help you today?",
  voiceId: "EXAVITQu4vr4xnSDxMaL", // Sarah
  modelId: ElevenLabsModel.multilingualV2,
);
 
// Play the audio
await audioPlayer.play(BytesSource(audio));

With Voice Settings

let audio = try await elevenlabs.textToSpeech(
    text: "This is a dramatic reading of the news.",
    voiceId: "EXAVITQu4vr4xnSDxMaL",
    modelId: .multilingualV2,
    voiceSettings: ElevenLabsVoiceSettings(
        stability: 0.3,        // Lower = more expressive
        similarityBoost: 0.8,  // Higher = more like original voice
        style: 0.7,            // Higher = more stylized
        useSpeakerBoost: true  // Enhance clarity
    ),
    outputFormat: .mp3_44100_192
)

val audio = elevenlabs.textToSpeech(
    text = "This is a dramatic reading of the news.",
    voiceId = "EXAVITQu4vr4xnSDxMaL",
    modelId = ElevenLabsModel.MULTILINGUAL_V2,
    voiceSettings = VoiceSettings(
        stability = 0.3f,
        similarityBoost = 0.8f,
        style = 0.7f,
        useSpeakerBoost = true
    ),
    outputFormat = OutputFormat.MP3_44100_192
)

final audio = await elevenlabs.textToSpeech(
  text: "This is a dramatic reading of the news.",
  voiceId: "EXAVITQu4vr4xnSDxMaL",
  modelId: ElevenLabsModel.multilingualV2,
  voiceSettings: VoiceSettings(
    stability: 0.3,
    similarityBoost: 0.8,
    style: 0.7,
    useSpeakerBoost: true,
  ),
  outputFormat: OutputFormat.mp3_44100_192,
);

Streaming TTS (Real-time)

for try await chunk in elevenlabs.streamTextToSpeech(
    text: "This is a very long text that will be streamed...",
    voiceId: "EXAVITQu4vr4xnSDxMaL"
) {
    // Play each chunk as it arrives
    audioPlayer.append(chunk)
}

elevenlabs.streamTextToSpeech(
    text = "This is a very long text that will be streamed...",
    voiceId = "EXAVITQu4vr4xnSDxMaL"
).collect { chunk ->
    // Play each chunk as it arrives
    audioPlayer.append(chunk)
}

await for (final chunk in elevenlabs.streamTextToSpeech(
  text: "This is a very long text that will be streamed...",
  voiceId: "EXAVITQu4vr4xnSDxMaL",
)) {
  // Play each chunk as it arrives
  audioPlayer.append(chunk);
}
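
If the audio needs to be saved or post-processed instead of played as it arrives, the chunks can be collected into a single buffer. The sketch below assumes each streamed chunk is a `Data` value, which the snippets above do not confirm. `collectAudio` and `demoStream` are hypothetical names, not part of the SDK, and `demoStream` only stands in for `streamTextToSpeech`.

```swift
import Foundation

/// Hypothetical helper: appends every chunk of an async sequence to one buffer.
/// Assumes the chunks are `Data` values.
func collectAudio<S: AsyncSequence>(from stream: S) async throws -> Data
where S.Element == Data {
    var buffer = Data()
    for try await chunk in stream {
        buffer.append(chunk)
    }
    return buffer
}

/// Stand-in for a real audio stream: yields two small chunks, then finishes.
func demoStream() -> AsyncStream<Data> {
    AsyncStream { continuation in
        continuation.yield(Data([0x01, 0x02]))
        continuation.yield(Data([0x03]))
        continuation.finish()
    }
}
```

Buffering the whole response gives up the latency benefit of streaming, so this fits cases such as caching a clip to disk rather than live playback.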

Speech to Speech

Convert voice recordings to another voice while preserving emotion:

let converted = try await elevenlabs.speechToSpeech(
    audio: originalRecording,
    voiceId: "targetVoiceId",
    modelId: "eleven_english_sts_v2",
    removeBackgroundNoise: true
)

val converted = elevenlabs.speechToSpeech(
    audio = originalRecording,
    voiceId = "targetVoiceId",
    modelId = "eleven_english_sts_v2",
    removeBackgroundNoise = true
)

final converted = await elevenlabs.speechToSpeech(
  audio: originalRecording,
  voiceId: "targetVoiceId",
  modelId: "eleven_english_sts_v2",
  removeBackgroundNoise: true,
);

Voice Cloning

Instant Voice Clone

Create a voice clone from audio samples (at least 1 minute of audio is recommended):

let clonedVoice = try await elevenlabs.createVoiceClone(
    name: "My Custom Voice",
    description: "A warm, friendly voice for my app",
    files: [audioSample1, audioSample2, audioSample3],
    labels: ["accent": "american", "gender": "female"]
)
 
// Use the cloned voice
let audio = try await elevenlabs.textToSpeech(
    text: "Hello from my cloned voice!",
    voiceId: clonedVoice.voiceId
)

val clonedVoice = elevenlabs.createVoiceClone(
    name = "My Custom Voice",
    description = "A warm, friendly voice for my app",
    files = listOf(audioSample1, audioSample2, audioSample3),
    labels = mapOf("accent" to "american", "gender" to "female")
)
 
// Use the cloned voice
val audio = elevenlabs.textToSpeech(
    text = "Hello from my cloned voice!",
    voiceId = clonedVoice.voiceId
)

final clonedVoice = await elevenlabs.createVoiceClone(
  name: "My Custom Voice",
  description: "A warm, friendly voice for my app",
  files: [audioSample1, audioSample2, audioSample3],
  labels: {"accent": "american", "gender": "female"},
);
 
// Use the cloned voice
final audio = await elevenlabs.textToSpeech(
  text: "Hello from my cloned voice!",
  voiceId: clonedVoice.voiceId,
);

Sound Effects

Generate sound effects from text descriptions:

let soundEffect = try await elevenlabs.generateSoundEffect(
    text: "A thunderstorm with heavy rain and distant lightning",
    durationSeconds: 10
)

val soundEffect = elevenlabs.generateSoundEffect(
    text = "A thunderstorm with heavy rain and distant lightning",
    durationSeconds = 10
)

final soundEffect = await elevenlabs.generateSoundEffect(
  text: "A thunderstorm with heavy rain and distant lightning",
  durationSeconds: 10,
);

Audio Isolation

Remove background noise and isolate speech:

let cleanAudio = try await elevenlabs.isolateAudio(
    audio: noisyRecording
)

val cleanAudio = elevenlabs.isolateAudio(audio = noisyRecording)

final cleanAudio = await elevenlabs.isolateAudio(audio: noisyRecording);

Available Voices

Default Voices

Voice ID	Name	Style
`EXAVITQu4vr4xnSDxMaL`	Sarah	Soft, warm
`21m00Tcm4TlvDq8ikWAM`	Rachel	Clear, confident
`AZnzlk1XvdvUeBnXmlld`	Domi	Strong, expressive
`MF3mGyEYCl7XYWbV9V6O`	Elli	Young, bubbly
`TxGEqnHWrfWFTfGW9XjX`	Josh	Deep, authoritative
`VR6AewLTigWG4xSOukaG`	Arnold	American, casual
`pNInz6obpgDQGcFmaJgB`	Adam	Deep, narration
`yoZ06aMxZJJ28mfd3POQ`	Sam	Raspy, dynamic
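
To avoid scattering raw ID strings through an app, the table above can be wrapped in a small enum. This is only a convenience sketch: `DefaultVoice` is a hypothetical type, not something the SDK is known to provide, and the IDs are copied from the table.

```swift
/// Hypothetical enum mirroring the default voices table.
/// Each raw value is the voice ID listed in the table.
enum DefaultVoice: String, CaseIterable {
    case sarah  = "EXAVITQu4vr4xnSDxMaL"
    case rachel = "21m00Tcm4TlvDq8ikWAM"
    case domi   = "AZnzlk1XvdvUeBnXmlld"
    case elli   = "MF3mGyEYCl7XYWbV9V6O"
    case josh   = "TxGEqnHWrfWFTfGW9XjX"
    case arnold = "VR6AewLTigWG4xSOukaG"
    case adam   = "pNInz6obpgDQGcFmaJgB"
    case sam    = "yoZ06aMxZJJ28mfd3POQ"

    /// The string to pass as the `voiceId` argument.
    var voiceId: String { rawValue }
}
```

A call site would then read `voiceId: DefaultVoice.sarah.voiceId` instead of repeating the literal.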

Models

Model	Description	Best For
`eleven_multilingual_v2`	29 languages, emotional	Most use cases
`eleven_turbo_v2_5`	Low latency	Real-time apps
`eleven_english_sts_v2`	English speech-to-speech	Voice conversion

Voice Settings Guide

Setting	Range	Effect
`stability`	0.0 - 1.0	Lower = more expressive, Higher = more consistent
`similarityBoost`	0.0 - 1.0	Higher = closer to original voice
`style`	0.0 - 1.0	Higher = more stylized delivery
`useSpeakerBoost`	bool	Enhances clarity
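
Because the three numeric settings share the 0.0 to 1.0 range, values coming from sliders or user input can be clamped before they are sent. The sketch below uses a hypothetical `ClampedVoiceSettings` type for illustration. It is not the SDK's `ElevenLabsVoiceSettings`, and whether the SDK validates these ranges itself is not stated here.

```swift
/// Hypothetical value type that clamps the numeric settings to 0.0...1.0.
struct ClampedVoiceSettings {
    let stability: Double
    let similarityBoost: Double
    let style: Double
    let useSpeakerBoost: Bool

    init(stability: Double, similarityBoost: Double, style: Double,
         useSpeakerBoost: Bool = true) {
        self.stability = Self.clamp(stability)
        self.similarityBoost = Self.clamp(similarityBoost)
        self.style = Self.clamp(style)
        self.useSpeakerBoost = useSpeakerBoost
    }

    /// Pins a value to the documented 0.0...1.0 range.
    private static func clamp(_ value: Double) -> Double {
        min(max(value, 0.0), 1.0)
    }
}
```

The clamped values can then be copied into the SDK's own settings type when building a request.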

Output Formats

Format	Quality	File Size
`mp3_44100_64`	Good	Small
`mp3_44100_128`	Better	Medium
`mp3_44100_192`	Best	Large
`pcm_16000`	Raw PCM	For processing
`pcm_44100`	High-res PCM	For processing
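
The format names appear to follow a `codec_sampleRate_bitrate` pattern, with the bitrate (in kbps) omitted for PCM. Under that assumption, a name can be parsed and an MP3 clip's size roughly estimated from its bitrate. `ParsedOutputFormat`, `parseOutputFormat`, and `estimatedMP3Bytes` are hypothetical helpers, and the estimate ignores container overhead.

```swift
/// Hypothetical breakdown of a format name such as "mp3_44100_192".
struct ParsedOutputFormat: Equatable {
    let codec: String
    let sampleRateHz: Int
    let bitrateKbps: Int?   // nil when the name carries no bitrate (PCM)
}

/// Splits a format name on underscores; returns nil if it does not match
/// the assumed `codec_sampleRate[_bitrate]` pattern.
func parseOutputFormat(_ name: String) -> ParsedOutputFormat? {
    let parts = name.split(separator: "_").map(String.init)
    guard parts.count == 2 || parts.count == 3,
          let sampleRate = Int(parts[1]) else { return nil }
    if parts.count == 3 {
        guard let bitrate = Int(parts[2]) else { return nil }
        return ParsedOutputFormat(codec: parts[0], sampleRateHz: sampleRate,
                                  bitrateKbps: bitrate)
    }
    return ParsedOutputFormat(codec: parts[0], sampleRateHz: sampleRate,
                              bitrateKbps: nil)
}

/// Rough MP3 size: bitrate (kbps) * 1000 / 8 bytes per second of audio.
func estimatedMP3Bytes(bitrateKbps: Int, seconds: Double) -> Int {
    Int(Double(bitrateKbps) * 1000.0 / 8.0 * seconds)
}
```

By this estimate, ten seconds at 192 kbps comes to about 240 KB, three times the size of the same clip at 64 kbps.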

Pricing Note

ElevenLabs charges per character. Check their pricing page for current rates.
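
Since billing is per character, it can help to estimate usage before sending text. The sketch below counts characters the way Swift does (user-perceived characters); how ElevenLabs counts them for billing may differ, so treat the result as an estimate. Both function names are hypothetical.

```swift
/// Estimated character count for one request (Swift `Character` count;
/// the provider's billing count may differ).
func estimatedCharacters(in text: String) -> Int {
    text.count
}

/// Returns true when all texts together fit within the remaining
/// character allowance.
func fitsBudget(_ texts: [String], remainingCharacters: Int) -> Bool {
    let total = texts.reduce(0) { $0 + estimatedCharacters(in: $1) }
    return total <= remainingCharacters
}
```

A check like `fitsBudget(pendingTexts, remainingCharacters: quota)` before a batch of requests keeps an app from exhausting its quota unexpectedly. Here `pendingTexts` and `quota` are placeholders for values the app tracks itself.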
