I Built Ollama for Voice — Introducing koelab

There are plenty of voice-changing tools out there. None of them are actually easy to use.

RVC is sitting on 700+ open issues with no signs of life. seed-vc is archived. vcclient won’t run on Mac and has no API. GPT-SoVITS takes two hours to set up. Every one of them is a tool — not a platform.

That’s why I built koelab.

The Concept: Ollama for Voice

Ollama established a new standard in the LLM world: pull a model, run it. That’s all koelab is trying to do — the same experience, but for voice models.

koelab pull seed-vc
koelab run seed-vc -i source.wav -r reference.wav -o output.wav

That’s it. If a GPU is available, koelab uses it automatically. If not, it falls back to CPU. Model downloads, caching, storage — koelab handles all of it.
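The automatic device choice can be pictured roughly as follows. This is a sketch, not koelab's actual code: the function names, the "cuda" device string, and the use of PyTorch for detection are all assumptions. The article itself only states that auto mode falls back to CPU and that --device cpu exists.

```python
import importlib.util
from typing import Optional


def gpu_available() -> bool:
    """Best-effort GPU check (assumes PyTorch + CUDA); False if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the sketch also runs without PyTorch

    return bool(torch.cuda.is_available())


def resolve_device(requested: str = "auto", has_gpu: Optional[bool] = None) -> str:
    """Map a requested device ("auto", "cpu", "cuda") to the device actually used."""
    if has_gpu is None:
        has_gpu = gpu_available()
    if requested == "auto":
        # Prefer the GPU when one is detected, otherwise fall back to CPU.
        return "cuda" if has_gpu else "cpu"
    if requested == "cuda" and not has_gpu:
        raise RuntimeError("GPU requested but none was detected")
    return requested
```

Passing has_gpu explicitly keeps the decision logic testable on machines without a GPU.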

What You Can Do

1. File-Based Voice Conversion

The simplest use case: convert source.wav so that it takes on the voice in a reference recording (target_voice.wav in the example below).

koelab run seed-vc \
  --input source.wav \
  --reference target_voice.wav \
  --output converted.wav \
  --steps 25 \
  --similarity 0.7

--steps controls the number of diffusion steps — higher means better quality but slower processing. --similarity sets how closely the output matches the reference voice (0.0–1.0).
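If you want to script conversions, for instance to compare several --steps values, a thin wrapper around the CLI is enough. The sketch below uses only the flags shown above. The helper names are hypothetical, and the default values are copied from the example rather than taken from the CLI's real defaults.

```python
import shlex
import subprocess
from typing import List


def build_run_command(model: str, source: str, reference: str, output: str,
                      steps: int = 25, similarity: float = 0.7) -> List[str]:
    """Build the argument list for `koelab run`, validating the tunable values."""
    if steps < 1:
        raise ValueError("steps must be a positive integer")
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0.0 and 1.0")
    return [
        "koelab", "run", model,
        "--input", source,
        "--reference", reference,
        "--output", output,
        "--steps", str(steps),
        "--similarity", str(similarity),
    ]


def convert(*args, **kwargs) -> None:
    """Run the conversion; raises CalledProcessError if koelab exits non-zero."""
    subprocess.run(build_run_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    # Print (rather than run) a small sweep over --steps to compare speed and quality.
    for steps in (10, 25, 50):
        cmd = build_run_command("seed-vc", "source.wav", "target_voice.wav",
                                f"converted_{steps}.wav", steps=steps)
        print(shlex.join(cmd))
```

Swap the print for convert(...) once the model has been pulled.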

2. Real-Time Voice Conversion

Convert microphone input to another voice in real time. A headset is strongly recommended.

koelab live seed-vc -r reference_voice.wav

It technically works with a laptop’s built-in mic and speakers, but audio from the speakers leaks back into the mic, which causes feedback. A headset is essentially required for practical use.

To check available devices and select specific ones:

koelab audio-devices

koelab live seed-vc \
  --reference voice.wav \
  --input-device 1 \
  --output-device 3 \
  --block-time 0.25 \
  --silence-threshold 0.003 \
  --save-output live.wav    # Save the converted output to a WAV file (optional)

On Windows, when selecting devices, make sure the input and output devices come from the same host API — koelab audio-devices shows the API name (MME, DirectSound, WASAPI, or WDM-KS).
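To get a feel for what --block-time and --silence-threshold mean, here is a small sketch of block sizing and silence gating. It rests on assumptions the article does not confirm: that samples are floats normalized to the range -1.0 to 1.0, and that the threshold is compared against the RMS level of each block. koelab's real gating logic may differ.

```python
import math
from typing import Sequence


def block_size(block_time: float, sample_rate: int) -> int:
    """Number of samples in one processing block of `block_time` seconds."""
    return int(round(block_time * sample_rate))


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block; 0.0 for an empty block."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_silent(samples: Sequence[float], threshold: float = 0.003) -> bool:
    """True when the block is quiet enough that conversion could be skipped."""
    return rms(samples) < threshold
```

At a 44.1 kHz sample rate, a 0.25 second block is 11,025 samples, so each block adds at least a quarter of a second of buffering before any model inference happens.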

3. Text-to-Speech (TTS)

koelab supports irodori-tts for Japanese TTS and xtts-v2 for multilingual TTS.

koelab pull irodori-tts
koelab speak irodori-tts --text "Hello from koelab." -o hello.wav

koelab pull xtts-v2
koelab speak xtts-v2 --text "Hello from koelab." --language en --reference reference.wav -o hello_en.wav

xtts-v2 requires --reference (the --no-ref flag is not supported).

xtts-v2 supports 17 languages, including English, Spanish, French, German, Japanese, and Korean.

4. WebUI + REST API

pip install "koelab[api]"
koelab serve

Open http://localhost:11435 in your browser to access the WebUI.

What you can do in the WebUI:

  • Record directly from your microphone and use it as the source audio — with a live waveform meter
  • Save reference voices to a library and give them names for easy reuse
  • Pin up to three favorite reference voices for quick access
  • Browse conversion history, replay results, download files, and reuse settings
  • Switch between fast / balanced / quality presets
  • Use the /tts page for text-to-speech with irodori-tts and xtts-v2

The API is built on FastAPI. Swagger UI is available at http://localhost:11435/docs.
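Because the server is a FastAPI app, the same schema that powers Swagger UI is normally also served as JSON at /openapi.json. That path is FastAPI's default, and this sketch assumes koelab has not changed it. The snippet lists whatever routes the running server reports without guessing at any endpoint names.

```python
import json
from typing import Dict, List, Tuple
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def fetch_schema(base_url: str = "http://localhost:11435") -> Dict:
    """Download the OpenAPI schema from a running `koelab serve` instance."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as resp:
        return json.load(resp)


def list_routes(schema: Dict) -> List[Tuple[str, str]]:
    """Return (METHOD, path) pairs found in an OpenAPI schema, sorted by path."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Skip non-operation keys such as "parameters" or "summary".
            if method.lower() in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes, key=lambda r: (r[1], r[0]))
```

With the server running, list_routes(fetch_schema()) gives a quick overview of the API before you open the Swagger UI.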

5. Model Registry

koelab fetches the list of available models from a remote registry (cached locally for one hour).

koelab models           # List all available models (fetched from remote registry)
koelab info seed-vc     # Show details: author, license, VRAM, supported languages, etc.
koelab search tts       # Search models by keyword
koelab list             # List installed models
koelab remove seed-vc   # Remove an installed model

Models currently in the registry:

Model        Type                Notes
seed-vc      Voice Conversion    AR+CFM pipeline, high quality
seed-vc-v1   Voice Conversion    DiT-based, fast
openvoice    Tone Color Cloning  By MyShell, lightweight (0.3 GB)
irodori-tts  Japanese TTS        Flow Matching, supports voice cloning
xtts-v2      Multilingual TTS    17 languages, by Coqui

Installation

Python 3.11 or 3.12 is recommended (3.11+ required). Avoid installing directly into your global Python environment — use a virtual environment.

# Using venv (Windows)
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install --upgrade pip

# Using uv
uv venv
uv pip install koelab

Base Install

pip install koelab

Note: pip install koelab alone does not install the dependencies needed to run any voice or TTS models. You need to install the appropriate extras for the models you want to use.

pip install "koelab[seed-vc]"    # for seed-vc and seed-vc-v1
pip install "koelab[openvoice]"  # for openvoice
pip install "koelab[irodori]"    # for irodori-tts
pip install "koelab[xtts]"       # for xtts-v2

With API + WebUI

pip install "koelab[api]"

[api] only adds FastAPI and uvicorn. You still need the model-specific extras above to actually run any models.

FFmpeg

Not strictly required, but strongly recommended. FFmpeg is needed for formats like .m4a, .aac, and .ogg. On Windows, autoffmpeg can handle the setup automatically.

Docker

git clone https://github.com/superdoccimo/koelab.git
cd koelab
docker compose build
docker compose run koelab pull seed-vc
docker compose run koelab run seed-vc \
  -i /app/audio/source.wav \
  -r /app/audio/reference.wav \
  -o /app/audio/output.wav

Under the Hood

Engine Abstraction

Every model implements the BaseEngine abstract base class.

BaseEngine
  ├── SeedVCEngine       (seed-vc, seed-vc-v1)
  ├── OpenVoiceEngine    (openvoice)
  └── BaseTTSEngine
        ├── IrodoriTTSEngine  (irodori-tts)
        └── XTTSv2Engine      (xtts-v2)

To add a new engine, inherit from BaseEngine, implement load(), convert(), and unload(), then add an entry to registry/models.json. That’s all it takes.
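The three-method lifecycle might look roughly like this. The article names the methods but not their signatures, so the BaseEngine below is an illustrative stand-in and the argument lists are assumptions. PassthroughEngine is a hypothetical engine that simply copies its input, just to show the load, convert, unload flow.

```python
import shutil
from abc import ABC, abstractmethod


class BaseEngine(ABC):
    """Illustrative stand-in for koelab's BaseEngine; real signatures may differ."""

    @abstractmethod
    def load(self) -> None:
        """Load model weights into memory."""

    @abstractmethod
    def convert(self, source_path: str, reference_path: str, output_path: str) -> str:
        """Convert the source audio and return the path of the written file."""

    @abstractmethod
    def unload(self) -> None:
        """Release model weights and any device memory."""


class PassthroughEngine(BaseEngine):
    """Hypothetical engine that copies the source unchanged."""

    def __init__(self) -> None:
        self.loaded = False

    def load(self) -> None:
        self.loaded = True

    def convert(self, source_path: str, reference_path: str, output_path: str) -> str:
        if not self.loaded:
            raise RuntimeError("call load() before convert()")
        shutil.copyfile(source_path, output_path)  # a real engine would run inference here
        return output_path

    def unload(self) -> None:
        self.loaded = False
```

Because BaseEngine is abstract, forgetting any of the three methods in a subclass raises a TypeError at instantiation instead of failing later during a conversion.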

Remote Registry

Model metadata is fetched from registry/models.json on GitHub and cached locally for one hour, so there’s no network request on every command. It also works offline using the last cached data.
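The caching behavior described here (a one-hour freshness window, with the stale cache as an offline fallback) can be sketched as below. The cache file layout, the function name, and the injected fetch callable are assumptions made for illustration; koelab's own implementation may be organized differently.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional

CACHE_TTL_SECONDS = 3600  # one hour, as stated in the article


def load_registry(cache_path: Path, fetch: Callable[[], dict],
                  now: Optional[float] = None) -> dict:
    """Return model metadata, hitting the network only when the cache is stale."""
    now = time.time() if now is None else now
    cached = None
    if cache_path.exists():
        cached = json.loads(cache_path.read_text(encoding="utf-8"))
        if now - cached["fetched_at"] < CACHE_TTL_SECONDS:
            return cached["models"]  # fresh cache: no network request
    try:
        models = fetch()
    except OSError:  # urllib's network errors are OSError subclasses
        if cached is not None:
            return cached["models"]  # offline: reuse the last cached data
        raise
    cache_path.write_text(
        json.dumps({"fetched_at": now, "models": models}), encoding="utf-8"
    )
    return models
```

Injecting fetch and now keeps the function easy to test without touching the network or the clock.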

The doctor Command

Diagnose environment issues before they become problems.

koelab doctor              # Check Python, FFmpeg, Git, and TTS runtime dependencies
koelab doctor irodori-tts  # Diagnose irodori-tts specifically
koelab doctor xtts-v2      # Diagnose xtts-v2 specifically

When setup isn’t working, this is the first thing to run.
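If you want similar checks inside your own scripts, the basics are easy to approximate. This sketch covers only the Python version and whether ffmpeg and git are on PATH. It is not how koelab doctor is implemented, and the function names are hypothetical.

```python
import shutil
import sys
from typing import Dict

MIN_PYTHON = (3, 11)  # the article lists 3.11+ as required


def check_environment() -> Dict[str, bool]:
    """Report whether the basic prerequisites are present."""
    return {
        "python": sys.version_info >= MIN_PYTHON,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "git": shutil.which("git") is not None,
    }


def format_report(results: Dict[str, bool]) -> str:
    """Render the results as one status line per check."""
    return "\n".join(
        f"[{'ok' if passed else 'MISSING'}] {name}" for name, passed in results.items()
    )
```

Calling print(format_report(check_environment())) prints one line per prerequisite.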

FAQ

Does it work without a GPU? Yes. Pass --device cpu, or leave it on auto and it will fall back to CPU automatically. Expect significantly slower processing times.

Does it work on Windows? Yes. The CI pipeline runs smoke tests on Windows with Python 3.11 and 3.12.

How much latency does real-time conversion add? With --block-time 0.1 (100ms blocks) on a GPU, expect roughly 200–400ms. On CPU, it can exceed one second.

How do I get a new model added? Open a PR that adds an entry to registry/models.json. If the engine itself needs to be implemented, open an issue first to discuss it.

koelab is MIT-licensed. The xtts-v2 model is subject to the Coqui CPML license, which restricts commercial use. Run koelab info xtts-v2 for details.
