Text To Speech

The Text To Speech pipeline generates speech from text.

Example

The following shows a simple example using this pipeline.

from txtai.pipeline import TextToSpeech
# Create and run pipeline
tts = TextToSpeech()
tts("Say something here")
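
The pipeline returns raw waveform samples rather than an audio file. As a minimal sketch, assuming the samples are floats in the range [-1, 1] and the model generates audio at 22,050 Hz (both assumptions should be checked against the model in use), the standard library wave module can write them to disk. The write_wav helper is hypothetical, and a sine wave stands in for the pipeline output so the snippet runs without downloading a model.

```python
import math
import os
import struct
import tempfile
import wave

def write_wav(path, samples, rate=22050):
    """Write float samples in [-1, 1] to a 16-bit mono WAV file (hypothetical helper)."""
    frames = bytearray()
    for s in samples:
        # Clip to the valid range, then scale to signed 16-bit PCM
        s = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(s * 32767))

    with wave.open(path, "wb") as f:
        f.setnchannels(1)      # mono
        f.setsampwidth(2)      # 2 bytes per sample (16-bit)
        f.setframerate(rate)   # assumed sample rate
        f.writeframes(bytes(frames))

# Stand-in for the pipeline output: one second of a 440 Hz tone
samples = [0.5 * math.sin(2 * math.pi * 440 * i / 22050) for i in range(22050)]

path = os.path.join(tempfile.gettempdir(), "speech.wav")
write_wav(path, samples)
```

With the pipeline installed, passing the array returned by tts(...) in place of samples should work the same way, since the helper only iterates over the values.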

See the link below for a more detailed example.

Notebook	Description
Text to speech generation	Generate speech from text

This pipeline is backed by ONNX models from the Hugging Face Hub. When no model path is given, it loads neuml/ljspeech-jets-onnx.

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.

config.yml

# Create pipeline using lower case class name
texttospeech:
# Run pipeline with workflow
workflow:
  tts:
    tasks:
      - action: texttospeech

Run with Workflows

from txtai.app import Application
# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("tts", ["Say something here"]))

Run with API

CONFIG=config.yml uvicorn "txtai.api:app" &
curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"tts", "elements":["Say something here"]}'
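
The same request can be built from Python. The sketch below constructs the POST request with the standard library. workflow_request is a hypothetical helper, and the default URL assumes the server started above is listening on localhost:8000. The request is only constructed here, not sent.

```python
import json
import urllib.request

def workflow_request(name, elements, url="http://localhost:8000/workflow"):
    """Build a POST request that mirrors the curl call (hypothetical helper)."""
    body = json.dumps({"name": name, "elements": elements}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = workflow_request("tts", ["Say something here"])
```

Once the API is running, urllib.request.urlopen(request) sends it.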

Methods

Python documentation for the pipeline.

Creates a new TextToSpeech pipeline.

Parameters:

Name	Type	Description	Default
`path`		optional Hugging Face model hub id	`None`
`maxtokens`		maximum number of tokens model can process, defaults to 512	`512`

Source code in txtai/pipeline/audio/texttospeech.py

def __init__(self, path=None, maxtokens=512):
    """
    Creates a new TextToSpeech pipeline.

    Args:
        path: optional Hugging Face model hub id
        maxtokens: maximum number of tokens model can process, defaults to 512
    """

    if not TTS:
        raise ImportError('TextToSpeech pipeline is not available - install "pipeline" extra to enable')

    # Default path
    path = path if path else "neuml/ljspeech-jets-onnx"

    # Get path to model and config
    config = hf_hub_download(path, filename="config.yaml")
    model = hf_hub_download(path, filename="model.onnx")

    # Read yaml config
    with open(config, "r", encoding="utf-8") as f:
        config = yaml.safe_load(f)

    # Create tokenizer
    tokens = config.get("token", {}).get("list")
    self.tokenizer = TTSTokenizer(tokens)

    # Create ONNX Session
    self.model = ort.InferenceSession(model, ort.SessionOptions(), self.providers())

    # Max number of input tokens model can handle
    self.maxtokens = maxtokens

    # Get model input name, typically "text"
    self.input = self.model.get_inputs()[0].name
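
The constructor reads the tokenizer vocabulary from a nested key in the model's config.yaml. The sketch below shows that lookup on a hypothetical parsed config. The token values are made up, token_list is a hypothetical helper, and mapping each token to its list position is an assumption for illustration, not a statement of how TTSTokenizer works internally.

```python
# Hypothetical parsed config, shaped like the structure the constructor reads
config = {"token": {"list": ["<blank>", "<unk>", "AH0", "T"]}}

def token_list(config):
    # Chained .get calls return None instead of raising when a key is missing
    return config.get("token", {}).get("list")

tokens = token_list(config)
missing = token_list({})  # config without a "token" section

# Illustrative only: map each token to its position in the list
token_ids = {token: index for index, token in enumerate(tokens)}
```

A config without a token section yields None rather than an error at this step, so any failure would surface later, when the tokenizer is used.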

Generates speech from text. Text longer than maxtokens is split into batches, and the results are returned as a single waveform per text input.

This method accepts text as a string or a list. If the input is a string, a single waveform is returned. If the input is a list, a list of waveforms is returned.
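
The batching behavior described above can be pictured with a short sketch. This is not the library's implementation: batches and generate are hypothetical helpers, plain lists stand in for NumPy arrays, and fake_model is a stand-in that emits two samples per token.

```python
def batches(tokens, size):
    """Yield consecutive slices of at most `size` tokens."""
    for i in range(0, len(tokens), size):
        yield tokens[i:i + size]

def generate(tokens, synthesize, maxtokens=512):
    """Run each batch through `synthesize` and join the results into one waveform."""
    waveform = []
    for chunk in batches(tokens, maxtokens):
        waveform.extend(synthesize(chunk))
    return waveform

def fake_model(chunk):
    # Stand-in model: two output samples for every input token
    return [float(token) for token in chunk for _ in range(2)]

tokens = list(range(1200))  # longer than maxtokens, so three batches are needed
waveform = generate(tokens, fake_model)
```

The caller sees one continuous waveform regardless of how many batches were needed to produce it.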

Parameters:

Name	Type	Description	Default
`text`		input text, as a string or a list of strings	required

Returns:

Type	Description
	speech as a NumPy array waveform for a string input, or a list of waveforms for a list input

Source code in txtai/pipeline/audio/texttospeech.py

def __call__(self, text):
    """
    Generates speech from text. Text longer than maxtokens will be batched and returned
    as a single waveform per text input.

    This method supports files as a string or a list. If the input is a string,
    the return type is string. If text is a list, the return type is a list.

    Args:
        text: text|list

    Returns:
        list of speech as NumPy array waveforms
    """

    # Convert results to a list if necessary
    texts = [text] if isinstance(text, str) else text

    outputs = []
    for x in texts:
        # Truncate to max size model can handle
        x = self.tokenizer(x)

        # Run input through model and store result
        result = self.execute(x)
        outputs.append(result)

    # Return results
    return outputs[0] if isinstance(text, str) else outputs