Text To Speech


The Text To Speech pipeline generates speech from text.

Example

The following shows a simple example using this pipeline.

    from txtai.pipeline import TextToSpeech

    # Create and run pipeline
    tts = TextToSpeech()
    tts("Say something here")
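The pipeline returns speech as a NumPy waveform, not as an audio file. Below is a minimal sketch for saving that waveform with Python's standard `wave` module. It assumes the waveform holds floats in [-1.0, 1.0] and that the model's sample rate is 22050 Hz (the LJSpeech rate). Both are assumptions, and `write_wav` is a hypothetical helper that is not part of txtai.

```python
import array
import sys
import wave


def write_wav(path, waveform, rate=22050):
    """Write a mono float waveform (values in [-1.0, 1.0]) to a 16-bit PCM WAV file.

    Hypothetical helper. rate=22050 is an assumed sample rate; check the model's config.
    Accepts any iterable of numbers, including a 1-D NumPy array.
    """
    # Clip each sample to [-1, 1] and scale it to the signed 16-bit range
    samples = array.array(
        "h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in waveform)
    )

    # WAV stores samples little-endian; swap bytes on big-endian machines
    if sys.byteorder == "big":
        samples.byteswap()

    with wave.open(path, "wb") as f:
        f.setnchannels(1)  # mono
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        f.setframerate(rate)
        f.writeframes(samples.tobytes())
```

With txtai installed, `write_wav("speech.wav", tts("Say something here"))` would save the output of the example above. Confirm the sample rate against the model you load.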

See the link below for a more detailed example.

| Notebook | Description |
|---|---|
| Text to speech generation | Generate speech from text |

This pipeline is backed by ONNX models from the Hugging Face Hub. When no model path is given, it loads neuml/ljspeech-jets-onnx.

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.

config.yml

    # Create pipeline using lower case class name
    texttospeech:

    # Run pipeline with workflow
    workflow:
      tts:
        tasks:
          - action: texttospeech
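The following simplified sketch shows how a lower-case key such as `texttospeech` can resolve to the `TextToSpeech` class. The registry, the `create` function and the stand-in classes are all hypothetical. This section does not show txtai's own loading logic.

```python
class TextToSpeech:
    """Hypothetical stand-in for a pipeline class."""

    def __init__(self, path=None):
        self.path = path

    def __call__(self, text):
        return f"<speech for {text!r}>"


class Summary:
    """Another hypothetical stand-in pipeline."""

    def __call__(self, text):
        return text[:10]


# Hypothetical list of pipeline classes that configuration can refer to
PIPELINES = [TextToSpeech, Summary]


def create(config):
    """Instantiate each top-level config key that matches a lower-cased pipeline class name."""
    registry = {cls.__name__.lower(): cls for cls in PIPELINES}
    pipelines = {}
    for key, args in config.items():
        if key in registry:
            # A bare YAML key ("texttospeech:") parses to None, so treat it as no arguments
            pipelines[key] = registry[key](**(args or {}))
    return pipelines
```

Keys that do not name a pipeline, such as `workflow`, are skipped here. A separate step would handle them.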

Run with Workflows

    from txtai.app import Application

    # Create and run pipeline with workflow
    app = Application("config.yml")
    list(app.workflow("tts", ["Say something here"]))
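The workflow call is wrapped in `list(...)`, which suggests the workflow yields results lazily. The hypothetical generator below sketches that behavior: no task runs until the results are consumed. It is a simplification and ignores the batching that real workflows perform.

```python
def workflow(tasks, elements):
    """Hypothetical sketch: pass each element through every task action, one at a time."""
    for element in elements:
        for action in tasks:
            element = action(element)
        # Nothing above executes until the caller iterates over this generator
        yield element
```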

Run with API

    CONFIG=config.yml uvicorn "txtai.api:app" &

    curl \
      -X POST "http://localhost:8000/workflow" \
      -H "Content-Type: application/json" \
      -d '{"name":"tts", "elements":["Say something here"]}'
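The same request can be sent from Python using only the standard library. This sketch assumes the API is running locally on port 8000, as in the curl command. `workflow_request` and `run_workflow` are hypothetical helper names. This section does not document the structure of the decoded response.

```python
import json
import urllib.request

URL = "http://localhost:8000/workflow"  # assumed local API address, as in the curl example


def workflow_request(name, elements, url=URL):
    """Build the POST request that the curl command sends, without sending it."""
    body = json.dumps({"name": name, "elements": elements}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_workflow(name, elements, url=URL):
    """Send the request and decode the JSON response (requires the API server to be running)."""
    with urllib.request.urlopen(workflow_request(name, elements, url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect before a server is available.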

Methods

Python documentation for the pipeline.

Creates a new TextToSpeech pipeline.

Parameters:

| Name | Description | Default |
|---|---|---|
| path | optional Hugging Face model hub id; neuml/ljspeech-jets-onnx is used when not set | None |
| maxtokens | maximum number of tokens the model can process | 512 |

Source code in txtai/pipeline/audio/texttospeech.py

    def __init__(self, path=None, maxtokens=512):
        """
        Creates a new TextToSpeech pipeline.

        Args:
            path: optional Hugging Face model hub id
            maxtokens: maximum number of tokens model can process, defaults to 512
        """

        if not TTS:
            raise ImportError("TextToSpeech pipeline is not available - install pipeline extra to enable")

        # Default path
        path = path if path else "neuml/ljspeech-jets-onnx"

        # Get path to model and config
        config = hf_hub_download(path, filename="config.yaml")
        model = hf_hub_download(path, filename="model.onnx")

        # Read yaml config
        with open(config, "r", encoding="utf-8") as f:
            config = yaml.safe_load(f)

        # Create tokenizer
        tokens = config.get("token", {}).get("list")
        self.tokenizer = TTSTokenizer(tokens)

        # Create ONNX Session
        self.model = ort.InferenceSession(model, ort.SessionOptions(), self.providers())

        # Max number of input tokens model can handle
        self.maxtokens = maxtokens

        # Get model input name, typically "text"
        self.input = self.model.get_inputs()[0].name
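The constructor reads the token list from the model's `config.yaml` and passes it to `TTSTokenizer`. The sketch below is a rough illustration of what a token list makes possible. It maps each entry to its index and then tokenizes text one character at a time. This is a deliberate simplification: this section does not show the real `TTSTokenizer`, which may convert text to phonemes first. The `<unk>` fallback entry is also an assumed convention. `build_vocab` and `tokenize` are hypothetical names.

```python
def build_vocab(config):
    """Map each entry of config["token"]["list"] to its position, which serves as the token ID."""
    tokens = config.get("token", {}).get("list") or []
    return {token: index for index, token in enumerate(tokens)}


def tokenize(text, vocab, unknown="<unk>"):
    """Hypothetical character-level tokenizer: one ID per character.

    Characters missing from the vocabulary map to the ID of the assumed <unk> entry.
    """
    fallback = vocab[unknown]
    return [vocab.get(ch, fallback) for ch in text.lower()]
```

For example, with the token list `["<blank>", "<unk>", "a", "b", " "]`, the text "AB c" becomes `[2, 3, 4, 1]`, because "c" is not in the list.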

Generates speech from text. Text longer than maxtokens is split into batches, and the batch outputs are combined into a single waveform per text input.

This method accepts text as a string or a list. If the input is a string, it returns a single waveform. If the input is a list, it returns a list of waveforms.
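The batching described above presumably happens inside `execute`, which this section does not list. The sketch below implements the documented behavior, using plain callables as stand-ins for the tokenizer and the ONNX model. Token IDs are split into chunks of at most `maxtokens`, and each chunk runs separately. The audio is then joined back into one waveform per input. Splitting at fixed token positions is an assumption, and the pipeline may choose split points differently. `batch` and `synthesize` are hypothetical names.

```python
def batch(tokens, maxtokens):
    """Split a list of token IDs into consecutive chunks of at most maxtokens IDs."""
    return [tokens[i : i + maxtokens] for i in range(0, len(tokens), maxtokens)]


def synthesize(text, tokenize, model, maxtokens=512):
    """Hypothetical sketch: batch long inputs and return one waveform per text.

    tokenize: callable mapping a string to a list of token IDs (stand-in for the tokenizer)
    model: callable mapping a list of token IDs to a list of samples (stand-in for the ONNX session)
    """
    texts = [text] if isinstance(text, str) else text

    outputs = []
    for x in texts:
        waveform = []
        # Run each chunk through the model and append its audio to this text's waveform
        for chunk in batch(tokenize(x), maxtokens):
            waveform.extend(model(chunk))
        outputs.append(waveform)

    # A string input returns a single waveform; a list input returns a list of waveforms
    return outputs[0] if isinstance(text, str) else outputs
```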

Parameters:

| Name | Description | Default |
|---|---|---|
| text | text to convert to speech, as a string or a list of strings | required |

Returns:

Speech as NumPy array waveforms: a single waveform for string input, or a list of waveforms for list input.

Source code in txtai/pipeline/audio/texttospeech.py

    def __call__(self, text):
        """
        Generates speech from text. Text longer than maxtokens will be batched and returned
        as a single waveform per text input.

        This method supports files as a string or a list. If the input is a string,
        the return type is string. If text is a list, the return type is a list.

        Args:
            text: text|list

        Returns:
            list of speech as NumPy array waveforms
        """

        # Convert results to a list if necessary
        texts = [text] if isinstance(text, str) else text

        outputs = []
        for x in texts:
            # Truncate to max size model can handle
            x = self.tokenizer(x)

            # Run input through model and store result
            result = self.execute(x)
            outputs.append(result)

        # Return results
        return outputs[0] if isinstance(text, str) else outputs