The LLM pipeline runs prompts through a large language model (LLM). It autodetects whether the model path refers to a text generation or a sequence-to-sequence model.


The following shows a simple example using this pipeline.

    from txtai.pipeline import LLM

    # Create and run LLM pipeline
    llm = LLM()
    llm(
        """
        Answer the following question using the provided context.
        Question:
        What are the applications of txtai?
        Context:
        txtai is an open-source platform for semantic search and
        workflows powered by language models.
        """
    )
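The prompt above follows a simple instruction, question and context layout. As a minimal sketch, a small helper can build that layout from variables; `build_prompt` is a hypothetical name for illustration and is not part of txtai.

```python
def build_prompt(question, context):
    """
    Formats a question and its context into the prompt layout used above.
    Hypothetical helper for illustration, not a txtai function.
    """

    return (
        "Answer the following question using the provided context.\n"
        "\n"
        "Question:\n"
        f"{question}\n"
        "\n"
        "Context:\n"
        f"{context}\n"
    )


prompt = build_prompt(
    "What are the applications of txtai?",
    "txtai is an open-source platform for semantic search and "
    "workflows powered by language models.",
)
print(prompt)
```

The resulting string can then be passed to the pipeline in place of the literal, for example llm(prompt).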

The LLM pipeline automatically detects the underlying model type (text-generation or sequence-sequence). The model type can also be set manually with the task parameter.

    from txtai.pipeline import LLM, Generator, Sequences

    # Set model type via task parameter
    llm = LLM("google/flan-t5-xl", task="sequence-sequence")

    # Create sequences pipeline (same as previous statement)
    sequences = Sequences("google/flan-t5-xl")

    # Set model type via task parameter
    llm = LLM("openlm-research/open_llama_3b", task="language-generation")

    # Create generator pipeline (same as previous statement)
    generator = Generator("openlm-research/open_llama_3b")
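To illustrate how an explicit task can take precedence over autodetection, here is a rough sketch. The mapping to the Hugging Face pipeline task names and the `resolve_task` function are assumptions made for illustration; they are not taken from the txtai source, and the detection step itself is represented only by the `detected` argument.

```python
# Assumed mapping from the task names used above to Hugging Face pipeline tasks
TASKS = {
    "language-generation": "text-generation",
    "sequence-sequence": "text2text-generation",
}


def resolve_task(task=None, detected=None):
    """
    Picks the pipeline task. An explicit task wins over the detected one.
    Hypothetical helper, not the txtai implementation.
    """

    name = task if task else detected
    if name not in TASKS:
        raise ValueError(f"Unknown or missing task: {name}")

    return TASKS[name]


# Explicit task overrides whatever was detected
print(resolve_task(task="sequence-sequence", detected="language-generation"))

# No explicit task, fall back to the detected type
print(resolve_task(detected="language-generation"))
```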

Models can be externally loaded and passed to pipelines. This is useful for models that are not yet supported by Transformers and/or need special initialization.

    import torch

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from txtai.pipeline import LLM

    # Load Falcon-7B-Instruct
    path = "tiiuae/falcon-7b-instruct"
    model = AutoModelForCausalLM.from_pretrained(
        path,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True
    )
    tokenizer = AutoTokenizer.from_pretrained(path)

    # Pass the loaded model and tokenizer as a tuple
    llm = LLM((model, tokenizer))

See the links below for more detailed examples.

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.


    # Create pipeline using lower case class name
    # Use `generator` or `sequences` to force model type
    llm:

    # Run pipeline with workflow
    workflow:
      llm:
        tasks:
          - action: llm

Similar to the Python example above, the underlying Hugging Face pipeline parameters and model parameters can be set in pipeline configuration.

    llm:
      path: tiiuae/falcon-7b-instruct
      torch_dtype: torch.bfloat16
      trust_remote_code: True
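The two configuration fragments above can be combined into a single file. The sketch below writes such a file with the standard library only; combining the fragments this way and the `write_config` helper are assumptions for illustration, and the file name config.yml simply matches the name used in the examples that follow.

```python
import tempfile

from pathlib import Path

# Combined configuration: pipeline settings plus a workflow that calls the pipeline
CONFIG = """\
# LLM pipeline, keyed by the lower case class name
llm:
  path: tiiuae/falcon-7b-instruct
  torch_dtype: torch.bfloat16
  trust_remote_code: True

# Workflow that runs the pipeline
workflow:
  llm:
    tasks:
      - action: llm
"""


def write_config(directory, name="config.yml"):
    """
    Writes the configuration to directory/name and returns the path.
    Hypothetical helper for illustration.
    """

    path = Path(directory) / name
    path.write_text(CONFIG, encoding="utf-8")
    return path


# Write to a temporary directory and show the result
with tempfile.TemporaryDirectory() as directory:
    config = write_config(directory)
    print(config.read_text(encoding="utf-8"))
```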

Run with Workflows

    from txtai.app import Application

    # Create and run pipeline with workflow
    app = Application("config.yml")
    list(app.workflow("llm", [
        """
        Answer the following question using the provided context.
        Question:
        What are the applications of txtai?
        Context:
        txtai is an open-source platform for semantic search and
        workflows powered by language models.
        """
    ]))

Run with API

    CONFIG=config.yml uvicorn "txtai.api:app" &

    # The workflow name must match the workflow defined in config.yml
    curl \
      -X POST "http://localhost:8000/workflow" \
      -H "Content-Type: application/json" \
      -d '{"name":"llm", "elements": ["Answer the following question..."]}'
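The same request can be prepared from Python. The sketch below only builds the request with the standard library and does not send it; `build_workflow_request` is a hypothetical helper, and the workflow name llm is assumed to match the workflow defined in the configuration shown earlier.

```python
import json

from urllib.request import Request


def build_workflow_request(name, elements, url="http://localhost:8000/workflow"):
    """
    Builds a POST request for the workflow endpoint used in the curl example.
    Hypothetical helper for illustration.
    """

    payload = json.dumps({"name": name, "elements": elements}).encode("utf-8")
    return Request(url, data=payload, headers={"Content-Type": "application/json"}, method="POST")


request = build_workflow_request("llm", ["Answer the following question..."])

# Inspect the request before sending it
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Once the API server is running, the request can be sent with urllib.request.urlopen(request) and the JSON response read from the returned object.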


Python documentation for the pipeline.

__init__(self, path=None, quantize=False, gpu=True, model=None, task=None, **kwargs)

    def __init__(self, path=None, quantize=False, gpu=True, model=None, task=None, **kwargs):
        super().__init__(self.task(path, task, **kwargs), path if path else "google/flan-t5-base", quantize, gpu, model, **kwargs)

        # Load tokenizer, if necessary
        self.pipeline.tokenizer = self.pipeline.tokenizer if self.pipeline.tokenizer else Models.tokenizer(path, **kwargs)

__call__(self, text, prefix=None, maxlength=512, workers=0, **kwargs)

Generates text using input text.

Parameters:

    text: text|list
    prefix: optional prefix to prepend to text elements (default: None)
    maxlength: maximum sequence length (default: 512)
    workers: number of concurrent workers to use for processing data (default: 0)
    kwargs: additional generation keyword arguments

Returns:

    generated text

    def __call__(self, text, prefix=None, maxlength=512, workers=0, **kwargs):
        """
        Generates text using input text

        Args:
            text: text|list
            prefix: optional prefix to prepend to text elements
            maxlength: maximum sequence length
            workers: number of concurrent workers to use for processing data, defaults to 0
            kwargs: additional generation keyword arguments

        Returns:
            generated text
        """

        # List of texts
        texts = text if isinstance(text, list) else [text]

        # Add prefix, if necessary
        if prefix:
            texts = [f"{prefix}{x}" for x in texts]

        # Run pipeline
        results = self.pipeline(texts, max_length=maxlength, num_workers=workers, **kwargs)

        # Get generated text
        results = [self.clean(texts[x], result) for x, result in enumerate(results)]

        return results[0] if isinstance(text, str) else results
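The input handling in the method above means a single string returns a single string, while a list returns a list, with any prefix prepended to each element first. The sketch below mirrors that contract with a stand-in class that loads no model; `EchoLLM` and its uppercase "generation" step are hypothetical and exist only to make the behavior easy to observe.

```python
class EchoLLM:
    """
    Stand-in that mirrors the input and output handling shown above.
    Hypothetical class for illustration; it uppercases text instead of running a model.
    """

    def __call__(self, text, prefix=None):
        # Normalize input to a list of texts
        texts = text if isinstance(text, list) else [text]

        # Add prefix, if necessary
        if prefix:
            texts = [f"{prefix}{x}" for x in texts]

        # Placeholder for the model call
        results = [x.upper() for x in texts]

        # String in, string out. List in, list out.
        return results[0] if isinstance(text, str) else results


llm = EchoLLM()

# Single string returns a single string
print(llm("hello"))

# List returns a list, prefix is applied to each element
print(llm(["first", "second"], prefix="translate: "))
```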