Caption

The caption pipeline reads a list of images and returns a list of captions for those images.

Example

The following shows a simple example using this pipeline.

    from txtai.pipeline import Caption

    # Create and run pipeline
    caption = Caption()
    caption("path to image file")

See the link below for a more detailed example.

Notebook | Description
Generate image captions and detect objects | Captions and object detection for images

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.

config.yml

    # Create pipeline using lower case class name
    caption:

    # Run pipeline with workflow
    workflow:
      caption:
        tasks:
          - action: caption
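
To make the lower-case naming convention concrete, the sketch below builds a plain Python dict with the same shape as config.yml. The helpers `pipeline_key` and `build_config` are hypothetical names used only for illustration and are not part of txtai; the dict simply mirrors what a YAML parser would produce for the file above.

```python
import json


def pipeline_key(class_name):
    """Configuration key for a pipeline: the lower-case class name (hypothetical helper)."""
    return class_name.lower()


def build_config(class_name, workflow_name):
    """Builds a dict shaped like config.yml (hypothetical helper, not a txtai function)."""
    key = pipeline_key(class_name)
    return {
        # A YAML key with no value, such as `caption:`, parses to None
        key: None,
        # A workflow whose single task calls the pipeline by its key
        "workflow": {workflow_name: {"tasks": [{"action": key}]}},
    }


config = build_config("Caption", "caption")
print(json.dumps(config, indent=2))
```

The workflow name and the pipeline key are independent: only the `action` value has to match the pipeline key.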

Run with Workflows

    from txtai.app import Application

    # Create and run pipeline with workflow
    app = Application("config.yml")
    list(app.workflow("caption", ["path to image file"]))

Run with API

    CONFIG=config.yml uvicorn "txtai.api:app" &

    curl \
      -X POST "http://localhost:8000/workflow" \
      -H "Content-Type: application/json" \
      -d '{"name":"caption", "elements":["path to image file"]}'
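
The same request can be issued from Python. The following is a minimal sketch using only the standard library; it reuses the URL, header, and JSON body from the curl command above. The function names `build_workflow_request` and `run_workflow` are hypothetical, and decoding the response as JSON is an assumption about the API's reply format.

```python
import json
import urllib.request


def build_workflow_request(name, elements, base_url="http://localhost:8000"):
    """Builds the POST /workflow request without sending it."""
    payload = json.dumps({"name": name, "elements": elements}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/workflow",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_workflow(name, elements, base_url="http://localhost:8000"):
    """Sends the request; requires the API server to be running.

    Assumes the response body is JSON.
    """
    request = build_workflow_request(name, elements, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Build (but do not send) the request to inspect it
request = build_workflow_request("caption", ["path to image file"])
print(request.get_method(), request.full_url)
```

With the server started as shown above, `run_workflow("caption", ["path to image file"])` would send the request.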

Methods

Python documentation for the pipeline.

Source code in txtai/pipeline/image/caption.py

    def __init__(self, path=None, quantize=False, gpu=True, model=None, **kwargs):
        if not PIL:
            raise ImportError('Captions pipeline is not available - install "pipeline" extra to enable')

        # Call parent constructor
        super().__init__("image-to-text", path, quantize, gpu, model, **kwargs)
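
The constructor checks a `PIL` name that the listing does not define; it is presumably a module-level flag recording whether the optional imaging dependency could be imported. The sketch below shows one common way to implement such a guard. `is_available` and `OptionalFeature` are hypothetical names, and this is an illustration of the pattern rather than the library's own code.

```python
import importlib.util


def is_available(module_name):
    """Returns True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


class OptionalFeature:
    """Hypothetical class that refuses to construct when its dependency is missing."""

    def __init__(self, module_name):
        if not is_available(module_name):
            raise ImportError(f"{module_name} is not available - install the matching extra to enable")
        self.module_name = module_name


# A standard library module is always present
feature = OptionalFeature("json")

# A missing module fails at construction time with a readable message
try:
    OptionalFeature("module_that_does_not_exist")
except ImportError as error:
    print(error)
```

Failing in the constructor means a missing dependency is reported when the pipeline is created instead of on its first call.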

Builds captions for images.

This method supports a single image or a list of images. If the input is a single image, the return type is a string. If the input is a list, a list of strings is returned.

Parameters:

Name      Type    Description    Default
images            image|list     required

Returns:

Type        Description
str|list    a caption string for a single image, or a list of captions for a list of images

Source code in txtai/pipeline/image/caption.py

    def __call__(self, images):
        """
        Builds captions for images.

        This method supports a single image or a list of images. If the input is an image, the return
        type is a string. If text is a list, a list of strings is returned

        Args:
            images: image|list

        Returns:
            list of captions
        """

        # Convert single element to list
        values = [images] if not isinstance(images, list) else images

        # Open images if file strings
        values = [Image.open(image) if isinstance(image, str) else image for image in values]

        # Get and clean captions
        captions = []
        for result in self.pipeline(values):
            text = " ".join([r["generated_text"] for r in result]).strip()
            captions.append(text)

        # Return single element if single element passed in
        return captions[0] if not isinstance(images, list) else captions
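
The single-or-list handling in `__call__` can be tried without loading a model. In the sketch below, `fake_pipeline` is a hypothetical stand-in for the image-to-text model that returns one list of `generated_text` dicts per input, matching the shape the method iterates over. Opening images from file paths is left out.

```python
def fake_pipeline(values):
    """Hypothetical stand-in for the model: one result list per input value."""
    return [[{"generated_text": f" caption for {value} "}] for value in values]


def caption(images, pipeline=fake_pipeline):
    """Mirrors the wrap, process, unwrap flow of the method shown above."""
    # Convert single element to list
    values = [images] if not isinstance(images, list) else images

    # Join and clean the generated text for each input
    captions = []
    for result in pipeline(values):
        captions.append(" ".join(r["generated_text"] for r in result).strip())

    # Return single element if single element passed in
    return captions[0] if not isinstance(images, list) else captions


print(caption("a.jpg"))             # caption for a.jpg
print(caption(["a.jpg", "b.jpg"]))  # ['caption for a.jpg', 'caption for b.jpg']
print(caption(["a.jpg"]))           # ['caption for a.jpg']
```

Because the check is `isinstance(images, list)`, a one-element list still returns a list, and any non-list input, including a tuple, is treated as a single element.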