ImageHash


The image hash pipeline generates perceptual image hashes, which can be used to detect duplicate and near-duplicate images. This method is not backed by machine learning models and is not intended to find conceptually similar images.
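To illustrate what a perceptual hash is, here is a simplified, pure-Python sketch of an average hash, the idea behind the pipeline's default `average` algorithm. It is not txtai's implementation: it takes a grayscale image as a grid of nested lists and replaces real image resizing with block averaging, assuming the grid dimensions are multiples of `size`. The helpers `average_hash` and `to_hex` are hypothetical.

```python
from typing import List


def average_hash(pixels: List[List[float]], size: int = 8) -> List[int]:
    """Simplified average hash over a grayscale grid (hypothetical helper).

    Assumes grid height and width are multiples of size; real
    implementations resize the image instead of block averaging.
    """
    rows, cols = len(pixels), len(pixels[0])
    if rows % size or cols % size:
        raise ValueError("grid dimensions must be multiples of size")

    bh, bw = rows // size, cols // size

    # Downscale to size x size cells by averaging each block of pixels
    cells = []
    for r in range(size):
        for c in range(size):
            block = [
                pixels[y][x]
                for y in range(r * bh, (r + 1) * bh)
                for x in range(c * bw, (c + 1) * bw)
            ]
            cells.append(sum(block) / len(block))

    # Each bit records whether a cell is brighter than the mean cell
    mean = sum(cells) / len(cells)
    return [1 if value > mean else 0 for value in cells]


def to_hex(bits: List[int]) -> str:
    """Pack a bit list (length a multiple of 4) into a hex string."""
    return format(int("".join(map(str, bits)), 2), f"0{len(bits) // 4}x")


# 16x16 grid: dark left half, bright right half
image = [[0] * 8 + [255] * 8 for _ in range(16)]
darker = [[value / 2 for value in row] for row in image]

print(to_hex(average_hash(image)))                  # 0f0f0f0f0f0f0f0f
print(average_hash(image) == average_hash(darker))  # True
```

Because every bit compares a cell to the mean of all cells, uniformly darkening the image leaves the hash unchanged. This is the kind of robustness that makes such hashes useful for near-duplicate detection, while semantically similar but visually different images get unrelated hashes.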

Example

The following shows a simple example using this pipeline.

  from txtai.pipeline import ImageHash

  # Create and run pipeline
  ihash = ImageHash()
  ihash("path to image file")
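With the default `strings=True`, each hash is a hex string. One common way to compare two such hashes, assuming both were produced with the same algorithm and size, is the Hamming distance: the number of bits that differ. The sketch below is not part of txtai; `hamming_distance`, `is_near_duplicate`, and the 5-bit threshold are hypothetical choices to tune for your own images.

```python
def hamming_distance(hash1: str, hash2: str) -> int:
    """Number of differing bits between two hex hash strings of equal length."""
    if len(hash1) != len(hash2):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 wherever the bits differ; count those 1s
    return bin(int(hash1, 16) ^ int(hash2, 16)).count("1")


def is_near_duplicate(hash1: str, hash2: str, threshold: int = 5) -> bool:
    """Hypothetical rule: hashes within threshold bits count as near-duplicates."""
    return hamming_distance(hash1, hash2) <= threshold


print(hamming_distance("0f0f0f0f0f0f0f0f", "0f0f0f0f0f0f0f0e"))   # 1
print(is_near_duplicate("0f0f0f0f0f0f0f0f", "f0f0f0f0f0f0f0f0"))  # False
```

With the pipeline, this might be used as `is_near_duplicate(ihash("a.jpg"), ihash("b.jpg"))`, where the file names are placeholders.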

See the notebook below for a more detailed example.

Notebook | Description
Near duplicate image detection | Identify duplicate and near-duplicate images

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.

config.yml

  # Create pipeline using lower case class name
  imagehash:

  # Run pipeline with workflow
  workflow:
    imagehash:
      tasks:
        - action: imagehash

Run with Workflows

  from txtai.app import Application

  # Create and run pipeline with workflow
  app = Application("config.yml")
  list(app.workflow("imagehash", ["path to image file"]))

Run with API

  CONFIG=config.yml uvicorn "txtai.api:app" &

  curl \
    -X POST "http://localhost:8000/workflow" \
    -H "Content-Type: application/json" \
    -d '{"name":"imagehash", "elements":["path to image file"]}'
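The same request can be sent from Python using only the standard library. This sketch mirrors the curl command above (URL, method, header, and JSON body); the helper names `workflow_request` and `run_workflow` are hypothetical, and the shape of the decoded response is not assumed here.

```python
import json
import urllib.request
from typing import Any, List

API_URL = "http://localhost:8000/workflow"


def workflow_request(name: str, elements: List[Any], url: str = API_URL) -> urllib.request.Request:
    """Build the same POST request as the curl command (hypothetical helper)."""
    body = json.dumps({"name": name, "elements": elements}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_workflow(name: str, elements: List[Any], url: str = API_URL) -> Any:
    """Send the request and decode the JSON response body (needs a running API)."""
    with urllib.request.urlopen(workflow_request(name, elements, url)) as response:
        return json.loads(response.read())


# With the API started as shown above:
# run_workflow("imagehash", ["path to image file"])
```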

Methods

Python documentation for the pipeline.

__init__(self, algorithm='average', size=8, strings=True)

Creates an ImageHash pipeline.

Parameters:

Name | Description | Default
algorithm | image hashing algorithm (average, perceptual, difference, wavelet, color) | 'average'
size | hash size | 8
strings | outputs hex strings if True (default), otherwise the pipeline returns numpy arrays | True
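When a bit-level view of a hex string hash is needed, a helper like the hypothetical `hex_to_bits` below can expand it. It assumes each hex digit encodes four hash bits, most significant bit first; it makes no assumption about the layout or dtype of the numpy arrays returned with `strings=False`.

```python
from typing import List


def hex_to_bits(value: str) -> List[int]:
    """Expand a hex hash string into 0/1 bits, most significant bit first."""
    width = len(value) * 4  # four bits per hex digit, keeps leading zeros
    return [int(bit) for bit in format(int(value, 16), f"0{width}b")]


bits = hex_to_bits("0f0f0f0f0f0f0f0f")
print(len(bits))  # 64
print(bits[:8])   # [0, 0, 0, 0, 1, 1, 1, 1]
```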

Source code in txtai/pipeline/image/imagehash.py

  def __init__(self, algorithm="average", size=8, strings=True):
      """
      Creates an ImageHash pipeline.

      Args:
          algorithm: image hashing algorithm (average, perceptual, difference, wavelet, color)
          size: hash size
          strings: outputs hex strings if True (default), otherwise the pipeline returns numpy arrays
      """

      if not PIL:
          raise ImportError('ImageHash pipeline is not available - install "pipeline" extra to enable')

      self.algorithm = algorithm
      self.size = size
      self.strings = strings

__call__(self, images)

Generates perceptual image hashes.

Parameters:

Name | Description | Default
images | a single image or a list of images; each item can be a PIL image or a path to an image file | required

Returns:

A single hash if a single image is passed in, otherwise a list of hashes (one per image, in input order).

Source code in txtai/pipeline/image/imagehash.py

  def __call__(self, images):
      """
      Generates perceptual image hashes.

      Args:
          images: image|list

      Returns:
          list of hashes
      """

      # Convert single element to list
      values = [images] if not isinstance(images, list) else images

      # Open images if file strings
      values = [Image.open(image) if isinstance(image, str) else image for image in values]

      # Convert images to hashes
      hashes = [self.ihash(image) for image in values]

      # Return single element if single element passed in
      return hashes[0] if not isinstance(images, list) else hashes
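Since a list input returns one hash per image in input order, the result can be grouped into clusters of near-duplicates. The sketch below is a hypothetical helper, not a txtai API: it greedily compares each hex hash with the first member of every existing group, the 5-bit threshold is an assumption, and group membership can depend on input order.

```python
from typing import List


def group_near_duplicates(hashes: List[str], threshold: int = 5) -> List[List[int]]:
    """Greedily group indices of hex hashes within threshold bits of a group's first member."""
    groups: List[List[int]] = []
    for index, value in enumerate(hashes):
        bits = int(value, 16)
        for group in groups:
            # Hamming distance to the group's representative (its first member)
            if bin(bits ^ int(hashes[group[0]], 16)).count("1") <= threshold:
                group.append(index)
                break
        else:
            # No close group found: start a new one
            groups.append([index])
    return groups


# e.g. hashes = ihash(["a.jpg", "b.jpg", "c.jpg"]) with placeholder file names
hashes = ["0f0f0f0f0f0f0f0f", "0f0f0f0f0f0f0f0e", "f0f0f0f0f0f0f0f0"]
print(group_near_duplicates(hashes))  # [[0, 1], [2]]
```

This pass compares each hash against every existing group, so its cost grows with the number of groups; for large collections, metric-space structures such as BK-trees are a common way to speed up Hamming-distance lookups.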