Entity


The Entity pipeline applies a token classifier to text and extracts entity/label combinations.

Example

The following shows a simple example using this pipeline.

```python
from txtai.pipeline import Entity

# Create and run pipeline
entity = Entity()
entity("Canada's last fully intact ice shelf has suddenly collapsed, "
       "forming a Manhattan-sized iceberg")
```

See the link below for a more detailed example.

| Notebook | Description |
|----------|-------------|
| Entity extraction workflows | Identify entity/label combinations |

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.

config.yml

```yaml
# Create pipeline using lower case class name
entity:

# Run pipeline with workflow
workflow:
  entity:
    tasks:
      - action: entity
```

Run with Workflows

```python
from txtai.app import Application

# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("entity", ["Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg"]))
```

Run with API

```bash
CONFIG=config.yml uvicorn "txtai.api:app" &

curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"entity", "elements": ["Canadas last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg"]}'
```

Methods

Python documentation for the pipeline.

Source code in `txtai/pipeline/text/entity.py`

```python
def __init__(self, path=None, quantize=False, gpu=True, model=None, **kwargs):
    super().__init__("token-classification", path, quantize, gpu, model, **kwargs)
```

Applies a token classifier to text and extracts entity/label combinations.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| text | text\|list | | required |
| labels | | list of entity type labels to accept, defaults to None which accepts all | None |
| aggregate | | method to combine multi token entities - options are "simple" (default), "first", "average" or "max" | 'simple' |
| flatten | | flatten output to a list of labels if present. Accepts a boolean or float value to only keep scores greater than that number. | None |
| join | | joins flattened output into a string if True, ignored if flatten not set | False |
| workers | | number of concurrent workers to use for processing data, defaults to 0 | 0 |

Returns:

| Type | Description |
|------|-------------|
| list | list of (entity, entity type, score) or list of entities depending on flatten parameter |
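The interaction between `flatten`, the score threshold and `join` can be seen without running a model. The minimal sketch below (not the txtai implementation, and the `result` entries are hypothetical model output) mirrors the shaping logic applied to raw token-classification results.

```python
# Hypothetical token-classification output for a single input text
result = [
    {"word": "Canada", "entity_group": "LOC", "score": 0.999},
    {"word": "Manhattan", "entity_group": "LOC", "score": 0.998},
]

def shape(result, flatten=None, join=False):
    """Sketch of how flatten/join reshape raw results."""

    # A boolean flatten keeps everything; a float flatten acts as a score threshold
    threshold = 0.0 if isinstance(flatten, bool) else flatten

    if flatten:
        # Keep only entity text, optionally joined into a single string
        output = [r["word"] for r in result if r["score"] >= threshold]
        return " ".join(output) if join else output

    # Default: (entity, entity type, score) tuples
    return [(r["word"], r["entity_group"], float(r["score"])) for r in result]

shape(result)                        # [('Canada', 'LOC', 0.999), ('Manhattan', 'LOC', 0.998)]
shape(result, flatten=True)          # ['Canada', 'Manhattan']
shape(result, flatten=0.9985)        # ['Canada']
shape(result, flatten=True, join=True)  # 'Canada Manhattan'
```

Note that passing a float for `flatten` both flattens the output and filters out entities scoring below that value.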

Source code in `txtai/pipeline/text/entity.py`

```python
def __call__(self, text, labels=None, aggregate="simple", flatten=None, join=False, workers=0):
    """
    Applies a token classifier to text and extracts entity/label combinations.

    Args:
        text: text|list
        labels: list of entity type labels to accept, defaults to None which accepts all
        aggregate: method to combine multi token entities - options are "simple" (default), "first", "average" or "max"
        flatten: flatten output to a list of labels if present. Accepts a boolean or float value to only keep scores greater than that number.
        join: joins flattened output into a string if True, ignored if flatten not set
        workers: number of concurrent workers to use for processing data, defaults to None

    Returns:
        list of (entity, entity type, score) or list of entities depending on flatten parameter
    """

    # Run token classification pipeline
    results = self.pipeline(text, aggregation_strategy=aggregate, num_workers=workers)

    # Convert results to a list if necessary
    if isinstance(text, str):
        results = [results]

    # Score threshold when flatten is set
    threshold = 0.0 if isinstance(flatten, bool) else flatten

    # Extract entities if flatten set, otherwise extract (entity, entity type, score) tuples
    outputs = []
    for result in results:
        if flatten:
            output = [r["word"] for r in result if self.accept(r["entity_group"], labels) and r["score"] >= threshold]
            outputs.append(" ".join(output) if join else output)
        else:
            outputs.append([(r["word"], r["entity_group"], float(r["score"])) for r in result if self.accept(r["entity_group"], labels)])

    return outputs[0] if isinstance(text, str) else outputs
```