Methods

Bases: Application

Base API template. The API is an extended txtai application, adding the ability to cluster API instances together.

Downstream applications can extend this base template to add/modify functionality.

Source code in txtai/api/base.py

```python
class API(Application):
    """
    Base API template. The API is an extended txtai application, adding the ability
    to cluster API instances together.

    Downstream applications can extend this base template to add/modify functionality.
    """

    def __init__(self, config, loaddata=True):
        super().__init__(config, loaddata)

        # Embeddings cluster
        self.cluster = None
        if self.config.get("cluster"):
            self.cluster = Cluster(self.config["cluster"])

    # pylint: disable=W0221
    def search(self, query, limit=None, weights=None, index=None, request=None):
        # When search is invoked via the API, limit is set from the request
        # When search is invoked directly, limit is set using the method parameter
        limit = self.limit(request.query_params.get("limit") if request and hasattr(request, "query_params") else limit)
        weights = self.weights(request.query_params.get("weights") if request and hasattr(request, "query_params") else weights)
        index = request.query_params.get("index") if request and hasattr(request, "query_params") else index

        if self.cluster:
            return self.cluster.search(query, limit, weights, index)

        return super().search(query, limit, weights, index)

    def batchsearch(self, queries, limit=None, weights=None, index=None):
        if self.cluster:
            return self.cluster.batchsearch(queries, self.limit(limit), weights, index)

        return super().batchsearch(queries, limit, weights, index)

    def add(self, documents):
        """
        Adds a batch of documents for indexing.

        Downstream applications can override this method to also store full documents in an external system.

        Args:
            documents: list of {id: value, text: value}

        Returns:
            unmodified input documents
        """

        if self.cluster:
            self.cluster.add(documents)
        else:
            super().add(documents)

        return documents

    def index(self):
        """
        Builds an embeddings index for previously batched documents.
        """

        if self.cluster:
            self.cluster.index()
        else:
            super().index()

    def upsert(self):
        """
        Runs an embeddings upsert operation for previously batched documents.
        """

        if self.cluster:
            self.cluster.upsert()
        else:
            super().upsert()

    def delete(self, ids):
        """
        Deletes from an embeddings index. Returns list of ids deleted.

        Args:
            ids: list of ids to delete

        Returns:
            ids deleted
        """

        if self.cluster:
            return self.cluster.delete(ids)

        return super().delete(ids)

    def reindex(self, config, function=None):
        """
        Recreates this embeddings index using config. This method only works if document content storage is enabled.

        Args:
            config: new config
            function: optional function to prepare content for indexing
        """

        if self.cluster:
            self.cluster.reindex(config, function)
        else:
            super().reindex(config, function)

    def count(self):
        """
        Total number of elements in this embeddings index.

        Returns:
            number of elements in embeddings index
        """

        if self.cluster:
            return self.cluster.count()

        return super().count()

    def limit(self, limit):
        """
        Parses the number of results to return from the request. Allows a range of 1-250, with a default of 10.

        Args:
            limit: limit parameter

        Returns:
            bounded limit
        """

        # Return between 1 and 250 results, defaults to 10
        return max(1, min(250, int(limit) if limit else 10))

    def weights(self, weights):
        """
        Parses the weights parameter from the request.

        Args:
            weights: weights parameter

        Returns:
            weights
        """

        return float(weights) if weights else weights
```
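The `limit` and `weights` helpers at the end of the class are pure functions of their inputs, so their behavior can be sketched and checked standalone. The function names below are hypothetical stand-ins; in txtai these are methods on the `API` class.

```python
# Standalone restatement of the request-parameter parsing above (hypothetical
# helper names; the real versions are API.limit and API.weights).
def parse_limit(limit):
    # Bound results to the 1-250 range, defaulting to 10 when unset
    return max(1, min(250, int(limit) if limit else 10))

def parse_weights(weights):
    # Hybrid score weighting passes through as a float when present
    return float(weights) if weights else weights
```

Note the edge cases: `None` falls back to the default of 10, while the string `"0"` is truthy, parses to 0 and is bounded up to the minimum of 1.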

add(documents)

Adds a batch of documents for indexing.

Downstream applications can override this method to also store full documents in an external system.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| `documents` | list of {id: value, text: value} | required |

Returns: unmodified input documents

Source code in txtai/api/base.py

```python
def add(self, documents):
    """
    Adds a batch of documents for indexing.

    Downstream applications can override this method to also store full documents in an external system.

    Args:
        documents: list of {id: value, text: value}

    Returns:
        unmodified input documents
    """

    if self.cluster:
        self.cluster.add(documents)
    else:
        super().add(documents)

    return documents
```
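The cluster-dispatch pattern used by `add()` (and the other write methods) can be sketched with simplified stand-ins. `Application` and `Cluster` below are toy classes, not the txtai implementations; the point is the routing logic and the unmodified return value.

```python
# Minimal sketch of the cluster-dispatch pattern. Application and Cluster are
# simplified stand-ins for the txtai classes.
class Application:
    def __init__(self):
        self.batch = []

    def add(self, documents):
        # Local path: queue documents for indexing
        self.batch.extend(documents)

class Cluster:
    def __init__(self):
        self.received = []

    def add(self, documents):
        # Cluster path: forward documents to member instances
        self.received.extend(documents)

class API(Application):
    def __init__(self, cluster=None):
        super().__init__()
        self.cluster = cluster

    def add(self, documents):
        # Route to the cluster when configured, otherwise handle locally
        if self.cluster:
            self.cluster.add(documents)
        else:
            super().add(documents)

        # Input is returned unmodified
        return documents
```

With no cluster configured, documents land in the local batch; with a cluster, they are forwarded and the local batch stays empty.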

batchexplain(queries, texts=None, limit=10)

Explains the importance of each input token in text for a list of queries.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| `queries` | queries text | required |
| `texts` | optional list of text, otherwise runs search queries | None |
| `limit` | optional limit if texts is None | 10 |

Returns: list of dict per input text per query where higher token scores represent higher importance relative to the query

Source code in txtai/app/base.py

```python
def batchexplain(self, queries, texts=None, limit=10):
    """
    Explains the importance of each input token in text for a list of queries.

    Args:
        queries: queries text
        texts: optional list of text, otherwise runs search queries
        limit: optional limit if texts is None

    Returns:
        list of dict per input text per query where higher token scores represent higher importance relative to the query
    """

    if self.embeddings:
        with self.lock:
            return self.embeddings.batchexplain(queries, texts, limit)

    return None
```

batchsimilarity(queries, texts)

Computes the similarity between list of queries and list of text. Returns a list of {id: value, score: value} sorted by highest score per query, where id is the index in texts.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| `queries` | queries text | required |
| `texts` | list of text | required |

Returns: list of {id: value, score: value} per query

Source code in txtai/app/base.py

```python
def batchsimilarity(self, queries, texts):
    """
    Computes the similarity between list of queries and list of text. Returns a list
    of {id: value, score: value} sorted by highest score per query, where id is the
    index in texts.

    Args:
        queries: queries text
        texts: list of text

    Returns:
        list of {id: value, score: value} per query
    """

    # Use similarity instance if available otherwise fall back to embeddings model
    if "similarity" in self.pipelines:
        return [[{"id": uid, "score": float(score)} for uid, score in r] for r in self.pipelines["similarity"](queries, texts)]
    if self.embeddings:
        return [[{"id": uid, "score": float(score)} for uid, score in r] for r in self.embeddings.batchsimilarity(queries, texts)]

    return None
```
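A toy scorer makes the return shape concrete: one result list per query, each a list of `{id, score}` dicts sorted by descending score, with `id` indexing into `texts`. The word-overlap metric below is an illustrative stand-in, not the real similarity pipeline.

```python
def batchsimilarity(queries, texts):
    # Jaccard word overlap as a stand-in similarity score
    results = []
    for query in queries:
        qwords = set(query.lower().split())
        scores = []
        for uid, text in enumerate(texts):
            twords = set(text.lower().split())
            overlap = len(qwords & twords) / max(len(qwords | twords), 1)
            scores.append({"id": uid, "score": float(overlap)})

        # Sorted by highest score per query
        results.append(sorted(scores, key=lambda s: s["score"], reverse=True))
    return results
```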

batchtransform(texts)

Transforms list of text into embeddings arrays.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| `texts` | list of text | required |

Returns: embeddings arrays

Source code in txtai/app/base.py

```python
def batchtransform(self, texts):
    """
    Transforms list of text into embeddings arrays.

    Args:
        texts: list of text

    Returns:
        embeddings arrays
    """

    if self.embeddings:
        documents = [(None, text, None) for text in texts]
        return [[float(x) for x in result] for result in self.embeddings.batchtransform(documents)]

    return None
```

count()

Total number of elements in this embeddings index.

Returns: number of elements in embeddings index

Source code in txtai/api/base.py

```python
def count(self):
    """
    Total number of elements in this embeddings index.

    Returns:
        number of elements in embeddings index
    """

    if self.cluster:
        return self.cluster.count()

    return super().count()
```

delete(ids)

Deletes from an embeddings index. Returns list of ids deleted.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| `ids` | list of ids to delete | required |

Returns: ids deleted

Source code in txtai/api/base.py

```python
def delete(self, ids):
    """
    Deletes from an embeddings index. Returns list of ids deleted.

    Args:
        ids: list of ids to delete

    Returns:
        ids deleted
    """

    if self.cluster:
        return self.cluster.delete(ids)

    return super().delete(ids)
```

explain(query, texts=None, limit=10)

Explains the importance of each input token in text for a query.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| `query` | query text | required |
| `texts` | optional list of text, otherwise runs search query | None |
| `limit` | optional limit if texts is None | 10 |

Returns: list of dict per input text where higher token scores represent higher importance relative to the query

Source code in txtai/app/base.py

```python
def explain(self, query, texts=None, limit=10):
    """
    Explains the importance of each input token in text for a query.

    Args:
        query: query text
        texts: optional list of text, otherwise runs search query
        limit: optional limit if texts is None

    Returns:
        list of dict per input text where higher token scores represent higher importance relative to the query
    """

    if self.embeddings:
        with self.lock:
            return self.embeddings.explain(query, texts, limit)

    return None
```

extract(queue, texts=None)

Extracts answers to input questions.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| `queue` | list of {name: value, query: value, question: value, snippet: value} | required |
| `texts` | optional list of text | None |

Returns: list of {name: value, answer: value}

Source code in txtai/app/base.py

```python
def extract(self, queue, texts=None):
    """
    Extracts answers to input questions.

    Args:
        queue: list of {name: value, query: value, question: value, snippet: value}
        texts: optional list of text

    Returns:
        list of {name: value, answer: value}
    """

    if self.embeddings and "extractor" in self.pipelines:
        # Get extractor instance
        extractor = self.pipelines["extractor"]

        # Run extractor and return results as dicts
        return extractor(queue, texts)

    return None
```
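The data contract is easy to miss in the delegation: each queue entry of `{name, query, question, snippet}` yields one `{name, answer}` result. The word-match "extractor" below is a toy stand-in for the real QA pipeline, used only to make the shapes concrete.

```python
# Toy illustration of the extract() input/output contract. The word-match
# lookup is a stand-in, not the txtai extractor pipeline.
def extract(queue, texts):
    results = []
    for item in queue:
        words = set(item["question"].lower().split())

        # Naive answer: first text sharing a word with the question
        answer = next((t for t in texts if words & set(t.lower().split())), None)
        results.append({"name": item["name"], "answer": answer})
    return results
```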

index()

Builds an embeddings index for previously batched documents.

Source code in txtai/api/base.py

```python
def index(self):
    """
    Builds an embeddings index for previously batched documents.
    """

    if self.cluster:
        self.cluster.index()
    else:
        super().index()
```

label(text, labels)

Applies a zero shot classifier to text using a list of labels. Returns a list of {id: value, score: value} sorted by highest score, where id is the index in labels.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| `text` | text\|list | required |
| `labels` | list of labels | required |

Returns: list of {id: value, score: value} per text element

Source code in txtai/app/base.py

```python
def label(self, text, labels):
    """
    Applies a zero shot classifier to text using a list of labels. Returns a list of
    {id: value, score: value} sorted by highest score, where id is the index in labels.

    Args:
        text: text|list
        labels: list of labels

    Returns:
        list of {id: value, score: value} per text element
    """

    if "labels" in self.pipelines:
        # Text is a string
        if isinstance(text, str):
            return [{"id": uid, "score": float(score)} for uid, score in self.pipelines["labels"](text, labels)]

        # Text is a list
        return [[{"id": uid, "score": float(score)} for uid, score in result] for result in self.pipelines["labels"](text, labels)]

    return None
```

pipeline(name, args)

Generic pipeline execution method.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| `name` | pipeline name | required |
| `args` | pipeline arguments | required |

Returns: pipeline results

Source code in txtai/app/base.py

```python
def pipeline(self, name, args):
    """
    Generic pipeline execution method.

    Args:
        name: pipeline name
        args: pipeline arguments

    Returns:
        pipeline results
    """

    if name in self.pipelines:
        return self.pipelines[name](*args)

    return None
```
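The dispatch is a name-keyed registry lookup: find a callable under `name`, unpack `args` into it, and return `None` when the pipeline is not configured. A minimal sketch, assuming `pipelines` is a dict of callables (the lambda below is a stand-in; real entries are txtai pipeline objects):

```python
# Minimal sketch of name-based pipeline dispatch. The "upper" entry is a
# hypothetical stand-in for a configured pipeline.
class App:
    def __init__(self):
        self.pipelines = {"upper": lambda texts: [t.upper() for t in texts]}

    def pipeline(self, name, args):
        # Unpack args into the callable registered under name, else None
        if name in self.pipelines:
            return self.pipelines[name](*args)
        return None
```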

reindex(config, function=None)

Recreates this embeddings index using config. This method only works if document content storage is enabled.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| `config` | new config | required |
| `function` | optional function to prepare content for indexing | None |

Source code in txtai/api/base.py

```python
def reindex(self, config, function=None):
    """
    Recreates this embeddings index using config. This method only works if document content storage is enabled.

    Args:
        config: new config
        function: optional function to prepare content for indexing
    """

    if self.cluster:
        self.cluster.reindex(config, function)
    else:
        super().reindex(config, function)
```

similarity(query, texts)

Computes the similarity between query and list of text. Returns a list of {id: value, score: value} sorted by highest score, where id is the index in texts.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| `query` | query text | required |
| `texts` | list of text | required |

Returns: list of {id: value, score: value}

Source code in txtai/app/base.py

```python
def similarity(self, query, texts):
    """
    Computes the similarity between query and list of text. Returns a list of
    {id: value, score: value} sorted by highest score, where id is the index
    in texts.

    Args:
        query: query text
        texts: list of text

    Returns:
        list of {id: value, score: value}
    """

    # Use similarity instance if available otherwise fall back to embeddings model
    if "similarity" in self.pipelines:
        return [{"id": uid, "score": float(score)} for uid, score in self.pipelines["similarity"](query, texts)]
    if self.embeddings:
        return [{"id": uid, "score": float(score)} for uid, score in self.embeddings.similarity(query, texts)]

    return None
```

transform(text)

Transforms text into embeddings arrays.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| `text` | input text | required |

Returns: embeddings array

Source code in txtai/app/base.py

```python
def transform(self, text):
    """
    Transforms text into embeddings arrays.

    Args:
        text: input text

    Returns:
        embeddings array
    """

    if self.embeddings:
        return [float(x) for x in self.embeddings.transform((None, text, None))]

    return None
```

upsert()

Runs an embeddings upsert operation for previously batched documents.

Source code in txtai/api/base.py

```python
def upsert(self):
    """
    Runs an embeddings upsert operation for previously batched documents.
    """

    if self.cluster:
        self.cluster.upsert()
    else:
        super().upsert()
```

wait()

Closes threadpool and waits for completion.

Source code in txtai/app/base.py

```python
def wait(self):
    """
    Closes threadpool and waits for completion.
    """

    if self.pool:
        self.pool.close()
        self.pool.join()
        self.pool = None
```
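The shutdown sequence follows the standard pool lifecycle: `close()` stops the pool from accepting new work, `join()` blocks until queued tasks finish, and clearing the reference lets a fresh pool be created later. A self-contained sketch, with `App` as a stand-in for the application class:

```python
from multiprocessing.pool import ThreadPool

# Sketch of the wait() pattern using a stdlib thread pool. App is a
# hypothetical stand-in for the application class.
class App:
    def __init__(self):
        self.pool = ThreadPool(2)

    def wait(self):
        if self.pool:
            self.pool.close()   # stop accepting new work
            self.pool.join()    # block until queued tasks complete
            self.pool = None    # allow a new pool to be created later
```

Usage:

```python
app = App()
squares = app.pool.map(lambda x: x * x, [1, 2, 3])
app.wait()
```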

workflow(name, elements)

Executes a workflow.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| `name` | workflow name | required |
| `elements` | elements to process | required |

Returns: processed elements

Source code in txtai/app/base.py

```python
def workflow(self, name, elements):
    """
    Executes a workflow.

    Args:
        name: workflow name
        elements: elements to process

    Returns:
        processed elements
    """

    if hasattr(elements, "__len__") and hasattr(elements, "__getitem__"):
        # Convert to tuples and return as a list since input is sized
        elements = [tuple(element) if isinstance(element, list) else element for element in elements]
    else:
        # Convert to tuples and return as a generator since input is not sized
        elements = (tuple(element) if isinstance(element, list) else element for element in elements)

    # Execute workflow
    return self.workflows[name](elements)
```
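The element normalization above can be restated as a standalone function: sized inputs (anything with `__len__` and `__getitem__`) come back as lists, unsized inputs stay lazy as generators, and list elements become tuples in both cases. `normalize` is a hypothetical name for illustration.

```python
# Standalone restatement of the element normalization in workflow().
def normalize(elements):
    if hasattr(elements, "__len__") and hasattr(elements, "__getitem__"):
        # Input is sized: materialize a list
        return [tuple(e) if isinstance(e, list) else e for e in elements]

    # Input is not sized (e.g. a generator): stay lazy
    return (tuple(e) if isinstance(e, list) else e for e in elements)
```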