HFOnnx


Exports a Hugging Face Transformer model to ONNX. Currently, this works best with classification, pooling and question-answering models. Work is ongoing for sequence-to-sequence models (summarization, transcription, translation).

Example

The following shows a simple example using this pipeline.

from txtai.pipeline import HFOnnx, Labels

# Model path
path = "distilbert-base-uncased-finetuned-sst-2-english"

# Export model to ONNX
onnx = HFOnnx()
model = onnx(path, "text-classification", "model.onnx", True)

# Run inference and validate
labels = Labels((model, path), dynamic=False)
labels("I am happy")

See the link below for a more detailed example.

Notebook | Description
Export and run models with ONNX (Open In Colab) | Export models with ONNX, run natively in JavaScript, Java and Rust

Methods

Python documentation for the pipeline.

__call__(self, path, task='default', output=None, quantize=False, opset=12)

Exports a Hugging Face Transformer model to ONNX.

Parameters:

Name | Description | Default
path | path to model, accepts Hugging Face model hub id, local path or (model, tokenizer) tuple | required
task | optional model task or category, determines the model type and outputs, defaults to export hidden state | 'default'
output | optional output model path, defaults to return byte array if None | None
quantize | if model should be quantized (requires onnx to be installed), defaults to False | False
opset | onnx opset, defaults to 12 | 12

Returns:

path to model output or model as bytes, depending on the output parameter
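
A minimal sketch of the two return modes, assuming onnxruntime is installed; onnxruntime.InferenceSession accepts either a file path or serialized model bytes.

import onnxruntime
from txtai.pipeline import HFOnnx

onnx = HFOnnx()
path = "distilbert-base-uncased-finetuned-sst-2-english"

# With an output file, the path to the exported model is returned
output = onnx(path, "text-classification", "model.onnx")
session = onnxruntime.InferenceSession(output)

# With output=None, the serialized model is returned as bytes
data = onnx(path, "text-classification")
session = onnxruntime.InferenceSession(data)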

Source code in txtai/pipeline/train/hfonnx.py

def __call__(self, path, task="default", output=None, quantize=False, opset=12):
    """
    Exports a Hugging Face Transformer model to ONNX.

    Args:
        path: path to model, accepts Hugging Face model hub id, local path or (model, tokenizer) tuple
        task: optional model task or category, determines the model type and outputs, defaults to export hidden state
        output: optional output model path, defaults to return byte array if None
        quantize: if model should be quantized (requires onnx to be installed), defaults to False
        opset: onnx opset, defaults to 12

    Returns:
        path to model output or model as bytes depending on output parameter
    """

    inputs, outputs, model = self.parameters(task)

    if isinstance(path, (list, tuple)):
        model, tokenizer = path
        model = model.cpu()
    else:
        model = model(path)
        tokenizer = AutoTokenizer.from_pretrained(path)

    # Generate dummy inputs
    dummy = dict(tokenizer(["test inputs"], return_tensors="pt"))

    # Default to BytesIO if no output file provided
    output = output if output else BytesIO()

    # Export model to ONNX
    export(
        model,
        (dummy,),
        output,
        opset_version=opset,
        do_constant_folding=True,
        input_names=list(inputs.keys()),
        output_names=list(outputs.keys()),
        dynamic_axes=dict(chain(inputs.items(), outputs.items())),
    )

    # Quantize model
    if quantize:
        if not ONNX_RUNTIME:
            raise ImportError('onnxruntime is not available - install "pipeline" extra to enable')

        output = self.quantization(output)

    if isinstance(output, BytesIO):
        # Reset stream and return bytes
        output.seek(0)
        output = output.read()

    return output
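
As the source above shows, path also accepts an in-memory (model, tokenizer) pair instead of a model hub id or local path. A minimal sketch, assuming transformers is installed:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from txtai.pipeline import HFOnnx

path = "distilbert-base-uncased-finetuned-sst-2-english"

# Load the model and tokenizer up front, then pass the pair to the export
model = AutoModelForSequenceClassification.from_pretrained(path)
tokenizer = AutoTokenizer.from_pretrained(path)

onnx = HFOnnx()
data = onnx((model, tokenizer), "text-classification", "model.onnx")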