Fluent Interface

Jina provides a simple fluent interface for Document that allows one to process (often preprocess) a Document object by chaining methods. For example, to read an image file as a numpy.ndarray, resize it, normalize it, and then store it in another file, one can simply do:

  from jina import Document

  d = (
      Document(uri='apple.png')
      .load_uri_to_image_blob()
      .set_image_blob_shape((64, 64))
      .set_image_blob_normalization()
      .dump_image_blob_to_file('apple1.png')
  )

[Image: Original apple.png]

[Image: Processed apple1.png]

Important

Note that chained methods always modify the original Document in place. That means the above example is equivalent to:

  from jina import Document

  d = Document(uri='apple.png')
  (
      d.load_uri_to_image_blob()
      .set_image_blob_shape((64, 64))
      .set_image_blob_normalization()
      .dump_image_blob_to_file('apple1.png')
  )
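
To make the in-place behaviour concrete, here is a minimal sketch of the general fluent-interface pattern in plain Python. The TinyDoc class and its methods are hypothetical stand-ins and are not Jina's implementation. The only point is that each method mutates the object and returns self, so the value returned by a chain is the original object itself.

```python
class TinyDoc:
    """Hypothetical stand-in for a Document; not part of Jina."""

    def __init__(self, text=''):
        self.text = text

    def strip(self):
        self.text = self.text.strip()  # mutate the object in place
        return self                    # returning self is what enables chaining

    def upper(self):
        self.text = self.text.upper()
        return self


d = TinyDoc('  apple  ')
result = d.strip().upper()

print(result is d)  # True: the chain hands back the very same object
print(d.text)       # APPLE
```

Because every step returns the same object, assigning the result of the chain to a new name does not create a copy.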

Parallelization

The fluent interface is especially useful when processing a large DocumentArray or DocumentArrayMemmap. One can use map() to parallelize the work and speed it up considerably.

The following example compares the time needed to preprocess ~6000 image Documents using map() with the process backend, map() with the thread backend, and a plain for loop.

  from jina import DocumentArray
  from jina.logging.profile import TimeContext

  docs = DocumentArray.from_files('*.jpg')


  def foo(d):
      # chain three image preprocessing steps on a single Document
      return (
          d.load_uri_to_image_blob()
          .set_image_blob_normalization()
          .set_image_blob_channel_axis(-1, 0)
      )


  # parallel map with a process pool
  with TimeContext('map-process'):
      for d in docs.map(foo, backend='process'):
          pass

  # parallel map with a thread pool
  with TimeContext('map-thread'):
      for d in docs.map(foo, backend='thread'):
          pass

  # sequential baseline
  with TimeContext('for-loop'):
      for d in docs:
          foo(d)

  map-process ... map-process takes 5 seconds (5.55s)
  map-thread ... map-thread takes 10 seconds (10.28s)
  for-loop ... for-loop takes 18 seconds (18.52s)
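
The idea behind a parallel map can be sketched with the Python standard library alone. The example below is not how DocumentArray.map() is implemented. It only illustrates, under simplifying assumptions, why mapping a function over many items with a pool of workers can beat a plain loop. The preprocess function and its sleep call are hypothetical placeholders for per-item work such as reading a file.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def preprocess(item):
    """Hypothetical per-item work; the sleep stands in for I/O such as reading a file."""
    time.sleep(0.01)
    return item * 2


items = list(range(20))

# Sequential baseline: one item after another
start = time.perf_counter()
sequential = [preprocess(i) for i in items]
t_loop = time.perf_counter() - start

# Thread pool: several items are in flight at the same time
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # Executor.map yields results in input order, like the built-in map
    threaded = list(pool.map(preprocess, items))
t_map = time.perf_counter() - start

print(f'for-loop: {t_loop:.2f}s, thread map: {t_map:.2f}s')
```

Threads mostly help when the work waits on I/O. For CPU-heavy preprocessing, a process pool usually scales better in Python, which is consistent with the process backend being the fastest in the timings above.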

Methods

All the following methods can be chained.

Convert

Provide helper functions for Document to support conversion between blob, text and buffer.

  • convert_blob_to_buffer()

  • convert_buffer_to_blob()

  • convert_uri_to_datauri()
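
For readers unfamiliar with data URIs, the following standard-library sketch shows what such a string looks like. It is not Jina's implementation of convert_uri_to_datauri(). The helper to_datauri and its default MIME type are assumptions made for illustration only.

```python
import base64


def to_datauri(data: bytes, mimetype: str = 'application/octet-stream') -> str:
    """Hypothetical helper: encode raw bytes as a base64 data URI (RFC 2397)."""
    encoded = base64.b64encode(data).decode('ascii')
    return f'data:{mimetype};base64,{encoded}'


uri = to_datauri(b'hello', 'text/plain')
print(uri)  # data:text/plain;base64,aGVsbG8=

# Decoding the part after the comma recovers the original bytes
payload = base64.b64decode(uri.split(',', 1)[1])
print(payload)  # b'hello'
```

A data URI carries the content inline, so the resulting string can be passed around without access to the original file.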

TextData

Provide helper functions for Document to support text data.

  • convert_blob_to_text()

  • convert_text_to_blob()

  • dump_text_to_datauri()

  • load_uri_to_text()

ImageData

Provide helper functions for Document to support image data.

  • convert_buffer_to_image_blob()

  • convert_image_blob_to_buffer()

  • convert_image_blob_to_sliding_windows()

  • convert_image_blob_to_uri()

  • dump_image_blob_to_file()

  • load_uri_to_image_blob()

  • set_image_blob_channel_axis()

  • set_image_blob_inv_normalization()

  • set_image_blob_normalization()

  • set_image_blob_shape()

AudioData

Provide helper functions for Document to support audio data.

  • dump_audio_blob_to_file()

  • load_uri_to_audio_blob()

BufferData

Provide helper functions for Document to handle binary data.

  • dump_buffer_to_datauri()

  • load_uri_to_buffer()

DumpFile

Provide helper functions for Document to dump content to a file.

  • dump_buffer_to_file()

  • dump_uri_to_file()

ContentProperty

Provide helper functions for Document to allow universal content property access.

  • dump_content_to_datauri()

VideoData

Provide helper functions for Document to support video data.

  • dump_video_blob_to_file()

  • load_uri_to_video_blob()

SingletonSugar

Provide sugary syntax for Document by inheriting methods from DocumentArray.

  • embed()

  • match()

MeshData

Provide helper functions for Document to support 3D mesh data and point cloud.

  • load_uri_to_point_cloud_blob()