Multi-Modality

vLLM provides experimental support for multi-modal models through the vllm.multimodal package.

Multi-modal inputs can be passed alongside text and token prompts to supported models via the multi_modal_data field in vllm.inputs.PromptInputs.
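
As a minimal sketch of what such an input can look like, the snippet below bundles a text prompt with one image under the multi_modal_data field. The build_image_prompt helper is hypothetical and only for illustration. The "image" key, the <image> placeholder in the prompt, and the model name in the comments are assumptions that depend on the vLLM version and the model, so check them against the documentation for your installation.

```python
from typing import Any, Dict


def build_image_prompt(prompt: str, image: Any) -> Dict[str, Any]:
    """Hypothetical helper: bundle a text prompt with a single image.

    The returned dict mirrors the prompt-plus-multi_modal_data shape
    described above; the "image" key is an assumption.
    """
    return {
        "prompt": prompt,
        "multi_modal_data": {"image": image},
    }


# Stand-in for a real image object (for example, a PIL image).
placeholder_image = object()

inputs = build_image_prompt(
    "USER: <image>\nWhat is in this picture?\nASSISTANT:",
    placeholder_image,
)

# With vLLM installed and a supported model available, the dict would be
# passed to the engine roughly like this (not executed here):
#   from vllm import LLM
#   llm = LLM(model="llava-hf/llava-1.5-7b-hf")  # example model name
#   outputs = llm.generate(inputs)
```

Keeping the image in a separate field from the prompt text lets the model's input processor handle it independently of tokenization.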

Currently, vLLM only has built-in support for image data. You can extend vLLM to process additional modalities by following this guide.

Looking to add your own multi-modal model? Please follow the instructions listed here.

Guides

Module Contents

Registry

Base Classes

Image Classes