Input Processing Pipeline
1. Input data is passed to LLMEngine (or AsyncLLMEngine).
2. Tokenize the data if necessary.
3. Process the inputs using INPUT_REGISTRY.process_input.
   - For example, add placeholder tokens to reserve KV cache for multi-modal embeddings.
4. Send the processed inputs to ExecutorBase.
5. Distribute the inputs via WorkerBase to ModelRunnerBase.
6. If the data contains multi-modal data, convert it into keyword arguments using MULTIMODAL_REGISTRY.map_input.
   - For example, convert a PIL.Image.Image input to its pixel values for a vision language model.
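
The flow described above can be sketched as a toy, self-contained example. This is not the library's implementation: the functions tokenize, process_input, and map_input below are hypothetical stand-ins that only mirror the roles of the steps, and the placeholder token id, the number of placeholder tokens per image, and the nested-list "image" (used in place of a PIL.Image.Image) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union

# Hypothetical constants, chosen only for this sketch.
IMAGE_PLACEHOLDER_ID = -1  # token id that reserves a slot for an image embedding
TOKENS_PER_IMAGE = 4       # number of embedding slots assumed per image

# Stand-in for an image: rows of 8-bit grayscale values.
Image = List[List[int]]


@dataclass
class ProcessedInputs:
    prompt_token_ids: List[int]
    multi_modal_data: Dict[str, Any] = field(default_factory=dict)


def tokenize(prompt: str) -> List[int]:
    """Toy tokenizer: one id per whitespace-separated word (its length)."""
    return [len(word) for word in prompt.split()]


def process_input(prompt: Union[str, List[int]],
                  image: Optional[Image] = None) -> ProcessedInputs:
    """Tokenize if necessary, then prepend placeholder tokens for the image."""
    token_ids = tokenize(prompt) if isinstance(prompt, str) else list(prompt)
    multi_modal_data: Dict[str, Any] = {}
    if image is not None:
        # Reserve positions that the image embeddings will later occupy.
        token_ids = [IMAGE_PLACEHOLDER_ID] * TOKENS_PER_IMAGE + token_ids
        multi_modal_data["image"] = image
    return ProcessedInputs(token_ids, multi_modal_data)


def map_input(inputs: ProcessedInputs) -> Dict[str, Any]:
    """Convert multi-modal data into keyword arguments for the model."""
    image = inputs.multi_modal_data.get("image")
    if image is None:
        return {}
    # Scale 8-bit values to [0, 1] as a stand-in for real preprocessing.
    pixel_values = [[value / 255.0 for value in row] for row in image]
    return {"pixel_values": pixel_values}


if __name__ == "__main__":
    processed = process_input("describe this image", image=[[0, 255], [51, 102]])
    print(processed.prompt_token_ids)
    print(map_input(processed))
```

In this sketch the placeholder tokens are added before the inputs would be handed to an executor, while the conversion to keyword arguments happens last, mirroring the order of the steps above. Text-only inputs pass through unchanged and produce no extra keyword arguments.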