HunyuanOCR Usage Guide

Introduction

HunyuanOCR is a leading end-to-end OCR expert vision-language model (VLM) built on Hunyuan’s native multimodal architecture. This guide shows how to set up HunyuanOCR for online OCR serving with vLLM's OpenAI-compatible API server.

Installing vLLM

    uv venv
    source .venv/bin/activate
    # Until the v0.11.3 release, install vLLM from the nightly build
    uv pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly

Deploying HunyuanOCR

    vllm serve tencent/HunyuanOCR \
        --no-enable-prefix-caching \
        --mm-processor-cache-gb 0
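
Loading the model can take a while, so a client script may want to wait until the server answers before sending OCR requests. The following is a minimal sketch, not part of the original guide. It assumes the server listens on http://localhost:8000/v1 (the base URL used by the client later in this guide) and that the OpenAI-compatible model listing route at /models is available under it. The helper name wait_for_server and its default timings are hypothetical.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(base_url: str, timeout_s: float = 600.0, interval_s: float = 5.0) -> bool:
    """Poll GET {base_url}/models until it returns HTTP 200 or timeout_s elapses.

    Hypothetical helper: the name, defaults, and polling route are assumptions.
    """
    deadline = time.monotonic() + timeout_s
    url = base_url.rstrip("/") + "/models"
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, or HTTP error: server is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling wait_for_server("http://localhost:8000/v1") before creating the OpenAI client returns True once the model list is served, and False if the server has not come up within the timeout.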

Querying with OpenAI API Client

    from openai import OpenAI

    # The local vLLM server does not check the API key; any placeholder works.
    client = OpenAI(
        api_key="EMPTY",
        base_url="http://localhost:8000/v1",
        timeout=3600,
    )

    messages = [
        {"role": "system", "content": ""},
        {
            "role": "user",
            "content": [
                # The image to parse, passed by URL.
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/chat-ui/tools-dark.png"
                    },
                },
                # The document-parsing instruction.
                {
                    "type": "text",
                    "text": (
                        "Extract all information from the main body of the document image "
                        "and represent it in markdown format, ignoring headers and footers. "
                        "Tables should be expressed in HTML format, formulas in the document "
                        "should be represented using LaTeX format, and the parsing should be "
                        "organized according to the reading order."
                    ),
                },
            ],
        },
    ]

    # Greedy decoding: temperature 0.0, with top_k and repetition_penalty
    # passed through extra_body as vLLM-specific sampling parameters.
    response = client.chat.completions.create(
        model="tencent/HunyuanOCR",
        messages=messages,
        temperature=0.0,
        extra_body={
            "top_k": 1,
            "repetition_penalty": 1.0,
        },
    )
    print(f"Generated text: {response.choices[0].message.content}")
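
The example above passes the image by URL. For a scanned page stored on disk, one common approach with OpenAI-compatible servers is to embed the file as a base64 data URL in the same image_url field. The sketch below illustrates that under stated assumptions: the helper names image_to_data_url and image_content_part are hypothetical, and the MIME type is guessed from the file extension rather than from the file contents.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL (hypothetical helper)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_content_part(path: str) -> dict:
    """Build the image entry of a chat message from a local file."""
    return {"type": "image_url", "image_url": {"url": image_to_data_url(path)}}
```

The dictionary returned by image_content_part("page.png") can take the place of the image_url entry in the messages list above. The file name page.png is only a placeholder.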

Configuration Tips

  • Use greedy sampling (i.e., temperature=0.0) or sampling with a low temperature for optimal OCR performance.
  • Unlike multi-turn chat use cases, OCR tasks are not expected to benefit significantly from prefix caching or image reuse. We recommend turning both features off (--no-enable-prefix-caching and --mm-processor-cache-gb 0, as in the serve command above) to avoid unnecessary hashing and caching.
  • Depending on your hardware capability, adjust max_num_batched_tokens (the --max-num-batched-tokens option of vllm serve) for better throughput.
  • Check out the official HunyuanOCR documentation for more application-oriented prompts for various document parsing tasks.
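
The caching and batching tips above can be combined into a single launch command. The following is a rough sketch that assembles the vllm serve invocation from this guide and optionally appends --max-num-batched-tokens. The helper name build_serve_command is hypothetical, and the value 8192 is only an illustrative assumption, not a recommended setting; the right value depends on your GPU memory and workload.

```python
import shlex


def build_serve_command(model="tencent/HunyuanOCR", max_num_batched_tokens=None):
    """Return the vllm serve argument list with OCR-oriented settings (hypothetical helper)."""
    cmd = [
        "vllm", "serve", model,
        "--no-enable-prefix-caching",     # OCR requests are not expected to benefit from prefix caching
        "--mm-processor-cache-gb", "0",   # disable the multimodal processor cache
    ]
    if max_num_batched_tokens is not None:
        # Assumed tuning knob; pick a value that fits your hardware.
        cmd += ["--max-num-batched-tokens", str(max_num_batched_tokens)]
    return cmd


# Print a shell-ready command line; 8192 is an arbitrary example value.
print(shlex.join(build_serve_command(max_num_batched_tokens=8192)))
```

The printed line can be pasted into a shell, or the list can be passed directly to subprocess.Popen to start the server from Python.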