vLLM v0.7.2 Documentation
Getting Started
Source: vLLM
2025-02-09 13:17:40
Installation
Quickstart
Examples
Troubleshooting
Frequently Asked Questions
Copyright of this content belongs to vLLM or its affiliates. To follow or sponsor the content or its related open-source projects, please visit vLLM.