vLLM v0.6.4 Documentation
API Documentation
Contents:
- Sampling Parameters
- Pooling Parameters
- Offline Inference
  - LLM Class
  - LLM Inputs
- vLLM Engine
  - LLMEngine
  - AsyncLLMEngine
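Taken together, these pages cover the core offline API: the LLM class paired with SamplingParams. A minimal sketch in the spirit of the vLLM quickstart (the model name and prompts are arbitrary examples, not prescribed by this page):

```python
from vllm import LLM, SamplingParams

# Prompts to complete in a single offline batch.
prompts = [
    "Hello, my name is",
    "The capital of France is",
]

# SamplingParams controls decoding: temperature, nucleus sampling, output length.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# The LLM class loads a model for offline (non-server) inference.
llm = LLM(model="facebook/opt-125m")

# generate() runs the whole batch and returns one RequestOutput per prompt.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r} -> {output.outputs[0].text!r}")
```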
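LLMEngine and AsyncLLMEngine are the lower-level engines beneath the LLM class; AsyncLLMEngine streams partial outputs per request and is what the OpenAI-compatible server is built on. A hedged sketch of direct AsyncLLMEngine use, assuming the v0.6.x module layout (vllm.engine.arg_utils and vllm.engine.async_llm_engine); the model, prompt, and request id are illustrative:

```python
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine


async def main() -> None:
    # Build the async engine from engine arguments.
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(model="facebook/opt-125m")
    )

    # generate() returns an async stream of partial RequestOutputs
    # for a single request, identified by a caller-chosen unique id.
    stream = engine.generate(
        "The future of AI is",
        SamplingParams(max_tokens=32),
        request_id="request-0",
    )

    final = None
    async for request_output in stream:
        final = request_output  # the last item holds the finished completion
    print(final.outputs[0].text)


asyncio.run(main())
```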