vLLM v0.6.4 Documentation

Models

  • Supported Models
  • Adding a New Model
  • Enabling Multimodal Inputs
  • Engine Arguments
  • Using LoRA adapters
  • Using VLMs
  • Speculative decoding in vLLM
  • Performance and Tuning
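
For quick orientation before the sections listed above, here is a minimal offline-inference sketch in Python using the LLM class covered in this documentation. The model name and sampling settings are illustrative assumptions, not values prescribed by this page:

    # Minimal sketch: load a model and generate text offline with vLLM.
    # The model name is an example; any model listed under
    # "Supported Models" should work.
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")
    sampling = SamplingParams(temperature=0.8, max_tokens=64)
    outputs = llm.generate(["Hello, my name is"], sampling)
    print(outputs[0].outputs[0].text)

The Engine Arguments and Performance and Tuning sections describe further options that can be passed when constructing the LLM object.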
  • Getting Started
    • Installation
    • Installation with ROCm
    • Installation with OpenVINO
    • Installation with CPU
    • Installation with Intel® Gaudi® AI Accelerators
    • Installation with Neuron
    • Installation with TPU
    • Installation with XPU
    • Quickstart
    • Debugging Tips
    • Examples
      • API Client
      • Aqlm Example
      • Cpu Offload
      • Florence2 Inference
      • Gguf Inference
      • Gradio OpenAI Chatbot Webserver
      • Gradio Webserver
      • LLM Engine Example
      • Lora With Quantization Inference
      • MultiLoRA Inference
      • Offline Chat With Tools
      • Offline Inference
      • Offline Inference Arctic
      • Offline Inference Audio Language
      • Offline Inference Chat
      • Offline Inference Distributed
      • Offline Inference Embedding
      • Offline Inference Encoder Decoder
      • Offline Inference Mlpspeculator
      • Offline Inference Neuron
      • Offline Inference Neuron Int8 Quantization
      • Offline Inference Pixtral
      • Offline Inference Tpu
      • Offline Inference Vision Language
      • Offline Inference Vision Language Embedding
      • Offline Inference Vision Language Multi Image
      • Offline Inference With Prefix
      • Offline Inference With Profiler
      • Offline Profile
      • OpenAI Chat Completion Client
      • OpenAI Chat Completion Client For Multimodal
      • OpenAI Chat Completion Client With Tools
      • OpenAI Chat Embedding Client For Multimodal
      • OpenAI Completion Client
      • OpenAI Embedding Client
      • Save Sharded State
      • Tensorize vLLM Model
  • Serving
    • OpenAI Compatible Server
    • Deploying with Docker
    • Deploying with Kubernetes
    • Deploying with Nginx Loadbalancer
    • Distributed Inference and Serving
    • Production Metrics
    • Environment Variables
    • Usage Stats Collection
    • Integrations
      • Deploying and scaling up with SkyPilot
      • Deploying with KServe
      • Deploying with NVIDIA Triton
      • Deploying with BentoML
      • Deploying with Cerebrium
      • Deploying with LWS
      • Deploying with dstack
      • Serving with Langchain
      • Serving with llama_index
      • Serving with Llama Stack
    • Loading Models with CoreWeave’s Tensorizer
    • Compatibility Matrix
    • Frequently Asked Questions
  • Models
    • Supported Models
    • Adding a New Model
    • Enabling Multimodal Inputs
    • Engine Arguments
    • Using LoRA adapters
    • Using VLMs
    • Speculative decoding in vLLM
    • Performance and Tuning
  • Quantization
    • Supported Hardware for Quantization Kernels
    • AutoAWQ
    • BitsAndBytes
    • GGUF
    • INT8 W8A8
    • FP8 W8A8
    • FP8 E5M2 KV Cache
    • FP8 E4M3 KV Cache
  • Automatic Prefix Caching
    • Introduction
    • Implementation
  • Performance
    • Benchmark Suites
  • Community
    • vLLM Meetups
    • Sponsors
  • API Documentation
    • Sampling Parameters
    • Pooling Parameters
    • Offline Inference
      • LLM Class
      • LLM Inputs
    • vLLM Engine
      • LLMEngine
      • AsyncLLMEngine
  • Design
    • vLLM’s Class Hierarchy
    • Integration with HuggingFace
    • Input Processing
      • Input Processing Pipeline
    • vLLM Paged Attention
    • Multi-Modality
      • Adding a Multimodal Plugin
  • For Developers
    • Contributing to vLLM
    • Profiling vLLM
    • Dockerfile