vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs

Easy, fast, and cheap LLM serving for everyone

Ray Summit CFP is Open (June 4th to June 20th)!

There will be a track for vLLM at the Ray Summit (September 30 to October 2, San Francisco) this year! If you have cool projects related to vLLM or LLM inference, we would love to see your proposals. This will be a great chance for everyone in the community to get together and learn.
Please submit your proposal here

Latest News 🔥

[2024/06] We hosted the fourth vLLM meetup with Cloudflare and BentoML! Please find the meetup slides here.
[2024/04] We hosted the third vLLM meetup with Roblox! Please find the meetup slides here.
[2024/01] We hosted the second vLLM meetup in SF! Please find the meetup slides here.
[2024/01] Added ROCm 6.0 support to vLLM.
[2023/12] Added ROCm 5.7 support to vLLM.
[2023/10] We hosted the first vLLM meetup in SF! Please find the meetup slides here.
[2023/09] We created our Discord server! Join us to discuss vLLM and LLM serving! We will also post the latest announcements and updates there.
[2023/09] We released our PagedAttention paper on arXiv!
[2023/08] We would like to express our sincere gratitude to Andreessen Horowitz (a16z) for providing a generous grant to support the open-source development and research of vLLM.
[2023/07] Added support for LLaMA-2! You can run and serve 7B/13B/70B LLaMA-2s on vLLM with a single command!
[2023/06] Serving vLLM on any cloud with SkyPilot. Check out a 1-click example to start the vLLM demo, and the blog post for the story behind vLLM development on the clouds.
[2023/06] We officially released vLLM! FastChat-vLLM integration has powered LMSYS Vicuna and Chatbot Arena since mid-April. Check out our blog post.

About

vLLM is a fast and easy-to-use library for LLM inference and serving.

vLLM is fast with:

State-of-the-art serving throughput
Efficient management of attention key and value memory with PagedAttention
Continuous batching of incoming requests
Fast model execution with CUDA/HIP graph
Quantization: GPTQ, AWQ, SqueezeLLM, FP8 KV Cache
Optimized CUDA kernels
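
The idea behind paged key/value memory can be illustrated with a toy sketch. This is not vLLM's implementation: the class names, the block size, and the pool size below are all hypothetical, and the code only models the bookkeeping (a per-request table that maps logical blocks to physical blocks drawn from a shared pool), not the attention kernels themselves.

```python
# Toy illustration of paged KV-cache bookkeeping (NOT vLLM's actual code).
# BLOCK_SIZE and every class/method name here are hypothetical.

BLOCK_SIZE = 4  # tokens stored per physical block (assumed value for the demo)


class ToyBlockAllocator:
    """Hands out fixed-size physical blocks from a shared free pool."""

    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))

    def allocate(self):
        if not self.free_blocks:
            raise MemoryError("no free KV-cache blocks")
        return self.free_blocks.pop(0)

    def release(self, blocks):
        self.free_blocks.extend(blocks)


class ToySequence:
    """Tracks one request's logical-to-physical block mapping."""

    def __init__(self, allocator):
        self.allocator = allocator
        self.block_table = []  # index = logical block, value = physical block id
        self.num_tokens = 0

    def append_token(self):
        # A new physical block is needed only when the last one is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def free(self):
        # Return all blocks to the pool so other requests can reuse them.
        self.allocator.release(self.block_table)
        self.block_table = []
        self.num_tokens = 0


allocator = ToyBlockAllocator(num_blocks=8)
seq_a, seq_b = ToySequence(allocator), ToySequence(allocator)

# Interleave token appends from two requests, as a batching server might.
for _ in range(5):
    seq_a.append_token()
    seq_b.append_token()

print(seq_a.block_table)  # [0, 2]
print(seq_b.block_table)  # [1, 3]
```

Each request ends up with non-contiguous physical blocks, and memory is claimed one block at a time as tokens arrive rather than reserved up front for the maximum sequence length.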

vLLM is flexible and easy to use with:

Seamless integration with popular Hugging Face models
High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more
Tensor parallelism support for distributed inference
Streaming outputs
OpenAI-compatible API server
Support for NVIDIA GPUs, AMD GPUs, Intel CPUs and GPUs
(Experimental) Prefix caching support
(Experimental) Multi-LoRA support
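
Because the API server is OpenAI-compatible, a client can talk to it with plain HTTP and JSON. The sketch below uses only the Python standard library and builds a completion request without sending it. The base URL, port, and model name are assumptions for illustration; adjust them to match however the server was actually started.

```python
import json
import urllib.request

# Assumed values: a vLLM OpenAI-compatible server listening locally on
# port 8000 and serving a model registered under this name.
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "facebook/opt-125m"


def build_completion_request(prompt, max_tokens=32):
    """Build (but do not send) a POST request for the completions route."""
    payload = {
        "model": MODEL_NAME,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_completion_request("San Francisco is a")
print(request.full_url)      # http://localhost:8000/v1/completions
print(request.get_method())  # POST
```

With a server running (for example one started with `python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m`), calling `send(request)` should return a JSON object whose `choices` list holds the generated text.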

vLLM seamlessly supports most popular open-source models on HuggingFace, including:

Transformer-like LLMs (e.g., Llama)
Mixture-of-Expert LLMs (e.g., Mixtral)
Multi-modal LLMs (e.g., LLaVA)

Find the full list of supported models here.

Getting Started

Install vLLM with pip or from source:

pip install vllm
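
Once the package is installed, offline batched generation can be sketched roughly as follows. The model name, prompts, and sampling values are placeholders chosen for illustration, and the function name `run_offline_demo` is hypothetical. Running it requires supported hardware and downloads model weights on first use.

```python
def run_offline_demo(model_name="facebook/opt-125m"):
    """Generate completions for a few prompts and return the generated texts."""
    # Imported lazily so this file can be loaded on machines without vLLM.
    from vllm import LLM, SamplingParams

    # Placeholder prompts and sampling settings (assumptions for the demo).
    prompts = ["Hello, my name is", "The capital of France is"]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

    # Load the model once, then generate for the whole batch of prompts.
    llm = LLM(model=model_name)
    outputs = llm.generate(prompts, sampling_params)

    # Each result carries one or more candidates; take the first one's text.
    return [output.outputs[0].text for output in outputs]
```

Calling `run_offline_demo()` returns one generated string per prompt.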

Visit our documentation to learn more.

Contributing

We welcome and value any contributions and collaborations. Please check out CONTRIBUTING.md for how to get involved.

Sponsors

vLLM is a community project. Our compute resources for development and testing are supported by the following organizations. Thank you for your support!

a16z
AMD
Anyscale
AWS
Crusoe Cloud
Databricks
DeepInfra
Dropbox
Lambda Lab
NVIDIA
Replicate
Roblox
RunPod
Sequoia Capital
Trainy
UC Berkeley
UC San Diego
ZhenFund

We also have an official fundraising venue through OpenCollective. We plan to use the funds to support the development, maintenance, and adoption of vLLM.

Citation

If you use vLLM for your research, please cite our paper:

@inproceedings{kwon2023efficient,
  title={Efficient Memory Management for Large Language Model Serving with PagedAttention},
  author={Woosuk Kwon and Zhuohan Li and Siyuan Zhuang and Ying Sheng and Lianmin Zheng and Cody Hao Yu and Joseph E. Gonzalez and Hao Zhang and Ion Stoica},
  booktitle={Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles},
  year={2023}
}

Latest commit: DarkLight1337, [CI/Build] Refactor image test assets (#5821), 6984c02, Jun 26, 2024 (1,711 commits)

Name	Last commit message	Last commit date
.buildkite	[Speculative Decoding] Support draft model on different tensor-paral…	Jun 25, 2024
.github	[mypy] Enable type checking for test directory (#5017)	Jun 15, 2024
benchmarks	[Speculative Decoding] Support draft model on different tensor-paral…	Jun 25, 2024
cmake	[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improv…	Jun 26, 2024
csrc	[BugFix] [Kernel] Add Cutlass2x fallback kernels (#5744)	Jun 24, 2024
docs	[Misc][Doc] Add Example of using OpenAI Server with VLM (#5832)	Jun 26, 2024
examples	[Misc][Doc] Add Example of using OpenAI Server with VLM (#5832)	Jun 26, 2024
rocm_patch	[AMD][Hardware][Misc][Bugfix] xformer cleanup and light navi logic an…	Apr 22, 2024
tests	[CI/Build] Refactor image test assets (#5821)	Jun 26, 2024
vllm	[Bugfix][TPU] Fix KV cache size calculation (#5860)	Jun 26, 2024
.clang-format	[CI/Build] Enforce style for C++ and CUDA code with `clang-format` (#…	May 22, 2024
.dockerignore	Build docker image with shared objects from "build" step (#2237)	Jan 5, 2024
.gitignore	Add example scripts to documentation (#4225)	Apr 23, 2024
.readthedocs.yaml	Add .readthedocs.yaml (#136)	Jun 3, 2023
.yapfignore	[issue templates] add some issue templates (#3412)	Mar 15, 2024
CMakeLists.txt	[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improv…	Jun 26, 2024
CONTRIBUTING.md	[Misc] Define common requirements (#3841)	Apr 5, 2024
Dockerfile	[BugFix] exclude version 1.15.0 for modelscope (#5668)	Jun 22, 2024
Dockerfile.cpu	[CI/BUILD] Support non-AVX512 vLLM building and testing (#5574)	Jun 18, 2024
Dockerfile.neuron	[Misc] Remove VLLM_BUILD_WITH_NEURON env variable (#5389)	Jun 11, 2024
Dockerfile.rocm	[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improv…	Jun 26, 2024
Dockerfile.tpu	[Hardware] Initial TPU integration (#5292)	Jun 13, 2024
Dockerfile.xpu	[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814)	Jun 18, 2024
LICENSE	Add Apache-2.0 license (#102)	May 15, 2023
MANIFEST.in	[BugFix] Include target-device specific requirements.txt in sdist (#4559	May 3, 2024
README.md	[CI][Hardware][Intel GPU] add Intel GPU(XPU) ci pipeline (#5616)	Jun 22, 2024
collect_env.py	[Misc] update collect env (#5261)	Jun 5, 2024
format.sh	[mypy] Enable type checking for test directory (#5017)	Jun 15, 2024
pyproject.toml	[CI/Build][Misc] Update Pytest Marker for VLMs (#5623)	Jun 18, 2024
requirements-build.txt	[Misc] Upgrade to `torch==2.3.0` (#4454)	Apr 30, 2024
requirements-common.txt	[CI/Build] Add tqdm to dependencies (#5680)	Jun 19, 2024
requirements-cpu.txt	[Bugfix] Add phi3v resize for dynamic shape and fix torchvision requi…	Jun 24, 2024
requirements-cuda.txt	[Bugfix] Add phi3v resize for dynamic shape and fix torchvision requi…	Jun 24, 2024
requirements-dev.txt	Seperate dev requirements into lint and test (#5474)	Jun 14, 2024
requirements-lint.txt	Seperate dev requirements into lint and test (#5474)	Jun 14, 2024
requirements-neuron.txt	[Misc] Define common requirements (#3841)	Apr 5, 2024
requirements-rocm.txt	[Build/CI] Enabling AMD Entrypoints Test (#4834)	May 21, 2024
requirements-test.txt	[Bugfix] Add phi3v resize for dynamic shape and fix torchvision requi…	Jun 24, 2024
requirements-tpu.txt	[Hardware] Initial TPU integration (#5292)	Jun 13, 2024
requirements-xpu.txt	[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814)	Jun 18, 2024
setup.py	[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814)	Jun 18, 2024
