Get live statistics and analysis of vLLM's profile on X / Twitter

A high-throughput and memory-efficient inference and serving engine for LLMs. Join slack.vllm.ai to discuss together with the community!

24 following25k followers

Archetype analysis

The Innovator

vLLM is a cutting-edge engine designed to revolutionize large language model (LLM) inference with remarkable speed and memory efficiency. This profile thrives on pushing the boundaries of AI technology, collaborating openly with the community to accelerate innovation. It’s the go-to hub for developers and enthusiasts eager to scale and streamline LLM deployment.

Recent engagement

Impressions

2.6M-164.3k

Estimate earning$491.59

Likes

12.5k-1.8k

67%

Retweets

1.4k-72

Replies

217-6

Bookmarks

4.4k-286

24%

Get more insights about vLLM with SuperX

Social Circle

Top users who interacted with vLLM over the last 14 days

Himanshu Kumar @codewithimanshu

Daily posts on AI , Tech, Programing, Tools, Jobs, and Trends | 500k+ (LinkedIn, IG, X) Collabs- abrojackhimanshu@gmail.com

2 interactions

Ansar Ullah Anas @AnsarUllahAnas_

Founder and CEO of Z360

1 interactions

.Marcia Ferreira Dos @marciagata483

1 interactions

Eric Hartford @QuixiAI

We make AI models Dolphin and Samantha BTC 3ENBV6zdwyqieAXzZP2i3EjeZtVwEmAuo4 ko-fi.com/erichartford dphn.ai @dphnAI

1 interactions

Min Chon Chi @MinChonChiSF

1 interactions

Mykhailo Sorochuk @sir4K_zen

Automation specialist

1 interactions

Paul Chen @paulcx

Founder of Winninghealth AI lab Researcher in AI in healthcare, HIT, Biomedical engineering, etc.

1 interactions

kori @korigero

model personality | prev: sent a human to space | Oxford

1 interactions

finbarr @finbarrtimbers

modeling language at @allen_ai

1 interactions

YM @Peng1M

1 interactions

Teknium (e/λ)@Teknium

Cofounder and Head of Post Training @NousResearch, prev @StabilityAI Github: github.com/teknium1 HuggingFace: huggingface.co/teknium

1 interactions

Alpin @AlpinDale

Every age, it seems, is tainted by the greed of men. Rubbish to one such as I, devoid of all worldly wants. — I work on HPC and making AI run faster.

1 interactions

Casper Hansen @casper_hansen_

NLP Scientist | AutoAWQ Creator | Open-Source Contributor

1 interactions

apaz @apaz_cli

apaz.dev Making GPUs go brrr

1 interactions

🔥 Roast

You’re so deep in the code and optimization rabbit hole, you probably benchmark your coffee breaks and cache your lunch – proving even your downtime is more efficient than most people’s entire workday.

⚡️ Nice achievement

Supporting DeepSeek-R1’s RL training and inference, which was featured as a cover article in Nature, marks a landmark achievement showing vLLM’s real-world scientific impact and cutting-edge capabilities.

🌟 Life's purpose

To drive forward the frontier of AI inference technology by creating tools that are not only powerful and efficient but also accessible to a broad community, enabling widespread advancement and adoption of large language models.

💬 Values and Beliefs

vLLM values open source collaboration, transparency, and the power of community-driven innovation. This profile believes that sharing advancements openly accelerates progress and that technology should be built for scalability and real-world impact. Efficiency and accessibility are key principles guiding its development philosophy.

💪 Strength

Exceptional technical prowess in optimizing and delivering high-throughput, memory-efficient AI inference solutions. Strong community engagement through open-sourcing and responsive feature development fuels innovation and trust.

🫣 Weakness

May sometimes lean heavily into technical depth and niche topics, potentially alienating non-expert followers or those seeking simpler entry points. Also, the follower count is undefined, indicating a potential opportunity to boost audience visibility.

⚡️ Growth audience tips

To grow the audience on X, vLLM should mix educational content with engaging storytelling — demystify complex features with visuals, quick demos, or relatable analogies. Collaborate more with influencers in AI and tech communities, and spotlight real-world use cases to widen appeal beyond just the hardcore developers.

💁 Bonus

Fun fact: vLLM’s engine powers fast OCR and multimodal inference projects like DeepSeek-OCR, achieving up to 2500 tokens per second on an A100 GPU – blazing fast for LLM tasks!

vLLM@vllm_project · Oct 20

🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model support. 🧠 Compresses visual contexts up to 20× while keeping 97% OCR accuracy at <10×. 📄 Outperforms GOT-OCR2.0 & MinerU2.0 on OmniDocBench using fewer vision tokens. 🤝 The vLLM team is working with DeepSeek to bring official DeepSeek-OCR support into the next vLLM release — making multimodal inference even faster and easier to scale. 🔗 github.com/deepseek-ai/De… #vLLM #DeepSeek #OCR #LLM #VisionAI #DeepLearning

vLLM@vllm_project · Apr 14

🙏 @deepseek_ai's highly performant inference engine is built on top of vLLM. Now they are open-sourcing the engine the right way: instead of a separate repo, they are bringing changes to the open source community so everyone can immediately benefit! github.com/deepseek-ai/op…

170k

vLLM@vllm_project · Nov 03

Wow excited to see PewDiePie using vLLM to serve language models locally 😃 vLLM brings easy, fast, and cheap LLM serving for everyone 🥰

161k

vLLM@vllm_project · Sep 18

Congrats to @deepseek_ai ! DeepSeek-R1 was published in Nature yesterday as the cover article, and vLLM is proud to have supported its RL training and inference🥰

207k

vLLM@vllm_project · Aug 17

🚀 Amazing community project! vLLM CLI — a command-line tool for serving LLMs with vLLM: ✅ Interactive menu-driven UI & scripting-friendly CLI ✅ Local + HuggingFace Hub model management ✅ Config profiles for perf/memory tuning ✅ Real-time server & GPU monitoring ✅ Error logs & recovery 📦 Install in one line: pip install vllm-cli GitHub: github.com/Chen-zexi/vllm… 👉 Would you like to see these features in vLLM itself? Try it out & share feedback!

70k

vLLM@vllm_project · Feb 21, 2025

We're excited to receive our first #NVIDIADGX B200 system which we'll use for vLLM research and development! Thank you @nvidia!

114k

vLLM@vllm_project · Oct 16

Announcing the completely reimagined vLLM TPU! In collaboration with @Google, we've launched a new high-performance TPU backend unifying @PyTorch and JAX under a single lowering path for amazing performance and flexibility. 🚀 What's New? - JAX + Pytorch: Run PyTorch models on TPUs with no code changes, now with native JAX support. - Up to 5x Performance: Achieve nearly 2x-5x higher throughput compared to the first TPU prototype. - Ragged Paged Attention v3: A more flexible and performant attention kernel for TPUs. - SPMD Native: We've shifted to Single Program, Multi-Data (SPMD) as the default, a compiler-centric model native to TPUs for optimal execution. Dive deep into the new architecture and see the performance benchmarks in our latest blog post! blog.vllm.ai/2025/10/16/vll… #vLLM #TPU #JAX #PyTorch #AI #OpenSource

153k

vLLM@vllm_project · Apr 17

vLLM🤝🤗! You can now deploy any @huggingface language model with vLLM's speed. This integration makes it possible for one consistent implementation of the model in HF for both training and inference. 🧵 blog.vllm.ai/2025/04/11/tra…

48k

vLLM@vllm_project · Feb 01, 2025

We landed the 1st batch of enhancements to the @deepseek_ai models, starting MLA and cutlass fp8 kernels. Compared to v0.7.0, we offer ~3x the generation throughput, ~10x the memory capacity for tokens, and horizontal context scalability with pipeline parallelism.

88k

vLLM@vllm_project · Sep 29

How does @deepseek_ai Sparse Attention (DSA) work? It has 2 components: the Lightning Indexer and Sparse Multi-Latent Attention (MLA). The indexer keeps a small key cache of 128 per token (vs. 512 for MLA). It scores incoming queries. The top-2048 tokens to pass to Sparse MLA.

100k

vLLM@vllm_project · Sep 28

🚀 New in vLLM: dots.ocr 🔥 A powerful multilingual OCR model from @xiaohongshu hi lab is now officially supported in vLLM! 📝 Single end-to-end parser for text, tables (HTML), formulas (LaTeX), and layouts (Markdown) 🌍 Supports 100 languages with robust performance on low-resource docs ⚡ Compact 1.7B VLM, but achieves SOTA results on OmniDocBench & dots.ocr-bench ✅ Free for commercial use Deploy it in just two steps: uv pip install vllm --extra-index-url wheels.vllm.ai/nightly vllm serve rednote-hilab/dots.ocr --trust-remote-code Try it today and bring fast, accurate OCR to your pipelines. Which models would you like to see next in vLLM? Drop a comment ⬇️

67k

vLLM@vllm_project · Oct 22

it’s tokenization again! 🤯 did you know tokenize(detokenize(token_ids)) ≠ token_ids? RL researchers from Agent Lightning coined the term Retokenization Drift — a subtle mismatch between what your model generated and what your trainer thinks it generated. why? because most agents call LLMs via OpenAI-compatible APIs that only return strings, so when those strings get retokenized later, token splits may differ (HAV+ING vs H+AVING), tool-call JSON may be reformatted, or chat templates may vary. → unstable learning, off-policy updates, training chaos. 😬 (@karpathy has a great video explaining all details about tokenization 👉🏻 youtube.com/watch?v=zduSFx… ) together with the Agent Lightning team at Microsoft Research, we’ve fixed it: vLLM’s OpenAI-compatible endpoints can return token IDs directly. just add "return_token_ids": true to your /v1/chat/completions or /v1/completions request, and you’ll get both prompt_token_ids and token_ids along with normal text outputs. no more drift. no more mismatch. your agent RL now trains exactly on what it sampled. read more from the blog 👇 👉 blog.vllm.ai/2025/10/22/age… #vLLM #AgentLightning #RL #LLMs #OpenAIAPI #ReinforcementLearning

167k

vLLM@vllm_project · Jan 27, 2025

🚀 With the v0.7.0 release today, we are excited to announce the alpha release of vLLM V1: A major architectural upgrade with 1.7x speedup! Clean code, optimized execution loop, zero-overhead prefix caching, enhanced multimodal support, and more.

83k

vLLM@vllm_project · Oct 11

🚀 vLLM x MinerU: Document Parsing at Lightning Speed! We’re excited to see MinerU fully powered by vLLM — bringing ultra-fast, accurate, and efficient document understanding to everyone. ⚡ Powered by vLLM’s high-throughput inference engine, MinerU 2.5 delivers: Instant parsing, no waiting Deeper understanding for complex docs Optimized cost — even consumer GPUs can fly Experience the new speed of intelligence: 👉 github.com/opendatalab/Mi… #vLLM #MinerU #AI #LLM #DocumentParsing #AIresearch

67k

vLLM@vllm_project · Jul 08

vLLM runs on free-threaded Python! A group of engineers from @Meta’s Python runtime language team has shown that it’s possible to run vLLM on the nogil distribution of Python. We’re incredibly excited to embrace this future technique and be early adopters 😍

49k

vLLM@vllm_project · Feb 20, 2025

vLLM v0.7.3 now supports @deepseek_ai's Multi-Token Prediction module! It delivers up to 69% speedup boost. You can turn it on with `--num-speculative-tokens=1` and an optional `--draft-tensor-parallel-size=1`. We saw 81-82.3% acceptance rate on the ShareGPT.

37k

Most engaged tweets of vLLM

vLLM@vllm_project · Oct 20

vLLM@vllm_project · Feb 01, 2025

88k

vLLM@vllm_project · Apr 14

170k

vLLM@vllm_project · Dec 25, 2024

what would you like vLLM to build/fix in 2025 :D

vLLM@vllm_project · Feb 21, 2025

We're excited to receive our first #NVIDIADGX B200 system which we'll use for vLLM research and development! Thank you @nvidia!

114k

vLLM@vllm_project · Oct 16

153k

vLLM@vllm_project · Feb 20, 2025

37k

vLLM@vllm_project · Sep 28

67k

vLLM@vllm_project · Oct 22

167k

vLLM@vllm_project · Aug 08, 2024

🙏 Thank you @nvidia for sponsoring vLLM development. The DGX H200 machine is marvelous! We plan to use the machine for benchmarking and performance enhancement 🏎️.

40k

vLLM@vllm_project · Jan 27, 2025

83k

vLLM@vllm_project · Aug 17

70k

vLLM@vllm_project · Nov 03

Wow excited to see PewDiePie using vLLM to serve language models locally 😃 vLLM brings easy, fast, and cheap LLM serving for everyone 🥰

161k

vLLM@vllm_project · Sep 05, 2024

A month ago, we announced our performance roadmap. Today, we are happy to share that the latest release achieves 🚀2.7x higher throughput and is 5x faster for output latency on Llama 8B, and 1.8x higher throughput and 2x faster on Llama 70B for H100s. blog.vllm.ai/2024/09/05/per…

88k

vLLM@vllm_project · Jan 01, 1970

vLLM@vllm_project · Sep 18

Congrats to @deepseek_ai ! DeepSeek-R1 was published in Nature yesterday as the cover article, and vLLM is proud to have supported its RL training and inference🥰

207k

People with Innovator archetype

The Innovator

Morty@coolmortyvibe

Believe in God he will take care of you

1k following1k followers

The Innovator

Andrea Giacon@AndryHTC

🆕 Searching for the next B2B use case of AI ✨ 10Y+ Product & Software Engineer | CEO at assetplan.co.uk

116 following118 followers

The Innovator

Abram Jackson@abrakjamson

aka.ms/Build25_BRK165 Designing the new era of intelligent applications, currently as PM for @msftcopilot 365 extensibility. All opinions are mine.

370 following1k followers

The Innovator

ALPHA CHRIS@chibuezentachi

CRYPTO ENTHUSIAST ➡️ HOSPITALITY MANAGER➡️ PROTOCOL TESTOR➡️ CRYPTO IS FRREEDOM

1k following2k followers

The Innovator

Crystal@crystalsssup

Staff @Kimi_Moonshot prev. co-maker of ModelizeAI & gemsouls "Personality goes a long way" @UCSanDiego

790 following17k followers

The Innovator

Glyde@GlydeGG

The first onchain equity layer. Building decentralized ownership for all. Use → earn → own. Powered by next gen DEX (BETA) on @Solana. Alerts: @GlydoAlerts 🤖

118 following47k followers

The Innovator