Get live statistics and analysis of vLLM's profile on X / Twitter

A high-throughput and memory-efficient inference and serving engine for LLMs. Join slack.vllm.ai to discuss together with the community!

24 following · 25k followers

The Innovator

vLLM is a cutting-edge engine designed to revolutionize large language model (LLM) inference with remarkable speed and memory efficiency. This profile thrives on pushing the boundaries of AI technology, collaborating openly with the community to accelerate innovation. It’s the go-to hub for developers and enthusiasts eager to scale and streamline LLM deployment.

Impressions: 2.6M (−164.3k) · $491.59
Likes: 12.5k (−1.8k) · 67%
Retweets: 1.4k (−72) · 8%
Replies: 217 (−6) · 1%
Bookmarks: 4.4k (−286) · 24%

Top users who interacted with vLLM over the last 14 days

@codewithimanshu

Daily posts on AI, Tech, Programming, Tools, Jobs, and Trends | 500k+ (LinkedIn, IG, X) Collabs- abrojackhimanshu@gmail.com

2 interactions
@AnsarUllahAnas_

Founder and CEO of Z360

1 interaction
@QuixiAI

We make AI models Dolphin and Samantha BTC 3ENBV6zdwyqieAXzZP2i3EjeZtVwEmAuo4 ko-fi.com/erichartford dphn.ai @dphnAI

1 interaction
@sir4K_zen

Automation specialist

1 interaction
@paulcx

Founder of Winninghealth AI lab. Researcher in AI in healthcare, HIT, biomedical engineering, etc.

1 interaction
@korigero

model personality | prev: sent a human to space | Oxford

1 interaction
@finbarrtimbers

modeling language at @allen_ai

1 interaction
@Teknium

Cofounder and Head of Post Training @NousResearch, prev @StabilityAI Github: github.com/teknium1 HuggingFace: huggingface.co/teknium

1 interaction
@AlpinDale

Every age, it seems, is tainted by the greed of men. Rubbish to one such as I, devoid of all worldly wants. — I work on HPC and making AI run faster.

1 interaction
@casper_hansen_

NLP Scientist | AutoAWQ Creator | Open-Source Contributor

1 interaction
@apaz_cli

apaz.dev Making GPUs go brrr

1 interaction

You’re so deep in the code and optimization rabbit hole, you probably benchmark your coffee breaks and cache your lunch – proving even your downtime is more efficient than most people’s entire workday.

Landmark achievement

Supporting DeepSeek-R1’s RL training and inference, which was featured as a cover article in Nature, marks a landmark achievement showing vLLM’s real-world scientific impact and cutting-edge capabilities.

Mission

To drive forward the frontier of AI inference technology by creating tools that are not only powerful and efficient but also accessible to a broad community, enabling widespread advancement and adoption of large language models.

Values

vLLM values open source collaboration, transparency, and the power of community-driven innovation. This profile believes that sharing advancements openly accelerates progress and that technology should be built for scalability and real-world impact. Efficiency and accessibility are key principles guiding its development philosophy.

Strengths

Exceptional technical prowess in optimizing and delivering high-throughput, memory-efficient AI inference solutions. Strong community engagement through open-sourcing and responsive feature development fuels innovation and trust.

Weaknesses

May sometimes lean heavily into technical depth and niche topics, potentially alienating non-expert followers or those seeking simpler entry points. The follower count is also modest relative to the profile's reach, suggesting an opportunity to boost audience visibility.

Growth tips

To grow the audience on X, vLLM should mix educational content with engaging storytelling: demystify complex features with visuals, quick demos, or relatable analogies. Collaborate more with influencers in AI and tech communities, and spotlight real-world use cases to widen appeal beyond hardcore developers.

Fun fact: vLLM’s engine powers fast OCR and multimodal inference projects like DeepSeek-OCR, achieving up to 2500 tokens per second on an A100 GPU – blazing fast for LLM tasks!

Top tweets of vLLM

🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai, exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model support.
🧠 Compresses visual contexts up to 20× while keeping 97% OCR accuracy at <10×.
📄 Outperforms GOT-OCR2.0 & MinerU2.0 on OmniDocBench using fewer vision tokens.
🤝 The vLLM team is working with DeepSeek to bring official DeepSeek-OCR support into the next vLLM release — making multimodal inference even faster and easier to scale.
🔗 github.com/deepseek-ai/De…
#vLLM #DeepSeek #OCR #LLM #VisionAI #DeepLearning

1M

🙏 @deepseek_ai's highly performant inference engine is built on top of vLLM. Now they are open-sourcing the engine the right way: instead of a separate repo, they are bringing changes to the open source community so everyone can immediately benefit! github.com/deepseek-ai/op…

170k

Congrats to @deepseek_ai ! DeepSeek-R1 was published in Nature yesterday as the cover article, and vLLM is proud to have supported its RL training and inference🥰

207k

We're excited to receive our first #NVIDIADGX B200 system which we'll use for vLLM research and development! Thank you @nvidia!

114k

Announcing the completely reimagined vLLM TPU! In collaboration with @Google, we've launched a new high-performance TPU backend unifying @PyTorch and JAX under a single lowering path for amazing performance and flexibility.
🚀 What's New?
- JAX + PyTorch: Run PyTorch models on TPUs with no code changes, now with native JAX support.
- Up to 5x Performance: Achieve nearly 2x-5x higher throughput compared to the first TPU prototype.
- Ragged Paged Attention v3: A more flexible and performant attention kernel for TPUs.
- SPMD Native: We've shifted to Single Program, Multi-Data (SPMD) as the default, a compiler-centric model native to TPUs for optimal execution.
Dive deep into the new architecture and see the performance benchmarks in our latest blog post! blog.vllm.ai/2025/10/16/vll…
#vLLM #TPU #JAX #PyTorch #AI #OpenSource

153k

vLLM🤝🤗! You can now deploy any @huggingface language model with vLLM's speed. This integration makes it possible for one consistent implementation of the model in HF for both training and inference. 🧵 blog.vllm.ai/2025/04/11/tra…

48k

We landed the 1st batch of enhancements to the @deepseek_ai models, starting MLA and cutlass fp8 kernels. Compared to v0.7.0, we offer ~3x the generation throughput, ~10x the memory capacity for tokens, and horizontal context scalability with pipeline parallelism.

88k

How does @deepseek_ai Sparse Attention (DSA) work? It has 2 components: the Lightning Indexer and Sparse Multi-Latent Attention (MLA). The indexer keeps a small key cache of 128 per token (vs. 512 for MLA) and scores incoming queries; the top-2048 tokens are then passed to Sparse MLA.

100k
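The two-stage selection described in that tweet can be sketched in a few lines of NumPy. The shapes, the random data, the dot-product scoring, and the 4096-token context are illustrative assumptions, not DeepSeek's actual kernels:

```python
import numpy as np

# Hypothetical sketch: a lightweight indexer scores every cached token,
# then sparse attention runs only over the top-2048 survivors.
rng = np.random.default_rng(0)
seq_len, idx_dim, top_k = 4096, 128, 2048    # 128-dim indexer cache per token

index_keys = rng.standard_normal((seq_len, idx_dim))  # small per-token key cache
query = rng.standard_normal(idx_dim)                  # incoming query, indexer view

scores = index_keys @ query                      # Lightning-Indexer-style scoring
keep = np.argpartition(scores, -top_k)[-top_k:]  # positions of the top-2048 tokens

# Sparse MLA would now attend over `keep` only: 2048 tokens instead of 4096.
print(keep.shape)  # → (2048,)
```

The point of the indexer's small 128-wide cache is that this scoring pass is cheap relative to full attention, so the expensive Sparse MLA step touches only the selected positions.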

🚀 New in vLLM: dots.ocr 🔥 A powerful multilingual OCR model from @xiaohongshu hi lab is now officially supported in vLLM!
📝 Single end-to-end parser for text, tables (HTML), formulas (LaTeX), and layouts (Markdown)
🌍 Supports 100 languages with robust performance on low-resource docs
⚡ Compact 1.7B VLM, but achieves SOTA results on OmniDocBench & dots.ocr-bench
✅ Free for commercial use
Deploy it in just two steps:
`uv pip install vllm --extra-index-url wheels.vllm.ai/nightly`
`vllm serve rednote-hilab/dots.ocr --trust-remote-code`
Try it today and bring fast, accurate OCR to your pipelines. Which models would you like to see next in vLLM? Drop a comment ⬇️

67k

it’s tokenization again! 🤯 did you know tokenize(detokenize(token_ids)) ≠ token_ids? RL researchers from Agent Lightning coined the term Retokenization Drift — a subtle mismatch between what your model generated and what your trainer thinks it generated.
why? because most agents call LLMs via OpenAI-compatible APIs that only return strings, so when those strings get retokenized later, token splits may differ (HAV+ING vs H+AVING), tool-call JSON may be reformatted, or chat templates may vary. → unstable learning, off-policy updates, training chaos. 😬
(@karpathy has a great video explaining all details about tokenization 👉🏻 youtube.com/watch?v=zduSFx…)
together with the Agent Lightning team at Microsoft Research, we’ve fixed it: vLLM’s OpenAI-compatible endpoints can return token IDs directly. just add `"return_token_ids": true` to your /v1/chat/completions or /v1/completions request, and you’ll get both prompt_token_ids and token_ids along with normal text outputs.
no more drift. no more mismatch. your agent RL now trains exactly on what it sampled. read more from the blog 👇 👉 blog.vllm.ai/2025/10/22/age…
#vLLM #AgentLightning #RL #LLMs #OpenAIAPI #ReinforcementLearning

167k
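The HAV+ING vs H+AVING drift from that tweet can be reproduced with a toy greedy tokenizer. The four-entry vocabulary is a made-up illustration, not a real model's tokenizer:

```python
# Toy demonstration of "Retokenization Drift": a greedy tokenizer need not
# reproduce the exact token split a model actually sampled.
VOCAB = {"H": 0, "HAV": 1, "AVING": 2, "ING": 3}
ID_TO_TOK = {i: t for t, i in VOCAB.items()}

def detokenize(ids):
    return "".join(ID_TO_TOK[i] for i in ids)

def tokenize(text):
    """Greedy longest-prefix-match tokenizer (assumes text is tokenizable)."""
    ids = []
    while text:
        tok = max((t for t in VOCAB if text.startswith(t)), key=len)
        ids.append(VOCAB[tok])
        text = text[len(tok):]
    return ids

sampled = [0, 2]                      # the model actually sampled H + AVING
roundtrip = tokenize(detokenize(sampled))
print(roundtrip)                      # → [1, 3], i.e. HAV + ING: drift!
print(roundtrip == sampled)           # → False
```

A trainer that only sees the string "HAVING" would compute losses on [1, 3] even though the policy sampled [0, 2], which is exactly the off-policy mismatch `"return_token_ids": true` avoids.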

vLLM runs on free-threaded Python! A group of engineers from @Meta’s Python runtime language team has shown that it’s possible to run vLLM on the nogil distribution of Python. We’re incredibly excited to embrace this future technique and be early adopters 😍

49k

vLLM v0.7.3 now supports @deepseek_ai's Multi-Token Prediction module! It delivers up to a 69% speedup. You can turn it on with `--num-speculative-tokens=1` and an optional `--draft-tensor-parallel-size=1`. We saw an 81-82.3% acceptance rate on the ShareGPT dataset.

37k
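A quick sanity check on those numbers: with one draft token per step, the acceptance rate bounds the achievable speedup. This back-of-envelope model assumes zero drafting overhead and per-step independence, both simplifications:

```python
# With --num-speculative-tokens=1, each step emits 1 verified token plus,
# with probability equal to the acceptance rate, 1 accepted draft token.
acceptance = 0.81                  # lower end of the reported 81-82.3% range
tokens_per_step = 1 + acceptance   # expected tokens per decoding step
ideal_speedup = tokens_per_step - 1

print(f"ideal ceiling: {ideal_speedup:.0%}")  # → ideal ceiling: 81%
```

The gap between this ~81% ceiling and the reported ~69% measured speedup is consistent with the cost of running the draft module itself.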

Most engaged tweets of vLLM

All but one of these repeat the top tweets above; engagement totals:

- DeepSeek-OCR day-0 support (~2500 tokens/s on A100-40G): 1M
- First batch of DeepSeek model enhancements (MLA, CUTLASS FP8 kernels): 88k
- DeepSeek open-sourcing its vLLM-based engine upstream: 170k
- First #NVIDIADGX B200 system for vLLM R&D: 114k
- Reimagined vLLM TPU backend with @Google: 153k
- Multi-Token Prediction support in v0.7.3: 37k
- dots.ocr support: 67k
- Retokenization Drift fix with Agent Lightning: 167k
- DeepSeek-R1 published in Nature as the cover article: 207k

🙏 Thank you @nvidia for sponsoring vLLM development. The DGX H200 machine is marvelous! We plan to use the machine for benchmarking and performance enhancement 🏎️.

40k

People with Innovator archetype

The Innovator

Believe in God he will take care of you

1k following · 1k followers
The Innovator

🆕 Searching for the next B2B use case of AI ✨ 10Y+ Product & Software Engineer | CEO at assetplan.co.uk

116 following · 118 followers
The Innovator

aka.ms/Build25_BRK165 Designing the new era of intelligent applications, currently as PM for @msftcopilot 365 extensibility. All opinions are mine.

370 following · 1k followers
The Innovator

CRYPTO ENTHUSIAST ➡️ HOSPITALITY MANAGER➡️ PROTOCOL TESTOR➡️ CRYPTO IS FRREEDOM

1k following · 2k followers
The Innovator

Staff @Kimi_Moonshot prev. co-maker of ModelizeAI & gemsouls "Personality goes a long way" @UCSanDiego

790 following · 17k followers
The Innovator

The first onchain equity layer. Building decentralized ownership for all. Use → earn → own. Powered by next gen DEX (BETA) on @Solana. Alerts: @GlydoAlerts 🤖

118 following · 47k followers
The Innovator

Founded isoHunt, WonderSwipe. AI shepherd at @boomtv @StarpowerAI. 70% engineering 30% product

1k following · 3k followers
The Innovator

💎 Crypto builder | $ULAB $BOB $RAYLS $SolvBTC $GLNT | DeFi strategist | HODL dreams, execute moves | #BlockchainLife

3k following · 2k followers
The Innovator

@LinkfinderOK to find links at the best price thanks to AI; agenceonze.fr to delegate your SEO to a solid team of senior SEO experts

368 following · 1k followers
The Innovator

Building an AI content architect to automate media engagement & empower creators. Sharing insights on AI, community, & building in public.

363 following · 21 followers
The Innovator

building @bunjavascript. formerly: @stripe (twice) @thielfellowship. high school dropout. npm i -g bun

783 following · 132k followers
The Innovator
534 following · 88 followers

