Get live statistics and analysis of Andi Marafioti's profile on X / Twitter

cooking multimodal models @huggingface (prev @unity)

531 following · 5k followers

The Innovator

Andi Marafioti is a trailblazer in the world of multimodal AI models, passionately pushing the limits of what’s possible with open-source vision and OCR technology. Known for their unapologetic enthusiasm and lightning-fast breakthroughs, Andi bridges the gap between cutting-edge research and accessible tech. They’re the go-to voice if you want to know how to run powerful AI models on your laptop — or even your toaster!

Impressions: 597.7k-35.1k · $112.03
Likes: 4.7k-719 · 63%
Retweets: 363-42 · 5%
Replies: 270-18 · 4%
Bookmarks: 2.1k-111 · 28%

Top users who interacted with Andi Marafioti over the last 14 days

@iamRezaSayar

👨🏻‍🎓Life-long Learner👨🏻‍🎓 Kindness❤️, Helpfulness🫂 , AI🧠, Robotics🦾 & Reggaetón💃🏻

1 interaction
@JamieGale7

Transforming the world's physical data into decision-making insights

1 interaction
@ivanfioravanti

Co-founder and CTO of @CoreViewHQ GenAI/LLM addicted, Apple MLX, Microsoft 365, Azure, Kubernetes, builder of my personal dreams.

1 interaction
@lusxvr

CS @ TUM

1 interaction
@NandoDF

Writing my own AI story. Recent: NPI, AlphaGo tuning, learn to learn, AlphaCode, Gato, ReST, r-Gemma, Imagen3, Veo, Genie, MAI …

1 interaction
@ManuelFaysse

NLP Research, interning at FAIR @AIatMeta + PhD Candidate @CentraleSupelec Prev: @imperialcollege, @epfl

1 interaction
@MaziyarPanahi

AI x Healthcare | Creator of @OpenMed_AI | Open-Source AI Advocate ❤️ | eu/acc 🇫🇷🇪🇺

1 interaction
@ClementDelangue

Co-founder & CEO @HuggingFace 🤗, the open and collaborative platform for AI builders

1 interaction
@Observer_ofyou

cs • except cs

1 interaction
@dennis_nik75

Animator/Programmer, Christian, Conservative, Married, Father, Patriot. I love learning about aerospace, AI & robotics. Free speech equals free thought. 🚫DMs

1 interaction
@leothecurious

teaching robots to see by day, learning from nature by night. in search of elegant solutions to the metaproblem. infinitely curious.

1 interaction
@vedantadoestech

i write, research and eat. that’s all basically

1 interaction
@sir4K_zen

Automation specialist

1 interaction
@iamdiegopy

exploring the universe and building stuff

1 interaction
@Niccolg92

PhD in ML, now AI Research Lead in 🇱🇺. Here mostly AI, including sharing paper reviews. Chess, philosophy, and a travel pic may appear. Opinions are my own.

1 interaction
@altiamkabir

🧠 AI Educator | Career Coach | Founder 📧 DM for Collaboration 🚀 Want to Learn & Earn with AI? 🤝 Join our 100k+ AI community & learn AI with 27+ Free Gifts👇

1 interaction
@CodewithP

🙋‍♂️ SWE @ServiceNow 👨‍💻 Talks to software & cats 🐈 QE's are not my best friends 👀

1 interaction
@k7agar

your friendly neighbourhood engineer. prev world models @lossfunk.

1 interaction

Andi's idea of 'fine-tuning on a laptop' sounds like their laptop deserves a bodyguard: it's basically running a marathon in flip-flops while their code sprints effortlessly ahead.

Spearheading the release of SmolDocling and SmolVLM, Andi helped deliver record-breaking open-source vision models that outperform competitors up to 27x larger, redefining efficiency in AI OCR.

Their mission: to democratize AI technology by creating efficient, scalable, and open-source multimodal models that empower developers and enthusiasts to innovate without barriers.

They believe in transparency, open collaboration, and achieving state-of-the-art results through community-driven projects. Efficiency and accessibility are core values, proving that powerful AI can come in small, sleek packages without the need for monstrous hardware.

Andi is a fearless open-sourcer and energetic communicator who not only builds game-changing models but also makes complex AI concepts digestible and exciting for a broad audience.

Their casual and sometimes profanity-laced style might alienate more traditional or formal tech communities despite their undeniable expertise.

To grow their audience on X, Andi should continue blending technical depth with their authentic voice, leveraging informative threads with humor and transparency. Engaging more with followers through Q&As and live demos could turn passive viewers into loyal fans.

Fun fact: Andi’s SmolVLM models can run on less than 1GB of GPU memory, meaning you could practically fire one up on your toaster if it had a GPU! Talk about small but mighty.
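For readers who want to test that claim, a minimal sketch of running the smallest SmolVLM with the transformers library might look like the following. The checkpoint name, the chat-template call, and the local image path are assumptions based on the public Hugging Face Hub, not something documented on this page:

```python
# Hypothetical quick-start: the model id and preprocessing calls are assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"          # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)  # fp32 on CPU is fine at 256M params

image = Image.open("photo.jpg")                           # any local image
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Describe this image briefly."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```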

Top tweets of Andi Marafioti

Just read the Qwen2.5-Omni technical report from the Qwen team, it's super interesting. Here are my notes.

Qwen2.5-Omni is a unified end-to-end model that can perceive text, images, audio, and video — and generate both text and natural speech responses in a streaming fashion.

At its core is the Thinker-Talker architecture:
Thinker: a large language model that processes multimodal inputs and generates text.
Talker: an autoregressive speech decoder that turns Thinker's hidden states into speech tokens.
They're trained together, end-to-end.

Handling audio: audio is converted to 128-channel mel-spectrograms (16kHz, 25ms window, 10ms hop). Encoded via a modified Whisper model. Audio is processed in 2s blocks with streaming-compatible attention to reduce latency.

Handling video: uses a ViT-based encoder with dynamic frame sampling. Each frame is treated like an image. To sync with audio, they introduce TMRoPE — Time-aligned Multimodal RoPE — a novel positional embedding that aligns video and audio in time.

TMRoPE splits positional encoding into temporal, height, and width axes, letting Qwen2.5-Omni represent image/video/audio/text all on the same timeline. Interleaving of audio and visual tokens every 2 seconds enables synchronized fusion.

Streaming audio generation: audio tokens from Talker are decoded using a sliding-window DiT model + modified BigVGAN. The receptive field includes 2 lookback blocks and 1 lookahead to allow context-aware streaming audio generation.

Pretraining involved locking the LLM and training the audio/vision encoders first. Later stages unfreeze everything and train on a massive mix of audio-text, video-text, image-text, and long-sequence (32k tokens) data.

Post-training includes reinforcement learning for Talker to reduce hallucinations and improve pronunciation/timing. Plus, multi-speaker fine-tuning for better prosody and naturalness.

Qwen2.5-Omni achieves SOTA on OmniBench, AV-Odyssey, and strong results across text, image, audio, and video tasks. End-to-end speech instruction following is nearly on par with text-based inputs. That's rare.

Overall: a super ambitious and well-integrated multimodal model. The Thinker-Talker separation is elegant. TMRoPE is a clever solution to a tricky problem. That said, I wish the paper had included more ablation studies or experiments justifying some of the architectural decisions. Many claims are reasonable but would benefit from more empirical evidence.

Still, major kudos to the team. Qwen2.5-Omni is a big step toward real-time, unified multimodal assistants.

32k
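For a concrete sense of the audio front-end described in the notes above, here is a rough sketch of the 128-bin mel-spectrogram setup (16 kHz sampling, 25 ms window, 10 ms hop) sliced into roughly 2-second blocks. It is an illustration built on torchaudio using those stated parameters, not Qwen's actual preprocessing code:

```python
# Illustrative only: the mel-spectrogram parameters come from the tweet above;
# everything else (torchaudio, dummy audio, block slicing) is an assumption.
import torch
import torchaudio

SR = 16_000
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=SR,
    n_fft=400,        # 25 ms analysis window at 16 kHz
    win_length=400,
    hop_length=160,   # 10 ms hop -> 100 frames per second
    n_mels=128,       # 128 mel channels
)

waveform = torch.randn(1, SR * 6)       # 6 seconds of dummy mono audio
features = mel(waveform)                # shape: (1, 128, ~601 frames)
frames_per_block = (SR // 160) * 2      # roughly 2 seconds of frames per block
blocks = features.split(frames_per_block, dim=-1)
print(features.shape, len(blocks))
```

Each block would then be passed through the Whisper-style audio encoder with streaming-friendly attention, as the notes describe.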

Today, we share the tech report for SmolVLM: Redefining small and efficient multimodal models. 🔥
Explaining how to design a tiny 256M VLM that uses less than 1GB of RAM and outperforms our 80B models from 18 months ago!

Here are the coolest insights from our experiments:

✨ Longer context = Big wins: Increasing the context length from 2K to 16K gave our tiny VLMs a 60% performance boost!
✨ Smaller is smarter with SigLIP: Surprise! Smaller LLMs didn't benefit from the usual large SigLIP (400M). Instead, we use the 80M base SigLIP that performs equally well at just 20% of the original size!
✨ Pixel shuffling magic: Aggressively pixel shuffling helped our compact VLMs "see" better, achieving the same performance with sequences 16x shorter!
✨ Learned positional tokens FTW: For compact models, learned positional tokens significantly outperform raw text tokens, enhancing efficiency and accuracy.
✨ System prompts and special tokens are key: Introducing system prompts and dedicated media intro/outro tokens significantly boosted our compact VLM’s performance—especially for video tasks.
✨ Less CoT, more efficiency: Turns out, too much Chain-of-Thought (CoT) data actually hurts performance in small models. They dumb
✨ Longer videos, better results: Increasing video length during training enhanced performance on both video and image tasks.

🌟 State-of-the-Art Performance, SmolVLM comes in three powerful yet compact sizes—256M, 500M, and 2.2B parameters—each setting new SOTA benchmarks for their hardware constraints in image and video understanding.
📱 Real-world Efficiency: We've created an app using SmolVLM on an iPhone 15 and got real-time inference directly from its camera!
🌐 Browser-based Inference? Yep! We get lightning-fast inference speeds of 40-80 tokens per second directly in a web browser. No tricks, just compact, efficient models!

If you’re into efficient multimodal models, you’ll love this one.

64k
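The "pixel shuffling" point in the SmolVLM thread above is essentially a space-to-depth rearrangement of visual tokens: every r x r block of patch tokens is folded into a single wider token, so the language model reads r^2 times fewer tokens. The sketch below is a generic PyTorch illustration assuming a square patch grid and r = 4 (which matches the "16x shorter" sequences mentioned in the thread); it is not the SmolVLM implementation itself:

```python
# Generic space-to-depth "pixel shuffle" over visual tokens (not SmolVLM's code).
import torch

def pixel_shuffle_tokens(x: torch.Tensor, r: int = 4) -> torch.Tensor:
    """Fold each r x r block of tokens into one wider token.

    x: (batch, H*W, D) patch tokens from the vision encoder, square grid assumed.
    returns: (batch, H*W // r**2, D * r**2)
    """
    b, n, d = x.shape
    h = w = int(n ** 0.5)                   # assumes a square H x W patch grid
    x = x.view(b, h, w, d)
    x = x.view(b, h // r, r, w // r, r, d)  # split the grid into r x r blocks
    x = x.permute(0, 1, 3, 2, 4, 5)         # move the two block dims next to D
    return x.reshape(b, (h // r) * (w // r), d * r * r)

tokens = torch.randn(1, 1024, 768)          # e.g. a 32 x 32 grid of patch tokens
print(pixel_shuffle_tokens(tokens).shape)   # torch.Size([1, 64, 12288]), 16x fewer tokens
```

In practice a projection layer typically maps the widened D * r^2 features back down to the language model's hidden size, so the LLM sees far fewer, but information-richer, visual tokens.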


People with the Innovator archetype

The Innovator

Product designer Framer expert MagicPath expert ➞ framer.link/iziviziframers…

67 following · 329 followers
The Innovator

Helping professionals with breaking the traditional rules for creating a quality lifestyle. How will you find the right fit in life to create balance?

42 following · 36 followers
The Innovator

👨‍💻 Developer by day, AI explorer by night. Building with Data & AI. Sharing cutting-edge AI research, agents & dev tools.

1k following · 4k followers
The Innovator

I have a dream...

7k following · 7k followers
The Innovator

AI enthusiast / tech lover always exploring the future of innovation/ for Promotion 👉teajaay07@gmail.com 📧

1k following · 26k followers
The Innovator

AI & Tech Insights ⚡ | Ghostwriter 📝 | Passionate about AI & Tech Tools | Helping CEOs Build Personal Brands on Twitter 🚀 | DM for Collaborations 📩

120 following · 18k followers
The Innovator

Secure Nodes, Smart Stakes

2k following · 1k followers
The Innovator

Computer vision projects. Python @exactlab CFD in threejs @dualistic_twin captain @docker

1k following · 8k followers
The Innovator

Learning with Machines | Cosmophile

95 following · 174 followers
The Innovator

AR/VR Product Design and Design Engineering @LIV. Experiment with #spatialcomputing. Build creative tools.

1k following · 6k followers
The Innovator

Cofounder @ Machine Phase Systems | Solving humanity’s biggest challenges one atom at a time. x Blockstream, ZeroKnowledgeSystems, InJoy

12k following · 18k followers
The Innovator

AI tinkerer, I build my brains out, lately using AI. I experiment I ship. $30/12k MRR prontoshoot.com nostalgiapicturesai.com

694 following · 2k followers

