Get live statistics and analysis of anshuman's profile on X / Twitter

ml @zomato; prev: ai consultant @google

879 following · 18k followers

The Analyst

Anshuman is a deeply analytical machine learning engineer who brilliantly connects complex AI concepts with everyday experiences. His tweets reveal a brain wired to decode intricate technical details while making them accessible and relatable. He's a natural explainer and problem solver who thrives on clarity and insight.

Impressions: 1.5M (+15.1k), $298.51
Likes: 10.5k (+400), 45%
Retweets: 629 (−4), 3%
Replies: 462 (+66), 2%
Bookmarks: 11.9k (−386), 51%

Top users who interacted with anshuman over the last 14 days

2 interactions
@prajpawar23

22 // ml @qualcomm // prev - gpu engg @amd

2 interactions
@RaviRaiML

Freelance ML Engineer | Fixing AI products with MLOps

2 interactions
@hrishikesshhhh

20 || 6'2 || CS-22 || Software Developer|| Full-Stack Dev

2 interactions
@HyunRish

another digital footprint 👣

2 interactions
@abhi1thakur

AI Search @vespaengine, ex-@huggingface, World's First 4x GM @kaggle, YouTube 120k+: youtube.com/@abhishekkrtha…

1 interaction
@_PaperMoose_

Built ARC-AGI 2 evals @gregkamrad. Ex-CTO @ DentoAI. Built findmymedsapp.com for Novo Nordisk. Building automated reliability testing for healthcare

1 interaction
@threadreaderapp

I'm a 🤖 to help you read threads more easily. Reply to any tweet of a thread and mention me with the "unroll" keyword and I'll give you a link back 😀

1 interaction
@leodoan_

software engineer. crafting impactful things to open source world | building overwrite: mnismt.com/overwrite | changelogs: changelogs.directory

1 interaction
@Hari1275866

GenAI & Data engineering | Tech Enthusiast | Programmer | keen to learn new technology |

1 interaction
@abtw3t

robotics + ml | @SAEIntl student

1 interaction
@NiiMante

Engineer. Investor. Traveler

1 interaction
@duborges

digital entrepreneur since 1997 ≫ saas ≫ mobile apps ≫ chrome extensions ≫ programmatic sites ≫ softwares ≫ chatbots ≫ hacking ≫ AI

1 interaction
@joefioti

it's not possible, it's necessary. building a compiler @luminal_ai (yc s25) to solve inference.

1 interaction
@_willfalcon

CEO @LightningAI. Creator, PyTorch Lightning⚡, Former AI PhD student (pretraining, researcher) @metaAI @CILVRatNYU w @kchonyc @ylecun

1 interaction
@georgecurtiss

CEO at @helixdb | YC X25 | calisthenics enjoyer 😎 | 🇬🇧 | 6’4” | 23 Star the GH! github.com/helixdb/helix-…

1 interaction

Anshuman’s tweets are so heavily laden with Transformer math, even his love life seems to be stuck in multi-head attention—he’s got all the right heads, just waiting for the algorithm to optimize dating outcomes!

His humorous yet insightful tweet on transformer attention mechanisms went viral, garnering over 700K views and 12K likes and establishing him as a go-to voice for clear ML explanations on social media.

To demystify the complexities of machine learning and AI by translating technical jargon into engaging narratives that educate, inform, and inspire both peers and enthusiasts.

Anshuman values precision, intellectual rigor, and clarity of thought. He believes that understanding complex systems requires breaking down layered information into digestible parts and that knowledge sharing drives progress. He embraces the power of data-driven insights and thoughtful curiosity.

His strengths lie in his exceptional ability to analyze, simplify, and communicate complex AI concepts, turning abstract ideas into relatable stories that resonate widely.

His focus on detailed technical explanations might sometimes overwhelm audiences unfamiliar with jargon, potentially limiting broader engagement.

To grow his audience on X, Anshuman should blend his deep technical content with more accessible threads and interactive Q&A sessions, leveraging storytelling and relatable analogies to invite engagement from both experts and curious beginners.

Fun fact: Anshuman masterfully equates romantic relationship dynamics with transformer attention mechanisms, showing his unique ability to blend technical expertise with humor and emotional insight.

Top tweets of anshuman

She dumped me last night.

Not because I don't listen. Not because I'm always on my phone. Not even because I forgot our anniversary (twice). But because, in her exact words: "You only pay attention to the parts of what I say that you think are important."

I stared at her for a moment and realized... she just perfectly described the attention mechanism in transformers. Turns out I wasn't being a bad boyfriend. I was being mathematically optimal.

See, in conversations (and transformers), you don't give equal weight to every word. Some words matter more for understanding context. Attention figures out exactly HOW important each word should be.

Here's the beautiful math:

Attention(Q, K, V) = softmax(QK^T / √d_k)V

Breaking it down:
- Q (Query): "What am I looking for?"
- K (Key): "What info is available?"
- V (Value): "What is that info?"
- d_k: key dimension (for scaling)

Think library analogy: you have a question (Query). Books have titles (Keys) and content (Values). Attention finds which books are most relevant.

Step-by-step with "The cat sat on the mat":

Step 1: Create Q, K, V
Each word → three vectors via learned matrices W_Q, W_K, W_V. For "cat":
- Query: "What should I attend to when processing 'cat'?"
- Key: "I am 'cat'"
- Value: "Here's cat info"

Step 2: Calculate scores
QK^T = how much each word should attend to the others. Processing "sat"? High similarity with "cat" (cats sit) and "mat" (where sitting happens).

Step 3: Scale by √d_k
Prevents dot products from getting too large; keeps the softmax balanced.

Step 4: Softmax
Converts scores to probabilities that sum to 1:
- "cat": 0.40 (subject)
- "sat": 0.30 (action)
- "mat": 0.15 (location)
- "on": 0.10 (preposition)
- "the": 0.05 (article)

Step 5: Weight values
Multiply each word's value by its attention weight, then sum. Now "sat" knows it's most related to "cat" and "mat".

Multi-Head Magic:
Transformers do this multiple times in parallel:
- Head 1: subject-verb relationships
- Head 2: spatial ("on", "in", "under")
- Head 3: temporal ("before", "after")
- Head 4: semantic similarity
Each head learns different relationship types.

Why this changed everything:
Before: RNNs = reading with a flashlight (one word at a time, forgetting the beginning).
After: attention = floodlights on the entire sentence, with dimmer switches.

This is why ChatGPT can:
- remember 50 messages ago
- know "it" refers to something specific
- understand "bank" = money vs. river based on context

The kicker: models learn these patterns from data alone. Nobody programmed grammar rules. The model figured out language structure just by predicting next words. Attention is how AI learned to read between the lines.

Just like my therapist helped me understand my focus patterns, maybe understanding transformers helps us see how we decide what matters. Now if only I could implement multi-head attention in dating... 🤖 Still waiting for "scaled dot-product listening" to be invented.

712k
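The attention formula walked through in the tweet above can be sketched in a few lines of NumPy. This is a toy illustration, not Anshuman's code: the random matrix `X` stands in for real word embeddings, and passing `X` as Q, K, and V skips the learned W_Q/W_K/W_V projections for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how much each query attends to each key
    weights = softmax(scores, axis=-1)   # each row is a probability distribution
    return weights @ V, weights

# Five toy "tokens" with 8-dimensional random embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
out, w = attention(X, X, X)              # self-attention: Q = K = V = X
print(w.round(2))                        # 5x5 attention weights; rows sum to 1
```

Each row of `w` is one token's attention distribution over the whole sentence, which is exactly the "floodlights with dimmer switches" picture from the tweet.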


I rejected a job offer yesterday.

Not because of the salary. Not because of the tech stack. Not even because of the long hours they warned me about. But because, when I asked how they evaluate their AI systems, the hiring manager said: "We just ask it some questions and see if the answers sound right."

I stared at them for a moment and realized... they just described the biggest problem in AI today. See, "sounds right" isn't a measurement. It's a hope.

Here's what proper LLM evaluation actually looks like:
- Accuracy: Can it get factual questions right? (Not 80% of the time. Consistently.)
- Hallucination rate: How often does it make things up? (This should be near zero for critical applications.)
- Bias metrics: Does it treat all groups fairly? (Measured across demographics, not assumed.)

Real evaluation frameworks:
- BLEU scores for translation quality
- Perplexity for language modeling
- Human evaluation with inter-annotator agreement
- Adversarial testing (red teaming)
- Domain-specific benchmarks (legal, medical, financial)

The process:
> Define success criteria BEFORE deployment
> Create diverse test sets (not just happy paths)
> Measure consistently across model versions
> Track performance over time (models drift)
> Have humans validate edge cases

Why this matters:
Before proper evals: "Our model is amazing!" (based on cherry-picked examples)
After proper evals: "Our AI achieves 94.2% accuracy on domain X, with known failure modes Y and Z"

The difference? One builds trust. The other destroys it when reality hits.

The kicker: most companies are still in the "sounds right" phase. They're deploying models evaluated by vibes, not metrics.

Just like you wouldn't join a team that deploys code without tests, you shouldn't join one that deploys AI without proper evaluation.

What's your experience with LLM evaluation? Are we measuring what actually matters?

281k
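The "define success criteria before deployment" step from the tweet above can be made concrete with a tiny harness. This is a hedged sketch: `evaluate`, the lookup-table `model`, and the test cases are all hypothetical stand-ins; a real harness would call an actual LLM, use a large held-out test set, and also track hallucination and bias metrics.

```python
def evaluate(model, test_cases, accuracy_threshold=0.9):
    """Score a model against exact-match test cases and a preset pass bar."""
    correct, failures = 0, []
    for question, expected in test_cases:
        answer = model(question)
        if answer.strip().lower() == expected.strip().lower():
            correct += 1
        else:
            failures.append((question, expected, answer))
    accuracy = correct / len(test_cases)
    # Pass/fail against a criterion fixed BEFORE deployment, not "sounds right".
    return {"accuracy": accuracy,
            "passed": accuracy >= accuracy_threshold,
            "failures": failures}

# Trivial stand-in "model" backed by a lookup table.
facts = {"capital of France?": "Paris", "2 + 2?": "4"}
model = lambda q: facts.get(q, "I don't know")

report = evaluate(model, [("capital of France?", "Paris"),
                          ("2 + 2?", "4"),
                          ("boiling point of water in C?", "100")])
print(report["accuracy"], report["passed"])  # 2 of 3 correct, below the 0.9 bar
```

The point is the shape of the workflow: the threshold and test set exist before any answer is inspected, so "passed" is a measurement rather than a vibe.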

Most engaged tweets of anshuman


A girl at my gym approached me after her workout, clearly annoyed. "I've been watching and copying your entire routine for weeks, but I'm not seeing the same improvements you are!"

I explained, "You can't just mimic what I do - you need to understand which exercises deserve more focus for your specific goals."

She nodded. And then she said, "Wait, isn't that like the attention mechanism in ChatGPT?"

And I know you're sitting there like: WTF is an attention mechanism?

The attention mechanism is like that gym bro who knows exactly which exercises deserve maximum effort during each workout.

How does it work in LLMs?
- You feed a sentence with multiple words to the model
- Each word "examines" ALL other words in the sentence
- It calculates "how much attention should I pay to each word?"
- It creates weighted connections based on relevance
- Important words get higher attention scores; others get ignored

The complete math:

Step 1: Create Query, Key, and Value matrices
- Query (Q) = What am I looking for?
- Key (K) = What information is available?
- Value (V) = The actual content to extract

For each word position i:
Q_i = X_i × W_Q (input × query weight matrix)
K_i = X_i × W_K (input × key weight matrix)
V_i = X_i × W_V (input × value weight matrix)

Step 2: Calculate attention scores
Score(i,j) = Q_i × K_j^T
This tells us how much word i should pay attention to word j.

Step 3: Scale the scores
Scaled_Score = Score / √d_k
where d_k is the dimension of the key vectors (prevents exploding gradients).

Step 4: Apply softmax
Attention_Weight(i,j) = Softmax(Scaled_Score(i,j))
Softmax formula: e^(x_i) / Σ(e^(x_k)) for all k
This ensures all attention weights sum to 1.

Step 5: Weighted sum
Output_i = Σ(Attention_Weight(i,j) × V_j) for all j

Complete formula: Attention(Q,K,V) = Softmax(QK^T / √d_k)V

Sentence: "She wants to deadlift heavy weights"

Let's say we have 3-dimensional embeddings (simplified):
- She = [1, 0, 0]
- wants = [0, 1, 0]
- deadlift = [1, 1, 1]
- heavy = [0, 0, 1]
- weights = [1, 0, 1]

When processing "deadlift": Query for "deadlift" = [1, 1, 1]

Calculate dot products (attention scores):
- deadlift → She: [1,1,1] · [1,0,0] = 1
- deadlift → wants: [1,1,1] · [0,1,0] = 1
- deadlift → deadlift: [1,1,1] · [1,1,1] = 3
- deadlift → heavy: [1,1,1] · [0,0,1] = 1
- deadlift → weights: [1,1,1] · [1,0,1] = 2

Raw scores: [1, 1, 3, 1, 2]

After softmax:
- She: e^1/(e^1 + e^1 + e^3 + e^1 + e^2) ≈ 0.076
- wants: ≈ 0.076
- deadlift: e^3/(total) ≈ 0.564
- heavy: ≈ 0.076
- weights: e^2/(total) ≈ 0.207

Final attention weights: [0.076, 0.076, 0.564, 0.076, 0.207], which sum to 1.

Multi-head attention (the gym analogy):
Think of it like having multiple personal trainers, each focusing on different aspects:
- Head 1: exercise form and technique
- Head 2: muscle groups being targeted
- Head 3: safety and proper progression
Each head has its own Q, K, V matrices and calculates attention independently; the results are then concatenated.

Mathematical representation:
MultiHead(Q,K,V) = Concat(head_1, head_2, ..., head_h) × W_O
where each head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)

Why this revolutionized NLP:
> Context understanding - mathematical precision in determining word relationships
> Parallel processing - all attention scores calculated simultaneously, not sequentially
> Gradient flow - softmax ensures smooth gradients for training
> Scalability - works efficiently with sequences of any length

Final result: the attention mechanism gave AI mathematical precision in focusing on what matters - just like how you calculate exactly which muscle groups need the most work based on your goals!

37k
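The softmax arithmetic in the worked example above is easy to verify; a quick NumPy recomputation over the raw scores [1, 1, 3, 1, 2]:

```python
import numpy as np

# Attention scores for "deadlift" against each word of
# "She wants to deadlift heavy weights" (from the worked example).
words = ["She", "wants", "deadlift", "heavy", "weights"]
scores = np.array([1.0, 1.0, 3.0, 1.0, 2.0])

weights = np.exp(scores) / np.exp(scores).sum()   # softmax
for word, w in zip(words, weights):
    print(f"{word}: {w:.3f}")
# "deadlift" dominates at ~0.564, "weights" is second at ~0.207,
# and the five weights sum to exactly 1.
```

Running this confirms that exponentiating the scores and normalizing by their sum concentrates most of the attention on "deadlift", the word whose key best matches the query.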

I did it. I finally did it. I ran 10k in the PhonePe Midnight Marathon (my first ever). The road to 10k was long. I am so grateful to the following people for unlocking this version of me:
1. @Naina_2728 for showing me what is possible with running, how fun it can be, and how many new connections you can make through it.
2. @lucifer_x007 and @pathikghugare for always pushing me and staying with me for the first 2.5k.
3. @asmitaakamboj was like a guiding light, I guess; seeing her active and posting her progress daily made me believe that consistency is possible.
4. @xerefic and @theadityadas for filling me in on the information I needed to make the long run possible.
5. Mahendra for the encouragement I needed on the D-Day.
Thank you soooo much and love you guys ❤️

5k

People with Analyst archetype

The Analyst

Diversification is a myth. Zapping inefficiencies into tomorrow’s gains. None of the tweets are financial advice, DYOR!

235 following · 3k followers
The Analyst

NINETEEN CRYPTO founder. Analyzing projects / breaking down market cycles / logging trades and sentiment. Community link: t.me/+1vPCJGh5281jN… Polymarket strategy group: t.me/+WuSEEGRTbiNlM… Use Key for meme-coin trading 👀 Official app download: key.pro

146 following · 252 followers
The Analyst

Common Sense 🇨🇦 | ISTP | AI enthusiast | $TSLA Investor | ₿itcoin Preacher | Just thinking | You're right | ❤️ YNWA ❤️

168 following · 509 followers
The Analyst

A programmer, not an indie developer.

753 following · 10k followers
The Analyst

Financial freedom, individual freedom, mostly spiritual freedom.

7k following · 4k followers
The Analyst

Low-intensity work, high-intensity tweeting. Chief slacking officer. No political posting, not a fan of "little pinks". #wife @whikylucky

595 following · 337 followers
The Analyst

ASD & ADHD / foreign-language education, ASD research, photography, rhythm games, Twitter addict, "male mom", casual speech / Arcaea PTT 11.39, maimai 13,422 / CN EN JP KR / 🤍 @srkmanno @watebird14760 @wanzi0209 / sub @aoim31 / pardon the silent follow

1k following · 1k followers
The Analyst

be not afraid of greatness

2k following · 6k followers
The Analyst

Science Fiction, Fantasy, Economics, Analysis. Watch my full videos on X / Twitter under the Highlights tab. #fantasy #puns #economics Anti-Authoritarian.

27k following · 30k followers
The Analyst

Crypto curious | OG vibe @KaitoAI | @PortaltoBitcoin

790 following · 980 followers
The Analyst

fake peater + research chemical enthusiast

503 following · 1k followers
The Analyst

born lucky. committed to bits. dad wannabe

324 following · 81 followers

Explore Related Archetypes

If you enjoy the analyst profiles, you might also like these personality types:
