Get live statistics and analysis of Andrej Karpathy's profile on X / Twitter

Andrej Karpathy @karpathy

Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets 🧠🤖💥

946 following · 1M followers

Archetype analysis

The Thought Leader

Andrej Karpathy is a trailblazer at the intersection of AI and education, passionately shaping how we learn through innovative technologies. With a history at Tesla and OpenAI, this cerebral creator isn't just sharing insights; he's building transformative solutions that empower future learners. He’s not just serving thoughts; he’s dishing out the future!

Recent engagement

Impressions: 23.7M (-99.1k)
Estimated earnings: $4,450.88
Likes: 152.9k (-3.2k), 66% of engagements
Retweets: 12.1k (-579)
Replies: 5.9k (-320)
Bookmarks: 59.4k (-4.8k), 26% of engagements


🔥 Roast

Andrej's so ahead of the curve, he's probably already holding office hours in the Andromeda galaxy—don't forget to bring your own wormhole!

⚡️ Nice achievement

His biggest win is launching Eureka Labs, which aims to redefine the way we approach education and AI and to make even the most complex subjects easily digestible for everyone.

🌟 Life's purpose

To democratize education using AI, making it accessible and engaging for anyone eager to learn, while enhancing human potential through technology.

💬 Values and Beliefs

Andrej believes that education should be a collective journey enhanced by powerful tools, and that technology can integrate into our learning processes to foster creativity and knowledge more effectively.

💪 Strength

Andrej's strengths lie in his exceptional ability to connect AI innovations with educational needs, his extensive background in technology, and his vision for scalable solutions to learning challenges.

🫣 Weakness

A potential weakness could be that his visionary ideas sometimes appear overly ambitious, which may raise skepticism among traditional educators and learners.

⚡️ Growth audience tips

To grow his audience on X, Andrej should consider sharing behind-the-scenes insights and personal anecdotes about his journey in AI and education, paired with engaging visuals or videos that bring complex concepts to life.

💁 Bonus

Fun fact: Andrej’s first product at Eureka Labs is meant to be the world’s best AI course, LLM101n, which sounds like the training ground for future AI overlords!

Andrej Karpathy @karpathy · Jan 24, 2023

The hottest new programming language is English

Andrej Karpathy @karpathy · Jul 16, 2024

⚡️ Excited to share that I am starting an AI+Education company called Eureka Labs. The announcement:

---
We are Eureka Labs and we are building a new kind of school that is AI native.

How can we approach an ideal experience for learning something new? For example, in the case of physics one could imagine working through very high quality course materials together with Feynman, who is there to guide you every step of the way. Unfortunately, subject matter experts who are deeply passionate, great at teaching, infinitely patient and fluent in all of the world's languages are also very scarce and cannot personally tutor all 8 billion of us on demand.

However, with recent progress in generative AI, this learning experience feels tractable. The teacher still designs the course materials, but they are supported, leveraged and scaled with an AI Teaching Assistant who is optimized to help guide the students through them. This Teacher + AI symbiosis could run an entire curriculum of courses on a common platform. If we are successful, it will be easy for anyone to learn anything, expanding education in both reach (a large number of people learning something) and extent (any one person learning a large amount of subjects, beyond what may be possible today unassisted).

Our first product will be the world's obviously best AI course, LLM101n. This is an undergraduate-level class that guides the student through training their own AI, very similar to a smaller version of the AI Teaching Assistant itself. The course materials will be available online, but we also plan to run both digital and physical cohorts of people going through it together.

Today, we are heads down building LLM101n, but we look forward to a future where AI is a key technology for increasing human potential. What would you like to learn?
---

@EurekaLabsAI is the culmination of my passion in both AI and education over ~2 decades. My interest in education took me from YouTube tutorials on Rubik's cubes to starting CS231n at Stanford, to my more recent Zero-to-Hero AI series. While my work in AI took me from academic research at Stanford to real-world products at Tesla and AGI research at OpenAI. All of my work combining the two so far has only been part-time, as side quests to my "real job", so I am quite excited to dive in and build something great, professionally and full-time. It's still early days but I wanted to announce the company so that I can build publicly instead of keeping a secret that isn't. Outbound links with a bit more info in the reply!

Andrej Karpathy @karpathy · Feb 09, 2023

Some personal news: I am joining OpenAI (again :)). Like many others both in/out of AI, I am very inspired by the impact of their work and I have personally benefited greatly from it. The future potential is especially exciting; it is a great pleasure to jump back in and build!🪄

Andrej Karpathy @karpathy · Mar 27, 2022

TikTok is scary good. It's digital crack. First time I feel attacked by AI in the brain.

Andrej Karpathy @karpathy · Dec 03, 2022

Plan is to throw a party in the Andromeda galaxy 1B years from now. Everyone welcome, except for those who litter

Andrej Karpathy @karpathy · Jul 13, 2022

It’s been a great pleasure to help Tesla towards its goals over the last 5 years and a difficult decision to part ways. In that time, Autopilot graduated from lane keeping to city streets and I look forward to seeing the exceptionally strong Autopilot team continue that momentum.

Andrej Karpathy @karpathy · Feb 14, 2024

Hi everyone yes, I left OpenAI yesterday. First of all nothing "happened" and it’s not a result of any particular event, issue or drama (but please keep the conspiracy theories coming as they are highly entertaining :)). Actually, being at OpenAI over the last ~year has been really great - the team is really strong, the people are wonderful, and the roadmap is very exciting, and I think we all have a lot to look forward to. My immediate plan is to work on my personal projects and see what happens. Those of you who’ve followed me for a while may have a sense for what that might look like ;) Cheers

Andrej Karpathy @karpathy · Jan 17, 2023

🔥 New (1h56m) video lecture: "Let's build GPT: from scratch, in code, spelled out." youtube.com/watch?v=kCc8Fm… We build and train a Transformer following the "Attention Is All You Need" paper in the language modeling setting and end up with the core of nanoGPT.

Andrej Karpathy @karpathy · Aug 24, 2024

Programming is changing so fast... I'm trying VS Code Cursor + Sonnet 3.5 instead of GitHub Copilot again and I think it's now a net win. Just empirically, over the last few days most of my "programming" is now writing English (prompting and then reviewing and editing the generated diffs), and doing a bit of "half-coding" where you write the first chunk of the code you'd like, maybe comment it a bit so the LLM knows what the plan is, and then tab tab tab through completions. Sometimes you get a 100-line diff to your code that nails it, which could have taken 10+ minutes before. I still don't think I got sufficiently used to all the features. It's a bit like learning to code all over again but I basically can't imagine going back to "unassisted" coding at this point, which was the only possibility just ~3 years ago.

Andrej Karpathy @karpathy · Nov 23, 2023

New YouTube video: 1hr general-audience introduction to Large Language Models youtube.com/watch?v=zjkBMF… Based on a 30min talk I gave recently; it tries to be a non-technical intro and covers mental models for LLM inference, training, finetuning, the emerging LLM OS and LLM Security.

Andrej Karpathy @karpathy · Oct 10, 2024

The YouTube video I want to watch is any highly rated, 1hr long, information dense lecture on anything esoteric and the algorithm just doesn’t get it. It’s too content-driven and too narrow-minded

Andrej Karpathy @karpathy · Jun 09, 2024

📽️ New 4 hour (lol) video lecture on YouTube: "Let’s reproduce GPT-2 (124M)" youtu.be/l8pRSuU81PU

The video ended up so long because it is... comprehensive: we start with an empty file and end up with a GPT-2 (124M) model:
- first we build the GPT-2 network
- then we optimize it to train very fast
- then we set up the training run optimization and hyperparameters by referencing GPT-2 and GPT-3 papers
- then we bring up model evaluation, and
- then cross our fingers and go to sleep.

In the morning we look through the results and enjoy amusing model generations. Our "overnight" run even gets very close to the GPT-3 (124M) model. This video builds on the Zero To Hero series and at times references previous videos. You could also see this video as building my nanoGPT repo, which by the end is about 90% similar.

Github. The associated GitHub repo contains the full commit history so you can step through all of the code changes in the video, step by step. github.com/karpathy/build…

Chapters. At a high level, Section 1 is building up the network (a lot of this might be review), Section 2 is making the training fast, Section 3 is setting up the run, and Section 4 is the results. In more detail:
00:00:00 intro: Let’s reproduce GPT-2 (124M)
00:03:39 exploring the GPT-2 (124M) OpenAI checkpoint
00:13:47 SECTION 1: implementing the GPT-2 nn.Module
00:28:08 loading the huggingface/GPT-2 parameters
00:31:00 implementing the forward pass to get logits
00:33:31 sampling init, prefix tokens, tokenization
00:37:02 sampling loop
00:41:47 sample, auto-detect the device
00:45:50 let’s train: data batches (B,T) → logits (B,T,C)
00:52:53 cross entropy loss
00:56:42 optimization loop: overfit a single batch
01:02:00 data loader lite
01:06:14 parameter sharing wte and lm_head
01:13:47 model initialization: std 0.02, residual init
01:22:18 SECTION 2: Let’s make it fast. GPUs, mixed precision, 1000ms
01:28:14 Tensor Cores, timing the code, TF32 precision, 333ms
01:39:38 float16, gradient scalers, bfloat16, 300ms
01:48:15 torch.compile, Python overhead, kernel fusion, 130ms
02:00:18 flash attention, 96ms
02:06:54 nice/ugly numbers. vocab size 50257 → 50304, 93ms
02:14:55 SECTION 3: hyperparameters, AdamW, gradient clipping
02:21:06 learning rate scheduler: warmup + cosine decay
02:26:21 batch size schedule, weight decay, FusedAdamW, 90ms
02:34:09 gradient accumulation
02:46:52 distributed data parallel (DDP)
03:10:21 datasets used in GPT-2, GPT-3, FineWeb (EDU)
03:23:10 validation data split, validation loss, sampling revive
03:28:23 evaluation: HellaSwag, starting the run
03:43:05 SECTION 4: results in the morning! GPT-2, GPT-3 repro
03:56:21 shoutout to llm.c, equivalent but faster code in raw C/CUDA
03:59:39 summary, phew, build-nanogpt github repo
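The chapter list only names the "nice/ugly numbers" trick. As a small worked example (an illustration, not code from the video or repo): 50257 is odd, so it has no factor of two at all, while 50304 is exactly what you get by rounding 50257 up to the next multiple of 128 = 2**7. The helper name `round_up_to_multiple` and the choice of 128 are assumptions made for this sketch; the usual motivation is that GPU kernels are often tiled in power-of-two block sizes, so dimensions with many factors of two avoid awkward leftover work.

```python
def round_up_to_multiple(n: int, multiple: int) -> int:
    """Smallest integer >= n that is divisible by `multiple` (hypothetical helper)."""
    return ((n + multiple - 1) // multiple) * multiple


GPT2_VOCAB = 50257  # vocab size quoted in the chapter title

if __name__ == "__main__":
    # Assumption: "nice" here means divisible by 128 = 2**7.
    padded = round_up_to_multiple(GPT2_VOCAB, 128)
    print(padded, "extra rows:", padded - GPT2_VOCAB)

    # Factor out powers of two to show how "nice" the padded size is.
    n, twos = padded, 0
    while n % 2 == 0:
        n //= 2
        twos += 1
    print(f"{padded} = 2**{twos} * {n}")
```

The padding only adds a few dozen extra rows to the embedding and output layers; tokens with those ids never appear in the data, so the model simply learns to assign them low probability.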

Andrej Karpathy @karpathy · Feb 10, 2024

# on shortification of "learning"

There are a lot of videos on YouTube/TikTok etc. that give the appearance of education, but if you look closely they are really just entertainment. This is very convenient for everyone involved: the people watching enjoy thinking they are

Andrej Karpathy @karpathy · May 04, 2021

WSJ front page every day is like >>> "Stock Market %s!!" % ('rises' if random.random() <= 0.54 else 'falls', )
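The tweet's one-liner is written for a Python REPL (`>>>`) and assumes `random` has already been imported. Below is a minimal self-contained version, wrapped in a hypothetical `wsj_headline` function with a quick frequency check added for illustration. Because `random.random()` returns a float in [0.0, 1.0), the condition `<= 0.54` holds about 54% of the time, so the "news" is a biased coin flip.

```python
import random


def wsj_headline(p_rise: float = 0.54) -> str:
    # Same expression as the tweet: "rises" with probability p_rise, otherwise "falls".
    return "Stock Market %s!!" % ('rises' if random.random() <= p_rise else 'falls', )


if __name__ == "__main__":
    random.seed(0)  # fixed seed so the run is reproducible
    headlines = [wsj_headline() for _ in range(10_000)]
    rise_rate = sum(h == "Stock Market rises!!" for h in headlines) / len(headlines)
    print(headlines[0])
    print(f"fraction of 'rises' headlines: {rise_rate:.3f}")  # should land near 0.54
```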

Andrej Karpathy @karpathy · Jun 21, 2024

These 94 lines of code are everything that is needed to train a neural network. Everything else is just efficiency.

This is my earlier project Micrograd. It implements a scalar-valued auto-grad engine. You start with some numbers at the leaves (usually the input data and the neural network parameters), build up a computational graph with operations like + and * that mix them, and the graph ends with a single value at the very end (the loss). You then go backwards through the graph applying the chain rule at each node to calculate the gradients. The gradients tell you how to nudge your parameters to decrease the loss (and hence improve your network).

Sometimes when things get too complicated, I come back to this code and just breathe a little. But ok ok you also do have to know what the computational graph should be (e.g. MLP -> Transformer), what the loss function should be (e.g. autoregressive/diffusion), how to best use the gradients for a parameter update (e.g. SGD -> AdamW) etc etc. But it is the core of what is mostly happening.

The 1986 paper from Rumelhart, Hinton, Williams that popularized and used this algorithm (backpropagation) for training neural nets: cs.toronto.edu/~hinton/absps/…
micrograd on Github: github.com/karpathy/micro…
and my (now somewhat old) YouTube video where I very slowly build and explain: youtube.com/watch?v=VMj-3S…
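The tweet describes the whole mechanism: a graph of scalar operations built on the forward pass, then the chain rule applied node by node in reverse. A minimal sketch of that idea follows, written from the description rather than copied from micrograd: it supports only `+`, `*` and `tanh`, and the class and method names are chosen for illustration. Each operation records a small `_backward` closure holding its local derivative, and `backward()` visits nodes in reverse topological order so a node's gradient is complete before it is pushed to its inputs.

```python
import math


class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""

    def __init__(self, data, _children=(), _op=""):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None  # pushes this node's grad into its children
        self._prev = set(_children)
        self._op = _op

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), "+")

        def _backward():
            # d(a+b)/da = 1 and d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), "*")

        def _backward():
            # d(a*b)/da = b and d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,), "tanh")

        def _backward():
            # d tanh(x)/dx = 1 - tanh(x)**2
            self.grad += (1 - t * t) * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topological order: every node appears after all nodes it depends on.
        topo, visited = [], set()

        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)

        build(self)
        self.grad = 1.0  # d(output)/d(output)
        for v in reversed(topo):
            v._backward()


if __name__ == "__main__":
    a, b = Value(2.0), Value(-3.0)
    c = a * b + a      # c = a*b + a
    c.backward()
    print(c.data, a.grad, b.grad)  # -4.0 -2.0 2.0  (dc/da = b + 1, dc/db = a)
```

Gradients accumulate with `+=` because a value that feeds several operations (like `a` in the demo) receives a contribution from each path. A plain gradient-descent update would then nudge each leaf against its gradient, e.g. `a.data -= 0.01 * a.grad`.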

Andrej Karpathy @karpathy · Dec 09, 2023

# On the "hallucination problem"

I always struggle a bit when I'm asked about the "hallucination problem" in LLMs. Because, in some sense, hallucination is all LLMs do. They are dream machines.

We direct their dreams with prompts. The prompts start the dream, and based on the LLM's hazy recollection of its training documents, most of the time the result goes someplace useful. It's only when the dreams go into territory deemed factually incorrect that we label it a "hallucination". It looks like a bug, but it's just the LLM doing what it always does.

At the other end of the extreme, consider a search engine. It takes the prompt and just returns one of the most similar "training documents" it has in its database, verbatim. You could say that this search engine has a "creativity problem": it will never respond with something new. An LLM is 100% dreaming and has the hallucination problem. A search engine is 0% dreaming and has the creativity problem.

All that said, I realize that what people *actually* mean is they don't want an LLM Assistant (a product like ChatGPT etc.) to hallucinate. An LLM Assistant is a much more complex system than just the LLM itself, even if one is at the heart of it. There are many ways to mitigate hallucinations in these systems: using Retrieval Augmented Generation (RAG) to more strongly anchor the dreams in real data through in-context learning is maybe the most common one. Disagreements between multiple samples, reflection, verification chains. Decoding uncertainty from activations. Tool use. All active and very interesting areas of research.

TLDR I know I'm being super pedantic but the LLM has no "hallucination problem". Hallucination is not a bug, it is LLM's greatest feature. The LLM Assistant has a hallucination problem, and we should fix it.

Okay I feel much better now :)
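Of the mitigations listed, "disagreements between multiple samples" is the simplest to sketch. The snippet below assumes you already have several independently sampled answers to the same prompt as strings (the model call itself is out of scope); it compares them by exact match after trimming and lowercasing, and the 0.6 agreement threshold is an arbitrary placeholder. The function names are hypothetical, and a real LLM Assistant would need semantic comparison of answers rather than string equality.

```python
from collections import Counter


def sample_agreement(samples):
    """Return (most common answer, fraction of samples that agree with it).

    `samples` stands in for several independent completions of the same prompt.
    """
    if not samples:
        raise ValueError("need at least one sample")
    normalized = [s.strip().lower() for s in samples]
    answer, count = Counter(normalized).most_common(1)[0]
    return answer, count / len(normalized)


def flag_possible_hallucination(samples, threshold=0.6):
    """Treat low agreement between samples as a hint that the model is guessing."""
    _, agreement = sample_agreement(samples)
    return agreement < threshold


if __name__ == "__main__":
    consistent = ["Paris", "paris", "Paris ", "Lyon"]
    scattered = ["1912", "1915", "1909"]
    print(sample_agreement(consistent), flag_possible_hallucination(consistent))
    print(sample_agreement(scattered), flag_possible_hallucination(scattered))
```

The idea is that when the "dream" is anchored in something the model reliably knows, repeated samples tend to converge; when they scatter, the assistant can fall back to retrieval, a tool call, or an explicit "I'm not sure".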

Most engaged tweets of Andrej Karpathy

Andrej Karpathy @karpathy · Oct 16, 2022

Movies that I've seen 5+ times but ready & willing to keep watching: Interstellar, Gladiator, Contact, Good Will Hunting, The Matrix, LotR 1/2/3, HP 1, Avatar, The Fifth Element, The Independence Day, Rush Hour, Armageddon, Stargate, Anchorman, Mean Girls, Terminator 2, more=? :)

Andrej Karpathy @karpathy · Feb 14, 2024

Andrej Karpathy @karpathy · Jul 16, 2024

Andrej Karpathy @karpathy · Aug 27, 2024

I feel like a large amount of GDP is locked up because it is difficult for person A to very conveniently pay 5 cents to person B. Current high fixed costs per transaction force each of them to be of high enough amounts, which results in business models with purchase bundles, subscriptions, ad-based, etc., instead of simply pay-as-you-go. As an example, I'd like my computer to auto-pay 5 cents to the article/blog that I just read but I can't, and I think we're worse for it. In a capitalist system, transactions between entities are the gradient signal of the economy. Because our pipes don't support low magnitude terms in the sums, the gradients are not flowing properly through the system. I'm not familiar enough with payments to have an idea of specific solutions, but I expect we'd see a lot of positive 2nd / 3rd order effects if the gradients were allowed to flow properly, frictionlessly and with much higher resolution.
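The "high fixed costs per transaction" point is easy to quantify. The sketch below assumes a hypothetical fee schedule of $0.30 fixed plus 2.9% of the amount per transaction; these figures are illustrative placeholders, not a claim about any particular payment network. Under that assumption a 5-cent payment costs more in fees than it transfers, while a $10 bundle pays roughly 6%, which is the pressure toward bundles and subscriptions the tweet describes. `Decimal` is used so the money arithmetic is exact.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical fee schedule: a fixed charge per transaction plus a percentage.
# Illustrative assumptions only, not any specific provider's pricing.
FIXED_FEE = Decimal("0.30")
PERCENT_FEE = Decimal("0.029")


def fee(amount: Decimal) -> Decimal:
    """Total processing fee for one transaction of `amount` dollars, rounded to cents."""
    return (FIXED_FEE + PERCENT_FEE * amount).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def fee_ratio(amount: Decimal) -> Decimal:
    """Fee as a fraction of the amount being paid."""
    return fee(amount) / amount


if __name__ == "__main__":
    nickel = Decimal("0.05")  # the 5-cent pay-per-article example
    bundle = Decimal("10.00")  # the same money collected as one subscription
    print("5 cents:", fee(nickel), "fee, ratio", fee_ratio(nickel))
    print("$10 bundle:", fee(bundle), "fee, ratio", fee_ratio(bundle))
    # 200 separate 5-cent payments versus one $10 payment for the same total.
    print("200 nickels:", 200 * fee(nickel), "vs bundle:", fee(bundle))
```

With a fixed component in the fee, the ratio blows up as the amount shrinks, so the "low magnitude terms in the sums" are exactly the ones the pipes cannot carry.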

Andrej Karpathy @karpathy · Oct 18, 2024

What is the name for the paranoid feeling that what you just read was LLM generated


Andrej Karpathy @karpathy · Jul 13, 2022

Andrej Karpathy @karpathy · Jan 24, 2023

The hottest new programming language is English

Andrej Karpathy @karpathy · Feb 09, 2023

Andrej Karpathy @karpathy · Dec 03, 2022

Plan is to throw a party in the Andromeda galaxy 1B years from now. Everyone welcome, except for those who litter

Andrej Karpathy @karpathy · Nov 21, 2023

Thinking a lot about centralization and decentralization these few days.

Andrej Karpathy @karpathy · Mar 09, 2024

Reading a tweet is a bit like downloading an (attacker-controlled) executable that you instantly run on your brain. Each one elicits emotions, suggests knowledge, nudges world-view. In the future it might feel surprising that we allowed direct, untrusted information to brain.