Get live statistics and analysis of Ross Taylor's profile on X / Twitter

Building @GenReasoning. Previously lots of other things like: reasoning lead Meta AI, Llama 3/2, Galactica, Papers with Code.

1k following · 10k followers

The Thought Leader

Ross Taylor is a pioneering force in AI research, deeply invested in advancing scientific understanding through innovative reasoning models. With a rich history at top labs such as Meta AI, he openly reflects on the trials and triumphs behind cutting-edge projects like Galactica and LLaMA. His tweets reveal a thoughtful, transparent communicator who balances technical depth with candid insights on the AI research ecosystem.

Impressions: 50.6k · 20k · $9.50
Likes: 21,493 (73%)
Retweets: 95 (3%)
Replies: 84 (3%)
Bookmarks: 6,117 (21%)

Top users who interacted with Ross Taylor over the last 14 days

@felix_drost

Armchair O7 Once an anthropologist & soldier. đŸ‡ȘđŸ‡șđŸ‡łđŸ‡±

1 interaction
@halfatheist

Truth-seeker. Can handle nuance. đŸ’č $BMNR $KDK $JOBY

1 interaction
@joythw

Building something new. GirlsWhoML co-founder @GirlsWhoML. PhD in AI @OxfordTVG. Prev Research Scientist @_FiveAI, @Meta, @SLAMcoreLtd he/him

1 interaction

Ross is the guy who spends hours debating the lifecycle of PPO rewards but might forget that most people just want to know if the robot can tell a joke without crashing. Genius-level deep dives, but sometimes you’re preaching to the choir of eight people while the rest of us just want the TLDR.

He spearheaded the Galactica project, gracefully owned up to its rocky public launch, and made sure the foundational work powered subsequent breakthroughs like LLaMA 2 - a testament to his leadership and scientific resilience.

To push the boundaries of AI reasoning and research transparency, enabling the AI community to build smarter, more reliable models through open dialogue and rigorous science.

Ross values openness in research, scientific integrity, and the importance of learning from both successes and failures. He believes in sharing knowledge freely to drive collective progress, while acknowledging the complexities and limitations inherent in cutting-edge AI development.

Ross excels at clear, nuanced communication about complex AI topics, combining deep technical expertise with a genuine openness about project challenges. His ability to critically analyze the AI landscape and articulate lessons learned makes him a trusted thought leader.

His candidness about past project missteps, while admirable, might sometimes fuel unnecessary controversy or misunderstandings among broader audiences less versed in AI nuances. Additionally, his detailed, technical style may limit broader engagement outside specialist circles.

To grow his audience on X, Ross should leverage his expertise with more accessible, bite-sized threads that distill complex ideas into engaging stories while maintaining his transparency. Collaborations with influencers or AMAs could also invite wider community interaction, broadening his reach beyond core AI experts.

Fun fact: Ross led the creation of the Galactica model with an incredibly lean team of just 8 people—far fewer than typical teams—yet still managed to outperform much larger models in its domain.

Top tweets of Ross Taylor

I am the first author of the Galactica paper and have been quiet about it for a year. Maybe I will write a blog post talking about what actually happened, but if you want the TLDR:
1. Galactica was a base model trained on scientific literature and modalities.
2. We approached it with a number of hypotheses about data quality, reasoning, scientific modalities, LLM training, that hadn’t been covered in the literature - you can read about these in the paper.
3. For its time, it was a good model for its domain; outperforming PaLM and Chinchilla with 10x and 2x less compute.
4. We did this with an 8-person team, which is an order of magnitude fewer people than other LLM teams at the time.
5. We were overstretched and lost situational awareness at launch by releasing a demo of a *base model* without checks. We were aware of what the potential criticisms would be, but we lost sight of the obvious in the workload we were under.
6. One of the considerations for a demo was we wanted to understand the distribution of scientific queries that people would use for LLMs (useful for instruction tuning and RLHF). Obviously this was a free goal we gave to journalists, who instead queried it outside its domain. But yes, we should have known better.
7. We had a “good faith” assumption that we’d share the base model, warts and all, with four disclaimers about hallucinations on the demo - so people could see what it could do (openness). Again, obviously this didn’t work.
8. A mistake on our part that didn’t help was people treated the site like a *product*. We put our vision etc. on the site, which misled about expectations. We definitely did not view it as a product! It was a base model demo.
9. Pretty much every LLM researcher I’ve talked to (including at ICML recently) was complimentary about the strength of the research, which was sadly overshadowed by the demo drama - yes, this was our fault for allowing it to happen.
10. Fortunately most of the lessons and work went into LLaMA 2; the RLHF research you see in that paper is from the Galactica team. Further research coming soon that should be interesting.
It’s a bit of a riddle, because on the one hand the demo drama could have been avoided by us, but at the same time the “fake science” fears were very ridiculous, and despite being on HuggingFace for a year, the model hasn’t caused any damage. To reiterate: the anti-Galactica commentary was really stupid, however we should not have allowed that to even happen if we had launched it better. I stick by the research completely - and even the demo decision, which was unprecedented openness for a big company with an LLM at the time, wasn’t inherently bad - but it was just misguided given the attack vectors it opened for us. Despite all the above, I would do it all again in a heartbeat. Better to do something and regret it than not do anything at all. Still hurts though! 🙂

960k

Last tweet on this but the way @deepseek_ai does launches is beautiful: no hype, arrogance or vague-posting: just sharing something great with the world. US tech companies look cringe in comparison.

45k

Most takes on RL environments are bad.
1. There are hardly any high-quality RL environments and evals available. Most agentic environments and evals are flawed when you look at the details. It’s a crisis, and no one is talking about it because they’re being hoodwinked by labs marketing their models on flawed evals.
2. Even the best public RL environments and agentic evals suck, and usually can’t be used by labs without modification. Academics often publish-and-forget instead of doing the necessary follow-up work to make the envs/evals useful for labs.
3. The best person to make an environment is someone deeply knowledgeable about a field, not a high-level generalist or newbie - 🩔 not 🩊 - but most envs are being made by generalists or low-skill contractors.
4. People are too focused on whether a problem is verifiable or not, not what kind of capabilities they want to bring into being. We don’t need more math and puzzle environments. The usefulness of an environment is proportional to its difficulty of construction.
5. Saying you want to “scale RL environments” is as meaningless as “scale is all you need”, in that it says nothing about your choice of what to scale.
6. People are treating RL environment scaling as a new type of pretraining (creating a new internet), but pretraining has extremely high diversity, and expecting a single company (or collection of companies) to replicate this diversity is unrealistic. That means generalisation will be slower to emerge than in the previous paradigm - and so there is more leverage in choosing which environments to build first.
If you’d like to help answer the right questions in this new space, join us at @GenReasoning.

110k
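Point 4 of the thread above contrasts whether a problem is "verifiable" with the capabilities an environment is meant to bring into being. For readers unfamiliar with the term, here is a minimal sketch of what a verifiable RL environment can look like, assuming a Gym-style reset/step interface; the ArithmeticEnv class and its task are hypothetical illustrations, and deliberately the kind of trivial math environment the thread argues the field already has enough of.

# Hypothetical sketch of a "verifiable" RL environment: the reward is computed by
# a programmatic checker, not a human label or an LLM judge.
# The class name, prompt format, and task are made up for illustration only.
import random

class ArithmeticEnv:
    """Toy single-turn environment: the policy answers a question, a checker scores it."""

    def reset(self, seed=None):
        rng = random.Random(seed)
        self.a, self.b = rng.randint(10, 99), rng.randint(10, 99)
        # The observation is the task prompt the policy (e.g. an LLM) must answer.
        return f"What is {self.a} * {self.b}?"

    def step(self, action: str):
        # Verifiable reward: parse the answer and compare against ground truth.
        try:
            correct = int(action.strip()) == self.a * self.b
        except ValueError:
            correct = False
        reward = 1.0 if correct else 0.0
        done = True  # single-turn task
        return None, reward, done, {"truth": self.a * self.b}

env = ArithmeticEnv()
prompt = env.reset(seed=0)
_, reward, _, info = env.step("8911")  # a model's (possibly wrong) answer
print(prompt, reward, info)

The only point of the sketch is that the reward comes from a programmatic check; the thread's argument is that building such checks for genuinely hard, domain-expert tasks is where the difficulty and the value lie.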

Why are LLMs bad at reasoning?
One theory says this is due to weaknesses in maximum likelihood, where the probability mass “overgeneralises” to low quality solutions. Because our pretraining objective (likelihood) doesn’t transfer to our evaluation objective (accuracy), the theory goes that we need to reinforce high quality solutions to fix the problem.
But this theory is probably incorrect for reasoning in academic subjects. The internet is heavily biased towards examples of correct solutions - textbooks, incentive-aligned sites like StackExchange, etc. So poor performance is unlikely to be explained by the prevalence of incorrect solutions.
Instead, the problem is that reasoning is a task which requires high precision. So it is harder to generalise from solutions of seen problems to solutions for unseen problems. And once you make a mistake, you are conditioning on an unlikely sequence of tokens (dissimilar to what appears in training), so errors compound. That means we need an order of magnitude more compute to get reasoning to the level of precision required to perform well compared to other tasks. This is also why reasoning has been the “last to scale” of the classical LLM tasks, and why MATH has been the hardest benchmark to excel at.
But high precision tasks remain: as we move into more agentic settings and the task horizon increases, LLMs will need to reason over much longer time periods - where similar problems will apply. It seems unlikely that simply more training FLOPs will solve the problem. Eventually we’ll need to lean on search again as a way to find better outputs and achieve high precision.

123k
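The "errors compound" point in the tweet above can be made concrete with a little arithmetic. If each reasoning step is correct with probability p and a solution needs k steps that must all be right, then under the simplifying assumption that steps succeed independently, the end-to-end success rate is roughly p ** k. A minimal sketch (the numbers are illustrative, not taken from any benchmark):

# Rough illustration of error compounding over a reasoning chain.
# Simplifying assumption: steps succeed independently with probability p,
# so the chance of an entirely correct k-step chain is p ** k.
def chain_success(p: float, k: int) -> float:
    return p ** k

for p in (0.90, 0.99, 0.999):
    for k in (10, 50, 200):
        print(f"per-step accuracy {p:.3f}, {k:4d} steps -> {chain_success(p, k):6.1%} end-to-end")

Even 99% per-step accuracy falls to roughly 13% over 200 steps, which is the intuition behind needing either much higher per-step precision or search over candidate outputs as task horizons grow.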

If you’re working at an AI lab in London and looking for a new gig - I’m hiring at @GenReasoning! Small, highly collaborative, kind team working on frontier research topics. No politics (imagine that), high levels of autonomy and a chance to shape entirely new capabilities for AI. We’re having a blast already but always on the lookout for good people to join the fun 😀

41k


People with the Thought Leader archetype

The Thought Leader

One flower, one world; one tree, one bodhi.

452 following · 961 followers
The Thought Leader

HE MADE HIM RIDE ON THE HIGH PLACES OF THE EARTH, THAT HE MIGHT EAT THE INCREASE OF THE FIELDS; AND HE MADE HIM TO SUCK HONEY OUT OF THE ROCK. DEUT 32:13

7k following · 8k followers
The Thought Leader
476 following · 1k followers
The Thought Leader

To the moon and never back.

2k following · 8k followers
The Thought Leader

Cryptic đŸ§˜â€â™‚ïž

1k following · 1k followers
The Thought Leader

🎉 Loves tech, making money, sharing about AI, and history 🐬 Shares money-making projects | Breaking down information silos | Side-hustle focused đŸŒ» Only money-making projects you can break down step by step are worth doing 🎀 Shares good movies and good books

54 following · 28 followers
The Thought Leader

MBBS, MS ENT | Citations First | đŸȘ™ Articles → Threads: #ENTwithPiyush | #BooksWithPiyush #AIwithPiyush Nerd. Gamer. Keeping up with the LLMs

3k following · 1k followers
The Thought Leader

Building the Purpose Economy — first Marketplace for Web3 Functionalities | Launch: 31/01/26 Discover @Web3ideation and the Web3 Innovation Lab 👇

171 following · 293 followers
The Thought Leader

Primary-market evangelist | Perpetual retail bag holder in the secondary market | BTC believer

1k following · 1k followers
The Thought Leader

✩ Meta-thinker · Life praxis · Open exploration ✩ Cultivating through growth · Awakening through philosophy · Becoming myself in human-AI symbiosis · MetaThinker · Life Praxis · Exploration ✩ Practice · Awareness · Becoming with AI

423 following · 175 followers
The Thought Leader

Digital Content Creator đŸ’»đŸŽ„

4k following · 5k followers
The Thought Leader

oog watch. oog learn. oog share. 👁👁

245 following · 2k followers

