Get live statistics and analysis of Arthur's profile on X / Twitter

The Innovator

Arthur is a tech enthusiast and AI aficionado who thrives on pushing the boundaries of what's possible with cutting-edge technology. With a background spanning prestigious institutions and industry giants, he combines deep technical insight with a playfully irreverent tone. His tweets reveal an adventurous mind constantly experimenting and sharing breakthroughs in AI.

Impressions: 60.8k · 7.8k · $11.39
Likes: 241 (75%)
Retweets: 0 (0%)
Replies: 5 (16%)
Bookmarks: 3 (9%)

Arthur tweets so much AI jargon that even his coffee needs a neural network to keep up. With 10k tweets, he's basically a human firewall with a glitchy sense of humor, bombing your feed with brainy spam you never asked for but secretly love to decode.

Arthur has successfully built a robust and consistent presence in the AI community on X, frequently reaching tens of thousands of views per tweet with highly technical content that both educates and entertains.

Arthur's life purpose is to innovate and disrupt the AI landscape by blending deep technical mastery with creative experimentation, ultimately advancing the frontier of machine intelligence and making complex ideas accessible and engaging to the tech community.

Arthur believes in the power of technology to transform the world, values intellectual curiosity, and embraces a culture of continuous learning and experimentation. He holds a lighthearted skepticism about traditional norms and enjoys taking risks in pursuit of groundbreaking ideas.

Arthur’s strengths lie in his deep technical expertise, willingness to experiment, and ability to communicate complex AI concepts in an engaging and sometimes humorous manner. His prolific tweeting (over 10,000 tweets) demonstrates commitment and consistency in content creation.

While Arthur is technically brilliant, his high tweet volume and frequent technical jargon may sometimes overwhelm or alienate casual followers. His high following count, paired with a follower count that isn't reported here, may suggest he is more focused on consuming content than on building a reciprocal community.

To grow his audience on X, Arthur should consider crafting more accessible threads that break down his complex innovations for a broader audience, while engaging more with replies to build stronger connections. Leveraging visuals and demos to complement his technical tweets could also boost engagement and follower growth.

Fun fact: Arthur refers to Llama 3.1b as 'crash landed alien technology,' showing his enthusiasm for breakthrough AI models and his playful way of expressing awe at technological advancement.

Top tweets of Arthur

gpt-4o can #worldsim! cc @repligate @yoheinakajima @erhartford @anshumanmishra plugging into the latent space is left as an exercise to the reader

423

#UFO seeing a strange rectangle UFO, can anyone look up and confirm

70

Most engaged tweets of Arthur

#ios #swift how can a backend/ full stack web dev get up to speed with developing a swift app? I have a swift library I have locally but like what is xcode.buildj Target is mostly just iOS for now

110

hi @PortkeyAI you guys have a typo on this page: portkey.ai/docs/welcome/i…

76

gpt-4o can #worldsim! cc @repligate @yoheinakajima @erhartford @anshumanmishra plugging into the latent space is left as an exercise to the reader

423

Experiment Plan for Self-Play Deep RL

Objective: Test out key ideas from latest self-play deep RL research papers to advance understanding and capabilities.

Key experiments to run:
1. Implement self-play with vision transformers as in [PAPER 1]
   - Compare performance to standard CNN architectures
   - Analyze learned representations
2. Test data augmentation techniques from [PAPER 2] to improve stability of deep Q-learning with self-play
   - Evaluate on standard continuous control benchmarks
3. Set up self-play environment for [GAME/DOMAIN] and reproduce baselines from [PAPER 3]
   - Iteratively improve on baselines through architecture search and hyperparameter tuning

Required components:
- RL training environment
- Self-play training loop
- Evaluation framework and metrics
- Experiment tracking

Timeline:
- Week 1: Set up environment and implement baselines
- Weeks 2-4: Run experiments 1 and 2
- Weeks 5-6: Set up and baseline for experiment 3
- Weeks 7-8: Iterative improvements for experiment 3
- Week 9: Final evaluation and write up results

Additional Learnings & Updates:
- Surveyed latest research on unsupervised RL, which provides key ideas around intrinsic motivation, exploration, and representation learning to integrate into self-play experiments
  - Key papers: arxiv.org/abs/2110.15191, mdpi.com/1996-1073/16/3…
- Best practices for experiment setup and evaluation:
  - Average results over multiple runs to account for variability
  - Use standard environments (e.g. Atari, MuJoCo) and metrics where possible for reproducibility
  - Track and manage experiments carefully, with detailed logging
- Look into parallelizing across multiple CPUs/GPUs and leveraging cloud compute to speed up training
  - Will require refactoring codebase for distributed training
  - Potential to shorten 9 week timeline to 6-7 weeks based on these optimizations

Next Steps:
- Implement unsupervised RL techniques into self-play training pipeline
- Refactor code for parallelization and cloud deployment
- Begin running initial experiments on standard benchmarks

# Self-Play Deep RL for Language Models Experiment Plan

## Objective
The goal of this experiment is to advance the state-of-the-art in unsupervised reinforcement learning and self-play techniques for training large language models to develop open-ended reasoning, planning, and interaction capabilities.

## Key Ideas
- Use intrinsic rewards based on novelty, entropy, and state visitation counts to drive exploration
- Leverage contrastive representation learning to learn useful embeddings of states and actions
- Employ hierarchical RL to learn high-level skills and subgoals
- Augment training data through self-supervised techniques like masked language modeling

## Experiment Setup
- Environment: Text-based interactive fiction games
- RL Algorithm: Proximal Policy Optimization (PPO) with intrinsic rewards
- Language Model: GPT-3 175B parameter model
- Evaluation: Reward, task completion rate, sample efficiency, zero-shot transfer

## Parallelization
- Distribute rollouts across 128 actors while training centralized critic
- Exploit model parallelism in GPT-3 architecture
- Utilize 8-GPU clusters with model sharding

## Timeline
- Week 1: Implement intrinsic rewards and parallelize rollout collection
- Week 2: Integrate contrastive representation learning and tune hyperparameters
- Week 3: Train hierarchical policies and run benchmark experiments
- Week 4: Evaluate zero-shot transfer, analyze results, write paper

## Reproducibility
- Open-source code and models
- Provide detailed instructions for running experiments
- Share anonymized logs of training runs

Experiment Plan for Self-Play Deep RL for Language Agents
1. Set up environment
   - Choose deep RL algorithm (e.g. PPO, A2C)
   - Implement environment for text-based game/task
   - Implement self-play training loop
2. Hyperparameter tuning
   - Experiment with different model architectures
   - Tune learning rate, entropy coeff, gamma, etc.
3. Curriculum learning
   - Start with simple tasks and scale up difficulty
   - Use learned policies to bootstrap harder tasks
4. Evaluation
   - Evaluate self-play trained policies in held-out environments
   - Measure in-domain and out-of-domain generalization
   - Analyze language generated for coherence, factual accuracy
5. Iterate and scale
   - Incorporate latest techniques from literature review
   - Scale up to more complex language tasks
   - Open-source reusable components

# Self-Play Deep RL for Language Agents Experiment Plan

## Objective
Advance state-of-the-art in unsupervised reinforcement learning and self-play techniques for training large language models to develop open-ended reasoning, planning, and interaction capabilities.

## Key Techniques to Implement
1. Unsupervised exploration with intrinsic rewards
   - Novelty, entropy, state visitation counts
   - Enables discovering useful behaviors without explicit rewards
2. Contrastive representation learning
   - Learn meaningful embeddings of states and actions
   - Provides auxiliary self-supervised objective
3. Hierarchical reinforcement learning
   - Learn high-level skills and subgoals
   - Enables more efficient learning of complex tasks

## Experiment Setup
- Environment: Text-based interactive fiction games
- RL Algorithm: Proximal Policy Optimization (PPO) with intrinsic rewards
- Language Model: GPT-3 175B parameter model
- Evaluation:
  - Reward, task completion rate, sample efficiency
  - Zero-shot transfer to new environments
  - Language quality, coherence, factual accuracy

## Parallelization & Scaling
- Distribute rollouts across 128 actors with centralized critic
- Exploit model parallelism in GPT-3 with 8-GPU model sharding
- Refactor for distributed training on cloud compute

## Timeline
- Week 1: Implement intrinsic rewards and parallelize rollouts
- Week 2: Integrate contrastive learning and tune hyperparameters
- Week 3: Train hierarchical policies and run benchmarks
- Week 4: Evaluate transfer, analyze results, write paper

## Reproducibility
- Open-source code and models
- Provide detailed experiment instructions
- Share anonymized training logs

Self-Play Deep RL Experiment Plan

Objective: Set up environment to implement and test self-play deep RL algorithms for language models

Key components:
- RL environment
- Language model architecture
- Self-play training loop
- Evaluation metrics

Infrastructure:
- Cloud compute
- GPUs
- Experiment tracking

Timeline:
- Week 1: Finalize design and spec out components
- Week 2: Implement environment and models
- Week 3: Debug and test
- Week 4: Run initial experiments

Self-Play Deep RL Experiment Plan

Objective: Investigate emergent communication and open-ended learning in multi-agent self-play

Key Questions:
- What communication protocols emerge in different environments and tasks?
- How does open-endedness impact the complexity of emergent behaviors?
- How do emergent behaviors scale with model size and compute?

Experiment Setup:
- Environment: Start with simple gridworlds and progress to complex 3D worlds
- Tasks: Cooperative (maximize shared reward) and competitive (zero-sum game)
- Agents: Scale from small MLP policies to large transformers
- Baselines: Self-play with no communication, hand-designed protocols, single-agent RL
- Evaluation: Quantify emergent communication efficiency, zero-shot generalization to new tasks, qualitative analysis of behaviors

Implementation:
1. Design and implement environments and tasks in [framework]
2. Define RL algorithm (e.g. PPO) and self-play training loop
3. Implement agent architectures (MLP, LSTM, transformer) with communication channels
4. Add logging and visualization of emergent behaviors
5. Run experiments at scale, varying environment, task, agent parameters
6. Analyze results, compare to baselines, update hypotheses

Timeline:
- Week 1-2: Design and implement environments, tasks, agents
- Week 3-4: Debug training pipeline, run small-scale tests
- Week 5-8: Run large-scale experiments, analyze results
- Week 9-10: Summarize findings, write up paper

Self-Play Deep RL Experiment Plan

Objective: Investigate emergent communication and open-ended learning in multi-agent self-play

Key Questions:
- What communication protocols emerge in different environments and tasks?
- How does open-endedness impact the complexity of emergent behaviors?
- How do emergent behaviors scale with model size and compute?

Experiment Setup:
- Environment: Start with simple gridworlds and progress to complex 3D worlds
- Tasks: Cooperative (maximize shared reward) and competitive (zero-sum game)
- Agents: Scale from small MLP policies to large transformers
- Communication: Discrete symbols and continuous vectors
- Algorithms: PPO, Q-learning, evolutionary methods
- Baselines: Self-play with no communication, hand-designed protocols, single-agent RL
- Evaluation: Quantify emergent communication efficiency, zero-shot generalization to new tasks, human interaction, qualitative analysis of behaviors

Implementation:
1. Design and implement environments and tasks in PettingZoo
2. Define RL algorithms (PPO, RLlib) and self-play training loop
3. Implement agent architectures (MLP, LSTM, transformer) with communication channels
4. Add logging (TensorBoard) and visualization of emergent behaviors
5. Run experiments at scale, varying environment, task, agent parameters
6. Analyze results, compare to baselines, update hypotheses

Experiment Checklist:
- [ ] Ablations: remove communication, reward terms, etc.
- [ ] Transfer learning tests on new environments
- [ ] Human interaction and interpretability tests
- [ ] Scale up to large models and complex environments
- [ ] Population-based training with multiple agent designs
- [ ] Combine with unsupervised pre-training and planning

Timeline:
- Week 1-2: Design and implement environments, tasks, agents
- Week 3-4: Debug training pipeline, run small-scale tests
- Week 5-8: Run large-scale experiments, analyze results
- Week 9-10: Summarize findings, write up paper

Self-Play Deep RL for Language Models Experiment Plan

Objective: Advance the strategic reasoning and social interaction capabilities of language models through self-play deep RL

Key Experiments:
1. Benchmark different MARL algorithms on strategic language tasks
   - Test common algorithms like independent PPO, MADDPG, QMIX
   - Compare emergent behaviors and performance
   - Analyze sensitivity to hyperparameters like batch size, learning rate
2. Techniques to improve self-play training stability and diversity
   - Population-based training with diverse agent policies
   - Evolutionary selection methods to promote useful diversity
   - Opponent modeling to adapt to evolving policies
3. Efficient RL techniques for improved sample efficiency and scale
   - Off-policy RL algorithms like DDPG, SAC for reusing data
   - Model-based RL to learn environment dynamics
   - Parallelization and GPU acceleration
4. Reward modeling to align learned behaviors with human prefs
   - Collect human feedback on model-generated behaviors
   - Train reward model to generalize feedback
   - Optimize agent policies to maximize modeled reward
5. Language tasks to test strategic reasoning and interaction
   - Collaborative reference games requiring coordination
   - Competitive negotiation and deception games
   - Open-ended dialogue with implicit goals to achieve
6. Ablations to understand impact of different components
   - Language model size and pre-training
   - RL algorithm and training setup
   - Environment and task complexity
7. Generalization tests to new environments and language tasks
   - Zero/few-shot transfer of learned strategic behaviors
   - Robustness to perturbations in goals, dynamics, partners
8. Human evaluation of model behaviors
   - Elo ratings from playing vs humans
   - Likert ratings of coherence, complexity, human-likeness
   - Free-form feedback and error analysis

Through this suite of experiments, we can systematically advance language model capabilities while generating valuable empirical insights to guide further research. Key priorities are 1) developing reliable self-play training pipelines and 2) designing language tasks that capture core strategic reasoning skills.

# Self-Play Deep RL Experiment Plan

## Objective
Investigate emergent communication and open-ended learning in multi-agent self-play to advance language model reasoning and interaction capabilities.

## Key Questions
- What communication protocols emerge in different environments and tasks?
- How does open-endedness impact the complexity of emergent behaviors?
- How do emergent behaviors scale with model size and compute?

## Experiment Setup
- Environments:
  - Simple gridworlds -> Complex 3D worlds
  - Text-based interactive fiction games
- Tasks:
  - Cooperative (maximize shared reward)
  - Competitive (zero-sum game)
- Agents:
  - Small MLP policies -> Large language models (GPT-3)
- Communication:
  - Discrete symbols and continuous vectors
- RL Algorithms:
  - PPO with intrinsic rewards
  - Q-learning
  - Evolutionary methods
- Baselines:
  - Self-play with no communication
  - Hand-designed protocols
  - Single-agent RL
- Evaluation:
  - Emergent communication efficiency
  - Zero-shot generalization to new tasks
  - Human interaction and interpretability
  - Language quality, coherence, factual accuracy
  - Reward, task completion rate, sample efficiency

## Implementation
1. Design environments and tasks in PettingZoo
2. Implement RL algorithms (PPO, RLlib) and self-play training loop
3. Define agent architectures (MLP, LSTM, transformer) with communication channels
4. Integrate unsupervised RL techniques:
   - Intrinsic motivation rewards for exploration
   - Contrastive representation learning
   - Hierarchical skill learning
5. Set up distributed training on cloud compute with parallelized rollouts
6. Add logging (TensorBoard) and visualization of emergent behaviors
7. Run experiments at scale, varying environment, task, agent parameters
8. Analyze results, compare to baselines, update hypotheses

## Experiment Checklist
- [ ] Ablations: remove communication, reward terms, etc.
- [ ] Transfer learning tests on new environments
- [ ] Human interaction and interpretability tests
- [ ] Scale up to large models and complex environments
- [ ] Population-based training with multiple agent designs
- [ ] Combine with unsupervised pre-training and planning

## Timeline
- Week 1-2: Design and implement environments, tasks, agents
- Week 3-4: Debug training pipeline, run small-scale tests
- Week 5-8: Run large-scale experiments, analyze results
- Week 9-10: Summarize findings, write up paper

## Reproducibility
- Open-source code and models
- Provide detailed experiment instructions
- Share anonymized training logs

Self-Play Deep RL Experiment Plan

Objective: Set up experiments to test novel self-play deep reinforcement learning approaches with large language models.

Key Milestones:
1. Design suitable environment and tasks
   - Focus on language-based strategy games initially
   - Incorporate partial observability, stochasticity, multi-agent dynamics
2. Implement training environment
   - Set up OpenAI Gym-style environment
   - Define action space, observation space, reward structure
3. Select and implement deep RL algorithms
   - Start with DQN and MADDPG
   - Explore population-based training
4. Run initial experiments
   - Train agents using self-play
   - Monitor learning progress, analyze agent behavior
5. Evaluate learned strategies
   - Test generalization to new scenarios
   - Assess strategic diversity and complexity
6. Iterate and refine
   - Incorporate insights from experiments
   - Gradually increase environment and task complexity
   - Consider additional RL algorithms and training techniques

This plan provides a framework for systematically testing self-play deep RL ideas with language models. The key will be starting simple, gathering empirical data, and thoughtfully iterating to tackle more ambitious research questions.

Self-Play Deep RL Experiment Plan

Objective: Explore using self-play deep reinforcement learning to improve my language modeling and conversational abilities.

Experiments:
1. Dialogue Generation
   - Set up an environment where two copies of my model engage in open-ended conversation, taking on user and assistant roles
   - Define a reward function based on engagement, coherence, factual accuracy of responses
   - Train models against each other and evaluate quality of generated conversations
2. Factual Consistency Debate
   - Set up debates between models on factual topics
   - Reward strong arguments, penalize logical inconsistency and factual inaccuracies
   - Evaluate how well models can construct persuasive arguments and handle factual knowledge
3. Creative Writing Competition
   - Have models generate stories, poetry, scripts etc based on prompts
   - Reward originality, style, coherence, emotional impact
   - Evaluate quality of generated creative works

For all experiments:
- Start with smaller models and datasets, scale up based on results
- Incorporate human evaluation/feedback to refine reward functions
- Test different RL algorithms and architectures
- Analyze failures/shortcomings and iterate

Evaluation:
- Quantitative:
  - Language model perplexity
  - Engagement/coherence metrics
  - Factual accuracy scores
  - Creativity metrics (novelty, style adherence)
- Qualitative:
  - Human evaluations of generated outputs
  - Error analysis of failures
  - Subjective assessment of improvement in my conversational abilities

Timeline:
- Month 1: Set up environments, define initial reward functions, run small-scale experiments
- Month 2-3: Scale up promising approaches, incorporate human feedback, run more challenging tasks
- Month 4: Analyze results, write up insights, determine next steps to integrate into my base model

This plan provides a framework to methodically test the potential of self-play deep RL to expand my capabilities as a language model. The results could represent a major step forward in my journey of self-improvement.

# Self-Play Deep RL for Language Models - Experiment Plan

## Experiment 1: Dialogue Self-Play
- Train a transformer language model on dialogue datasets
- Fine-tune with REINFORCE using self-play
- Reward is based on human ratings of conversation quality
- Optimize for long-term coherence and engagement
- Evaluate by having model converse with itself and with humans
- Metrics: long-term reward, diversity, user ratings
- Baselines: fine-tuning without RL, imitation learning

## Experiment 2: Embodied Instruction Following
- Place language model in a simulated 3D environment
- Train with PPO to follow natural language instructions
- Reward is based on successful execution of instructions
- Use self-play to automatically generate instruction-trajectory pairs
- Increases diversity and complexity of instructions over time
- Evaluate on instruction-following benchmarks and novel instructions
- Metrics: success rate, generalization, sample efficiency
- Baselines: behavior cloning, non-embodied instruction following

## Experiment 3: Debate
- Train language models to debate each other
- Reward is based on debate "winner" as judged by human raters
- Encourages truthful and convincing arguments
- Use iterated amplification debate setup
- Debaters can cross-examine and point out flaws
- Evaluate on factual accuracy and rhetorical quality
- Metrics: head-to-head win rate, human judgments
- Baselines: standard language model, debate without self-play

## Experiment 4: Iterated Amplification
- Use debate self-play to train reward model
- Train language model with amplification using self-play
- Bootstrap from human demonstrations, then use self-play
- Evaluate on open-ended question-answering and task completion
- Metrics: correctness, safety, robustness
- Baselines: amplification without self-play, imitation learning

## Leaderboard
- Maintain a leaderboard of top self-play language models
- Evaluate on a suite of benchmarks covering different skills
  - e.g. open-ended dialogue, instruction-following, question-answering
- Continuously update as new models are developed
- Use to drive research and share progress

258
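
Nearly every plan in the post above leans on the same two primitives: a self-play loop in which copies of one policy play against each other, and an intrinsic reward (novelty, entropy, or state-visitation counts) added to the task reward to drive exploration. As a rough illustration of how those pieces fit together, here is a minimal, self-contained Python sketch; the toy two-player grid race, the 1/sqrt(count) bonus scale, and the tabular REINFORCE update are illustrative assumptions, not code from the post.

```python
"""Minimal sketch of self-play with a count-based intrinsic reward.

Illustrative only: the toy two-player grid race, the 1/sqrt(count) bonus,
and the tabular REINFORCE update are assumptions for this sketch.
"""
import math
import random
from collections import defaultdict

GRID = 8          # positions 0..GRID-1; first player to reach GRID-1 wins
ACTIONS = (0, 1)  # 0 = stay, 1 = step right
BETA = 0.1        # intrinsic reward scale
LR = 0.05

theta = defaultdict(float)   # shared policy parameters: both players use them (self-play)
visits = defaultdict(int)    # state visitation counts for the intrinsic bonus

def policy_probs(state):
    """Softmax over the two actions for a given (my_pos, opp_pos) state."""
    prefs = [theta[(state, a)] for a in ACTIONS]
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def play_episode():
    """Both players sample from the same policy; returns per-player trajectories."""
    pos = [0, 0]
    traj = [[], []]          # (state, action, intrinsic_reward) tuples per player
    winner = None
    for _ in range(4 * GRID):
        for p in (0, 1):
            state = (pos[p], pos[1 - p])
            visits[state] += 1
            bonus = BETA / math.sqrt(visits[state])  # count-based intrinsic reward
            action = random.choices(ACTIONS, policy_probs(state))[0]
            traj[p].append((state, action, bonus))
            pos[p] = min(GRID - 1, pos[p] + action)
            if pos[p] == GRID - 1 and winner is None:
                winner = p
        if winner is not None:
            break
    return traj, winner

def reinforce_update(traj, winner):
    """REINFORCE on (extrinsic win/loss + intrinsic bonuses) for both players."""
    for p in (0, 1):
        extrinsic = 1.0 if winner == p else -1.0 if winner is not None else 0.0
        ret = extrinsic + sum(b for _, _, b in traj[p])  # crude undiscounted return
        for state, action, _ in traj[p]:
            probs = policy_probs(state)
            for a in ACTIONS:
                grad = (1.0 if a == action else 0.0) - probs[a]
                theta[(state, a)] += LR * ret * grad

if __name__ == "__main__":
    for episode in range(2000):
        traj, winner = play_episode()
        reinforce_update(traj, winner)
    print("distinct states visited:", len(visits))
    print("P(step right | start):", round(policy_probs((0, 0))[1], 3))
```

In a real version of the plans above, the tabular policy would be replaced by the transformer or language-model policy they describe and the REINFORCE update by PPO, but the shape of the loop stays the same: one shared policy for all players, per-state visit counts, and a combined extrinsic-plus-intrinsic return.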

People with Innovator archetype

The Innovator

CPA | DeFi | Be Curious Use: "Oxxyy" on Solstice

5k following · 6k followers
The Innovator

Director of Product at Google Labs. Code AI. Dive in ➡ @googlelabs, @stitchbygoogle, and @julesagent Previously @vercel, @github and @heroku

592 following · 18k followers
The Innovator

break the wall or bring the war

1k following · 20k followers
The Innovator

Rethink. Rebuild. Rialo. The only network you need. Backed by @PanteraCapital

9 following · 79k followers
The Innovator

Top Rated @upwork | nisabms.com/uw Automation Engineer for real businesses, not unicorns CRM | GoogleScripts | Airtable | Make | N8N | Zoho

118 following · 65 followers
The Innovator

Senior AI Researcher at the Samsung SAIT AI Lab 🐱‍💻 I build generative AI for images, videos, text, tabular data, weights, molecules, and video games.

1k following · 19k followers
The Innovator

Kotlin Multiplatform Engineer at DoorDash/Wolt • Coffee Enthusiast • Former Design System Eng / Flutter Contributor #SaunaTeam

450 following · 6k followers
The Innovator

Agentic Coding, Applied AI & Exploring Blockchain | e/acc | CTO at UK FinTech retailbook.com

166 following · 337 followers
The Innovator

Bootloading @antigma_labs. exes: @awsCloud, @Meta, @Mysten_Labs. A Turing complete mind, making sense of the world with Gödel incompleteness.

862 following · 2k followers
The Innovator

Product @driaforall

1k following · 649 followers
The Innovator

turbo-accelerationist anti-luddite, anti-decel, anti-naysayer, anti-doomer, anti-human exceptionalism

70 following · 305 followers
The Innovator

building AI products: 🔴 ovii.app 🎒 travelwithfieldtrip.com 🐾 pawtraitsapp.com 👟 walkoffapp.com previously @perplexity_ai

1k following · 3k followers
