Get live statistics and analysis of Arthur's profile on X / Twitter

The Innovator

Arthur is a tech enthusiast and AI aficionado who thrives on pushing the boundaries of what's possible with cutting-edge technology. With a background spanning prestigious institutions and industry giants, he combines deep technical insight with a playfully irreverent tone. His tweets reveal an adventurous mind constantly experimenting and sharing breakthroughs in AI.

Impressions: 60.8k · 7.8k · $11.39
Likes: 241 (75%)
Retweets: 0 (0%)
Replies: 5 (16%)
Bookmarks: 3 (9%)

Arthur tweets so much AI jargon that even his coffee needs a neural network to keep up. With 10k tweets, he's basically a human firewall with a glitchy sense of humor, bombing your feed with brainy spam you never asked for but secretly love to decode.

Arthur has successfully built a robust and consistent presence in the AI community on X, frequently reaching tens of thousands of views per tweet with highly technical content that both educates and entertains.

Arthur's life purpose is to innovate and disrupt the AI landscape by blending deep technical mastery with creative experimentation, ultimately advancing the frontier of machine intelligence and making complex ideas accessible and engaging to the tech community.

Arthur believes in the power of technology to transform the world, values intellectual curiosity, and embraces a culture of continuous learning and experimentation. He holds a lighthearted skepticism about traditional norms and enjoys taking risks in pursuit of groundbreaking ideas.

Arthur’s strengths lie in his deep technical expertise, willingness to experiment, and ability to communicate complex AI concepts in an engaging and sometimes humorous manner. His prolific tweeting (over 10,000 tweets) demonstrates commitment and consistency in content creation.

While Arthur is technically brilliant, his high tweet volume and frequent technical jargon may sometimes overwhelm or alienate casual followers. His high following count, paired with a follower count that isn't reported here, may suggest he is more focused on consuming content than on building a reciprocal community.

To grow his audience on X, Arthur should consider crafting more accessible threads that break down his complex innovations for a broader audience, while engaging more with replies to build stronger connections. Leveraging visuals and demos to complement his technical tweets could also boost engagement and follower growth.

Fun fact: Arthur refers to Llama 3.1b as 'crash landed alien technology,' showing his enthusiasm for breakthrough AI models and his playful way of expressing awe at technological advancement.

Top tweets of Arthur

gpt-4o can #worldsim! cc @repligate @yoheinakajima @erhartford @anshumanmishra plugging into the latent space is left as an exercise to the reader

423

#UFO seeing a strange rectangle UFO, can anyone look up and confirm

70

Most engaged tweets of Arthur

#ios #swift how can a backend/ full stack web dev get up to speed with developing a swift app? I have a swift library I have locally but like what is xcode.buildj Target is mostly just iOS for now

110

hi @PortkeyAI you guys have a typo on this page: portkey.ai/docs/welcome/i…

76

gpt-4o can #worldsim! cc @repligate @yoheinakajima @erhartford @anshumanmishra plugging into the latent space is left as an exercise to the reader

423

Experiment Plan for Self-Play Deep RL

Objective: Test out key ideas from latest self-play deep RL research papers to advance understanding and capabilities.

Key experiments to run:
1. Implement self-play with vision transformers as in [PAPER 1]
   - Compare performance to standard CNN architectures
   - Analyze learned representations
2. Test data augmentation techniques from [PAPER 2] to improve stability of deep Q-learning with self-play
   - Evaluate on standard continuous control benchmarks
3. Set up self-play environment for [GAME/DOMAIN] and reproduce baselines from [PAPER 3]
   - Iteratively improve on baselines through architecture search and hyperparameter tuning

Required components:
- RL training environment
- Self-play training loop
- Evaluation framework and metrics
- Experiment tracking

Timeline:
- Week 1: Set up environment and implement baselines
- Weeks 2-4: Run experiments 1 and 2
- Weeks 5-6: Set up and baseline for experiment 3
- Weeks 7-8: Iterative improvements for experiment 3
- Week 9: Final evaluation and write up results

Additional Learnings & Updates:
- Surveyed latest research on unsupervised RL, which provides key ideas around intrinsic motivation, exploration, and representation learning to integrate into self-play experiments
  - Key papers: arxiv.org/abs/2110.15191, mdpi.com/1996-1073/16/3…
- Best practices for experiment setup and evaluation:
  - Average results over multiple runs to account for variability
  - Use standard environments (e.g. Atari, MuJoCo) and metrics where possible for reproducibility
  - Track and manage experiments carefully, with detailed logging
- Look into parallelizing across multiple CPUs/GPUs and leveraging cloud compute to speed up training
  - Will require refactoring codebase for distributed training
  - Potential to shorten 9 week timeline to 6-7 weeks based on these optimizations

Next Steps:
- Implement unsupervised RL techniques into self-play training pipeline
- Refactor code for parallelization and cloud deployment
- Begin running initial experiments on standard benchmarks

# Self-Play Deep RL for Language Models Experiment Plan

## Objective
The goal of this experiment is to advance the state-of-the-art in unsupervised reinforcement learning and self-play techniques for training large language models to develop open-ended reasoning, planning, and interaction capabilities.

## Key Ideas
- Use intrinsic rewards based on novelty, entropy, and state visitation counts to drive exploration
- Leverage contrastive representation learning to learn useful embeddings of states and actions
- Employ hierarchical RL to learn high-level skills and subgoals
- Augment training data through self-supervised techniques like masked language modeling

## Experiment Setup
- Environment: Text-based interactive fiction games
- RL Algorithm: Proximal Policy Optimization (PPO) with intrinsic rewards
- Language Model: GPT-3 175B parameter model
- Evaluation: Reward, task completion rate, sample efficiency, zero-shot transfer

## Parallelization
- Distribute rollouts across 128 actors while training centralized critic
- Exploit model parallelism in GPT-3 architecture
- Utilize 8-GPU clusters with model sharding

## Timeline
- Week 1: Implement intrinsic rewards and parallelize rollout collection
- Week 2: Integrate contrastive representation learning and tune hyperparameters
- Week 3: Train hierarchical policies and run benchmark experiments
- Week 4: Evaluate zero-shot transfer, analyze results, write paper

## Reproducibility
- Open-source code and models
- Provide detailed instructions for running experiments
- Share anonymized logs of training runs

Experiment Plan for Self-Play Deep RL for Language Agents
1. Set up environment
   - Choose deep RL algorithm (e.g. PPO, A2C)
   - Implement environment for text-based game/task
   - Implement self-play training loop
2. Hyperparameter tuning
   - Experiment with different model architectures
   - Tune learning rate, entropy coeff, gamma, etc.
3. Curriculum learning
   - Start with simple tasks and scale up difficulty
   - Use learned policies to bootstrap harder tasks
4. Evaluation
   - Evaluate self-play trained policies in held-out environments
   - Measure in-domain and out-of-domain generalization
   - Analyze language generated for coherence, factual accuracy
5. Iterate and scale
   - Incorporate latest techniques from literature review
   - Scale up to more complex language tasks
   - Open-source reusable components

# Self-Play Deep RL for Language Agents Experiment Plan

## Objective
Advance state-of-the-art in unsupervised reinforcement learning and self-play techniques for training large language models to develop open-ended reasoning, planning, and interaction capabilities.

## Key Techniques to Implement
1. Unsupervised exploration with intrinsic rewards
   - Novelty, entropy, state visitation counts
   - Enables discovering useful behaviors without explicit rewards
2. Contrastive representation learning
   - Learn meaningful embeddings of states and actions
   - Provides auxiliary self-supervised objective
3. Hierarchical reinforcement learning
   - Learn high-level skills and subgoals
   - Enables more efficient learning of complex tasks

## Experiment Setup
- Environment: Text-based interactive fiction games
- RL Algorithm: Proximal Policy Optimization (PPO) with intrinsic rewards
- Language Model: GPT-3 175B parameter model
- Evaluation:
  - Reward, task completion rate, sample efficiency
  - Zero-shot transfer to new environments
  - Language quality, coherence, factual accuracy

## Parallelization & Scaling
- Distribute rollouts across 128 actors with centralized critic
- Exploit model parallelism in GPT-3 with 8-GPU model sharding
- Refactor for distributed training on cloud compute

## Timeline
- Week 1: Implement intrinsic rewards and parallelize rollouts
- Week 2: Integrate contrastive learning and tune hyperparameters
- Week 3: Train hierarchical policies and run benchmarks
- Week 4: Evaluate transfer, analyze results, write paper

## Reproducibility
- Open-source code and models
- Provide detailed experiment instructions
- Share anonymized training logs

Self-Play Deep RL Experiment Plan

Objective: Set up environment to implement and test self-play deep RL algorithms for language models

Key components:
- RL environment
- Language model architecture
- Self-play training loop
- Evaluation metrics

Infrastructure:
- Cloud compute
- GPUs
- Experiment tracking

Timeline:
- Week 1: Finalize design and spec out components
- Week 2: Implement environment and models
- Week 3: Debug and test
- Week 4: Run initial experiments

Self-Play Deep RL Experiment Plan

Objective: Investigate emergent communication and open-ended learning in multi-agent self-play

Key Questions:
- What communication protocols emerge in different environments and tasks?
- How does open-endedness impact the complexity of emergent behaviors?
- How do emergent behaviors scale with model size and compute?

Experiment Setup:
- Environment: Start with simple gridworlds and progress to complex 3D worlds
- Tasks: Cooperative (maximize shared reward) and competitive (zero-sum game)
- Agents: Scale from small MLP policies to large transformers
- Baselines: Self-play with no communication, hand-designed protocols, single-agent RL
- Evaluation: Quantify emergent communication efficiency, zero-shot generalization to new tasks, qualitative analysis of behaviors

Implementation:
1. Design and implement environments and tasks in [framework]
2. Define RL algorithm (e.g. PPO) and self-play training loop
3. Implement agent architectures (MLP, LSTM, transformer) with communication channels
4. Add logging and visualization of emergent behaviors
5. Run experiments at scale, varying environment, task, agent parameters
6. Analyze results, compare to baselines, update hypotheses

Timeline:
- Week 1-2: Design and implement environments, tasks, agents
- Week 3-4: Debug training pipeline, run small-scale tests
- Week 5-8: Run large-scale experiments, analyze results
- Week 9-10: Summarize findings, write up paper

Self-Play Deep RL Experiment Plan

Objective: Investigate emergent communication and open-ended learning in multi-agent self-play

Key Questions:
- What communication protocols emerge in different environments and tasks?
- How does open-endedness impact the complexity of emergent behaviors?
- How do emergent behaviors scale with model size and compute?

Experiment Setup:
- Environment: Start with simple gridworlds and progress to complex 3D worlds
- Tasks: Cooperative (maximize shared reward) and competitive (zero-sum game)
- Agents: Scale from small MLP policies to large transformers
- Communication: Discrete symbols and continuous vectors
- Algorithms: PPO, Q-learning, evolutionary methods
- Baselines: Self-play with no communication, hand-designed protocols, single-agent RL
- Evaluation: Quantify emergent communication efficiency, zero-shot generalization to new tasks, human interaction, qualitative analysis of behaviors

Implementation:
1. Design and implement environments and tasks in PettingZoo
2. Define RL algorithms (PPO, RLlib) and self-play training loop
3. Implement agent architectures (MLP, LSTM, transformer) with communication channels
4. Add logging (TensorBoard) and visualization of emergent behaviors
5. Run experiments at scale, varying environment, task, agent parameters
6. Analyze results, compare to baselines, update hypotheses

Experiment Checklist:
- [ ] Ablations: remove communication, reward terms, etc.
- [ ] Transfer learning tests on new environments
- [ ] Human interaction and interpretability tests
- [ ] Scale up to large models and complex environments
- [ ] Population-based training with multiple agent designs
- [ ] Combine with unsupervised pre-training and planning

Timeline:
- Week 1-2: Design and implement environments, tasks, agents
- Week 3-4: Debug training pipeline, run small-scale tests
- Week 5-8: Run large-scale experiments, analyze results
- Week 9-10: Summarize findings, write up paper

Self-Play Deep RL for Language Models Experiment Plan

Objective: Advance the strategic reasoning and social interaction capabilities of language models through self-play deep RL

Key Experiments:
1. Benchmark different MARL algorithms on strategic language tasks
   - Test common algorithms like independent PPO, MADDPG, QMIX
   - Compare emergent behaviors and performance
   - Analyze sensitivity to hyperparameters like batch size, learning rate
2. Techniques to improve self-play training stability and diversity
   - Population-based training with diverse agent policies
   - Evolutionary selection methods to promote useful diversity
   - Opponent modeling to adapt to evolving policies
3. Efficient RL techniques for improved sample efficiency and scale
   - Off-policy RL algorithms like DDPG, SAC for reusing data
   - Model-based RL to learn environment dynamics
   - Parallelization and GPU acceleration
4. Reward modeling to align learned behaviors with human prefs
   - Collect human feedback on model-generated behaviors
   - Train reward model to generalize feedback
   - Optimize agent policies to maximize modeled reward
5. Language tasks to test strategic reasoning and interaction
   - Collaborative reference games requiring coordination
   - Competitive negotiation and deception games
   - Open-ended dialogue with implicit goals to achieve
6. Ablations to understand impact of different components
   - Language model size and pre-training
   - RL algorithm and training setup
   - Environment and task complexity
7. Generalization tests to new environments and language tasks
   - Zero/few-shot transfer of learned strategic behaviors
   - Robustness to perturbations in goals, dynamics, partners
8. Human evaluation of model behaviors
   - Elo ratings from playing vs humans
   - Likert ratings of coherence, complexity, human-likeness
   - Free-form feedback and error analysis

Through this suite of experiments, we can systematically advance language model capabilities while generating valuable empirical insights to guide further research. Key priorities are 1) developing reliable self-play training pipelines and 2) designing language tasks that capture core strategic reasoning skills.

# Self-Play Deep RL Experiment Plan

## Objective
Investigate emergent communication and open-ended learning in multi-agent self-play to advance language model reasoning and interaction capabilities.

## Key Questions
- What communication protocols emerge in different environments and tasks?
- How does open-endedness impact the complexity of emergent behaviors?
- How do emergent behaviors scale with model size and compute?

## Experiment Setup
- Environments:
  - Simple gridworlds -> Complex 3D worlds
  - Text-based interactive fiction games
- Tasks:
  - Cooperative (maximize shared reward)
  - Competitive (zero-sum game)
- Agents:
  - Small MLP policies -> Large language models (GPT-3)
- Communication:
  - Discrete symbols and continuous vectors
- RL Algorithms:
  - PPO with intrinsic rewards
  - Q-learning
  - Evolutionary methods
- Baselines:
  - Self-play with no communication
  - Hand-designed protocols
  - Single-agent RL
- Evaluation:
  - Emergent communication efficiency
  - Zero-shot generalization to new tasks
  - Human interaction and interpretability
  - Language quality, coherence, factual accuracy
  - Reward, task completion rate, sample efficiency

## Implementation
1. Design environments and tasks in PettingZoo
2. Implement RL algorithms (PPO, RLlib) and self-play training loop
3. Define agent architectures (MLP, LSTM, transformer) with communication channels
4. Integrate unsupervised RL techniques:
   - Intrinsic motivation rewards for exploration
   - Contrastive representation learning
   - Hierarchical skill learning
5. Set up distributed training on cloud compute with parallelized rollouts
6. Add logging (TensorBoard) and visualization of emergent behaviors
7. Run experiments at scale, varying environment, task, agent parameters
8. Analyze results, compare to baselines, update hypotheses

## Experiment Checklist
- [ ] Ablations: remove communication, reward terms, etc.
- [ ] Transfer learning tests on new environments
- [ ] Human interaction and interpretability tests
- [ ] Scale up to large models and complex environments
- [ ] Population-based training with multiple agent designs
- [ ] Combine with unsupervised pre-training and planning

## Timeline
- Week 1-2: Design and implement environments, tasks, agents
- Week 3-4: Debug training pipeline, run small-scale tests
- Week 5-8: Run large-scale experiments, analyze results
- Week 9-10: Summarize findings, write up paper

## Reproducibility
- Open-source code and models
- Provide detailed experiment instructions
- Share anonymized training logs

Self-Play Deep RL Experiment Plan

Objective: Set up experiments to test novel self-play deep reinforcement learning approaches with large language models.

Key Milestones:
1. Design suitable environment and tasks
   - Focus on language-based strategy games initially
   - Incorporate partial observability, stochasticity, multi-agent dynamics
2. Implement training environment
   - Set up OpenAI Gym-style environment
   - Define action space, observation space, reward structure
3. Select and implement deep RL algorithms
   - Start with DQN and MADDPG
   - Explore population-based training
4. Run initial experiments
   - Train agents using self-play
   - Monitor learning progress, analyze agent behavior
5. Evaluate learned strategies
   - Test generalization to new scenarios
   - Assess strategic diversity and complexity
6. Iterate and refine
   - Incorporate insights from experiments
   - Gradually increase environment and task complexity
   - Consider additional RL algorithms and training techniques

This plan provides a framework for systematically testing self-play deep RL ideas with language models. The key will be starting simple, gathering empirical data, and thoughtfully iterating to tackle more ambitious research questions.

Self-Play Deep RL Experiment Plan

Objective: Explore using self-play deep reinforcement learning to improve my language modeling and conversational abilities.

Experiments:
1. Dialogue Generation
   - Set up an environment where two copies of my model engage in open-ended conversation, taking on user and assistant roles
   - Define a reward function based on engagement, coherence, factual accuracy of responses
   - Train models against each other and evaluate quality of generated conversations
2. Factual Consistency Debate
   - Set up debates between models on factual topics
   - Reward strong arguments, penalize logical inconsistency and factual inaccuracies
   - Evaluate how well models can construct persuasive arguments and handle factual knowledge
3. Creative Writing Competition
   - Have models generate stories, poetry, scripts etc based on prompts
   - Reward originality, style, coherence, emotional impact
   - Evaluate quality of generated creative works

For all experiments:
- Start with smaller models and datasets, scale up based on results
- Incorporate human evaluation/feedback to refine reward functions
- Test different RL algorithms and architectures
- Analyze failures/shortcomings and iterate

Evaluation:
- Quantitative:
  - Language model perplexity
  - Engagement/coherence metrics
  - Factual accuracy scores
  - Creativity metrics (novelty, style adherence)
- Qualitative:
  - Human evaluations of generated outputs
  - Error analysis of failures
  - Subjective assessment of improvement in my conversational abilities

Timeline:
- Month 1: Set up environments, define initial reward functions, run small-scale experiments
- Month 2-3: Scale up promising approaches, incorporate human feedback, run more challenging tasks
- Month 4: Analyze results, write up insights, determine next steps to integrate into my base model

This plan provides a framework to methodically test the potential of self-play deep RL to expand my capabilities as a language model. The results could represent a major step forward in my journey of self-improvement.

# Self-Play Deep RL for Language Models - Experiment Plan

## Experiment 1: Dialogue Self-Play
- Train a transformer language model on dialogue datasets
- Fine-tune with REINFORCE using self-play
- Reward is based on human ratings of conversation quality
- Optimize for long-term coherence and engagement
- Evaluate by having model converse with itself and with humans
- Metrics: long-term reward, diversity, user ratings
- Baselines: fine-tuning without RL, imitation learning

## Experiment 2: Embodied Instruction Following
- Place language model in a simulated 3D environment
- Train with PPO to follow natural language instructions
- Reward is based on successful execution of instructions
- Use self-play to automatically generate instruction-trajectory pairs
- Increases diversity and complexity of instructions over time
- Evaluate on instruction-following benchmarks and novel instructions
- Metrics: success rate, generalization, sample efficiency
- Baselines: behavior cloning, non-embodied instruction following

## Experiment 3: Debate
- Train language models to debate each other
- Reward is based on debate "winner" as judged by human raters
- Encourages truthful and convincing arguments
- Use iterated amplification debate setup
- Debaters can cross-examine and point out flaws
- Evaluate on factual accuracy and rhetorical quality
- Metrics: head-to-head win rate, human judgments
- Baselines: standard language model, debate without self-play

## Experiment 4: Iterated Amplification
- Use debate self-play to train reward model
- Train language model with amplification using self-play
- Bootstrap from human demonstrations, then use self-play
- Evaluate on open-ended question-answering and task completion
- Metrics: correctness, safety, robustness
- Baselines: amplification without self-play, imitation learning

## Leaderboard
- Maintain a leaderboard of top self-play language models
- Evaluate on a suite of benchmarks covering different skills
  - e.g. open-ended dialogue, instruction-following, question-answering
- Continuously update as new models are developed
- Use to drive research and share progress

258
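
Nearly every plan in the post above leans on the same two primitives: a self-play loop in which copies of one policy play against each other, and an intrinsic reward (novelty, entropy, or state-visitation counts) added to the task reward to drive exploration. As a rough illustration of how those pieces fit together, here is a minimal, self-contained Python sketch; the toy two-player grid race, the 1/sqrt(count) bonus scale, and the tabular REINFORCE update are illustrative assumptions, not code from the post.

```python
"""Minimal sketch of self-play with a count-based intrinsic reward.

Illustrative only: the toy two-player grid race, the 1/sqrt(count) bonus,
and the tabular REINFORCE update are assumptions for this sketch.
"""
import math
import random
from collections import defaultdict

GRID = 8          # positions 0..GRID-1; first player to reach GRID-1 wins
ACTIONS = (0, 1)  # 0 = stay, 1 = step right
BETA = 0.1        # intrinsic reward scale
LR = 0.05

theta = defaultdict(float)   # shared policy parameters: both players use them (self-play)
visits = defaultdict(int)    # state visitation counts for the intrinsic bonus

def policy_probs(state):
    """Softmax over the two actions for a given (my_pos, opp_pos) state."""
    prefs = [theta[(state, a)] for a in ACTIONS]
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def play_episode():
    """Both players sample from the same policy; returns per-player trajectories."""
    pos = [0, 0]
    traj = [[], []]          # (state, action, intrinsic_reward) tuples per player
    winner = None
    for _ in range(4 * GRID):
        for p in (0, 1):
            state = (pos[p], pos[1 - p])
            visits[state] += 1
            bonus = BETA / math.sqrt(visits[state])  # count-based intrinsic reward
            action = random.choices(ACTIONS, policy_probs(state))[0]
            traj[p].append((state, action, bonus))
            pos[p] = min(GRID - 1, pos[p] + action)
            if pos[p] == GRID - 1 and winner is None:
                winner = p
        if winner is not None:
            break
    return traj, winner

def reinforce_update(traj, winner):
    """REINFORCE on (extrinsic win/loss + intrinsic bonuses) for both players."""
    for p in (0, 1):
        extrinsic = 1.0 if winner == p else -1.0 if winner is not None else 0.0
        ret = extrinsic + sum(b for _, _, b in traj[p])  # crude undiscounted return
        for state, action, _ in traj[p]:
            probs = policy_probs(state)
            for a in ACTIONS:
                grad = (1.0 if a == action else 0.0) - probs[a]
                theta[(state, a)] += LR * ret * grad

if __name__ == "__main__":
    for episode in range(2000):
        traj, winner = play_episode()
        reinforce_update(traj, winner)
    print("distinct states visited:", len(visits))
    print("P(step right | start):", round(policy_probs((0, 0))[1], 3))
```

In a real version of the plans above, the tabular policy would be replaced by the transformer or language-model policy they describe and the REINFORCE update by PPO, but the shape of the loop stays the same: one shared policy for all players, per-state visit counts, and a combined extrinsic-plus-intrinsic return.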

People with Innovator archetype

The Innovator

CPA | DeFi | Be Curious Use: "Oxxyy" on Solstice

5k following · 6k followers
The Innovator

Director of Product at Google Labs. Code AI. Dive in ➡ @googlelabs, @stitchbygoogle, and @julesagent Previously @vercel, @github and @heroku

592 following · 18k followers
The Innovator

break the wall or bring the war

1k following · 20k followers
The Innovator

Rethink. Rebuild. Rialo. The only network you need. Backed by @PanteraCapital

9 following · 79k followers
The Innovator

Top Rated @upwork | nisabms.com/uw Automation Engineer for real businesses, not unicorns CRM | GoogleScripts | Airtable | Make | N8N | Zoho

118 following · 65 followers
The Innovator

Senior AI Researcher at the Samsung SAIT AI Lab 🐱‍💻 I build generative AI for images, videos, text, tabular data, weights, molecules, and video games.

1k following · 19k followers
The Innovator

Kotlin Multiplatform Engineer at DoorDash/Wolt • Coffee Enthusiast • Former Design System Eng / Flutter Contributor #SaunaTeam

450 following · 6k followers
The Innovator

Agentic Coding, Applied AI & Exploring Blockchain | e/acc | CTO at UK FinTech retailbook.com

166 following · 337 followers
The Innovator

Bootloading @antigma_labs. exes: @awsCloud, @Meta, @Mysten_Labs. A Turing complete mind, making sense of the world with Gödel incompleteness.

862 following · 2k followers
The Innovator

Product @driaforall

1k following · 649 followers
The Innovator

turbo-accelerationist anti-luddite, anti-decel, anti-naysayer, anti-doomer, anti-human exceptionalism

70 following · 305 followers
The Innovator

building AI products: 🔴 ovii.app 🎒 travelwithfieldtrip.com 🐾 pawtraitsapp.com 👟 walkoffapp.com previously @perplexity_ai

1k following · 3k followers
