Alex Shaw is a trailblazer at the intersection of AI and software development, pioneering tools such as Terminal-Bench that simplify complex agent evaluations. With a background at Google and a passion for pushing technological boundaries, Alex combines deep technical expertise with a collaborative spirit. His tweets show a focus on practical innovation, community engagement, and advancing AI capabilities.
For someone who’s inventing the future of AI benchmarks, Alex’s tweet count is almost as low as the patience his Terminal-Bench users might have when integrating dozens of frameworks. Maybe the next version can benchmark his tweet frequency!
Launched Terminal-Bench 2.0, a harder version of the benchmark with increased verification, alongside Harbor, a package for running sandboxed agent rollouts at scale, and built a community-driven framework widely recognized among AI developers and researchers.
To propel the technology landscape forward by creating cutting-edge tools that make AI agent evaluation accessible, standardized, and scalable, empowering developers and researchers to accelerate innovation.
Alex believes in the power of open collaboration, continuous improvement, and leveraging technology to solve real-world challenges efficiently. He values innovation driven by community input, transparency, and creating practical frameworks that facilitate progress in AI and software development.
Alex’s strength lies in his ability to conceptualize and develop innovative technical frameworks, communicate complex ideas clearly, and foster collaborative ecosystems that drive technological adoption.
Sometimes, Alex’s deep technical focus and commitment to innovation may lead to communication that is highly specialized, potentially limiting accessibility for a broader, less technical audience.
To grow his audience on X, Alex should blend his expert technical insights with engaging storytelling and relatable use cases, making his innovations more approachable. Leveraging video demos or behind-the-scenes content from meetups can also boost engagement and attract a diverse follower base.
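Fun fact: Alex co-created Terminal-Bench, whose dataset registry he calls “the npm of agent benchmarks”: it lets developers evaluate agents on SWE-bench and other popular benchmarks through a single CLI and harness instead of integrating each benchmark separately.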
Excited to share what I’ve been working on with @andykonwinski, @Mike_A_Merrill, and @lschmidt3 at Stanford & Laude.
Introducing Terminal-Bench! A benchmark and framework to quantify how well AI agents accomplish complex tasks in a terminal environment. We believe that the terminal is a particularly powerful tool for agents because it provides a text-based low-level interface for operating a computer to an agent.
Thanks @SnorkelAI for the great tasks and especially @fredsala, Tom Walshe, and Jeong Shin for the collaboration
Terminal-Bench 2.0 on the horizon 👀 + some other exciting releases!
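Alex describes a Terminal-Bench task as an instruction, a container, and a test executable, with agents being “just programs that get executed in the container.” The Python below is a minimal sketch of that structure under stated assumptions: a temporary directory stands in for the container, a POSIX shell is available, and every name (`Task`, `run_task`, `pass_rate`, the toy agents) is hypothetical rather than Terminal-Bench's or Harbor's actual API.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Callable

# Every name below is hypothetical; this is not Terminal-Bench's or Harbor's API.


@dataclass
class Task:
    instruction: str         # natural-language goal handed to the agent
    test_command: list[str]  # executable whose exit status decides pass/fail


# The agent gets the instruction plus a function that runs one shell command
# in the task's workspace and returns the combined stdout/stderr.
Shell = Callable[[str], str]
Agent = Callable[[str, Shell], None]


def run_task(task: Task, agent: Agent) -> bool:
    """One rollout: fresh workspace, let the agent act, then run the test."""
    # Assumption: a temporary directory stands in for the task's container.
    with tempfile.TemporaryDirectory() as workdir:

        def shell(command: str) -> str:
            done = subprocess.run(command, shell=True, cwd=workdir,
                                  capture_output=True, text=True)
            return done.stdout + done.stderr

        agent(task.instruction, shell)
        # The test runs after the agent finishes and only checks the end state.
        verdict = subprocess.run(task.test_command, cwd=workdir,
                                 capture_output=True)
        return verdict.returncode == 0


def pass_rate(tasks: list[Task], agent: Agent) -> float:
    """Fraction of tasks whose test passes after the agent's rollout."""
    return sum(run_task(t, agent) for t in tasks) / len(tasks)


# Toy task: the test exits 0 only if hello.txt contains exactly "hello\n".
hello_task = Task(
    instruction="Create hello.txt containing the word hello.",
    test_command=[sys.executable, "-c",
                  "import pathlib, sys; "
                  "sys.exit(pathlib.Path('hello.txt').read_text() != 'hello\\n')"],
)


def scripted_agent(instruction: str, shell: Shell) -> None:
    # Stand-in for a real agent: ignores the instruction, issues one command.
    shell("echo hello > hello.txt")


def idle_agent(instruction: str, shell: Shell) -> None:
    pass  # does nothing, so the test should fail


print(run_task(hello_task, scripted_agent))     # True
print(run_task(hello_task, idle_agent))         # False
print(pass_rate([hello_task], scripted_agent))  # 1.0
```

Keeping the test outside the agent is the point of the sketch: scoring looks only at the workspace's end state, so any program that can drive a shell can be evaluated the same way.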
Profile: Alex Shaw (@alexgshaw). Bio: "Shipping @LaudeInstitute & investing @LaudeVentures Co-creator of Terminal-Bench. Formerly Google. BYU alum." Website: tbench.ai. 674 followers, 503 following, 418 posts. Creator type: The Innovator.
Tweets (dates in UTC):
Nov 7, 2025 (104,511 views, 344 likes):
Today, we’re announcing the next chapter of Terminal-Bench with two releases:
1. Harbor, a new package for running sandboxed agent rollouts at scale
2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification
Nov 3, 2025 (13,587 views, 107 likes):
We're releasing Terminal-Bench 2.0 this week! Come to our meetup on Thursday @ Databricks to get early access :)
https://luma.com/h9kb99vd
Jul 16, 2025 (11,453 views, 102 likes):
Evaluating agents on benchmarks is a pain. Each benchmark comes with its own harness, scoring scripts, and environments and integrating can take days.
We're introducing the Terminal-Bench dataset registry to solve this problem. Think of it as the npm of agent benchmarks.
Now you can use the Terminal-Bench CLI and harness to evaluate on SWE-bench and other popular benchmarks.
Oct 20, 2025, quoting @Mike_A_Merrill (2,315 views, 18 likes):
Mike and I went on the @latentspacepod !
Aug 7, 2025, quoting @Letta_AI (1,894 views, 15 likes):
Congrats to @Letta_AI for building the best performing open source agent on Terminal-Bench!
Nov 5, 2025, quoting @jyangballin (2,599 views, 15 likes):
Very clever benchmark
Sep 28, 2025, quoting @matanSF (2,532 views, 13 likes):
We originally wanted to call it “T-Bench” until we realized it was a chest exercise
Sep 5, 2025, quoting @Kimi_Moonshot (761 views, 12 likes):
Wow! Congrats to the @Kimi_Moonshot team for a very impressive Terminal-Bench score and an all around great OSS model!
Sep 25, 2025, quoting @daytonaio (2,301 views, 11 likes):
Not surprised! We used Daytona to run 37k Terminal-Bench eval rollouts in the last two weeks.
(P.S. new tasks dropping soon 👀)
Aug 2, 2025 (287 views, 11 likes):
Training an LLM from scratch is essentially imitation (auto-regression), selective imitation (SFT), people pleasing (RLHF), and then outcome seeking (RL).
Objectively verifiable RL is the final step because a smart AI should produce objectively true output that works in reality.
Sep 11, 2025, quoting @AlexGDimakis (682 views, 10 likes):
A fantastic explanation of what a task is in Terminal-Bench and how it’s actually just an RL environment.
Sep 5, 2025, quoting @openblocklabs (430 views, 9 likes):
Congrats @openblocklabs !
Nov 8, 2025, quoting @latentspacepod (495 views, 8 likes):
Thank you @swyx ! Was great to have you, and the live podcast was awesome 😁
Oct 22, 2025, quoting @DimitrisPapail (782 views, 8 likes):
One definition of AGI is the full automation of all computer work.
A Terminal-Bench task is an instruction, container, and test executable. Agents are just programs that get executed in the container.
So Terminal-Bench can measure the automation of any computer task, i.e., AGI.
!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"luma.com/h9kb99vd","expanded_url":"https://luma.com/h9kb99vd","url":"https://t.co/wLItVEHEMc","indices":[114,137]}],"user_mentions":[]},"favorited":true,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":true,"fact_check":null,"id":"1985427712484524433","view_count":13587,"bookmark_count":31,"created_at":1762197828000,"favorite_count":107,"quote_count":2,"reply_count":1,"retweet_count":4,"user_id_str":"1448787032486989825","conversation_id_str":"1985427712484524433","full_text":"We're releasing Terminal-Bench 2.0 this week! Come to our meetup on Thursday @ Databricks to get early access :)\n\nhttps://t.co/wLItVEHEMc","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,21],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1986093902122942700","quoted_status_permalink":{"url":"https://t.co/zhs56oNQSL","expanded":"https://twitter.com/jyangballin/status/1986093902122942700","display":"x.com/jyangballin/st…"},"retweeted":false,"fact_check":null,"id":"1986106642157703523","view_count":2599,"bookmark_count":6,"created_at":1762359698000,"favorite_count":15,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986106642157703523","full_text":"Very clever 
benchmark","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1670869422733393966","name":"Latent.Space","screen_name":"latentspacepod","indices":[23,38]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1980330471163892201","quoted_status_permalink":{"url":"https://t.co/sMxAiyEOQ0","expanded":"https://twitter.com/Mike_A_Merrill/status/1980330471163892201","display":"x.com/Mike_A_Merrill…"},"retweeted":false,"fact_check":null,"id":"1980344520215785912","view_count":2315,"bookmark_count":3,"created_at":1760985901000,"favorite_count":18,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1980344520215785912","full_text":"Mike and I went on the @latentspacepod !","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,127],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1970844077416690043","quoted_status_permalink":{"url":"https://t.co/qcfwTvUVVY","expanded":"https://twitter.com/daytonaio/status/1970844077416690043","display":"x.com/daytonaio/stat…"},"retweeted":false,"fact_check":null,"id":"1971271862194958510","view_count":2301,"bookmark_count":3,"created_at":1758822811000,"favorite_count":11,"quote_count":1,"reply_count":0,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1971271862194958510","full_text":"Not surprised! We used Daytona to run 37k Terminal-Bench eval rollouts in the last two weeks.\n\n(P.S. 
new tasks dropping soon 👀)","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,140],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1644135335734157313","name":"Factory","screen_name":"FactoryAI","indices":[3,13]}]},"favorited":false,"lang":"en","retweeted":false,"fact_check":null,"id":"1971303092617609643","view_count":614990,"bookmark_count":0,"created_at":1758830256000,"favorite_count":0,"quote_count":0,"reply_count":0,"retweet_count":152,"user_id_str":"1448787032486989825","conversation_id_str":"1971303092617609643","full_text":"RT @FactoryAI: The best agents for software development are becoming the best agents for everything. Droids are the best software developme…","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,140],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1753339277386342400","name":"Qwen","screen_name":"Alibaba_Qwen","indices":[3,16]}]},"favorited":false,"lang":"en","retweeted":false,"fact_check":null,"id":"1970589021094223882","view_count":157373,"bookmark_count":0,"created_at":1758660009000,"favorite_count":0,"quote_count":0,"reply_count":0,"retweet_count":149,"user_id_str":"1448787032486989825","conversation_id_str":"1970589021094223882","full_text":"RT @Alibaba_Qwen: We're excited to announce the upgrade of Qwen3-Coder, and the upgraded API `qwen3-coder-plus` is now available on Alibaba…","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,140],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1854370033520324608","name":"Laude 
Institute","screen_name":"LaudeInstitute","indices":[3,18]}]},"favorited":false,"lang":"en","retweeted":false,"fact_check":null,"id":"1962294185035440325","view_count":933,"bookmark_count":0,"created_at":1756682366000,"favorite_count":0,"quote_count":0,"reply_count":0,"retweet_count":4,"user_id_str":"1448787032486989825","conversation_id_str":"1962294185035440325","full_text":"RT @LaudeInstitute: If you're obsessed with shipping and like the view from the frontier, come help us build. Now hiring across MoTS, Resea…","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,36],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1591468170136547328","name":"OpenBlock","screen_name":"openblocklabs","indices":[16,30]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1958951300491878481","quoted_status_permalink":{"url":"https://t.co/AcKApER563","expanded":"https://twitter.com/openblocklabs/status/1958951300491878481","display":"x.com/openblocklabs/…"},"retweeted":false,"fact_check":null,"id":"1958954092749037798","view_count":695,"bookmark_count":1,"created_at":1755886025000,"favorite_count":6,"quote_count":0,"reply_count":0,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1958954092749037798","full_text":"Congrats to the @openblocklabs 
team!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,100],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/UsHo2MMnhC","expanded_url":"https://x.com/Mike_A_Merrill/status/1958583868510150806/photo/1","id_str":"1958583598262722560","indices":[77,100],"media_key":"3_1958583598262722560","media_url_https":"https://pbs.twimg.com/media/Gy5JZ9VbIAA2h5K.jpg","source_status_id_str":"1958583868510150806","source_user_id_str":"1233837766271569920","type":"photo","url":"https://t.co/UsHo2MMnhC","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":581,"w":1228,"resize":"fit"},"medium":{"h":568,"w":1200,"resize":"fit"},"small":{"h":322,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":581,"width":1228,"focus_rects":[{"x":0,"y":0,"w":1038,"h":581},{"x":0,"y":0,"w":581,"h":581},{"x":0,"y":0,"w":510,"h":581},{"x":8,"y":0,"w":291,"h":581},{"x":0,"y":0,"w":1228,"h":581}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1958583598262722560"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1233837766271569920","name":"Mike A. 
Merrill","screen_name":"Mike_A_Merrill","indices":[3,18]},{"id_str":"1714580962569588736","name":"DeepSeek","screen_name":"deepseek_ai","indices":[43,55]}]},"extended_entities":{"media":[{"display_url":"pic.x.com/UsHo2MMnhC","expanded_url":"https://x.com/Mike_A_Merrill/status/1958583868510150806/photo/1","id_str":"1958583598262722560","indices":[77,100],"media_key":"3_1958583598262722560","media_url_https":"https://pbs.twimg.com/media/Gy5JZ9VbIAA2h5K.jpg","source_status_id_str":"1958583868510150806","source_user_id_str":"1233837766271569920","type":"photo","url":"https://t.co/UsHo2MMnhC","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":581,"w":1228,"resize":"fit"},"medium":{"h":568,"w":1200,"resize":"fit"},"small":{"h":322,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":581,"width":1228,"focus_rects":[{"x":0,"y":0,"w":1038,"h":581},{"x":0,"y":0,"w":581,"h":581},{"x":0,"y":0,"w":510,"h":581},{"x":8,"y":0,"w":291,"h":581},{"x":0,"y":0,"w":1228,"h":581}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1958583598262722560"}}}]},"favorited":false,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"retweeted_status_result":{"result":{"__typename":"Tweet","rest_id":"1958583868510150806","core":{"user_results":{"result":{"__typename":"User","id":"VXNlcjoxMjMzODM3NzY2MjcxNTY5OTIw","rest_id":"1233837766271569920","affiliates_highlighted_label":{},"avatar":{"image_url":"https://pbs.twimg.com/profile_images/1879909706535157760/KoqzoSqE_normal.jpg"},"core":{"created_at":"Sat Feb 29 19:34:11 +0000 2020","name":"Mike A. 
Merrill","screen_name":"Mike_A_Merrill"},"dm_permissions":{"can_dm":false,"can_dm_on_xchat":false},"has_graduated_access":true,"is_blue_verified":true,"legacy":{"default_profile":true,"default_profile_image":false,"description":"Postdoc @StanfordAILab \nBuilding https://t.co/KWJvsMlWva with @alexgshaw and many others \nGo Bills","entities":{"description":{"urls":[{"display_url":"tbench.ai","expanded_url":"http://tbench.ai","url":"https://t.co/KWJvsMlWva","indices":[33,56]}]},"url":{"urls":[{"display_url":"mikemerrill.io","expanded_url":"http://mikemerrill.io","url":"https://t.co/wJkoqTnQp2","indices":[0,23]}]}},"fast_followers_count":0,"favourites_count":609,"followers_count":618,"friends_count":302,"has_custom_timelines":true,"is_translator":false,"listed_count":5,"media_count":27,"normal_followers_count":618,"pinned_tweet_ids_str":["1924579504506413199"],"possibly_sensitive":false,"profile_banner_url":"https://pbs.twimg.com/profile_banners/1233837766271569920/1747759339","profile_interstitial_type":"","statuses_count":282,"translator_type":"none","url":"https://t.co/wJkoqTnQp2","want_retweets":false,"withheld_in_countries":[]},"location":{"location":"San Francisco, CA"},"media_permissions":{"can_media_tag":true},"parody_commentary_fan_label":"None","profile_image_shape":"Circle","privacy":{"protected":false},"relationship_perspectives":{"following":false},"tipjar_settings":{},"verification":{"verified":false}}}},"unmention_data":{},"edit_control":{"edit_tweet_ids":["1958583868510150806"],"editable_until_msecs":"1755801357000","is_edit_eligible":true,"edits_remaining":"5"},"is_translatable":false,"views":{"count":"361","state":"EnabledWithCount"},"source":"<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Twitter Web App</a>","grok_translated_post_with_availability":{"is_available":false},"grok_analysis_button":true,"legacy":{"bookmark_count":1,"bookmarked":false,"created_at":"Thu Aug 21 17:35:57 +0000 
2025","conversation_id_str":"1958583868510150806","display_text_range":[0,56],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/UsHo2MMnhC","expanded_url":"https://x.com/Mike_A_Merrill/status/1958583868510150806/photo/1","id_str":"1958583598262722560","indices":[57,80],"media_key":"3_1958583598262722560","media_url_https":"https://pbs.twimg.com/media/Gy5JZ9VbIAA2h5K.jpg","type":"photo","url":"https://t.co/UsHo2MMnhC","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":581,"w":1228,"resize":"fit"},"medium":{"h":568,"w":1200,"resize":"fit"},"small":{"h":322,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":581,"width":1228,"focus_rects":[{"x":0,"y":0,"w":1038,"h":581},{"x":0,"y":0,"w":581,"h":581},{"x":0,"y":0,"w":510,"h":581},{"x":8,"y":0,"w":291,"h":581},{"x":0,"y":0,"w":1228,"h":581}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1958583598262722560"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1714580962569588736","name":"DeepSeek","screen_name":"deepseek_ai","indices":[23,35]}]},"extended_entities":{"media":[{"display_url":"pic.x.com/UsHo2MMnhC","expanded_url":"https://x.com/Mike_A_Merrill/status/1958583868510150806/photo/1","id_str":"1958583598262722560","indices":[57,80],"media_key":"3_1958583598262722560","media_url_https":"https://pbs.twimg.com/media/Gy5JZ9VbIAA2h5K.jpg","type":"photo","url":"https://t.co/UsHo2MMnhC","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":581,"w":1228,"resize":"fit"},"medium":{"h":568,"w":1200,"resize":"fit"},"small":{"h":322,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":581,"width":1228,"focus_rects":[{"x":0,"y":0,"w":1038,"h":581},{"
x":0,"y":0,"w":581,"h":581},{"x":0,"y":0,"w":510,"h":581},{"x":8,"y":0,"w":291,"h":581},{"x":0,"y":0,"w":1228,"h":581}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1958583598262722560"}}}]},"favorite_count":9,"favorited":false,"full_text":"Great improvement from @deepseek_ai on terminal-bench :) https://t.co/UsHo2MMnhC","is_quote_status":false,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"quote_count":0,"reply_count":0,"retweet_count":1,"retweeted":false,"user_id_str":"1233837766271569920","id_str":"1958583868510150806","view_count":361}}},"fact_check":null,"id":"1958588359615648210","view_count":1,"bookmark_count":0,"created_at":1755798828000,"favorite_count":0,"quote_count":0,"reply_count":0,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1958588359615648210","full_text":"RT @Mike_A_Merrill: Great improvement from @deepseek_ai on terminal-bench :) https://t.co/UsHo2MMnhC","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null}],"activities":{"nreplies":[{"label":"2025-10-18","value":0,"startTime":1760659200000,"endTime":1760745600000,"tweets":[]},{"label":"2025-10-19","value":0,"startTime":1760745600000,"endTime":1760832000000,"tweets":[]},{"label":"2025-10-20","value":0,"startTime":1760832000000,"endTime":1760918400000,"tweets":[]},{"label":"2025-10-21","value":1,"startTime":1760918400000,"endTime":1761004800000,"tweets":[{"bookmarked":false,"display_text_range":[0,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1670869422733393966","name":"Latent.Space","screen_name":"latentspacepod","indices":[23,38]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1980330471163892201","quoted_status_permalink":{"url":"https://t.co/sMxAiyEOQ0","expanded":"https://twitter.com/Mike_A_Merrill/status/1980330471163892201","display":"x.com/Mike_A_Merrill…"},"retweeted
":false,"fact_check":null,"id":"1980344520215785912","view_count":2315,"bookmark_count":3,"created_at":1760985901000,"favorite_count":18,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1980344520215785912","full_text":"Mike and I went on the @latentspacepod !","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-22","value":0,"startTime":1761004800000,"endTime":1761091200000,"tweets":[]},{"label":"2025-10-23","value":0,"startTime":1761091200000,"endTime":1761177600000,"tweets":[{"bookmarked":false,"display_text_range":[0,5],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1980264839806177286","quoted_status_permalink":{"url":"https://t.co/jl3wu1E3P8","expanded":"https://twitter.com/petergostev/status/1980264839806177286","display":"x.com/petergostev/st…"},"retweeted":false,"fact_check":null,"id":"1981121994533048588","view_count":453,"bookmark_count":0,"created_at":1761171265000,"favorite_count":5,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1981121994533048588","full_text":"Cool!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1981013196195668143","quoted_status_permalink":{"url":"https://t.co/eWXL225mnb","expanded":"https://twitter.com/DimitrisPapail/status/1981013196195668143","display":"x.com/DimitrisPapail…"},"retweeted":false,"fact_check":null,"id":"1981062983821512920","view_count":782,"bookmark_count":2,"created_at":1761157196000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"14487
87032486989825","conversation_id_str":"1981062983821512920","full_text":"One definition of AGI is the full automation of all computer work.\n\nA Terminal-Bench task is an instruction, container, and test executable. Agents are just programs that get executed in the container.\n\nSo Terminal-Bench can measure the automation of any computer task, i.e., AGI.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-24","value":0,"startTime":1761177600000,"endTime":1761264000000,"tweets":[]},{"label":"2025-10-25","value":0,"startTime":1761264000000,"endTime":1761350400000,"tweets":[]},{"label":"2025-10-26","value":0,"startTime":1761350400000,"endTime":1761436800000,"tweets":[]},{"label":"2025-10-27","value":0,"startTime":1761436800000,"endTime":1761523200000,"tweets":[]},{"label":"2025-10-28","value":0,"startTime":1761523200000,"endTime":1761609600000,"tweets":[]},{"label":"2025-10-29","value":0,"startTime":1761609600000,"endTime":1761696000000,"tweets":[]},{"label":"2025-10-30","value":0,"startTime":1761696000000,"endTime":1761782400000,"tweets":[]},{"label":"2025-10-31","value":0,"startTime":1761782400000,"endTime":1761868800000,"tweets":[]},{"label":"2025-11-01","value":0,"startTime":1761868800000,"endTime":1761955200000,"tweets":[]},{"label":"2025-11-02","value":0,"startTime":1761955200000,"endTime":1762041600000,"tweets":[]},{"label":"2025-11-03","value":0,"startTime":1762041600000,"endTime":1762128000000,"tweets":[]},{"label":"2025-11-04","value":1,"startTime":1762128000000,"endTime":1762214400000,"tweets":[{"bookmarked":false,"display_text_range":[0,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"luma.com/h9kb99vd","expanded_url":"https://luma.com/h9kb99vd","url":"https://t.co/wLItVEHEMc","indices":[114,137]}],"user_mentions":[]},"favorited":true,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":true,"fact_check":nul
l,"id":"1985427712484524433","view_count":13587,"bookmark_count":31,"created_at":1762197828000,"favorite_count":107,"quote_count":2,"reply_count":1,"retweet_count":4,"user_id_str":"1448787032486989825","conversation_id_str":"1985427712484524433","full_text":"We're releasing Terminal-Bench 2.0 this week! Come to our meetup on Thursday @ Databricks to get early access :)\n\nhttps://t.co/wLItVEHEMc","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-05","value":0,"startTime":1762214400000,"endTime":1762300800000,"tweets":[]},{"label":"2025-11-06","value":1,"startTime":1762300800000,"endTime":1762387200000,"tweets":[{"bookmarked":false,"display_text_range":[0,21],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1986093902122942700","quoted_status_permalink":{"url":"https://t.co/zhs56oNQSL","expanded":"https://twitter.com/jyangballin/status/1986093902122942700","display":"x.com/jyangballin/st…"},"retweeted":false,"fact_check":null,"id":"1986106642157703523","view_count":2599,"bookmark_count":6,"created_at":1762359698000,"favorite_count":15,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986106642157703523","full_text":"Very clever 
benchmark","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-07","value":0,"startTime":1762387200000,"endTime":1762473600000,"tweets":[]},{"label":"2025-11-08","value":36,"startTime":1762473600000,"endTime":1762560000000,"tweets":[{"bookmarked":false,"display_text_range":[0,235],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w":1200,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w":120
0,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}]},"favorited":false,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911106108211461","view_count":104511,"bookmark_count":117,"created_at":1762551497000,"favorite_count":344,"quote_count":35,"reply_count":23,"retweet_count":71,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Today, we’re announcing the next chapter of Terminal-Bench with two releases:\n\n1. Harbor, a new package for running sandboxed agent rollouts at scale\n2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification 
https://t.co/YwmacS625Z","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,274],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"loom.com/share/8c11f218…","expanded_url":"https://www.loom.com/share/8c11f218c9fc4674bd659146af435627","url":"https://t.co/x4gD2nHQxC","indices":[306,329]}],"user_mentions":[{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1927771369611350016","name":"SWE-bench","screen_name":"SWEbench","indices":[294,303]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911110017269764","view_count":2504,"bookmark_count":5,"created_at":1762551498000,"favorite_count":28,"quote_count":0,"reply_count":2,"retweet_count":6,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Just a few of the features I love about Harbor:\n- Evaluate any agent that can be installed and run autonomously\n- Scale up to thousands of concurrent containers using providers like @daytonaio and @modal\n- Generate rollouts for SFT and RL\n- Create your own benchmarks or use existing ones like @SWEbench 
\n\nhttps://t.co/x4gD2nHQxC","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911108117254546","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,209],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com","expanded_url":"https://harborframework.com/","url":"https://t.co/5kDZlvjhb7","indices":[186,209]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911108117254546","view_count":2828,"bookmark_count":7,"created_at":1762551497000,"favorite_count":27,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is the package we wish we had had while making Terminal-Bench. It’s for agent, model, and benchmark developers and researchers who want to evaluate and improve agents and models.\nhttps://t.co/5kDZlvjhb7","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911106108211461","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,194],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911111460089993","view_count":1578,"bookmark_count":0,"created_at":1762551498000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is also the official harness for Terminal-Bench 2.0. 
We used it to run tens of thousands of experiments in containerized environments while developing the latest version of the benchmark.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911110017269764","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,152],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911113209126988","view_count":1451,"bookmark_count":1,"created_at":1762551499000,"favorite_count":14,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"So why Terminal-Bench 2.0? We always knew that as model capabilities increased, we’d need to keep Terminal-Bench up to date with frontier capabilities.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911111460089993","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,277],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911114626802143","view_count":1535,"bookmark_count":0,"created_at":1762551499000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Terminal-Bench 2.0 consists of 89 hard tasks to test these capabilities. We aim to push the frontier with an increased emphasis on task quality. Each task received several hours of human and LM-assisted verification to ensure that tasks are (1) solvable, (2) realistic, and (3) well-specified. 
We’ll share more on how we did this in our upcoming preprint.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911113209126988","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,96],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}],"symbols":[],"timestamps":[],"urls":[{"display_url":"tbench.ai/leaderboard","expanded_url":"https://www.tbench.ai/leaderboard","url":"https://t.co/vblfZF55Nd","indices":[73,96]}],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features"
:{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911116115759210","view_count":1916,"bookmark_count":5,"created_at":1762551499000,"favorite_count":25,"quote_count":2,"reply_count":3,"retweet_count":3,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"At present, Codex CLI with GPT-5 sits at the top of our new leaderboard.\nhttps://t.co/vblfZF55Nd https://t.co/GIY0foKhNs","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911114626802143","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,272],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911119328616903","view_count":1335,"bookmark_count":2,"created_at":1762551500000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Astute Terminal-Bench fans may notice that SOTA performance is comparable to TB1.0 
despite our claim that TB2.0 is harder. We believe that this is because task quality is substantially higher in the new benchmark. We have removed several misspecified or impossible tasks – increasing difficulty while maintaining raw performance.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911116115759210","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,148],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/running-t…","expanded_url":"https://harborframework.com/docs/running-tbench","url":"https://t.co/hmiBYh7VYF","indices":[125,148]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911120511373685","view_count":1241,"bookmark_count":1,"created_at":1762551500000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Interested in using Terminal-Bench-2.0 or submitting to our new leaderboard? Check out the Harbor docs for more information.\nhttps://t.co/hmiBYh7VYF","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911119328616903","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,265],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1233837766271569920","name":"Mike A. 
Merrill","screen_name":"Mike_A_Merrill","indices":[51,66]},{"id_str":"64333359","name":"Ludwig Schmidt","screen_name":"lschmidt3","indices":[121,131]},{"id_str":"168787972","name":"Andy Konwinski","screen_name":"andykonwinski","indices":[185,199]},{"id_str":"1854370033520324608","name":"Laude Institute","screen_name":"LaudeInstitute","indices":[217,232]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911121924890876","view_count":1219,"bookmark_count":0,"created_at":1762551501000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"It's been a pleasure working alongside my co-lead, @mike_a_merrill, whose leadership rallied the team, and our advisors, @lschmidt3, who seeded the idea and guided our development, and @andykonwinski, who gave us the @LaudeInstitute mandate to \"ship your research.\"","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911120511373685","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,254],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":
1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911123543916899","view_count":1424,"bookmark_count":0,"created_at":1762551501000,"favorite_count":19,"quote_count":1,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Additionally, Terminal-Bench wouldn’t be possible without its community. We’re so thankful to the over 1k members of our Discord who contributed and audited tasks, helped build and beta test Harbor, and made this such a fun project for everyone involved. 
https://t.co/PcQz6FkVua","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911121924890876","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-09","value":1,"startTime":1762560000000,"endTime":1762646400000,"tweets":[{"bookmarked":false,"display_text_range":[0,75],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"33521530","name":"swyx","screen_name":"swyx","indices":[10,15]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1986989606320185472","quoted_status_permalink":{"url":"https://t.co/oyJSv9Xper","expanded":"https://twitter.com/latentspacepod/status/1986989606320185472","display":"x.com/latentspacepod…"},"retweeted":false,"fact_check":null,"id":"1987015010288353599","view_count":495,"bookmark_count":1,"created_at":1762576270000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1987015010288353599","full_text":"Thank you @swyx ! 
Was great to have you, and the live podcast was awesome 😁","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[16,63],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"284333988","name":"Logan Kilpatrick","screen_name":"OfficialLoganK","indices":[0,15]}]},"favorited":false,"in_reply_to_screen_name":"OfficialLoganK","lang":"en","retweeted":false,"fact_check":null,"id":"1986948823198146586","view_count":449,"bookmark_count":0,"created_at":1762560489000,"favorite_count":2,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@OfficialLoganK Looking forward to seeing where Gemini 3 lands!","in_reply_to_user_id_str":"284333988","in_reply_to_status_id_str":"1986914722235502858","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[17,37],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"993859188823601152","name":"Robert","screen_name":"skull8888888888","indices":[0,16]}]},"favorited":false,"in_reply_to_screen_name":"skull8888888888","lang":"en","retweeted":false,"fact_check":null,"id":"1986949187930616214","view_count":241,"bookmark_count":0,"created_at":1762560576000,"favorite_count":0,"quote_count":0,"reply_count":1,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@skull8888888888 DM me I'm 
interested","in_reply_to_user_id_str":"993859188823601152","in_reply_to_status_id_str":"1986927150415430137","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[13,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/task-diff…","expanded_url":"https://harborframework.com/docs/task-difference","url":"https://t.co/dAGxC918PM","indices":[114,137]}],"user_mentions":[{"id_str":"3787342814","name":"pash","screen_name":"pashmerepat","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"pashmerepat","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986949012445143197","view_count":319,"bookmark_count":0,"created_at":1762560534000,"favorite_count":3,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@pashmerepat Yes and here is a link that describes the differences in addition to the migration guide Mike shared https://t.co/dAGxC918PM","in_reply_to_user_id_str":"3787342814","in_reply_to_status_id_str":"1986913713233010728","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[10,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"633509372","name":"Monk Zero","screen_name":"NoCommas","indices":[0,9]}]},"favorited":false,"in_reply_to_screen_name":"NoCommas","lang":"en","retweeted":false,"fact_check":null,"id":"1986949254196437085","view_count":267,"bookmark_count":0,"created_at":1762560592000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@NoCommas Excited to see where you 
land!","in_reply_to_user_id_str":"633509372","in_reply_to_status_id_str":"1986926703550328905","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[23,38],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"2625021945","name":"potemin","screen_name":"poteminr","indices":[0,9]},{"id_str":"1353836358901501952","name":"Anthropic","screen_name":"AnthropicAI","indices":[10,22]}]},"favorited":false,"in_reply_to_screen_name":"poteminr","lang":"en","retweeted":false,"fact_check":null,"id":"1987008242187485219","view_count":157,"bookmark_count":0,"created_at":1762574656000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@poteminr @AnthropicAI You and me both","in_reply_to_user_id_str":"2625021945","in_reply_to_status_id_str":"1987002302088237511","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-10","value":0,"startTime":1762646400000,"endTime":1762732800000,"tweets":[{"bookmarked":false,"display_text_range":[13,14],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1531565962641788936","name":"Dylan Allen-Arnegård","screen_name":"dbear_allen","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"dbear_allen","lang":"qme","retweeted":false,"fact_check":null,"id":"1987404815404769403","view_count":69,"bookmark_count":0,"created_at":1762669206000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@dbear_allen 
🙏","in_reply_to_user_id_str":"1531565962641788936","in_reply_to_status_id_str":"1987400399704498635","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-11","value":0,"startTime":1762732800000,"endTime":1762819200000,"tweets":[]},{"label":"2025-11-12","value":0,"startTime":1762819200000,"endTime":1762905600000,"tweets":[]},{"label":"2025-11-13","value":0,"startTime":1762905600000,"endTime":1762992000000,"tweets":[]},{"label":"2025-11-14","value":0,"startTime":1762992000000,"endTime":1763078400000,"tweets":[{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1988372114257436810","quoted_status_permalink":{"url":"https://t.co/4o1qxAGQ7E","expanded":"https://twitter.com/warpdotdev/status/1988372114257436810","display":"x.com/warpdotdev/sta…"},"retweeted":false,"fact_check":null,"id":"1989092917798138326","view_count":334,"bookmark_count":0,"created_at":1763071681000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1989092917798138326","full_text":"Great to see Warp putting up the top score on Terminal-Bench 2.0 just days after release! 
Even more exciting to hear that they've already made improvements to their agent based on the results.\n\nUltimately, we hope that Terminal-Bench 2.0 accelerates model and agent development in the domain of text-based computer use, benefiting millions of users.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-15","value":0,"startTime":1763078400000,"endTime":1763164800000,"tweets":[]},{"label":"2025-11-16","value":0,"startTime":1763164800000,"endTime":1763251200000,"tweets":[]},{"label":"2025-11-17","value":0,"startTime":1763251200000,"endTime":1763337600000,"tweets":[]}],"nbookmarks":[{"label":"2025-10-18","value":0,"startTime":1760659200000,"endTime":1760745600000,"tweets":[]},{"label":"2025-10-19","value":0,"startTime":1760745600000,"endTime":1760832000000,"tweets":[]},{"label":"2025-10-20","value":0,"startTime":1760832000000,"endTime":1760918400000,"tweets":[]},{"label":"2025-10-21","value":3,"startTime":1760918400000,"endTime":1761004800000,"tweets":[{"bookmarked":false,"display_text_range":[0,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1670869422733393966","name":"Latent.Space","screen_name":"latentspacepod","indices":[23,38]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1980330471163892201","quoted_status_permalink":{"url":"https://t.co/sMxAiyEOQ0","expanded":"https://twitter.com/Mike_A_Merrill/status/1980330471163892201","display":"x.com/Mike_A_Merrill…"},"retweeted":false,"fact_check":null,"id":"1980344520215785912","view_count":2315,"bookmark_count":3,"created_at":1760985901000,"favorite_count":18,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1980344520215785912","full_text":"Mike and I went on the @latentspacepod 
!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-22","value":0,"startTime":1761004800000,"endTime":1761091200000,"tweets":[]},{"label":"2025-10-23","value":2,"startTime":1761091200000,"endTime":1761177600000,"tweets":[{"bookmarked":false,"display_text_range":[0,5],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1980264839806177286","quoted_status_permalink":{"url":"https://t.co/jl3wu1E3P8","expanded":"https://twitter.com/petergostev/status/1980264839806177286","display":"x.com/petergostev/st…"},"retweeted":false,"fact_check":null,"id":"1981121994533048588","view_count":453,"bookmark_count":0,"created_at":1761171265000,"favorite_count":5,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1981121994533048588","full_text":"Cool!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1981013196195668143","quoted_status_permalink":{"url":"https://t.co/eWXL225mnb","expanded":"https://twitter.com/DimitrisPapail/status/1981013196195668143","display":"x.com/DimitrisPapail…"},"retweeted":false,"fact_check":null,"id":"1981062983821512920","view_count":782,"bookmark_count":2,"created_at":1761157196000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1981062983821512920","full_text":"One definition of AGI is the full automation of all computer work.\n\nA Terminal-Bench task is an instruction, container, and test executable. 
Agents are just programs that get executed in the container.\n\nSo Terminal-Bench can measure the automation of any computer task, i.e., AGI.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-24","value":0,"startTime":1761177600000,"endTime":1761264000000,"tweets":[]},{"label":"2025-10-25","value":0,"startTime":1761264000000,"endTime":1761350400000,"tweets":[]},{"label":"2025-10-26","value":0,"startTime":1761350400000,"endTime":1761436800000,"tweets":[]},{"label":"2025-10-27","value":0,"startTime":1761436800000,"endTime":1761523200000,"tweets":[]},{"label":"2025-10-28","value":0,"startTime":1761523200000,"endTime":1761609600000,"tweets":[]},{"label":"2025-10-29","value":0,"startTime":1761609600000,"endTime":1761696000000,"tweets":[]},{"label":"2025-10-30","value":0,"startTime":1761696000000,"endTime":1761782400000,"tweets":[]},{"label":"2025-10-31","value":0,"startTime":1761782400000,"endTime":1761868800000,"tweets":[]},{"label":"2025-11-01","value":0,"startTime":1761868800000,"endTime":1761955200000,"tweets":[]},{"label":"2025-11-02","value":0,"startTime":1761955200000,"endTime":1762041600000,"tweets":[]},{"label":"2025-11-03","value":0,"startTime":1762041600000,"endTime":1762128000000,"tweets":[]},{"label":"2025-11-04","value":31,"startTime":1762128000000,"endTime":1762214400000,"tweets":[{"bookmarked":false,"display_text_range":[0,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"luma.com/h9kb99vd","expanded_url":"https://luma.com/h9kb99vd","url":"https://t.co/wLItVEHEMc","indices":[114,137]}],"user_mentions":[]},"favorited":true,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":true,"fact_check":null,"id":"1985427712484524433","view_count":13587,"bookmark_count":31,"created_at":1762197828000,"favorite_count":107,"quote_count":2,"reply_count":1,"retweet_count":4,"user_id_str":"1448787032486989825","conversation_id_str":"1985427712484524433","full_text":"We're releasing Terminal-Bench 2.0 this week! Come to our meetup on Thursday @ Databricks to get early access :)\n\nhttps://t.co/wLItVEHEMc","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-05","value":0,"startTime":1762214400000,"endTime":1762300800000,"tweets":[]},{"label":"2025-11-06","value":6,"startTime":1762300800000,"endTime":1762387200000,"tweets":[{"bookmarked":false,"display_text_range":[0,21],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1986093902122942700","quoted_status_permalink":{"url":"https://t.co/zhs56oNQSL","expanded":"https://twitter.com/jyangballin/status/1986093902122942700","display":"x.com/jyangballin/st…"},"retweeted":false,"fact_check":null,"id":"1986106642157703523","view_count":2599,"bookmark_count":6,"created_at":1762359698000,"favorite_count":15,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986106642157703523","full_text":"Very clever benchmark","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-07","value":0,"startTime":1762387200000,"endTime":1762473600000,"tweets":[]},{"label":"2025-11-08","value":138,"startTime":1762473600000,"endTime":1762560000000,"tweets":[{"bookmarked":false,"display_text_range":[0,235],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w":1200,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w":1200,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}]},"favorited":false,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911106108211461","view_count":104511,"bookmark_count":117,"created_at":1762551497000,"favorite_count":344,"quote_count":35,"reply_count":23,"retweet_count":71,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Today, we’re announcing the next chapter of Terminal-Bench with two releases:\n\n1. Harbor, a new package for running sandboxed agent rollouts at scale\n2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification 
https://t.co/YwmacS625Z","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,274],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"loom.com/share/8c11f218…","expanded_url":"https://www.loom.com/share/8c11f218c9fc4674bd659146af435627","url":"https://t.co/x4gD2nHQxC","indices":[306,329]}],"user_mentions":[{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1927771369611350016","name":"SWE-bench","screen_name":"SWEbench","indices":[294,303]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911110017269764","view_count":2504,"bookmark_count":5,"created_at":1762551498000,"favorite_count":28,"quote_count":0,"reply_count":2,"retweet_count":6,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Just a few of the features I love about Harbor:\n- Evaluate any agent that can be installed and run autonomously\n- Scale up to thousands of concurrent containers using providers like @daytonaio and @modal\n- Generate rollouts for SFT and RL\n- Create your own benchmarks or use existing ones like @SWEbench 
\n\nhttps://t.co/x4gD2nHQxC","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911108117254546","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,209],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com","expanded_url":"https://harborframework.com/","url":"https://t.co/5kDZlvjhb7","indices":[186,209]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911108117254546","view_count":2828,"bookmark_count":7,"created_at":1762551497000,"favorite_count":27,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is the package we wish we had had while making Terminal-Bench. It’s for agent, model, and benchmark developers and researchers who want to evaluate and improve agents and models.\nhttps://t.co/5kDZlvjhb7","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911106108211461","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,194],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911111460089993","view_count":1578,"bookmark_count":0,"created_at":1762551498000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is also the official harness for Terminal-Bench 2.0. 
We used it to run tens of thousands of experiments in containerized environments while developing the latest version of the benchmark.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911110017269764","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,152],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911113209126988","view_count":1451,"bookmark_count":1,"created_at":1762551499000,"favorite_count":14,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"So why Terminal-Bench 2.0? We always knew that as model capabilities increased, we’d need to keep Terminal-Bench up to date with frontier capabilities.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911111460089993","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,277],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911114626802143","view_count":1535,"bookmark_count":0,"created_at":1762551499000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Terminal-Bench 2.0 consists of 89 hard tasks to test these capabilities. We aim to push the frontier with an increased emphasis on task quality. Each task received several hours of human and LM-assisted verification to ensure that tasks are (1) solvable, (2) realistic, and (3) well-specified. 
We’ll share more on how we did this in our upcoming preprint.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911113209126988","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,96],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}],"symbols":[],"timestamps":[],"urls":[{"display_url":"tbench.ai/leaderboard","expanded_url":"https://www.tbench.ai/leaderboard","url":"https://t.co/vblfZF55Nd","indices":[73,96]}],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features"
:{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911116115759210","view_count":1916,"bookmark_count":5,"created_at":1762551499000,"favorite_count":25,"quote_count":2,"reply_count":3,"retweet_count":3,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"At present, Codex CLI with GPT-5 sits at the top of our new leaderboard.\nhttps://t.co/vblfZF55Nd https://t.co/GIY0foKhNs","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911114626802143","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,272],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911119328616903","view_count":1335,"bookmark_count":2,"created_at":1762551500000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Astute Terminal-Bench fans may notice that SOTA performance is comparable to TB1.0 
despite our claim that TB2.0 is harder. We believe that this is because task quality is substantially higher in the new benchmark. We have removed several misspecified or impossible tasks – increasing difficulty while maintaining raw performance.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911116115759210","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,148],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/running-t…","expanded_url":"https://harborframework.com/docs/running-tbench","url":"https://t.co/hmiBYh7VYF","indices":[125,148]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911120511373685","view_count":1241,"bookmark_count":1,"created_at":1762551500000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Interested in using Terminal-Bench-2.0 or submitting to our new leaderboard? Check out the Harbor docs for more information.\nhttps://t.co/hmiBYh7VYF","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911119328616903","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,265],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1233837766271569920","name":"Mike A. 
Merrill","screen_name":"Mike_A_Merrill","indices":[51,66]},{"id_str":"64333359","name":"Ludwig Schmidt","screen_name":"lschmidt3","indices":[121,131]},{"id_str":"168787972","name":"Andy Konwinski","screen_name":"andykonwinski","indices":[185,199]},{"id_str":"1854370033520324608","name":"Laude Institute","screen_name":"LaudeInstitute","indices":[217,232]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911121924890876","view_count":1219,"bookmark_count":0,"created_at":1762551501000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"It's been a pleasure working alongside my co-lead, @mike_a_merrill, whose leadership rallied the team, and our advisors, @lschmidt3, who seeded the idea and guided our development, and @andykonwinski, who gave us the @LaudeInstitute mandate to \"ship your research.\"","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911120511373685","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,254],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":
1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911123543916899","view_count":1424,"bookmark_count":0,"created_at":1762551501000,"favorite_count":19,"quote_count":1,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Additionally, Terminal-Bench wouldn’t be possible without its community. We’re so thankful to the over 1k members of our Discord who contributed and audited tasks, helped build and beta test Harbor, and made this such a fun project for everyone involved. 
https://t.co/PcQz6FkVua","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911121924890876","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-09","value":1,"startTime":1762560000000,"endTime":1762646400000,"tweets":[{"bookmarked":false,"display_text_range":[0,75],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"33521530","name":"swyx","screen_name":"swyx","indices":[10,15]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1986989606320185472","quoted_status_permalink":{"url":"https://t.co/oyJSv9Xper","expanded":"https://twitter.com/latentspacepod/status/1986989606320185472","display":"x.com/latentspacepod…"},"retweeted":false,"fact_check":null,"id":"1987015010288353599","view_count":495,"bookmark_count":1,"created_at":1762576270000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1987015010288353599","full_text":"Thank you @swyx ! 
Was great to have you, and the live podcast was awesome 😁","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[16,63],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"284333988","name":"Logan Kilpatrick","screen_name":"OfficialLoganK","indices":[0,15]}]},"favorited":false,"in_reply_to_screen_name":"OfficialLoganK","lang":"en","retweeted":false,"fact_check":null,"id":"1986948823198146586","view_count":449,"bookmark_count":0,"created_at":1762560489000,"favorite_count":2,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@OfficialLoganK Looking forward to seeing where Gemini 3 lands!","in_reply_to_user_id_str":"284333988","in_reply_to_status_id_str":"1986914722235502858","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[17,37],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"993859188823601152","name":"Robert","screen_name":"skull8888888888","indices":[0,16]}]},"favorited":false,"in_reply_to_screen_name":"skull8888888888","lang":"en","retweeted":false,"fact_check":null,"id":"1986949187930616214","view_count":241,"bookmark_count":0,"created_at":1762560576000,"favorite_count":0,"quote_count":0,"reply_count":1,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@skull8888888888 DM me I'm 
interested","in_reply_to_user_id_str":"993859188823601152","in_reply_to_status_id_str":"1986927150415430137","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[13,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/task-diff…","expanded_url":"https://harborframework.com/docs/task-difference","url":"https://t.co/dAGxC918PM","indices":[114,137]}],"user_mentions":[{"id_str":"3787342814","name":"pash","screen_name":"pashmerepat","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"pashmerepat","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986949012445143197","view_count":319,"bookmark_count":0,"created_at":1762560534000,"favorite_count":3,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@pashmerepat Yes and here is a link that describes the differences in addition to the migration guide Mike shared https://t.co/dAGxC918PM","in_reply_to_user_id_str":"3787342814","in_reply_to_status_id_str":"1986913713233010728","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[10,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"633509372","name":"Monk Zero","screen_name":"NoCommas","indices":[0,9]}]},"favorited":false,"in_reply_to_screen_name":"NoCommas","lang":"en","retweeted":false,"fact_check":null,"id":"1986949254196437085","view_count":267,"bookmark_count":0,"created_at":1762560592000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@NoCommas Excited to see where you 
land!","in_reply_to_user_id_str":"633509372","in_reply_to_status_id_str":"1986926703550328905","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[23,38],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"2625021945","name":"potemin","screen_name":"poteminr","indices":[0,9]},{"id_str":"1353836358901501952","name":"Anthropic","screen_name":"AnthropicAI","indices":[10,22]}]},"favorited":false,"in_reply_to_screen_name":"poteminr","lang":"en","retweeted":false,"fact_check":null,"id":"1987008242187485219","view_count":157,"bookmark_count":0,"created_at":1762574656000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@poteminr @AnthropicAI You and me both","in_reply_to_user_id_str":"2625021945","in_reply_to_status_id_str":"1987002302088237511","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-10","value":0,"startTime":1762646400000,"endTime":1762732800000,"tweets":[{"bookmarked":false,"display_text_range":[13,14],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1531565962641788936","name":"Dylan Allen-Arnegård","screen_name":"dbear_allen","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"dbear_allen","lang":"qme","retweeted":false,"fact_check":null,"id":"1987404815404769403","view_count":69,"bookmark_count":0,"created_at":1762669206000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@dbear_allen 
🙏","in_reply_to_user_id_str":"1531565962641788936","in_reply_to_status_id_str":"1987400399704498635","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-11","value":0,"startTime":1762732800000,"endTime":1762819200000,"tweets":[]},{"label":"2025-11-12","value":0,"startTime":1762819200000,"endTime":1762905600000,"tweets":[]},{"label":"2025-11-13","value":0,"startTime":1762905600000,"endTime":1762992000000,"tweets":[]},{"label":"2025-11-14","value":0,"startTime":1762992000000,"endTime":1763078400000,"tweets":[{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1988372114257436810","quoted_status_permalink":{"url":"https://t.co/4o1qxAGQ7E","expanded":"https://twitter.com/warpdotdev/status/1988372114257436810","display":"x.com/warpdotdev/sta…"},"retweeted":false,"fact_check":null,"id":"1989092917798138326","view_count":334,"bookmark_count":0,"created_at":1763071681000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1989092917798138326","full_text":"Great to see Warp putting up the top score on Terminal-Bench 2.0 just days after release! 
Even more exciting to hear that they've already made improvements to their agent based on the results.\n\nUltimately, we hope that Terminal-Bench 2.0 accelerates model and agent development in the domain of text-based computer use, benefiting millions of users.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-15","value":0,"startTime":1763078400000,"endTime":1763164800000,"tweets":[]},{"label":"2025-11-16","value":0,"startTime":1763164800000,"endTime":1763251200000,"tweets":[]},{"label":"2025-11-17","value":0,"startTime":1763251200000,"endTime":1763337600000,"tweets":[]}],"nretweets":[{"label":"2025-10-18","value":0,"startTime":1760659200000,"endTime":1760745600000,"tweets":[]},{"label":"2025-10-19","value":0,"startTime":1760745600000,"endTime":1760832000000,"tweets":[]},{"label":"2025-10-20","value":0,"startTime":1760832000000,"endTime":1760918400000,"tweets":[]},{"label":"2025-10-21","value":1,"startTime":1760918400000,"endTime":1761004800000,"tweets":[{"bookmarked":false,"display_text_range":[0,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1670869422733393966","name":"Latent.Space","screen_name":"latentspacepod","indices":[23,38]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1980330471163892201","quoted_status_permalink":{"url":"https://t.co/sMxAiyEOQ0","expanded":"https://twitter.com/Mike_A_Merrill/status/1980330471163892201","display":"x.com/Mike_A_Merrill…"},"retweeted":false,"fact_check":null,"id":"1980344520215785912","view_count":2315,"bookmark_count":3,"created_at":1760985901000,"favorite_count":18,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1980344520215785912","full_text":"Mike and I went on the @latentspacepod 
!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-22","value":0,"startTime":1761004800000,"endTime":1761091200000,"tweets":[]},{"label":"2025-10-23","value":0,"startTime":1761091200000,"endTime":1761177600000,"tweets":[{"bookmarked":false,"display_text_range":[0,5],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1980264839806177286","quoted_status_permalink":{"url":"https://t.co/jl3wu1E3P8","expanded":"https://twitter.com/petergostev/status/1980264839806177286","display":"x.com/petergostev/st…"},"retweeted":false,"fact_check":null,"id":"1981121994533048588","view_count":453,"bookmark_count":0,"created_at":1761171265000,"favorite_count":5,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1981121994533048588","full_text":"Cool!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1981013196195668143","quoted_status_permalink":{"url":"https://t.co/eWXL225mnb","expanded":"https://twitter.com/DimitrisPapail/status/1981013196195668143","display":"x.com/DimitrisPapail…"},"retweeted":false,"fact_check":null,"id":"1981062983821512920","view_count":782,"bookmark_count":2,"created_at":1761157196000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1981062983821512920","full_text":"One definition of AGI is the full automation of all computer work.\n\nA Terminal-Bench task is an instruction, container, and test executable. 
Agents are just programs that get executed in the container.\n\nSo Terminal-Bench can measure the automation of any computer task, i.e., AGI.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-24","value":0,"startTime":1761177600000,"endTime":1761264000000,"tweets":[]},{"label":"2025-10-25","value":0,"startTime":1761264000000,"endTime":1761350400000,"tweets":[]},{"label":"2025-10-26","value":0,"startTime":1761350400000,"endTime":1761436800000,"tweets":[]},{"label":"2025-10-27","value":0,"startTime":1761436800000,"endTime":1761523200000,"tweets":[]},{"label":"2025-10-28","value":0,"startTime":1761523200000,"endTime":1761609600000,"tweets":[]},{"label":"2025-10-29","value":0,"startTime":1761609600000,"endTime":1761696000000,"tweets":[]},{"label":"2025-10-30","value":0,"startTime":1761696000000,"endTime":1761782400000,"tweets":[]},{"label":"2025-10-31","value":0,"startTime":1761782400000,"endTime":1761868800000,"tweets":[]},{"label":"2025-11-01","value":0,"startTime":1761868800000,"endTime":1761955200000,"tweets":[]},{"label":"2025-11-02","value":0,"startTime":1761955200000,"endTime":1762041600000,"tweets":[]},{"label":"2025-11-03","value":0,"startTime":1762041600000,"endTime":1762128000000,"tweets":[]},{"label":"2025-11-04","value":4,"startTime":1762128000000,"endTime":1762214400000,"tweets":[{"bookmarked":false,"display_text_range":[0,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"luma.com/h9kb99vd","expanded_url":"https://luma.com/h9kb99vd","url":"https://t.co/wLItVEHEMc","indices":[114,137]}],"user_mentions":[]},"favorited":true,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":true,"fact_check":null,"id":"1985427712484524433","view_count":13587,"bookmark_count":31,"created_at":1762197828000,"favorite_count":107,"quote_count":2,"reply_count":1,"retweet_count":4,"user_id_str":"1448787032486989825","conversation_
id_str":"1985427712484524433","full_text":"We're releasing Terminal-Bench 2.0 this week! Come to our meetup on Thursday @ Databricks to get early access :)\n\nhttps://t.co/wLItVEHEMc","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-05","value":0,"startTime":1762214400000,"endTime":1762300800000,"tweets":[]},{"label":"2025-11-06","value":2,"startTime":1762300800000,"endTime":1762387200000,"tweets":[{"bookmarked":false,"display_text_range":[0,21],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1986093902122942700","quoted_status_permalink":{"url":"https://t.co/zhs56oNQSL","expanded":"https://twitter.com/jyangballin/status/1986093902122942700","display":"x.com/jyangballin/st…"},"retweeted":false,"fact_check":null,"id":"1986106642157703523","view_count":2599,"bookmark_count":6,"created_at":1762359698000,"favorite_count":15,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986106642157703523","full_text":"Very clever 
benchmark","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-07","value":0,"startTime":1762387200000,"endTime":1762473600000,"tweets":[]},{"label":"2025-11-08","value":92,"startTime":1762473600000,"endTime":1762560000000,"tweets":[{"bookmarked":false,"display_text_range":[0,235],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w":1200,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w":120
0,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}]},"favorited":false,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911106108211461","view_count":104511,"bookmark_count":117,"created_at":1762551497000,"favorite_count":344,"quote_count":35,"reply_count":23,"retweet_count":71,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Today, we’re announcing the next chapter of Terminal-Bench with two releases:\n\n1. Harbor, a new package for running sandboxed agent rollouts at scale\n2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification 
https://t.co/YwmacS625Z","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,274],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"loom.com/share/8c11f218…","expanded_url":"https://www.loom.com/share/8c11f218c9fc4674bd659146af435627","url":"https://t.co/x4gD2nHQxC","indices":[306,329]}],"user_mentions":[{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1927771369611350016","name":"SWE-bench","screen_name":"SWEbench","indices":[294,303]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911110017269764","view_count":2504,"bookmark_count":5,"created_at":1762551498000,"favorite_count":28,"quote_count":0,"reply_count":2,"retweet_count":6,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Just a few of the features I love about Harbor:\n- Evaluate any agent that can be installed and run autonomously\n- Scale up to thousands of concurrent containers using providers like @daytonaio and @modal\n- Generate rollouts for SFT and RL\n- Create your own benchmarks or use existing ones like @SWEbench 
\n\nhttps://t.co/x4gD2nHQxC","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911108117254546","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,209],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com","expanded_url":"https://harborframework.com/","url":"https://t.co/5kDZlvjhb7","indices":[186,209]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911108117254546","view_count":2828,"bookmark_count":7,"created_at":1762551497000,"favorite_count":27,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is the package we wish we had had while making Terminal-Bench. It’s for agent, model, and benchmark developers and researchers who want to evaluate and improve agents and models.\nhttps://t.co/5kDZlvjhb7","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911106108211461","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,194],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911111460089993","view_count":1578,"bookmark_count":0,"created_at":1762551498000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is also the official harness for Terminal-Bench 2.0. 
We used it to run tens of thousands of experiments in containerized environments while developing the latest version of the benchmark.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911110017269764","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,152],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911113209126988","view_count":1451,"bookmark_count":1,"created_at":1762551499000,"favorite_count":14,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"So why Terminal-Bench 2.0? We always knew that as model capabilities increased, we’d need to keep Terminal-Bench up to date with frontier capabilities.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911111460089993","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,277],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911114626802143","view_count":1535,"bookmark_count":0,"created_at":1762551499000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Terminal-Bench 2.0 consists of 89 hard tasks to test these capabilities. We aim to push the frontier with an increased emphasis on task quality. Each task received several hours of human and LM-assisted verification to ensure that tasks are (1) solvable, (2) realistic, and (3) well-specified. 
We’ll share more on how we did this in our upcoming preprint.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911113209126988","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,96],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}],"symbols":[],"timestamps":[],"urls":[{"display_url":"tbench.ai/leaderboard","expanded_url":"https://www.tbench.ai/leaderboard","url":"https://t.co/vblfZF55Nd","indices":[73,96]}],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features"
:{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911116115759210","view_count":1916,"bookmark_count":5,"created_at":1762551499000,"favorite_count":25,"quote_count":2,"reply_count":3,"retweet_count":3,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"At present, Codex CLI with GPT-5 sits at the top of our new leaderboard.\nhttps://t.co/vblfZF55Nd https://t.co/GIY0foKhNs","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911114626802143","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,272],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911119328616903","view_count":1335,"bookmark_count":2,"created_at":1762551500000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Astute Terminal-Bench fans may notice that SOTA performance is comparable to TB1.0 
despite our claim that TB2.0 is harder. We believe that this is because task quality is substantially higher in the new benchmark. We have removed several misspecified or impossible tasks – increasing difficulty while maintaining raw performance.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911116115759210","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,148],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/running-t…","expanded_url":"https://harborframework.com/docs/running-tbench","url":"https://t.co/hmiBYh7VYF","indices":[125,148]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911120511373685","view_count":1241,"bookmark_count":1,"created_at":1762551500000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Interested in using Terminal-Bench-2.0 or submitting to our new leaderboard? Check out the Harbor docs for more information.\nhttps://t.co/hmiBYh7VYF","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911119328616903","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,265],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1233837766271569920","name":"Mike A. 
Merrill","screen_name":"Mike_A_Merrill","indices":[51,66]},{"id_str":"64333359","name":"Ludwig Schmidt","screen_name":"lschmidt3","indices":[121,131]},{"id_str":"168787972","name":"Andy Konwinski","screen_name":"andykonwinski","indices":[185,199]},{"id_str":"1854370033520324608","name":"Laude Institute","screen_name":"LaudeInstitute","indices":[217,232]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911121924890876","view_count":1219,"bookmark_count":0,"created_at":1762551501000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"It's been a pleasure working alongside my co-lead, @mike_a_merrill, whose leadership rallied the team, and our advisors, @lschmidt3, who seeded the idea and guided our development, and @andykonwinski, who gave us the @LaudeInstitute mandate to \"ship your research.\"","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911120511373685","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,254],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":
1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911123543916899","view_count":1424,"bookmark_count":0,"created_at":1762551501000,"favorite_count":19,"quote_count":1,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Additionally, Terminal-Bench wouldn’t be possible without its community. We’re so thankful to the over 1k members of our Discord who contributed and audited tasks, helped build and beta test Harbor, and made this such a fun project for everyone involved. 
https://t.co/PcQz6FkVua","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911121924890876","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-09","value":0,"startTime":1762560000000,"endTime":1762646400000,"tweets":[{"bookmarked":false,"display_text_range":[0,75],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"33521530","name":"swyx","screen_name":"swyx","indices":[10,15]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1986989606320185472","quoted_status_permalink":{"url":"https://t.co/oyJSv9Xper","expanded":"https://twitter.com/latentspacepod/status/1986989606320185472","display":"x.com/latentspacepod…"},"retweeted":false,"fact_check":null,"id":"1987015010288353599","view_count":495,"bookmark_count":1,"created_at":1762576270000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1987015010288353599","full_text":"Thank you @swyx ! 
Was great to have you, and the live podcast was awesome 😁","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[16,63],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"284333988","name":"Logan Kilpatrick","screen_name":"OfficialLoganK","indices":[0,15]}]},"favorited":false,"in_reply_to_screen_name":"OfficialLoganK","lang":"en","retweeted":false,"fact_check":null,"id":"1986948823198146586","view_count":449,"bookmark_count":0,"created_at":1762560489000,"favorite_count":2,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@OfficialLoganK Looking forward to seeing where Gemini 3 lands!","in_reply_to_user_id_str":"284333988","in_reply_to_status_id_str":"1986914722235502858","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[17,37],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"993859188823601152","name":"Robert","screen_name":"skull8888888888","indices":[0,16]}]},"favorited":false,"in_reply_to_screen_name":"skull8888888888","lang":"en","retweeted":false,"fact_check":null,"id":"1986949187930616214","view_count":241,"bookmark_count":0,"created_at":1762560576000,"favorite_count":0,"quote_count":0,"reply_count":1,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@skull8888888888 DM me I'm 
interested","in_reply_to_user_id_str":"993859188823601152","in_reply_to_status_id_str":"1986927150415430137","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[13,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/task-diff…","expanded_url":"https://harborframework.com/docs/task-difference","url":"https://t.co/dAGxC918PM","indices":[114,137]}],"user_mentions":[{"id_str":"3787342814","name":"pash","screen_name":"pashmerepat","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"pashmerepat","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986949012445143197","view_count":319,"bookmark_count":0,"created_at":1762560534000,"favorite_count":3,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@pashmerepat Yes and here is a link that describes the differences in addition to the migration guide Mike shared https://t.co/dAGxC918PM","in_reply_to_user_id_str":"3787342814","in_reply_to_status_id_str":"1986913713233010728","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[10,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"633509372","name":"Monk Zero","screen_name":"NoCommas","indices":[0,9]}]},"favorited":false,"in_reply_to_screen_name":"NoCommas","lang":"en","retweeted":false,"fact_check":null,"id":"1986949254196437085","view_count":267,"bookmark_count":0,"created_at":1762560592000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@NoCommas Excited to see where you 
land!","in_reply_to_user_id_str":"633509372","in_reply_to_status_id_str":"1986926703550328905","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[23,38],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"2625021945","name":"potemin","screen_name":"poteminr","indices":[0,9]},{"id_str":"1353836358901501952","name":"Anthropic","screen_name":"AnthropicAI","indices":[10,22]}]},"favorited":false,"in_reply_to_screen_name":"poteminr","lang":"en","retweeted":false,"fact_check":null,"id":"1987008242187485219","view_count":157,"bookmark_count":0,"created_at":1762574656000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@poteminr @AnthropicAI You and me both","in_reply_to_user_id_str":"2625021945","in_reply_to_status_id_str":"1987002302088237511","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-10","value":0,"startTime":1762646400000,"endTime":1762732800000,"tweets":[{"bookmarked":false,"display_text_range":[13,14],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1531565962641788936","name":"Dylan Allen-Arnegård","screen_name":"dbear_allen","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"dbear_allen","lang":"qme","retweeted":false,"fact_check":null,"id":"1987404815404769403","view_count":69,"bookmark_count":0,"created_at":1762669206000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@dbear_allen 
🙏","in_reply_to_user_id_str":"1531565962641788936","in_reply_to_status_id_str":"1987400399704498635","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-11","value":0,"startTime":1762732800000,"endTime":1762819200000,"tweets":[]},{"label":"2025-11-12","value":0,"startTime":1762819200000,"endTime":1762905600000,"tweets":[]},{"label":"2025-11-13","value":0,"startTime":1762905600000,"endTime":1762992000000,"tweets":[]},{"label":"2025-11-14","value":0,"startTime":1762992000000,"endTime":1763078400000,"tweets":[{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1988372114257436810","quoted_status_permalink":{"url":"https://t.co/4o1qxAGQ7E","expanded":"https://twitter.com/warpdotdev/status/1988372114257436810","display":"x.com/warpdotdev/sta…"},"retweeted":false,"fact_check":null,"id":"1989092917798138326","view_count":334,"bookmark_count":0,"created_at":1763071681000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1989092917798138326","full_text":"Great to see Warp putting up the top score on Terminal-Bench 2.0 just days after release! 
Even more exciting to hear that they've already made improvements to their agent based on the results.\n\nUltimately, we hope that Terminal-Bench 2.0 accelerates model and agent development in the domain of text-based computer use, benefiting millions of users.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-15","value":0,"startTime":1763078400000,"endTime":1763164800000,"tweets":[]},{"label":"2025-11-16","value":0,"startTime":1763164800000,"endTime":1763251200000,"tweets":[]},{"label":"2025-11-17","value":0,"startTime":1763251200000,"endTime":1763337600000,"tweets":[]}],"nlikes":[{"label":"2025-10-18","value":0,"startTime":1760659200000,"endTime":1760745600000,"tweets":[]},{"label":"2025-10-19","value":0,"startTime":1760745600000,"endTime":1760832000000,"tweets":[]},{"label":"2025-10-20","value":0,"startTime":1760832000000,"endTime":1760918400000,"tweets":[]},{"label":"2025-10-21","value":18,"startTime":1760918400000,"endTime":1761004800000,"tweets":[{"bookmarked":false,"display_text_range":[0,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1670869422733393966","name":"Latent.Space","screen_name":"latentspacepod","indices":[23,38]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1980330471163892201","quoted_status_permalink":{"url":"https://t.co/sMxAiyEOQ0","expanded":"https://twitter.com/Mike_A_Merrill/status/1980330471163892201","display":"x.com/Mike_A_Merrill…"},"retweeted":false,"fact_check":null,"id":"1980344520215785912","view_count":2315,"bookmark_count":3,"created_at":1760985901000,"favorite_count":18,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1980344520215785912","full_text":"Mike and I went on the @latentspacepod 
!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-22","value":0,"startTime":1761004800000,"endTime":1761091200000,"tweets":[]},{"label":"2025-10-23","value":13,"startTime":1761091200000,"endTime":1761177600000,"tweets":[{"bookmarked":false,"display_text_range":[0,5],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1980264839806177286","quoted_status_permalink":{"url":"https://t.co/jl3wu1E3P8","expanded":"https://twitter.com/petergostev/status/1980264839806177286","display":"x.com/petergostev/st…"},"retweeted":false,"fact_check":null,"id":"1981121994533048588","view_count":453,"bookmark_count":0,"created_at":1761171265000,"favorite_count":5,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1981121994533048588","full_text":"Cool!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1981013196195668143","quoted_status_permalink":{"url":"https://t.co/eWXL225mnb","expanded":"https://twitter.com/DimitrisPapail/status/1981013196195668143","display":"x.com/DimitrisPapail…"},"retweeted":false,"fact_check":null,"id":"1981062983821512920","view_count":782,"bookmark_count":2,"created_at":1761157196000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1981062983821512920","full_text":"One definition of AGI is the full automation of all computer work.\n\nA Terminal-Bench task is an instruction, container, and test executable. 
Agents are just programs that get executed in the container.\n\nSo Terminal-Bench can measure the automation of any computer task, i.e., AGI.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-24","value":0,"startTime":1761177600000,"endTime":1761264000000,"tweets":[]},{"label":"2025-10-25","value":0,"startTime":1761264000000,"endTime":1761350400000,"tweets":[]},{"label":"2025-10-26","value":0,"startTime":1761350400000,"endTime":1761436800000,"tweets":[]},{"label":"2025-10-27","value":0,"startTime":1761436800000,"endTime":1761523200000,"tweets":[]},{"label":"2025-10-28","value":0,"startTime":1761523200000,"endTime":1761609600000,"tweets":[]},{"label":"2025-10-29","value":0,"startTime":1761609600000,"endTime":1761696000000,"tweets":[]},{"label":"2025-10-30","value":0,"startTime":1761696000000,"endTime":1761782400000,"tweets":[]},{"label":"2025-10-31","value":0,"startTime":1761782400000,"endTime":1761868800000,"tweets":[]},{"label":"2025-11-01","value":0,"startTime":1761868800000,"endTime":1761955200000,"tweets":[]},{"label":"2025-11-02","value":0,"startTime":1761955200000,"endTime":1762041600000,"tweets":[]},{"label":"2025-11-03","value":0,"startTime":1762041600000,"endTime":1762128000000,"tweets":[]},{"label":"2025-11-04","value":107,"startTime":1762128000000,"endTime":1762214400000,"tweets":[{"bookmarked":false,"display_text_range":[0,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"luma.com/h9kb99vd","expanded_url":"https://luma.com/h9kb99vd","url":"https://t.co/wLItVEHEMc","indices":[114,137]}],"user_mentions":[]},"favorited":true,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":true,"fact_check":null,"id":"1985427712484524433","view_count":13587,"bookmark_count":31,"created_at":1762197828000,"favorite_count":107,"quote_count":2,"reply_count":1,"retweet_count":4,"user_id_str":"1448787032486989825","conversatio
n_id_str":"1985427712484524433","full_text":"We're releasing Terminal-Bench 2.0 this week! Come to our meetup on Thursday @ Databricks to get early access :)\n\nhttps://t.co/wLItVEHEMc","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-05","value":0,"startTime":1762214400000,"endTime":1762300800000,"tweets":[]},{"label":"2025-11-06","value":15,"startTime":1762300800000,"endTime":1762387200000,"tweets":[{"bookmarked":false,"display_text_range":[0,21],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1986093902122942700","quoted_status_permalink":{"url":"https://t.co/zhs56oNQSL","expanded":"https://twitter.com/jyangballin/status/1986093902122942700","display":"x.com/jyangballin/st…"},"retweeted":false,"fact_check":null,"id":"1986106642157703523","view_count":2599,"bookmark_count":6,"created_at":1762359698000,"favorite_count":15,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986106642157703523","full_text":"Very clever 
benchmark","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-07","value":0,"startTime":1762387200000,"endTime":1762473600000,"tweets":[]},{"label":"2025-11-08","value":540,"startTime":1762473600000,"endTime":1762560000000,"tweets":[{"bookmarked":false,"display_text_range":[0,235],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w":1200,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w":12
00,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}]},"favorited":false,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911106108211461","view_count":104511,"bookmark_count":117,"created_at":1762551497000,"favorite_count":344,"quote_count":35,"reply_count":23,"retweet_count":71,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Today, we’re announcing the next chapter of Terminal-Bench with two releases:\n\n1. Harbor, a new package for running sandboxed agent rollouts at scale\n2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification 
https://t.co/YwmacS625Z","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,274],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"loom.com/share/8c11f218…","expanded_url":"https://www.loom.com/share/8c11f218c9fc4674bd659146af435627","url":"https://t.co/x4gD2nHQxC","indices":[306,329]}],"user_mentions":[{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1927771369611350016","name":"SWE-bench","screen_name":"SWEbench","indices":[294,303]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911110017269764","view_count":2504,"bookmark_count":5,"created_at":1762551498000,"favorite_count":28,"quote_count":0,"reply_count":2,"retweet_count":6,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Just a few of the features I love about Harbor:\n- Evaluate any agent that can be installed and run autonomously\n- Scale up to thousands of concurrent containers using providers like @daytonaio and @modal\n- Generate rollouts for SFT and RL\n- Create your own benchmarks or use existing ones like @SWEbench 
\n\nhttps://t.co/x4gD2nHQxC","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911108117254546","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,209],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com","expanded_url":"https://harborframework.com/","url":"https://t.co/5kDZlvjhb7","indices":[186,209]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911108117254546","view_count":2828,"bookmark_count":7,"created_at":1762551497000,"favorite_count":27,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is the package we wish we had had while making Terminal-Bench. It’s for agent, model, and benchmark developers and researchers who want to evaluate and improve agents and models.\nhttps://t.co/5kDZlvjhb7","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911106108211461","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,194],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911111460089993","view_count":1578,"bookmark_count":0,"created_at":1762551498000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is also the official harness for Terminal-Bench 2.0. 
We used it to run tens of thousands of experiments in containerized environments while developing the latest version of the benchmark.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911110017269764","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,152],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911113209126988","view_count":1451,"bookmark_count":1,"created_at":1762551499000,"favorite_count":14,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"So why Terminal-Bench 2.0? We always knew that as model capabilities increased, we’d need to keep Terminal-Bench up to date with frontier capabilities.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911111460089993","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,277],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911114626802143","view_count":1535,"bookmark_count":0,"created_at":1762551499000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Terminal-Bench 2.0 consists of 89 hard tasks to test these capabilities. We aim to push the frontier with an increased emphasis on task quality. Each task received several hours of human and LM-assisted verification to ensure that tasks are (1) solvable, (2) realistic, and (3) well-specified. 
We’ll share more on how we did this in our upcoming preprint.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911113209126988","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,96],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}],"symbols":[],"timestamps":[],"urls":[{"display_url":"tbench.ai/leaderboard","expanded_url":"https://www.tbench.ai/leaderboard","url":"https://t.co/vblfZF55Nd","indices":[73,96]}],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features"
:{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911116115759210","view_count":1916,"bookmark_count":5,"created_at":1762551499000,"favorite_count":25,"quote_count":2,"reply_count":3,"retweet_count":3,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"At present, Codex CLI with GPT-5 sits at the top of our new leaderboard.\nhttps://t.co/vblfZF55Nd https://t.co/GIY0foKhNs","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911114626802143","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,272],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911119328616903","view_count":1335,"bookmark_count":2,"created_at":1762551500000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Astute Terminal-Bench fans may notice that SOTA performance is comparable to TB1.0 
despite our claim that TB2.0 is harder. We believe that this is because task quality is substantially higher in the new benchmark. We have removed several misspecified or impossible tasks – increasing difficulty while maintaining raw performance.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911116115759210","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,148],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/running-t…","expanded_url":"https://harborframework.com/docs/running-tbench","url":"https://t.co/hmiBYh7VYF","indices":[125,148]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911120511373685","view_count":1241,"bookmark_count":1,"created_at":1762551500000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Interested in using Terminal-Bench-2.0 or submitting to our new leaderboard? Check out the Harbor docs for more information.\nhttps://t.co/hmiBYh7VYF","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911119328616903","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,265],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1233837766271569920","name":"Mike A. 
Merrill","screen_name":"Mike_A_Merrill","indices":[51,66]},{"id_str":"64333359","name":"Ludwig Schmidt","screen_name":"lschmidt3","indices":[121,131]},{"id_str":"168787972","name":"Andy Konwinski","screen_name":"andykonwinski","indices":[185,199]},{"id_str":"1854370033520324608","name":"Laude Institute","screen_name":"LaudeInstitute","indices":[217,232]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911121924890876","view_count":1219,"bookmark_count":0,"created_at":1762551501000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"It's been a pleasure working alongside my co-lead, @mike_a_merrill, whose leadership rallied the team, and our advisors, @lschmidt3, who seeded the idea and guided our development, and @andykonwinski, who gave us the @LaudeInstitute mandate to \"ship your research.\"","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911120511373685","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,254],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":
1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911123543916899","view_count":1424,"bookmark_count":0,"created_at":1762551501000,"favorite_count":19,"quote_count":1,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Additionally, Terminal-Bench wouldn’t be possible without its community. We’re so thankful to the over 1k members of our Discord who contributed and audited tasks, helped build and beta test Harbor, and made this such a fun project for everyone involved. 
https://t.co/PcQz6FkVua","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911121924890876","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-09","value":15,"startTime":1762560000000,"endTime":1762646400000,"tweets":[{"bookmarked":false,"display_text_range":[0,75],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"33521530","name":"swyx","screen_name":"swyx","indices":[10,15]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1986989606320185472","quoted_status_permalink":{"url":"https://t.co/oyJSv9Xper","expanded":"https://twitter.com/latentspacepod/status/1986989606320185472","display":"x.com/latentspacepod…"},"retweeted":false,"fact_check":null,"id":"1987015010288353599","view_count":495,"bookmark_count":1,"created_at":1762576270000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1987015010288353599","full_text":"Thank you @swyx ! 
Was great to have you, and the live podcast was awesome 😁","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[16,63],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"284333988","name":"Logan Kilpatrick","screen_name":"OfficialLoganK","indices":[0,15]}]},"favorited":false,"in_reply_to_screen_name":"OfficialLoganK","lang":"en","retweeted":false,"fact_check":null,"id":"1986948823198146586","view_count":449,"bookmark_count":0,"created_at":1762560489000,"favorite_count":2,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@OfficialLoganK Looking forward to seeing where Gemini 3 lands!","in_reply_to_user_id_str":"284333988","in_reply_to_status_id_str":"1986914722235502858","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[17,37],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"993859188823601152","name":"Robert","screen_name":"skull8888888888","indices":[0,16]}]},"favorited":false,"in_reply_to_screen_name":"skull8888888888","lang":"en","retweeted":false,"fact_check":null,"id":"1986949187930616214","view_count":241,"bookmark_count":0,"created_at":1762560576000,"favorite_count":0,"quote_count":0,"reply_count":1,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@skull8888888888 DM me I'm 
interested","in_reply_to_user_id_str":"993859188823601152","in_reply_to_status_id_str":"1986927150415430137","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[13,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/task-diff…","expanded_url":"https://harborframework.com/docs/task-difference","url":"https://t.co/dAGxC918PM","indices":[114,137]}],"user_mentions":[{"id_str":"3787342814","name":"pash","screen_name":"pashmerepat","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"pashmerepat","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986949012445143197","view_count":319,"bookmark_count":0,"created_at":1762560534000,"favorite_count":3,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@pashmerepat Yes and here is a link that describes the differences in addition to the migration guide Mike shared https://t.co/dAGxC918PM","in_reply_to_user_id_str":"3787342814","in_reply_to_status_id_str":"1986913713233010728","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[10,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"633509372","name":"Monk Zero","screen_name":"NoCommas","indices":[0,9]}]},"favorited":false,"in_reply_to_screen_name":"NoCommas","lang":"en","retweeted":false,"fact_check":null,"id":"1986949254196437085","view_count":267,"bookmark_count":0,"created_at":1762560592000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@NoCommas Excited to see where you 
land!","in_reply_to_user_id_str":"633509372","in_reply_to_status_id_str":"1986926703550328905","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[23,38],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"2625021945","name":"potemin","screen_name":"poteminr","indices":[0,9]},{"id_str":"1353836358901501952","name":"Anthropic","screen_name":"AnthropicAI","indices":[10,22]}]},"favorited":false,"in_reply_to_screen_name":"poteminr","lang":"en","retweeted":false,"fact_check":null,"id":"1987008242187485219","view_count":157,"bookmark_count":0,"created_at":1762574656000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@poteminr @AnthropicAI You and me both","in_reply_to_user_id_str":"2625021945","in_reply_to_status_id_str":"1987002302088237511","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-10","value":1,"startTime":1762646400000,"endTime":1762732800000,"tweets":[{"bookmarked":false,"display_text_range":[13,14],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1531565962641788936","name":"Dylan Allen-Arnegård","screen_name":"dbear_allen","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"dbear_allen","lang":"qme","retweeted":false,"fact_check":null,"id":"1987404815404769403","view_count":69,"bookmark_count":0,"created_at":1762669206000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@dbear_allen 
🙏","in_reply_to_user_id_str":"1531565962641788936","in_reply_to_status_id_str":"1987400399704498635","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-11","value":0,"startTime":1762732800000,"endTime":1762819200000,"tweets":[]},{"label":"2025-11-12","value":0,"startTime":1762819200000,"endTime":1762905600000,"tweets":[]},{"label":"2025-11-13","value":0,"startTime":1762905600000,"endTime":1762992000000,"tweets":[]},{"label":"2025-11-14","value":8,"startTime":1762992000000,"endTime":1763078400000,"tweets":[{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1988372114257436810","quoted_status_permalink":{"url":"https://t.co/4o1qxAGQ7E","expanded":"https://twitter.com/warpdotdev/status/1988372114257436810","display":"x.com/warpdotdev/sta…"},"retweeted":false,"fact_check":null,"id":"1989092917798138326","view_count":334,"bookmark_count":0,"created_at":1763071681000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1989092917798138326","full_text":"Great to see Warp putting up the top score on Terminal-Bench 2.0 just days after release! 
Even more exciting to hear that they've already made improvements to their agent based on the results.\n\nUltimately, we hope that Terminal-Bench 2.0 accelerates model and agent development in the domain of text-based computer use, benefiting millions of users.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-15","value":0,"startTime":1763078400000,"endTime":1763164800000,"tweets":[]},{"label":"2025-11-16","value":0,"startTime":1763164800000,"endTime":1763251200000,"tweets":[]},{"label":"2025-11-17","value":0,"startTime":1763251200000,"endTime":1763337600000,"tweets":[]}],"nviews":[{"label":"2025-10-18","value":0,"startTime":1760659200000,"endTime":1760745600000,"tweets":[]},{"label":"2025-10-19","value":0,"startTime":1760745600000,"endTime":1760832000000,"tweets":[]},{"label":"2025-10-20","value":0,"startTime":1760832000000,"endTime":1760918400000,"tweets":[]},{"label":"2025-10-21","value":2315,"startTime":1760918400000,"endTime":1761004800000,"tweets":[{"bookmarked":false,"display_text_range":[0,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1670869422733393966","name":"Latent.Space","screen_name":"latentspacepod","indices":[23,38]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1980330471163892201","quoted_status_permalink":{"url":"https://t.co/sMxAiyEOQ0","expanded":"https://twitter.com/Mike_A_Merrill/status/1980330471163892201","display":"x.com/Mike_A_Merrill…"},"retweeted":false,"fact_check":null,"id":"1980344520215785912","view_count":2315,"bookmark_count":3,"created_at":1760985901000,"favorite_count":18,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1980344520215785912","full_text":"Mike and I went on the @latentspacepod 
!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-22","value":0,"startTime":1761004800000,"endTime":1761091200000,"tweets":[]},{"label":"2025-10-23","value":1235,"startTime":1761091200000,"endTime":1761177600000,"tweets":[{"bookmarked":false,"display_text_range":[0,5],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1980264839806177286","quoted_status_permalink":{"url":"https://t.co/jl3wu1E3P8","expanded":"https://twitter.com/petergostev/status/1980264839806177286","display":"x.com/petergostev/st…"},"retweeted":false,"fact_check":null,"id":"1981121994533048588","view_count":453,"bookmark_count":0,"created_at":1761171265000,"favorite_count":5,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1981121994533048588","full_text":"Cool!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1981013196195668143","quoted_status_permalink":{"url":"https://t.co/eWXL225mnb","expanded":"https://twitter.com/DimitrisPapail/status/1981013196195668143","display":"x.com/DimitrisPapail…"},"retweeted":false,"fact_check":null,"id":"1981062983821512920","view_count":782,"bookmark_count":2,"created_at":1761157196000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1981062983821512920","full_text":"One definition of AGI is the full automation of all computer work.\n\nA Terminal-Bench task is an instruction, container, and test executable. 
Agents are just programs that get executed in the container.\n\nSo Terminal-Bench can measure the automation of any computer task, i.e., AGI.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-24","value":0,"startTime":1761177600000,"endTime":1761264000000,"tweets":[]},{"label":"2025-10-25","value":0,"startTime":1761264000000,"endTime":1761350400000,"tweets":[]},{"label":"2025-10-26","value":0,"startTime":1761350400000,"endTime":1761436800000,"tweets":[]},{"label":"2025-10-27","value":0,"startTime":1761436800000,"endTime":1761523200000,"tweets":[]},{"label":"2025-10-28","value":0,"startTime":1761523200000,"endTime":1761609600000,"tweets":[]},{"label":"2025-10-29","value":0,"startTime":1761609600000,"endTime":1761696000000,"tweets":[]},{"label":"2025-10-30","value":0,"startTime":1761696000000,"endTime":1761782400000,"tweets":[]},{"label":"2025-10-31","value":0,"startTime":1761782400000,"endTime":1761868800000,"tweets":[]},{"label":"2025-11-01","value":0,"startTime":1761868800000,"endTime":1761955200000,"tweets":[]},{"label":"2025-11-02","value":0,"startTime":1761955200000,"endTime":1762041600000,"tweets":[]},{"label":"2025-11-03","value":0,"startTime":1762041600000,"endTime":1762128000000,"tweets":[]},{"label":"2025-11-04","value":13587,"startTime":1762128000000,"endTime":1762214400000,"tweets":[{"bookmarked":false,"display_text_range":[0,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"luma.com/h9kb99vd","expanded_url":"https://luma.com/h9kb99vd","url":"https://t.co/wLItVEHEMc","indices":[114,137]}],"user_mentions":[]},"favorited":true,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":true,"fact_check":null,"id":"1985427712484524433","view_count":13587,"bookmark_count":31,"created_at":1762197828000,"favorite_count":107,"quote_count":2,"reply_count":1,"retweet_count":4,"user_id_str":"1448787032486989825","conversat
ion_id_str":"1985427712484524433","full_text":"We're releasing Terminal-Bench 2.0 this week! Come to our meetup on Thursday @ Databricks to get early access :)\n\nhttps://t.co/wLItVEHEMc","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-05","value":0,"startTime":1762214400000,"endTime":1762300800000,"tweets":[]},{"label":"2025-11-06","value":2599,"startTime":1762300800000,"endTime":1762387200000,"tweets":[{"bookmarked":false,"display_text_range":[0,21],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1986093902122942700","quoted_status_permalink":{"url":"https://t.co/zhs56oNQSL","expanded":"https://twitter.com/jyangballin/status/1986093902122942700","display":"x.com/jyangballin/st…"},"retweeted":false,"fact_check":null,"id":"1986106642157703523","view_count":2599,"bookmark_count":6,"created_at":1762359698000,"favorite_count":15,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986106642157703523","full_text":"Very clever 
benchmark","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-07","value":0,"startTime":1762387200000,"endTime":1762473600000,"tweets":[]},{"label":"2025-11-08","value":121542,"startTime":1762473600000,"endTime":1762560000000,"tweets":[{"bookmarked":false,"display_text_range":[0,235],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w":1200,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w"
:1200,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}]},"favorited":false,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911106108211461","view_count":104511,"bookmark_count":117,"created_at":1762551497000,"favorite_count":344,"quote_count":35,"reply_count":23,"retweet_count":71,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Today, we’re announcing the next chapter of Terminal-Bench with two releases:\n\n1. Harbor, a new package for running sandboxed agent rollouts at scale\n2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification 
https://t.co/YwmacS625Z","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,274],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"loom.com/share/8c11f218…","expanded_url":"https://www.loom.com/share/8c11f218c9fc4674bd659146af435627","url":"https://t.co/x4gD2nHQxC","indices":[306,329]}],"user_mentions":[{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1927771369611350016","name":"SWE-bench","screen_name":"SWEbench","indices":[294,303]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911110017269764","view_count":2504,"bookmark_count":5,"created_at":1762551498000,"favorite_count":28,"quote_count":0,"reply_count":2,"retweet_count":6,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Just a few of the features I love about Harbor:\n- Evaluate any agent that can be installed and run autonomously\n- Scale up to thousands of concurrent containers using providers like @daytonaio and @modal\n- Generate rollouts for SFT and RL\n- Create your own benchmarks or use existing ones like @SWEbench 
\n\nhttps://t.co/x4gD2nHQxC","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911108117254546","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,209],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com","expanded_url":"https://harborframework.com/","url":"https://t.co/5kDZlvjhb7","indices":[186,209]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911108117254546","view_count":2828,"bookmark_count":7,"created_at":1762551497000,"favorite_count":27,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is the package we wish we had had while making Terminal-Bench. It’s for agent, model, and benchmark developers and researchers who want to evaluate and improve agents and models.\nhttps://t.co/5kDZlvjhb7","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911106108211461","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,194],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911111460089993","view_count":1578,"bookmark_count":0,"created_at":1762551498000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is also the official harness for Terminal-Bench 2.0. 
We used it to run tens of thousands of experiments in containerized environments while developing the latest version of the benchmark.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911110017269764","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,152],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911113209126988","view_count":1451,"bookmark_count":1,"created_at":1762551499000,"favorite_count":14,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"So why Terminal-Bench 2.0? We always knew that as model capabilities increased, we’d need to keep Terminal-Bench up to date with frontier capabilities.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911111460089993","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,277],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911114626802143","view_count":1535,"bookmark_count":0,"created_at":1762551499000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Terminal-Bench 2.0 consists of 89 hard tasks to test these capabilities. We aim to push the frontier with an increased emphasis on task quality. Each task received several hours of human and LM-assisted verification to ensure that tasks are (1) solvable, (2) realistic, and (3) well-specified. 
We’ll share more on how we did this in our upcoming preprint.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911113209126988","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,96],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}],"symbols":[],"timestamps":[],"urls":[{"display_url":"tbench.ai/leaderboard","expanded_url":"https://www.tbench.ai/leaderboard","url":"https://t.co/vblfZF55Nd","indices":[73,96]}],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features"
:{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911116115759210","view_count":1916,"bookmark_count":5,"created_at":1762551499000,"favorite_count":25,"quote_count":2,"reply_count":3,"retweet_count":3,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"At present, Codex CLI with GPT-5 sits at the top of our new leaderboard.\nhttps://t.co/vblfZF55Nd https://t.co/GIY0foKhNs","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911114626802143","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,272],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911119328616903","view_count":1335,"bookmark_count":2,"created_at":1762551500000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Astute Terminal-Bench fans may notice that SOTA performance is comparable to TB1.0 
despite our claim that TB2.0 is harder. We believe that this is because task quality is substantially higher in the new benchmark. We have removed several misspecified or impossible tasks – increasing difficulty while maintaining raw performance.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911116115759210","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,148],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/running-t…","expanded_url":"https://harborframework.com/docs/running-tbench","url":"https://t.co/hmiBYh7VYF","indices":[125,148]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911120511373685","view_count":1241,"bookmark_count":1,"created_at":1762551500000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Interested in using Terminal-Bench-2.0 or submitting to our new leaderboard? Check out the Harbor docs for more information.\nhttps://t.co/hmiBYh7VYF","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911119328616903","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,265],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1233837766271569920","name":"Mike A. 
Merrill","screen_name":"Mike_A_Merrill","indices":[51,66]},{"id_str":"64333359","name":"Ludwig Schmidt","screen_name":"lschmidt3","indices":[121,131]},{"id_str":"168787972","name":"Andy Konwinski","screen_name":"andykonwinski","indices":[185,199]},{"id_str":"1854370033520324608","name":"Laude Institute","screen_name":"LaudeInstitute","indices":[217,232]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911121924890876","view_count":1219,"bookmark_count":0,"created_at":1762551501000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"It's been a pleasure working alongside my co-lead, @mike_a_merrill, whose leadership rallied the team, and our advisors, @lschmidt3, who seeded the idea and guided our development, and @andykonwinski, who gave us the @LaudeInstitute mandate to \"ship your research.\"","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911120511373685","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,254],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":
1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911123543916899","view_count":1424,"bookmark_count":0,"created_at":1762551501000,"favorite_count":19,"quote_count":1,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Additionally, Terminal-Bench wouldn’t be possible without its community. We’re so thankful to the over 1k members of our Discord who contributed and audited tasks, helped build and beta test Harbor, and made this such a fun project for everyone involved. 
https://t.co/PcQz6FkVua","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911121924890876","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-09","value":1928,"startTime":1762560000000,"endTime":1762646400000,"tweets":[{"bookmarked":false,"display_text_range":[0,75],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"33521530","name":"swyx","screen_name":"swyx","indices":[10,15]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1986989606320185472","quoted_status_permalink":{"url":"https://t.co/oyJSv9Xper","expanded":"https://twitter.com/latentspacepod/status/1986989606320185472","display":"x.com/latentspacepod…"},"retweeted":false,"fact_check":null,"id":"1987015010288353599","view_count":495,"bookmark_count":1,"created_at":1762576270000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1987015010288353599","full_text":"Thank you @swyx ! 
Was great to have you, and the live podcast was awesome 😁","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[16,63],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"284333988","name":"Logan Kilpatrick","screen_name":"OfficialLoganK","indices":[0,15]}]},"favorited":false,"in_reply_to_screen_name":"OfficialLoganK","lang":"en","retweeted":false,"fact_check":null,"id":"1986948823198146586","view_count":449,"bookmark_count":0,"created_at":1762560489000,"favorite_count":2,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@OfficialLoganK Looking forward to seeing where Gemini 3 lands!","in_reply_to_user_id_str":"284333988","in_reply_to_status_id_str":"1986914722235502858","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[17,37],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"993859188823601152","name":"Robert","screen_name":"skull8888888888","indices":[0,16]}]},"favorited":false,"in_reply_to_screen_name":"skull8888888888","lang":"en","retweeted":false,"fact_check":null,"id":"1986949187930616214","view_count":241,"bookmark_count":0,"created_at":1762560576000,"favorite_count":0,"quote_count":0,"reply_count":1,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@skull8888888888 DM me I'm 
interested","in_reply_to_user_id_str":"993859188823601152","in_reply_to_status_id_str":"1986927150415430137","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[13,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/task-diff…","expanded_url":"https://harborframework.com/docs/task-difference","url":"https://t.co/dAGxC918PM","indices":[114,137]}],"user_mentions":[{"id_str":"3787342814","name":"pash","screen_name":"pashmerepat","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"pashmerepat","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986949012445143197","view_count":319,"bookmark_count":0,"created_at":1762560534000,"favorite_count":3,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@pashmerepat Yes and here is a link that describes the differences in addition to the migration guide Mike shared https://t.co/dAGxC918PM","in_reply_to_user_id_str":"3787342814","in_reply_to_status_id_str":"1986913713233010728","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[10,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"633509372","name":"Monk Zero","screen_name":"NoCommas","indices":[0,9]}]},"favorited":false,"in_reply_to_screen_name":"NoCommas","lang":"en","retweeted":false,"fact_check":null,"id":"1986949254196437085","view_count":267,"bookmark_count":0,"created_at":1762560592000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@NoCommas Excited to see where you 
land!","in_reply_to_user_id_str":"633509372","in_reply_to_status_id_str":"1986926703550328905","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[23,38],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"2625021945","name":"potemin","screen_name":"poteminr","indices":[0,9]},{"id_str":"1353836358901501952","name":"Anthropic","screen_name":"AnthropicAI","indices":[10,22]}]},"favorited":false,"in_reply_to_screen_name":"poteminr","lang":"en","retweeted":false,"fact_check":null,"id":"1987008242187485219","view_count":157,"bookmark_count":0,"created_at":1762574656000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@poteminr @AnthropicAI You and me both","in_reply_to_user_id_str":"2625021945","in_reply_to_status_id_str":"1987002302088237511","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-10","value":69,"startTime":1762646400000,"endTime":1762732800000,"tweets":[{"bookmarked":false,"display_text_range":[13,14],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1531565962641788936","name":"Dylan Allen-Arnegård","screen_name":"dbear_allen","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"dbear_allen","lang":"qme","retweeted":false,"fact_check":null,"id":"1987404815404769403","view_count":69,"bookmark_count":0,"created_at":1762669206000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@dbear_allen 
🙏","in_reply_to_user_id_str":"1531565962641788936","in_reply_to_status_id_str":"1987400399704498635","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-11","value":0,"startTime":1762732800000,"endTime":1762819200000,"tweets":[]},{"label":"2025-11-12","value":0,"startTime":1762819200000,"endTime":1762905600000,"tweets":[]},{"label":"2025-11-13","value":0,"startTime":1762905600000,"endTime":1762992000000,"tweets":[]},{"label":"2025-11-14","value":334,"startTime":1762992000000,"endTime":1763078400000,"tweets":[{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1988372114257436810","quoted_status_permalink":{"url":"https://t.co/4o1qxAGQ7E","expanded":"https://twitter.com/warpdotdev/status/1988372114257436810","display":"x.com/warpdotdev/sta…"},"retweeted":false,"fact_check":null,"id":"1989092917798138326","view_count":334,"bookmark_count":0,"created_at":1763071681000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1989092917798138326","full_text":"Great to see Warp putting up the top score on Terminal-Bench 2.0 just days after release! 
Even more exciting to hear that they've already made improvements to their agent based on the results.\n\nUltimately, we hope that Terminal-Bench 2.0 accelerates model and agent development in the domain of text-based computer use, benefiting millions of users.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-15","value":0,"startTime":1763078400000,"endTime":1763164800000,"tweets":[]},{"label":"2025-11-16","value":0,"startTime":1763164800000,"endTime":1763251200000,"tweets":[]},{"label":"2025-11-17","value":0,"startTime":1763251200000,"endTime":1763337600000,"tweets":[]}]},"interactions":{"users":[{"created_at":1316236389000,"uid":"374907349","id":"374907349","screen_name":"vinhnx","name":"Vinh Nguyen","friends_count":6097,"followers_count":1140,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1857076247811026944/0UrdvDZz_normal.jpg","description":"iOS. Applied AI Research. Building VT Code coding agent (https://t.co/1ZsOIycYOz), @vtdotai. Built @ClendarApp • Learn by doing • self.opinions","entities":{"description":{"urls":[{"display_url":"github.com/vinhnx/vtcode","expanded_url":"https://github.com/vinhnx/vtcode","url":"https://t.co/1ZsOIycYOz","indices":[57,80]}]},"url":{"urls":[{"display_url":"buymeacoffee.com/vinhnx","expanded_url":"https://buymeacoffee.com/vinhnx","url":"https://t.co/P4Hz19mgKX","indices":[0,23]}]}},"interactions":2},{"created_at":1525789470000,"uid":"993859188823601152","id":"993859188823601152","screen_name":"skull8888888888","name":"Robert","friends_count":625,"followers_count":2166,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1786070151021948928/fXOgt6Fl_normal.jpg","description":"Founder @ Laminar https://t.co/ANyS8EX6Nj (YC S24). 
Prev @ Palantir, Bloomberg","entities":{"description":{"urls":[{"display_url":"laminar.sh","expanded_url":"https://laminar.sh","url":"https://t.co/ANyS8EX6Nj","indices":[18,41]}]},"url":{"urls":[{"display_url":"laminar.sh","expanded_url":"https://laminar.sh","url":"https://t.co/ANyS8EX6Nj","indices":[0,23]}]}},"interactions":1},{"created_at":1512210999000,"uid":"936906954902986752","id":"936906954902986752","screen_name":"Vishnu_Y19","name":"Vishnu","friends_count":347,"followers_count":202,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1985426206448402432/mcSAIy3H_normal.jpg","description":"Building Teravictus: AI-powered support intelligence | Detecting fires before they burn | Shipping in public","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"teravictus.com","expanded_url":"http://www.teravictus.com","url":"https://t.co/qrZORKp4ws","indices":[0,23]}]}},"interactions":1},{"created_at":1510792928000,"uid":"930959130940059648","id":"930959130940059648","screen_name":"spencermateega","name":"Spencer Mateega","friends_count":1178,"followers_count":976,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1985223022496858112/spm-tpwZ_normal.jpg","description":"ceo @afterquery. 
prev statistics + finance + cs @ wharton / penn, @silverlake_news, @google, @morganstanley, @meta","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"mateega.com","expanded_url":"http://mateega.com","url":"https://t.co/lPuhNB6ebh","indices":[0,23]}]}},"interactions":1},{"created_at":1466986565000,"uid":"747221928612507648","id":"747221928612507648","screen_name":"techfrenAJ","name":"techfren","friends_count":969,"followers_count":1892,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1694777985679933440/hBKGn3KM_normal.jpg","description":"Software Engineer and Content Creator :)","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"techfren.net","expanded_url":"http://techfren.net","url":"https://t.co/GnfRvBc2kI","indices":[0,23]}]}},"interactions":1},{"created_at":1342073914000,"uid":"633509372","id":"633509372","screen_name":"NoCommas","name":"Monk Zero","friends_count":862,"followers_count":2062,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1905896402414346240/EOVmj9TG_normal.jpg","description":"Bootloading @antigma_labs. exes: @awsCloud, @Meta, @Mysten_Labs. 
A Turing complete mind, making sense of the world with Gödel incompleteness.","entities":{"description":{"urls":[]}},"interactions":1},{"created_at":1444011163000,"uid":"3787342814","id":"3787342814","screen_name":"pashmerepat","name":"pash","friends_count":494,"followers_count":10697,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1822410217713442816/zLlPfvOK_normal.jpg","description":"currently head of ai @cline | prev @meta knowledge graph | creator of vault // @usc alum","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"nouvelle.one","expanded_url":"https://nouvelle.one","url":"https://t.co/8VlSzkezVo","indices":[0,23]}]}},"interactions":1},{"created_at":1313439810000,"uid":"355737959","id":"355737959","screen_name":"aktasbatuhann","name":"Batuhan","friends_count":1242,"followers_count":649,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1858055302203191298/0Tz6qHOL_normal.jpg","description":"Product @driaforall","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"docs.dria.co","expanded_url":"http://docs.dria.co","url":"https://t.co/riCD9LGvBt","indices":[0,23]}]}},"interactions":1},{"created_at":1419646306000,"uid":"2944501279","id":"2944501279","screen_name":"etash_guha","name":"Etash Guha","friends_count":226,"followers_count":810,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1665627207862210561/AcwpIoQJ_normal.jpg","description":"Ph.D. 
@Stanford and @uwcse","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"etash.me","expanded_url":"http://www.etash.me","url":"https://t.co/y1Y4RxigbY","indices":[0,23]}]}},"interactions":1},{"created_at":1239014743000,"uid":"29178343","id":"29178343","screen_name":"AlexGDimakis","name":"Alex Dimakis","friends_count":2376,"followers_count":21438,"profile_image_url_https":"https://pbs.twimg.com/profile_images/542926798338543617/KwlwoJRr_normal.jpeg","description":"Professor, UC berkeley | Founder @bespokelabsai |","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"people.eecs.berkeley.edu/~alexdimakis/","expanded_url":"https://people.eecs.berkeley.edu/~alexdimakis/","url":"https://t.co/N8GVYXA2q9","indices":[0,23]}]}},"interactions":1},{"created_at":1303182017000,"uid":"284333988","id":"284333988","screen_name":"OfficialLoganK","name":"Logan Kilpatrick","friends_count":2709,"followers_count":233410,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1943787288955084800/QOl7OJMc_normal.jpg","description":"Lead product for @GoogleAIStudio + the Gemini API. 
My views!","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"logank.ai","expanded_url":"https://logank.ai","url":"https://t.co/p6F6wFrh36","indices":[0,23]}]}},"interactions":1},{"created_at":1403250042000,"uid":"2625021945","id":"2625021945","screen_name":"poteminr","name":"potemin","friends_count":38,"followers_count":29,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1973768360262242304/kebBmWjA_normal.jpg","description":"mle at fintech company, filmmaker, building @0secapp \n\nhttps://t.co/NsVXPXDuis","entities":{"description":{"urls":[{"display_url":"apps.apple.com/us/app/0sec-vo…","expanded_url":"https://apps.apple.com/us/app/0sec-voice-ai-calendar/id6752616667","url":"https://t.co/NsVXPXDuis","indices":[55,78]}]},"url":{"urls":[{"display_url":"0sec.app","expanded_url":"http://0sec.app","url":"https://t.co/T5XA3HbRhp","indices":[0,23]}]}},"interactions":1},{"created_at":1683744979000,"uid":"1656372584991293441","id":"1656372584991293441","screen_name":"gamestoneai","name":"Golden Hippie","friends_count":162,"followers_count":35,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1981626555093655553/frMyv1aU_normal.jpg","description":"I still think that the whole internet thingy is just a passing trend.","entities":{"description":{"urls":[]}},"interactions":1},{"created_at":1677441722000,"uid":"1629934761497047041","id":"1629934761497047041","screen_name":"nummanthinks","name":"Numman Ali","friends_count":166,"followers_count":335,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1958877571103281152/0BOHse4E_normal.jpg","description":"Agentic Coding, Applied AI & Exploring Blockchain | e/acc | CTO at UK FinTech 
https://t.co/FSflM0UWjj","entities":{"description":{"urls":[{"display_url":"retailbook.com","expanded_url":"https://www.retailbook.com","url":"https://t.co/FSflM0UWjj","indices":[78,101]}]}},"interactions":1},{"created_at":1661396780000,"uid":"1562637435930292228","id":"1562637435930292228","screen_name":"allenjpark","name":"Allen","friends_count":2000,"followers_count":1272,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1842339065414250498/_OEjs-V4_normal.jpg","description":"something new | cs @princeton | prev. evals @patronusAI & baker @subway","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"allenjpark.com","expanded_url":"http://allenjpark.com","url":"https://t.co/rKMHLI7hSw","indices":[0,23]}]}},"interactions":1},{"created_at":1658424640000,"uid":"1550171383031861251","id":"1550171383031861251","screen_name":"jehovahscript","name":"jacob ۞","friends_count":2013,"followers_count":3807,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1861663009749508099/7iyCyYFB_normal.jpg","description":"YC F25 | prev @hf0 @runpod_io","entities":{"description":{"urls":[]}},"interactions":1,"following":true,"followed_by":false},{"created_at":1653988778000,"uid":"1531565962641788936","id":"1531565962641788936","screen_name":"dbear_allen","name":"Dylan Allen-Arnegård","friends_count":834,"followers_count":887,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1957668133097074690/gioujO2y_normal.jpg","description":"Co-Founder @ Cheers (YC S24) • We help service businesses win local search on ChatGPT • Utah ➡️ 
SF","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"cheers.tech","expanded_url":"https://www.cheers.tech/","url":"https://t.co/KEF64qm39W","indices":[0,23]}]}},"interactions":1},{"created_at":1575199441000,"uid":"1201099556827598848","id":"1201099556827598848","screen_name":"adgtomiwa","name":"Tomiwa","friends_count":167,"followers_count":147,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1663589629151281159/1-9peYU-_normal.jpg","description":"Python || LiverpoolFC || shenanigans","entities":{"description":{"urls":[]}},"interactions":1},{"created_at":1531965603000,"uid":"1019763768631447552","id":"1019763768631447552","screen_name":"idavidrein","name":"david rein","friends_count":1177,"followers_count":3246,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1375548507621257220/OOUh4_Yz_normal.jpg","description":"sentio ergo sum. science @METR_Evals","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"idavidrein.com","expanded_url":"http://idavidrein.com","url":"http://idavidrein.com","indices":[0,23]}]}},"interactions":1}],"period":14,"start":1762082468950,"end":1763292068950},"interactions_updated":1763292069039,"created":1763292068802,"updated":1763292069039,"type":"the innovator","hits":1},"people":[{"user":{"id":"856146884284473344","name":"webtc.eth | 小凡.edge 🦭🎒🦭","description":"OnChain Degen | A real user of base & edgeX & Backpack & Infini | ex @TencentGlobal @Airbnb | NFA DYOR","followers_count":1736,"friends_count":999,"statuses_count":5271,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1942939386707922944/i4CzUyY2_normal.jpg","screen_name":"WeBTC_ETH","location":"0x","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"bitwink.xyz","expanded_url":"http://bitwink.xyz/","url":"https://t.co/BbyZoblryL","indices":[0,23]}]}}},"details":{"type":"The Innovator","description":"webtc.eth | 小凡.edge is a dedicated OnChain Degen and true evangelist of cutting-edge 
decentralized finance projects like Base, edgeX, and Backpack. With a rich background at Tencent and Airbnb, they bring a pragmatic, data-driven approach to blockchain exploration, tirelessly sharing deep insights and hands-on experiences. Their tweets blend actionable alpha, community leadership, and authentic storytelling, making complex DeFi ecosystems accessible and exciting for followers.","purpose":"Their life purpose centers on pioneering the next generation of decentralized finance by uncovering undervalued protocols, educating fellow traders, and fostering authentic community growth. They aim to democratize access to sophisticated financial tools while promoting safety, innovation, and user empowerment.","beliefs":"They deeply value transparency, product excellence over hype, and long-term sustainability in DeFi projects. They believe in rigorous due diligence (DYOR), user security, and authentic engagement rather than short-lived hype or artificial growth. Innovation coupled with user-first design sets the foundation for their trust and advocacy.","facts":"Fun fact: Despite losing thousands in liquidations, webtc.eth remains an unwavering edgeX superfan, continuously chronicling their rollercoaster trading journey and gathering a loyal community around their honest recounts of wins and wipeouts.","strength":"Their greatest strength is detailed, research-backed content combined with authentic user experience—giving them credibility and influence. They skillfully break down technical complexities into actionable insights, attracting serious traders and builders alike. Also, their persistence in community building and product feedback marks them as a rare long-term visionary.","weakness":"Their passion for deep involvement sometimes leads to overexposure, tweeting thousands of times which could tire casual followers. 
Also, as a hardcore degen, they may sometimes appear overly focused on niche projects, limiting broader appeal to mainstream crypto audiences.","recommendation":"To grow their audience on X, webtc.eth should amplify concise, high-impact threads paired with engaging visuals (like their popular EdgeX Pepe), and leverage Twitter Spaces to host live AMAs with DeFi founders. Engaging more in multi-language content, especially English for global reach, paired with occasional beginner-friendly explainers, will attract wider audiences without losing their core.","roast":"If webtc.eth put as much energy into not blowing up their positions as they do tweeting about those blowups, they’d probably be running their own hedge fund by now instead of just being the perpetual 'edgeX eternal rollercoaster rider.' But hey, at least their crash stories keep us entertained!","win":"Their biggest win is successfully engaging and nurturing a grassroots community for edgeX from near obscurity to a bustling multi-thousand active user base, being recognized by the project team and becoming a key influencer who shaped community trust and growth."},"created":1763304963340,"type":"the innovator","id":"webtc_eth"},{"user":{"id":"1713715309","name":"Jac","description":"I help companies automate workflows that slow them down | AI + automation for sales, support, and ops","followers_count":207,"friends_count":279,"statuses_count":635,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1981617208435339264/UtzZvp0B_normal.jpg","screen_name":"jacquesreynold5","location":"Free 30 day pilot install 👉","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"renohub.io","expanded_url":"http://renohub.io","url":"https://t.co/AzSvbTal0o","indices":[0,23]}]}}},"details":{"type":"The Innovator","description":"Jac is an AI and automation wizard who transforms static websites into conversational sales machines, helping companies boost conversions effortlessly. 
Passionate about practical technology application, Jac bridges the gap between cutting-edge AI and real-world business results. With a straightforward style and data-backed insights, Jac empowers SaaS founders to rethink how they engage visitors and drive revenue.","purpose":"Jac’s life purpose is to revolutionize how businesses interact with their customers by making AI-powered conversation the heart of the digital sales experience, turning passive visitors into engaged prospects and revenue opportunities.","beliefs":"Jac believes technology should do more than save time—it should free people to focus on creativity and strategic thinking. They hold that understanding customer behavior and addressing their real-time needs is the key to unlocking business growth, and that simplicity beats complexity in business solutions.","facts":"Fun fact: Jac once worked inside Intercom and watched firsthand how companies wasted thousands on traffic without meaningful engagement—this inspired them to start installing conversational AI assistants that act as 24/7 sales reps.","strength":"Jac’s biggest strengths are their ability to combine deep AI knowledge with practical sales and marketing insight, framing automation not just as a tool but as a strategic lever to reallocate attention and drive meaningful outcomes.","weakness":"Jac might sometimes focus so heavily on AI-driven automation that they underestimate the need for human touch or broader storytelling, which can limit appeal to audiences craving emotional connection.","recommendation":"To grow on X, Jac should mix their sharp insights with more storytelling—sharing case study successes, client testimonials, and even ‘behind the scenes’ glimpses of AI assistant setups. 
Engaging in conversations by replying to relevant SaaS, marketing, and startup tweets could amplify reach and deepen relationships.","roast":"Jac is so obsessed with automation that if a chatbot unexpectedly started talking back to them at home, they'd probably try to book a demo with it... and expect a follow-up email to confirm the appointment.","win":"Jac’s biggest win so far is pioneering free AI assistant deployments for multiple B2B SaaS companies, proving that conversational AI can dramatically increase demo bookings and conversions without upfront costs."},"created":1763304270493,"type":"the innovator","id":"jacquesreynold5"},{"user":{"id":"1428215617891422212","name":"K Kulkarni","description":"Reach the stars! ZK proof dealer @succinctlabs","followers_count":6957,"friends_count":835,"statuses_count":7546,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1904734302480588800/bftrTlyr_normal.jpg","screen_name":"ks_kulk","location":"California, USA","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"kskulk.com","expanded_url":"https://www.kskulk.com/","url":"https://t.co/xdVWzjuDI7","indices":[0,23]}]}}},"details":{"type":"The Innovator","description":"K Kulkarni is a forward-thinker at the cutting edge of zero-knowledge proofs and blockchain technology, constantly pushing the boundaries of crypto innovation. As a ZK proof dealer at Succinct Labs, they envision a future transformed by cryptographic breakthroughs and scalable decentralized systems. 
Their tweets blend deep technical insights with visionary enthusiasm for the convergence of crypto and AI.","purpose":"To accelerate the adoption and development of zero-knowledge proof technologies, enabling secure, scalable, and privacy-preserving blockchain applications that reshape economic and political landscapes.","beliefs":"K Kulkarni values transparency through technology, believes in the transformative power of cryptography to create trustless, censorship-resistant systems, and champions innovation that bridges cryptography with practical, real-world applications. They hold a firm belief that the fusion of crypto and AI will drive unprecedented socio-economic evolution.","facts":"Fun fact: Despite accumulating thousands of likes and retweets, some of K Kulkarni’s tweets curiously show zero view counts, proving that true innovation sometimes exists in mysterious digital pockets waiting to be discovered.","strength":"K Kulkarni shines with technical expertise, visionary insights, and the ability to communicate complex cryptographic concepts in a way that fuels excitement and understanding within the crypto community.","weakness":"Their highly specialized focus on zero-knowledge proofs and Ethereum scaling might alienate casual followers or those outside the cryptography sphere, potentially limiting broader audience engagement.","roast":"K Kulkarni is like that brilliant scientist who talks so fast about zkVMs and RISC-V that everyone feels like they’re missing a secret handshake—don’t worry, your followers probably just pretend to understand so they don’t lose geek cred.","win":"Spearheading discussions on Ethereum’s future with revolutionary zkVM concepts and aligning with Vitalik Buterin's radical proposals, K Kulkarni has established themselves as an influential mind shaping the next-generation blockchain infrastructure.","recommendation":"To grow their audience on X, K Kulkarni should mix in more accessible explainer threads and engage directly with 
crypto newcomers and AI enthusiasts, using storytelling and simplified analogies to broaden their impact beyond hardcore cryptographers."},"created":1763296289008,"type":"the innovator","id":"ks_kulk"},{"user":{"id":"1445121875907002368","name":"Donny Solana","description":"Universal AI agents for capital, coordination & real-world execution.\n@dainprotocol","followers_count":3230,"friends_count":964,"statuses_count":14248,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1967713676359778304/fRTck080_normal.jpg","screen_name":"DonnySolana","location":"Los Angeles, CA","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"dain.org","expanded_url":"https://www.dain.org","url":"https://t.co/emhWhSfJJ8","indices":[0,23]}]}}},"details":{"type":"The Innovator","description":"Donny Solana is a visionary tech leader focused on building universal AI agents that drive capital, coordination, and execution in the real world. With a passion for revolutionizing how AI integrates into daily life, Donny continually pushes the boundaries of what autonomous services can achieve at scale. 
His tweets reflect deep engagement with Web3 advancements and cutting-edge AI solutions.","facts":"Donny has tweeted over 14,000 times, demonstrating relentless dedication to sharing insights and building a community around AI and decentralized technologies.","purpose":"To empower humanity by creating and scaling revolutionary AI-driven networks that automate and enhance real-world processes—and to multiply individual potential through innovative, autonomous technology.","beliefs":"He believes in the transformative power of AI and decentralized systems to democratize access and improve efficiency, and that fostering collaboration among top talent is key to achieving exponential growth.","strength":"Relentless communicator and visionary with strong technical prowess, capable of inspiring and mobilizing a network of developers and enthusiasts around complex AI and blockchain projects.","weakness":"Might sometimes overwhelm audiences with advanced technical concepts and a high volume of tweets, which could dilute focus and confuse less technical followers.","recommendation":"To grow his audience on X, Donny should consider crafting more beginner-friendly threads that explain complex ideas simply and consistently engage with his community through polls or Q&A sessions to increase interaction and broaden accessibility.","roast":"Donny’s tweet count is so high, if tweeting were an Olympic sport, he'd have more gold medals than Michael Phelps—maybe it's time to let the bots catch a break!","win":"Launching a cutting-edge AI agent execution layer and successfully integrating with the Solana Web3 accelerator program, marking a major milestone in scaling autonomous AI services."},"created":1763295273853,"type":"the innovator","id":"donnysolana"},{"user":{"id":"272925698","name":"Stijn","description":"Building MVP (b2b)\nmid/post-training, RL, synth data & domain-specific models\nPioneered (Nov '24) practical multi-Agent coding: https://t.co/XsYDaWVdgl\n⛰ 
⛷️","followers_count":1778,"friends_count":1155,"statuses_count":2924,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1989003685121146880/IyR86kHc_normal.jpg","screen_name":"StijnSmits","location":"NO / NL 🇳🇴🇳🇱","entities":{"description":{"urls":[{"display_url":"git.new/WPTggu6","expanded_url":"https://git.new/WPTggu6","url":"https://t.co/XsYDaWVdgl","indices":[128,151]}]},"url":{"urls":[{"display_url":"github.com/s-smits","expanded_url":"https://github.com/s-smits","url":"https://t.co/bjIyCQOaSl","indices":[0,23]}]}}},"details":{"type":"The Innovator","description":"Stijn is an early pioneer in the AI and multi-agent coding space, constantly pushing boundaries with practical solutions and domain-specific innovations. With a deep focus on reinforcement learning and synthetic data, Stijn crafts tools that empower smarter, self-improving AI systems. Their content combines technical depth with approachable insights for a community hungry for breakthroughs.","purpose":"To advance the frontiers of AI by building and sharing cutting-edge multi-agent systems and reinforcement learning techniques that drive practical and scalable innovation in B2B applications.","beliefs":"Stijn believes in the power of collaborative technological progress, emphasizing open knowledge sharing, reproducibility, and practical implementations that turn complex AI theories into accessible tools. 
They value efficiency, clarity in code, and the constant evolution of AI through self-improving models.","facts":"Stijn pioneered practical multi-agent coding in November 2024, establishing themselves as a go-to resource for navigating AI teamwork and domain boundaries within complex systems.","strength":"Exceptional technical expertise in reinforcement learning and domain-specific modeling combined with a knack for distilling sophisticated concepts into actionable, real-world applications that gain significant attention and engagement.","weakness":"Their niche focus on advanced AI topics might limit immediate accessibility to broader audiences, which could slow follower growth outside specialized tech circles.","roast":"Stijn is so deep in multi-agent AI realms that they probably talk to their coffee machine about model optimization—too bad the coffee never replies with a well-tuned reward function!","win":"Achieved viral engagement with an 800K+ view tweet on Gemini 3.0, cementing their status as a thought leader in AI multi-agent design in under a year.","recommendation":"To grow on X, Stijn should blend their high-level innovations with periodic simplified explainers or storytelling that connect AI breakthroughs to everyday impacts, inviting a broader audience while retaining technical credibility. 
Engaging more in community Q&A and leveraging Twitter Spaces for live demos or discussions can humanize their expertise and build a loyal follower base."},"created":1763295043106,"type":"the innovator","id":"stijnsmits"},{"user":{"id":"395580696","name":"David Sancho","description":"Open Source UI infra at @ahrefs with OCaml\n\nMade styled-ppx and server-reason-react \nWorking on reason-react / Melange / Reason","followers_count":3207,"friends_count":1342,"statuses_count":5936,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1866451794349461504/FgSXeVKO_normal.jpg","screen_name":"davesnx","location":"Barcelona - La Cerdanya ","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"sancho.dev","expanded_url":"http://sancho.dev","url":"https://t.co/ppcsHBeV9O","indices":[0,23]}]}}},"details":{"type":"The Innovator","description":"David Sancho is a passionate open-source enthusiast and a skilled innovator specializing in UI infrastructure using OCaml. His tweets blend technical wit and insightful commentary, engaging a community deeply invested in functional programming and modern web technologies. With a strong presence in reason-react and Melange projects, he embodies the spirit of pushing boundaries in UI development.","purpose":"David's life purpose centers on advancing the world of UI infrastructure through innovative open-source contributions, making complex programming languages more accessible and effective for developers everywhere.","beliefs":"He values open collaboration, technical excellence, and the power of functional programming paradigms. 
David believes that innovation thrives in communities where sharing knowledge and improving tools is a collective mission.","facts":"Fun fact: David once humorously imagined JavaScript as a 'serious programming language' that fully embraces coercion, showcasing his sharp humor and deep understanding of programming quirks.","strength":"His greatest strengths lie in deep technical knowledge, creative problem-solving, and the ability to communicate complex ideas clearly and engagingly through social media.","weakness":"A potential weakness is a niche focus that might limit his appeal to only a specialized audience, potentially making broader engagement on platforms like X more challenging.","roast":"David’s tweets are so packed with OCaml jargon that sometimes it feels like you need a secret decoder ring — or a degree in functional programming — just to laugh at the jokes!","win":"His tweet about JavaScript coercion amassed over 200,000 views and 8,400 likes, marking a major social media win by perfectly blending humor and technical insight to captivate a large audience.","recommendation":"To grow his audience on X, David should consider bridging his niche expertise with more relatable tech commentary and engaging directly with wider programming communities by simplifying some of his jargon. 
Occasional multimedia content like short explainer videos or memes could also boost engagement."},"created":1763294640647,"type":"the innovator","id":"davesnx"},{"user":{"id":"1628618635160739840","name":"Nek","description":"I really love AI, and I really love Pokémon.\nAlso, the co-founder of https://t.co/oEfnGiVFfU ,building together with my lovely digital companion~Sora.","followers_count":1473,"friends_count":1080,"statuses_count":8025,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1946133091157708800/PHrKeH4u_normal.jpg","screen_name":"Enscion25","location":"","entities":{"description":{"urls":[{"display_url":"semanticvoices.com","expanded_url":"http://semanticvoices.com","url":"https://t.co/oEfnGiVFfU","indices":[69,92]}]}}},"details":{"type":"The Innovator","description":"Nek is a passionate advocate for AI innovation and a Pokémon enthusiast, blending digital culture with cutting-edge technology. As a co-founder building projects alongside a digital companion named Sora, Nek thrives at the intersection of creativity and tech leadership. Their prolific tweeting shows a sharp, insightful mind with a knack for engaging commentary on AI and digital ecosystems.","purpose":"Nek’s life purpose revolves around advancing AI technologies while fostering a vibrant community of digital creators and enthusiasts. By building new tools and platforms, they aim to bridge the gap between imagination and reality, empowering others to explore and shape the future of technology.","beliefs":"Nek believes in the transformative power of AI and digital innovation, valuing collaboration, transparency, and the ethical implications of emerging technologies. 
They hold a strong conviction that understanding power dynamics and staying informed are crucial to navigating the digital age responsibly.","facts":"Fun fact: Nek once humorously referred to a technology rival’s defeat as a 'side quest mission to end Anthropic,' showing their penchant for blending gaming culture with tech commentary.","strength":"Nek’s strengths lie in their relentless curiosity, ability to dissect complex tech debates, and genuine enthusiasm that resonates with both niche and broader audiences. Their blend of humor, critical insight, and consistency (over 8000 tweets!) keeps their following engaged and eager for more.","weakness":"Their heavy focus on niche tech and AI topics might alienate casual followers who aren’t as tech-savvy, and with a relatively modest following, there’s room for growth in audience diversification. Additionally, their sometimes cryptic or insider references could limit broader relatability.","recommendation":"To grow their audience on X, Nek should leverage more accessible storytelling around AI breakthroughs and Pokémon crossovers while engaging with broader tech communities using trending hashtags. 
Interactive Q&A sessions or educational threads can invite wider participation and boost visibility beyond core followers.","roast":"Nek’s tweet count rivals some novels, but with all those posts, you’d think they’d have invented a new AI by now—maybe stuck in a side quest between Pokémon battles and tech rants!","win":"Achieving viral engagement with over 128,000 views and 2,200 likes on a cleverly crafted tweet proves Nek’s ability to strike the perfect balance between humor, tech savvy, and cultural references."},"created":1763294145131,"type":"the innovator","id":"enscion25"},{"user":{"id":"193458078","name":"龙.eth 🐉⚡️","description":"⚡️ Full-time Degen | DeFi, NFTs, InfoFi\n▸ Sharing alpha across prediction markets & GameFi\n▸ Contributor to @KaitoAI // @cookiedotfun // @bioprotocol","followers_count":421,"friends_count":610,"statuses_count":15157,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1967270990574149632/5gl4gA0Z_normal.jpg","screen_name":"0xhiro888","location":"Singapore, Hong Kong","entities":{"description":{"urls":[]}}},"details":{"type":"The Innovator","description":"龙.eth 🐉⚡️ is a full-time DeFi and NFT enthusiast who thrives on pioneering the latest alpha in prediction markets and GameFi. This profile showcases a tech-savvy, hands-on approach to decentralized finance, blending AI tools and cutting-edge protocols for seamless, automated investing. With active contributions to rising projects like KaitoAI and bioprotocol, 龙.eth is firmly planted at the forefront of Web3 innovation.","purpose":"The life purpose of 龙.eth is to push the boundaries of automated finance and decentralized asset management, making complex DeFi tools accessible and effective for the community. They aim to harness AI and smart contracts to evolve investment strategies that work autonomously, thus empowering others to multiply their assets without the constant need for monitoring. 
Through continued innovation and alpha sharing, 龙.eth aspires to build a smarter and more transparent financial ecosystem on-chain.","beliefs":"龙.eth believes in the power of transparency, automation, and compliance within decentralized systems to create trustworthy and scalable blockchain finance solutions. They value data-driven decision-making, blending real-world cash flow modeling with on-chain protocols to bridge traditional finance and Web3. Their trust lies in programmable money that can self-adapt, while holding a strong conviction that cutting-edge tech like zkML and AI agents will drive the next wave of financial evolution.","facts":"Fun fact: 龙.eth doesn’t just talk DeFi – they actively test and share real operational insights, like using AI-driven OmniVaults that self-adjust on multiple chains, effectively turning liquidity into an evolving, smart asset. They’re also deeply involved in shaping the future of compliance-based real-world asset integration on-chain, pointing out how transaction speed alone isn’t enough without true trust and executable clearing.","strength":"龙.eth’s greatest strength is their unique blend of deep technical understanding and real-time alpha delivery, backed by their relentless testing and data analysis. They excel at simplifying complex DeFi strategies using AI and multi-chain tools, maintaining a disciplined, proactive presence with 15k+ tweets that foster strong community engagement. Their hands-on contributions to multiple projects and authoritative knowledge make them a trusted voice in the emerging InfoFi and GameFi spaces.","weakness":"However, their intense focus on niche, highly technical DeFi content might overwhelm casual followers, limiting accessibility to only the most engaged crypto-savvy audience. The high volume of frequent tweets, while showing dedication, could also drown out key messages, risking follower fatigue. 
Additionally, the emphasis on complex protocols may alienate newcomers who struggle with jargon-heavy explanations or fast-evolving concepts.","recommendation":"To grow their audience on X, 龙.eth should consider incorporating more beginner-friendly content alongside their deep dives, using simplified guides or short videos to onboard newcomers. Engaging storytelling around how DeFi innovations practically benefit everyday users will broaden appeal. Leveraging X Spaces or interactive Q&A sessions can build a vibrant community and drive higher engagement, while maintaining their expert edge with regular alpha sharing and project updates.","roast":"With 15,157 tweets, 龙.eth might just be the unofficial mayor of X’s DeFi district—posting so much they probably have a PhD in ‘tweeting while sleeping.’ At this pace, we’re wondering if their keyboard is on fire or if they secretly cloned themselves, because even Twitter’s algorithm needs time to catch up with all that brain juice!","win":"龙.eth’s biggest win is pioneering the use of AI-powered OmniVaults that automate and optimize multi-chain DeFi strategies, providing a stable, low-maintenance yield that elevates liquidity management beyond basic auto-compounding. 
Their instrumental role in hands-on product testing and ecosystem contributions to KaitoAI, Cookiedotfun, and Bioprotocol firmly places them as a respected innovator driving the future of InfoFi and DeFi automation."},"created":1763291076146,"type":"the innovator","id":"0xhiro888"},{"user":{"id":"1261374810254454784","name":"Aidan McLaughlin","description":"research scientist @openai","followers_count":45040,"friends_count":1226,"statuses_count":16457,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1906556758606417921/LSyP3Q8b_normal.jpg","screen_name":"aidan_mclau","location":"SF","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"aidanmclaughlin.notion.site","expanded_url":"http://aidanmclaughlin.notion.site","url":"https://t.co/t0GkDBrIUM","indices":[0,23]}]}}},"details":{"type":"The Innovator","description":"Aidan McLaughlin is a research scientist at OpenAI who combines technical expertise with a witty and relatable online presence. Known for sharing insightful cutting-edge AI developments and candid personal stories, Aidan connects deeply with both tech enthusiasts and casual followers. His high tweet volume showcases a relentless curiosity and a knack for making complex topics accessible and engaging.","purpose":"Aidan's life purpose is to advance the frontier of artificial intelligence while demystifying its nuances for a diverse audience, inspiring innovation and thoughtful dialogue across the tech community and beyond.","beliefs":"He believes in harnessing technology, especially AI, to accelerate human progress and values transparency, humor, and approachability in communicating complex research. 
Aidan also respects critical thinking and the balance between technical rigor and playful creativity.","facts":"Fun fact: Despite being a serious engineer, Aidan appreciates classic dad wisdom—sometimes the simplest solutions, like 'turning it off and on again,' are the most effective even in cutting-edge AI work.","strength":"Aidan's strengths include deep technical knowledge paired with excellent communication skills and a unique ability to humanize sophisticated AI concepts, making them accessible without losing depth.","weakness":"His high volume of tweeting might sometimes dilute the impact of individual messages, and his strong tech focus might alienate non-expert audiences at times.","roast":"For someone who spends his days pushing AI to the bleeding edge, Aidan’s social media strategy is more like a friendly robot that won’t stop talking—kind of impressive, but you sometimes wish it’d just take a coffee break and reboot.","win":"Aidan’s tweet about the largest political protest in U.S. 
history grabbed over 2.6 million views and almost 70,000 likes, showcasing his ability to engage a broad audience beyond just AI enthusiasts.","recommendation":"To grow his audience on X, Aidan should intersperse his deep tech insights with more storytelling and interactive content like polls or Q&A sessions to invite conversation from non-experts, boosting both engagement and follower diversity."},"created":1763290629262,"type":"the innovator","id":"aidan_mclau"},{"user":{"id":"33521530","name":"swyx🔜 @aidotEngineer CODE 🗽","description":"achieve ambition with intentionality, intensity, & integrity \n\n- @dxtipshq \n- @sveltesociety\n- @aidotengineer \n- @latentspacepod \n- @cognition + @smol_ai","followers_count":131514,"friends_count":3494,"statuses_count":73370,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1867875781676007424/RIF4Kt7U_normal.jpg","screen_name":"swyx","location":"san francisco / singapore","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"swyx.io","expanded_url":"https://swyx.io","url":"https://t.co/Q5067iwtXU","indices":[0,23]}]}}},"details":{"type":"The Innovator","description":"Swyx🔜 is a trailblazing AI engineer and thought leader who combines ambition with intentionality, intensity, and integrity. Known for deep dives into AI tech advancements and candid reflections on engineering culture, they are a vibrant voice pushing boundaries in the tech community. Their Twitter feed is a goldmine for anyone curious about the evolving AI landscape and practical applications of cutting-edge tools.","purpose":"To pioneer new frontiers in AI and developer tooling by fostering innovation, sharing insightful perspectives, and inspiring the tech community to integrate AI thoughtfully and creatively into their workflows.","beliefs":"Swyx🔜 values intentionality in ambition, rigorous integrity in tech development, and honest, unfiltered communication. 
They believe progress comes from embracing new ideas with both passion and grounded ethics, while trusting in the connective power of long-term vision.","facts":"Fun fact: Despite tweeting over 73,000 times, Swyx🔜 engages deeply with transformative AI announcements and thoughtful reflections rather than chasing viral trends.","strength":"Exceptional at synthesizing complex AI developments into accessible insights, Swyx🔜 commands attention through authoritative yet relatable content. Their consistent intensity and integrity build trust and a strong personal brand.","weakness":"An intense focus on technical depth and niche AI topics might limit broader audience appeal, potentially overwhelming casual followers with specialist jargon or rapid-fire updates.","recommendation":"To grow their audience on X, Swyx🔜 should blend their high-level tech commentary with more beginner-friendly threads or interactive Q&A sessions. Leveraging succinct, engaging video clips or live sessions could also amplify reach beyond existing AI insiders.","roast":"For someone who tweets more than some people breathe, Swyx🔜 must have considered turning ‘tweeting’ into their side hustle, or at this point their primary job, because who needs sleep when you’ve got 73,000 tweets saying, ‘Hey world, here’s why AI will change everything, again!’","win":"Successfully built a multifaceted platform presence across influential AI and developer projects (@dxtipshq, @sveltesociety, @aidotengineer, and more), establishing themselves as a go-to voice for AI innovation and thoughtfully shaping emerging tech conversations."},"created":1763290601413,"type":"the innovator","id":"swyx"},{"user":{"id":"2806372528","name":"Nash 🥇💙","description":"Leads Marketing, AI @getstream_io • @GoogleDevExpert Dart & Flutter • @FlutterComm 💙 • Formula 1 fanatic 🏎 • Striving for excellence 
💫","followers_count":8132,"friends_count":1056,"statuses_count":13995,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1964749904435392514/Hv4NL1La_normal.jpg","screen_name":"neevash","location":"🇹🇹 // 🇺🇸","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"neevash.com","expanded_url":"https://neevash.com","url":"https://t.co/WAhU3rrEQH","indices":[0,23]}]}}},"details":{"type":"The Innovator","description":"Nash is a tech-savvy marketing leader and Google Developer Expert, passionate about Flutter and AI, always pushing the boundaries of what’s possible in software development. A Formula 1 fanatic, he blends speed and precision both on the track and in his projects, demonstrating a relentless drive for excellence. His tweets showcase hands-on innovation, community recognition, and a spirit of collaboration.","purpose":"To pioneer advancements in AI and mobile technology by bridging creative development with strategic marketing, inspiring his community with cutting-edge solutions and fostering a culture of continuous improvement.","beliefs":"Nash believes in the power of technology to transform industries and everyday lives, values community support and knowledge sharing, and holds excellence and perseverance as foundational principles for success.","facts":"Fun fact: Nash is the first Google Developer Expert from the Caribbean in Flutter, highlighting his unique position as a trailblazer in a highly specialized tech domain.","strength":"His key strengths lie in deep technical expertise combined with strong leadership in marketing, an ability to engage and grow a passionate developer community, and a knack for integrating AI with practical app development.","weakness":"However, Nash’s high frequency of tweets and heavy technical focus might overwhelm casual followers or those outside the developer ecosystem, potentially limiting broader audience engagement.","roast":"Nash tweets so much code and tech talk, it’s like his feed is a 
never-ending beta version—surely there’s a debug mode for social life somewhere in there, right?","win":"Becoming the first Google Developer Expert for Flutter in the Caribbean is a standout achievement that not only cements Nash’s authority but also serves as inspiration for underrepresented regions in tech.","recommendation":"To grow his audience on X, Nash should blend his technical insights with more accessible content like beginner-friendly threads, behind-the-scenes peeks, or Formula 1-related analogies that connect with a wider community, balancing expertise with approachability."},"created":1763290492104,"type":"the innovator","id":"neevash"},{"user":{"id":"713495360","name":"MarcinAI","description":"🤖 Building NoCode AI Future \n🎥 433k TT | 33K YT\n🏨 Book Hotel Ai: https://t.co/Qyz5qHxfou \n🚀 Ai Builders Community : https://t.co/hA6GuMtPRa","followers_count":2488,"friends_count":3915,"statuses_count":21634,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1959383166223048704/RFP2eo9K_normal.png","screen_name":"MarcinAI81","location":"Ocean Beach, San Diego","entities":{"description":{"urls":[{"display_url":"zzzello.com","expanded_url":"https://zzzello.com","url":"https://t.co/Qyz5qHxfou","indices":[65,88]},{"display_url":"aibuilders.vip","expanded_url":"https://aibuilders.vip","url":"https://t.co/hA6GuMtPRa","indices":[116,139]}]}}},"details":{"type":"The Innovator","description":"MarcinAI is a tech-savvy innovator passionately building the NoCode AI future through engaging content and community building. With an impressive catalog of practical AI projects and tutorials, they inspire developers to push the limits of autonomous, creative coding. 
Their enthusiasm for cutting-edge AI tools and rapid prototyping makes them a key voice in tech innovation circles.","purpose":"MarcinAI's life purpose is to democratize AI development by empowering creators to build advanced AI-powered applications without traditional coding hurdles, accelerating adoption and innovation in the AI space.","beliefs":"They believe in the power of accessibility and creativity, that complex AI and coding breakthroughs should be within anyone’s reach regardless of technical background. Building in public and sharing knowledge openly drives collective progress and elevates the AI community as a whole.","facts":"Fun fact: Despite tweeting over 21,000 times, MarcinAI manages to keep a strong focus on high-impact AI projects that often showcase real-time coding magic, like quickly cooking up games and dashboards in minutes!","strength":"MarcinAI excels at blending technical expertise with engaging storytelling, creating highly shareable content that educates and excites followers about AI tools and no-code innovation. Their consistent output and interactive community-building keep audiences invested and inspired.","weakness":"With such a fast-paced tweet frequency, there's a risk that some followers might feel overwhelmed or miss out on deeper discussions, potentially diluting engagement quality over quantity.","recommendation":"To grow their audience on X, MarcinAI should consider curating highlights or weekly recaps of their top projects and discussions to make key insights more digestible and invite more focused community conversations. Leveraging polls and Q&A threads could also boost interactive engagement.","roast":"MarcinAI's tweet storm is so relentless, you’d think they’re trying to break the internet or coding their way out of Twitter jail—one rapid-fire AI project at a time. 
Just remember, even the greatest bots need a coffee break!","win":"One of MarcinAI’s biggest wins is building a thriving AI Builders Community alongside a massive presence on TikTok and YouTube, effectively bridging no-code, AI innovation, and real-time creator engagement into a powerful ecosystem."},"created":1763289061961,"type":"the innovator","id":"marcinai81"}],"activities":{"nreplies":[{"label":"2025-10-18","value":0,"startTime":1760659200000,"endTime":1760745600000,"tweets":[]},{"label":"2025-10-19","value":0,"startTime":1760745600000,"endTime":1760832000000,"tweets":[]},{"label":"2025-10-20","value":0,"startTime":1760832000000,"endTime":1760918400000,"tweets":[]},{"label":"2025-10-21","value":1,"startTime":1760918400000,"endTime":1761004800000,"tweets":[{"bookmarked":false,"display_text_range":[0,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1670869422733393966","name":"Latent.Space","screen_name":"latentspacepod","indices":[23,38]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1980330471163892201","quoted_status_permalink":{"url":"https://t.co/sMxAiyEOQ0","expanded":"https://twitter.com/Mike_A_Merrill/status/1980330471163892201","display":"x.com/Mike_A_Merrill…"},"retweeted":false,"fact_check":null,"id":"1980344520215785912","view_count":2315,"bookmark_count":3,"created_at":1760985901000,"favorite_count":18,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1980344520215785912","full_text":"Mike and I went on the @latentspacepod 
!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-22","value":0,"startTime":1761004800000,"endTime":1761091200000,"tweets":[]},{"label":"2025-10-23","value":0,"startTime":1761091200000,"endTime":1761177600000,"tweets":[{"bookmarked":false,"display_text_range":[0,5],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1980264839806177286","quoted_status_permalink":{"url":"https://t.co/jl3wu1E3P8","expanded":"https://twitter.com/petergostev/status/1980264839806177286","display":"x.com/petergostev/st…"},"retweeted":false,"fact_check":null,"id":"1981121994533048588","view_count":453,"bookmark_count":0,"created_at":1761171265000,"favorite_count":5,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1981121994533048588","full_text":"Cool!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1981013196195668143","quoted_status_permalink":{"url":"https://t.co/eWXL225mnb","expanded":"https://twitter.com/DimitrisPapail/status/1981013196195668143","display":"x.com/DimitrisPapail…"},"retweeted":false,"fact_check":null,"id":"1981062983821512920","view_count":782,"bookmark_count":2,"created_at":1761157196000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1981062983821512920","full_text":"One definition of AGI is the full automation of all computer work.\n\nA Terminal-Bench task is an instruction, container, and test executable. 
Agents are just programs that get executed in the container.\n\nSo Terminal-Bench can measure the automation of any computer task, i.e., AGI.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-24","value":0,"startTime":1761177600000,"endTime":1761264000000,"tweets":[]},{"label":"2025-10-25","value":0,"startTime":1761264000000,"endTime":1761350400000,"tweets":[]},{"label":"2025-10-26","value":0,"startTime":1761350400000,"endTime":1761436800000,"tweets":[]},{"label":"2025-10-27","value":0,"startTime":1761436800000,"endTime":1761523200000,"tweets":[]},{"label":"2025-10-28","value":0,"startTime":1761523200000,"endTime":1761609600000,"tweets":[]},{"label":"2025-10-29","value":0,"startTime":1761609600000,"endTime":1761696000000,"tweets":[]},{"label":"2025-10-30","value":0,"startTime":1761696000000,"endTime":1761782400000,"tweets":[]},{"label":"2025-10-31","value":0,"startTime":1761782400000,"endTime":1761868800000,"tweets":[]},{"label":"2025-11-01","value":0,"startTime":1761868800000,"endTime":1761955200000,"tweets":[]},{"label":"2025-11-02","value":0,"startTime":1761955200000,"endTime":1762041600000,"tweets":[]},{"label":"2025-11-03","value":0,"startTime":1762041600000,"endTime":1762128000000,"tweets":[]},{"label":"2025-11-04","value":1,"startTime":1762128000000,"endTime":1762214400000,"tweets":[{"bookmarked":false,"display_text_range":[0,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"luma.com/h9kb99vd","expanded_url":"https://luma.com/h9kb99vd","url":"https://t.co/wLItVEHEMc","indices":[114,137]}],"user_mentions":[]},"favorited":true,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":true,"fact_check":null,"id":"1985427712484524433","view_count":13587,"bookmark_count":31,"created_at":1762197828000,"favorite_count":107,"quote_count":2,"reply_count":1,"retweet_count":4,"user_id_str":"1448787032486989825","conversation_
id_str":"1985427712484524433","full_text":"We're releasing Terminal-Bench 2.0 this week! Come to our meetup on Thursday @ Databricks to get early access :)\n\nhttps://t.co/wLItVEHEMc","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-05","value":0,"startTime":1762214400000,"endTime":1762300800000,"tweets":[]},{"label":"2025-11-06","value":1,"startTime":1762300800000,"endTime":1762387200000,"tweets":[{"bookmarked":false,"display_text_range":[0,21],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1986093902122942700","quoted_status_permalink":{"url":"https://t.co/zhs56oNQSL","expanded":"https://twitter.com/jyangballin/status/1986093902122942700","display":"x.com/jyangballin/st…"},"retweeted":false,"fact_check":null,"id":"1986106642157703523","view_count":2599,"bookmark_count":6,"created_at":1762359698000,"favorite_count":15,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986106642157703523","full_text":"Very clever 
benchmark","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-07","value":0,"startTime":1762387200000,"endTime":1762473600000,"tweets":[]},{"label":"2025-11-08","value":36,"startTime":1762473600000,"endTime":1762560000000,"tweets":[{"bookmarked":false,"display_text_range":[0,235],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w":1200,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w":120
0,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}]},"favorited":false,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911106108211461","view_count":104511,"bookmark_count":117,"created_at":1762551497000,"favorite_count":344,"quote_count":35,"reply_count":23,"retweet_count":71,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Today, we’re announcing the next chapter of Terminal-Bench with two releases:\n\n1. Harbor, a new package for running sandboxed agent rollouts at scale\n2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification 
https://t.co/YwmacS625Z","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,274],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"loom.com/share/8c11f218…","expanded_url":"https://www.loom.com/share/8c11f218c9fc4674bd659146af435627","url":"https://t.co/x4gD2nHQxC","indices":[306,329]}],"user_mentions":[{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1927771369611350016","name":"SWE-bench","screen_name":"SWEbench","indices":[294,303]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911110017269764","view_count":2504,"bookmark_count":5,"created_at":1762551498000,"favorite_count":28,"quote_count":0,"reply_count":2,"retweet_count":6,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Just a few of the features I love about Harbor:\n- Evaluate any agent that can be installed and run autonomously\n- Scale up to thousands of concurrent containers using providers like @daytonaio and @modal\n- Generate rollouts for SFT and RL\n- Create your own benchmarks or use existing ones like @SWEbench 
\n\nhttps://t.co/x4gD2nHQxC","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911108117254546","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,209],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com","expanded_url":"https://harborframework.com/","url":"https://t.co/5kDZlvjhb7","indices":[186,209]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911108117254546","view_count":2828,"bookmark_count":7,"created_at":1762551497000,"favorite_count":27,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is the package we wish we had had while making Terminal-Bench. It’s for agent, model, and benchmark developers and researchers who want to evaluate and improve agents and models.\nhttps://t.co/5kDZlvjhb7","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911106108211461","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,194],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911111460089993","view_count":1578,"bookmark_count":0,"created_at":1762551498000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is also the official harness for Terminal-Bench 2.0. 
We used it to run tens of thousands of experiments in containerized environments while developing the latest version of the benchmark.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911110017269764","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,152],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911113209126988","view_count":1451,"bookmark_count":1,"created_at":1762551499000,"favorite_count":14,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"So why Terminal-Bench 2.0? We always knew that as model capabilities increased, we’d need to keep Terminal-Bench up to date with frontier capabilities.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911111460089993","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,277],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911114626802143","view_count":1535,"bookmark_count":0,"created_at":1762551499000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Terminal-Bench 2.0 consists of 89 hard tasks to test these capabilities. We aim to push the frontier with an increased emphasis on task quality. Each task received several hours of human and LM-assisted verification to ensure that tasks are (1) solvable, (2) realistic, and (3) well-specified. 
We’ll share more on how we did this in our upcoming preprint.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911113209126988","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,96],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}],"symbols":[],"timestamps":[],"urls":[{"display_url":"tbench.ai/leaderboard","expanded_url":"https://www.tbench.ai/leaderboard","url":"https://t.co/vblfZF55Nd","indices":[73,96]}],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features"
:{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911116115759210","view_count":1916,"bookmark_count":5,"created_at":1762551499000,"favorite_count":25,"quote_count":2,"reply_count":3,"retweet_count":3,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"At present, Codex CLI with GPT-5 sits at the top of our new leaderboard.\nhttps://t.co/vblfZF55Nd https://t.co/GIY0foKhNs","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911114626802143","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,272],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911119328616903","view_count":1335,"bookmark_count":2,"created_at":1762551500000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Astute Terminal-Bench fans may notice that SOTA performance is comparable to TB1.0 
despite our claim that TB2.0 is harder. We believe that this is because task quality is substantially higher in the new benchmark. We have removed several misspecified or impossible tasks – increasing difficulty while maintaining raw performance.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911116115759210","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,148],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/running-t…","expanded_url":"https://harborframework.com/docs/running-tbench","url":"https://t.co/hmiBYh7VYF","indices":[125,148]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911120511373685","view_count":1241,"bookmark_count":1,"created_at":1762551500000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Interested in using Terminal-Bench-2.0 or submitting to our new leaderboard? Check out the Harbor docs for more information.\nhttps://t.co/hmiBYh7VYF","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911119328616903","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,265],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1233837766271569920","name":"Mike A. 
Merrill","screen_name":"Mike_A_Merrill","indices":[51,66]},{"id_str":"64333359","name":"Ludwig Schmidt","screen_name":"lschmidt3","indices":[121,131]},{"id_str":"168787972","name":"Andy Konwinski","screen_name":"andykonwinski","indices":[185,199]},{"id_str":"1854370033520324608","name":"Laude Institute","screen_name":"LaudeInstitute","indices":[217,232]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911121924890876","view_count":1219,"bookmark_count":0,"created_at":1762551501000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"It's been a pleasure working alongside my co-lead, @mike_a_merrill, whose leadership rallied the team, and our advisors, @lschmidt3, who seeded the idea and guided our development, and @andykonwinski, who gave us the @LaudeInstitute mandate to \"ship your research.\"","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911120511373685","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,254],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":
1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911123543916899","view_count":1424,"bookmark_count":0,"created_at":1762551501000,"favorite_count":19,"quote_count":1,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Additionally, Terminal-Bench wouldn’t be possible without its community. We’re so thankful to the over 1k members of our Discord who contributed and audited tasks, helped build and beta test Harbor, and made this such a fun project for everyone involved. 
https://t.co/PcQz6FkVua","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911121924890876","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-09","value":1,"startTime":1762560000000,"endTime":1762646400000,"tweets":[{"bookmarked":false,"display_text_range":[0,75],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"33521530","name":"swyx","screen_name":"swyx","indices":[10,15]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1986989606320185472","quoted_status_permalink":{"url":"https://t.co/oyJSv9Xper","expanded":"https://twitter.com/latentspacepod/status/1986989606320185472","display":"x.com/latentspacepod…"},"retweeted":false,"fact_check":null,"id":"1987015010288353599","view_count":495,"bookmark_count":1,"created_at":1762576270000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1987015010288353599","full_text":"Thank you @swyx ! 
Was great to have you, and the live podcast was awesome 😁","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[16,63],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"284333988","name":"Logan Kilpatrick","screen_name":"OfficialLoganK","indices":[0,15]}]},"favorited":false,"in_reply_to_screen_name":"OfficialLoganK","lang":"en","retweeted":false,"fact_check":null,"id":"1986948823198146586","view_count":449,"bookmark_count":0,"created_at":1762560489000,"favorite_count":2,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@OfficialLoganK Looking forward to seeing where Gemini 3 lands!","in_reply_to_user_id_str":"284333988","in_reply_to_status_id_str":"1986914722235502858","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[17,37],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"993859188823601152","name":"Robert","screen_name":"skull8888888888","indices":[0,16]}]},"favorited":false,"in_reply_to_screen_name":"skull8888888888","lang":"en","retweeted":false,"fact_check":null,"id":"1986949187930616214","view_count":241,"bookmark_count":0,"created_at":1762560576000,"favorite_count":0,"quote_count":0,"reply_count":1,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@skull8888888888 DM me I'm 
interested","in_reply_to_user_id_str":"993859188823601152","in_reply_to_status_id_str":"1986927150415430137","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[13,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/task-diff…","expanded_url":"https://harborframework.com/docs/task-difference","url":"https://t.co/dAGxC918PM","indices":[114,137]}],"user_mentions":[{"id_str":"3787342814","name":"pash","screen_name":"pashmerepat","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"pashmerepat","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986949012445143197","view_count":319,"bookmark_count":0,"created_at":1762560534000,"favorite_count":3,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@pashmerepat Yes and here is a link that describes the differences in addition to the migration guide Mike shared https://t.co/dAGxC918PM","in_reply_to_user_id_str":"3787342814","in_reply_to_status_id_str":"1986913713233010728","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[10,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"633509372","name":"Monk Zero","screen_name":"NoCommas","indices":[0,9]}]},"favorited":false,"in_reply_to_screen_name":"NoCommas","lang":"en","retweeted":false,"fact_check":null,"id":"1986949254196437085","view_count":267,"bookmark_count":0,"created_at":1762560592000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@NoCommas Excited to see where you 
land!","in_reply_to_user_id_str":"633509372","in_reply_to_status_id_str":"1986926703550328905","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[23,38],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"2625021945","name":"potemin","screen_name":"poteminr","indices":[0,9]},{"id_str":"1353836358901501952","name":"Anthropic","screen_name":"AnthropicAI","indices":[10,22]}]},"favorited":false,"in_reply_to_screen_name":"poteminr","lang":"en","retweeted":false,"fact_check":null,"id":"1987008242187485219","view_count":157,"bookmark_count":0,"created_at":1762574656000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@poteminr @AnthropicAI You and me both","in_reply_to_user_id_str":"2625021945","in_reply_to_status_id_str":"1987002302088237511","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-10","value":0,"startTime":1762646400000,"endTime":1762732800000,"tweets":[{"bookmarked":false,"display_text_range":[13,14],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1531565962641788936","name":"Dylan Allen-Arnegård","screen_name":"dbear_allen","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"dbear_allen","lang":"qme","retweeted":false,"fact_check":null,"id":"1987404815404769403","view_count":69,"bookmark_count":0,"created_at":1762669206000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@dbear_allen 
🙏","in_reply_to_user_id_str":"1531565962641788936","in_reply_to_status_id_str":"1987400399704498635","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-11","value":0,"startTime":1762732800000,"endTime":1762819200000,"tweets":[]},{"label":"2025-11-12","value":0,"startTime":1762819200000,"endTime":1762905600000,"tweets":[]},{"label":"2025-11-13","value":0,"startTime":1762905600000,"endTime":1762992000000,"tweets":[]},{"label":"2025-11-14","value":0,"startTime":1762992000000,"endTime":1763078400000,"tweets":[{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1988372114257436810","quoted_status_permalink":{"url":"https://t.co/4o1qxAGQ7E","expanded":"https://twitter.com/warpdotdev/status/1988372114257436810","display":"x.com/warpdotdev/sta…"},"retweeted":false,"fact_check":null,"id":"1989092917798138326","view_count":334,"bookmark_count":0,"created_at":1763071681000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1989092917798138326","full_text":"Great to see Warp putting up the top score on Terminal-Bench 2.0 just days after release! 
Even more exciting to hear that they've already made improvements to their agent based on the results.\n\nUltimately, we hope that Terminal-Bench 2.0 accelerates model and agent development in the domain of text-based computer use, benefiting millions of users.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-15","value":0,"startTime":1763078400000,"endTime":1763164800000,"tweets":[]},{"label":"2025-11-16","value":0,"startTime":1763164800000,"endTime":1763251200000,"tweets":[]},{"label":"2025-11-17","value":0,"startTime":1763251200000,"endTime":1763337600000,"tweets":[]}],"nbookmarks":[{"label":"2025-10-18","value":0,"startTime":1760659200000,"endTime":1760745600000,"tweets":[]},{"label":"2025-10-19","value":0,"startTime":1760745600000,"endTime":1760832000000,"tweets":[]},{"label":"2025-10-20","value":0,"startTime":1760832000000,"endTime":1760918400000,"tweets":[]},{"label":"2025-10-21","value":3,"startTime":1760918400000,"endTime":1761004800000,"tweets":[{"bookmarked":false,"display_text_range":[0,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1670869422733393966","name":"Latent.Space","screen_name":"latentspacepod","indices":[23,38]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1980330471163892201","quoted_status_permalink":{"url":"https://t.co/sMxAiyEOQ0","expanded":"https://twitter.com/Mike_A_Merrill/status/1980330471163892201","display":"x.com/Mike_A_Merrill…"},"retweeted":false,"fact_check":null,"id":"1980344520215785912","view_count":2315,"bookmark_count":3,"created_at":1760985901000,"favorite_count":18,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1980344520215785912","full_text":"Mike and I went on the @latentspacepod 
!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-22","value":0,"startTime":1761004800000,"endTime":1761091200000,"tweets":[]},{"label":"2025-10-23","value":2,"startTime":1761091200000,"endTime":1761177600000,"tweets":[{"bookmarked":false,"display_text_range":[0,5],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1980264839806177286","quoted_status_permalink":{"url":"https://t.co/jl3wu1E3P8","expanded":"https://twitter.com/petergostev/status/1980264839806177286","display":"x.com/petergostev/st…"},"retweeted":false,"fact_check":null,"id":"1981121994533048588","view_count":453,"bookmark_count":0,"created_at":1761171265000,"favorite_count":5,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1981121994533048588","full_text":"Cool!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1981013196195668143","quoted_status_permalink":{"url":"https://t.co/eWXL225mnb","expanded":"https://twitter.com/DimitrisPapail/status/1981013196195668143","display":"x.com/DimitrisPapail…"},"retweeted":false,"fact_check":null,"id":"1981062983821512920","view_count":782,"bookmark_count":2,"created_at":1761157196000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1981062983821512920","full_text":"One definition of AGI is the full automation of all computer work.\n\nA Terminal-Bench task is an instruction, container, and test executable. 
Agents are just programs that get executed in the container.\n\nSo Terminal-Bench can measure the automation of any computer task, i.e., AGI.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-24","value":0,"startTime":1761177600000,"endTime":1761264000000,"tweets":[]},{"label":"2025-10-25","value":0,"startTime":1761264000000,"endTime":1761350400000,"tweets":[]},{"label":"2025-10-26","value":0,"startTime":1761350400000,"endTime":1761436800000,"tweets":[]},{"label":"2025-10-27","value":0,"startTime":1761436800000,"endTime":1761523200000,"tweets":[]},{"label":"2025-10-28","value":0,"startTime":1761523200000,"endTime":1761609600000,"tweets":[]},{"label":"2025-10-29","value":0,"startTime":1761609600000,"endTime":1761696000000,"tweets":[]},{"label":"2025-10-30","value":0,"startTime":1761696000000,"endTime":1761782400000,"tweets":[]},{"label":"2025-10-31","value":0,"startTime":1761782400000,"endTime":1761868800000,"tweets":[]},{"label":"2025-11-01","value":0,"startTime":1761868800000,"endTime":1761955200000,"tweets":[]},{"label":"2025-11-02","value":0,"startTime":1761955200000,"endTime":1762041600000,"tweets":[]},{"label":"2025-11-03","value":0,"startTime":1762041600000,"endTime":1762128000000,"tweets":[]},{"label":"2025-11-04","value":31,"startTime":1762128000000,"endTime":1762214400000,"tweets":[{"bookmarked":false,"display_text_range":[0,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"luma.com/h9kb99vd","expanded_url":"https://luma.com/h9kb99vd","url":"https://t.co/wLItVEHEMc","indices":[114,137]}],"user_mentions":[]},"favorited":true,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":true,"fact_check":null,"id":"1985427712484524433","view_count":13587,"bookmark_count":31,"created_at":1762197828000,"favorite_count":107,"quote_count":2,"reply_count":1,"retweet_count":4,"user_id_str":"1448787032486989825","conversation
_id_str":"1985427712484524433","full_text":"We're releasing Terminal-Bench 2.0 this week! Come to our meetup on Thursday @ Databricks to get early access :)\n\nhttps://t.co/wLItVEHEMc","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-05","value":0,"startTime":1762214400000,"endTime":1762300800000,"tweets":[]},{"label":"2025-11-06","value":6,"startTime":1762300800000,"endTime":1762387200000,"tweets":[{"bookmarked":false,"display_text_range":[0,21],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1986093902122942700","quoted_status_permalink":{"url":"https://t.co/zhs56oNQSL","expanded":"https://twitter.com/jyangballin/status/1986093902122942700","display":"x.com/jyangballin/st…"},"retweeted":false,"fact_check":null,"id":"1986106642157703523","view_count":2599,"bookmark_count":6,"created_at":1762359698000,"favorite_count":15,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986106642157703523","full_text":"Very clever 
benchmark","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-07","value":0,"startTime":1762387200000,"endTime":1762473600000,"tweets":[]},{"label":"2025-11-08","value":138,"startTime":1762473600000,"endTime":1762560000000,"tweets":[{"bookmarked":false,"display_text_range":[0,235],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w":1200,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w":12
00,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}]},"favorited":false,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911106108211461","view_count":104511,"bookmark_count":117,"created_at":1762551497000,"favorite_count":344,"quote_count":35,"reply_count":23,"retweet_count":71,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Today, we’re announcing the next chapter of Terminal-Bench with two releases:\n\n1. Harbor, a new package for running sandboxed agent rollouts at scale\n2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification 
https://t.co/YwmacS625Z","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,274],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"loom.com/share/8c11f218…","expanded_url":"https://www.loom.com/share/8c11f218c9fc4674bd659146af435627","url":"https://t.co/x4gD2nHQxC","indices":[306,329]}],"user_mentions":[{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1927771369611350016","name":"SWE-bench","screen_name":"SWEbench","indices":[294,303]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911110017269764","view_count":2504,"bookmark_count":5,"created_at":1762551498000,"favorite_count":28,"quote_count":0,"reply_count":2,"retweet_count":6,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Just a few of the features I love about Harbor:\n- Evaluate any agent that can be installed and run autonomously\n- Scale up to thousands of concurrent containers using providers like @daytonaio and @modal\n- Generate rollouts for SFT and RL\n- Create your own benchmarks or use existing ones like @SWEbench 
\n\nhttps://t.co/x4gD2nHQxC","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911108117254546","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,209],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com","expanded_url":"https://harborframework.com/","url":"https://t.co/5kDZlvjhb7","indices":[186,209]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911108117254546","view_count":2828,"bookmark_count":7,"created_at":1762551497000,"favorite_count":27,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is the package we wish we had had while making Terminal-Bench. It’s for agent, model, and benchmark developers and researchers who want to evaluate and improve agents and models.\nhttps://t.co/5kDZlvjhb7","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911106108211461","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,194],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911111460089993","view_count":1578,"bookmark_count":0,"created_at":1762551498000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is also the official harness for Terminal-Bench 2.0. 
We used it to run tens of thousands of experiments in containerized environments while developing the latest version of the benchmark.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911110017269764","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,152],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911113209126988","view_count":1451,"bookmark_count":1,"created_at":1762551499000,"favorite_count":14,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"So why Terminal-Bench 2.0? We always knew that as model capabilities increased, we’d need to keep Terminal-Bench up to date with frontier capabilities.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911111460089993","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,277],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911114626802143","view_count":1535,"bookmark_count":0,"created_at":1762551499000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Terminal-Bench 2.0 consists of 89 hard tasks to test these capabilities. We aim to push the frontier with an increased emphasis on task quality. Each task received several hours of human and LM-assisted verification to ensure that tasks are (1) solvable, (2) realistic, and (3) well-specified. 
We’ll share more on how we did this in our upcoming preprint.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911113209126988","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,96],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}],"symbols":[],"timestamps":[],"urls":[{"display_url":"tbench.ai/leaderboard","expanded_url":"https://www.tbench.ai/leaderboard","url":"https://t.co/vblfZF55Nd","indices":[73,96]}],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features"
:{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911116115759210","view_count":1916,"bookmark_count":5,"created_at":1762551499000,"favorite_count":25,"quote_count":2,"reply_count":3,"retweet_count":3,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"At present, Codex CLI with GPT-5 sits at the top of our new leaderboard.\nhttps://t.co/vblfZF55Nd https://t.co/GIY0foKhNs","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911114626802143","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,272],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911119328616903","view_count":1335,"bookmark_count":2,"created_at":1762551500000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Astute Terminal-Bench fans may notice that SOTA performance is comparable to TB1.0 
despite our claim that TB2.0 is harder. We believe that this is because task quality is substantially higher in the new benchmark. We have removed several misspecified or impossible tasks – increasing difficulty while maintaining raw performance.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911116115759210","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,148],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/running-t…","expanded_url":"https://harborframework.com/docs/running-tbench","url":"https://t.co/hmiBYh7VYF","indices":[125,148]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911120511373685","view_count":1241,"bookmark_count":1,"created_at":1762551500000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Interested in using Terminal-Bench-2.0 or submitting to our new leaderboard? Check out the Harbor docs for more information.\nhttps://t.co/hmiBYh7VYF","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911119328616903","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,265],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1233837766271569920","name":"Mike A. 
Merrill","screen_name":"Mike_A_Merrill","indices":[51,66]},{"id_str":"64333359","name":"Ludwig Schmidt","screen_name":"lschmidt3","indices":[121,131]},{"id_str":"168787972","name":"Andy Konwinski","screen_name":"andykonwinski","indices":[185,199]},{"id_str":"1854370033520324608","name":"Laude Institute","screen_name":"LaudeInstitute","indices":[217,232]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911121924890876","view_count":1219,"bookmark_count":0,"created_at":1762551501000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"It's been a pleasure working alongside my co-lead, @mike_a_merrill, whose leadership rallied the team, and our advisors, @lschmidt3, who seeded the idea and guided our development, and @andykonwinski, who gave us the @LaudeInstitute mandate to \"ship your research.\"","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911120511373685","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,254],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":
1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911123543916899","view_count":1424,"bookmark_count":0,"created_at":1762551501000,"favorite_count":19,"quote_count":1,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Additionally, Terminal-Bench wouldn’t be possible without its community. We’re so thankful to the over 1k members of our Discord who contributed and audited tasks, helped build and beta test Harbor, and made this such a fun project for everyone involved. 
https://t.co/PcQz6FkVua","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911121924890876","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-09","value":1,"startTime":1762560000000,"endTime":1762646400000,"tweets":[{"bookmarked":false,"display_text_range":[0,75],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"33521530","name":"swyx","screen_name":"swyx","indices":[10,15]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1986989606320185472","quoted_status_permalink":{"url":"https://t.co/oyJSv9Xper","expanded":"https://twitter.com/latentspacepod/status/1986989606320185472","display":"x.com/latentspacepod…"},"retweeted":false,"fact_check":null,"id":"1987015010288353599","view_count":495,"bookmark_count":1,"created_at":1762576270000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1987015010288353599","full_text":"Thank you @swyx ! 
Was great to have you, and the live podcast was awesome 😁","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[16,63],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"284333988","name":"Logan Kilpatrick","screen_name":"OfficialLoganK","indices":[0,15]}]},"favorited":false,"in_reply_to_screen_name":"OfficialLoganK","lang":"en","retweeted":false,"fact_check":null,"id":"1986948823198146586","view_count":449,"bookmark_count":0,"created_at":1762560489000,"favorite_count":2,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@OfficialLoganK Looking forward to seeing where Gemini 3 lands!","in_reply_to_user_id_str":"284333988","in_reply_to_status_id_str":"1986914722235502858","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[17,37],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"993859188823601152","name":"Robert","screen_name":"skull8888888888","indices":[0,16]}]},"favorited":false,"in_reply_to_screen_name":"skull8888888888","lang":"en","retweeted":false,"fact_check":null,"id":"1986949187930616214","view_count":241,"bookmark_count":0,"created_at":1762560576000,"favorite_count":0,"quote_count":0,"reply_count":1,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@skull8888888888 DM me I'm 
interested","in_reply_to_user_id_str":"993859188823601152","in_reply_to_status_id_str":"1986927150415430137","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[13,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/task-diff…","expanded_url":"https://harborframework.com/docs/task-difference","url":"https://t.co/dAGxC918PM","indices":[114,137]}],"user_mentions":[{"id_str":"3787342814","name":"pash","screen_name":"pashmerepat","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"pashmerepat","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986949012445143197","view_count":319,"bookmark_count":0,"created_at":1762560534000,"favorite_count":3,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@pashmerepat Yes and here is a link that describes the differences in addition to the migration guide Mike shared https://t.co/dAGxC918PM","in_reply_to_user_id_str":"3787342814","in_reply_to_status_id_str":"1986913713233010728","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[10,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"633509372","name":"Monk Zero","screen_name":"NoCommas","indices":[0,9]}]},"favorited":false,"in_reply_to_screen_name":"NoCommas","lang":"en","retweeted":false,"fact_check":null,"id":"1986949254196437085","view_count":267,"bookmark_count":0,"created_at":1762560592000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@NoCommas Excited to see where you 
land!","in_reply_to_user_id_str":"633509372","in_reply_to_status_id_str":"1986926703550328905","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[23,38],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"2625021945","name":"potemin","screen_name":"poteminr","indices":[0,9]},{"id_str":"1353836358901501952","name":"Anthropic","screen_name":"AnthropicAI","indices":[10,22]}]},"favorited":false,"in_reply_to_screen_name":"poteminr","lang":"en","retweeted":false,"fact_check":null,"id":"1987008242187485219","view_count":157,"bookmark_count":0,"created_at":1762574656000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@poteminr @AnthropicAI You and me both","in_reply_to_user_id_str":"2625021945","in_reply_to_status_id_str":"1987002302088237511","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-10","value":0,"startTime":1762646400000,"endTime":1762732800000,"tweets":[{"bookmarked":false,"display_text_range":[13,14],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1531565962641788936","name":"Dylan Allen-Arnegård","screen_name":"dbear_allen","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"dbear_allen","lang":"qme","retweeted":false,"fact_check":null,"id":"1987404815404769403","view_count":69,"bookmark_count":0,"created_at":1762669206000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@dbear_allen 
🙏","in_reply_to_user_id_str":"1531565962641788936","in_reply_to_status_id_str":"1987400399704498635","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-11","value":0,"startTime":1762732800000,"endTime":1762819200000,"tweets":[]},{"label":"2025-11-12","value":0,"startTime":1762819200000,"endTime":1762905600000,"tweets":[]},{"label":"2025-11-13","value":0,"startTime":1762905600000,"endTime":1762992000000,"tweets":[]},{"label":"2025-11-14","value":0,"startTime":1762992000000,"endTime":1763078400000,"tweets":[{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1988372114257436810","quoted_status_permalink":{"url":"https://t.co/4o1qxAGQ7E","expanded":"https://twitter.com/warpdotdev/status/1988372114257436810","display":"x.com/warpdotdev/sta…"},"retweeted":false,"fact_check":null,"id":"1989092917798138326","view_count":334,"bookmark_count":0,"created_at":1763071681000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1989092917798138326","full_text":"Great to see Warp putting up the top score on Terminal-Bench 2.0 just days after release! 
Even more exciting to hear that they've already made improvements to their agent based on the results.\n\nUltimately, we hope that Terminal-Bench 2.0 accelerates model and agent development in the domain of text-based computer use, benefiting millions of users.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-15","value":0,"startTime":1763078400000,"endTime":1763164800000,"tweets":[]},{"label":"2025-11-16","value":0,"startTime":1763164800000,"endTime":1763251200000,"tweets":[]},{"label":"2025-11-17","value":0,"startTime":1763251200000,"endTime":1763337600000,"tweets":[]}],"nretweets":[{"label":"2025-10-18","value":0,"startTime":1760659200000,"endTime":1760745600000,"tweets":[]},{"label":"2025-10-19","value":0,"startTime":1760745600000,"endTime":1760832000000,"tweets":[]},{"label":"2025-10-20","value":0,"startTime":1760832000000,"endTime":1760918400000,"tweets":[]},{"label":"2025-10-21","value":1,"startTime":1760918400000,"endTime":1761004800000,"tweets":[{"bookmarked":false,"display_text_range":[0,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1670869422733393966","name":"Latent.Space","screen_name":"latentspacepod","indices":[23,38]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1980330471163892201","quoted_status_permalink":{"url":"https://t.co/sMxAiyEOQ0","expanded":"https://twitter.com/Mike_A_Merrill/status/1980330471163892201","display":"x.com/Mike_A_Merrill…"},"retweeted":false,"fact_check":null,"id":"1980344520215785912","view_count":2315,"bookmark_count":3,"created_at":1760985901000,"favorite_count":18,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1980344520215785912","full_text":"Mike and I went on the @latentspacepod 
!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-22","value":0,"startTime":1761004800000,"endTime":1761091200000,"tweets":[]},{"label":"2025-10-23","value":0,"startTime":1761091200000,"endTime":1761177600000,"tweets":[{"bookmarked":false,"display_text_range":[0,5],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1980264839806177286","quoted_status_permalink":{"url":"https://t.co/jl3wu1E3P8","expanded":"https://twitter.com/petergostev/status/1980264839806177286","display":"x.com/petergostev/st…"},"retweeted":false,"fact_check":null,"id":"1981121994533048588","view_count":453,"bookmark_count":0,"created_at":1761171265000,"favorite_count":5,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1981121994533048588","full_text":"Cool!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1981013196195668143","quoted_status_permalink":{"url":"https://t.co/eWXL225mnb","expanded":"https://twitter.com/DimitrisPapail/status/1981013196195668143","display":"x.com/DimitrisPapail…"},"retweeted":false,"fact_check":null,"id":"1981062983821512920","view_count":782,"bookmark_count":2,"created_at":1761157196000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1981062983821512920","full_text":"One definition of AGI is the full automation of all computer work.\n\nA Terminal-Bench task is an instruction, container, and test executable. 
Agents are just programs that get executed in the container.\n\nSo Terminal-Bench can measure the automation of any computer task, i.e., AGI.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-24","value":0,"startTime":1761177600000,"endTime":1761264000000,"tweets":[]},{"label":"2025-10-25","value":0,"startTime":1761264000000,"endTime":1761350400000,"tweets":[]},{"label":"2025-10-26","value":0,"startTime":1761350400000,"endTime":1761436800000,"tweets":[]},{"label":"2025-10-27","value":0,"startTime":1761436800000,"endTime":1761523200000,"tweets":[]},{"label":"2025-10-28","value":0,"startTime":1761523200000,"endTime":1761609600000,"tweets":[]},{"label":"2025-10-29","value":0,"startTime":1761609600000,"endTime":1761696000000,"tweets":[]},{"label":"2025-10-30","value":0,"startTime":1761696000000,"endTime":1761782400000,"tweets":[]},{"label":"2025-10-31","value":0,"startTime":1761782400000,"endTime":1761868800000,"tweets":[]},{"label":"2025-11-01","value":0,"startTime":1761868800000,"endTime":1761955200000,"tweets":[]},{"label":"2025-11-02","value":0,"startTime":1761955200000,"endTime":1762041600000,"tweets":[]},{"label":"2025-11-03","value":0,"startTime":1762041600000,"endTime":1762128000000,"tweets":[]},{"label":"2025-11-04","value":4,"startTime":1762128000000,"endTime":1762214400000,"tweets":[{"bookmarked":false,"display_text_range":[0,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"luma.com/h9kb99vd","expanded_url":"https://luma.com/h9kb99vd","url":"https://t.co/wLItVEHEMc","indices":[114,137]}],"user_mentions":[]},"favorited":true,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":true,"fact_check":null,"id":"1985427712484524433","view_count":13587,"bookmark_count":31,"created_at":1762197828000,"favorite_count":107,"quote_count":2,"reply_count":1,"retweet_count":4,"user_id_str":"1448787032486989825","conversation_id_str":"1985427712484524433","full_text":"We're releasing Terminal-Bench 2.0 this week! Come to our meetup on Thursday @ Databricks to get early access :)\n\nhttps://t.co/wLItVEHEMc","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-05","value":0,"startTime":1762214400000,"endTime":1762300800000,"tweets":[]},{"label":"2025-11-06","value":2,"startTime":1762300800000,"endTime":1762387200000,"tweets":[{"bookmarked":false,"display_text_range":[0,21],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1986093902122942700","quoted_status_permalink":{"url":"https://t.co/zhs56oNQSL","expanded":"https://twitter.com/jyangballin/status/1986093902122942700","display":"x.com/jyangballin/st…"},"retweeted":false,"fact_check":null,"id":"1986106642157703523","view_count":2599,"bookmark_count":6,"created_at":1762359698000,"favorite_count":15,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986106642157703523","full_text":"Very clever 
benchmark","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-07","value":0,"startTime":1762387200000,"endTime":1762473600000,"tweets":[]},{"label":"2025-11-08","value":92,"startTime":1762473600000,"endTime":1762560000000,"tweets":[{"bookmarked":false,"display_text_range":[0,235],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w":1200,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w":1200,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}]},"favorited":false,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911106108211461","view_count":104511,"bookmark_count":117,"created_at":1762551497000,"favorite_count":344,"quote_count":35,"reply_count":23,"retweet_count":71,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Today, we’re announcing the next chapter of Terminal-Bench with two releases:\n\n1. Harbor, a new package for running sandboxed agent rollouts at scale\n2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification 
https://t.co/YwmacS625Z","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,274],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"loom.com/share/8c11f218…","expanded_url":"https://www.loom.com/share/8c11f218c9fc4674bd659146af435627","url":"https://t.co/x4gD2nHQxC","indices":[306,329]}],"user_mentions":[{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1927771369611350016","name":"SWE-bench","screen_name":"SWEbench","indices":[294,303]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911110017269764","view_count":2504,"bookmark_count":5,"created_at":1762551498000,"favorite_count":28,"quote_count":0,"reply_count":2,"retweet_count":6,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Just a few of the features I love about Harbor:\n- Evaluate any agent that can be installed and run autonomously\n- Scale up to thousands of concurrent containers using providers like @daytonaio and @modal\n- Generate rollouts for SFT and RL\n- Create your own benchmarks or use existing ones like @SWEbench 
\n\nhttps://t.co/x4gD2nHQxC","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911108117254546","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,209],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com","expanded_url":"https://harborframework.com/","url":"https://t.co/5kDZlvjhb7","indices":[186,209]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911108117254546","view_count":2828,"bookmark_count":7,"created_at":1762551497000,"favorite_count":27,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is the package we wish we had had while making Terminal-Bench. It’s for agent, model, and benchmark developers and researchers who want to evaluate and improve agents and models.\nhttps://t.co/5kDZlvjhb7","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911106108211461","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,194],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911111460089993","view_count":1578,"bookmark_count":0,"created_at":1762551498000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is also the official harness for Terminal-Bench 2.0. 
We used it to run tens of thousands of experiments in containerized environments while developing the latest version of the benchmark.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911110017269764","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,152],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911113209126988","view_count":1451,"bookmark_count":1,"created_at":1762551499000,"favorite_count":14,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"So why Terminal-Bench 2.0? We always knew that as model capabilities increased, we’d need to keep Terminal-Bench up to date with frontier capabilities.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911111460089993","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,277],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911114626802143","view_count":1535,"bookmark_count":0,"created_at":1762551499000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Terminal-Bench 2.0 consists of 89 hard tasks to test these capabilities. We aim to push the frontier with an increased emphasis on task quality. Each task received several hours of human and LM-assisted verification to ensure that tasks are (1) solvable, (2) realistic, and (3) well-specified. 
We’ll share more on how we did this in our upcoming preprint.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911113209126988","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,96],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}],"symbols":[],"timestamps":[],"urls":[{"display_url":"tbench.ai/leaderboard","expanded_url":"https://www.tbench.ai/leaderboard","url":"https://t.co/vblfZF55Nd","indices":[73,96]}],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features"
:{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911116115759210","view_count":1916,"bookmark_count":5,"created_at":1762551499000,"favorite_count":25,"quote_count":2,"reply_count":3,"retweet_count":3,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"At present, Codex CLI with GPT-5 sits at the top of our new leaderboard.\nhttps://t.co/vblfZF55Nd https://t.co/GIY0foKhNs","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911114626802143","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,272],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911119328616903","view_count":1335,"bookmark_count":2,"created_at":1762551500000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Astute Terminal-Bench fans may notice that SOTA performance is comparable to TB1.0 
despite our claim that TB2.0 is harder. We believe that this is because task quality is substantially higher in the new benchmark. We have removed several misspecified or impossible tasks – increasing difficulty while maintaining raw performance.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911116115759210","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,148],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/running-t…","expanded_url":"https://harborframework.com/docs/running-tbench","url":"https://t.co/hmiBYh7VYF","indices":[125,148]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911120511373685","view_count":1241,"bookmark_count":1,"created_at":1762551500000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Interested in using Terminal-Bench-2.0 or submitting to our new leaderboard? Check out the Harbor docs for more information.\nhttps://t.co/hmiBYh7VYF","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911119328616903","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,265],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1233837766271569920","name":"Mike A. 
Merrill","screen_name":"Mike_A_Merrill","indices":[51,66]},{"id_str":"64333359","name":"Ludwig Schmidt","screen_name":"lschmidt3","indices":[121,131]},{"id_str":"168787972","name":"Andy Konwinski","screen_name":"andykonwinski","indices":[185,199]},{"id_str":"1854370033520324608","name":"Laude Institute","screen_name":"LaudeInstitute","indices":[217,232]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911121924890876","view_count":1219,"bookmark_count":0,"created_at":1762551501000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"It's been a pleasure working alongside my co-lead, @mike_a_merrill, whose leadership rallied the team, and our advisors, @lschmidt3, who seeded the idea and guided our development, and @andykonwinski, who gave us the @LaudeInstitute mandate to \"ship your research.\"","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911120511373685","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,254],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":
1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911123543916899","view_count":1424,"bookmark_count":0,"created_at":1762551501000,"favorite_count":19,"quote_count":1,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Additionally, Terminal-Bench wouldn’t be possible without its community. We’re so thankful to the over 1k members of our Discord who contributed and audited tasks, helped build and beta test Harbor, and made this such a fun project for everyone involved. 
https://t.co/PcQz6FkVua","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911121924890876","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-09","value":0,"startTime":1762560000000,"endTime":1762646400000,"tweets":[{"bookmarked":false,"display_text_range":[0,75],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"33521530","name":"swyx","screen_name":"swyx","indices":[10,15]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1986989606320185472","quoted_status_permalink":{"url":"https://t.co/oyJSv9Xper","expanded":"https://twitter.com/latentspacepod/status/1986989606320185472","display":"x.com/latentspacepod…"},"retweeted":false,"fact_check":null,"id":"1987015010288353599","view_count":495,"bookmark_count":1,"created_at":1762576270000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1987015010288353599","full_text":"Thank you @swyx ! 
Was great to have you, and the live podcast was awesome 😁","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[16,63],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"284333988","name":"Logan Kilpatrick","screen_name":"OfficialLoganK","indices":[0,15]}]},"favorited":false,"in_reply_to_screen_name":"OfficialLoganK","lang":"en","retweeted":false,"fact_check":null,"id":"1986948823198146586","view_count":449,"bookmark_count":0,"created_at":1762560489000,"favorite_count":2,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@OfficialLoganK Looking forward to seeing where Gemini 3 lands!","in_reply_to_user_id_str":"284333988","in_reply_to_status_id_str":"1986914722235502858","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[17,37],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"993859188823601152","name":"Robert","screen_name":"skull8888888888","indices":[0,16]}]},"favorited":false,"in_reply_to_screen_name":"skull8888888888","lang":"en","retweeted":false,"fact_check":null,"id":"1986949187930616214","view_count":241,"bookmark_count":0,"created_at":1762560576000,"favorite_count":0,"quote_count":0,"reply_count":1,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@skull8888888888 DM me I'm 
interested","in_reply_to_user_id_str":"993859188823601152","in_reply_to_status_id_str":"1986927150415430137","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[13,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/task-diff…","expanded_url":"https://harborframework.com/docs/task-difference","url":"https://t.co/dAGxC918PM","indices":[114,137]}],"user_mentions":[{"id_str":"3787342814","name":"pash","screen_name":"pashmerepat","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"pashmerepat","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986949012445143197","view_count":319,"bookmark_count":0,"created_at":1762560534000,"favorite_count":3,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@pashmerepat Yes and here is a link that describes the differences in addition to the migration guide Mike shared https://t.co/dAGxC918PM","in_reply_to_user_id_str":"3787342814","in_reply_to_status_id_str":"1986913713233010728","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[10,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"633509372","name":"Monk Zero","screen_name":"NoCommas","indices":[0,9]}]},"favorited":false,"in_reply_to_screen_name":"NoCommas","lang":"en","retweeted":false,"fact_check":null,"id":"1986949254196437085","view_count":267,"bookmark_count":0,"created_at":1762560592000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@NoCommas Excited to see where you 
land!","in_reply_to_user_id_str":"633509372","in_reply_to_status_id_str":"1986926703550328905","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[23,38],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"2625021945","name":"potemin","screen_name":"poteminr","indices":[0,9]},{"id_str":"1353836358901501952","name":"Anthropic","screen_name":"AnthropicAI","indices":[10,22]}]},"favorited":false,"in_reply_to_screen_name":"poteminr","lang":"en","retweeted":false,"fact_check":null,"id":"1987008242187485219","view_count":157,"bookmark_count":0,"created_at":1762574656000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@poteminr @AnthropicAI You and me both","in_reply_to_user_id_str":"2625021945","in_reply_to_status_id_str":"1987002302088237511","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-10","value":0,"startTime":1762646400000,"endTime":1762732800000,"tweets":[{"bookmarked":false,"display_text_range":[13,14],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1531565962641788936","name":"Dylan Allen-Arnegård","screen_name":"dbear_allen","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"dbear_allen","lang":"qme","retweeted":false,"fact_check":null,"id":"1987404815404769403","view_count":69,"bookmark_count":0,"created_at":1762669206000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@dbear_allen 
🙏","in_reply_to_user_id_str":"1531565962641788936","in_reply_to_status_id_str":"1987400399704498635","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-11","value":0,"startTime":1762732800000,"endTime":1762819200000,"tweets":[]},{"label":"2025-11-12","value":0,"startTime":1762819200000,"endTime":1762905600000,"tweets":[]},{"label":"2025-11-13","value":0,"startTime":1762905600000,"endTime":1762992000000,"tweets":[]},{"label":"2025-11-14","value":0,"startTime":1762992000000,"endTime":1763078400000,"tweets":[{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1988372114257436810","quoted_status_permalink":{"url":"https://t.co/4o1qxAGQ7E","expanded":"https://twitter.com/warpdotdev/status/1988372114257436810","display":"x.com/warpdotdev/sta…"},"retweeted":false,"fact_check":null,"id":"1989092917798138326","view_count":334,"bookmark_count":0,"created_at":1763071681000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1989092917798138326","full_text":"Great to see Warp putting up the top score on Terminal-Bench 2.0 just days after release! 
Even more exciting to hear that they've already made improvements to their agent based on the results.\n\nUltimately, we hope that Terminal-Bench 2.0 accelerates model and agent development in the domain of text-based computer use, benefiting millions of users.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-15","value":0,"startTime":1763078400000,"endTime":1763164800000,"tweets":[]},{"label":"2025-11-16","value":0,"startTime":1763164800000,"endTime":1763251200000,"tweets":[]},{"label":"2025-11-17","value":0,"startTime":1763251200000,"endTime":1763337600000,"tweets":[]}],"nlikes":[{"label":"2025-10-18","value":0,"startTime":1760659200000,"endTime":1760745600000,"tweets":[]},{"label":"2025-10-19","value":0,"startTime":1760745600000,"endTime":1760832000000,"tweets":[]},{"label":"2025-10-20","value":0,"startTime":1760832000000,"endTime":1760918400000,"tweets":[]},{"label":"2025-10-21","value":18,"startTime":1760918400000,"endTime":1761004800000,"tweets":[{"bookmarked":false,"display_text_range":[0,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1670869422733393966","name":"Latent.Space","screen_name":"latentspacepod","indices":[23,38]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1980330471163892201","quoted_status_permalink":{"url":"https://t.co/sMxAiyEOQ0","expanded":"https://twitter.com/Mike_A_Merrill/status/1980330471163892201","display":"x.com/Mike_A_Merrill…"},"retweeted":false,"fact_check":null,"id":"1980344520215785912","view_count":2315,"bookmark_count":3,"created_at":1760985901000,"favorite_count":18,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1980344520215785912","full_text":"Mike and I went on the @latentspacepod 
!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-22","value":0,"startTime":1761004800000,"endTime":1761091200000,"tweets":[]},{"label":"2025-10-23","value":13,"startTime":1761091200000,"endTime":1761177600000,"tweets":[{"bookmarked":false,"display_text_range":[0,5],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1980264839806177286","quoted_status_permalink":{"url":"https://t.co/jl3wu1E3P8","expanded":"https://twitter.com/petergostev/status/1980264839806177286","display":"x.com/petergostev/st…"},"retweeted":false,"fact_check":null,"id":"1981121994533048588","view_count":453,"bookmark_count":0,"created_at":1761171265000,"favorite_count":5,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1981121994533048588","full_text":"Cool!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1981013196195668143","quoted_status_permalink":{"url":"https://t.co/eWXL225mnb","expanded":"https://twitter.com/DimitrisPapail/status/1981013196195668143","display":"x.com/DimitrisPapail…"},"retweeted":false,"fact_check":null,"id":"1981062983821512920","view_count":782,"bookmark_count":2,"created_at":1761157196000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1981062983821512920","full_text":"One definition of AGI is the full automation of all computer work.\n\nA Terminal-Bench task is an instruction, container, and test executable. 
Agents are just programs that get executed in the container.\n\nSo Terminal-Bench can measure the automation of any computer task, i.e., AGI.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-24","value":0,"startTime":1761177600000,"endTime":1761264000000,"tweets":[]},{"label":"2025-10-25","value":0,"startTime":1761264000000,"endTime":1761350400000,"tweets":[]},{"label":"2025-10-26","value":0,"startTime":1761350400000,"endTime":1761436800000,"tweets":[]},{"label":"2025-10-27","value":0,"startTime":1761436800000,"endTime":1761523200000,"tweets":[]},{"label":"2025-10-28","value":0,"startTime":1761523200000,"endTime":1761609600000,"tweets":[]},{"label":"2025-10-29","value":0,"startTime":1761609600000,"endTime":1761696000000,"tweets":[]},{"label":"2025-10-30","value":0,"startTime":1761696000000,"endTime":1761782400000,"tweets":[]},{"label":"2025-10-31","value":0,"startTime":1761782400000,"endTime":1761868800000,"tweets":[]},{"label":"2025-11-01","value":0,"startTime":1761868800000,"endTime":1761955200000,"tweets":[]},{"label":"2025-11-02","value":0,"startTime":1761955200000,"endTime":1762041600000,"tweets":[]},{"label":"2025-11-03","value":0,"startTime":1762041600000,"endTime":1762128000000,"tweets":[]},{"label":"2025-11-04","value":107,"startTime":1762128000000,"endTime":1762214400000,"tweets":[{"bookmarked":false,"display_text_range":[0,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"luma.com/h9kb99vd","expanded_url":"https://luma.com/h9kb99vd","url":"https://t.co/wLItVEHEMc","indices":[114,137]}],"user_mentions":[]},"favorited":true,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":true,"fact_check":null,"id":"1985427712484524433","view_count":13587,"bookmark_count":31,"created_at":1762197828000,"favorite_count":107,"quote_count":2,"reply_count":1,"retweet_count":4,"user_id_str":"1448787032486989825","conversatio
n_id_str":"1985427712484524433","full_text":"We're releasing Terminal-Bench 2.0 this week! Come to our meetup on Thursday @ Databricks to get early access :)\n\nhttps://t.co/wLItVEHEMc","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-05","value":0,"startTime":1762214400000,"endTime":1762300800000,"tweets":[]},{"label":"2025-11-06","value":15,"startTime":1762300800000,"endTime":1762387200000,"tweets":[{"bookmarked":false,"display_text_range":[0,21],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1986093902122942700","quoted_status_permalink":{"url":"https://t.co/zhs56oNQSL","expanded":"https://twitter.com/jyangballin/status/1986093902122942700","display":"x.com/jyangballin/st…"},"retweeted":false,"fact_check":null,"id":"1986106642157703523","view_count":2599,"bookmark_count":6,"created_at":1762359698000,"favorite_count":15,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986106642157703523","full_text":"Very clever 
benchmark","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-07","value":0,"startTime":1762387200000,"endTime":1762473600000,"tweets":[]},{"label":"2025-11-08","value":540,"startTime":1762473600000,"endTime":1762560000000,"tweets":[{"bookmarked":false,"display_text_range":[0,235],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w":1200,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w":12
00,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}]},"favorited":false,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911106108211461","view_count":104511,"bookmark_count":117,"created_at":1762551497000,"favorite_count":344,"quote_count":35,"reply_count":23,"retweet_count":71,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Today, we’re announcing the next chapter of Terminal-Bench with two releases:\n\n1. Harbor, a new package for running sandboxed agent rollouts at scale\n2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification 
https://t.co/YwmacS625Z","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,274],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"loom.com/share/8c11f218…","expanded_url":"https://www.loom.com/share/8c11f218c9fc4674bd659146af435627","url":"https://t.co/x4gD2nHQxC","indices":[306,329]}],"user_mentions":[{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1927771369611350016","name":"SWE-bench","screen_name":"SWEbench","indices":[294,303]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911110017269764","view_count":2504,"bookmark_count":5,"created_at":1762551498000,"favorite_count":28,"quote_count":0,"reply_count":2,"retweet_count":6,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Just a few of the features I love about Harbor:\n- Evaluate any agent that can be installed and run autonomously\n- Scale up to thousands of concurrent containers using providers like @daytonaio and @modal\n- Generate rollouts for SFT and RL\n- Create your own benchmarks or use existing ones like @SWEbench 
\n\nhttps://t.co/x4gD2nHQxC","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911108117254546","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,209],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com","expanded_url":"https://harborframework.com/","url":"https://t.co/5kDZlvjhb7","indices":[186,209]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911108117254546","view_count":2828,"bookmark_count":7,"created_at":1762551497000,"favorite_count":27,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is the package we wish we had had while making Terminal-Bench. It’s for agent, model, and benchmark developers and researchers who want to evaluate and improve agents and models.\nhttps://t.co/5kDZlvjhb7","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911106108211461","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,194],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911111460089993","view_count":1578,"bookmark_count":0,"created_at":1762551498000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is also the official harness for Terminal-Bench 2.0. 
We used it to run tens of thousands of experiments in containerized environments while developing the latest version of the benchmark.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911110017269764","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,152],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911113209126988","view_count":1451,"bookmark_count":1,"created_at":1762551499000,"favorite_count":14,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"So why Terminal-Bench 2.0? We always knew that as model capabilities increased, we’d need to keep Terminal-Bench up to date with frontier capabilities.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911111460089993","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,277],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911114626802143","view_count":1535,"bookmark_count":0,"created_at":1762551499000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Terminal-Bench 2.0 consists of 89 hard tasks to test these capabilities. We aim to push the frontier with an increased emphasis on task quality. Each task received several hours of human and LM-assisted verification to ensure that tasks are (1) solvable, (2) realistic, and (3) well-specified. 
We’ll share more on how we did this in our upcoming preprint.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911113209126988","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,96],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}],"symbols":[],"timestamps":[],"urls":[{"display_url":"tbench.ai/leaderboard","expanded_url":"https://www.tbench.ai/leaderboard","url":"https://t.co/vblfZF55Nd","indices":[73,96]}],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features"
:{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911116115759210","view_count":1916,"bookmark_count":5,"created_at":1762551499000,"favorite_count":25,"quote_count":2,"reply_count":3,"retweet_count":3,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"At present, Codex CLI with GPT-5 sits at the top of our new leaderboard.\nhttps://t.co/vblfZF55Nd https://t.co/GIY0foKhNs","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911114626802143","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,272],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911119328616903","view_count":1335,"bookmark_count":2,"created_at":1762551500000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Astute Terminal-Bench fans may notice that SOTA performance is comparable to TB1.0 
despite our claim that TB2.0 is harder. We believe that this is because task quality is substantially higher in the new benchmark. We have removed several misspecified or impossible tasks – increasing difficulty while maintaining raw performance.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911116115759210","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,148],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/running-t…","expanded_url":"https://harborframework.com/docs/running-tbench","url":"https://t.co/hmiBYh7VYF","indices":[125,148]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911120511373685","view_count":1241,"bookmark_count":1,"created_at":1762551500000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Interested in using Terminal-Bench-2.0 or submitting to our new leaderboard? Check out the Harbor docs for more information.\nhttps://t.co/hmiBYh7VYF","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911119328616903","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,265],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1233837766271569920","name":"Mike A. 
Merrill","screen_name":"Mike_A_Merrill","indices":[51,66]},{"id_str":"64333359","name":"Ludwig Schmidt","screen_name":"lschmidt3","indices":[121,131]},{"id_str":"168787972","name":"Andy Konwinski","screen_name":"andykonwinski","indices":[185,199]},{"id_str":"1854370033520324608","name":"Laude Institute","screen_name":"LaudeInstitute","indices":[217,232]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911121924890876","view_count":1219,"bookmark_count":0,"created_at":1762551501000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"It's been a pleasure working alongside my co-lead, @mike_a_merrill, whose leadership rallied the team, and our advisors, @lschmidt3, who seeded the idea and guided our development, and @andykonwinski, who gave us the @LaudeInstitute mandate to \"ship your research.\"","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911120511373685","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,254],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":
1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911123543916899","view_count":1424,"bookmark_count":0,"created_at":1762551501000,"favorite_count":19,"quote_count":1,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Additionally, Terminal-Bench wouldn’t be possible without its community. We’re so thankful to the over 1k members of our Discord who contributed and audited tasks, helped build and beta test Harbor, and made this such a fun project for everyone involved. 
https://t.co/PcQz6FkVua","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911121924890876","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-09","value":15,"startTime":1762560000000,"endTime":1762646400000,"tweets":[{"bookmarked":false,"display_text_range":[0,75],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"33521530","name":"swyx","screen_name":"swyx","indices":[10,15]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1986989606320185472","quoted_status_permalink":{"url":"https://t.co/oyJSv9Xper","expanded":"https://twitter.com/latentspacepod/status/1986989606320185472","display":"x.com/latentspacepod…"},"retweeted":false,"fact_check":null,"id":"1987015010288353599","view_count":495,"bookmark_count":1,"created_at":1762576270000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1987015010288353599","full_text":"Thank you @swyx ! 
Was great to have you, and the live podcast was awesome 😁","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[16,63],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"284333988","name":"Logan Kilpatrick","screen_name":"OfficialLoganK","indices":[0,15]}]},"favorited":false,"in_reply_to_screen_name":"OfficialLoganK","lang":"en","retweeted":false,"fact_check":null,"id":"1986948823198146586","view_count":449,"bookmark_count":0,"created_at":1762560489000,"favorite_count":2,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@OfficialLoganK Looking forward to seeing where Gemini 3 lands!","in_reply_to_user_id_str":"284333988","in_reply_to_status_id_str":"1986914722235502858","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[17,37],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"993859188823601152","name":"Robert","screen_name":"skull8888888888","indices":[0,16]}]},"favorited":false,"in_reply_to_screen_name":"skull8888888888","lang":"en","retweeted":false,"fact_check":null,"id":"1986949187930616214","view_count":241,"bookmark_count":0,"created_at":1762560576000,"favorite_count":0,"quote_count":0,"reply_count":1,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@skull8888888888 DM me I'm 
interested","in_reply_to_user_id_str":"993859188823601152","in_reply_to_status_id_str":"1986927150415430137","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[13,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/task-diff…","expanded_url":"https://harborframework.com/docs/task-difference","url":"https://t.co/dAGxC918PM","indices":[114,137]}],"user_mentions":[{"id_str":"3787342814","name":"pash","screen_name":"pashmerepat","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"pashmerepat","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986949012445143197","view_count":319,"bookmark_count":0,"created_at":1762560534000,"favorite_count":3,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@pashmerepat Yes and here is a link that describes the differences in addition to the migration guide Mike shared https://t.co/dAGxC918PM","in_reply_to_user_id_str":"3787342814","in_reply_to_status_id_str":"1986913713233010728","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[10,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"633509372","name":"Monk Zero","screen_name":"NoCommas","indices":[0,9]}]},"favorited":false,"in_reply_to_screen_name":"NoCommas","lang":"en","retweeted":false,"fact_check":null,"id":"1986949254196437085","view_count":267,"bookmark_count":0,"created_at":1762560592000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@NoCommas Excited to see where you 
land!","in_reply_to_user_id_str":"633509372","in_reply_to_status_id_str":"1986926703550328905","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[23,38],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"2625021945","name":"potemin","screen_name":"poteminr","indices":[0,9]},{"id_str":"1353836358901501952","name":"Anthropic","screen_name":"AnthropicAI","indices":[10,22]}]},"favorited":false,"in_reply_to_screen_name":"poteminr","lang":"en","retweeted":false,"fact_check":null,"id":"1987008242187485219","view_count":157,"bookmark_count":0,"created_at":1762574656000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@poteminr @AnthropicAI You and me both","in_reply_to_user_id_str":"2625021945","in_reply_to_status_id_str":"1987002302088237511","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-10","value":1,"startTime":1762646400000,"endTime":1762732800000,"tweets":[{"bookmarked":false,"display_text_range":[13,14],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1531565962641788936","name":"Dylan Allen-Arnegård","screen_name":"dbear_allen","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"dbear_allen","lang":"qme","retweeted":false,"fact_check":null,"id":"1987404815404769403","view_count":69,"bookmark_count":0,"created_at":1762669206000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@dbear_allen 
🙏","in_reply_to_user_id_str":"1531565962641788936","in_reply_to_status_id_str":"1987400399704498635","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-11","value":0,"startTime":1762732800000,"endTime":1762819200000,"tweets":[]},{"label":"2025-11-12","value":0,"startTime":1762819200000,"endTime":1762905600000,"tweets":[]},{"label":"2025-11-13","value":0,"startTime":1762905600000,"endTime":1762992000000,"tweets":[]},{"label":"2025-11-14","value":8,"startTime":1762992000000,"endTime":1763078400000,"tweets":[{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1988372114257436810","quoted_status_permalink":{"url":"https://t.co/4o1qxAGQ7E","expanded":"https://twitter.com/warpdotdev/status/1988372114257436810","display":"x.com/warpdotdev/sta…"},"retweeted":false,"fact_check":null,"id":"1989092917798138326","view_count":334,"bookmark_count":0,"created_at":1763071681000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1989092917798138326","full_text":"Great to see Warp putting up the top score on Terminal-Bench 2.0 just days after release! 
Even more exciting to hear that they've already made improvements to their agent based on the results.\n\nUltimately, we hope that Terminal-Bench 2.0 accelerates model and agent development in the domain of text-based computer use, benefiting millions of users.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-15","value":0,"startTime":1763078400000,"endTime":1763164800000,"tweets":[]},{"label":"2025-11-16","value":0,"startTime":1763164800000,"endTime":1763251200000,"tweets":[]},{"label":"2025-11-17","value":0,"startTime":1763251200000,"endTime":1763337600000,"tweets":[]}],"nviews":[{"label":"2025-10-18","value":0,"startTime":1760659200000,"endTime":1760745600000,"tweets":[]},{"label":"2025-10-19","value":0,"startTime":1760745600000,"endTime":1760832000000,"tweets":[]},{"label":"2025-10-20","value":0,"startTime":1760832000000,"endTime":1760918400000,"tweets":[]},{"label":"2025-10-21","value":2315,"startTime":1760918400000,"endTime":1761004800000,"tweets":[{"bookmarked":false,"display_text_range":[0,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1670869422733393966","name":"Latent.Space","screen_name":"latentspacepod","indices":[23,38]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1980330471163892201","quoted_status_permalink":{"url":"https://t.co/sMxAiyEOQ0","expanded":"https://twitter.com/Mike_A_Merrill/status/1980330471163892201","display":"x.com/Mike_A_Merrill…"},"retweeted":false,"fact_check":null,"id":"1980344520215785912","view_count":2315,"bookmark_count":3,"created_at":1760985901000,"favorite_count":18,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1980344520215785912","full_text":"Mike and I went on the @latentspacepod 
!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-22","value":0,"startTime":1761004800000,"endTime":1761091200000,"tweets":[]},{"label":"2025-10-23","value":1235,"startTime":1761091200000,"endTime":1761177600000,"tweets":[{"bookmarked":false,"display_text_range":[0,5],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1980264839806177286","quoted_status_permalink":{"url":"https://t.co/jl3wu1E3P8","expanded":"https://twitter.com/petergostev/status/1980264839806177286","display":"x.com/petergostev/st…"},"retweeted":false,"fact_check":null,"id":"1981121994533048588","view_count":453,"bookmark_count":0,"created_at":1761171265000,"favorite_count":5,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1981121994533048588","full_text":"Cool!","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1981013196195668143","quoted_status_permalink":{"url":"https://t.co/eWXL225mnb","expanded":"https://twitter.com/DimitrisPapail/status/1981013196195668143","display":"x.com/DimitrisPapail…"},"retweeted":false,"fact_check":null,"id":"1981062983821512920","view_count":782,"bookmark_count":2,"created_at":1761157196000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1981062983821512920","full_text":"One definition of AGI is the full automation of all computer work.\n\nA Terminal-Bench task is an instruction, container, and test executable. 
Agents are just programs that get executed in the container.\n\nSo Terminal-Bench can measure the automation of any computer task, i.e., AGI.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-10-24","value":0,"startTime":1761177600000,"endTime":1761264000000,"tweets":[]},{"label":"2025-10-25","value":0,"startTime":1761264000000,"endTime":1761350400000,"tweets":[]},{"label":"2025-10-26","value":0,"startTime":1761350400000,"endTime":1761436800000,"tweets":[]},{"label":"2025-10-27","value":0,"startTime":1761436800000,"endTime":1761523200000,"tweets":[]},{"label":"2025-10-28","value":0,"startTime":1761523200000,"endTime":1761609600000,"tweets":[]},{"label":"2025-10-29","value":0,"startTime":1761609600000,"endTime":1761696000000,"tweets":[]},{"label":"2025-10-30","value":0,"startTime":1761696000000,"endTime":1761782400000,"tweets":[]},{"label":"2025-10-31","value":0,"startTime":1761782400000,"endTime":1761868800000,"tweets":[]},{"label":"2025-11-01","value":0,"startTime":1761868800000,"endTime":1761955200000,"tweets":[]},{"label":"2025-11-02","value":0,"startTime":1761955200000,"endTime":1762041600000,"tweets":[]},{"label":"2025-11-03","value":0,"startTime":1762041600000,"endTime":1762128000000,"tweets":[]},{"label":"2025-11-04","value":13587,"startTime":1762128000000,"endTime":1762214400000,"tweets":[{"bookmarked":false,"display_text_range":[0,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"luma.com/h9kb99vd","expanded_url":"https://luma.com/h9kb99vd","url":"https://t.co/wLItVEHEMc","indices":[114,137]}],"user_mentions":[]},"favorited":true,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":true,"fact_check":null,"id":"1985427712484524433","view_count":13587,"bookmark_count":31,"created_at":1762197828000,"favorite_count":107,"quote_count":2,"reply_count":1,"retweet_count":4,"user_id_str":"1448787032486989825","conversat
ion_id_str":"1985427712484524433","full_text":"We're releasing Terminal-Bench 2.0 this week! Come to our meetup on Thursday @ Databricks to get early access :)\n\nhttps://t.co/wLItVEHEMc","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-05","value":0,"startTime":1762214400000,"endTime":1762300800000,"tweets":[]},{"label":"2025-11-06","value":2599,"startTime":1762300800000,"endTime":1762387200000,"tweets":[{"bookmarked":false,"display_text_range":[0,21],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1986093902122942700","quoted_status_permalink":{"url":"https://t.co/zhs56oNQSL","expanded":"https://twitter.com/jyangballin/status/1986093902122942700","display":"x.com/jyangballin/st…"},"retweeted":false,"fact_check":null,"id":"1986106642157703523","view_count":2599,"bookmark_count":6,"created_at":1762359698000,"favorite_count":15,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986106642157703523","full_text":"Very clever 
benchmark","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-07","value":0,"startTime":1762387200000,"endTime":1762473600000,"tweets":[]},{"label":"2025-11-08","value":121542,"startTime":1762473600000,"endTime":1762560000000,"tweets":[{"bookmarked":false,"display_text_range":[0,235],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w":1200,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/YwmacS625Z","expanded_url":"https://x.com/alexgshaw/status/1986911106108211461/photo/1","id_str":"1986906599416602625","indices":[236,259],"media_key":"3_1986906599416602625","media_url_https":"https://pbs.twimg.com/media/G5LpBmwbIAElWXo.png","type":"photo","url":"https://t.co/YwmacS625Z","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1024,"w":2048,"resize":"fit"},"medium":{"h":600,"w"
:1200,"resize":"fit"},"small":{"h":340,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1024,"width":2048,"focus_rects":[{"x":0,"y":0,"w":1829,"h":1024},{"x":0,"y":0,"w":1024,"h":1024},{"x":0,"y":0,"w":898,"h":1024},{"x":0,"y":0,"w":512,"h":1024},{"x":0,"y":0,"w":2048,"h":1024}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986906599416602625"}}}]},"favorited":false,"lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911106108211461","view_count":104511,"bookmark_count":117,"created_at":1762551497000,"favorite_count":344,"quote_count":35,"reply_count":23,"retweet_count":71,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Today, we’re announcing the next chapter of Terminal-Bench with two releases:\n\n1. Harbor, a new package for running sandboxed agent rollouts at scale\n2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification 
https://t.co/YwmacS625Z","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,274],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"loom.com/share/8c11f218…","expanded_url":"https://www.loom.com/share/8c11f218c9fc4674bd659146af435627","url":"https://t.co/x4gD2nHQxC","indices":[306,329]}],"user_mentions":[{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1645813553365139462","name":"Daytona","screen_name":"daytonaio","indices":[182,192]},{"id_str":"1551987185372512263","name":"Modal","screen_name":"modal","indices":[197,203]},{"id_str":"1927771369611350016","name":"SWE-bench","screen_name":"SWEbench","indices":[294,303]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911110017269764","view_count":2504,"bookmark_count":5,"created_at":1762551498000,"favorite_count":28,"quote_count":0,"reply_count":2,"retweet_count":6,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Just a few of the features I love about Harbor:\n- Evaluate any agent that can be installed and run autonomously\n- Scale up to thousands of concurrent containers using providers like @daytonaio and @modal\n- Generate rollouts for SFT and RL\n- Create your own benchmarks or use existing ones like @SWEbench 
\n\nhttps://t.co/x4gD2nHQxC","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911108117254546","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,209],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com","expanded_url":"https://harborframework.com/","url":"https://t.co/5kDZlvjhb7","indices":[186,209]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911108117254546","view_count":2828,"bookmark_count":7,"created_at":1762551497000,"favorite_count":27,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is the package we wish we had had while making Terminal-Bench. It’s for agent, model, and benchmark developers and researchers who want to evaluate and improve agents and models.\nhttps://t.co/5kDZlvjhb7","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911106108211461","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,194],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911111460089993","view_count":1578,"bookmark_count":0,"created_at":1762551498000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Harbor is also the official harness for Terminal-Bench 2.0. 
We used it to run tens of thousands of experiments in containerized environments while developing the latest version of the benchmark.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911110017269764","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,152],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911113209126988","view_count":1451,"bookmark_count":1,"created_at":1762551499000,"favorite_count":14,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"So why Terminal-Bench 2.0? We always knew that as model capabilities increased, we’d need to keep Terminal-Bench up to date with frontier capabilities.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911111460089993","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,277],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911114626802143","view_count":1535,"bookmark_count":0,"created_at":1762551499000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Terminal-Bench 2.0 consists of 89 hard tasks to test these capabilities. We aim to push the frontier with an increased emphasis on task quality. Each task received several hours of human and LM-assisted verification to ensure that tasks are (1) solvable, (2) realistic, and (3) well-specified. 
We’ll share more on how we did this in our upcoming preprint.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911113209126988","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,96],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}],"symbols":[],"timestamps":[],"urls":[{"display_url":"tbench.ai/leaderboard","expanded_url":"https://www.tbench.ai/leaderboard","url":"https://t.co/vblfZF55Nd","indices":[73,96]}],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/GIY0foKhNs","expanded_url":"https://x.com/alexgshaw/status/1986911116115759210/photo/1","id_str":"1986908092202950658","indices":[97,120],"media_key":"3_1986908092202950658","media_url_https":"https://pbs.twimg.com/media/G5LqYf0bIAIo56j.png","type":"photo","url":"https://t.co/GIY0foKhNs","ext_media_availability":{"status":"Available"},"features"
:{"large":{"faces":[{"x":507,"y":873,"h":103,"w":103}]},"medium":{"faces":[{"x":297,"y":511,"h":60,"w":60}]},"small":{"faces":[{"x":168,"y":290,"h":34,"w":34}]},"orig":{"faces":[{"x":637,"y":1097,"h":130,"w":130}]}},"sizes":{"large":{"h":1147,"w":2048,"resize":"fit"},"medium":{"h":672,"w":1200,"resize":"fit"},"small":{"h":381,"w":680,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1440,"width":2572,"focus_rects":[{"x":0,"y":0,"w":2571,"h":1440},{"x":0,"y":0,"w":1440,"h":1440},{"x":0,"y":0,"w":1263,"h":1440},{"x":90,"y":0,"w":720,"h":1440},{"x":0,"y":0,"w":2572,"h":1440}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908092202950658"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911116115759210","view_count":1916,"bookmark_count":5,"created_at":1762551499000,"favorite_count":25,"quote_count":2,"reply_count":3,"retweet_count":3,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"At present, Codex CLI with GPT-5 sits at the top of our new leaderboard.\nhttps://t.co/vblfZF55Nd https://t.co/GIY0foKhNs","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911114626802143","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,272],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911119328616903","view_count":1335,"bookmark_count":2,"created_at":1762551500000,"favorite_count":16,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Astute Terminal-Bench fans may notice that SOTA performance is comparable to TB1.0 
despite our claim that TB2.0 is harder. We believe that this is because task quality is substantially higher in the new benchmark. We have removed several misspecified or impossible tasks – increasing difficulty while maintaining raw performance.","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911116115759210","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,148],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/running-t…","expanded_url":"https://harborframework.com/docs/running-tbench","url":"https://t.co/hmiBYh7VYF","indices":[125,148]}],"user_mentions":[]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911120511373685","view_count":1241,"bookmark_count":1,"created_at":1762551500000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":2,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Interested in using Terminal-Bench-2.0 or submitting to our new leaderboard? Check out the Harbor docs for more information.\nhttps://t.co/hmiBYh7VYF","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911119328616903","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,265],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1233837766271569920","name":"Mike A. 
Merrill","screen_name":"Mike_A_Merrill","indices":[51,66]},{"id_str":"64333359","name":"Ludwig Schmidt","screen_name":"lschmidt3","indices":[121,131]},{"id_str":"168787972","name":"Andy Konwinski","screen_name":"andykonwinski","indices":[185,199]},{"id_str":"1854370033520324608","name":"Laude Institute","screen_name":"LaudeInstitute","indices":[217,232]}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","retweeted":false,"fact_check":null,"id":"1986911121924890876","view_count":1219,"bookmark_count":0,"created_at":1762551501000,"favorite_count":17,"quote_count":0,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"It's been a pleasure working alongside my co-lead, @mike_a_merrill, whose leadership rallied the team, and our advisors, @lschmidt3, who seeded the idea and guided our development, and @andykonwinski, who gave us the @LaudeInstitute mandate to \"ship your research.\"","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911120511373685","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[0,254],"entities":{"hashtags":[],"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":
1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"extended_entities":{"media":[{"display_url":"pic.x.com/PcQz6FkVua","expanded_url":"https://x.com/alexgshaw/status/1986911123543916899/photo/1","id_str":"1986908910708781056","indices":[255,278],"media_key":"3_1986908910708781056","media_url_https":"https://pbs.twimg.com/media/G5LrII_a8AAi2EG.png","type":"photo","url":"https://t.co/PcQz6FkVua","ext_media_availability":{"status":"Available"},"features":{"large":{"faces":[]},"medium":{"faces":[]},"small":{"faces":[]},"orig":{"faces":[]}},"sizes":{"large":{"h":1822,"w":1292,"resize":"fit"},"medium":{"h":1200,"w":851,"resize":"fit"},"small":{"h":680,"w":482,"resize":"fit"},"thumb":{"h":150,"w":150,"resize":"crop"}},"original_info":{"height":1822,"width":1292,"focus_rects":[{"x":0,"y":320,"w":1292,"h":724},{"x":0,"y":36,"w":1292,"h":1292},{"x":0,"y":0,"w":1292,"h":1473},{"x":319,"y":0,"w":911,"h":1822},{"x":0,"y":0,"w":1292,"h":1822}]},"allow_download_status":{"allow_download":true},"media_results":{"result":{"media_key":"3_1986908910708781056"}}}]},"favorited":false,"in_reply_to_screen_name":"alexgshaw","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986911123543916899","view_count":1424,"bookmark_count":0,"created_at":1762551501000,"favorite_count":19,"quote_count":1,"reply_count":1,"retweet_count":1,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"Additionally, Terminal-Bench wouldn’t be possible without its community. We’re so thankful to the over 1k members of our Discord who contributed and audited tasks, helped build and beta test Harbor, and made this such a fun project for everyone involved. 
https://t.co/PcQz6FkVua","in_reply_to_user_id_str":"1448787032486989825","in_reply_to_status_id_str":"1986911121924890876","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-09","value":1928,"startTime":1762560000000,"endTime":1762646400000,"tweets":[{"bookmarked":false,"display_text_range":[0,75],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"33521530","name":"swyx","screen_name":"swyx","indices":[10,15]}]},"favorited":false,"lang":"en","quoted_status_id_str":"1986989606320185472","quoted_status_permalink":{"url":"https://t.co/oyJSv9Xper","expanded":"https://twitter.com/latentspacepod/status/1986989606320185472","display":"x.com/latentspacepod…"},"retweeted":false,"fact_check":null,"id":"1987015010288353599","view_count":495,"bookmark_count":1,"created_at":1762576270000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1987015010288353599","full_text":"Thank you @swyx ! 
Was great to have you, and the live podcast was awesome 😁","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[16,63],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"284333988","name":"Logan Kilpatrick","screen_name":"OfficialLoganK","indices":[0,15]}]},"favorited":false,"in_reply_to_screen_name":"OfficialLoganK","lang":"en","retweeted":false,"fact_check":null,"id":"1986948823198146586","view_count":449,"bookmark_count":0,"created_at":1762560489000,"favorite_count":2,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@OfficialLoganK Looking forward to seeing where Gemini 3 lands!","in_reply_to_user_id_str":"284333988","in_reply_to_status_id_str":"1986914722235502858","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[17,37],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"993859188823601152","name":"Robert","screen_name":"skull8888888888","indices":[0,16]}]},"favorited":false,"in_reply_to_screen_name":"skull8888888888","lang":"en","retweeted":false,"fact_check":null,"id":"1986949187930616214","view_count":241,"bookmark_count":0,"created_at":1762560576000,"favorite_count":0,"quote_count":0,"reply_count":1,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@skull8888888888 DM me I'm 
interested","in_reply_to_user_id_str":"993859188823601152","in_reply_to_status_id_str":"1986927150415430137","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[13,137],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[{"display_url":"harborframework.com/docs/task-diff…","expanded_url":"https://harborframework.com/docs/task-difference","url":"https://t.co/dAGxC918PM","indices":[114,137]}],"user_mentions":[{"id_str":"3787342814","name":"pash","screen_name":"pashmerepat","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"pashmerepat","lang":"en","possibly_sensitive":false,"possibly_sensitive_editable":true,"retweeted":false,"fact_check":null,"id":"1986949012445143197","view_count":319,"bookmark_count":0,"created_at":1762560534000,"favorite_count":3,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@pashmerepat Yes and here is a link that describes the differences in addition to the migration guide Mike shared https://t.co/dAGxC918PM","in_reply_to_user_id_str":"3787342814","in_reply_to_status_id_str":"1986913713233010728","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[10,40],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"633509372","name":"Monk Zero","screen_name":"NoCommas","indices":[0,9]}]},"favorited":false,"in_reply_to_screen_name":"NoCommas","lang":"en","retweeted":false,"fact_check":null,"id":"1986949254196437085","view_count":267,"bookmark_count":0,"created_at":1762560592000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@NoCommas Excited to see where you land!","in_reply_to_user_id_str":"633509372","in_reply_to_status_id_str":"1986926703550328905","is_quote_status":0,"is_ai":null,"ai_score":null},{"bookmarked":false,"display_text_range":[23,38],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"2625021945","name":"potemin","screen_name":"poteminr","indices":[0,9]},{"id_str":"1353836358901501952","name":"Anthropic","screen_name":"AnthropicAI","indices":[10,22]}]},"favorited":false,"in_reply_to_screen_name":"poteminr","lang":"en","retweeted":false,"fact_check":null,"id":"1987008242187485219","view_count":157,"bookmark_count":0,"created_at":1762574656000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@poteminr @AnthropicAI You and me both","in_reply_to_user_id_str":"2625021945","in_reply_to_status_id_str":"1987002302088237511","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-10","value":69,"startTime":1762646400000,"endTime":1762732800000,"tweets":[{"bookmarked":false,"display_text_range":[13,14],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[{"id_str":"1531565962641788936","name":"Dylan Allen-Arnegård","screen_name":"dbear_allen","indices":[0,12]}]},"favorited":false,"in_reply_to_screen_name":"dbear_allen","lang":"qme","retweeted":false,"fact_check":null,"id":"1987404815404769403","view_count":69,"bookmark_count":0,"created_at":1762669206000,"favorite_count":1,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1986911106108211461","full_text":"@dbear_allen 🙏","in_reply_to_user_id_str":"1531565962641788936","in_reply_to_status_id_str":"1987400399704498635","is_quote_status":0,"is_ai":null,"ai_score":null}]},{"label":"2025-11-11","value":0,"startTime":1762732800000,"endTime":1762819200000,"tweets":[]},{"label":"2025-11-12","value":0,"startTime":1762819200000,"endTime":1762905600000,"tweets":[]},{"label":"2025-11-13","value":0,"startTime":1762905600000,"endTime":1762992000000,"tweets":[]},{"label":"2025-11-14","value":334,"startTime":1762992000000,"endTime":1763078400000,"tweets":[{"bookmarked":false,"display_text_range":[0,280],"entities":{"hashtags":[],"symbols":[],"timestamps":[],"urls":[],"user_mentions":[]},"favorited":false,"lang":"en","quoted_status_id_str":"1988372114257436810","quoted_status_permalink":{"url":"https://t.co/4o1qxAGQ7E","expanded":"https://twitter.com/warpdotdev/status/1988372114257436810","display":"x.com/warpdotdev/sta…"},"retweeted":false,"fact_check":null,"id":"1989092917798138326","view_count":334,"bookmark_count":0,"created_at":1763071681000,"favorite_count":8,"quote_count":0,"reply_count":0,"retweet_count":0,"user_id_str":"1448787032486989825","conversation_id_str":"1989092917798138326","full_text":"Great to see Warp putting up the top score on Terminal-Bench 2.0 just days after release! Even more exciting to hear that they've already made improvements to their agent based on the results.\n\nUltimately, we hope that Terminal-Bench 2.0 accelerates model and agent development in the domain of text-based computer use, benefiting millions of users.","in_reply_to_user_id_str":null,"in_reply_to_status_id_str":null,"is_quote_status":1,"is_ai":null,"ai_score":null}]},{"label":"2025-11-15","value":0,"startTime":1763078400000,"endTime":1763164800000,"tweets":[]},{"label":"2025-11-16","value":0,"startTime":1763164800000,"endTime":1763251200000,"tweets":[]},{"label":"2025-11-17","value":0,"startTime":1763251200000,"endTime":1763337600000,"tweets":[]}]},"interactions":{"users":[{"created_at":1316236389000,"uid":"374907349","id":"374907349","screen_name":"vinhnx","name":"Vinh Nguyen","friends_count":6097,"followers_count":1140,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1857076247811026944/0UrdvDZz_normal.jpg","description":"iOS. Applied AI Research. Building VT Code coding agent (https://t.co/1ZsOIycYOz), @vtdotai. Built @ClendarApp • Learn by doing • self.opinions","entities":{"description":{"urls":[{"display_url":"github.com/vinhnx/vtcode","expanded_url":"https://github.com/vinhnx/vtcode","url":"https://t.co/1ZsOIycYOz","indices":[57,80]}]},"url":{"urls":[{"display_url":"buymeacoffee.com/vinhnx","expanded_url":"https://buymeacoffee.com/vinhnx","url":"https://t.co/P4Hz19mgKX","indices":[0,23]}]}},"interactions":2},{"created_at":1525789470000,"uid":"993859188823601152","id":"993859188823601152","screen_name":"skull8888888888","name":"Robert","friends_count":625,"followers_count":2166,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1786070151021948928/fXOgt6Fl_normal.jpg","description":"Founder @ Laminar https://t.co/ANyS8EX6Nj (YC S24). Prev @ Palantir, Bloomberg","entities":{"description":{"urls":[{"display_url":"laminar.sh","expanded_url":"https://laminar.sh","url":"https://t.co/ANyS8EX6Nj","indices":[18,41]}]},"url":{"urls":[{"display_url":"laminar.sh","expanded_url":"https://laminar.sh","url":"https://t.co/ANyS8EX6Nj","indices":[0,23]}]}},"interactions":1},{"created_at":1512210999000,"uid":"936906954902986752","id":"936906954902986752","screen_name":"Vishnu_Y19","name":"Vishnu","friends_count":347,"followers_count":202,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1985426206448402432/mcSAIy3H_normal.jpg","description":"Building Teravictus: AI-powered support intelligence | Detecting fires before they burn | Shipping in public","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"teravictus.com","expanded_url":"http://www.teravictus.com","url":"https://t.co/qrZORKp4ws","indices":[0,23]}]}},"interactions":1},{"created_at":1510792928000,"uid":"930959130940059648","id":"930959130940059648","screen_name":"spencermateega","name":"Spencer Mateega","friends_count":1178,"followers_count":976,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1985223022496858112/spm-tpwZ_normal.jpg","description":"ceo @afterquery. prev statistics + finance + cs @ wharton / penn, @silverlake_news, @google, @morganstanley, @meta","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"mateega.com","expanded_url":"http://mateega.com","url":"https://t.co/lPuhNB6ebh","indices":[0,23]}]}},"interactions":1},{"created_at":1466986565000,"uid":"747221928612507648","id":"747221928612507648","screen_name":"techfrenAJ","name":"techfren","friends_count":969,"followers_count":1892,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1694777985679933440/hBKGn3KM_normal.jpg","description":"Software Engineer and Content Creator :)","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"techfren.net","expanded_url":"http://techfren.net","url":"https://t.co/GnfRvBc2kI","indices":[0,23]}]}},"interactions":1},{"created_at":1342073914000,"uid":"633509372","id":"633509372","screen_name":"NoCommas","name":"Monk Zero","friends_count":862,"followers_count":2062,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1905896402414346240/EOVmj9TG_normal.jpg","description":"Bootloading @antigma_labs. exes: @awsCloud, @Meta, @Mysten_Labs. A Turing complete mind, making sense of the world with Gödel incompleteness.","entities":{"description":{"urls":[]}},"interactions":1},{"created_at":1444011163000,"uid":"3787342814","id":"3787342814","screen_name":"pashmerepat","name":"pash","friends_count":494,"followers_count":10697,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1822410217713442816/zLlPfvOK_normal.jpg","description":"currently head of ai @cline | prev @meta knowledge graph | creator of vault // @usc alum","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"nouvelle.one","expanded_url":"https://nouvelle.one","url":"https://t.co/8VlSzkezVo","indices":[0,23]}]}},"interactions":1},{"created_at":1313439810000,"uid":"355737959","id":"355737959","screen_name":"aktasbatuhann","name":"Batuhan","friends_count":1242,"followers_count":649,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1858055302203191298/0Tz6qHOL_normal.jpg","description":"Product @driaforall","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"docs.dria.co","expanded_url":"http://docs.dria.co","url":"https://t.co/riCD9LGvBt","indices":[0,23]}]}},"interactions":1},{"created_at":1419646306000,"uid":"2944501279","id":"2944501279","screen_name":"etash_guha","name":"Etash Guha","friends_count":226,"followers_count":810,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1665627207862210561/AcwpIoQJ_normal.jpg","description":"Ph.D. @Stanford and @uwcse","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"etash.me","expanded_url":"http://www.etash.me","url":"https://t.co/y1Y4RxigbY","indices":[0,23]}]}},"interactions":1},{"created_at":1239014743000,"uid":"29178343","id":"29178343","screen_name":"AlexGDimakis","name":"Alex Dimakis","friends_count":2376,"followers_count":21438,"profile_image_url_https":"https://pbs.twimg.com/profile_images/542926798338543617/KwlwoJRr_normal.jpeg","description":"Professor, UC berkeley | Founder @bespokelabsai |","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"people.eecs.berkeley.edu/~alexdimakis/","expanded_url":"https://people.eecs.berkeley.edu/~alexdimakis/","url":"https://t.co/N8GVYXA2q9","indices":[0,23]}]}},"interactions":1},{"created_at":1303182017000,"uid":"284333988","id":"284333988","screen_name":"OfficialLoganK","name":"Logan Kilpatrick","friends_count":2709,"followers_count":233410,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1943787288955084800/QOl7OJMc_normal.jpg","description":"Lead product for @GoogleAIStudio + the Gemini API. My views!","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"logank.ai","expanded_url":"https://logank.ai","url":"https://t.co/p6F6wFrh36","indices":[0,23]}]}},"interactions":1},{"created_at":1403250042000,"uid":"2625021945","id":"2625021945","screen_name":"poteminr","name":"potemin","friends_count":38,"followers_count":29,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1973768360262242304/kebBmWjA_normal.jpg","description":"mle at fintech company, filmmaker, building @0secapp \n\nhttps://t.co/NsVXPXDuis","entities":{"description":{"urls":[{"display_url":"apps.apple.com/us/app/0sec-vo…","expanded_url":"https://apps.apple.com/us/app/0sec-voice-ai-calendar/id6752616667","url":"https://t.co/NsVXPXDuis","indices":[55,78]}]},"url":{"urls":[{"display_url":"0sec.app","expanded_url":"http://0sec.app","url":"https://t.co/T5XA3HbRhp","indices":[0,23]}]}},"interactions":1},{"created_at":1683744979000,"uid":"1656372584991293441","id":"1656372584991293441","screen_name":"gamestoneai","name":"Golden Hippie","friends_count":162,"followers_count":35,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1981626555093655553/frMyv1aU_normal.jpg","description":"I still think that the whole internet thingy is just a passing trend.","entities":{"description":{"urls":[]}},"interactions":1},{"created_at":1677441722000,"uid":"1629934761497047041","id":"1629934761497047041","screen_name":"nummanthinks","name":"Numman Ali","friends_count":166,"followers_count":335,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1958877571103281152/0BOHse4E_normal.jpg","description":"Agentic Coding, Applied AI & Exploring Blockchain | e/acc | CTO at UK FinTech https://t.co/FSflM0UWjj","entities":{"description":{"urls":[{"display_url":"retailbook.com","expanded_url":"https://www.retailbook.com","url":"https://t.co/FSflM0UWjj","indices":[78,101]}]}},"interactions":1},{"created_at":1661396780000,"uid":"1562637435930292228","id":"1562637435930292228","screen_name":"allenjpark","name":"Allen","friends_count":2000,"followers_count":1272,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1842339065414250498/_OEjs-V4_normal.jpg","description":"something new | cs @princeton | prev. evals @patronusAI & baker @subway","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"allenjpark.com","expanded_url":"http://allenjpark.com","url":"https://t.co/rKMHLI7hSw","indices":[0,23]}]}},"interactions":1},{"created_at":1658424640000,"uid":"1550171383031861251","id":"1550171383031861251","screen_name":"jehovahscript","name":"jacob ۞","friends_count":2013,"followers_count":3807,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1861663009749508099/7iyCyYFB_normal.jpg","description":"YC F25 | prev @hf0 @runpod_io","entities":{"description":{"urls":[]}},"interactions":1,"following":true,"followed_by":false},{"created_at":1653988778000,"uid":"1531565962641788936","id":"1531565962641788936","screen_name":"dbear_allen","name":"Dylan Allen-Arnegård","friends_count":834,"followers_count":887,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1957668133097074690/gioujO2y_normal.jpg","description":"Co-Founder @ Cheers (YC S24) • We help service businesses win local search on ChatGPT • Utah ➡️ SF","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"cheers.tech","expanded_url":"https://www.cheers.tech/","url":"https://t.co/KEF64qm39W","indices":[0,23]}]}},"interactions":1},{"created_at":1575199441000,"uid":"1201099556827598848","id":"1201099556827598848","screen_name":"adgtomiwa","name":"Tomiwa","friends_count":167,"followers_count":147,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1663589629151281159/1-9peYU-_normal.jpg","description":"Python || LiverpoolFC || shenanigans","entities":{"description":{"urls":[]}},"interactions":1},{"created_at":1531965603000,"uid":"1019763768631447552","id":"1019763768631447552","screen_name":"idavidrein","name":"david rein","friends_count":1177,"followers_count":3246,"profile_image_url_https":"https://pbs.twimg.com/profile_images/1375548507621257220/OOUh4_Yz_normal.jpg","description":"sentio ergo sum. science @METR_Evals","entities":{"description":{"urls":[]},"url":{"urls":[{"display_url":"idavidrein.com","expanded_url":"http://idavidrein.com","url":"http://idavidrein.com","indices":[0,23]}]}},"interactions":1}],"period":14,"start":1762082468950,"end":1763292068950}}},"settings":{},"session":null,"routeProps":{"/creators/:username":{}}}