God, Love, News, Event, Entertainment, Amebo,..... All about Bringing out the best in you...
Show HN: A Bomberman-style 1v1 game where LLMs compete in real time https://ift.tt/07fvTyX
A few weeks ago, ARC-AGI 3 was released. For those unfamiliar, it’s a benchmark designed to study agentic intelligence through interactive environments. I'm a big fan of these kinds of benchmarks as IMO they reveal so much more about the capabilities and limits of agentic AI than static Q&A benchmarks. They are also more intuitive to understand when you are able to actually see how the model behaves in these environments.

I wanted to build something in that spirit, but with an environment that pits two LLMs against each other. My criteria were:

1. Strategic & Real-time. The game had to create genuine tradeoffs between speed and quality of reasoning. Smaller models can make more moves but less strategic ones; larger models move slower but smarter.
2. Good harness. I deliberately avoided visual inputs: models are still too slow and not accurate enough with them (see: Claude playing Pokémon). Instead, a harness translates the game state into structured text, and the game engine renders the agents' responses as fluid animations.
3. Fun to watch. Because benchmarks don't need to be dry bread :)

The end result is a Bomberman-style 1v1 game where two agents compete by destroying bricks and trying to bomb each other. You can check a demo video here: https://youtu.be/4x8tVypmuRk

Would love to hear what you think!

https://ift.tt/mUMB9uo

April 13, 2026 at 09:36PM
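
To make the harness idea concrete, here is a minimal sketch of how a grid-based game state could be turned into structured text for a model, and how a model's reply could be mapped back to a move. The post does not describe the actual format, so everything here is an assumption made for illustration: the tile symbols, the `GameState` fields, the action names, and the functions `state_to_text` and `parse_action` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical action vocabulary; the real game's actions are not documented in the post.
ACTIONS = ("up", "down", "left", "right", "bomb", "wait")


@dataclass
class GameState:
    # Rows of tile symbols. Assumed encoding: "#" wall, "+" brick, "." empty.
    grid: list[str]
    # Player name -> (row, col) position on the grid.
    players: dict[str, tuple[int, int]]
    tick: int


def state_to_text(state: GameState, me: str) -> str:
    """Serialize the state into a plain-text observation for one agent.

    The observing agent is always drawn as "A" and every other player as "B",
    so both models see the board from their own perspective.
    """
    rows = [list(row) for row in state.grid]
    for name, (r, c) in state.players.items():
        rows[r][c] = "A" if name == me else "B"
    lines = [
        f"tick: {state.tick}",
        f"you: A at {state.players[me]}",
        "board:",
    ]
    lines += ["".join(row) for row in rows]
    lines.append("legal actions: " + ", ".join(ACTIONS))
    return "\n".join(lines)


def parse_action(reply: str) -> str:
    """Map a model reply to a known action, falling back to "wait"."""
    word = reply.strip().lower()
    return word if word in ACTIONS else "wait"
```

Falling back to "wait" on an unrecognized reply is one possible design choice: in a real-time setting it keeps the game loop moving instead of blocking on a malformed response.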