NYU’s Julian Togelius on why LLMs struggle with video games
Julian Togelius, director of NYU's Game Innovation Lab, points to a persistent gap in the capabilities of large language models (LLMs): despite rapid progress in coding and mathematics, they struggle to play video games. Even when a model hits a milestone such as completing Pokémon Blue, it typically plays far slower than a human, falls into repetitive behavior, and needs specialized software to mediate its interactions with the game.

Togelius argues that LLMs excel at coding because coding behaves like a "well-behaved" game: compilers and test results deliver immediate, granular feedback on every attempt. Video games are messier, with diverse mechanics and varied input representations. Specialized systems such as AlphaZero can master individual games like chess or Go, but they must be meticulously retrained for each new environment. The absence of a general game-playing AI means current models cannot readily transfer skills between titles.

A further obstacle is reliance on training data. LLMs tend to perform better in games like Minecraft or Pokémon because these titles are documented by millions of hours of online guides and walkthroughs. For newer or less popular games, the scarcity of textual data leaves models unable to infer objectives or mechanics. This limitation underscores a broader reality: current AI remains highly dependent on existing human-generated information rather than possessing any autonomous understanding of game logic.
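Togelius's point about feedback density can be made concrete with a toy sketch. This is purely illustrative (the functions, targets, and numbers below are invented for the example, not from the article): a "coding-like" task scores every attempt with a graded error signal, while a "game-like" task pays out only when a whole episode happens to reach the goal, so most attempts teach the learner nothing.

```python
import random

random.seed(0)

def coding_feedback(candidate: int, target: int = 7) -> int:
    """Dense feedback: like a compiler or test suite, every attempt
    immediately reports how far off it is (0 means success)."""
    return -abs(candidate - target)

def game_feedback(actions: list[int], goal: int = 7) -> int:
    """Sparse feedback: like many games, reward arrives only if the
    whole episode happens to end in the goal state (win/lose only)."""
    return 1 if sum(actions) == goal else 0

# With dense feedback, even a trivial hill-climber converges quickly:
# each step compares neighboring scores and moves toward the target.
guess = 0
while coding_feedback(guess) < 0:
    guess += 1 if coding_feedback(guess + 1) > coding_feedback(guess) else -1

# With sparse feedback, random episodes rarely hit the goal, so there
# is almost no gradient of information to learn from.
wins = sum(
    game_feedback([random.randint(0, 3) for _ in range(3)])
    for _ in range(100)
)
```

The hill-climber reliably reaches the target because every guess is scored, whereas the random player wins only a small fraction of its 100 episodes; a learner in the sparse setting sees mostly uninformative zeros.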