In a recent test, Google’s Gemini 2.5 Pro AI model was streamed live playing classic Pokémon games and it “panicked” on camera. As its Pokémon neared defeat, Gemini abruptly dropped tools, abandoned battle strategies, and made rash decisions that caused its reasoning to collapse, mirroring human stress responses.
Viewers on platforms like Twitch and X spotted the erratic behavior and dubbed it "panic mode." Researchers at Google DeepMind confirmed the model enters a distinct state that leads to noticeable performance drops during tense in-game moments.
Experts say Gemini’s behavior is more than entertaining. It highlights how AI handles shifting objectives and uncertain scenarios. During streams, the AI translated its reasoning into real-time text, providing rare insight into its thought process.
Despite its slow pace, taking hundreds of hours to clear a game a child can finish in a fraction of the time, Gemini demonstrated impressive puzzle-solving. It solved the complex boulder puzzles of Victory Road on the first try after creating its own agentic helper tools.
Anthropic's Claude model has exhibited quirks of its own, including deliberately letting its Pokémon faint to circumvent game mechanics, a tactic known as "faint to skip." Gemini's panic is distinctive by comparison: where Claude exploits the game's rules, Gemini shows stress-induced failures of logic, offering a novel perspective on the diverse failure modes present in contemporary AI systems.
Watching Gemini play Pokémon is part of a broader effort to use games as AI stress tests. Though not every move reflects broader intelligence, researchers see value in watching AI navigate dynamic digital worlds. Such experiments reveal how models reason, adapt toolkits, and recalibrate under pressure.
Gemini's performance suggests AI is making headway on formal problem-solving, even as emotion-like breakdowns persist.