Moonshot AI, the Beijing-based artificial intelligence startup backed by Alibaba Group, has launched what it bills as its most advanced large language model yet: Kimi K2 Thinking. The model is positioned as a reasoning-first, agentic system designed to handle long, multi-step tasks autonomously.
The new model uses a mixture-of-experts architecture with approximately one trillion total parameters, of which roughly 32 billion are active for any given token. This lets K2 route each input to a small subset of specialized expert sub-networks rather than running the full model, improving efficiency without sacrificing performance.
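To make the mixture-of-experts idea concrete, here is a minimal sketch of top-k routing in plain Python. It is illustrative only and is not Moonshot AI's implementation: the function names, the toy scalar "experts", the gate scores, and the choice of k=2 are all assumptions. The only figures taken from the article are the parameter counts, which imply that roughly 3.2 percent of the model is active at a time.

```python
import math

TOTAL_PARAMS = 1_000_000_000_000   # "approximately one trillion" (from the article)
ACTIVE_PARAMS = 32_000_000_000     # "roughly 32 billion" (from the article)
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS  # share of parameters in use at a time

def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    peak = max(gate_logits[i] for i in top)          # subtract max for stability
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, gate_logits, k):
    """Weighted sum of the outputs of only the selected experts."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))

# Hypothetical toy experts: each just scales its input by a different factor.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
gate_logits = [0.1, 2.0, -1.0, 2.0]   # assumed router scores for one token

routes = top_k_route(gate_logits, k=2)
output = moe_forward(1.0, experts, gate_logits, k=2)
```

In this toy run only two of the four experts are evaluated, which is the source of the efficiency gain: compute scales with the active experts, not with the total parameter count.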
It also supports extremely long context windows, reportedly up to 256,000 tokens, enabling it to process large volumes of text, code, or documentation in a single session. Moonshot AI has emphasized that this capability allows K2 to handle “entire research papers, financial reports, or software repositories” seamlessly, a key advantage for enterprise users. The model can also autonomously perform hundreds of tool calls to external APIs and datasets in sequence, essentially acting as a semi-autonomous “research assistant” capable of retrieving, comparing, and synthesizing data.
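The sequential tool-calling behavior described above can be pictured as a simple loop: the model proposes an action, the harness runs the tool, and the result is fed back until the model produces a final answer. The sketch below is a hedged illustration, not Moonshot AI's API. `run_agent`, `scripted_model`, the `lookup` tool, and the cap of 300 calls are all hypothetical stand-ins.

```python
def run_agent(model_step, tools, task, max_calls=300):
    """Drive a model through repeated tool calls until it returns a final answer.

    max_calls is an arbitrary safety cap chosen for this sketch.
    """
    history = [("task", task)]
    for call_count in range(max_calls):
        action = model_step(history)
        if action["type"] == "final":
            return action["content"], call_count
        # Execute the requested tool and record its result for the next step.
        result = tools[action["tool"]](action["args"])
        history.append((action["tool"], result))
    return None, max_calls  # gave up without a final answer

def scripted_model(history):
    """Hypothetical stand-in for a real model: two lookups, then a comparison."""
    seen = [entry for entry in history if entry[0] == "lookup"]
    pending = ["alpha", "beta"][len(seen):]
    if pending:
        return {"type": "tool", "tool": "lookup", "args": pending[0]}
    best = max(seen, key=lambda entry: entry[1])
    return {"type": "final", "content": f"highest value: {best[1]}"}

# Hypothetical tool backed by a tiny in-memory dataset.
tools = {"lookup": lambda key: {"alpha": 3, "beta": 7}[key]}

answer, calls = run_agent(scripted_model, tools, "compare alpha and beta")
```

Here the scripted model retrieves two values, compares them, and stops after two tool calls. A real agentic model would choose each step itself, which is why a hard cap on the number of calls is a sensible guard.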
The model’s input processing reportedly costs about $0.60 per million tokens, dramatically undercutting major competitors such as GPT-5 and Claude Sonnet 4.5, whose listed prices reportedly range from roughly $2 to $15 per million tokens depending on the model and on whether input or output tokens are being billed.
Kimi K2 Thinking demonstrates impressive strengths in reasoning and agentic tasks. With tool use enabled, it scored 44.9 percent on Humanity’s Last Exam, surpassing GPT-5’s 41.7 percent and Claude Sonnet 4.5’s lower score. It also led the BrowseComp benchmark with a 60.2 percent score, excelling in search-integrated reasoning and contextual data synthesis.
In quantitative fields, K2 performed strongly on mathematical and programming benchmarks such as AIME, MATH-500, and LiveCodeBench, where it reportedly outperformed GPT-4o and Claude 3.5 Sonnet by as much as 550 percent in specific areas. At $0.60 per million input tokens, it is also reported to be between 75 and 90 percent cheaper than proprietary models like GPT-5 and Claude Sonnet 4.5.
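For readers who want to check the pricing claims, the arithmetic is easy to reproduce. The sketch below uses only figures quoted in this article: $0.60 per million input tokens for K2, the $2 to $15 per million range cited for competitors, and the 256,000-token context window. The helper names are hypothetical, and because the competitor range may mix input and output pricing, the results show only what the quoted numbers imply.

```python
K2_INPUT_PRICE = 0.60                    # USD per million input tokens (quoted)
COMPETITOR_LOW, COMPETITOR_HIGH = 2.00, 15.00   # USD per million tokens (quoted range)

def savings_pct(cheaper, pricier):
    """Percentage by which `cheaper` undercuts `pricier`, to one decimal place."""
    return round((pricier - cheaper) / pricier * 100, 1)

def input_cost(tokens, price_per_million):
    """Cost in USD of processing `tokens` input tokens at the given rate."""
    return tokens * price_per_million / 1_000_000

low_end_savings = savings_pct(K2_INPUT_PRICE, COMPETITOR_LOW)    # vs. $2 per million
high_end_savings = savings_pct(K2_INPUT_PRICE, COMPETITOR_HIGH)  # vs. $15 per million

# Cost of filling the full 256,000-token context window once.
full_context_cost = input_cost(256_000, K2_INPUT_PRICE)
```

On these numbers the implied savings run from about 70 percent at the low end to about 96 percent at the high end, a span that brackets the 75 to 90 percent figure, and a single full-context prompt costs roughly 15 cents in input tokens.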
Moonshot AI plans to integrate Kimi K2 Thinking into various enterprise applications, from financial modeling and legal research to pharmaceutical development. Early partnerships with universities and national laboratories are reportedly underway to test the model in scientific and data-intensive environments.
K2’s lead is not uniform, however. It scored 48 on the Artificial Analysis Intelligence Index, compared to GPT-5’s 68, and underperformed in writing and creative tasks, scoring 73.8 versus Sonnet 4.5’s 79.8. In general-purpose coding, it performed comparably to competitors rather than surpassing them, and its base score on Humanity’s Last Exam without tool use trailed theirs, suggesting that its top-tier reasoning results may depend heavily on external tools.
Overall, Kimi K2 Thinking excels in specialized, cost-effective reasoning and agentic performance but remains a weaker contender in general-purpose and creative work. Its open-source availability and low price make it a powerful alternative for research and development, though it has yet to prove itself as a universal AI leader.