Google Cloud has unveiled two major upgrades to its Kubernetes ecosystem: the GKE Agent Sandbox and the Inference Gateway, marking what experts are calling the company’s most ambitious move yet to make Kubernetes the backbone of enterprise-scale artificial intelligence operations. The announcement was made at KubeCon North America 2025, coinciding with the 10th anniversary of Google Kubernetes Engine (GKE).
The Agent Sandbox provides a secure, isolated runtime for model-generated code and autonomous AI agents inside GKE clusters. Built on gVisor-based container isolation, the Sandbox lets developers execute untrusted, model-generated code without exposing the rest of the cluster.
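GKE has exposed gVisor isolation for some time through Kubernetes' standard RuntimeClass mechanism, which gives a sense of how such sandboxing is wired up. The sketch below is illustrative only: the pod name, image, and command are placeholders, and the Agent Sandbox's own resource types may differ from this plain gVisor pod.

```yaml
# Illustrative sketch: running untrusted code under GKE's gVisor sandbox
# via a standard Pod spec. Names and image are hypothetical placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: agent-code-runner        # hypothetical name
spec:
  runtimeClassName: gvisor       # schedules the pod onto the gVisor runtime
  containers:
  - name: executor
    image: python:3.12-slim      # placeholder image for generated code
    command: ["python", "/workspace/generated.py"]
    resources:
      limits:
        cpu: "1"
        memory: 512Mi
```

The key line is `runtimeClassName: gvisor`, which intercepts the workload's syscalls in a user-space kernel rather than handing them to the host directly.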
According to Google engineers, the Agent Sandbox reduces cold-start latency by nearly 90 percent compared to traditional container deployment, offering faster and safer code execution for real-time AI workflows. The environment allows agents to pause and resume with persistent state storage, cutting infrastructure costs while improving response times.
Alongside the Sandbox, Google introduced the Inference Gateway, a production-ready serving layer for large language model (LLM) inference and generative AI workloads. The system provides model-aware routing, prefix caching, and disaggregated prefill and decode pipelines to reduce latency during token generation.
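The idea of model-aware routing can be sketched with the standard Kubernetes Gateway API, which the Inference Gateway builds on. This is an assumption-laden illustration, not the Gateway's actual resource model: the route, gateway, header, and Service names below are all hypothetical, and the real system adds inference-specific resources beyond plain HTTP routing.

```yaml
# Illustrative sketch: routing requests to different model backends based
# on a request header, using the standard Gateway API. All names and the
# x-model header are hypothetical; the Inference Gateway's own API differs.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-routes               # hypothetical name
spec:
  parentRefs:
  - name: inference-gateway      # hypothetical Gateway resource
  rules:
  - matches:
    - headers:
      - name: x-model            # hypothetical header naming the model
        value: llama-70b
    backendRefs:
    - name: llama-70b-svc        # hypothetical model-serving Service
      port: 8000
  - backendRefs:                 # fallback rule for all other traffic
    - name: default-model-svc
      port: 8000
```

A model-aware gateway extends this pattern with serving-side signals, such as queue depth and cached prefixes on each replica, rather than routing on headers alone.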
Google claims the Gateway can deliver up to 96 percent faster Time-to-First-Token and 25 percent lower cost per token compared to existing managed Kubernetes services. It is fully integrated into GKE’s networking stack and optimized for GPU and TPU clusters.
The new systems are designed to handle the scale and complexity of modern enterprise AI, with clusters reportedly supporting tens of thousands of nodes. Google says early adopters in financial services, e-commerce, and software engineering have already begun pilot deployments.
Google acknowledges that running model-generated code introduces significant security and compliance risks. The company says the Sandbox's isolation and runtime monitoring features will prevent rogue processes from escaping containers or accessing unauthorized network resources.
Google is expected to expand these features with deeper integrations into its Vertex AI platform and new Ironwood TPU hardware, aiming to make GKE the standard platform for scalable inference and AI-driven applications.
According to multiple industry sources, the new GKE stack could redefine how enterprises manage machine learning in production, offering a bridge between DevOps, MLOps, and AI infrastructure teams.