Core AI Explained: Apple’s New On-Device LLM Framework

By Abdul Wasay | 1 hour ago

Apple announced Core AI, the official successor to Core ML and the unified framework that now underlies Apple Intelligence.

The framework lets developers run large language models and generative AI entirely on-device across iPhone, iPad, Mac, and Apple Vision Pro, with no server dependencies, no per-token cloud costs, and full user data privacy.

As Apple explains on its official blog:

“Core AI helps you build, run, and deploy AI models in your app. Designed with Apple silicon in mind, Core AI allows your app to use the latest model architectures and inference techniques across the CPU, GPU, and Neural Engine. The Swift API makes common tasks simple, while giving you more control over model specialization, caching, and inference performance when needed.”

Core AI supports models ranging from compact 3-billion-parameter vision models to large-scale reasoning models with up to 70 billion parameters, all running locally on Apple silicon. The framework provides a unified API that routes workloads across the CPU, GPU, and Neural Engine, so developers do not have to manage hardware resources manually.

A memory-safe Swift API enables zero-copy data paths and fine-grained inference memory control, while ahead-of-time compilation shifts heavy processing off the user’s device, delivering near-instant load times once a model is cached.

Developers can bring custom models to Core AI by converting existing PyTorch models using the Core AI PyTorch library, or by using pre-optimized open-source models that Apple provides directly. The framework also supports custom Metal kernels for lower-level hardware optimization. A critical step in deployment is model compression through quantization and palettization, which reduces memory footprint, inference latency, and power consumption at the same time.
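To give a sense of what these two compression techniques do in general, here is a minimal sketch in plain Python. It does not use Core AI or its PyTorch library, whose APIs are not described here; every function name below is hypothetical and chosen only for illustration. Quantization stores each weight as a small integer plus a shared scale, while palettization clusters weights into a short lookup table and stores only an index per weight.

```python
# Illustrative only: these helpers are hypothetical and are not part of
# Core AI or any Apple tooling. They show the general ideas in plain Python.

def weight_footprint_gb(num_params, bits_per_weight):
    """Approximate storage for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def quantize_int8(weights):
    """Symmetric linear quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


def palettize(weights, n_colors=4, iterations=20):
    """Cluster weights into a lookup table with 1-D k-means (n_colors >= 2).

    Returns the palette (cluster centers) and one palette index per weight.
    """
    lo, hi = min(weights), max(weights)
    # Start with centroids spaced evenly across the weight range.
    palette = [lo + (hi - lo) * i / (n_colors - 1) for i in range(n_colors)]
    indices = [0] * len(weights)
    for _ in range(iterations):
        # Assign each weight to its nearest palette entry.
        indices = [
            min(range(n_colors), key=lambda k: abs(w - palette[k]))
            for w in weights
        ]
        # Move each palette entry to the mean of the weights assigned to it.
        for k in range(n_colors):
            members = [w for w, i in zip(weights, indices) if i == k]
            if members:
                palette[k] = sum(members) / len(members)
    return palette, indices


if __name__ == "__main__":
    weights = [-0.9, -0.5, -0.1, 0.0, 0.2, 0.6, 1.0]

    codes, scale = quantize_int8(weights)
    print("int8 codes:", codes)
    print("restored:  ", dequantize(codes, scale))

    palette, indices = palettize(weights, n_colors=4)
    print("palette:", palette)
    print("indices:", indices)

    # Weight storage for a 70-billion-parameter model at 16 and 4 bits.
    print(weight_footprint_gb(70_000_000_000, 16), "GB at 16-bit")
    print(weight_footprint_gb(70_000_000_000, 4), "GB at 4-bit")
```

The footprint helper shows why compression matters at the top of the size range mentioned above: weights alone for a 70-billion-parameter model take about 140 GB at 16 bits each, and about 35 GB at 4 bits. Real toolchains add refinements such as per-channel scales and calibration data that this sketch leaves out.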

Apple has clarified the intended division of labor among its three on-device AI frameworks. Core ML continues to serve classical, non-neural machine learning tasks such as decision trees and tabular feature engineering. Core AI handles neural networks, transformers, and generative models.

MLX Swift targets developers working with custom model weights who want research-level flexibility, though potentially at lower performance compared to Core AI’s hardware-optimized pipeline.

Early feedback from developers suggests that Core AI makes it significantly simpler to incorporate high-performance LLMs into apps, though its long-term value will depend on how quickly Apple expands official support and how quickly the wider developer community builds around it.