Local LLM Benchmark on Intel Lunar Lake
I wanted to evaluate how Intel’s Lunar Lake performs when running local LLM inference with llama.cpp, and compare it against Apple’s M-series chips. On the Intel side, I tried three options:
- Vulkan — a cross-platform GPU compute backend.
- SYCL — a C++-based, cross-platform programming model for heterogeneous computing.
- IPEX-LLM — Intel’s optimized PyTorch extension library designed for low-latency LLM inference and fine-tuning on Intel CPUs and GPUs.
Apple Silicon
To establish a baseline, I first measured Apple’s M-series performance at the F16, Q8, and Q4 quantization levels of mistralai/Ministral-3-3B-Instruct-2512-GGUF. The BF16 model was converted to F16 using llama-quantize.
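For reference, the conversion and quantization steps look roughly like this (`llama-quantize` ships with llama.cpp; the file names here are illustrative, not the exact ones I used):

```shell
# Convert the BF16 GGUF to F16, then derive Q8_0 and Q4_K_M variants from it.
# Paths and file names are illustrative.
./build/bin/llama-quantize Ministral-3B-BF16.gguf Ministral-3B-F16.gguf F16
./build/bin/llama-quantize Ministral-3B-F16.gguf Ministral-3B-Q8_0.gguf Q8_0
./build/bin/llama-quantize Ministral-3B-F16.gguf Ministral-3B-Q4_K_M.gguf Q4_K_M
```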
llama-bench on Apple M2
llama-bench on Apple M1 Pro
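The runs above used llama-bench’s standard tests (pp512 for prompt processing, tg128 for token generation); a sketch of the invocation, again with illustrative file names:

```shell
# One run per quantization level; -p/-n match llama-bench's default
# pp512/tg128 tests. File names are illustrative.
for m in Ministral-3B-F16.gguf Ministral-3B-Q8_0.gguf Ministral-3B-Q4_K_M.gguf; do
  ./build/bin/llama-bench -m "$m" -p 512 -n 128
done
```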
Intel Lunar Lake
The first run using Vulkan delivered underwhelming results.
llama-bench on 258V with the Vulkan backend
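For reproducibility: the Vulkan build is the simplest path, needing only the Vulkan SDK and one CMake flag (per llama.cpp’s build documentation for recent versions):

```shell
# Build llama.cpp with the Vulkan backend (requires the Vulkan SDK/loader).
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j
```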
When I switched to SYCL, which is optimized for F16 (F16 and F32 are the only available precision options), prompt processing (pp, or prefill) was substantially faster than with Vulkan. Token generation (tg) results, however, were mixed.
llama-bench on 258V with the SYCL backend
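The SYCL build needs the Intel oneAPI Base Toolkit; a sketch following llama.cpp’s SYCL guide, with `GGML_SYCL_F16=ON` enabling the F16 path mentioned above:

```shell
# Build llama.cpp with the SYCL backend (requires Intel oneAPI Base Toolkit).
source /opt/intel/oneapi/setvars.sh   # oneAPI environment (icx/icpx compilers)
cmake -B build -DGGML_SYCL=ON -DGGML_SYCL_F16=ON \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j
```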
Finally, I attempted the same test using IPEX-LLM, following Intel’s official guide. Intel provides a precompiled llama.cpp binary optimized for IPEX-LLM, but development appears to have stalled — the latest release dates back to April 2025. That version lacks support for newer architectures and models, including Mistral 3, so I was only able to benchmark older ones like Llama 3.
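As best I recall from Intel’s quickstart, the flow is roughly the following; the command names come from that guide’s April 2025-era instructions and may have changed or broken since:

```shell
# Rough flow from Intel's IPEX-LLM llama.cpp quickstart (stalled April 2025
# release); command names follow that guide and may have changed since.
pip install --pre --upgrade "ipex-llm[cpp]"
mkdir llama-cpp-ipex && cd llama-cpp-ipex
init-llama-cpp                  # links Intel's prebuilt llama.cpp binaries here
./llama-bench -m Llama-3-8B-Q4_K_M.gguf -ngl 99 -p 512 -n 128
```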
Takeaway
Lunar Lake’s integrated GPU (140V) demonstrated impressive prefill throughput when compiled with the SYCL backend — roughly 2× faster than the M1 Pro and nearly on par with the M1 Max. However, token generation speed remains disappointing, comparable to the three-year-old M2.
While Lunar Lake isn’t designed for heavy sustained workloads, it performs decently for long-prompt, short-response tasks. For more balanced workloads, Apple’s M-series still offers better overall inference efficiency.