Local LLM Benchmark on Intel Lunar Lake

I wanted to evaluate how Intel’s Lunar Lake performs at local LLM inference with llama.cpp, and compare it against Apple’s M-series chips. On Intel hardware, llama.cpp can target three backends:

  • Vulkan — a cross-platform GPU compute backend.
  • SYCL — a C++-based, cross-platform programming model for heterogeneous computing.
  • IPEX-LLM — Intel’s optimized PyTorch extension library designed for low-latency LLM inference and fine-tuning on Intel CPUs and GPUs.
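For the first two backends, llama.cpp is built from source with the corresponding CMake flag. A minimal sketch of both builds, assuming the oneAPI toolkit is installed in its default location for the SYCL case:

```shell
# Vulkan backend: only needs the Vulkan SDK installed.
cmake -B build-vulkan -DGGML_VULKAN=ON
cmake --build build-vulkan --config Release

# SYCL backend: load the oneAPI environment, then build with the icx/icpx compilers.
source /opt/intel/oneapi/setvars.sh
cmake -B build-sycl -DGGML_SYCL=ON \
  -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build-sycl --config Release
```

IPEX-LLM is different: it ships its own precompiled llama.cpp binaries rather than a build flag, as described later.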

Apple Silicon

To establish a baseline, I first measured Apple M-series performance using the F16, Q8, and Q4 quantization levels of mistralai/Ministral-3-3B-Instruct-2512-GGUF. The BF16 model was converted to F16 using llama-quantize.
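The conversion and benchmark steps look roughly like this (file names are placeholders, not the actual paths I used; llama-bench defaults to a 512-token prefill and 128-token generation run):

```shell
# Convert the BF16 GGUF to F16, then derive Q8_0 and Q4_K_M variants from it.
./llama-quantize model-bf16.gguf model-f16.gguf F16
./llama-quantize model-f16.gguf model-q8_0.gguf Q8_0
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M

# Benchmark all three quantization levels in one run (pp512 / tg128 by default).
./llama-bench -m model-f16.gguf -m model-q8_0.gguf -m model-q4_k_m.gguf
```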

llama-bench on Apple M2

llama-bench on Apple M1 Pro

Intel Lunar Lake

The first run using Vulkan delivered underwhelming results.

llama-bench on 258V with Vulkan backend

Switching to SYCL, which supports only F16 and F32 precision, I observed a substantial boost in prompt processing (pp, or prefill) speed compared to Vulkan. However, token generation (tg) results were mixed.
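Running the SYCL build requires the oneAPI runtime environment; the device can be pinned to the integrated GPU through the standard oneAPI device selector. A sketch, assuming the build directory from earlier and a placeholder model path:

```shell
# Load the oneAPI runtime, then restrict SYCL to the first Level Zero (GPU) device.
source /opt/intel/oneapi/setvars.sh
ONEAPI_DEVICE_SELECTOR=level_zero:0 ./build-sycl/bin/llama-bench -m model-f16.gguf
```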

llama-bench on 258V with SYCL backend

Finally, I attempted the same test with IPEX-LLM, following Intel’s official guide. Intel provides a precompiled llama.cpp binary optimized for IPEX-LLM, but development appears to have stalled: the latest release dates back to April 2025, and that version lacks support for newer architectures and models, including Mistral 3. I was therefore only able to benchmark older models like Llama 3.
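From my reading of Intel’s guide, the setup installs the ipex-llm Python package with its llama.cpp extra and then initializes the bundled binaries in a working directory; the exact commands below are a sketch and are best double-checked against the guide:

```shell
# Install ipex-llm with the llama.cpp integration (pre-release channel).
pip install --pre --upgrade ipex-llm[cpp]

# Symlink Intel's precompiled llama.cpp binaries into the current directory.
init-llama-cpp

# Then benchmark an older, supported model (placeholder path).
./llama-bench -m llama-3-8b-q4_k_m.gguf
```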

Takeaway

Lunar Lake’s integrated GPU (Arc 140V) demonstrated impressive prefill throughput when llama.cpp was compiled with the SYCL backend: roughly 2× faster than the M1 Pro and nearly on par with the M1 Max. Token generation speed, however, remains disappointing, comparable to the three-year-old M2.

While Lunar Lake isn’t designed for heavy sustained workloads, it performs decently for long-prompt, short-response tasks. For more balanced workloads, Apple’s M-series still offers better overall inference efficiency.