Whisper Burned My CPU to 92°C — That's When I Discovered MacBook Has Three Compute Engines

I was using Whisper to transcribe a recording when the CPU temperature shot up to 92°C.

The metal body of my MacBook Pro was hot enough to fry an egg, yet the fan was spinning at a leisurely 1660 RPM like nothing was wrong. Even stranger, the machine started making a faint high-pitched whine — not the fan, but some kind of electromagnetic squeal.

My first thought: this is an M4 Pro with 24GB of RAM. It shouldn’t be struggling this hard to transcribe audio.

So I started digging. What I found was a whole string of things I'd never known before.


Whisper Wasn’t Using the GPU at All

Open up the Whisper source code and the answer is right there at line 130:

if device is None:
    device = "cuda" if torch.cuda.is_available() else "cpu"

The logic is simple: use GPU if NVIDIA CUDA is available, otherwise fall back to CPU. There’s no third branch to check for an Apple GPU.

If the code had been written like this, we’d be fine:

if device is None:
    if torch.cuda.is_available():
        device = "cuda"
    elif torch.backends.mps.is_available():
        device = "mps"
    else:
        device = "cpu"

But OpenAI didn’t do that. The community submitted a fix back in 2022 (GitHub PR #382), and it still hasn’t been merged after three-plus years.

So my M4 Pro has a powerful GPU, PyTorch fully supports it (torch.backends.mps.is_available() returns True), but Whisper’s code never even asks “do you have an Apple GPU?” — it goes straight to CPU. That’s why all 10 CPU cores were pegged at 1026%, temperatures hit 92°C, and the GPU sat there doing nothing.
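Since Whisper's selection logic stops at CUDA, the practical workaround is to pick the device yourself and pass it in. A minimal sketch of the selection logic (the function name pick_device is mine, not Whisper's):

```python
import torch

def pick_device(device=None):
    """Choose a torch device string, with the MPS branch Whisper lacks.

    Mirrors the proposed fix: CUDA first, then Apple's MPS, then CPU.
    """
    if device is not None:
        return device
    if torch.cuda.is_available():
        return "cuda"
    # hasattr guard: PyTorch builds before 1.12 predate the MPS backend
    if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
        return "mps"
    return "cpu"

# e.g.: model = whisper.load_model("base", device=pick_device())
print(pick_device())
```

One caveat: a few Whisper operations have historically lacked MPS kernels, so you may also need to set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1, which lets PyTorch silently run unsupported ops on the CPU instead of crashing.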

This also explained two other puzzles:

Why didn’t the fan spin up? This is Apple’s design philosophy — better to let the chip run hot than have a noisy fan. The M4 Pro’s maximum safe temperature is around 105°C, so 92°C is completely normal from the system’s perspective. Apple’s fan curve is very conservative: silence is the priority, the fan only ramps up when approaching 100°C, and the chip throttles only when that’s not enough.

What was that whine? Electromagnetic coil whine — a known issue with the M4 Pro/Max MacBook Pro. Under heavy CPU load, rapidly changing current causes the tiny inductors on the motherboard to vibrate physically at high frequency, producing an audible sound. It’s not a defect. It’s physics.


Wait — What Are CUDA and MPS?

I mentioned CUDA and MPS above. What’s the relationship between them?

CUDA (Compute Unified Device Architecture) is the GPU general-purpose computing framework NVIDIA launched in 2006. The core idea: GPUs were originally just for rendering graphics, but they have thousands of small cores capable of parallel computation — why not let them handle general computing too? CUDA lets developers write code in something like C and run arbitrary compute tasks directly on the GPU.

MPS (Metal Performance Shaders) is Apple’s equivalent. Metal is the GPU programming framework Apple launched in 2014 (analogous to NVIDIA’s CUDA or Microsoft’s DirectX), and MPS is the part of Metal specifically designed for high-performance GPU computation. Starting with PyTorch 1.12, Apple Silicon GPU acceleration became available through the MPS backend.

One thing that’s easy to confuse: Metal and the M-series chips are two different things. The M chips (M1/M2/M3/M4) are Apple’s custom hardware. Metal is Apple’s GPU programming framework — software. Intel Macs before the M-chip era used Metal. iPhones and iPads use Metal. They’re hardware and software, not the same thing.

Short version: MPS = GPU acceleration on Mac. It’s the Apple counterpart to NVIDIA CUDA — CUDA lets PyTorch use NVIDIA GPUs, MPS lets PyTorch use Apple GPUs.
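You can see which backends your own PyTorch build recognizes, and run a tensor operation on whichever is available. A quick probe (safe on any machine; it falls back to CPU when MPS is absent):

```python
import torch

# Report which GPU backends this PyTorch build can reach
print("CUDA available:", torch.cuda.is_available())
print("MPS available :", torch.backends.mps.is_available())

# Run a matrix multiply on the best available device
device = "mps" if torch.backends.mps.is_available() else "cpu"
x = torch.randn(512, 512, device=device)
y = x @ x  # executes on the Apple GPU when device == "mps"
print(y.shape, y.device)
```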

This is also why AI and deep learning are so inseparably tied to NVIDIA today — they built this GPU general-computing ecosystem nearly a decade ahead of Apple and AMD. PyTorch and TensorFlow supported CUDA first; Apple’s MPS came much later.


Wait, the MacBook Has a GPU?

Honestly, before chasing down this Whisper issue, I had no idea MacBook even had a GPU. My mental model was that GPUs were discrete cards you stuck in a desktop tower.

But Apple Silicon chips do include a GPU — it’s just not a standalone graphics card. It’s packaged together with the CPU on a single chip.

What’s the difference between CPU and GPU? The most intuitive analogy:

  • CPU = a math professor. Extremely capable, can solve any complex problem, but there’s only one of them (a handful of cores), handling one thing at a time.
  • GPU = a classroom of thousands of elementary schoolers. Each one can only do simple arithmetic, but with thousands working simultaneously, the throughput is extraordinary.

The core operations in AI and deep learning are enormous amounts of simple matrix multiplications — no complex logic required, but massive parallelism is essential. GPUs are naturally suited for this.
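To make that concrete, here is what a single dense neural-network layer boils down to (the shapes are arbitrary examples of my own): a batch of inputs times a weight matrix, millions of independent multiply-adds with no branching, exactly the workload a GPU's army of simple cores is built for:

```python
import torch

batch, d_in, d_out = 32, 512, 256
x = torch.randn(batch, d_in)   # a batch of input vectors
W = torch.randn(d_in, d_out)   # the layer's weights
b = torch.zeros(d_out)         # the layer's bias

# One dense layer: 32 * 512 * 256 ≈ 4.2 million multiply-adds,
# every one of them independent of the others
y = x @ W + b
print(y.shape)
```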

                         CPU                                GPU
Core count               Few (the M4 Pro has 12)            Many (M4 Pro GPU: 20 cores; an RTX 4090: 16,384 CUDA cores)
Single-core capability   Strong; handles complex logic      Weak; excels only at simple repetitive operations
Good at                  OS, application logic, branching   Graphics rendering, matrix math, AI training

Apple GPU vs. NVIDIA GPU:

            Apple GPU (in the M4 Pro)                 NVIDIA GPU (e.g., RTX 4090)
Form        Integrated into the SoC                   Discrete card with its own VRAM
Memory      Shares 24GB unified memory with the CPU   Separate 24GB GDDR6X VRAM
Power draw  ~30–50W for the whole machine             450W for the GPU alone
Advantage   Efficiency, silence, zero-copy memory     Raw compute, mature ecosystem


The Third Engine Hidden Inside the Chip: ANE

After looking into the GPU situation, I discovered there’s a third compute engine inside Apple Silicon — ANE (Apple Neural Engine).

My M4 Pro chip actually has three hardware units capable of AI computation:

Unit   Name                                  Purpose                                 Accessibility
CPU    High-performance + efficiency cores   General-purpose; runs anything          Fully open
GPU    Apple GPU (integrated graphics)       Graphics rendering + parallel compute   Open via Metal/MPS
ANE    Apple Neural Engine                   Neural network inference only           Almost entirely closed

The M4 Pro’s ANE has 16 cores — Apple quotes 38 TOPS (trillion operations per second) for the M4-generation Neural Engine — with extremely low power consumption. Apple uses it internally for FaceID, photo recognition, and Siri voice processing.

But Apple has not published the ANE’s programming interfaces. Developers can only access it indirectly through CoreML, with no direct control.

Comparing to other vendors:

                    Apple                                   NVIDIA                              Google
AI accelerator      ANE (Neural Engine)                     Tensor Cores (inside the GPU)       TPU (separate chip)
Design philosophy   Dedicated block, separate from the GPU  Compute units embedded in the GPU   Standalone data-center chip
Goal                On-device inference, efficiency first   Training + inference, compute first  Large-scale training, internal use

NVIDIA’s approach is arguably smarter: they put AI acceleration capability directly inside the GPU (Tensor Cores). Developers using CUDA automatically get access to both regular cores and Tensor Cores — no extra learning curve.

Why Doesn’t Apple Open Up ANE?

This is one of the most-discussed topics in the community. Drawing from Hacker News and Reddit:

1. Security and privacy. ANE handles privacy-sensitive tasks like FaceID and on-device Siri. Exposing low-level interfaces could be exploited maliciously.

2. ANE’s architecture doesn’t suit general-purpose development. A developer on Hacker News noted: “ANE is a graph execution engine — a fixed-function accelerator.” It only accepts a fully compiled computation graph and executes it in one shot. It’s nowhere near as flexible as a GPU, which makes it too restrictive for external developers.

3. Apple’s internal teams use lower-level interfaces. An Apple employee disclosed on HN: “In my conversations with people at Apple, they do not use CoreML. Instead, they have access to lower level libraries…CoreML is the crappy middleware they made for 3rd party devs.” Internal Apple teams have far more powerful APIs; CoreML is the stripped-down version for outsiders.

4. Business strategy. If ANE were opened up, third parties could turn any Mac into an efficient AI inference machine, which would undermine the uniqueness of Apple’s own AI services.

5. Even Apple’s own MLX team reportedly can’t access ANE’s source code. Comments on HN suggest the MLX project lead may have left Apple partly because of this. The closure isn’t just external — it’s internal too.

Interestingly, there’s a project on GitHub that, by reverse-engineering private APIs like _ANEClient and _ANECompiler, successfully trained a 109-million-parameter Transformer model on the ANE, something Apple never intended external developers to do. Testing showed the M4 ANE sustains 1.78 TFLOPS of continuous compute, with far lower power draw than the GPU.

Someone on Hacker News put it perfectly:

“I’m paying for a hardware accelerator that makes Siri go.”


Everything in One Chip: The SoC Architecture

After learning about the CPU, GPU, and ANE, I got curious about where they physically sit inside a MacBook. I looked up teardown photos and had to correct a key piece of my mental model:

The CPU, GPU, and ANE are not three separate chips. They’re all integrated onto the same single chip.

This is the most fundamental design principle behind Apple Silicon — SoC (System on a Chip). Open the back of a MacBook and you won’t find three separate components. You’ll see one chip: the M4 Pro. On this single piece of silicon the size of a fingernail, 28 billion transistors are etched:

┌───────────────────────────────────────────────────┐
│                    M4 Pro Chip                    │
│                                                   │
│  ┌──────────┐  ┌──────────┐  ┌───────────────┐    │
│  │ CPU      │  │ Neural   │  │      GPU      │    │
│  │ 12 cores │  │ Engine   │  │    20 cores   │    │
│  │ (10P+2E) │  │ 16 cores │  │               │    │
│  └──────────┘  └──────────┘  └───────────────┘    │
│                                                   │
│  ┌──────────┐  ┌──────────┐  ┌───────────────┐    │
│  │ Unified  │  │ Media    │  │    Memory     │    │
│  │ Memory   │  │ Engine   │  │  Controller   │    │
│  │Controller│  │ Codec    │  │   273 GB/s    │    │
│  └──────────┘  └──────────┘  └───────────────┘    │
│                                                   │
│  ┌──────────────────────────────────────────┐     │
│  │     LPDDR5X Unified Memory (24GB)        │     │
│  │     Packaged adjacent to the chip;       │     │
│  │     shared by CPU, GPU, and ANE          │     │
│  └──────────────────────────────────────────┘     │
└───────────────────────────────────────────────────┘

Open the back of a MacBook Pro 16", and the layout you’d actually see looks something like this:

┌───────────────────────────────────────────┐
│         MacBook Pro 16" Interior          │
│        (screen above, hinge at top)       │
│                                           │
│  ┌─────────────────────────────────────┐  │
│  │  Fan (left)   Heatsink   Fan (right)│  │
│  │    ┌─────────────┐                  │  │
│  │    │   M4 Pro    │    Heat pipe     │  │
│  │    │  SoC chip   │  ═══════════     │  │
│  │    │ (CPU+GPU+   │                  │  │
│  │    │  ANE all    │                  │  │
│  │    │  in here)   │                  │  │
│  │    └─────────────┘                  │  │
│  │           Logic board               │  │
│  ├─────────────────────────────────────┤  │
│  │                                     │  │
│  │   Battery                           │  │
│  │                                     │  │
│  └─────────────────────────────────────┘  │
│                                           │
└───────────────────────────────────────────┘

The M4 Pro is a tiny chip (roughly the size of a postage stamp), but it houses all of the CPU, GPU, and ANE. The cooling system is designed for this single chip — which is why the entire MacBook’s thermal management behaves as one integrated unit rather than independently cooling multiple components.


Unified Memory Does More Than “Avoid Copying Data”

Apple’s Unified Memory Architecture (UMA) offers advantages far beyond zero-copy transfers:

1. Better Memory Utilization

In traditional architectures, system RAM and GPU VRAM operate independently:

Traditional PC:
  System RAM  16GB  (for the CPU)
  VRAM         8GB  (for the GPU, soldered to the graphics card)
  → The CPU can't touch that 8GB of VRAM no matter how busy it gets, and vice versa

Apple UMA:
  Unified Memory  24GB  (whoever needs it gets it)
  → When the GPU is under pressure it can claim more; the CPU releases what it isn't using
  → No more "VRAM exhausted but system RAM has plenty" frustration

This is why a MacBook with 24GB unified memory can run large language model inference — all 24GB is available to the model. With an NVIDIA GPU, even if the system has 64GB of RAM, the model size is limited by VRAM (an RTX 4090 has only 24GB VRAM).
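A rough back-of-envelope calculation (my own numbers, ignoring KV cache and activations) shows why that 24GB ceiling matters for model weights:

```python
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold a model's weights."""
    return params_billion * 1e9 * bytes_per_param / 2**30

# fp16 = 2 bytes per parameter; 4-bit quantization = 0.5 bytes
print(f" 7B model @ fp16 : {weights_gb(7, 2):5.1f} GB")   # fits in 24GB with room to spare
print(f"13B model @ fp16 : {weights_gb(13, 2):5.1f} GB")  # already pushing a 24GB card's limit
print(f"34B model @ 4-bit: {weights_gb(34, 0.5):5.1f} GB")
```

This is exactly the "VRAM exhausted but system RAM has plenty" wall: a 13B fp16 model barely overflows a 24GB discrete card, while unified memory can hand the same 24GB to whichever processor needs it.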

2. Physical Proximity = High Bandwidth, Low Latency

The unified memory sits physically right next to the chip on the package. The M4 Pro’s memory bandwidth is 273 GB/s — several times that of the DDR5 memory in a typical PC.

3. Dramatically Lower Power Consumption

In a traditional architecture, CPU ↔ system RAM ↔ PCIe bus ↔ GPU VRAM involves multiple data transfers and copies. Each crossing burns power and takes time. With UMA, there’s only one memory pool at one physical location — far less movement, far less power used.


Why Apple Silicon Is So Powerful

When M1 launched, there were wall-to-wall reviews proclaiming it “destroys Intel.” I didn’t pay much attention at the time. Looking more deeply now, I found that semiconductor blogger Laoshi Tan Xin (老石谈芯) offers a compelling analysis. He attributes the M-series advantage to three compounding factors:

Layer 1: TSMC’s Advanced Process

“Over the past decade, more than 60% of chip performance improvements have been directly or indirectly driven by advances in semiconductor manufacturing, while only 17% came from chip architecture improvements.” — Laoshi Tan Xin

When M1 launched, it used TSMC’s 5nm process. At the same time, Intel laptop CPUs were still on 14nm. My M4 Pro uses TSMC’s second-generation 3nm.

The “5 nanometer” here isn’t the thickness of the chip — it’s the manufacturing process node of the transistors, roughly correlating to transistor “size.” A quick comparison for scale:

  • Human hair diameter: ~80,000 nm
  • Intel 14nm process: ~14 nm
  • TSMC 5nm process: ~5 nm
  • M4 Pro (2nd-gen 3nm): ~3 nm

Smaller transistors mean more transistors fit in the same area (more performance), shorter distances for electrons to travel (faster switching), and less energy needed per switch (lower power consumption). This is why the 2020 M1 (5nm) needed no fan at all (the MacBook Air is fanless) yet outperformed contemporary Intel 14nm chips.

One caveat: today’s “5nm” no longer corresponds to any actual physical dimension on the chip. It’s a generational label: a smaller number means a more advanced node, nothing more.

Layer 2: SoC Architecture + Unified Memory

Integrating CPU, GPU, ANE, memory controller, video codec engine, and security chip all onto one chip, combined with unified memory — this eliminates the communication bottlenecks between components that exist in traditional architectures.

Layer 3: Vertical Hardware-Software Integration

He considers this the most critical and hardest-to-replicate advantage:

“Only Apple can take an ARM-based CPU and turn it into a genuinely great product. The 5nm process or chip architecture techniques I mentioned — other manufacturers will master those too. But deep optimization across hardware, software, operating system, and ecosystem together — that’s uniquely Apple’s.” — Laoshi Tan Xin

macOS knows every detail of the chip and can precisely schedule work across performance cores, efficiency cores, GPU, and ANE. Compilers and frameworks are deeply optimized. iOS apps run natively on Mac because the underlying instruction set is the same ARM architecture.

The M-series chips aren’t a single technological breakthrough — they’re the result of process leadership + architectural innovation + ecosystem integration stacked together. Any one of those in isolation isn’t unique (anyone can use TSMC; SoC designs have existed in phones for years). But doing all three simultaneously at this level — right now, only Apple is doing that.


Epilogue: MacBook Neo and the 8GB Memory Challenge

After going through all of this, I happened to see Apple had just released the MacBook Neo — a $599 entry-level Mac using the A18 Pro (the same chip as the iPhone 16 Pro), with only 8GB of unified memory and no upgrade path.

This raises a practical question: is 8GB enough?

Apple’s response includes virtual memory (Swap) — when physical memory runs low, inactive data gets temporarily stored on the SSD.

Think of it like a desk:

  • RAM = your desk. Limited surface area, but everything you’re actively working on is right at hand.
  • SSD = the drawer under the desk. Much more storage, but you have to bend down and open it — a few seconds slower, but manageable.
  • Hard disk = the bookshelf in the next room. You have to stand up and walk over. Slow.

Virtual memory is: when the desk is full, push what you’re not using into the drawer, and pull it back out when needed.

          RAM          SSD           Traditional HDD
Speed     ~100 GB/s    ~3–4 GB/s     ~0.1 GB/s
Analogy   Desk         Drawer        Bookshelf in the next room

Apple Silicon’s SSD is particularly fast, so this mechanism works reasonably well. But it has costs: frequent swapping accelerates SSD wear, and even the fastest SSD is 25–30x slower than actual RAM. Add macOS’s built-in memory compression (which can effectively stretch 8GB to roughly 10–12GB usable), and heavy AI tasks routed to Private Cloud Compute on Apple’s servers — Apple is doing its best to make 8GB livable.
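The bandwidth gap in the table above translates directly into wait time. A toy calculation (round numbers of my own choosing) for pulling a 2GB working set back in after it was swapped out:

```python
ram_gbps = 100.0    # unified memory, order-of-magnitude figure
ssd_gbps = 3.5      # fast NVMe SSD
working_set_gb = 2.0

# Time = data / bandwidth, converted to milliseconds
print(f"from RAM: {working_set_gb / ram_gbps * 1000:6.0f} ms")
print(f"from SSD: {working_set_gb / ssd_gbps * 1000:6.0f} ms")
```

Roughly 20 ms versus 570 ms: not catastrophic for an occasional swap, but a machine that does this constantly feels every one of those pauses, and each one is a write cycle against the SSD's endurance budget.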

Community feedback isn’t encouraging, though. Developers report that macOS Tahoe itself is more memory-hungry than its predecessor, with even 16GB machines feeling the regression — some people are calling it “Mac OS Vista.”

MacBook Neo is fundamentally a Chromebook competitor. It’s for students and light users. Not a developer machine.


Back to Where We Started

Starting from Whisper burning my CPU to 92°C, I ended up understanding the entire Apple Silicon technology stack. A summary of what this journey uncovered:

Question                                         Answer
Why doesn't Whisper use the GPU?                 The code only checks for CUDA, never for MPS
Why didn't the fan speed up at 92°C?             Apple designed it that way; ~105°C is the emergency threshold
Where did the electromagnetic whine come from?   Inductor vibration under high load, a known M4 Pro phenomenon
Does a MacBook have a GPU?                       Yes, integrated inside the M4 Pro chip
What else is in there I didn't know about?       The ANE (Neural Engine): 16 cores, kept locked down by Apple
What's so great about unified memory?            Zero-copy + better utilization + lower power + higher bandwidth
Why are M chips so powerful?                     Process lead + SoC architecture + hardware-software integration

There are things you use every day without really understanding. It’s not until something goes wrong — like a CPU burning up to 92°C — that you’re motivated to dig into what’s actually happening underneath.

That might be the best motivation for learning: not “I should know this,” but “I ran into a problem and I have to figure it out.”

If you found this helpful, consider buying me a coffee to support more content like this.