Linux Thermal Management: Fix Laptop Overheating


Laptop overheating on Linux is rarely one bug. It’s a stack problem. Firmware, kernel power policy, the CPU governor, discrete GPU power, and plain dust in the heatsink all interact. The good news: Linux shows you every layer. Work through it in order and you can cut sustained temps by 8 to 20 C, quiet the fans, and stretch battery life without slowing the laptop down.

This guide is a full workflow, not a random list of tweaks. You start with prerequisites and a baseline, rate how bad the problem is, then fix in order: software first, firmware and kernel next, hardware last.

Prerequisites and Baseline

Before you change anything, check that your platform can do sane thermal control on Linux. Skip this and you can burn hours tuning userspace tools while the real bug lives in firmware.

Minimum baseline for this guide:

Kernel: 6.8+ recommended, 6.10+ preferred for newer Intel/AMD mobile platforms
Userspace: lm-sensors, powertop, turbostat (Intel), nvtop or radeontop (GPU visibility)
Firmware: latest BIOS/UEFI from vendor
Power profile service: choose one primary CPU policy tool (tlp or auto-cpufreq) to avoid conflicts

Install core diagnostics:

# Debian/Ubuntu
sudo apt update
sudo apt install -y lm-sensors powertop linux-tools-common linux-tools-generic \
  stress-ng htop iotop nvtop

# Arch
sudo pacman -Syu --needed lm_sensors powertop linux-tools stress-ng htop iotop nvtop

Run sensor detection once:

sudo sensors-detect --auto
sensors

Capture a baseline so you can prove improvements later:

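# Refresh every 5 s; stop with Ctrl+C after ~2 minutes (about 24 refreshes).
# thermal_zone*/temp values are in millidegrees C (45000 = 45.0 C).
watch -n 5 'sensors; echo; cat /sys/class/thermal/thermal_zone*/temp 2>/dev/null'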
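`watch` only shows live values, so nothing is saved for a before-and-after comparison. Below is a minimal logging sketch. `max_zone_temp_c` and `log_baseline` are hypothetical helper names. The code assumes the kernel's convention of reporting thermal-zone temperatures in millidegrees Celsius. Source the file into your shell, then redirect the output to a CSV.

```shell
# Hypothetical helpers (not part of any package): source, then call log_baseline.
# Thermal zones report millidegrees Celsius (e.g. 61500 = 61.5 C).

# Print the hottest readable thermal zone in degrees C, or fail if none is readable.
max_zone_temp_c() {
  root=${1:-/sys/class/thermal}
  max=""
  for f in "$root"/thermal_zone*/temp; do
    [ -r "$f" ] || continue
    t=$(cat "$f" 2>/dev/null) || continue
    # Skip empty or non-numeric readings (some zones return errors).
    case $t in ''|*[!0-9-]*) continue ;; esac
    if [ -z "$max" ] || [ "$t" -gt "$max" ]; then max=$t; fi
  done
  [ -n "$max" ] || return 1
  awk -v m="$max" 'BEGIN { printf "%.1f\n", m / 1000 }'
}

# Emit CSV: COUNT samples, INTERVAL seconds apart (defaults: 24 x 5 s).
log_baseline() {
  count=${1:-24}; interval=${2:-5}; zones=${3:-/sys/class/thermal}
  echo "epoch_s,max_zone_temp_c"
  i=0
  while [ "$i" -lt "$count" ]; do
    echo "$(date +%s),$(max_zone_temp_c "$zones" || echo NA)"
    i=$((i + 1))
    if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
  done
}
```

For example, `log_baseline 24 5 > baseline-idle.csv` records about two minutes at idle. Repeat under load with a different file name.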

Record:

Metric	Idle (plugged in)	Load (10 min compile/stress)
CPU package temp (C)		
GPU temp (C)		
Fan speed (RPM)		
CPU package power (W)		
Throttling observed	Yes/No	Yes/No

Skip this and you won’t know whether a tweak actually helped or just shifted the fan noise.

Severity Triage: Mild, Moderate, Severe

Treat thermal work like incident triage. Not every machine needs an undervolt or a repaste.

Severity	Typical symptoms	Likely class of fix	Start here
Mild	Warm chassis, occasional fan ramps, no throttle	Software policy tuning	TLP or auto-cpufreq, governor/power profile checks
Moderate	Frequent loud fans, sustained 90 C+ CPU under routine workloads	Mixed software + firmware	CPU boost limits, kernel params, GPU runtime PM, fan curve tuning
Severe	Immediate thermal throttle, shutdowns, hot at idle	Hardware intervention required	Dust cleanup, paste/pad replacement, fan integrity, heatsink seating

A common mistake is throwing heavy hardware fixes at a mild issue. The reverse is just as common: endless kernel param tweaks on a laptop whose heatsink is full of lint.

Why Linux Laptops Overheat (The Root Causes)

Some laptop models run hotter on Linux than on Windows because OEMs tie their thermal logic to the Windows software stack. On Linux, the generic kernel paths usually work fine. The edge cases appear when ACPI tables, WMI hooks, or fan controllers are only partly exposed.

The most common root causes are:

ACPI and WMI feature gaps: thermal zones show up, but vendor fan and boost controls are partial
Poor C-state residency: the CPU doesn’t spend enough time in deep idle
Pushy turbo on AC power: sustained high clocks beyond what the cooler can shed
dGPU staying active: the NVIDIA or AMD discrete GPU never fully enters its low-power runtime state
Suspend and resume regressions: post-resume power policy drifts into hot behavior

Use turbostat to check that your CPU is idling right. If package C-states stay shallow and package power stays high at idle, your heat problem starts before you launch a single app.

sudo turbostat --quiet --Summary --show Avg_MHz,Busy%,Bzy_MHz,PkgWatt,PkgTmp,POLL,C1%,C6%,C10%,Pkg%pc10 --interval 2

On healthy Intel mobile chips, package C10 residency (Pkg%pc10) should be high when the system is truly idle. If it stays near zero and package power stays up, check background wakeups, the governor, and kernel power params. Column names vary by CPU and kernel version, so trim the list if turbostat rejects one.
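If turbostat isn't available, the kernel's cpuidle sysfs files give a rougher view. Each `stateN` directory has a `name` and a `time` counter, which holds cumulative microseconds spent in that state since boot. The hypothetical `idle_share` helper below turns those counters into percentages for one CPU.

This is a sketch under stated assumptions. It measures per-CPU core idle residency since boot, not package C-state residency. A low share for the deepest state is a hint to dig further with turbostat, not a verdict.

```shell
# Hypothetical helper: share of idle time per cpuidle state for one CPU.
# sysfs "time" is cumulative microseconds spent in that state since boot.
idle_share() {
  dir=${1:-/sys/devices/system/cpu/cpu0/cpuidle}
  for s in "$dir"/state*; do
    [ -r "$s/name" ] && [ -r "$s/time" ] || continue
    printf '%s %s\n' "$(cat "$s/name")" "$(cat "$s/time")"
  done | awk '
    { name[NR] = $1; t[NR] = $2; total += $2 }
    END {
      for (i = 1; i <= NR; i++)
        printf "%-8s %5.1f%%\n", name[i], (total > 0 ? 100 * t[i] / total : 0)
    }'
}
```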

Diagnosing Your Thermal Problem

[Figure: s-tui, a terminal-based view of CPU thermal behavior under load]

Diagnosis is the high-leverage phase. You’re trying to answer three things:

Which part is making most of the heat (CPU, GPU, storage, VRM)?
Is the heat from load, from policy, or from bad heat transfer?
Does it happen all the time, or only after events like suspend and resume?

Start with sensor and process visibility, using sensors, htop, iotop, and nvtop:

# CPU, board, and fan sensors
sensors

# CPU load and runaway processes
htop

# Per-process disk activity (can cause CPU package wakeups)
sudo iotop -oPa

# GPU utilization and memory clocks
nvtop

For thermal zones:

for z in /sys/class/thermal/thermal_zone*; do
  echo "$(basename "$z") type=$(cat "$z/type" 2>/dev/null) temp_mC=$(cat "$z/temp" 2>/dev/null)"
done

Run powertop's calibration once while on battery, since its power estimates come from battery discharge readings. Then use its Device Stats and Tunables tabs to spot devices that won't drop into low-power states.

sudo powertop --calibrate
sudo powertop

Then run a controlled load test with stress-ng:

# 5-minute CPU stress for thermal response curve
stress-ng --cpu "$(nproc)" --cpu-method matrixprod -t 300

Watch clocks and throttle behavior as it runs:

watch -n 2 'grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq | head -n 12; echo; sensors'

If clocks collapse early while package temp spikes fast, the likely causes are aggressive boost settings or weak heat transfer (dust, dried-out paste, a poor fan curve).
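On Intel systems the kernel also keeps thermal throttle event counters in sysfs: `thermal_throttle/package_throttle_count` and `core_throttle_count` under each CPU directory. Counting new package events during the stress run is a direct check on this pattern. This is a minimal sketch. It assumes those files exist on your CPU (many AMD systems don't expose them), and `throttle_delta` is a hypothetical name.

```shell
# Hypothetical helper: count new package throttle events over WAIT seconds.
# Assumes the Intel thermal_throttle sysfs files; reads cpu0's counter.
throttle_delta() {
  root=${1:-/sys/devices/system/cpu}; wait_s=${2:-10}
  f="$root/cpu0/thermal_throttle/package_throttle_count"
  if [ ! -r "$f" ]; then
    echo "package_throttle_count not exposed on this system" >&2
    return 1
  fi
  before=$(cat "$f")
  sleep "$wait_s"
  after=$(cat "$f")
  echo $((after - before))
}
```

While stress-ng runs in one terminal, `throttle_delta /sys/devices/system/cpu 300` in another prints how many package throttle events the five-minute test triggered. A nonzero count while the fans are already at full speed suggests a heat-transfer limit rather than a policy one.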

Example Terminal Output: Before and After

Before tuning (common pattern):

CPU Package: 98.0 C
PkgWatt: 39.5 W
Fan: 5300 RPM
scaling_cur_freq (avg): 1.2 GHz
Thermal throttle: active

After policy tuning and cleanup:

CPU Package: 84.0 C
PkgWatt: 26.0 W
Fan: 3900 RPM
scaling_cur_freq (avg): 2.4 GHz
Thermal throttle: inactive

Lower temp with higher sustained frequency is exactly what you want.
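The `scaling_cur_freq (avg)` line above is an average across all CPUs. One way to produce a comparable number is the hypothetical `avg_cur_freq_ghz` helper below. It assumes the standard cpufreq sysfs layout, where `scaling_cur_freq` is reported in kHz. Treat the result as a rough figure next to turbostat's `Bzy_MHz`.

```shell
# Hypothetical helper: average current frequency across CPUs, in GHz.
# scaling_cur_freq is reported in kHz by the cpufreq sysfs interface.
avg_cur_freq_ghz() {
  root=${1:-/sys/devices/system/cpu}
  cat "$root"/cpu[0-9]*/cpufreq/scaling_cur_freq 2>/dev/null |
    awk '{ sum += $1; n++ }
         END { if (n == 0) exit 1; printf "%.2f\n", sum / n / 1000000 }'
}
```

During a stress-ng run, `while sleep 2; do avg_cur_freq_ghz; done` shows whether clocks hold steady or collapse.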

TLP: The Comprehensive Power Management Suite

TLP is the most complete laptop power tool on Linux. It does more than CPU governors. It also handles PCIe ASPM, runtime power for PCI devices, radios, and a few battery behaviors.

Install and enable:

# Debian/Ubuntu
sudo apt install -y tlp tlp-rdw
sudo systemctl enable --now tlp

# Arch
sudo pacman -S --needed tlp
sudo systemctl enable --now tlp

Key settings in /etc/tlp.conf for thermal control:

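# powersave is the normal dynamic policy under intel_pstate/amd-pstate (active);
# under acpi-cpufreq it pins the minimum frequency, so use schedutil there.
CPU_SCALING_GOVERNOR_ON_AC=powersave
CPU_BOOST_ON_AC=1
# Percent of max performance (intel_pstate). For other drivers, use
# CPU_SCALING_MAX_FREQ_ON_AC=<kHz> with a value listed by tlp-stat -p.
CPU_MAX_PERF_ON_AC=85

CPU_SCALING_GOVERNOR_ON_BAT=powersave
CPU_BOOST_ON_BAT=0
CPU_MAX_PERF_ON_BAT=60

PCIE_ASPM_ON_AC=default
PCIE_ASPM_ON_BAT=powersupersave
RUNTIME_PM_ON_BAT=auto
WIFI_PWR_ON_BAT=on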

Notes:

CPU_BOOST_ON_BAT=0 is often the biggest quick win for heat and battery on mobile CPUs.
Capping the AC maximum at 80-90% usually trims thermal spikes with little real-world slowdown.
PCIe ASPM and runtime PM can cut idle platform power by a few watts, which lowers baseline temperatures.
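TLP's `CPU_SCALING_MAX_FREQ_ON_AC/BAT` settings take a frequency in kHz, not a percentage. What the `powersave` governor does also depends on the cpufreq driver. Under the legacy acpi-cpufreq driver it pins the CPU at its minimum frequency, while under intel_pstate or amd-pstate in active mode it is the normal dynamic setting.

A small helper can report the driver and turn a percentage cap into kHz. This is a sketch: `cpu_cap_khz` is a hypothetical name, and TLP's documentation and `tlp-stat -p` remain the authority on which values your system accepts.

```shell
# Hypothetical helper: print the cpufreq driver and a kHz cap for PCT percent.
# cpuinfo_max_freq is the maximum frequency (kHz) reported by the driver.
cpu_cap_khz() {
  pct=$1; dir=${2:-/sys/devices/system/cpu/cpu0/cpufreq}
  case $pct in ''|*[!0-9]*) echo "usage: cpu_cap_khz PERCENT [DIR]" >&2; return 2 ;; esac
  max=$(cat "$dir/cpuinfo_max_freq" 2>/dev/null) || return 1
  echo "driver: $(cat "$dir/scaling_driver" 2>/dev/null || echo unknown)"
  echo "cap_khz: $((max * pct / 100))"
}
```

If the driver is `acpi-cpufreq`, the printed `cap_khz` is the kind of value `CPU_SCALING_MAX_FREQ_ON_AC` expects. Confirm it against `tlp-stat -p` before committing it.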

Check status and applied policy:

sudo tlp-stat -s
sudo tlp-stat -p
sudo tlp-stat -t

If your laptop ships vendor daemons that also touch power states, check for overlaps before you blame TLP.

`auto-cpufreq`: Intelligent CPU Governor Switching

auto-cpufreq focuses on the CPU. It shines when your workload shifts a lot between light and heavy bursts. It watches temp, battery state, and load, then adjusts governor and boost on the fly.

Install and test in monitor mode first:

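# Project installer (the snap package is no longer maintained)
git clone https://github.com/AdnanHodzic/auto-cpufreq.git
cd auto-cpufreq && sudo ./auto-cpufreq-installer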

# Dry-run monitoring
sudo auto-cpufreq --monitor

Enable daemon mode once behavior looks sensible:

sudo auto-cpufreq --install

Example /etc/auto-cpufreq.conf:

[charger]
governor = powersave
turbo = auto

[battery]
governor = powersave
turbo = never

Important conflict rule:

Don’t run TLP CPU scaling and auto-cpufreq CPU logic at the same time without a clear split.
A common stable setup: auto-cpufreq for CPU policy, TLP for everything else.

If you do combine them, write down which tool owns which knob.
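A quick way to confirm which policy daemons are actually running is sketched below. It assumes the common systemd unit names `tlp.service`, `auto-cpufreq.service`, and `power-profiles-daemon.service`. The last one ships by default on many GNOME-based distros and competes with both tools. Adjust the list if your distro names the units differently; `power_daemons` is a hypothetical name.

```shell
# Hypothetical helper: show the state of each common power-policy unit.
# More than one "active" line usually means two tools fight over CPU policy.
power_daemons() {
  for unit in tlp auto-cpufreq power-profiles-daemon; do
    state=$(systemctl is-active "$unit.service" 2>/dev/null)
    printf '%-24s %s\n' "$unit" "${state:-unknown}"
  done
}
```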

Kernel Parameters That Matter for Thermals

Kernel boot params aren’t magic fixes, but they can stabilize platforms whose defaults are poor.

Params worth testing:

processor.max_cstate=<n>: caps the deepest ACPI C-state. On systems using the intel_idle driver, intel_idle.max_cstate=<n> is usually the relevant limit. Handy for diagnosing flaky deep-idle behavior. Set it too low and idle power goes up.
intel_pstate=passive: runs intel_pstate as a plain frequency driver (intel_cpufreq), so a generic governor such as schedutil chooses frequencies. Can be more predictable on some Intel systems.
amd_pstate=guided: the kernel sets a minimum and maximum performance level, and the CPU's firmware picks the operating point within that range. Often a good balance of responsiveness and heat on newer Ryzen mobile CPUs.

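Test a param temporarily by pressing e at the GRUB menu and appending it to the linux line. Make it persistent only if it helps:

# Example persistent change in /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_pstate=guided"

# Debian/Ubuntu; elsewhere regenerate with grub-mkconfig -o <path to grub.cfg>
sudo update-grub
sudo reboot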

Validate after each change. Compare baseline temp, package power, and fan behavior before you keep a kernel param for good.
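To confirm a param actually took effect after a reboot, check both the kernel command line and the driver's own status file. The sketch below assumes the sysfs locations documented for the two drivers: `/sys/devices/system/cpu/intel_pstate/status` and `/sys/devices/system/cpu/amd_pstate/status`. `pstate_report` is a hypothetical name.

```shell
# Hypothetical helper: report P-state driver mode and thermal-related boot params.
pstate_report() {
  cpu=${1:-/sys/devices/system/cpu}; cmdline=${2:-/proc/cmdline}
  for drv in intel_pstate amd_pstate; do
    if [ -r "$cpu/$drv/status" ]; then
      echo "$drv status: $(cat "$cpu/$drv/status")"
    fi
  done
  echo "scaling_driver: $(cat "$cpu/cpu0/cpufreq/scaling_driver" 2>/dev/null || echo unknown)"
  # Word-split the command line and keep only P-state and C-state params.
  for p in $(cat "$cmdline" 2>/dev/null); do
    case $p in
      intel_pstate=*|amd_pstate=*|processor.max_cstate=*|intel_idle.max_cstate=*)
        echo "cmdline: $p" ;;
    esac
  done
}
```

Suppose the command line shows `amd_pstate=guided` but the status file still says `active`. In that case the param didn't take effect the way you intended.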

Undervolting Intel and AMD CPUs

Undervolting cuts power draw at a given clock by trimming voltage headroom. It can be very effective, but stability testing is mandatory.

Intel path

Use intel-undervolt where firmware allows it.

sudo apt install -y intel-undervolt
sudo nano /etc/intel-undervolt.conf

Typical starting points (conservative):

undervolt 0 'CPU' -70
undervolt 1 'GPU' -40
undervolt 2 'CACHE' -70
undervolt 3 'SYSTEM_AGENT' 0
undervolt 4 'ANALOG_IO' 0

Apply and test:

sudo systemctl enable --now intel-undervolt
# Confirm the offsets actually applied (locked firmware may ignore them)
sudo intel-undervolt read
stress-ng --cpu "$(nproc)" --cpu-method all -t 600

AMD path

For Ryzen mobile APUs, power-limit tuning with ryzenadj is often more practical than the classic undervolt sliders.

# Example conservative cap values (hardware-specific)
sudo ryzenadj --stapm-limit=18000 --fast-limit=22000 --slow-limit=20000

These limits are in milliwatts (18000 = 18 W). Lowering them caps sustained package power, which in turn lowers temperature.
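Because ryzenadj takes milliwatts, it's easy to drop a zero. A tiny helper that builds the command from watts and prints it for review, rather than running it, avoids that. `ryzenadj_cmd` is a hypothetical name. The warning fires when the fast limit sits below the slow limit, or the slow limit below STAPM. That ordering is an assumption about how these limits are usually set, not a rule ryzenadj enforces.

```shell
# Hypothetical helper: build (not run) a ryzenadj command from limits in watts.
# Usage: ryzenadj_cmd STAPM_W FAST_W SLOW_W
ryzenadj_cmd() {
  stapm_w=$1; fast_w=$2; slow_w=$3
  for v in "$stapm_w" "$fast_w" "$slow_w"; do
    case $v in ''|*[!0-9]*) echo "limits must be whole watts" >&2; return 2 ;; esac
  done
  # Assumed usual ordering: fast >= slow >= stapm. Warn only; don't block.
  if [ "$fast_w" -lt "$slow_w" ] || [ "$slow_w" -lt "$stapm_w" ]; then
    echo "warning: expected fast >= slow >= stapm" >&2
  fi
  echo "sudo ryzenadj --stapm-limit=$((stapm_w * 1000))" \
       "--fast-limit=$((fast_w * 1000)) --slow-limit=$((slow_w * 1000))"
}
```

`ryzenadj_cmd 18 22 20` prints the same command as the example above.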

BIOS lock caveat

Many laptops lock undervolting in firmware as a mitigation for Plundervolt (CVE-2019-11157). Check the kernel log:

sudo dmesg | grep -Ei 'undervolt|voltage|plundervolt'

If the firmware blocks it, don’t chase software-only workarounds that weaken security.

GPU Thermal Management (NVIDIA and AMD)

On hybrid laptops, the discrete GPU is often the hidden heat source. You can have low CPU load and still run hot because the dGPU is stuck in a high-power state.

NVIDIA

Set power limits, where supported, with nvidia-smi:

nvidia-smi --query-gpu=name,power.draw,power.limit,temperature.gpu --format=csv
sudo nvidia-smi -pl 55

For hybrid graphics, use profile switching tools like envycontrol or optimus-manager. Force integrated mode when you don’t need the dGPU, then confirm the runtime state after switching.
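One way to confirm the runtime state is to read each display controller's `power/runtime_status` in sysfs. `suspended` means the GPU is actually powered down, while `active` means something is holding it awake. The sketch assumes PCI display devices have a class code starting with `0x03` (VGA and 3D controllers). `gpu_runtime_status` is a hypothetical name. The same check applies to NVIDIA and AMD dGPUs.

```shell
# Hypothetical helper: runtime PM state of every PCI display controller.
# Class 0x03xxxx covers VGA (0x0300) and 3D (0x0302) controllers.
gpu_runtime_status() {
  root=${1:-/sys/bus/pci/devices}
  for d in "$root"/*; do
    [ -r "$d/class" ] || continue
    case $(cat "$d/class") in
      0x03*)
        printf '%s %s\n' "${d##*/}" \
          "$(cat "$d/power/runtime_status" 2>/dev/null || echo unknown)"
        ;;
    esac
  done
}
```

Close GPU-using apps and give it a few seconds before reading, since runtime suspend is not instantaneous.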

AMD dGPU

Inspect and set power performance mode:

# card0 may be the iGPU on hybrid systems; use the cardN that is your dGPU
cat /sys/class/drm/card0/device/power_dpm_force_performance_level
echo auto | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level

Use low or auto based on workload and how snappy you need things.
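On hybrid laptops `card0` isn't always the discrete GPU. A hypothetical `amdgpu_levels` helper can list every card that exposes the amdgpu performance-level file, along with its PCI vendor ID (0x1002 is AMD), to help pick the right path. On all-AMD laptops both the APU and the dGPU may appear, so match the card against the dGPU's PCI address. The sketch assumes the `/sys/class/drm/cardN/device/` layout used above.

```shell
# Hypothetical helper: list DRM cards that expose an amdgpu performance level.
amdgpu_levels() {
  root=${1:-/sys/class/drm}
  for c in "$root"/card[0-9]*; do
    case ${c##*/} in *-*) continue ;; esac   # skip connectors like card0-eDP-1
    f="$c/device/power_dpm_force_performance_level"
    [ -r "$f" ] || continue
    printf '%s vendor=%s level=%s\n' "${c##*/}" \
      "$(cat "$c/device/vendor" 2>/dev/null || echo unknown)" "$(cat "$f")"
  done
}
```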

Hybrid mode sanity check

Integrated only for office and dev battery work
Hybrid for the odd burst of acceleration
Discrete only for sustained gaming or rendering with cooling headroom

If your dGPU temp stays high at idle, check for compositor or driver issues, or a browser or media pipeline that pins it active by accident.

Vendor-Specific Tooling: ThinkPad, MSI, ASUS

Generic Linux power tools aren’t always enough. Vendor controllers can unlock fan curve and platform profile knobs that the generic ACPI paths miss.

ThinkPad: `thinkfan`

thinkfan lets you set a custom fan policy from thermal sensors. It’s useful on systems where the default fan hysteresis is too lazy.

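sudo apt install -y thinkfan
# thinkfan needs manual fan control enabled in thinkpad_acpi (applies after reboot)
echo "options thinkpad_acpi fan_control=1" | sudo tee /etc/modprobe.d/thinkfan.conf
sudo systemctl enable --now thinkfan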

Tune /etc/thinkfan.conf carefully and test in small steps so you don’t grind the fan bearings.
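While tuning, it helps to watch what the fan is actually doing. thinkpad_acpi reports fan speed and level in `/proc/acpi/ibm/fan` as `key: value` lines. The hypothetical `tp_fan_status` helper below pulls out just those two fields so you can log them alongside temperatures. It assumes that file format.

```shell
# Hypothetical helper: one-line fan summary from thinkpad_acpi's proc file.
tp_fan_status() {
  f=${1:-/proc/acpi/ibm/fan}
  [ -r "$f" ] || { echo "$f not readable (is thinkpad_acpi loaded?)" >&2; return 1; }
  awk -F':[ \t]*' '
    $1 == "speed" { speed = $2 }
    $1 == "level" { level = $2 }
    END { printf "speed=%s level=%s\n", speed, level }' "$f"
}
```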

MSI: `isw`

MSI gaming laptops often expose extra fan and performance knobs through vendor interfaces. isw can manage model-specific features that stock Linux profiles can’t reach.

ASUS: `asus-wmi` and ecosystem tools

The asus-wmi kernel module, together with ASUS userspace tools such as asusctl, can expose platform profiles and fan controls on ROG and Zephyrus devices.

Verify module presence:

lsmod | grep asus_wmi

If the missing feature is vendor-specific, no amount of generic tlp tuning can replace it.
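On some models, asus-wmi (or another vendor driver such as thinkpad_acpi) registers with the kernel's generic ACPI platform-profile interface. There the active profile is visible and switchable through sysfs. Below is a hypothetical `platform_profile_report` helper. It assumes the standard `/sys/firmware/acpi/platform_profile` and `platform_profile_choices` files.

```shell
# Hypothetical helper: show the current ACPI platform profile and valid choices.
platform_profile_report() {
  dir=${1:-/sys/firmware/acpi}
  if [ ! -r "$dir/platform_profile" ]; then
    echo "platform_profile not exposed on this system" >&2
    return 1
  fi
  echo "current: $(cat "$dir/platform_profile")"
  echo "choices: $(cat "$dir/platform_profile_choices" 2>/dev/null || echo unknown)"
}
```

To switch, write one of the listed choices as root. For example, if `quiet` is listed: `echo quiet | sudo tee /sys/firmware/acpi/platform_profile`.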

Sleep/Resume Thermal Issues

Some laptops act normal after a cold boot but overheat after suspend and resume. That’s a distinct class of bug. Debug it on its own.

Common signs:

Fans spin hard after wake with no visible workload
Idle package power doubles after resume
The dGPU won’t go back to its runtime-suspended state

Check the current boot's journal around resume. Suspend and resume don't start a new boot, so use -b rather than -b -1:

journalctl -b | grep -Ei 'suspend|resume|acpi|thermal|nvidia|amdgpu|iwlwifi'

Practical fixes:

Update kernel and firmware first. Many resume regressions are tied to a specific version.
Reapply power policy on resume with systemd sleep scripts.
Restart services after wake if they hold a high-power state. If certain units restart slowly and stall the resume path, profiling systemd boot timing can pin down which one blocks.
Test s2idle vs deep sleep modes if firmware supports both.
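The available suspend modes are listed in `/sys/power/mem_sleep`, with the active one in brackets (for example `s2idle [deep]`). A hypothetical `mem_sleep_mode` helper extracts the active mode. Writing another listed mode to the same file as root switches it until the next boot. The `mem_sleep_default=` kernel parameter makes the choice persistent.

```shell
# Hypothetical helper: print the active suspend mode (the bracketed entry).
mem_sleep_mode() {
  f=${1:-/sys/power/mem_sleep}
  [ -r "$f" ] || return 1
  # Content looks like "s2idle [deep]"; print only the bracketed word.
  tr ' ' '\n' < "$f" | sed -n 's/^\[\(.*\)\]$/\1/p'
}
```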

Example sleep hook skeleton:

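#!/bin/sh
# /lib/systemd/system-sleep/99-thermal-resume
# systemd-sleep passes $1 = pre|post and $2 = the sleep type.
PATH=/usr/sbin:/usr/bin:/sbin:/bin
case "$1/$2" in
  post/*)
    # Reapply TLP policy after every resume
    tlp start >/dev/null 2>&1
    ;;
esac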

Make executable:

sudo chmod +x /lib/systemd/system-sleep/99-thermal-resume

Physical and BIOS-Level Fixes

When software is tuned and temps are still too high, you’re probably stuck on heat transfer or airflow.

High-impact physical actions:

Heatsink and fan cleaning. Dust raises sustained temps a lot.
Thermal paste swap. Often worth doing on laptops older than about two years, especially if temps have crept up over time.
Fan integrity check. Worn bearings cut airflow even if RPM looks fine.
Intake and exhaust path check. Soft surfaces choke intake.

BIOS and UEFI actions:

Update firmware first.
Use the balanced or cool platform profile if it exists.
Review fan curve controls when they’re exposed.
Reset any extreme performance preset you set by accident.

A laptop stand that opens up airflow under the chassis can drop steady-state surface and internal temps for almost nothing.

Troubleshooting Matrix: Symptoms to Root Cause and Fix

Symptom	Most likely root cause	Verify with	First fix
Hot at idle (70 C+)	Shallow C-states, background wakeups	`turbostat`, `powertop`	Governor/pstate tuning, disable noisy wake sources
Fans loud after resume	Resume power-policy drift	`journalctl`, `turbostat` post-wake	Resume hook to reapply TLP/CPU policy
CPU throttles in short bursts	Boost overshoot vs cooling limit	`stress-ng` + live sensors	Reduce boost/freq cap, improve fan curve
GPU hot while browsing	dGPU pinned active in hybrid setup	`nvtop`, `nvidia-smi`, DRM runtime state	Integrated mode, fix app forcing dGPU
Still throttling after tuning	Poor thermal transfer	Load test plus physical inspection	Clean heatsink, repaste, check fan health

This table helps you avoid random tweaking. Always move from a measured symptom to a targeted fix.

Special Cases: ARM Laptops and Data Center Hosts

Thermal advice for x86 laptops doesn’t map cleanly to every system.

ARM laptops (including Asahi Linux on Apple Silicon)

Asahi uses a different thermal and power stack from regular ACPI x86 laptops. The tools and interfaces differ. x86 advice like intel_pstate or ryzenadj doesn’t apply.

Focus on:

Asahi-specific kernel and userspace updates
Performance profiles the platform actually supports
Workload-level tweaks: browser and video codec paths, GPU acceleration settings

Follow the Asahi project docs before any generic Linux laptop tweaks.

Data center and rack systems

Server thermals are mostly airflow and power budget problems, not laptop governor problems. Use BMC and IPMI telemetry, rack airflow planning, and platform-level CPU power caps.

Use laptop-style tuning only on actual mobile platforms. On servers, tune fan zones, inlet temperature, and workload placement.

A Safe Implementation Order

If you want a practical plan, use this order:

Update BIOS or UEFI, and the kernel.
Capture baseline metrics: idle, load, post-resume.
Pick one CPU policy manager (tlp or auto-cpufreq) and tune it carefully.
Fix the dGPU runtime power state if you’ve got hybrid graphics.
Test kernel param changes one at a time.
Apply undervolt or power-limit tuning only after baseline policy is stable.
If severe symptoms remain, clean and repaste.

Re-measure after each step. Thermal work is engineering, not folklore.

Conclusion

Fixing Linux laptop heat is less about a magic command and more about order. You diagnose first, rate the severity, tune policy with care, then move up to firmware and hardware only when the data says so. The fastest path to a cooler machine is steady measurement: package power, temps, fan behavior, and throttle status before and after each change.

For most users, three actions do the heavy lifting. Pick a sane CPU policy. Make sure the discrete GPU can truly sleep. Look after the physical cooling. Add vendor tools when your platform exposes extra knobs. Treat suspend and resume as their own branch when the symptoms point there. With that method, Linux laptops can run cool, quiet, and stable even under heavy load.