Linux Thermal Management: Fix Laptop Overheating

Laptop overheating on Linux is rarely one single bug. In practice, it is a stack problem: firmware behavior, kernel power-state policy, CPU governor choices, discrete GPU runtime power, and plain old dust in the heatsink all interact. The good news is that Linux gives you excellent visibility into every layer of that stack. If you approach thermal issues in a structured order, you can usually cut sustained temperatures by 8 to 20 C, reduce fan noise, and improve battery runtime without making your laptop feel slow.

This guide is written as a full workflow, not a random list of tweaks. You will start with prerequisites and baseline measurements, classify the severity of the problem, and then apply fixes in the right order: software first, firmware/kernel next, and hardware maintenance last when needed.

Prerequisites and Baseline

Before changing anything, make sure your platform is capable of sane thermal control under Linux. If prerequisites are not met, you can spend hours tuning userspace tools while the real issue is an unsupported firmware path.

Minimum baseline for this guide:

Kernel: 6.8+ recommended, 6.10+ preferred for newer Intel/AMD mobile platforms
Userspace: lm-sensors, powertop, turbostat (Intel), nvtop or radeontop (GPU visibility)
Firmware: latest BIOS/UEFI from vendor
Power profile service: choose one primary CPU policy tool (tlp or auto-cpufreq) to avoid conflicts

Install core diagnostics:

# Debian/Ubuntu (linux-tools-* are Ubuntu package names; on Debian, turbostat ships in linux-cpupower)
sudo apt update
sudo apt install -y lm-sensors powertop linux-tools-common linux-tools-generic \
  stress-ng htop iotop nvtop

# Arch
sudo pacman -Syu --needed lm_sensors powertop linux-tools stress-ng htop iotop nvtop

Run sensor detection once:

sudo sensors-detect --auto
sensors

Capture a baseline so you can prove improvements later:

# 5-second interval, 24 samples (~2 minutes), saved to baseline.log
for i in $(seq 24); do
  date; sensors; cat /sys/class/thermal/thermal_zone*/temp 2>/dev/null; echo; sleep 5
done | tee baseline.log

Record:

Metric	Idle (plugged in)	Load (10 min compile/stress)
CPU package temp
GPU temp
Fan RPM
CPU package power (W)
Throttling observed	Yes/No	Yes/No

If you skip this, you will not know whether a tweak helped or just changed fan acoustics.
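If you prefer a baseline that is easy to diff or plot later, a CSV log of the thermal zones may be more convenient than raw terminal output. The following is a minimal sketch, not a finished tool: the function names `mc_to_c` and `log_zones` and the file name `thermal-baseline.csv` are made up for illustration, and it assumes your kernel exposes zones under /sys/class/thermal.

```shell
#!/bin/sh
# Convert a millidegree Celsius reading (the unit used by thermal_zone*/temp) to degrees.
mc_to_c() {
  awk -v mc="$1" 'BEGIN { printf "%.1f", mc / 1000 }'
}

# Print one CSV row per readable thermal zone: epoch_seconds,zone,type,temp_c
log_zones() {
  for z in /sys/class/thermal/thermal_zone*; do
    [ -r "$z/temp" ] || continue
    printf '%s,%s,%s,%s\n' "$(date +%s)" "$(basename "$z")" \
      "$(cat "$z/type" 2>/dev/null)" "$(mc_to_c "$(cat "$z/temp" 2>/dev/null)")"
  done
}

# Take a single sample now. For a 2-minute baseline (hypothetical file name):
#   for i in $(seq 24); do log_zones; sleep 5; done >> thermal-baseline.csv
log_zones
```

Run it once at idle and once under load, and keep both files so later changes can be compared against the same columns.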

Severity Triage: Mild, Moderate, Severe

Treat thermal work like incident triage. Not every machine needs undervolting or repasting.

Severity	Typical symptoms	Likely class of fix	Start here
Mild	Warm chassis, occasional fan ramps, no throttle	Software policy tuning	TLP or auto-cpufreq, governor/power profile checks
Moderate	Frequent loud fans, sustained 90 C+ CPU under routine workloads	Mixed software + firmware	CPU boost limits, kernel params, GPU runtime PM, fan curve tuning
Severe	Immediate thermal throttle, shutdowns, hot at idle	Hardware intervention required	Dust cleanup, paste/pad replacement, fan integrity, heatsink seating

A common mistake is applying severe-level hardware changes to a mild-level issue, or the opposite: endlessly tweaking kernel parameters on a laptop with a nearly blocked heatsink.
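The triage table can be expressed as a small decision function, which helps keep the classification honest when you re-measure. This is only a sketch of the table's logic: `classify_severity` is a hypothetical helper, the 90 C load threshold comes from the table above, and the 70 C idle threshold is borrowed from the troubleshooting matrix later in this guide. Adjust both to your CPU's rated limits.

```shell
# classify_severity IDLE_C LOAD_C THROTTLES_IMMEDIATELY(yes|no)
# Temperatures are whole degrees Celsius taken from your baseline table.
classify_severity() {
  idle_c=$1; load_c=$2; immediate=$3
  if [ "$idle_c" -ge 70 ] || [ "$immediate" = yes ]; then
    echo severe      # hot at idle or instant throttling: suspect hardware
  elif [ "$load_c" -ge 90 ]; then
    echo moderate    # sustained 90 C+ under load: software plus firmware tuning
  else
    echo mild        # policy tuning is usually enough
  fi
}

# Example: 48 C idle, 93 C under load, no immediate throttling
classify_severity 48 93 no
```

Shutdowns are not modeled here; treat any thermal shutdown as severe regardless of the numbers.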

Why Linux Laptops Overheat (The Root Causes)

Some laptop models run hotter under Linux than under Windows because OEMs often ship vendor-specific thermal logic that is better integrated with their own Windows control stack. On Linux, generic kernel interfaces usually work well, but edge cases appear when ACPI tables, WMI hooks, or fan controllers are only partially exposed.

The most common root causes are:

ACPI/WMI feature gaps: thermal zones exist, but vendor fan and boost controls are incomplete
Poor C-state residency: CPU fails to spend enough time in deep idle states
Aggressive turbo behavior on AC power: sustained high clocks beyond the cooling system’s realistic envelope
dGPU staying active: NVIDIA/AMD discrete GPU never fully enters low-power runtime state
Suspend/resume regressions: post-resume power policy drifts into high-heat behavior

Use turbostat to confirm whether your CPU is idling correctly. If package C-states are shallow and package power remains elevated at idle, your thermal issue begins before you launch a single app.

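sudo turbostat --Summary --quiet --show Avg_MHz,Busy%,Bzy_MHz,PkgTmp,PkgWatt,C1%,C6%,C10%,Pk%pc10 -i 2

On healthy Intel mobile systems, C10% (core idle-state residency) and Pk%pc10 (package C10 residency) should be significant during true idle. Idle-state names vary by CPU generation and vendor, so adjust the column list to match what your turbostat header prints. If deep-state residency is near zero and package power stays high, investigate background wakeups, governor policy, and kernel power parameters.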
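If turbostat is unavailable, the kernel's cpuidle statistics in sysfs give a rough view of the same question. The sketch below prints each idle state's share of CPU 0's recorded idle time. Note the assumptions: `share_pct` is a helper name made up for this example, the counters are cumulative since boot rather than a live sample, and they describe core-level idle states requested by the kernel, not package C-state residency.

```shell
# share_pct PART TOTAL -> percentage with one decimal place (0.0 if TOTAL is zero)
share_pct() {
  awk -v p="$1" -v t="$2" 'BEGIN { if (t > 0) printf "%.1f", 100 * p / t; else printf "0.0" }'
}

cpu0=/sys/devices/system/cpu/cpu0/cpuidle

# First pass: total idle time (microseconds) across all states
total=0
for s in "$cpu0"/state*; do
  if [ -r "$s/time" ]; then total=$((total + $(cat "$s/time"))); fi
done

# Second pass: each state's share of that total
for s in "$cpu0"/state*; do
  [ -r "$s/time" ] || continue
  printf '%-10s %s%%\n' "$(cat "$s/name")" "$(share_pct "$(cat "$s/time")" "$total")"
done
```

A machine whose idle time sits almost entirely in the shallowest states is a candidate for the wakeup and policy investigation described above.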

Diagnosing Your Thermal Problem

Diagnosis is the highest-leverage phase. You are trying to answer three questions:

Which component is generating most heat (CPU, GPU, storage, VRM)?
Is the heat caused by load, policy, or hardware transfer inefficiency?
Does the issue appear all the time, or only after events like suspend/resume?

Start with sensor and process visibility. Useful tools here include htop and iotop:

# CPU, board, and fan sensors
sensors

# CPU load and runaway processes
htop

# Per-process disk activity (can cause CPU package wakeups)
sudo iotop -oPa

# GPU utilization and memory clocks
nvtop

For thermal zones:

for z in /sys/class/thermal/thermal_zone*; do
  echo "$(basename "$z") type=$(cat "$z/type" 2>/dev/null) temp_mC=$(cat "$z/temp" 2>/dev/null)"
done

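Run powertop calibration once on battery power, since its power estimates rely on battery discharge readings. Calibration takes several minutes and cycles the display backlight. Afterwards, review the interactive view on both AC and battery; it helps identify devices that refuse low-power states.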

sudo powertop --calibrate
sudo powertop

Then perform a controlled load test with stress-ng:

# 5-minute CPU stress for thermal response curve
stress-ng --cpu "$(nproc)" --cpu-method matrixprod -t 300

Watch clock and throttle behavior during the run:

watch -n 2 'grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq | head -n 12; echo; sensors'

If clocks collapse early while package temperature spikes rapidly, you likely have either over-aggressive boost settings or insufficient cooling transfer (dust/paste/fan curve).
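On Intel systems, the kernel also keeps thermal-throttle event counters in sysfs, which can confirm throttling more directly than watching clocks. The sketch below sums the per-CPU counters before and after a test. It assumes the Intel-specific thermal_throttle directory exists (it reports 0 otherwise, for example on AMD), and `throttle_events` and `counter_delta` are illustrative names.

```shell
# Sum Intel core thermal-throttle event counters across CPUs (0 if the files are absent).
throttle_events() {
  total=0
  for f in /sys/devices/system/cpu/cpu[0-9]*/thermal_throttle/core_throttle_count; do
    if [ -r "$f" ]; then total=$((total + $(cat "$f"))); fi
  done
  echo "$total"
}

# counter_delta BEFORE AFTER -> number of events that occurred in between
counter_delta() { echo $(( $2 - $1 )); }

before=$(throttle_events)
# ... run the stress-ng load test here ...
after=$(throttle_events)
echo "core throttle events during test: $(counter_delta "$before" "$after")"
```

A nonzero delta during a short load test supports the boost-overshoot or cooling-transfer explanations above.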

Example Terminal Output: Before and After

Before tuning (common pattern):

CPU Package: 98.0 C
PkgWatt: 39.5 W
Fan: 5300 RPM
scaling_cur_freq (avg): 1.2 GHz
Thermal throttle: active

After policy tuning and cleanup:

CPU Package: 84.0 C
PkgWatt: 26.0 W
Fan: 3900 RPM
scaling_cur_freq (avg): 2.4 GHz
Thermal throttle: inactive

Lower temperature with higher sustained frequency is exactly what you want.
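To make the comparison explicit, the arithmetic on the example figures above can be scripted. This is a small sketch using awk for the floating-point math; `delta` and `pct_change` are hypothetical helper names, and the inputs are simply the illustrative before and after values from this section.

```shell
# delta OLD NEW -> signed difference with one decimal place
delta() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.1f", b - a }'; }

# pct_change OLD NEW -> signed percentage change with one decimal place
pct_change() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.1f", (b - a) / a * 100 }'; }

echo "Package temp:  $(delta 98.0 84.0) C"
echo "Package power: $(pct_change 39.5 26.0) %"
echo "Avg frequency: $(pct_change 1.2 2.4) %"
```

For the example figures this works out to a 14 C drop, roughly a third less package power, and a doubling of sustained frequency.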

TLP: The Comprehensive Power Management Suite

TLP remains the most complete laptop power-management toolkit for Linux because it configures more than CPU governors. It also handles PCIe ASPM, runtime power management for PCI devices, radio devices, and several battery-related behaviors.

Install and enable:

# Debian/Ubuntu
sudo apt install -y tlp tlp-rdw
sudo systemctl enable --now tlp

# Arch
sudo pacman -S --needed tlp
sudo systemctl enable --now tlp

Key settings in /etc/tlp.conf for thermal control:

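# CPU_MAX_PERF_* is a percentage and relies on a P-state driver such as intel_pstate.
# With other drivers, use CPU_SCALING_MAX_FREQ_* instead, which takes an absolute
# frequency in kHz rather than a percentage.
CPU_SCALING_GOVERNOR_ON_AC=powersave
CPU_BOOST_ON_AC=1
CPU_MAX_PERF_ON_AC=85

CPU_SCALING_GOVERNOR_ON_BAT=powersave
CPU_BOOST_ON_BAT=0
CPU_MAX_PERF_ON_BAT=60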

PCIE_ASPM_ON_AC=default
PCIE_ASPM_ON_BAT=powersupersave
RUNTIME_PM_ON_BAT=auto
WIFI_PWR_ON_BAT=on

Notes:

CPU_BOOST_ON_BAT=0 usually gives the biggest immediate heat and battery improvement on mobile CPUs.
Capping AC maximum performance or frequency at 80-90% of the hardware limit often removes thermal spikes with small real-world performance impact.
PCIe ASPM and runtime PM can reduce idle platform power by several watts, which directly lowers baseline temperatures.
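If you need a cap expressed as an absolute frequency in kHz (the unit the kernel's cpufreq sysfs files use), you can derive it from the hardware maximum. A minimal sketch, assuming CPU 0 is representative of the other cores; `pct_of_khz` is a helper name invented for this example.

```shell
# pct_of_khz MAX_KHZ PERCENT -> capped frequency in kHz (integer arithmetic)
pct_of_khz() { echo $(( $1 * $2 / 100 )); }

max_file=/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
if [ -r "$max_file" ]; then
  max_khz=$(cat "$max_file")
  echo "hardware max: ${max_khz} kHz"
  echo "85% cap:      $(pct_of_khz "$max_khz" 85) kHz"
  echo "60% cap:      $(pct_of_khz "$max_khz" 60) kHz"
fi
```

On hybrid CPUs with different core types, the maximum differs per core, so check more than cpu0 before settling on a value.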

Check status and applied policy:

sudo tlp-stat -s
sudo tlp-stat -p
sudo tlp-stat -t

If your laptop includes vendor daemons that also manipulate power states, review for overlaps before blaming TLP.

`auto-cpufreq`: Intelligent CPU Governor Switching

auto-cpufreq focuses on CPU behavior and can be excellent when your workload shifts frequently between light and heavy bursts. It monitors temperature, battery status, and load, then adjusts governor and boost policy dynamically.

Install and test in monitor mode first:

# Snap path
sudo snap install auto-cpufreq

# Dry-run monitoring
sudo auto-cpufreq --monitor

Enable daemon mode once behavior looks sensible:

sudo auto-cpufreq --install

Example /etc/auto-cpufreq.conf:

[charger]
governor = powersave
turbo = auto

[battery]
governor = powersave
turbo = never

Important conflict rule:

Do not run TLP CPU scaling logic and auto-cpufreq CPU logic simultaneously without a clear split.
Common stable setup: auto-cpufreq for CPU policy, TLP for non-CPU subsystems.

If you combine them, document exactly which tool owns which parameters.
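A quick way to check for overlap is to ask systemd which policy managers are currently active. The sketch below counts them. The unit names are assumptions based on common packaging, power-profiles-daemon is included because many desktop distributions enable it by default (it is not otherwise covered in this guide), and `count_active` is an illustrative helper.

```shell
# count_active reads one state per line on stdin and prints how many are exactly "active".
count_active() { grep -cx 'active' || true; }

managers="tlp auto-cpufreq power-profiles-daemon"
if command -v systemctl >/dev/null 2>&1; then
  n=$(for m in $managers; do systemctl is-active "$m" 2>/dev/null; done | count_active)
  echo "active CPU policy managers: $n"
  if [ "$n" -gt 1 ]; then
    echo "warning: more than one manager may be writing CPU policy"
  fi
fi
```

More than one active manager is not automatically a problem, but it is the cue to write down which tool owns which parameters.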

Kernel Parameters That Matter for Thermals

Kernel boot parameters are not magic fixes, but they can stabilize problematic platforms when defaults are suboptimal.

Relevant parameters to evaluate:

processor.max_cstate=<n>: limits deepest C-state; useful for diagnosing unstable deep-idle behavior, but can increase idle power if set too low (on Intel systems using the intel_idle driver, the equivalent is intel_idle.max_cstate=<n>)
intel_pstate=passive: hands frequency selection to schedutil/governor logic instead of full active intel_pstate; can improve predictability on some Intel systems
amd_pstate=guided: gives kernel guided control for AMD P-state behavior; often improves balance between responsiveness and heat on newer Ryzen mobile CPUs

Temporary test via GRUB edit at boot, then persist only if beneficial:

# Example persistent change in /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_pstate=guided"

# Regenerate the GRUB config (Debian/Ubuntu; on Arch: sudo grub-mkconfig -o /boot/grub/grub.cfg)
sudo update-grub
sudo reboot

Validation is mandatory after each change. Compare baseline temperature, package power, and fan behavior before keeping any kernel parameter long-term.
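Before comparing temperatures, it is worth confirming that the parameter actually took effect after the reboot. The following sketch checks the running kernel's command line and prints the active frequency-scaling driver and P-state mode where those sysfs files exist. `cmdline_has` is a hypothetical helper, and the status files are only present on kernels and CPUs that use the corresponding driver.

```shell
# cmdline_has CMDLINE PARAM -> succeeds if PARAM or PARAM=value appears as a word in CMDLINE
cmdline_has() {
  case " $1 " in
    *" $2 "*|*" $2="*) return 0 ;;
    *) return 1 ;;
  esac
}

cmdline=$(cat /proc/cmdline 2>/dev/null)
for p in amd_pstate intel_pstate processor.max_cstate; do
  if cmdline_has "$cmdline" "$p"; then echo "$p: set on kernel command line"; fi
done

# Report which driver and mode are really in use
for f in /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver \
         /sys/devices/system/cpu/intel_pstate/status \
         /sys/devices/system/cpu/amd_pstate/status; do
  if [ -r "$f" ]; then echo "$f: $(cat "$f")"; fi
done
```

If the command line shows the parameter but the driver status does not change, the platform firmware or kernel version may not support that mode.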

Undervolting Intel and AMD CPUs

Undervolting lowers power draw at a given clock by reducing voltage headroom. It can be highly effective, but stability testing is non-negotiable.

Intel path

Use intel-undervolt where firmware allows it.

# Not packaged by every distribution; build from the project's source if apt cannot find it
sudo apt install -y intel-undervolt
sudo nano /etc/intel-undervolt.conf

Typical starting points (conservative; offsets in mV):

undervolt 0 'CPU' -70
undervolt 1 'GPU' -40
undervolt 2 'CACHE' -70
undervolt 3 'SYSTEM_AGENT' 0
undervolt 4 'ANALOG_IO' 0

Apply and test:

# Apply for this boot only, stress-test, and enable at boot only once stable
sudo intel-undervolt apply
stress-ng --cpu "$(nproc)" --cpu-method all -t 600
sudo systemctl enable intel-undervolt

AMD path

For Ryzen mobile APUs, power-limit tuning with ryzenadj is often more practical than classic undervolt sliders.

# Example conservative cap values (hardware-specific)
sudo ryzenadj --stapm-limit=18000 --fast-limit=22000 --slow-limit=20000

These values (mW) reduce sustained package power and therefore temperature.
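Because ryzenadj takes milliwatts while most people think about laptop power limits in watts, a small conversion helper can reduce typing mistakes. This is a sketch only: `ryzenadj_args` is a made-up function, whole-watt inputs are assumed, and the script prints the command for review rather than running it.

```shell
# ryzenadj_args STAPM_W FAST_W SLOW_W -> flag string with whole-watt limits converted to mW
ryzenadj_args() {
  printf '%s\n' "--stapm-limit=$(( $1 * 1000 )) --fast-limit=$(( $2 * 1000 )) --slow-limit=$(( $3 * 1000 ))"
}

# Print the resulting command instead of executing it
echo "sudo ryzenadj $(ryzenadj_args 18 22 20)"
```

Settings applied this way are generally not persistent across reboots, so a bad value can be undone by restarting.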

BIOS lock caveat

Many laptops block undervolting because of Plundervolt-era mitigations. Check logs:

sudo dmesg | grep -Ei 'undervolt|voltage|plundervolt'

If blocked at firmware level, do not chase software-only workarounds that compromise security.

GPU Thermal Management (NVIDIA and AMD)

On hybrid laptops, the discrete GPU is frequently the hidden thermal source. You can have low CPU load and still run hot because the dGPU is stuck in a high-power state.

NVIDIA

Set power limits where supported with nvidia-smi:

nvidia-smi --query-gpu=name,power.draw,power.limit,temperature.gpu --format=csv
sudo nvidia-smi -pl 55

For hybrid graphics, use profile switching tools (envycontrol, optimus-manager) to force integrated mode when the dGPU is unnecessary. Confirm runtime state after switching.

AMD dGPU

Inspect and set the power performance level (the card index varies; on hybrid laptops card0 may be the integrated GPU):

cat /sys/class/drm/card0/device/power_dpm_force_performance_level
echo auto | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level

Use low or auto depending on workload and responsiveness needs.

Hybrid mode sanity check

Integrated-only for office/dev battery workflows
Hybrid for occasional acceleration
Discrete-only for sustained gaming/rendering with cooling headroom

If your dGPU temperature remains elevated at idle, check for compositor, browser, or media pipeline accidentally pinning it active.
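One way to confirm whether a discrete GPU is really asleep is to read the PCI runtime power status the kernel exposes in sysfs. The sketch below lists every PCI display controller with its current state. `is_display_class` is an illustrative helper, and the script assumes the standard PCI class code prefix 0x03 for display devices.

```shell
# is_display_class CLASS -> succeeds for PCI display controllers (class code 0x03xxxx)
is_display_class() {
  case "$1" in
    0x03*) return 0 ;;
    *) return 1 ;;
  esac
}

for dev in /sys/bus/pci/devices/*; do
  [ -r "$dev/class" ] || continue
  if is_display_class "$(cat "$dev/class")"; then
    echo "$(basename "$dev") runtime_status=$(cat "$dev/power/runtime_status" 2>/dev/null)"
  fi
done
```

On a hybrid laptop at idle, you would typically hope to see the discrete GPU reported as suspended and the integrated GPU as active.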

Vendor-Specific Tooling: ThinkPad, MSI, ASUS

Generic Linux power tools are not always enough. Vendor-specific controllers can unlock fan curve and platform profile controls that ACPI generic paths miss.

ThinkPad: `thinkfan`

thinkfan allows custom fan policy based on thermal sensors. It is useful on systems where default fan hysteresis is too conservative.

sudo apt install -y thinkfan
sudo systemctl enable --now thinkfan

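Tune /etc/thinkfan.conf (newer releases read /etc/thinkfan.yaml) carefully and test incrementally to avoid excessive fan wear. Fan control only works if the thinkpad_acpi module is loaded with fan_control=1.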

MSI: `isw`

MSI gaming laptops often expose extra fan/performance controls through vendor interfaces. isw can help manage model-specific capabilities where stock Linux profiles are insufficient.

ASUS: `asus-wmi` and ecosystem tools

The asus-wmi kernel module plus ASUS-focused userspace tooling can expose platform profiles and fan behavior controls on ROG/Zephyrus devices.

Verify module presence:

lsmod | grep asus_wmi

If missing functionality is vendor-specific, no amount of generic tlp tuning can fully replace it.

Sleep/Resume Thermal Issues

Some laptops behave normally after cold boot but overheat after suspend/resume. This is a distinct class of issue and should be debugged separately.

Common signs:

Fans spin aggressively after wake with low visible workload
Idle package power doubles post-resume
dGPU fails to return to runtime-suspended state

Check journal around resume:

# Current boot; use -b -1 instead if you rebooted after the problem occurred
journalctl -b | grep -Ei 'suspend|resume|acpi|thermal|nvidia|amdgpu|iwlwifi'

Practical fixes:

Update kernel/firmware first (many resume regressions are version-specific)
Reapply power policy hooks on resume with systemd sleep scripts
Restart specific services after wake if they hold high-power state
Test s2idle vs deep sleep modes where firmware supports both
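To see which suspend modes your firmware offers and which one is currently selected, read /sys/power/mem_sleep; the kernel marks the selected mode with square brackets. A small sketch follows, with `active_sleep_mode` as a hypothetical helper name.

```shell
# active_sleep_mode "s2idle [deep]" -> prints the bracketed (currently selected) mode
active_sleep_mode() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([a-z0-9_]*\)\].*/\1/p'
}

if [ -r /sys/power/mem_sleep ]; then
  modes=$(cat /sys/power/mem_sleep)
  echo "available: $modes"
  echo "selected:  $(active_sleep_mode "$modes")"
fi

# To try another listed mode until the next reboot (requires root):
#   echo deep | sudo tee /sys/power/mem_sleep
```

If only one mode is listed, the firmware does not offer a choice and this branch of testing can be skipped.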

Example sleep hook skeleton:

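#!/bin/sh
# /lib/systemd/system-sleep/99-thermal-resume
# systemd passes "pre" or "post" as $1 and the sleep type as $2.
# Adjust the tlp path if `command -v tlp` reports a different one.
case "$1/$2" in
  post/*)
    /usr/bin/tlp start >/dev/null 2>&1
    ;;
esac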

Make executable:

sudo chmod +x /lib/systemd/system-sleep/99-thermal-resume

Physical and BIOS-Level Fixes

When software is tuned and temperatures are still excessive, you are likely constrained by heat transfer or airflow.

High-impact physical actions:

Heatsink and fan cleaning: dust blockage can raise sustained temps dramatically
Thermal paste replacement: especially valuable on laptops older than two years
Fan integrity check: worn bearings reduce airflow even if RPM appears nonzero
Intake/exhaust path validation: avoid soft surfaces that choke intake

BIOS/UEFI actions:

Update firmware first
Use balanced/cool platform profile if available
Review fan curve controls when exposed
Reset accidental extreme performance presets

A laptop stand that opens up underside airflow can lower steady-state surface and internal temperatures at almost no cost.

Troubleshooting Matrix: Symptoms to Root Cause and Fix

Symptom	Most likely root cause	Verify with	First fix
Hot at idle (70 C+)	Shallow C-states, background wakeups	`turbostat`, `powertop`	Governor/pstate tuning, disable noisy wake sources
Fans loud after resume	Resume power-policy drift	`journalctl`, `turbostat` post-wake	Resume hook to reapply TLP/CPU policy
CPU throttles in short bursts	Boost overshoot vs cooling limit	`stress-ng` + live sensors	Reduce boost/freq cap, improve fan curve
GPU hot while browsing	dGPU pinned active in hybrid setup	`nvtop`, `nvidia-smi`, DRM runtime state	Integrated mode, fix app forcing dGPU
Still throttling after tuning	Poor thermal transfer	Load test plus physical inspection	Clean heatsink, repaste, check fan health

This table helps avoid random tweaking. Always move from measured symptom to targeted fix.

Special Cases: ARM Laptops and Data Center Hosts

Thermal guidance on x86 laptops does not map cleanly to all systems.

ARM laptops (including Asahi Linux on Apple Silicon)

Asahi uses a different thermal and power-management stack than conventional ACPI x86 laptops. Tools and interfaces differ, and x86-centric advice like intel_pstate or ryzenadj is irrelevant.

Focus on:

Asahi-specific kernel and userspace updates
Platform-supported performance profiles
Workload-level optimizations (browser/video codec paths, GPU acceleration settings)

Apply documentation from the Asahi project before generic Linux laptop tweaks.

Data center and rack systems

Server thermals are primarily airflow engineering and power budgeting problems, not laptop governor problems. Use BMC/IPMI telemetry, rack airflow planning, and CPU power caps at platform level.

Use laptop-style tuning only on actual mobile platforms. On servers, optimize fan zones, inlet temperature, and workload placement.

A Safe Implementation Order

If you want a practical execution plan, use this order:

Update BIOS/UEFI and kernel.
Capture baseline metrics (idle, load, post-resume).
Choose one CPU policy manager (tlp or auto-cpufreq) and tune conservatively.
Fix dGPU runtime power state if hybrid graphics are used.
Test kernel parameter adjustments one at a time.
Apply undervolt/power-limit tuning only after baseline policy is stable.
If severe symptoms remain, perform physical cleaning and paste replacement.

After each step, re-measure. Thermal work is engineering, not folklore.

Conclusion

Fixing Linux laptop overheating is less about a single miracle command and more about sequencing. You diagnose first, classify severity, apply policy tuning carefully, then escalate to firmware and hardware interventions only when data shows you need them. The fastest path to a cooler machine is disciplined measurement: package power, temperatures, fan behavior, and throttle status before and after each change.

For most users, the biggest wins come from three actions: selecting a sane CPU policy strategy, ensuring the discrete GPU can truly sleep when idle, and addressing physical cooling health. Add vendor-specific tools when your platform exposes extra controls, and treat suspend/resume as its own troubleshooting branch when symptoms point there. With that method, Linux laptops can run cool, quiet, and stable even under demanding workloads.