May 16, 2026 · 7 min read · SanoLabs Editorial

Apple Watch Sleep Stages vs. Polysomnography: How Accurate Is the Staging?

Apple Watch estimates sleep stages using wrist accelerometry, not EEG. Apple's validation data shows a kappa of 0.68 — better than actigraphy, not equivalent to a sleep clinic.

Tags: apple-watch, sleep-tracking, sleep-stages, polysomnography, hrv, accelerometry, accuracy, wellness

TL;DR

Apple Watch has tracked sleep since 2020, and since watchOS 9 (September 2022) it has estimated four sleep states: Awake, REM, Core, and Deep. The algorithm uses a wrist accelerometer — not an EEG — and classifies each 30-second window based on motion patterns including breathing-induced movement. Apple's own validation against expert-scored polysomnography found a four-stage kappa of 0.68 with the updated 2025 algorithm — substantially better than traditional actigraphy, and useful for tracking trends over weeks and months. But it is not equivalent to a clinical sleep study, and Apple explicitly states the feature is not intended for clinical use.

What Apple Watch actually measures

Sleep staging in a clinical sleep laboratory relies on polysomnography (PSG): a comprehensive sensor array including electroencephalography (EEG) to capture brain activity, electro-oculography (EOG) to detect eye movements, electromyography (EMG) for muscle tone, respiratory sensors, and pulse oximetry. Trained sleep technologists visually inspect these combined signals to assign a stage label to every 30-second window of the night, following the American Academy of Sleep Medicine (AASM) scoring rules.

Apple Watch uses none of those sensors for sleep staging. Instead, it relies on its 3-axis accelerometer — the same sensor that tracks steps, detects falls, and monitors activity. The algorithm reads the continuous accelerometer signal and extracts not only gross body movement but also subtle oscillations caused by breathing. These respiration-induced motion patterns carry information about physiological state that an ordinary activity tracker's step-counting algorithm would discard. Each 30-second epoch is classified into one of four outputs: Awake, REM, Core, or Deep.

This approach is not simply a limitation; it is a deliberate design choice that makes continuous multi-night tracking possible in a way that PSG cannot. Apple's white paper on sleep staging notes that the accelerometer-based method outperforms traditional wrist actigraphy, which uses binned activity counts rather than the high-frequency signal Apple's algorithm processes.
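Because the output is one label per 30-second epoch, per-stage totals are just a count of labels (an eight-hour night is 960 epochs). The sketch below illustrates that arithmetic only; the function name and the sample labels are made up for illustration and are not Apple's implementation or data.

```python
from collections import Counter

EPOCH_SECONDS = 30  # epoch length stated in the article


def minutes_per_stage(epoch_labels):
    """Total minutes in each stage, given one label per 30-second epoch."""
    counts = Counter(epoch_labels)
    return {stage: n * EPOCH_SECONDS / 60 for stage, n in counts.items()}


# Hypothetical fragment of a night: 10 epochs, i.e. 5 minutes.
night = ["Awake", "Awake", "Core", "Core", "Core", "Core",
         "Deep", "Deep", "REM", "REM"]
summary = minutes_per_stage(night)
print(summary)  # {'Awake': 1.0, 'Core': 2.0, 'Deep': 1.0, 'REM': 1.0}
```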

How the four stages map to PSG's five

Clinical PSG defines five stages: Awake, N1, N2, N3, and REM. Apple Watch outputs four. The mapping is:

Apple Watch	PSG equivalent	Description
Awake	Awake	Waking state during the sleep period
REM	REM	Rapid eye movement sleep
Deep	N3	Slow-wave or deep sleep
Core	N1 + N2	Light and intermediate sleep combined

The name "Core" — rather than "Light" — was a deliberate choice. N2 typically accounts for more than 50% of a full night's sleep. Calling it "Light" could imply it is less important or less restorative, which the sleep science does not support. N2 contains sleep spindles and K-complexes and is a normal, central feature of healthy sleep architecture. Apple chose Core to convey that this is the foundational majority of a night's sleep, not a thin or inferior layer.

The practical implication: if your Apple Watch shows a night with a large percentage of Core sleep and a small percentage of Deep, that is not necessarily cause for alarm. It may simply reflect normal sleep architecture.
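To compare a five-stage PSG hypnogram with the watch's four-stage output, the PSG labels first have to be collapsed using the mapping in the table above. A minimal sketch, assuming stage labels are plain strings; the names `PSG_TO_APPLE` and `collapse_to_apple_stages` are hypothetical and the sample sequence is invented.

```python
# PSG stage -> Apple Watch stage, following the mapping table in this section.
PSG_TO_APPLE = {
    "Awake": "Awake",
    "N1": "Core",
    "N2": "Core",
    "N3": "Deep",
    "REM": "REM",
}


def collapse_to_apple_stages(psg_epochs):
    """Relabel a PSG-scored hypnogram with Apple Watch's four categories."""
    return [PSG_TO_APPLE[stage] for stage in psg_epochs]


# Made-up sequence of PSG epoch labels.
psg_epochs = ["Awake", "N1", "N2", "N2", "N3", "N2", "REM"]
apple_epochs = collapse_to_apple_stages(psg_epochs)
print(apple_epochs)
# ['Awake', 'Core', 'Core', 'Core', 'Deep', 'Core', 'REM']
```

Note that N1 and N2 become indistinguishable after the collapse, which is why a high Core percentage on its own says nothing about how that time splits between the two.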

What the accuracy data actually shows

Apple published a detailed white paper documenting the algorithm's development and validation — and updated it in October 2025 to include improvements made in 2024 and 2025. The numbers reported here come from Apple's own validation study, which compared Apple Watch against expert-scored PSG in an independent participant group not used in algorithm training.

Original algorithm (watchOS 9, 2022): The validation set included 166 participants and 299 recorded nights. The four-stage kappa was 0.63 (standard deviation 0.13). Binary sleep/wake classification achieved a median sensitivity of 97.9% — meaning nearly all true sleep epochs were correctly identified — and a specificity of 75.0%, meaning three quarters of true awake epochs were correctly classified.

Updated algorithm (watchOS 26 / iOS 26, 2025): Using foundation models trained on data from the Apple Heart and Movement Study, Apple improved the algorithm's handling of "quiet wake" — the state of lying still just before falling asleep or just after waking, which is particularly difficult to distinguish from light sleep. In the same validation dataset, the updated algorithm achieved a four-stage kappa of 0.68 (SD 0.11) and improved wake detection from 70% correct to 79% correct. Binary sensitivity shifted slightly to 96.8% and specificity improved to 78.9%.

In a separate cohort of clinical patients undergoing PSG as part of actual clinical care — a more challenging population that includes people with sleep disorders — the updated algorithm achieved a kappa of 0.66, binary sensitivity of 95.0%, and specificity of 86.5%.
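For readers who want to see how the binary figures above are defined, here is a small sketch of sensitivity and specificity with "asleep" as the positive class. The function name and the toy data are assumptions for illustration; they are not taken from Apple's validation set.

```python
def sleep_wake_metrics(reference, predicted):
    """Return (sensitivity, specificity) with 'asleep' as the positive class.

    Both arguments are equal-length lists of booleans, one per epoch:
    True = asleep, False = awake. `reference` plays the role of PSG scoring.
    Assumes the reference contains at least one asleep and one awake epoch.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(r and p for r, p in pairs)              # asleep, called asleep
    fn = sum(r and not p for r, p in pairs)          # asleep, called awake
    tn = sum((not r) and (not p) for r, p in pairs)  # awake, called awake
    fp = sum((not r) and p for r, p in pairs)        # awake, called asleep
    return tp / (tp + fn), tn / (tn + fp)


# Toy night: 8 true sleep epochs, 4 true wake epochs.
reference = [True] * 8 + [False] * 4
predicted = [True] * 8 + [True, False, False, False]  # one wake epoch missed
sensitivity, specificity = sleep_wake_metrics(reference, predicted)
print(sensitivity, specificity)  # 1.0 0.75
```

The toy result shows the typical pattern in the figures above: sleep epochs are rarely missed, while quiet wake is the harder case.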

What kappa of 0.68 actually means

Cohen's kappa corrects for the agreement that would occur by chance alone. A kappa of 0 means the classifier does no better than random assignment; 1.0 is perfect agreement. The commonly used Landis and Koch scale places 0.61–0.80 in the "substantial" category.
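The chance correction is easier to see with a worked example. The sketch below computes kappa as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected from each rater's label frequencies alone. The ten-epoch sequences are invented for illustration and are not Apple's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length label sequences."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("sequences must be non-empty and of equal length")
    # Observed agreement: fraction of epochs with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    # Undefined (division by zero) if both raters use a single identical label.
    return (p_o - p_e) / (1 - p_e)


# Made-up example: the two sequences agree on 8 of 10 epochs.
psg = ["Awake", "Core", "Core", "Core", "Core",
       "Deep", "Deep", "REM", "REM", "REM"]
watch = ["Awake", "Core", "Core", "Core", "Core",
         "Core", "Deep", "REM", "REM", "Core"]
kappa = cohens_kappa(psg, watch)
print(round(kappa, 2))  # 0.7
```

Here raw agreement is 80%, but 33% agreement would be expected by chance given how often each label appears, so kappa comes out near 0.70. For real analyses, scikit-learn's `cohen_kappa_score` computes the same statistic.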

For context: inter-rater agreement between trained PSG technologists scoring the same recording independently typically sits around 0.76–0.83 depending on the study. That is the ceiling against which Apple Watch is implicitly competing — because PSG technologists themselves do not agree perfectly, especially at stage boundaries.

Apple's white paper notes explicitly that the most common Apple Watch misclassification — Deep sleep being scored as Core — mirrors the most common disagreement between human PSG raters. The errors are not random; they are concentrated at the boundaries between physiologically adjacent stages, which is exactly where even expert humans disagree most.

This does not make the errors irrelevant. A four-stage kappa of 0.68 means that a meaningful minority of epochs are misclassified. If you are trying to measure precise deep sleep minutes on a specific night, the number Apple Watch gives you is an estimate with non-trivial uncertainty. But if you are tracking whether your deep sleep percentage has consistently declined over the past month, the directional signal is meaningful and more reliable than any single night's snapshot.

A note on the different accuracy numbers you may see quoted elsewhere: kappa, per-epoch sensitivity, and total time-per-stage error all measure slightly different things. The kappa above is a chance-corrected agreement statistic across the four-stage classification. Other studies report binary sleep/wake sensitivity (typically ~95–98%) and total per-stage duration error (an independent 2024 study found a mean underestimate of deep sleep of ~43 minutes on Series 8). Those figures look different because they are different metrics, not because they contradict each other. What each one measures and where it comes from is covered in "sleep duration is a misleading health metric" and in the comparison summary in "the best sleep tracking apps for Apple Watch in 2026".

What it is genuinely useful for

The practical strength of Apple Watch sleep staging is not that any single night is precisely measured. It is that you accumulate months of consecutive nights in a way that no clinical sleep study could provide.

PSG in a sleep laboratory typically captures one to three nights. It is expensive, disruptive (sensors glued to your scalp, cables, unfamiliar environment), and often affected by the "first-night effect" — the documented phenomenon of sleeping worse in a lab than at home. Apple Watch captures your actual sleep in your own bed, every night, over months.

Across that longer window, the signal-to-noise ratio improves. If your REM percentage drops consistently in weeks when work stress is high, if Deep sleep declines after multiple nights of alcohol, if your total time asleep shifts with season or schedule — these trends emerge clearly from imperfect but consistent data.
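One simple way to look at trends rather than single nights is a trailing average. The sketch below is an illustration only: the seven-night window is an arbitrary choice, the function name is hypothetical, and the nightly Deep percentages are invented.

```python
from statistics import mean


def rolling_mean(values, window=7):
    """Trailing mean over `window` nights; one value per complete window."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


# Invented nightly Deep sleep percentages for 14 consecutive nights.
deep_pct = [18, 12, 20, 15, 17, 13, 19, 14, 10, 16, 11, 13, 9, 12]
smoothed = rolling_mean(deep_pct)

# Individual nights swing by several points, but the weekly average
# moves steadily from the first full week to the last.
print(round(smoothed[0], 1), round(smoothed[-1], 1))  # 16.3 12.1
```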

The Apple Support page also notes that the Health app offers a Sleep Score (0–100) combining sleep duration, bedtime consistency, and interruptions. That score is distinct from stage quality — it does not reward deep sleep over core sleep, but tracks the structural consistency and duration of your sleep across nights.

What it cannot tell you

Apple Watch sleep staging is not a diagnostic tool for sleep disorders. Sleep apnea, insomnia, restless legs syndrome, and narcolepsy all require clinical evaluation with validated tools. Apple's white paper explicitly states that sleep stages are "not intended for clinical use." The feature's own validation also showed a slightly lower four-stage kappa in the clinical cohort (0.66, versus 0.68 in the main validation set), consistent with patients who have sleep disorders being harder to classify correctly.

If you have symptoms of a sleep disorder — excessive daytime sleepiness, witnessed pauses in breathing, restless legs, or persistent difficulty initiating or maintaining sleep — the appropriate step is a referral to a sleep medicine specialist, not closer inspection of your Apple Watch sleep chart.

What Apple Watch can do is provide a prompt: if your data consistently shows fragmented sleep, frequent awake epochs, or unusually low REM over weeks, that pattern is a reasonable basis for a conversation with a healthcare professional. It is a wellness signal, not a diagnosis.

Where Sam Health fits in

Sam surfaces your Apple Watch sleep stage timeline alongside your other overnight signals — HRV, resting heart rate, wrist temperature, respiratory rate — so you can see how your sleep structure relates to the physiological state your body was in that night. A night of fragmented Core sleep reads differently alongside a suppressed HRV and elevated resting heart rate than it does on its own. You can explore the full sensor picture in our complete Apple Watch sensor breakdown for 2026.

Sources

Track your sleep on Apple Watch and use Sleep on iPhone — Apple Support, updated April 2026. Accessed 16 May 2026.
Estimating Sleep Stages from Apple Watch — Apple Inc. white paper, updated October 2025. Accessed 16 May 2026.
HKCategoryValueSleepAnalysis — Apple Developer Documentation. Accessed 16 May 2026.

Frequently Asked Questions

How does Apple Watch detect sleep stages?

Apple Watch uses a 3-axis accelerometer to detect movements — including subtle breathing-induced motion — and classifies each 30-second window into one of four states: Awake, REM, Core, or Deep sleep. It does not use EEG, EOG, or EMG, which are the sensors used in clinical polysomnography.

What do Core, Deep, and REM mean on Apple Watch?

Deep corresponds to PSG stage N3 (slow-wave sleep). REM corresponds to clinical REM sleep. Core covers both N1 and N2 — what some trackers call 'light sleep.' Apple chose the name Core rather than Light because N2 makes up more than 50% of a typical night and is a normal, important stage of sleep physiology.

How accurate is Apple Watch sleep staging compared to a sleep study?

Apple's own validation, comparing Apple Watch to expert-scored polysomnography in 166 participants across 299 nights, found a four-stage kappa of 0.68 with the updated 2025 algorithm. That is considered substantial agreement. It outperforms traditional wrist actigraphy but is not equivalent to a clinical sleep study.

What is kappa and why does it matter for sleep tracking accuracy?

Cohen's kappa is a measure of agreement between two raters or systems that corrects for the agreement expected by chance alone. A kappa of 0 means no better than chance; 1.0 is perfect agreement. A kappa of 0.68 falls in the 'substantial' range (0.61–0.80). For context, inter-rater agreement between trained PSG technologists scoring the same recordings typically sits around 0.76–0.83.

Which sleep stages does Apple Watch most commonly get wrong?

The most common misclassification is Deep sleep being assigned to Core sleep — physiologically adjacent stages with overlapping motion patterns. Very rare errors include mistaking Deep for Awake (0.13% of epochs) or REM for Deep (0.28%). Most errors move between neighbouring stages rather than across very different ones.

Does Apple Watch sleep staging work for naps?

From watchOS 11 and iOS 18 onwards, Apple Watch attempts sleep tracking outside of your set sleep schedule. Sessions between one and three hours receive binary sleep/wake data only. Sessions longer than three hours receive full four-stage classification — Core, Deep, REM, and Awake.
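The duration rule above can be written as a small lookup. This is a sketch of the rule as described here, not Apple's code: the function name is hypothetical, sessions under one hour are not covered by the description, and treating exactly three hours as sleep/wake only is an assumption.

```python
def expected_sleep_data(session_hours):
    """Kind of data a sleep session of this length is described as receiving."""
    if session_hours > 3:
        return "four-stage"       # Core, Deep, REM, and Awake
    if session_hours >= 1:
        return "sleep/wake only"  # binary classification
    return "not described"        # under one hour: not covered above


for hours in (0.5, 1.5, 4):
    print(hours, expected_sleep_data(hours))
```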

Can Apple Watch diagnose sleep apnea or insomnia through sleep staging?

No. Apple states explicitly that Apple Watch sleep stages are not intended for clinical use. Sleep apnea diagnosis requires a full clinical sleep study (polysomnography or a validated home sleep apnea test). Apple Watch can track sleep patterns as a wellness tool, but it is not a diagnostic device.

Is Apple Watch sleep tracking better or worse than a Fitbit or Oura Ring?

These wearables share the same fundamental constraint: they rely on sensors worn on the wrist or finger rather than EEG. Apple's published validation data (kappa 0.68) is notably transparent. Independent head-to-head comparisons produce varying results depending on population and methodology. Apple explicitly states its accuracy is better than traditional wrist actigraphy, which typically achieves specificity of 50% or lower.