Foundation Models for Wearable Health: Beyond Raw Sensor Data
Researchers at Apple and USC have developed a foundation model that processes behavioral data from wearables rather than raw sensor readings alone. Trained on over 2.5 billion hours of data from 162,000 participants, their Wearable Behavior Model (WBM) performs strongly on health prediction tasks, particularly those involving sleep, injury, and other behavior-driven outcomes. The model's success stems from focusing on higher-level behavioral metrics that align with physiologically relevant timescales, and the results indicate that behavioral data can complement traditional sensor-based approaches for comprehensive health monitoring.
Wearable health research is shifting focus. Most work to date has processed raw sensor data from devices like smartwatches; a newer approach models human behavior patterns instead, with the aim of more accurate and actionable health predictions.
The Behavioral Data Revolution
A recent study from Apple and USC researchers introduces a foundation model that processes behavioral data from wearables rather than raw sensor readings, a notable change in how health predictions are derived from wearable devices.
The team developed the Wearable Behavior Model (WBM) using an unusually large dataset: over 2.5 billion hours of wearable data from 162,000 participants in the Apple Heart and Movement Study. Data at this scale supports foundation models that can generalize across diverse health detection tasks.
Why Behavioral Data Matters
Traditional approaches to wearable health monitoring have primarily focused on low-level sensor data like photoplethysmogram (PPG), electrocardiogram (ECG), and accelerometer readings. While valuable, these signals present several limitations:
- Temporal Misalignment: Raw sensor data operates at second-level timescales, while health conditions typically manifest over days or weeks
- Inconsistent Availability: Low-level sensors aren't consistently active throughout the day
- Limited Context: Raw signals lack the behavioral context that's crucial for many health predictions
Behavioral data, by contrast, captures higher-level metrics that are:
- Validated through carefully designed algorithms
- Aligned with physiologically relevant timescales
- Sensitive to individual behaviors rather than just physiology
- Available consistently throughout daily activities
The Technical Challenge
Building a foundation model for behavioral wearable data presents unique technical challenges that don't exist with traditional sensor data:
Data Irregularity
Unlike uniform sensor streams, behavioral data exhibits:
- Variable sampling rates across different metrics
- Significant amounts of missing data
- Irregular sampling patterns based on user behavior
Model Architecture Innovation
To address these challenges, the researchers conducted a systematic exploration of different approaches:
Tokenization Strategies:
- TST (Time Series Transformer): Creates dense hourly matrices with global mean imputation
- mTAN (Multi-Time Attention Network): Uses attention mechanisms to handle missing data
- Tuple: Treats each observation as a time-variable-value tuple
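To make the difference between the dense and tuple strategies concrete, here is a minimal sketch in plain Python. It assumes observations arrive as (hour, variable, value) tuples and that several observations in the same hour are averaged; the variable names, the averaging rule, and the helper names `dense_hourly_tokens` and `tuple_tokens` are illustrative assumptions, not details taken from the study.

```python
from statistics import mean

HOURS_PER_WEEK = 7 * 24  # 168 hourly bins in one weekly window


def dense_hourly_tokens(observations, variables, global_means):
    """Build a dense hours x (2 * n_variables) matrix from irregular observations.

    observations: iterable of (hour_index, variable_name, value) tuples.
    variables: ordered list of variable names (hypothetical in this sketch).
    global_means: per-variable mean used to fill hours with no observation.
    """
    col = {name: i for i, name in enumerate(variables)}

    # Collect every value that falls into each (hour, variable) cell.
    cells = {}
    for hour, name, value in observations:
        if 0 <= hour < HOURS_PER_WEEK and name in col:
            cells.setdefault((hour, col[name]), []).append(value)

    matrix = []
    for hour in range(HOURS_PER_WEEK):
        values, missing = [], []
        for j, name in enumerate(variables):
            if (hour, j) in cells:
                # Assumption: average repeated observations within the hour.
                values.append(mean(cells[(hour, j)]))
                missing.append(0.0)
            else:
                # Global mean imputation plus a missingness flag.
                values.append(global_means[name])
                missing.append(1.0)
        matrix.append(values + missing)
    return matrix


def tuple_tokens(observations):
    """Tuple alternative: keep each observation as a (time, variable, value) token."""
    return sorted(observations)


# Toy data, invented for illustration only.
obs = [(0, "step_count", 120.0), (0, "step_count", 80.0), (5, "heart_rate", 62.0)]
variables = ["step_count", "heart_rate"]
means = {"step_count": 300.0, "heart_rate": 70.0}

tokens = dense_hourly_tokens(obs, variables, means)
print(tokens[0])  # [100.0, 70.0, 0.0, 1.0]
print(tuple_tokens(obs)[-1])  # (5, 'heart_rate', 62.0)
```

The dense form always has the same shape regardless of how sparse the week was, which is what lets a standard sequence backbone consume it; the tuple form keeps only what was observed, so its length varies from week to week.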
Backbone Architectures:
- Self-attention Transformers with positional encodings
- Rotary Transformers for relative position encoding
- Mamba-2 state-space models for sequence modeling
Surprisingly, the best-performing combination was the relatively simple TST tokenization with a bi-directional Mamba-2 architecture, challenging assumptions about the necessity of complex missing data handling in this domain.
Comprehensive Health Detection
WBM was evaluated on 57 health-related tasks, spanning both static and dynamic health predictions:
Static Health Detection
- Demographic prediction (age, sex)
- Medical history classification
- Medication usage detection
- Chronic condition identification
Dynamic Health Monitoring
- Sleep quality prediction
- Respiratory infection detection
- Pregnancy monitoring
- Injury state detection
- Diabetes status tracking
Key Performance Insights
The research reveals several crucial findings about the relative strengths of behavioral versus sensor data:
Behavioral Data Excels in Behavior-Driven Tasks
WBM significantly outperformed PPG-based models in tasks where behavioral patterns provide strong signals:
- Sleep prediction: Behavioral data captures overnight activity patterns more comprehensively than sporadic PPG measurements
- Injury detection: Gait and mobility metrics reveal changes that aren't apparent in heart rate data
- Pregnancy monitoring: Activity patterns and exercise behaviors change substantially during pregnancy
Sensor Data Dominates Physiological Tasks
PPG models maintained superior performance in tasks where direct physiological measurements are sufficient:
- Diabetes detection: Blood glucose variations are more directly reflected in cardiovascular signals
- Antidepressant usage: Heart rate variability provides strong indicators of medication effects
Combined Models Achieve Optimal Performance
The most significant finding is that combining behavioral and sensor data tends to outperform either source alone:
- 42 out of 47 tasks showed best performance with combined models
- Median AUROC improvement of 0.009 across tasks
- Particularly strong gains in complex conditions like atrial fibrillation (0.034 AUROC improvement)
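For readers less familiar with the metric, the sketch below shows one way to compute AUROC and a median per-task improvement in plain Python. The labels, scores, and per-task AUROC values are toy numbers invented for illustration, not results from the study, and the function names are hypothetical.

```python
from statistics import median


def auroc(labels, scores):
    """Probability that a random positive scores above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def median_improvement(baseline_aurocs, combined_aurocs):
    """Median of the per-task AUROC differences (combined minus baseline)."""
    return median(c - b for b, c in zip(baseline_aurocs, combined_aurocs))


# Toy example: three of the four positive/negative pairs are ranked correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# Toy per-task AUROCs for a baseline model and a combined model.
print(median_improvement([0.70, 0.80, 0.60], [0.72, 0.80, 0.65]))
```

Because AUROC is bounded at 1.0, a median gain of around 0.01 can be meaningful when the baseline is already strong, which is why the per-task differences are summarized with a median rather than judged in isolation.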
Implications for AI Healthcare
This research has profound implications for the future of AI-driven healthcare:
Foundation Model Paradigm
The success of WBM demonstrates that foundation models can be effectively adapted to health domains beyond traditional NLP and computer vision applications. The key insight is that the foundation model approach works best when the input data is carefully chosen to align with the target prediction tasks.
Multimodal Health AI
The complementary nature of behavioral and sensor data suggests that future health AI systems should be inherently multimodal, combining different types of wearable data to achieve comprehensive health monitoring.
Temporal Health Modeling
The focus on behavioral timescales (hours to weeks) rather than sensor timescales (seconds to minutes) represents a more clinically relevant approach to health prediction that aligns with how health conditions actually develop and manifest.
Technical Architecture Deep Dive
The final WBM architecture processes weekly windows of behavioral data through several key components:
- Data Preprocessing: Aggregates 27 behavioral health metrics at hourly intervals
- Tokenization: Creates 168×54 matrices (hours × features+missingness indicators)
- Patch Embedding: Projects each hour as a patch using linear layers
- Mamba-2 Processing: Bi-directional state-space model processes temporal sequences
- Temporal Aggregation: Averages outputs across time for final embeddings
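The data flow through these components can be sketched in plain Python as follows. This is only a shape-level illustration: the bi-directional scan is a simple exponential-smoothing recurrence standing in for Mamba-2, not an implementation of it, and the embedding width, random weights, and function names are assumptions made for the example.

```python
import random

HOURS, N_METRICS = 168, 27
TOKEN_DIM = 2 * N_METRICS  # 27 values + 27 missingness indicators = 54
EMBED_DIM = 8              # hypothetical width; the article does not state one


def linear(x, weight):
    """Multiply a row vector x (length in_dim) by weight (in_dim x out_dim)."""
    return [sum(xi * w for xi, w in zip(x, column)) for column in zip(*weight)]


def scan(sequence, decay):
    """Linear recurrence h_t = decay * h_{t-1} + (1 - decay) * x_t."""
    state = [0.0] * len(sequence[0])
    out = []
    for x in sequence:
        state = [decay * s + (1.0 - decay) * xi for s, xi in zip(state, x)]
        out.append(state)
    return out


def bidirectional_scan(sequence, decay=0.9):
    """Stand-in for the backbone: average a forward scan and a backward scan."""
    fwd = scan(sequence, decay)
    bwd = scan(sequence[::-1], decay)[::-1]
    return [[(f + b) / 2.0 for f, b in zip(fr, br)] for fr, br in zip(fwd, bwd)]


def embed_week(tokens, weight):
    """Map a 168 x 54 token matrix to a single embedding vector."""
    patches = [linear(row, weight) for row in tokens]  # one patch per hour
    hidden = bidirectional_scan(patches)
    # Temporal aggregation: mean over the hourly outputs.
    return [sum(col) / len(hidden) for col in zip(*hidden)]


# Random weights and a random "week" of tokens, purely for a shape check.
rng = random.Random(0)
weight = [[rng.gauss(0.0, 0.1) for _ in range(EMBED_DIM)] for _ in range(TOKEN_DIM)]
week = [[rng.random() for _ in range(TOKEN_DIM)] for _ in range(HOURS)]

embedding = embed_week(week, weight)
print(len(embedding))  # 8
```

Running both scan directions lets each hourly output depend on the hours before and after it, which is the property the bi-directional design is meant to provide; the mean over time then yields one fixed-size vector per week for downstream classifiers.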
Challenges and Future Directions
Despite its success, the research identifies several important limitations and future research directions:
Dataset Representativeness
The Apple Heart and Movement Study, while large, may not fully represent diverse populations, particularly:
- Lower socioeconomic groups
- Non-Apple device users
- Certain demographic minorities
Technical Limitations
- Contrastive learning requires careful positive/negative pair definition
- Model doesn't forecast future health states
- Limited exploration of alternative self-supervised objectives
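To illustrate why the pair definition matters, here is a minimal InfoNCE-style contrastive loss in plain Python. It assumes, hypothetically, that `anchors[i]` and `positives[i]` are embeddings that should match (for example, two weeks from the same participant) and that all other rows in the batch act as negatives; the article does not specify the exact pairing or loss used, so treat this as a generic sketch.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss; anchors[i] and positives[i] form the positive pair.

    Every positives[j] with j != i serves as a negative for anchors[i].
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy embeddings: correctly paired rows give a low loss, mismatched rows a high one.
batch = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(batch, batch) < info_nce(batch, batch[::-1]))  # True
```

If the chosen "negatives" actually share the property the model should capture (say, other weeks from the same person), the loss pushes apart embeddings that ought to stay close, which is the risk the limitation above points to.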
Deployment Considerations
Real-world deployment requires attention to:
- Model fairness across different populations
- Calibration and interpretability
- Privacy and data security
- Clinical validation and regulatory approval
The Future of Wearable Health AI
This research points toward a future where wearable health AI systems:
- Integrate Multiple Data Types: Combine behavioral, sensor, and contextual data for comprehensive health monitoring
- Operate at Human Timescales: Focus on clinically relevant temporal patterns rather than high-frequency sensor data
- Personalize Through Behavior: Leverage individual behavioral patterns for more accurate and actionable predictions
- Enable Proactive Care: Support early detection and intervention through continuous monitoring
The development of WBM represents a significant step forward in making wearable health monitoring more accurate, comprehensive, and clinically relevant. As foundation models continue to evolve, this research provides a template for how AI can be effectively applied to complex, real-world health challenges through careful consideration of data types, model architectures, and evaluation methodologies.
The key takeaway for AI practitioners is that success in health AI often comes not from applying the most sophisticated algorithms, but from carefully selecting and processing the right type of data for the specific problem domain. In the case of wearable health monitoring, behavioral data proves to be a powerful complement to traditional sensor approaches, opening new possibilities for comprehensive, continuous health monitoring.
- Eray Erturk*, Fahad Kamran*, Salar Abbaspourazad, Sean Jewell, Harsh Sharma, Yujie Li, Sinead Williamson, Nicholas J Foti†, Joseph Futoma†. "Beyond Sensor Data: Foundation Models of Behavioral Data from Wearables Improve Health Predictions." arXiv:2507.00191v1 [cs.LG], 30 Jun 2025. Abstract: Wearable devices record physiological and behavioral signals that can improve health predictions. While foundation models are increasingly used for such predictions, they have been primarily applie...