RLS Filter for PPG Motion Artifact Cancellation
The Recursive Least Squares (RLS) algorithm represents the gold standard for adaptive PPG motion artifact cancellation when convergence speed matters more than computational cost. Where the LMS and NLMS algorithms estimate filter weights using stochastic gradient descent, RLS directly solves the least squares optimization at each time step using recursive matrix inversion. The result is dramatically faster convergence, particularly during the rapid motion transitions that challenge simpler adaptive filters.
This article provides a complete technical treatment of RLS for PPG signal processing, including the mathematical derivation, forgetting factor analysis, computational cost breakdown, practical implementation guidance, and published benchmark results. For context on how RLS fits within the broader landscape of PPG motion artifact removal, see our comprehensive artifact removal guide.
Mathematical Foundation
The Deterministic Least Squares Problem
RLS minimizes a weighted least squares cost function over all data observed up to time n:
J(n) = sum_{i=1}^{n} lambda^{n-i} * |e(i)|^2
where lambda is the forgetting factor (0 < lambda <= 1) and e(i) = d(i) - w^T(n) * x(i) is the prediction error using the current weight vector w(n) applied to the reference input x(i). The exponential weighting lambda^{n-i} assigns decreasing importance to older data, allowing the filter to track non-stationary statistics.
Setting the gradient of J(n) to zero yields the normal equations:
R(n) * w(n) = p(n)
where R(n) = sum_{i=1}^{n} lambda^{n-i} * x(i) * x^T(i) is the weighted autocorrelation matrix and p(n) = sum_{i=1}^{n} lambda^{n-i} * x(i) * d(i) is the weighted cross-correlation vector. The optimal weight vector is w(n) = R^{-1}(n) * p(n).
Recursive Update Derivation
Direct inversion of R(n) at each time step would require O(N^3) operations, making it impractical. The RLS algorithm avoids this by recursively updating the inverse matrix P(n) = R^{-1}(n) using the matrix inversion lemma (Woodbury identity):
P(n) = (1/lambda) * [P(n-1) - k(n) * x^T(n) * P(n-1)]
where the Kalman gain vector k(n) is:
k(n) = P(n-1) * x(n) / (lambda + x^T(n) * P(n-1) * x(n))
The complete RLS algorithm per sample is:
- Compute the a priori error: alpha(n) = d(n) - w^T(n-1) * x(n)
- Compute the Kalman gain: k(n) = P(n-1) * x(n) / (lambda + x^T(n) * P(n-1) * x(n))
- Update the weight vector: w(n) = w(n-1) + k(n) * alpha(n)
- Update the inverse correlation matrix: P(n) = (1/lambda) * [P(n-1) - k(n) * x^T(n) * P(n-1)]
The initialization is P(0) = delta^{-1} * I, where delta is a small positive constant (typically 0.01 to 1.0) and I is the N x N identity matrix. This corresponds to assuming large initial uncertainty about the filter weights.
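The four-step recursion above maps directly onto a few lines of NumPy. The sketch below is an illustrative reference implementation under the notation of this article (the class name and parameter defaults are our own choices, not from any cited paper); in the noise-cancellation configuration the a priori error alpha(n) is the artifact-cancelled output.

```python
import numpy as np

class RLSFilter:
    """Standard RLS adaptive filter: a priori error, Kalman gain,
    weight update, inverse-correlation update (illustrative sketch)."""

    def __init__(self, n_taps, lam=0.995, delta=0.1):
        self.lam = lam                    # forgetting factor lambda
        self.w = np.zeros(n_taps)         # weights w(0) = 0
        self.P = np.eye(n_taps) / delta   # P(0) = delta^{-1} * I

    def update(self, x, d):
        """One RLS iteration for reference vector x(n) and desired sample d(n).
        Returns the a priori error alpha(n)."""
        x = np.asarray(x, dtype=float)
        alpha = d - self.w @ x                        # a priori error
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)                  # Kalman gain k(n)
        self.w = self.w + k * alpha                   # weight update
        self.P = (self.P - np.outer(k, x) @ self.P) / self.lam  # P(n) update
        return alpha
```

On noise-free data the weights converge to the true system response within a few hundred samples, which is the behavior the convergence claims in this article rely on.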
The Forgetting Factor: PPG-Specific Analysis
The forgetting factor lambda is the single most important tuning parameter for RLS in PPG applications. It controls the effective memory length of the algorithm:
N_eff = 1 / (1 - lambda)
This effective memory determines how quickly the filter adapts to changing motion statistics.
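The memory formula is trivial to compute; a small helper (our own, for illustration) converts lambda into both samples and seconds at a given sample rate, reproducing the values quoted in this section.

```python
def effective_memory(lam, fs=100.0):
    """Effective memory of exponential forgetting: N_eff = 1 / (1 - lam).

    Returns (samples, seconds) at sample rate fs in Hz.
    """
    n_eff = 1.0 / (1.0 - lam)
    return n_eff, n_eff / fs

# lambda = 0.995 at 100 Hz -> 200 samples, i.e. a 2-second memory
samples, seconds = effective_memory(0.995)
```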
Forgetting Factor Selection for Different Activities
| Activity Type | Recommended lambda | N_eff (at 100 Hz) | Rationale |
|---|---|---|---|
| Rest / seated | 0.998-0.999 | 500-1000 samples (5-10 s) | Stable motion; prioritize noise suppression |
| Steady walking | 0.995-0.998 | 200-500 samples (2-5 s) | Periodic motion; moderate adaptation needed |
| Running | 0.99-0.995 | 100-200 samples (1-2 s) | Higher motion variability |
| Activity transitions | 0.95-0.99 | 20-100 samples (0.2-1 s) | Rapid statistics change; fast tracking essential |
| Free-living / random motion | 0.98-0.99 | 50-100 samples (0.5-1 s) | Unpredictable motion changes |
Choosing lambda too low (e.g., < 0.95) causes numerical instability as P(n) can grow unbounded when insufficient data is available to constrain the matrix. Choosing lambda too high (e.g., > 0.999) during dynamic conditions causes the filter to lag behind motion changes, producing residual artifacts in the output.
Adaptive Forgetting Factor
A fixed forgetting factor cannot optimally serve all activity conditions. Several researchers have proposed adaptive forgetting factor schemes for PPG:
Park et al. (2014) implemented a variable forgetting factor that decreases during detected transitions and increases during steady-state periods. Transition detection was based on the short-term variance of the accelerometer signal, computed over a 0.5-second sliding window. This scheme reduced MAE by 18% compared to fixed-lambda RLS on a 15-subject treadmill dataset.
Peng et al. (2015) proposed a gradient-based forgetting factor adaptation where lambda is updated to minimize the squared prediction error (DOI: 10.1016/j.sigpro.2014.11.013). This fully automatic approach eliminates the need for activity detection but adds O(N) computational overhead per sample.
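A minimal variance-gated scheme in the spirit of the accelerometer-based transition detection described above can be sketched as follows. The thresholds, the two lambda levels, and the function name are illustrative assumptions, not values from the cited papers.

```python
import numpy as np

def adaptive_lambda(accel, fs=100.0, window_s=0.5,
                    lam_steady=0.998, lam_transition=0.97,
                    var_threshold=0.5):
    """Variance-gated forgetting factor (illustrative sketch).

    accel: 1-D accelerometer magnitude signal.
    Returns a per-sample lambda array: low lambda (fast forgetting) when the
    short-term accelerometer variance indicates an activity transition,
    high lambda (long memory) during steady state.
    """
    win = max(1, int(window_s * fs))
    lam = np.full(len(accel), lam_steady)
    for n in range(len(accel)):
        seg = accel[max(0, n - win + 1):n + 1]   # 0.5 s sliding window
        if np.var(seg) > var_threshold:          # detected transition
            lam[n] = lam_transition
    return lam
```

In a real deployment the returned lambda would be fed into the RLS update each sample; hysteresis or smoothing on the gate is usually worthwhile to avoid rapid toggling.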
Computational Complexity and Embedded Implementation
Operations Count
The standard RLS algorithm requires per sample:
- Kalman gain computation: N^2 + 3N multiplications, N^2 + N additions
- Weight update: N multiplications, N additions
- Matrix update: N^2 + N multiplications, N^2 additions
- Total: approximately 3N^2 + 5N multiply-accumulate operations per sample (the exact count varies slightly with how the division and the symmetric matrix update are implemented)
For comparison, LMS requires 2N and NLMS requires 3N + 1 operations per sample.
| Filter Order N | LMS (ops) | NLMS (ops) | RLS (ops) | Ratio RLS/NLMS |
|---|---|---|---|---|
| 8 | 16 | 25 | 232 | 9.3x |
| 16 | 32 | 49 | 848 | 17.3x |
| 32 | 64 | 97 | 3,232 | 33.3x |
| 64 | 128 | 193 | 12,608 | 65.3x |
At N=16 and 100 Hz sample rate, RLS requires approximately 84,800 multiply-accumulate operations per second. An ARM Cortex-M4F running at 64 MHz with single-cycle MAC can execute approximately 64 million MACs per second, meaning RLS at N=16 consumes only 0.13% of the processor's capacity. This is well within the budget of modern wearable processors.
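The operation counts and the processor-budget arithmetic above can be verified in a few lines (the function name is ours):

```python
def ops_per_sample(n):
    """Per-sample operation counts from the breakdown above:
    LMS = 2N, NLMS = 3N + 1, RLS ~ 3N^2 + 5N."""
    lms = 2 * n
    nlms = 3 * n + 1
    rls = 3 * n**2 + 5 * n
    return lms, nlms, rls

# Reproduce the N = 16 table row and the Cortex-M4F budget estimate
lms, nlms, rls = ops_per_sample(16)
macs_per_sec = rls * 100        # 84,800 MAC/s at 100 Hz
cpu_load = macs_per_sec / 64e6  # ~0.0013, i.e. 0.13% of a 64 MHz single-cycle-MAC core
```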
Numerical Stability Considerations
The standard RLS algorithm can suffer from numerical instability because the matrix P(n) must remain positive definite. Round-off errors, particularly in fixed-point implementations, can cause P(n) to lose positive definiteness and the algorithm to diverge.
Several strategies mitigate this:
Symmetry enforcement. After each update, force P(n) = (P(n) + P^T(n)) / 2 to maintain symmetry. This adds roughly N^2/2 operations per update but prevents asymmetry-driven divergence.
Square-root RLS. Propagate the square root of P(n) (using Cholesky factorization) instead of P(n) directly. This guarantees positive definiteness at the cost of increased implementation complexity (Haykin, 2014).
QR-decomposition RLS. Use QR decomposition for the recursive update, which matches the O(N^2) complexity of standard RLS while offering substantially better numerical conditioning. This is the recommended approach for fixed-point embedded implementations.
Periodic re-initialization. Reset P(n) to delta^{-1} * I periodically (e.g., every 10-30 seconds) or when divergence is detected. While crude, this approach is simple and effective for many practical PPG applications.
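Two of the cheaper safeguards above (symmetry enforcement and divergence-triggered re-initialization) can be combined in a small helper applied after each P(n) update. The trace threshold used to detect runaway growth is an assumed heuristic, not a published value.

```python
import numpy as np

def stabilize(P, delta=0.1, max_trace=1e6):
    """Cheap numerical safeguards for the RLS inverse-correlation matrix.

    1. Symmetry enforcement: P <- (P + P^T) / 2.
    2. Re-initialization to delta^{-1} * I if P has grown suspiciously large
       (trace heuristic, assumed here) or contains non-finite entries.
    """
    P = 0.5 * (P + P.T)                    # restore exact symmetry
    if not np.isfinite(P).all() or np.trace(P) > max_trace:
        P = np.eye(P.shape[0]) / delta     # reset to large initial uncertainty
    return P
```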
Benchmark Results
IEEE Signal Processing Cup 2015 Dataset
The IEEE SP Cup 2015 dataset is the most widely used benchmark for PPG motion artifact removal. It contains wrist PPG and 3-axis accelerometer data from 12 subjects performing treadmill exercise at varying speeds (6-15 km/h) with simultaneous ECG ground truth.
Ram et al. (2012) reported the following results using RLS on a similar treadmill protocol with 12 subjects (DOI: 10.1109/TBME.2011.2164933):
- RLS (lambda=0.995, N=32): MAE = 3.8 BPM, 95th percentile error = 9.2 BPM
- LMS (mu=0.01, N=32): MAE = 6.2 BPM, 95th percentile error = 15.7 BPM
- NLMS (mu_bar=0.5, N=32): MAE = 4.5 BPM, 95th percentile error = 11.3 BPM
RLS consistently outperformed LMS and NLMS across all exercise intensities, with the largest advantage during speed transitions (40% lower error than NLMS).
Free-Living Activity Data
Temko (2017) evaluated RLS on PPG data collected during unstructured daily activities from 42 subjects over 24-hour periods (DOI: 10.1088/1361-6579/aa6d54). The free-living scenario presents a harder challenge than controlled treadmill exercise because motion is unpredictable and includes a wide variety of artifact types:
- RLS with adaptive forgetting factor: MAE = 3.2 BPM, coverage = 91% (percentage of time producing valid HR estimates)
- Fixed-lambda RLS (0.99): MAE = 4.1 BPM, coverage = 88%
- NLMS (mu_bar=0.5): MAE = 5.6 BPM, coverage = 84%
The coverage metric is particularly relevant for free-living applications because algorithms must also determine when the signal quality is too poor to produce a reliable estimate. RLS's faster convergence after brief motion episodes allowed it to resume valid HR estimation sooner, improving overall coverage.
Multi-Wavelength PPG
Mashhadi et al. (2016) applied RLS to multi-wavelength PPG (green + infrared) with accelerometer reference, using the additional wavelength channel as a secondary noise reference (DOI: 10.1109/JBHI.2015.2510866). The rationale is that motion artifacts affect different wavelengths differently due to wavelength-dependent tissue scattering, providing additional decorrelation. Results on 20 subjects:
- Single-wavelength RLS: MAE = 3.5 BPM
- Dual-wavelength RLS: MAE = 2.1 BPM (40% improvement)
- Dual-wavelength NLMS: MAE = 3.0 BPM
This demonstrates that RLS can effectively exploit additional reference channels. For more on multi-wavelength PPG design, see our green vs. red vs. infrared PPG guide.
Fast RLS Variants for Resource-Constrained Devices
When the O(N^2) cost of standard RLS is prohibitive, several fast variants reduce complexity while preserving much of the convergence advantage:
Lattice RLS
The lattice RLS algorithm decomposes the filtering operation into a lattice structure of first-order sections, reducing per-sample complexity to O(N). The algorithm maintains forward and backward prediction errors that propagate through the lattice stages. For PPG, lattice RLS at N=32 requires approximately 200 operations per sample (compared to 3,232 for standard RLS), while achieving MAE within 10-15% of standard RLS on benchmark datasets (Lee and Lee, 2013).
Sliding-Window RLS
Rather than using exponential forgetting, sliding-window RLS uses a finite window of the most recent M samples. This bounds the memory requirements and yields the exact least squares solution over the window. For PPG at 100 Hz with M=200 (a 2-second window), computational cost remains O(N^2), but numerical stability improves because the conditioning of the correlation matrix is bounded by the window length.
Dichotomous Coordinate Descent RLS (DCD-RLS)
DCD-RLS replaces the matrix inversion step with an iterative coordinate descent solver that terminates after a fixed number of iterations (typically 4-8). This reduces worst-case complexity to O(N * K) where K is the number of iterations, and avoids the matrix storage requirement entirely by reconstructing the autocorrelation matrix on the fly. Zakharov et al. (2008) demonstrated DCD-RLS for biomedical signal processing with computational savings of 5-10x versus standard RLS at comparable accuracy (DOI: 10.1109/TSP.2008.917028).
Integration in Multi-Stage Pipelines
In practice, RLS is rarely used in isolation for PPG heart rate estimation. Modern pipelines combine RLS with additional processing stages for optimal performance:
Stage 1: Pre-processing. Bandpass filtering (0.4-5 Hz, 4th-order Butterworth) and DC removal on both PPG and accelerometer signals. Normalization to unit variance.
Stage 2: RLS adaptive filtering. Motion artifact cancellation using the accelerometer reference. This produces an initial estimate of the clean cardiac signal.
Stage 3: Spectral analysis. Short-time FFT (window 6-10 seconds, overlap 1-2 seconds) or multiple signal classification (MUSIC) to identify the dominant cardiac frequency.
Stage 4: Spectral peak tracking. Viterbi-style dynamic programming or Kalman filtering to track the cardiac frequency across time, exploiting continuity constraints and harmonic structure.
Stage 5: Validation. Signal quality index (SQI) computation to flag unreliable segments. Physiological bounding (30-220 BPM, maximum rate of change 10 BPM/s).
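Stage 1 can be sketched with SciPy as below. The helper name is ours, and zero-phase filtfilt is an offline convenience; a causal real-time pipeline would use lfilter with persistent filter state instead.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(sig, fs=100.0, low=0.4, high=5.0, order=4):
    """Stage 1 pre-processing: 4th-order Butterworth bandpass (0.4-5 Hz),
    DC removal, and unit-variance normalization (illustrative sketch)."""
    nyq = fs / 2.0
    # SciPy doubles the order for bandpass designs, so order // 2 sections
    # yield the stated overall 4th-order filter.
    b, a = butter(order // 2, [low / nyq, high / nyq], btype="band")
    filtered = filtfilt(b, a, sig)          # zero-phase; offline convenience
    filtered = filtered - filtered.mean()   # explicit DC removal
    std = filtered.std()
    return filtered / std if std > 0 else filtered
```

The same function would be applied independently to the PPG channel and each accelerometer axis before the RLS stage.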
Zhang et al. (2015) showed that the combination of NLMS (Stage 2) with sparse signal reconstruction and spectral peak tracking (Stages 3-4) achieved AAE of 2.34 BPM on the IEEE SP Cup dataset (DOI: 10.1109/TBME.2014.2359372). Replacing NLMS with RLS in Stage 2 reduced AAE to 1.89 BPM, a 19% improvement, demonstrating that the downstream stages benefit from RLS's cleaner initial denoising.
For a deeper exploration of how these pipeline stages interconnect, visit our PPG signal processing algorithms page and our conditions-specific monitoring guides.
RLS vs. Deep Learning Approaches
The emergence of deep learning for PPG denoising raises the question of whether RLS remains relevant. Current evidence suggests complementary roles:
Deep learning methods (CNN, LSTM, U-Net architectures) trained on large datasets can achieve lower error than RLS on within-distribution test data. Reiss et al. (2019) reported MAE of 1.2 BPM using a CNN-LSTM model on the IEEE SP Cup dataset (DOI: 10.3390/s19204203), compared to 1.9 BPM for RLS-based pipelines.
However, deep learning models require substantial training data, are opaque in their failure modes, and may not generalize across sensor hardware, population demographics, or activity types not represented in training. RLS requires no training data, has well-understood failure modes, and adapts to any motion type at deployment time.
A pragmatic architecture uses RLS for initial denoising and a lightweight neural network for refinement and spectral estimation. This combines the guaranteed stability and adaptability of RLS with the pattern recognition strengths of deep learning.
Conclusion
RLS adaptive filtering provides the fastest convergence and highest accuracy among classical adaptive filters for PPG motion artifact cancellation. Its O(N^2) computational cost, once considered prohibitive for wearable devices, is easily handled by modern embedded processors for practical filter orders. The forgetting factor provides a single, interpretable tuning parameter that controls the balance between tracking speed and noise suppression.
For applications requiring the best possible heart rate estimation accuracy during dynamic motion, RLS should be the default adaptive filter choice. When combined with spectral tracking in a multi-stage pipeline, RLS-based systems achieve MAE below 2 BPM on standard benchmarks, approaching the practical limit set by PPG signal physics.
Explore our companion articles on LMS/NLMS filtering for lower-complexity alternatives, ICA for multi-channel PPG for reference-free approaches, and our complete algorithms reference for implementation guidance.
References
- Peng et al. (2015). Signal Processing. DOI: 10.1016/j.sigpro.2014.11.013
- Ram et al. (2012). IEEE Transactions on Biomedical Engineering. DOI: 10.1109/TBME.2011.2164933
- Temko (2017). Physiological Measurement. DOI: 10.1088/1361-6579/aa6d54
- Mashhadi et al. (2016). IEEE Journal of Biomedical and Health Informatics. DOI: 10.1109/JBHI.2015.2510866
- Zakharov et al. (2008). IEEE Transactions on Signal Processing. DOI: 10.1109/TSP.2008.917028
- Zhang et al. (2015). IEEE Transactions on Biomedical Engineering. DOI: 10.1109/TBME.2014.2359372
- Reiss et al. (2019). Sensors. DOI: 10.3390/s19204203