LRS Family: Replication + Walk-Forward Parameter Sweep
Question
A Reddit commenter shared four testfol.io "LRS" (Leverage Rotation Strategy) configs, claiming Sharpe ratios of 0.65-1.25 on long-window backtests:
- LRS SPYx3 + RSI: SPY EMA110 ±5.5% + RSI50<60 ±1% → 100% SPYx3 / else 25% gold + 75% cash (1993-2026)
- LRS QQQx2: SPY EMA100 ±5% → 100% QLD / else 25% gold + 75% cash (1986-2026)
- LRS BTC + Gold: BTC EMA30 ±2% + 3d return <25% → 100% BTC / else 25% gold + 75% cash (2017-2026)
- LRS SPYx3 (base): similar to the first config, without the RSI overlay
Two questions:
1. Can we replicate testfol's headline numbers in our own harness?
2. Is the parameter tuning real, or in-sample overfit? (Walk-forward test.)
Methodology
Replication
Decoded the four LRS strategies from testfol's UI screenshots. Built each signal:
- EMA + tolerance band: hysteresis logic — signal enters TRUE when price > EMA×(1+tol), exits when price < EMA×(1−tol), holds previous state in between
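- RSI + tolerance band: same hysteresis logic on RSI vs. its threshold, with the direction reversed because the condition is RSI < threshold (turns on below threshold×(1−tol), off above threshold×(1+tol))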
- Allocation: simulate the "100% LETF when on / 25% gold + 75% cash when off" with synthetic LETFs
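The band logic can be sketched in pandas as below. This is a minimal illustration, not the harness code. The helper names (`hysteresis_signal`, `rsi`, `lrs_signal`) are hypothetical. Several conventions are assumptions, since testfol's exact conventions may differ:
- The EMA uses `ewm(span=..., adjust=False)`.
- The RSI uses Wilder-style smoothing.
- The RSI tolerance is multiplicative, like the EMA band.
- The two filters combine with a logical AND.

```python
from typing import Optional

import pandas as pd


def hysteresis_signal(value: pd.Series, level: pd.Series, tol: float,
                      above: bool = True) -> pd.Series:
    """On/off regime with a tolerance band around `level`; holds state inside the band.

    above=True  (trend filter): on when value > level*(1+tol), off when value < level*(1-tol).
    above=False (RSI cap):      on when value < level*(1-tol), off when value > level*(1+tol).
    """
    upper, lower = level * (1 + tol), level * (1 - tol)
    enter, leave = (value > upper, value < lower) if above else (value < lower, value > upper)
    state = False  # assumption: start in the defensive ("off") state
    out = []
    for go_on, go_off in zip(enter.to_numpy(), leave.to_numpy()):
        if go_on:
            state = True
        elif go_off:
            state = False
        out.append(state)  # NaN comparisons are False, so warm-up rows keep the prior state
    return pd.Series(out, index=value.index, dtype=bool)


def rsi(close: pd.Series, n: int) -> pd.Series:
    """Wilder-style RSI: exponential smoothing of gains and losses with alpha = 1/n."""
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / n, adjust=False).mean()
    loss = (-delta).clip(lower=0).ewm(alpha=1 / n, adjust=False).mean()
    return 100 - 100 / (1 + gain / loss)


def lrs_signal(close: pd.Series, ema_span: int = 110, ema_tol: float = 0.055,
               rsi_n: Optional[int] = 50, rsi_max: float = 60.0,
               rsi_tol: float = 0.01) -> pd.Series:
    """Hypothetical 'EMA110 ±5.5% + RSI50<60 ±1%' signal (True = hold the LETF)."""
    ema = close.ewm(span=ema_span, adjust=False).mean()
    trend_on = hysteresis_signal(close, ema, ema_tol, above=True)
    if rsi_n is None:
        return trend_on
    cap = pd.Series(rsi_max, index=close.index)
    rsi_ok = hysteresis_signal(rsi(close, rsi_n), cap, rsi_tol, above=False)
    return trend_on & rsi_ok  # assumption: both filters must be on


# Price rises through the 5% band, drifts inside it, then falls out of it.
px = pd.Series([100, 106, 103, 98, 94], dtype=float)
print(hysteresis_signal(px, pd.Series(100.0, index=px.index), 0.05).tolist())
# -> [False, True, True, True, False]
```

The explicit loop is deliberate. Hysteresis makes each day's state depend on the previous day's state, so a purely vectorized comparison against the EMA cannot reproduce the "hold in between" rule.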
Synthesized SPYx3 as (1 + 3 × daily_SPY_return − 0.91%/252).cumprod(): daily 3x rebalancing with a 0.91% annual expense ratio and no financing-cost term. Defensive bucket: 25% × gold daily return + 75% cash (cash earns 0). Before GLD launched (2004-11), the defensive bucket is cash-only, since substituting a gold proxy distorts pre-GLD results.
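The synthetic-LETF formula and the defensive bucket translate almost directly into pandas. This is a minimal sketch. `synthetic_letf_returns` and `defensive_returns` are hypothetical helper names, and the 0.91% expense ratio comes from the formula above. "Pre-GLD" is assumed to mean any date where the gold return series is missing (NaN), rather than a hardcoded launch date.

```python
import pandas as pd


def synthetic_letf_returns(underlying_ret: pd.Series, leverage: float = 3.0,
                           expense_ratio: float = 0.0091) -> pd.Series:
    """Daily-rebalanced leveraged returns: leverage x underlying minus the daily fee.

    Like the formula in the text, there is no financing/swap cost term.
    """
    return leverage * underlying_ret - expense_ratio / 252


def defensive_returns(gold_ret: pd.Series) -> pd.Series:
    """25% gold + 75% cash earning 0.

    Where gold data is missing (pre-GLD), the bucket is cash-only, i.e. returns 0.
    """
    return (0.25 * gold_ret).fillna(0.0)


# Equity curve, matching (1 + 3 * r - 0.91%/252).cumprod() from the text.
spy_ret = pd.Series([0.01, -0.02, 0.005])
equity = (1 + synthetic_letf_returns(spy_ret)).cumprod()
print(equity.round(6).tolist())
```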
Walk-forward parameter sweep
Tested 60 parameter combinations per LETF (TQQQ, SOXL, UPRO):
- SMA windows: {150, 200, 250}
- Tolerance: {0%, 5%}
- RSI configs: {none, RSI30<60, RSI30<70, RSI50<60, RSI50<70}
- Signal source: {self, underlying} (signal computed on the LETF's own price or on its underlying index)
Total 3 × 2 × 5 × 2 = 60 combos per ticker.
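The grid is small enough to enumerate explicitly with `itertools.product`. The constant names and the `combo_label` helper are hypothetical. Labels are formatted to match the results tables below. The "RSInone" spelling for the no-RSI case is an assumption, since those combos never appear in the tables.

```python
import itertools
from typing import Optional, Tuple

SMA_WINDOWS = [150, 200, 250]
TOLERANCES = [0.0, 0.05]
RSI_CONFIGS = [None, (30, 60), (30, 70), (50, 60), (50, 70)]  # (lookback, max RSI)
SIGNAL_SOURCES = ["self", "underlying"]


def combo_label(sma: int, tol: float, rsi_cfg: Optional[Tuple[int, int]], src: str) -> str:
    """Format a combo like the results tables, e.g. 'SMA200/tol5/RSI50<60/sig:self'."""
    rsi_part = "RSInone" if rsi_cfg is None else f"RSI{rsi_cfg[0]}<{rsi_cfg[1]}"
    return f"SMA{sma}/tol{round(tol * 100)}/{rsi_part}/sig:{src}"


GRID = list(itertools.product(SMA_WINDOWS, TOLERANCES, RSI_CONFIGS, SIGNAL_SOURCES))
LABELS = [combo_label(*combo) for combo in GRID]
print(len(GRID))  # 3 x 2 x 5 x 2 -> 60
```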
- Train window: 2000-08-30 → 2015-12-31 (~15 years) — optimization here
- Test window: 2016-01-01 → 2026-05-31 (~10 years) — UNTOUCHED out-of-sample
For each combo, compute Sharpe separately on the train and test windows, then compute the correlation between train Sharpe and test Sharpe across all 60 combos.
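The scoring step might look like the sketch below. The note doesn't state its Sharpe convention, so this assumes daily returns annualized with √252, a zero risk-free rate, and Pearson correlation (the `Series.corr` default). `sharpe` and `walk_forward_scores` are hypothetical names. The random returns in the usage lines are pure noise and only exercise the mechanics.

```python
import numpy as np
import pandas as pd

TRAIN = ("2000-08-30", "2015-12-31")
TEST = ("2016-01-01", "2026-05-31")


def sharpe(daily_ret: pd.Series) -> float:
    """Annualized Sharpe from daily returns (assumes rf = 0 and 252 trading days)."""
    r = daily_ret.dropna()
    return float(np.sqrt(252) * r.mean() / r.std(ddof=1))


def walk_forward_scores(returns_by_combo: dict) -> pd.DataFrame:
    """One row per combo: Sharpe on the train window and on the untouched test window."""
    rows = {
        label: {"train": sharpe(r.loc[TRAIN[0]:TRAIN[1]]),
                "test": sharpe(r.loc[TEST[0]:TEST[1]])}
        for label, r in returns_by_combo.items()
    }
    return pd.DataFrame.from_dict(rows, orient="index")


# Mechanics only: random noise standing in for per-combo strategy returns.
rng = np.random.default_rng(0)
idx = pd.bdate_range("2000-08-30", "2026-05-29")
fake = {f"combo{i}": pd.Series(rng.normal(0.0005, 0.02, len(idx)), index=idx) for i in range(8)}
scores = walk_forward_scores(fake)
top5 = scores.sort_values("train", ascending=False).head(5)
train_test_corr = scores["train"].corr(scores["test"])
print(top5)
print(f"corr(train, test) = {train_test_corr:+.3f}")
```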
Results
LRS SPYx3+RSI replication (1993-2026, 33 years)
Decomposed to show where the Sharpe gain comes from:
| Variant | CAGR | MDD | Sharpe | What it adds |
|---|---|---|---|---|
| Pure SMA200, no gold (cash defensive) | 21.06% | -68.37% | 0.714 | The trend filter mechanism alone |
| + EMA110 + 5.5% tolerance band | 25.39% | -47.01% | 0.811 | +0.097 Sharpe from hysteresis (cuts whipsaw) |
| + RSI50<60 filter | 27.32% | -47.01% | 0.886 | +0.075 Sharpe from "don't chase overbought" |
| + Gold defensive bucket (2004-2026) | 28.01% | -46.93% | 0.917 | +0.031 Sharpe from gold |
| testfol's claimed value (full LRS w/ gold) | 24.80% | -47.20% | 0.750 | — |
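Our harness produces a materially higher Sharpe than testfol (0.917 vs 0.750) and a higher CAGR (28.01% vs 24.80%), while max drawdown matches closely (-46.93% vs -47.20%). Costs can't explain the gap: testfol uses 0% and we use 1bp/side, which should push our numbers below testfol's, not above. Likelier sources are gold proxy timing, the expense ratio assumption, and possibly the Sharpe convention (e.g., whether a risk-free rate is subtracted). Same order of magnitude, but not a tight match.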
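Direction matters when attributing the gap. A transaction cost can only subtract from returns, so charging 1bp/side should pull Sharpe below a zero-cost run of the same strategy, never above it. The sketch below applies a position-lagged signal with a per-switch cost. `apply_signal` is a hypothetical helper. Charging each LETF↔defensive switch as one sell plus one buy on the full position (2 × 1bp) is an assumption about how "1bp/side" is applied.

```python
import pandas as pd


def apply_signal(on: pd.Series, risk_ret: pd.Series, def_ret: pd.Series,
                 cost_per_side: float = 0.0001) -> pd.Series:
    """Net strategy returns with a one-day position lag and a flat cost per switch."""
    pos = on.shift(1, fill_value=False).astype(bool)  # signal at close t earns day t+1
    gross = risk_ret.where(pos, def_ret)
    switched = pos.ne(pos.shift(1, fill_value=False))  # True on days the holding changes
    return gross - switched.astype(float) * 2 * cost_per_side  # sell one leg + buy the other


on = pd.Series([False, True, True, False, False])
risk = pd.Series([0.01, 0.02, 0.03, -0.05, 0.01])
defensive = pd.Series(0.0, index=on.index)
gross = apply_signal(on, risk, defensive, cost_per_side=0.0)
net = apply_signal(on, risk, defensive, cost_per_side=0.0001)
print((gross - net).tolist())  # costs appear only on switch days
```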
The tolerance band and RSI overlay do most of the work (+0.172 Sharpe combined, vs. +0.031 from gold). Gold's contribution is smaller than originally hypothesized.
Walk-forward parameter sweep results
Top 5 by train Sharpe, with test Sharpe shown alongside:
TQQQ
| Params | Train Sh | Test Sh | Tr CAGR | Te CAGR | Tr MDD | Te MDD |
|---|---|---|---|---|---|---|
| SMA200/tol5/RSI50<60/sig:underlying | 0.710 | 1.003 | 22.42% | 42.59% | -47.13% | -56.94% |
| SMA200/tol5/RSI50<60/sig:self | 0.708 | 1.114 | 21.09% | 46.67% | -55.86% | -43.89% |
| SMA250/tol5/RSI50<60/sig:self | 0.680 | 1.119 | 19.86% | 47.20% | -47.99% | -38.25% |
| SMA200/tol0/RSI50<60/sig:self | 0.669 | 1.112 | 19.43% | 46.21% | -58.63% | -39.22% |
| SMA200/tol5/RSI30<60/sig:self | 0.651 | 0.989 | 17.98% | 37.52% | -56.21% | -49.71% |
Best TEST Sharpe overall: SMA150/tol0/RSI50<60/sig:underlying → train 0.456, test 1.265 (would not have been picked in-sample).
SOXL
| Params | Train Sh | Test Sh | Tr CAGR | Te CAGR | Tr MDD | Te MDD |
|---|---|---|---|---|---|---|
| SMA200/tol0/RSI50<60/sig:self | 0.498 | 1.004 | 13.03% | 54.44% | -71.93% | -72.52% |
| SMA200/tol5/RSI50<60/sig:self | 0.472 | 0.995 | 11.73% | 53.88% | -65.67% | -73.90% |
| SMA150/tol5/RSI50<60/sig:self | 0.459 | 1.021 | 11.08% | 55.01% | -73.57% | -70.04% |
| SMA200/tol0/RSI30<60/sig:self | 0.457 | 1.006 | 10.92% | 52.88% | -74.46% | -66.49% |
| SMA150/tol5/RSI30<60/sig:self | 0.434 | 1.026 | 9.85% | 53.44% | -68.50% | -67.62% |
Best TEST Sharpe overall: SMA200/tol5/RSI50<60/sig:underlying → train 0.223, test 1.140.
UPRO
| Params | Train Sh | Test Sh | Tr CAGR | Te CAGR | Tr MDD | Te MDD |
|---|---|---|---|---|---|---|
| SMA200/tol0/RSI50<60/sig:self | 0.761 | 1.054 | 18.90% | 31.63% | -38.26% | -40.74% |
| SMA200/tol5/RSI50<60/sig:underlying | 0.759 | 0.992 | 20.71% | 33.01% | -43.73% | -51.71% |
| SMA200/tol5/RSI50<60/sig:self | 0.751 | 1.153 | 18.49% | 35.73% | -34.28% | -49.93% |
| SMA150/tol5/RSI50<60/sig:underlying | 0.743 | 0.981 | 20.15% | 31.63% | -49.77% | -51.71% |
| SMA200/tol0/RSI30<60/sig:self | 0.731 | 0.846 | 17.21% | 23.06% | -38.26% | -49.96% |
Best TEST Sharpe overall: SMA150/tol5/RSI50<60/sig:self → train 0.575, test 1.243.
Train→Test correlation (the smoking gun)
| Ticker | corr(train Sharpe, test Sharpe) |
|---|---|
| TQQQ | +0.186 |
| SOXL | -0.093 |
| UPRO | +0.085 |
If parameter selection had real predictive power, we would expect a clearly positive correlation, roughly 0.4-0.7. Values near zero (slightly negative for SOXL) mean that choosing parameters on past data told us essentially nothing about future performance.
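A quick way to size these correlations is a standard significance test on Pearson's r with n = 60 combos. The sketch below computes the t statistic and a Fisher-z 95% interval. Treat the intervals as optimistic: the 60 combos share the same price history and overlapping parameters, so the effective sample is smaller than 60 and the true intervals are wider.

```python
import math


def corr_t_stat(r: float, n: int) -> float:
    """t statistic for H0: rho = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> tuple:
    """Approximate 95% confidence interval for Pearson r via the Fisher z-transform."""
    z, se = math.atanh(r), 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


N_COMBOS = 60
for ticker, r in {"TQQQ": 0.186, "SOXL": -0.093, "UPRO": 0.085}.items():
    lo, hi = fisher_ci(r, N_COMBOS)
    print(f"{ticker}: r={r:+.3f}  t={corr_t_stat(r, N_COMBOS):+.2f}  95% CI [{lo:+.2f}, {hi:+.2f}]")
```

With these inputs all three intervals include zero, so none of the correlations is distinguishable from zero. The TQQQ interval reaches about +0.42, however, so a single split with this many combos cannot fully exclude a modest positive relationship either.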
Interpretation
The replication confirms that testfol's structure is real. Our harness lands in the same range as testfol's headline numbers, although our Sharpe runs higher (0.917 vs 0.750) for reasons not yet pinned down. The trend filter, tolerance band, and RSI overlay each add incremental Sharpe in-sample.
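The walk-forward result demolishes the parameter-tuning premise. Train-to-test Sharpe correlation is near zero for all three LETFs: the "best" parameters from 2000-2015 carried no predictive information for 2016-2026. In absolute terms, every top-by-train combo scored higher on test than on train because the test window is bull-tilted (see Caveats). The meaningful comparison is therefore by rank. The #1 combo by train Sharpe trailed the best test combo by 0.14-0.26 Sharpe (TQQQ 1.003 vs 1.265, SOXL 1.004 vs 1.140, UPRO 1.054 vs 1.243). Conversely, the best out-of-sample parameters scored poorly in-sample (train Sharpe 0.223-0.575), so you couldn't have picked them.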
This is the textbook signature of overfitting. By extension, the multi-parameter LRS variants on testfol are most likely selling backtest noise. Their 0.75-0.92 headline Sharpes (testfol's figure and our replication) look like upper-tail outliers from a parameter space where most combinations are mediocre.
What survives the walk-forward:
- The pure SMA200 baseline: no tolerance band, no RSI, and the conventional 200-day window rather than a tuned one. Sharpe ~0.71 on synthetic 3x SPY, with no tuning required, consistent across multiple windows.
- The gold defensive bucket, which adds ~0.03-0.04 Sharpe (+0.031 in the replication table above). The gain is modest, but robust across the regimes tested (see defensive_bucket_comparison.md).
Everything beyond those two is decoration.
Caveats
- Walk-forward methodology choice. We used a single train/test split. A more rigorous version would be an expanding-window walk-forward (rolling the cutoff date forward across multiple splits). The single split is enough to show near-zero predictive correlation, but it doesn't capture how parameter "drift" evolves over time.
- Test period is bull-tilted. 2016-2026 contains only one structural bear market (2022), so test Sharpes are inflated for every combo. A test period containing 2000-2002 or 2008 would show different magnitudes. We expect the train→test correlation conclusion to hold there too.
- Grid was deliberately small (60 combos per ticker) to avoid pseudo-precision. A wider grid would likely generate more in-sample winners that fail out-of-sample: the same conclusion, in a stronger form.
- Gold defensive contribution may be partially regime-specific. Gold had two big bull runs (2001-2011, 2019-present) that coincided with periods of equity weakness. A regime without that pattern (e.g., the 1980s gold bear market) might show less defensive benefit.
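The expanding-window alternative from the first caveat could be organized as below. This is a sketch under assumptions:
- The 2009-12-31 first cutoff, 2-year step, and 2-year test windows are illustrative choices, not settings from this study.
- `expanding_splits` and `walk_forward` are hypothetical names.
- The Sharpe function is passed in, so any convention can be plugged in.

Each output row records which combo the train window would have picked, how that pick did out-of-sample, and the train/test Sharpe correlation for that split.

```python
import pandas as pd


def expanding_splits(start: str, end: str, first_cutoff: str,
                     step_years: int = 2, test_years: int = 2):
    """Yield (train_start, train_end, test_start, test_end); train always begins at `start`."""
    start_ts, end_ts = pd.Timestamp(start), pd.Timestamp(end)
    cutoff = pd.Timestamp(first_cutoff)
    while cutoff < end_ts:
        test_end = min(cutoff + pd.DateOffset(years=test_years), end_ts)
        yield start_ts, cutoff, cutoff + pd.Timedelta(days=1), test_end
        cutoff = cutoff + pd.DateOffset(years=step_years)


def walk_forward(returns_by_combo: dict, splits, sharpe) -> pd.DataFrame:
    """Per split: pick the best combo on train, then record how it did on test."""
    rows = []
    for tr0, tr1, te0, te1 in splits:
        train = pd.Series({k: sharpe(r.loc[tr0:tr1]) for k, r in returns_by_combo.items()})
        test = pd.Series({k: sharpe(r.loc[te0:te1]) for k, r in returns_by_combo.items()})
        picked = train.idxmax()
        rows.append({"train_end": tr1, "picked": picked,
                     "picked_test_sharpe": test[picked],
                     "best_test_sharpe": test.max(),
                     "corr": train.corr(test)})
    return pd.DataFrame(rows)


for split in expanding_splits("2000-08-30", "2026-05-31", "2009-12-31"):
    print([d.date() for d in split])
```

Concatenating the `picked` combos' test periods gives a genuinely out-of-sample track record. Averaging the per-split `corr` column shows whether the near-zero correlation from the single split persists across cutoffs.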
Source
Saved logs: /tmp/lrs_replication.log, /tmp/lrs_walkforward.log.
Inline walk-forward runner (abbreviated):
```python
import itertools
import sys

import numpy as np
import pandas as pd

# Project-local data loader; this path is specific to the author's machine.
sys.path.insert(0, '/Volumes/Mac External/Claudes/trader/src')
from trader.data.yfinance_src import fetch_daily

# Build synthetic 3x LETFs from underlying yfinance data
# Compute SMA + tolerance + RSI signals with hysteresis bands
# Apply position-lagged returns with 1bp/side costs
# Train: 2000-08-30 to 2015-12-31, Test: 2016-01-01 to 2026-05-31
# Score each combo by Sharpe on each period; report top 5 by train + train/test correlation
```
(Full source is in the chat session log; the runner can also be re-extracted from /tmp/lrs_walkforward.log.)
Related studies
- 2026-05-13_letf_inception_sma200_vs_bh.md — the baseline this builds on
- Defensive bucket comparison — what the gold defensive contribution actually is
- Leverage vs volatility per underlying — why 3x SPY responds better than 3x QQQ to all filtering strategies
This is research output, not investment advice. Backtest results do not predict future returns. Specific portfolio compositions discussed here are illustrative test cases, not allocation recommendations. Do your own research and consult a licensed advisor for personalized advice.