sma200.trade
For informational and educational use only. Not investment advice. Past performance does not predict future returns.
🟢 Research note · May 17, 2026 · Current

LRS Family: Replication + Walk-Forward Parameter Sweep

Methodology · #walk-forward #parameter-tuning #overfitting #lrs #rsi #sma

Question

A Reddit commenter shared four testfol.io "LRS" (Leverage Rotation Strategy) configs, claiming Sharpe ratios of 0.65-1.25 on long-window backtests:

  1. LRS SPYx3 + RSI: SPY EMA110 ±5.5% + RSI50<60 ±1% → 100% SPYx3 / else 25% gold + 75% cash (1993-2026)
  2. LRS QQQx2: SPY EMA100 ±5% → 100% QLD / else 25% gold + 75% cash (1986-2026)
  3. LRS BTC + Gold: BTC EMA30 ±2% + 3d return <25% → 100% BTC / else 25% gold + 75% cash (2017-2026)
  4. LRS SPYx3 (base) (similar to #1 without RSI overlay)

Two questions:

  1. Can we replicate testfol's headline numbers in our own harness?
  2. Is the parameter tuning real, or in-sample overfit? (Walk-forward test.)

Methodology

Replication

Decoded the four LRS strategies from testfol's UI screenshots. Built each signal:

  • EMA + tolerance band: hysteresis logic — signal enters TRUE when price > EMA×(1+tol), exits when price < EMA×(1−tol), holds previous state in between
  • RSI + tolerance band: same hysteresis logic on RSI vs threshold
  • Allocation: simulate the "100% LETF when on / 25% gold + 75% cash when off" with synthetic LETFs
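The band rule in the first bullet can be sketched as a small state machine. This is a minimal sketch, not the harness's actual code: the function names `hysteresis` and `ema_band_signal`, the starting state (off), and the use of pandas' `ewm(span=..., adjust=False)` for the EMA are all assumptions made for illustration.

```python
import pandas as pd


def hysteresis(value: pd.Series, ref: pd.Series, tol: float) -> pd.Series:
    """Turn on above ref*(1+tol), off below ref*(1-tol), else hold the prior state."""
    state = False  # assumption: the signal starts in the "off" state
    out = []
    for v, r in zip(value.to_numpy(), ref.to_numpy()):
        if v > r * (1 + tol):
            state = True
        elif v < r * (1 - tol):
            state = False
        # inside the band (or NaN inputs): keep the previous state
        out.append(state)
    return pd.Series(out, index=value.index, dtype=bool)


def ema_band_signal(price: pd.Series, span: int, tol: float) -> pd.Series:
    """Hypothetical wrapper: compare price to its own EMA with a tolerance band."""
    ema = price.ewm(span=span, adjust=False).mean()
    return hysteresis(price, ema, tol)


# Toy example: flat reference of 100 with a 5% band (on above 105, off below 95).
prices = pd.Series([100.0, 106.0, 103.0, 96.0, 94.0, 99.0, 106.0])
ref = pd.Series(100.0, index=prices.index)
signal = hysteresis(prices, ref, tol=0.05)
print(signal.tolist())  # [False, True, True, True, False, False, True]
```

The values 103 and 96 sit inside the band, so the signal stays on until the price drops below 95. That is the whipsaw-damping effect of the tolerance band. The same helper could plausibly be reused for the RSI leg with RSI as `value` and a constant threshold as `ref`, but the direction and band convention for that leg are not spelled out here and would need to be checked against testfol.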

Synthesized SPYx3 as (1 + 3 × daily_SPY_return − 0.91%/252).cumprod(), i.e. 3× the daily SPY return less a 0.91% annual expense ratio accrued daily. Defensive bucket: 25% × gold daily return + 75% cash (cash earns 0). Before GLD's launch (2004-11) we use cash only for the defensive bucket, because substituting a gold proxy would distort the pre-GLD results.
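A minimal sketch of that construction follows. The helper names `synth_letf` and `defensive_returns` are hypothetical, and treating a missing gold return (`NaN`) as "no gold exposure that day" is an assumption about how the pre-GLD period might be encoded.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252


def synth_letf(underlying_ret: pd.Series, leverage: float = 3.0,
               annual_fee: float = 0.0091) -> pd.Series:
    """Equity curve of a daily-reset leveraged fund: (1 + L*r - fee/252).cumprod()."""
    daily_growth = 1 + leverage * underlying_ret - annual_fee / TRADING_DAYS
    return daily_growth.cumprod()


def defensive_returns(gold_ret: pd.Series, gold_weight: float = 0.25) -> pd.Series:
    """Daily return of the off-state bucket: gold_weight in gold, the rest in 0% cash."""
    # Assumption: NaN gold returns (before GLD exists) mean the bucket is cash-only.
    return gold_weight * gold_ret.fillna(0.0)


# Toy inputs, not real market data.
spy_ret = pd.Series([0.01, -0.02, 0.005])
equity = synth_letf(spy_ret)

gold_ret = pd.Series([np.nan, 0.04, -0.02])
safe_ret = defensive_returns(gold_ret)
```

Note that this sketch models only the expense ratio. Financing costs and tracking error of real LETFs are not included, matching the formula as stated.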

Walk-forward parameter sweep

Tested 60 parameter combinations per LETF (TQQQ, SOXL, UPRO):

  • SMA windows: {150, 200, 250}
  • Tolerance: {0%, 5%}
  • RSI configs: {none, RSI30<60, RSI30<70, RSI50<60, RSI50<70}
  • Signal source: {self, underlying}

Total 3 × 2 × 5 × 2 = 60 combos per ticker.
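The grid is small enough to enumerate directly. The sketch below uses `itertools.product`; the constant names, the `label` helper, and the `noRSI` tag for the no-RSI case are assumptions made for illustration (the other label parts follow the parameter strings used in the results tables).

```python
from itertools import product

SMA_WINDOWS = (150, 200, 250)
TOLERANCES_PCT = (0, 5)
RSI_CONFIGS = (None, (30, 60), (30, 70), (50, 60), (50, 70))  # (period, threshold)
SIGNAL_SOURCES = ("self", "underlying")


def label(sma, tol_pct, rsi, source):
    """Build a label such as 'SMA200/tol5/RSI50<60/sig:self'."""
    rsi_part = "noRSI" if rsi is None else f"RSI{rsi[0]}<{rsi[1]}"
    return f"SMA{sma}/tol{tol_pct}/{rsi_part}/sig:{source}"


grid = list(product(SMA_WINDOWS, TOLERANCES_PCT, RSI_CONFIGS, SIGNAL_SOURCES))
labels = [label(*combo) for combo in grid]

print(len(grid))  # 60
print(labels[0])  # SMA150/tol0/noRSI/sig:self
```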

  • Train window: 2000-08-30 → 2015-12-31 (~15 years) — optimization here
  • Test window: 2016-01-01 → 2026-05-31 (~10 years) — UNTOUCHED out-of-sample

For each combo: compute Sharpe on train, compute Sharpe on test. Check correlation.
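The per-combo scoring step might look roughly like the sketch below. Several choices here are assumptions rather than the harness's documented behavior: the signal is lagged by one day before it earns returns, the 1bp cost is charged on each unit change in position, and Sharpe is annualized with 252 days and a zero risk-free rate (consistent with cash earning 0). All function names are hypothetical.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252


def strategy_returns(signal, risk_ret, safe_ret, cost_per_side=0.0001):
    """Daily strategy returns: yesterday's signal picks today's sleeve."""
    position = signal.astype(float).shift(1).fillna(0.0)  # lag to avoid look-ahead
    gross = position * risk_ret + (1.0 - position) * safe_ret
    turnover = position.diff().abs().fillna(0.0)  # 1.0 on each full switch
    return gross - cost_per_side * turnover  # assumption: cost scales with turnover


def sharpe(ret):
    """Annualized Sharpe ratio with a zero risk-free rate."""
    sd = ret.std(ddof=1)
    if not sd > 0:
        return float("nan")
    return float(np.sqrt(TRADING_DAYS) * ret.mean() / sd)


def train_test_sharpe(ret, train_end="2015-12-31", test_start="2016-01-01"):
    """Score one return series on the train and test windows separately."""
    return sharpe(ret.loc[:train_end]), sharpe(ret.loc[test_start:])


# Toy data straddling the split date, only to show the mechanics.
idx = pd.date_range("2015-12-29", periods=6, freq="D")
signal = pd.Series([True, True, True, False, True, True], index=idx)
risk_ret = pd.Series([0.01, 0.02, 0.03, -0.01, 0.02, 0.01], index=idx)
safe_ret = pd.Series(0.0, index=idx)

net = strategy_returns(signal, risk_ret, safe_ret)
train_sh, test_sh = train_test_sharpe(net)
```

Running `train_test_sharpe` once per grid combo yields the two Sharpe columns that are then compared across combos.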

Results

LRS SPYx3+RSI replication (1993-2026, 33 years)

Decomposed to show where the Sharpe gain comes from:

| Variant | CAGR | MDD | Sharpe | What it adds |
|---|---|---|---|---|
| Pure SMA200, no gold (cash defensive) | 21.06% | -68.37% | 0.714 | The trend filter mechanism alone |
| + EMA110 + 5.5% tolerance band | 25.39% | -47.01% | 0.811 | +0.097 Sharpe from hysteresis (cuts whipsaw) |
| + RSI50<60 filter | 27.32% | -47.01% | 0.886 | +0.075 Sharpe from "don't chase overbought" |
| + Gold defensive bucket (2004-2026) | 28.01% | -46.93% | 0.917 | +0.031 Sharpe from gold |
| testfol's claimed value (full LRS w/ gold) | 24.80% | -47.20% | 0.750 | |
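For reference, the CAGR and MDD columns can be computed from an equity curve along these lines. The exact conventions used by either harness are not stated, so the 252-day year and the function names are assumptions.

```python
import pandas as pd


def cagr(equity: pd.Series, periods_per_year: int = 252) -> float:
    """Compound annual growth rate of an equity curve sampled once per period."""
    years = (len(equity) - 1) / periods_per_year
    return float((equity.iloc[-1] / equity.iloc[0]) ** (1 / years) - 1)


def max_drawdown(equity: pd.Series) -> float:
    """Largest peak-to-trough decline, returned as a negative fraction."""
    running_peak = equity.cummax()
    return float((equity / running_peak - 1).min())


# Toy curve: doubles, falls back to the start, then recovers.
curve = pd.Series([1.0, 2.0, 1.0, 3.0])
print(max_drawdown(curve))  # -0.5
```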

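Our harness produces a noticeably higher Sharpe than testfol (0.917 vs 0.750, a gap of 0.167), with a higher CAGR (28.01% vs 24.80%) and a nearly identical MDD (-46.93% vs -47.20%). Likely sources: gold proxy timing differences and the expense ratio assumption. The cost model also differs (testfol uses 0%, we use 1bp/side), but that difference should lower our numbers rather than raise them, so it cannot explain the gap. The structure agrees; the magnitudes agree only roughly.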

The tolerance band + RSI overlay does most of the work. Gold contribution is smaller than originally hypothesized.

Walk-forward parameter sweep results

Top 5 by train Sharpe, with test Sharpe shown alongside:

TQQQ

| Params | Train Sharpe | Test Sharpe | Train CAGR | Test CAGR | Train MDD | Test MDD |
|---|---|---|---|---|---|---|
| SMA200/tol5/RSI50<60/sig:underlying | 0.710 | 1.003 | 22.42% | 42.59% | -47.13% | -56.94% |
| SMA200/tol5/RSI50<60/sig:self | 0.708 | 1.114 | 21.09% | 46.67% | -55.86% | -43.89% |
| SMA250/tol5/RSI50<60/sig:self | 0.680 | 1.119 | 19.86% | 47.20% | -47.99% | -38.25% |
| SMA200/tol0/RSI50<60/sig:self | 0.669 | 1.112 | 19.43% | 46.21% | -58.63% | -39.22% |
| SMA200/tol5/RSI30<60/sig:self | 0.651 | 0.989 | 17.98% | 37.52% | -56.21% | -49.71% |

Best TEST Sharpe overall: SMA150/tol0/RSI50<60/sig:underlying → train 0.456, test 1.265 (would not have been picked in-sample).

SOXL

| Params | Train Sharpe | Test Sharpe | Train CAGR | Test CAGR | Train MDD | Test MDD |
|---|---|---|---|---|---|---|
| SMA200/tol0/RSI50<60/sig:self | 0.498 | 1.004 | 13.03% | 54.44% | -71.93% | -72.52% |
| SMA200/tol5/RSI50<60/sig:self | 0.472 | 0.995 | 11.73% | 53.88% | -65.67% | -73.90% |
| SMA150/tol5/RSI50<60/sig:self | 0.459 | 1.021 | 11.08% | 55.01% | -73.57% | -70.04% |
| SMA200/tol0/RSI30<60/sig:self | 0.457 | 1.006 | 10.92% | 52.88% | -74.46% | -66.49% |
| SMA150/tol5/RSI30<60/sig:self | 0.434 | 1.026 | 9.85% | 53.44% | -68.50% | -67.62% |

Best TEST Sharpe overall: SMA200/tol5/RSI50<60/sig:underlying → train 0.223, test 1.140.

UPRO

| Params | Train Sharpe | Test Sharpe | Train CAGR | Test CAGR | Train MDD | Test MDD |
|---|---|---|---|---|---|---|
| SMA200/tol0/RSI50<60/sig:self | 0.761 | 1.054 | 18.90% | 31.63% | -38.26% | -40.74% |
| SMA200/tol5/RSI50<60/sig:underlying | 0.759 | 0.992 | 20.71% | 33.01% | -43.73% | -51.71% |
| SMA200/tol5/RSI50<60/sig:self | 0.751 | 1.153 | 18.49% | 35.73% | -34.28% | -49.93% |
| SMA150/tol5/RSI50<60/sig:underlying | 0.743 | 0.981 | 20.15% | 31.63% | -49.77% | -51.71% |
| SMA200/tol0/RSI30<60/sig:self | 0.731 | 0.846 | 17.21% | 23.06% | -38.26% | -49.96% |

Best TEST Sharpe overall: SMA150/tol5/RSI50<60/sig:self → train 0.575, test 1.243.
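To make the gap between "what you would have picked" and "what turned out best" concrete, the short script below works only with figures copied from the three tables above: the top-by-train row and the best-by-test combo for each ticker. The dictionary and variable names are illustrative.

```python
# (train Sharpe, test Sharpe) copied from the tables above.
top_by_train = {"TQQQ": (0.710, 1.003), "SOXL": (0.498, 1.004), "UPRO": (0.761, 1.054)}
best_by_test = {"TQQQ": (0.456, 1.265), "SOXL": (0.223, 1.140), "UPRO": (0.575, 1.243)}

summary = {}
for ticker, (train_sh, test_sh) in top_by_train.items():
    change = test_sh - train_sh                   # how the in-sample pick moved out of sample
    shortfall = best_by_test[ticker][1] - test_sh  # what the in-sample pick left on the table
    summary[ticker] = (round(change, 3), round(shortfall, 3))
    print(f"{ticker}: test - train = {change:+.3f}, shortfall vs best test = {shortfall:.3f}")

# TQQQ: test - train = +0.293, shortfall vs best test = 0.262
# SOXL: test - train = +0.506, shortfall vs best test = 0.136
# UPRO: test - train = +0.293, shortfall vs best test = 0.189
```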

Train→Test correlation (the smoking gun)

| Ticker | corr(train Sharpe, test Sharpe) |
|---|---|
| TQQQ | +0.186 |
| SOXL | -0.093 |
| UPRO | +0.085 |

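A parameter set with real predictive power would show a clearly positive correlation (as a rough benchmark, 0.4-0.7). Values this close to zero (slightly negative for SOXL) mean that selecting parameters on past data carried essentially no information about future performance in this split.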
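A sketch of how the correlation could be computed, plus a rough yardstick for reading it. The function names are hypothetical. The Spearman figure is obtained as the Pearson correlation of ranks. `critical_r` uses t ≈ 2.0 as an approximation to the two-sided 5% critical value of Student's t near 58 degrees of freedom. The 60 combos share parameters and are not independent, so this threshold is generous and should be read as a rough guide only. The demo data are just the five TQQQ rows shown above, so it will not reproduce the +0.186 computed over all 60 combos.

```python
import numpy as np
import pandas as pd


def train_test_corr(train_sharpe, test_sharpe):
    """Return (Pearson, Spearman) correlation between train and test Sharpe."""
    train = pd.Series(train_sharpe, dtype=float)
    test = pd.Series(test_sharpe, dtype=float)
    pearson = float(train.corr(test))
    spearman = float(train.rank().corr(test.rank()))  # Pearson on ranks
    return pearson, spearman


def critical_r(n: int, t_crit: float = 2.0) -> float:
    """Approximate |r| needed for significance: t / sqrt(n - 2 + t^2)."""
    return float(t_crit / np.sqrt(n - 2 + t_crit ** 2))


# Top-5 TQQQ rows only (illustration, not the full grid).
tqqq_train = [0.710, 0.708, 0.680, 0.669, 0.651]
tqqq_test = [1.003, 1.114, 1.119, 1.112, 0.989]
pearson, spearman = train_test_corr(tqqq_train, tqqq_test)

print(round(critical_r(60), 3))  # 0.254
```

Under these assumptions, none of the three reported correlations (+0.186, -0.093, +0.085) clears roughly 0.25 in absolute value.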

Interpretation

The replication confirms testfol's structure is real. Our harness produces similar headline numbers (within reasonable variance for cost model and gold proxy differences). The trend filter + tolerance band + RSI overlay all add incremental Sharpe in-sample.

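The walk-forward result demolishes the parameter-tuning premise. Train-to-test Sharpe correlation is essentially zero across all three LETFs: ranking parameters on 2000-2015 told us almost nothing about their ranking in 2016-2026. The top-trained parameters did not collapse out of sample (every top-5 row has a higher test Sharpe than train Sharpe, which reflects the bull-tilted test window), but on each ticker the train-Sharpe winner trailed the best test-Sharpe combo by 0.14-0.26 Sharpe. Conversely, the best out-of-sample parameters scored mediocre in-sample (train Sharpe 0.22-0.58), so you couldn't have picked them.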

This is the textbook signature of overfitting: in-sample rank does not persist out of sample. On this evidence, LRS variants on testfol with multi-parameter optimization are selling backtest noise. The visible 0.75-0.92 Sharpe numbers are upper-tail outliers from a parameter space where most combinations look mediocre.

What survives the walk-forward:

  1. The pure SMA200 baseline (no tolerance, no RSI, no parameters to tune). Sharpe ~0.71 on synthetic 3x SPY, no tuning required, consistent across multiple windows.
  2. The gold defensive bucket adds ~0.03 Sharpe in the decomposition above (+0.031): modest but robust across regimes (see defensive_bucket_comparison.md).

Everything beyond those two is decoration.

Caveats

  1. Walk-forward methodology choice. We used a single train/test split. A more rigorous version would be expanding-window walk-forward (rolling the cutoff date across multiple splits). The single split is enough to show near-zero predictive correlation for this particular cutoff, but it doesn't capture how parameter "drift" behaves over time.

  2. Test period is bull-tilted. 2016-2026 has only one structural bear (2022). Test Sharpes are inflated for everyone. A test period containing 2000-2002 or 2008 would show different magnitudes, but we would expect the train→test correlation conclusion to hold.

  3. Grid was deliberately small (60 combos per ticker) to avoid pseudo-precision. A wider grid would generate more in-sample winners with even worse out-of-sample correlation — same conclusion, more aggressive.

  4. Gold defensive contribution may be partially regime-specific. Gold had two big bull runs (2001-2011, 2019-present) coincident with equity weakness. A regime without that pattern (e.g., 1980s gold bear) might show less defensive benefit.
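The expanding-window variant mentioned in caveat 1 could generate its splits along these lines. This is a sketch of one possible design, not something that was run for this note: the function name, the 5-year step, and the 2010-12-31 first cutoff are all illustrative assumptions.

```python
import pandas as pd


def expanding_splits(start, end, first_cutoff, step_years=5, test_years=5):
    """Return (train_start, train_end, test_start, test_end) tuples.

    The training window always begins at `start` and grows with each split;
    each test window begins the day after its training cutoff.
    """
    start, end, cutoff = (pd.Timestamp(x) for x in (start, end, first_cutoff))
    splits = []
    while cutoff < end:
        test_end = min(cutoff + pd.DateOffset(years=test_years), end)
        splits.append((start, cutoff, cutoff + pd.Timedelta(days=1), test_end))
        cutoff = cutoff + pd.DateOffset(years=step_years)
    return splits


splits = expanding_splits("2000-08-30", "2026-05-31", "2010-12-31")
for train_start, train_end, test_start, test_end in splits:
    print(f"train {train_start.date()} to {train_end.date()} | "
          f"test {test_start.date()} to {test_end.date()}")
```

With these settings the second split reproduces the cutoff used in this note (train through 2015-12-31, test from 2016-01-01). The train→test Sharpe correlation would then be computed per split and compared across splits.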

Source

Saved logs: /tmp/lrs_replication.log, /tmp/lrs_walkforward.log.

Inline walk-forward runner (abbreviated):

```python
import itertools
import sys

import numpy as np
import pandas as pd

# Make the local `trader` package importable before importing from it.
sys.path.insert(0, '/Volumes/Mac External/Claudes/trader/src')
from trader.data.yfinance_src import fetch_daily

# Steps performed by the full runner (bodies elided in this abbreviated listing):
# 1. Build synthetic 3x LETFs from underlying yfinance data
# 2. Compute SMA + tolerance + RSI signals with hysteresis bands
# 3. Apply position-lagged returns with 1bp/side costs
# 4. Split: train 2000-08 to 2015-12, test 2016-01 to 2026-05
# 5. Score each combo by Sharpe on each period, report top by train + correlation
```

(Full source in chat session log; can be re-extracted from /tmp/lrs_walkforward.log runner.)

This is research output, not investment advice. Backtest results do not predict future returns. Specific portfolio compositions discussed here are illustrative test cases, not allocation recommendations. Do your own research and consult a licensed advisor for personalized advice.