LRS Family: Replication + Walk-Forward Parameter Sweep

Question

A Reddit commenter shared four testfol.io "LRS" (Leverage Rotation Strategy) configs, claiming Sharpe ratios of 0.65-1.25 on long-window backtests:

LRS SPYx3 + RSI: SPY EMA110 ±5.5% + RSI50<60 ±1% → 100% SPYx3 / else 25% gold + 75% cash (1993-2026)
LRS QQQx2: SPY EMA100 ±5% → 100% QLD / else 25% gold + 75% cash (1986-2026)
LRS BTC + Gold: BTC EMA30 ±2% + 3d return <25% → 100% BTC / else 25% gold + 75% cash (2017-2026)
LRS SPYx3 (base): similar to the SPYx3 + RSI config above, without the RSI overlay

Two questions:

1. Can we replicate testfol's headline numbers in our own harness?
2. Is the parameter tuning real, or in-sample overfit? (Walk-forward test.)

Methodology

Replication

Decoded the four LRS strategies from testfol's UI screenshots. Built each signal:

EMA + tolerance band: hysteresis logic — signal enters TRUE when price > EMA×(1+tol), exits when price < EMA×(1−tol), holds previous state in between
RSI + tolerance band: same hysteresis logic on RSI vs threshold
Allocation: simulate the "100% LETF when on / 25% gold + 75% cash when off" with synthetic LETFs
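
The hysteresis rule described above can be sketched as follows. This is a minimal illustration, not the harness code: the function name `hysteresis_signal`, the choice to start in the "off" state, and the toy prices are all assumptions.

```python
import pandas as pd


def hysteresis_signal(value: pd.Series, level: pd.Series, tol: float) -> pd.Series:
    """Band signal: on above level*(1+tol), off below level*(1-tol), else hold."""
    upper = level * (1.0 + tol)
    lower = level * (1.0 - tol)
    state = False  # assumption: start in the defensive (off) state
    out = []
    for v, hi, lo in zip(value.to_numpy(), upper.to_numpy(), lower.to_numpy()):
        if v > hi:
            state = True
        elif v < lo:
            state = False
        # NaN comparisons are False, so warm-up rows keep the previous state
        out.append(state)
    return pd.Series(out, index=value.index, dtype=bool)


# Toy prices crossing a flat reference level of 100 with a 5% band
price = pd.Series([100.0, 104.0, 106.0, 103.0, 96.0, 94.0, 101.0])
level = pd.Series(100.0, index=price.index)
signal = hysteresis_signal(price, level, tol=0.05)
print(signal.tolist())
# [False, False, True, True, True, False, False]
```

For the EMA band, `level` would be something like `price.ewm(span=110, adjust=False).mean()`. The same function can take an RSI series as `value` and a constant threshold as `level`, though the direction of the RSI condition would need to match the strategy definition.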

Synthesized SPYx3 as (1 + 3 × daily_SPY_return − 0.91%/252).cumprod(), where 0.91% is the assumed annual expense ratio. Defensive bucket: 25% × gold daily return + 75% cash (cash earns 0). Before GLD's inception (2004-11) the defensive bucket is cash-only, because substituting a gold proxy would distort the pre-GLD results.
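
A small sketch of the synthetic LETF and the defensive bucket, following the formula above. The function names and the sample returns are hypothetical; treating missing gold data as an all-cash day is one way to express the pre-GLD rule, not necessarily how the harness does it.

```python
import pandas as pd

TRADING_DAYS = 252
EXPENSE_RATIO = 0.0091  # 0.91% per year, as stated above


def synth_letf(underlying_ret: pd.Series, leverage: float = 3.0) -> pd.Series:
    """Equity curve of a daily-reset leveraged fund, net of the expense drag."""
    daily = leverage * underlying_ret - EXPENSE_RATIO / TRADING_DAYS
    return (1.0 + daily).cumprod()


def defensive_ret(gold_ret: pd.Series) -> pd.Series:
    """25% gold + 75% cash at 0%; days without gold data fall back to all cash."""
    return 0.25 * gold_ret.fillna(0.0)


spy_ret = pd.Series([0.01, -0.02, 0.005])             # made-up daily returns
gold_ret = pd.Series([float("nan"), 0.004, -0.008])   # NaN stands in for pre-GLD
print(synth_letf(spy_ret).round(6).tolist())
print(defensive_ret(gold_ret).tolist())
```

Because the fund resets daily, the leverage multiplies each day's return before compounding, which is why the curve is built with `cumprod` rather than by scaling the underlying's cumulative return.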

Walk-forward parameter sweep

Tested 60 parameter combinations per LETF (TQQQ, SOXL, UPRO), the full grid over:

SMA windows: {150, 200, 250}
Tolerance: {0%, 5%}
RSI configs: {none, RSI30<60, RSI30<70, RSI50<60, RSI50<70}
Signal source: {self, underlying}

Total 3 × 2 × 5 × 2 = 60 combos per ticker.
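
The grid can be enumerated with `itertools.product`. The sketch below only builds and labels the combinations; the tuple encoding of the RSI configs and the `label` helper are assumptions chosen to mirror the parameter strings used in the results tables.

```python
import itertools

SMA_WINDOWS = [150, 200, 250]
TOLERANCES = [0.00, 0.05]
RSI_CONFIGS = [None, (30, 60), (30, 70), (50, 60), (50, 70)]  # (period, threshold)
SIGNAL_SOURCES = ["self", "underlying"]

grid = list(itertools.product(SMA_WINDOWS, TOLERANCES, RSI_CONFIGS, SIGNAL_SOURCES))


def label(sma, tol, rsi, src):
    """Human-readable name, e.g. SMA200/tol5/RSI50<60/sig:underlying."""
    rsi_part = "RSInone" if rsi is None else f"RSI{rsi[0]}<{rsi[1]}"
    return f"SMA{sma}/tol{round(tol * 100)}/{rsi_part}/sig:{src}"


print(len(grid))        # 60
print(label(*grid[0]))  # SMA150/tol0/RSInone/sig:self
```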

Train window: 2000-08-30 → 2015-12-31 (~15 years) — optimization here
Test window: 2016-01-01 → 2026-05-31 (~10 years) — UNTOUCHED out-of-sample

For each combo: compute Sharpe on train, compute Sharpe on test. Check correlation.
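
Scoring one combo on the two windows might look like the sketch below. The Sharpe definition (annualized, 0% risk-free rate, sample standard deviation) is an assumption, since the note does not spell it out, and the return series is random data used only to exercise the functions.

```python
import numpy as np
import pandas as pd

TRAIN = ("2000-08-30", "2015-12-31")
TEST = ("2016-01-01", "2026-05-31")


def sharpe(ret: pd.Series, periods: int = 252) -> float:
    """Annualized Sharpe with a 0% risk-free rate (an assumption)."""
    sd = ret.std(ddof=1)
    return float(np.sqrt(periods) * ret.mean() / sd) if sd > 0 else float("nan")


def split_sharpe(ret: pd.Series) -> tuple[float, float]:
    """Sharpe on the train window and on the test window, scored separately."""
    return sharpe(ret.loc[TRAIN[0]:TRAIN[1]]), sharpe(ret.loc[TEST[0]:TEST[1]])


# Synthetic daily returns on business days, not market data
idx = pd.bdate_range("2000-08-30", "2026-05-29")
rng = np.random.default_rng(0)
ret = pd.Series(rng.normal(0.0004, 0.01, len(idx)), index=idx)
train_sh, test_sh = split_sharpe(ret)
print(round(train_sh, 3), round(test_sh, 3))
```

The two date ranges do not overlap, so no test-window observation can leak into the train score.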

Results

LRS SPYx3+RSI replication (1993-2026, 33 years)

Decomposed to show where the Sharpe gain comes from:

Variant	CAGR	MDD	Sharpe	What it adds
Pure SMA200, no gold (cash defensive)	21.06%	-68.37%	0.714	The trend filter mechanism alone
+ EMA110 + 5.5% tolerance band	25.39%	-47.01%	0.811	+0.097 Sharpe from hysteresis (cuts whipsaw)
+ RSI50<60 filter	27.32%	-47.01%	0.886	+0.075 Sharpe from "don't chase overbought"
+ Gold defensive bucket (2004-2026)	28.01%	-46.93%	0.917	+0.031 Sharpe from gold
testfol's claimed value (full LRS w/ gold)	24.80%	-47.20%	0.750	—

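Our harness produces a higher Sharpe than testfol (0.917 vs 0.750) and a higher CAGR (28.01% vs 24.80%), with a near-identical MDD (-46.93% vs -47.20%). Likely sources: gold proxy timing differences and the expense ratio assumption. The cost model also differs (testfol uses 0%, we use 1bp/side), but that should lower our numbers rather than raise them, so it does not explain the gap. The structure and the drawdown agree; the return levels match only approximately.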

The tolerance band + RSI overlay does most of the work. Gold contribution is smaller than originally hypothesized.

Walk-forward parameter sweep results

Top 5 by train Sharpe, with test Sharpe shown alongside:

TQQQ

Params	Train Sh	Test Sh	Tr CAGR	Te CAGR	Tr MDD	Te MDD
SMA200/tol5/RSI50<60/sig:underlying	0.710	1.003	22.42%	42.59%	-47.13%	-56.94%
SMA200/tol5/RSI50<60/sig:self	0.708	1.114	21.09%	46.67%	-55.86%	-43.89%
SMA250/tol5/RSI50<60/sig:self	0.680	1.119	19.86%	47.20%	-47.99%	-38.25%
SMA200/tol0/RSI50<60/sig:self	0.669	1.112	19.43%	46.21%	-58.63%	-39.22%
SMA200/tol5/RSI30<60/sig:self	0.651	0.989	17.98%	37.52%	-56.21%	-49.71%

Best TEST Sharpe overall: SMA150/tol0/RSI50<60/sig:underlying → train 0.456, test 1.265 (would not have been picked in-sample).

SOXL

Params	Train Sh	Test Sh	Tr CAGR	Te CAGR	Tr MDD	Te MDD
SMA200/tol0/RSI50<60/sig:self	0.498	1.004	13.03%	54.44%	-71.93%	-72.52%
SMA200/tol5/RSI50<60/sig:self	0.472	0.995	11.73%	53.88%	-65.67%	-73.90%
SMA150/tol5/RSI50<60/sig:self	0.459	1.021	11.08%	55.01%	-73.57%	-70.04%
SMA200/tol0/RSI30<60/sig:self	0.457	1.006	10.92%	52.88%	-74.46%	-66.49%
SMA150/tol5/RSI30<60/sig:self	0.434	1.026	9.85%	53.44%	-68.50%	-67.62%

Best TEST Sharpe overall: SMA200/tol5/RSI50<60/sig:underlying → train 0.223, test 1.140.

UPRO

Params	Train Sh	Test Sh	Tr CAGR	Te CAGR	Tr MDD	Te MDD
SMA200/tol0/RSI50<60/sig:self	0.761	1.054	18.90%	31.63%	-38.26%	-40.74%
SMA200/tol5/RSI50<60/sig:underlying	0.759	0.992	20.71%	33.01%	-43.73%	-51.71%
SMA200/tol5/RSI50<60/sig:self	0.751	1.153	18.49%	35.73%	-34.28%	-49.93%
SMA150/tol5/RSI50<60/sig:underlying	0.743	0.981	20.15%	31.63%	-49.77%	-51.71%
SMA200/tol0/RSI30<60/sig:self	0.731	0.846	17.21%	23.06%	-38.26%	-49.96%

Best TEST Sharpe overall: SMA150/tol5/RSI50<60/sig:self → train 0.575, test 1.243.

Train→Test correlation (the smoking gun)

Ticker	corr(train Sharpe, test Sharpe)
TQQQ	+0.186
SOXL	-0.093
UPRO	+0.085

If in-sample ranking carried real predictive power, we would expect a correlation of roughly 0.4-0.7. Near-zero values (slightly negative for SOXL) mean that parameter selection on past data has essentially no signal about future performance.
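
The correlation itself is a one-liner once train and test Sharpes sit side by side. The sketch below uses only the five TQQQ rows listed earlier, so it will not reproduce the +0.186 figure, which is computed over all 60 combos. The rank correlation is an added robustness check, not something the original run reports.

```python
import pandas as pd

# The five TQQQ rows shown in the table above (not the full 60-combo grid)
scores = pd.DataFrame(
    {
        "train": [0.710, 0.708, 0.680, 0.669, 0.651],
        "test": [1.003, 1.114, 1.119, 1.112, 0.989],
    }
)

pearson = scores["train"].corr(scores["test"])
# Rank correlation is less sensitive to a single outlying combo
spearman = scores["train"].rank().corr(scores["test"].rank())
print(f"pearson={pearson:+.3f} spearman={spearman:+.3f}")
```

With only five points, either statistic is very noisy; the full-grid figures in the table are the ones that matter.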

Interpretation

The replication confirms testfol's structure is real. Our harness produces similar headline numbers (within reasonable variance for cost model and gold proxy differences). The trend filter + tolerance band + RSI overlay all add incremental Sharpe in-sample.

The walk-forward result demolishes the parameter-tuning premise. Train-to-test Sharpe correlation is essentially zero across all three LETFs: the "best" parameters from 2000-2015 had no predictive power for 2016-2026. In the tables above, the top-trained combo for each ticker finished 0.14-0.26 Sharpe below the best out-of-sample combo. (Test Sharpes are higher than train Sharpes across the board, but that reflects the bull-tilted test window, not the tuning.) Conversely, the best out-of-sample parameters scored mediocre in-sample, so you couldn't have picked them.

This is the textbook signature of overfitting. Every LRS variant on testfol with multi-parameter optimization is selling backtest noise. The visible 0.75-0.92 Sharpe numbers are upper-tail outliers from a parameter space where most combinations look mediocre.

What survives the walk-forward:

The pure SMA200 baseline (no tolerance, no RSI, no parameters to tune). Sharpe ~0.71 on synthetic 3x SPY, no tuning required, consistent across multiple windows.
The gold defensive bucket adds ~0.03 Sharpe (+0.031 in the decomposition above): modest but robust across regimes (see defensive_bucket_comparison.md).

Everything beyond those two is decoration.

Caveats

Walk-forward methodology choice. We used a single train/test split. A more rigorous version would be expanding-window walk-forward (rolling the cutoff date across multiple splits). The single split is sufficient to demonstrate zero predictive correlation but doesn't capture how parameter "drift" works across time.
Test period is bull-tilted. 2016-2026 has only one structural bear (2022), so test Sharpes are inflated for every combo. A test period containing 2000-2002 or 2008 would show different magnitudes, but we expect the train→test correlation conclusion to hold.
Grid was deliberately small (60 combos per ticker) to avoid pseudo-precision. A wider grid would likely generate more in-sample winners with even worse out-of-sample correlation: the same conclusion, stated more aggressively.
Gold defensive contribution may be partially regime-specific. Gold had two big bull runs (2001-2011, 2019-present) coincident with equity weakness. A regime without that pattern (e.g., 1980s gold bear) might show less defensive benefit.
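
The expanding-window variant mentioned in the first caveat could generate its splits as below. This is a sketch of the idea only: the first cutoff date, the two-year step, and the two-year test length are hypothetical choices, not settings from the run reported here.

```python
import pandas as pd


def expanding_splits(start, end, first_cutoff, step_years=2, test_years=2):
    """Yield (train_start, train_end, test_start, test_end) with a growing train window."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    cutoff = pd.Timestamp(first_cutoff)
    while cutoff < end:
        test_start = cutoff + pd.Timedelta(days=1)
        test_end = min(cutoff + pd.DateOffset(years=test_years), end)
        yield start, cutoff, test_start, test_end
        cutoff = cutoff + pd.DateOffset(years=step_years)


splits = list(expanding_splits("2000-08-30", "2026-05-31", "2010-12-31"))
for tr_start, tr_end, te_start, te_end in splits:
    print(tr_start.date(), tr_end.date(), "|", te_start.date(), te_end.date())
```

Each split would be scored the same way as the single split, giving one train→test correlation per cutoff instead of a single number.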

Source

Saved logs: /tmp/lrs_replication.log, /tmp/lrs_walkforward.log.

Inline walk-forward runner (abbreviated):

import itertools
import sys

import numpy as np
import pandas as pd

sys.path.insert(0, '/Volumes/Mac External/Claudes/trader/src')
from trader.data.yfinance_src import fetch_daily

# Build synthetic 3x LETFs from underlying yfinance data
# Compute SMA + tolerance + RSI signals with hysteresis bands
# Apply position-lagged returns with 1bp/side costs
# Train: 2000-08 to 2015-12, Test: 2016-01 to 2026-05
# Score each combo by Sharpe on each period, report top by train + correlation

(Full source in chat session log; can be re-extracted from /tmp/lrs_walkforward.log runner.)
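
For readers without the session log, the "position-lagged returns with 1bp/side costs" step can be approximated as follows. This is a stand-in sketch, not the original runner: the function name is hypothetical, and charging two sides per rotation (sell one sleeve, buy the other) is an assumption.

```python
import pandas as pd

COST_PER_SIDE = 0.0001  # 1bp per side, as in the summary above


def strategy_returns(signal: pd.Series, risk_ret: pd.Series, def_ret: pd.Series) -> pd.Series:
    """Trade on the prior day's signal and charge a cost whenever the position flips."""
    pos = signal.astype(float).shift(1).fillna(0.0)  # lag avoids look-ahead
    gross = pos * risk_ret + (1.0 - pos) * def_ret
    # A full rotation sells one sleeve and buys the other: two sides (an assumption)
    turnover = pos.diff().abs().fillna(0.0)
    return gross - 2.0 * COST_PER_SIDE * turnover


signal = pd.Series([True, True, False, False])
risk_ret = pd.Series([0.03, -0.06, 0.015, 0.02])     # made-up LETF returns
def_ret = pd.Series([0.0, 0.001, -0.002, 0.0005])    # made-up defensive returns
net = strategy_returns(signal, risk_ret, def_ret)
print(net.round(6).tolist())
# [0.0, -0.0602, 0.015, 0.0003]
```

The one-day shift means a signal computed at today's close only affects tomorrow's return, which is the usual guard against look-ahead bias in daily backtests.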

Related

2026-05-13_letf_inception_sma200_vs_bh.md: the baseline this builds on
Defensive bucket comparison: what the gold defensive contribution actually is
Leverage vs volatility per underlying: why 3x SPY responds better than 3x QQQ to all filtering strategies