Every weekly pick is timestamped and locked before kickoff. The 2025 NFL season is in the books - the headline number below is the real, observed hit rate on the picks we published, not a cross-validation score. Numbers refresh live from the published-picks leaderboard.
Our weekly top-10 picks beat their position-floor in 17 of 17 weeks of the 2025 NFL season. 144 picks scored, 83 hits.
Sleeper Finder picks players that beat their position's slate median by 75%+. 145 picks scored, 80 hits. DNP rate 14.7% (25 of 170), down from 58% in iteration 3.6.
Every Tuesday morning during the 2025 season, we locked our top-10 picks before any games were played. Of the picks for players who actually suited up, 57.6% scored above their position-floor - the bar we set per position before kickoff. That's competitive with consensus expert picks from FantasyPros, 4for4, and Establish-the-Run, and every individual week was net positive.
Top-10 looks for safe, high-floor producers. Sleeper Finder does the opposite: it surfaces players we expect to beat their position's slate median by 75%+. Anti-consensus on purpose. In 2025, 55.2% of those calls hit (80 of 145 scored) - near-parity with the Top-10 model, but on a totally different player pool. Use it when you need ceiling, not floor.
Precision@10 = "of our 10 weekly Sleeper Finder picks per position, what share beat the slate median by 75%+." Numbers below are from the iteration 4 holdout run on the 2025 NFL season.
| Pos | precision@10 | read |
|---|---|---|
| QB | 18.9% | weakest position; iter5 target |
| RB | 66.7% | second-strongest position |
| WR | 70.0% | best position; volume-trend wins |
| TE | 65.6% | target-share dominant |
QB is weakest by a wide margin (18.9%) - the model's volume / target-share signal applies poorly to a position that doesn't share targets. Iteration 5 will introduce QB-specific features (rushing-share, deep-attempt rate, supporting-cast injury overlay) to close the gap.
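As a rough illustration of the precision@10 definition above, here is a minimal sketch. It assumes the bar is 1.75x the slate median for the position and that each pick is represented by the points the player actually scored; the function name and inputs are hypothetical, not the trainer's real interface.

```python
from statistics import median


def precision_at_k(pick_scores, slate_scores, k=10, margin=0.75):
    """Share of the first k picks that beat the slate median by `margin`.

    pick_scores  -- actual points scored by our picks, in published order
    slate_scores -- actual points for every player at the position that week
    Both inputs are hypothetical stand-ins for whatever the real pipeline stores.
    """
    bar = median(slate_scores) * (1 + margin)  # e.g. median 11 -> bar 19.25
    top = pick_scores[:k]
    if not top:
        return 0.0
    hits = sum(1 for score in top if score >= bar)
    return hits / len(top)


# Toy slate: median is 11, so the bar is 19.25 points.
slate = [4, 6, 8, 10, 12, 14, 20, 22]
picks = [20, 22, 8, 25, 19, 30, 5, 21, 11, 3]
print(precision_at_k(picks, slate))  # 5 of 10 picks clear the bar
```

The sketch treats a pick that exactly matches the bar as a hit; whether the real scorer uses `>=` or `>` is not stated on this page.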
These are the per-position recall@10, precision@10, and NDCG@10 numbers the trainer emits when it cross-validates against the 2025 holdout. They measure something different from the 57.6% above - they're the model's strict "did we catch the actual top 10 by position" score on the holdout, not the published-pick hit rate. Useful as a model-health audit, not a marketing claim.
| Pos | recall@10 | precision@10 | NDCG@10 | slates |
|---|---|---|---|---|
Numbers above are the per-position cross-validation metrics emitted by the trainer to
api/data/topn_ranker.json. They refresh whenever the model is retrained.
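For readers unfamiliar with NDCG@10, a minimal sketch of the metric follows. It uses linear gain with a log2 position discount, which is one common convention; the exact gain function the trainer uses is not stated here, so treat the numbers this produces as illustrative only.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k items (linear gain)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances_in_predicted_order, k=10):
    """DCG of our ordering divided by the DCG of the best possible ordering.

    `relevances_in_predicted_order` holds the true relevance of each player
    (for example, actual fantasy points), listed in the order we ranked them.
    """
    ideal = sorted(relevances_in_predicted_order, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(relevances_in_predicted_order, k) / ideal_dcg


print(ndcg_at_k([3, 2, 1]))  # perfect ordering
print(ndcg_at_k([1, 2, 3]))  # reversed ordering scores lower
```

Unlike recall@10, NDCG rewards putting the best players near the top of the list, not just somewhere inside it.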
Published-pick hit rate (the 57.6%).
Every Tuesday morning of the 2025 NFL season, we ran the Top-10 Ranker, locked the
weekly top-10 by position, and timestamped the picks to
published_picks before any game kicked off. After the slate
finished, each pick was scored against its position-floor (the bar we set per
position before the season started). A "hit" means the player suited up and beat
that floor. 144 of our 170 published picks had a player who actually played and
could be scored; 83 of those 144 hit. That's the 57.6%.
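The scoring rule described above can be sketched as follows. The `Pick` record and its field names are hypothetical; only the rule (a hit means the player suited up and beat the floor, and DNPs are excluded from the denominator) and the 83 / 144 and 80 / 145 counts come from this page.

```python
from dataclasses import dataclass


@dataclass
class Pick:
    played: bool    # did the player suit up?
    points: float   # actual fantasy points
    floor: float    # position-floor set ahead of time


def score_picks(picks):
    """Return (scored, hits, hit_rate), ignoring picks whose player did not play."""
    scored = [p for p in picks if p.played]
    hits = sum(1 for p in scored if p.points > p.floor)
    rate = hits / len(scored) if scored else 0.0
    return len(scored), hits, rate


# Toy example: one hit, one miss, one DNP that is dropped from the denominator.
print(score_picks([Pick(True, 12, 10), Pick(True, 8, 10), Pick(False, 0, 10)]))

# Headline arithmetic from the counts quoted on this page.
top10_rate = round(83 / 144 * 100, 1)    # Top-10: 83 hits of 144 scored
sleeper_rate = round(80 / 145 * 100, 1)  # Sleeper Finder: 80 hits of 145 scored
print(top10_rate, sleeper_rate)
```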
Top-10 Ranker training. The "who's actually starting this week" model. Trained on 2023+2024 with XGBoost rank:ndcg, a +1 monotone constraint on projection (higher projection can never drop a player's score), and a -1 monotone constraint on injury severity. Cross-validated on recall@10: of the players who actually finished in the weekly top 10 (within position), what share did our top 10 catch? The per-position breakdown above is the strict cross-val view; the 57.6% above is the real-world published-pick view.
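A sketch of how the two monotone constraints might be expressed as XGBoost parameters. The feature names and the four-column subset are placeholders, and the snippet only builds the parameter dictionary; it does not import XGBoost or train anything. `objective` and `monotone_constraints` are real XGBoost parameter names, but every other detail here is an assumption.

```python
# Hypothetical column order; the real model uses sixteen features.
FEATURES = ["projection", "injury_severity", "volume_trend", "matchup"]

# +1: score may never fall as the feature rises; -1: score may never rise.
MONOTONE = {"projection": 1, "injury_severity": -1}


def ranker_params(features, monotone):
    """Build a params dict with one constraint per feature, in column order.

    Features without an entry in `monotone` get 0 (unconstrained).
    """
    constraints = [monotone.get(name, 0) for name in features]
    return {
        "objective": "rank:ndcg",
        # XGBoost accepts the constraints as a parenthesised string.
        "monotone_constraints": "(" + ",".join(str(c) for c in constraints) + ")",
    }


print(ranker_params(FEATURES, MONOTONE))
```

Keeping the constraints in a name-keyed mapping and deriving the positional string from the column list avoids silent misalignment if the feature order changes.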
Why two scoreboards. Cross-validation recall@10 asks "did we catch the actual top 10," which is the right question for tuning the model. The published-pick hit rate asks "of the calls we actually shipped to users, how many beat the position-floor," which is the right question for trusting the product. Both are honest; they measure different things. The headline above is the second one because it's the one a user actually experiences.
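To make the first scoreboard concrete, here is a minimal recall@10 sketch. The function name and the player-ID lists are hypothetical; the definition (what share of the actual top 10 did our top 10 catch) is the one given above.

```python
def recall_at_k(predicted_ranking, actual_ranking, k=10):
    """Share of the actual top-k finishers that appear in our predicted top k.

    Both arguments are lists of player IDs for one position and one week,
    ordered best-first.
    """
    actual_top = set(actual_ranking[:k])
    if not actual_top:
        return 0.0
    caught = actual_top & set(predicted_ranking[:k])
    return len(caught) / len(actual_top)


# Toy example with k=3: we caught "b" and "c" but missed "d".
print(recall_at_k(["a", "b", "c"], ["b", "c", "d"], k=3))
```

Note that this metric never looks at the position-floor: a pick can beat its floor (a published-pick hit) without finishing in the actual top 10 (a recall miss), which is why the two scoreboards can disagree.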
Train / validation split. Trained on 2023 + 2024. 2025 is held out entirely - no peeking. Inside training we use TimeSeriesSplit so each validation fold is strictly later than its training fold (no leakage from future weeks into past predictions). Per-position rankers throughout - pooling across positions would have the model compare a QB30 to a WR30, which is not the query a user actually issues.
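The "strictly later" property can be illustrated with a hand-rolled forward-chaining split. This is a stand-in written in plain Python, not scikit-learn's `TimeSeriesSplit` itself, and the 17-weeks-per-season layout and the number of splits are assumptions made for the example.

```python
def forward_chaining_folds(ordered_weeks, n_splits=3):
    """Expanding-window folds: each validation block follows its training block.

    `ordered_weeks` must already be sorted chronologically.
    """
    n = len(ordered_weeks)
    test_size = n // (n_splits + 1)
    if test_size == 0:
        raise ValueError("not enough weeks for the requested number of splits")
    folds = []
    for start in range(n - n_splits * test_size, n, test_size):
        train = ordered_weeks[:start]                    # everything earlier
        val = ordered_weeks[start:start + test_size]     # the next block only
        folds.append((train, val))
    return folds


# Assumed layout: (season, week) keys for the two training seasons.
weeks = [(season, week) for season in (2023, 2024) for week in range(1, 18)]
folds = forward_chaining_folds(weeks, n_splits=3)

for train, val in folds:
    # No validation week precedes any training week, so nothing leaks backward.
    assert max(train) < min(val)
    print(len(train), "train weeks ->", val[0], "to", val[-1])
```

In a per-position setup like the one described, a split of this kind would be run separately for each position's rows.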
Features.
Sixteen columns per row, all observable before kickoff: ten composite
factors (volume trend, underrated vs ADP, matchup, game environment,
health, momentum, ownership velocity, garbage-time risk, weather penalty,
script boost), the raw projection itself, opponent DvP rank normalized to
0-1, three-week target-share trend, week-over-week snap delta, an ordinal
0-6 injury-severity scale, and a binary injury-active flag. Iter 3 added
the last five (opponent DvP rank through the injury-active flag); iter 3.5
dropped the projection base_margin anchor so the trees can learn arbitrary
deltas around projection instead of being bounded to consensus.
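One way to keep the sixteen-column contract honest is a small schema check. The column names below are invented for illustration (the real names are not given here); only the count, the grouping, and the stated ranges come from the description above.

```python
# Hypothetical column names; only the count and grouping follow the text.
COMPOSITE_FACTORS = [
    "volume_trend", "underrated_vs_adp", "matchup", "game_environment",
    "health", "momentum", "ownership_velocity", "garbage_time_risk",
    "weather_penalty", "script_boost",
]
ITER3_ADDITIONS = [
    "opp_dvp_rank_norm", "target_share_trend_3wk", "snap_delta_wow",
    "injury_severity", "injury_active",
]
FEATURE_COLUMNS = COMPOSITE_FACTORS + ["projection"] + ITER3_ADDITIONS


def validate_row(row):
    """Check one feature row (a dict) and return its values in column order."""
    missing = [c for c in FEATURE_COLUMNS if c not in row]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if not 0.0 <= row["opp_dvp_rank_norm"] <= 1.0:
        raise ValueError("opponent DvP rank must be normalized to 0-1")
    if row["injury_severity"] not in range(0, 7):
        raise ValueError("injury severity must be an integer from 0 to 6")
    if row["injury_active"] not in (0, 1):
        raise ValueError("injury-active flag must be 0 or 1")
    return [row[c] for c in FEATURE_COLUMNS]


print(len(FEATURE_COLUMNS))  # 10 composite + 1 projection + 5 added in iter 3
```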
What's next.
We're backfilling historical pre-game weekly projections into the
projections table. Once that lands we'll add a
stricter 1.5x-floor scoreboard alongside the 57.6%, for direct
apples-to-apples comparison to FantasyPros and friends. The 2026 season
will run on the same publish-then-score loop, so the leaderboard updates
live every Tuesday.
We do not claim to beat the field on season-long ROI - there is no field-wide ROI number that is honest to publish for a tool whose outputs reach a few thousand users in different leagues, stakes, and rule sets. We are not selling locks. The 57.6% / 55.2% are the share of published Top-10 / Sleeper Finder picks that hit their bar; what you do with either read in your specific lineup is on you.
We do not claim Sleeper Finder is equally strong at every position. QB precision@10 sits at 18.9%, while RB / WR / TE run 65 to 70%. The model leans hard on volume / target-share signals, which travel well for skill positions and poorly for QBs. Iteration 5 will introduce QB-specific features (rushing-share, deep-attempt rate, supporting-cast injury overlay) to close the gap. Until then, treat any QB Sleeper Finder pick as a lower-confidence call.
We do not claim our pre-game projections are better than ESPN's or Sleeper's. We consume both as inputs. Our edge - if there is one - is in the contextual reranking around those projections (Top-10) and the anti-consensus stack (Sleeper Finder), which is exactly what the published-pick hit rates above are measuring.
We do not claim the K-position numbers are predictive. Kickers are noise even in fantasy circles. We train a model for K because the slate has the position and we'd rather emit a score than a NULL, but K's recall@10 is roughly chance and we do not publish K picks in either model.
The numbers on this page move when the published-picks leaderboard moves. If the next retrain or a bad week of picks drags either hit rate down, the page shows it - there is no manual gate between the leaderboard and the scoreboard.
The same model, run on live data each Tuesday morning, surfaces this Sunday's underowned-but-overlooked plays.