DT K

Tweets

Pinned Tweet

DT K

@OnePitchOneSoul

May 28

Returning to baseball research Twitter in 2026 be like:

DT K

@OnePitchOneSoul

Jun 7

Ranking teams by how easy it is to guess their pitchers from a single pitch (i.e. how unique they are). It's *twice* as easy to pick out a Rays pitcher as a Marlins pitcher!
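A minimal sketch of how a team ranking like this could be computed, assuming each pitch already has the probability a classifier assigned to the true pitcher. The team labels, pitcher IDs, probabilities, and the choice to average per pitcher before averaging per team are all assumptions for illustration, not the method behind the tweet.

```python
from collections import defaultdict
from statistics import mean

# (team, pitcher, probability the model gave to the true pitcher on one pitch).
# Every value here is hypothetical.
pitch_confidences = [
    ("Team A", "pitcher_1", 0.9),
    ("Team A", "pitcher_1", 0.8),
    ("Team A", "pitcher_2", 0.7),
    ("Team B", "pitcher_3", 0.4),
    ("Team B", "pitcher_3", 0.5),
    ("Team B", "pitcher_4", 0.3),
]

def rank_teams(rows):
    """Average confidence per pitcher, then per team, and sort teams high to low."""
    by_pitcher = defaultdict(list)
    for team, pitcher, prob in rows:
        by_pitcher[(team, pitcher)].append(prob)

    by_team = defaultdict(list)
    for (team, _pitcher), probs in by_pitcher.items():
        # Averaging per pitcher first keeps high-volume pitchers from dominating.
        by_team[team].append(mean(probs))

    scores = [(team, mean(values)) for team, values in by_team.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

ranking = rank_teams(pitch_confidences)
for team, score in ranking:
    print(f"{team}: {score:.3f}")
```

Comparing the top and bottom scores gives the "how many times easier" ratio for whichever two teams are of interest.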

DT K

@OnePitchOneSoul

Jun 6

By training a machine learning model to predict pitchers, we can find the most unique ones. Tyler Rogers comes in at #1, with 99.9% confidence from just a single pitch!

DT K

@OnePitchOneSoul

Jun 5

Yeah, the conclusion might've sounded too strong. But the point of stuff models is to predict out of sample (i.e. new pitchers). Is Misio great because his name is Misio or because he sits 101? Imo stuff models tend not to isolate the latter because the former adds predictiveness.

Eli Ben-Porat 🇨🇦@EliBenPorat

Jun 5

This is interesting, but not sure I agree with your conclusion. I as a human would be able to predict 100% just seeing 1 Misiorowski, Clase, Jansen etc. signature pitch. But if another pitcher replicated those exact same physics they’d be equally amazing.

DT K

@OnePitchOneSoul

Jun 5

It's more apparent for guys like Hendricks, Kershaw, and pitch-to-pitch relationship merchants. Was Brent Suter's changeup good in 2020-2021 because of, or despite, the lack of a velocity gap? I found that models like to avoid these questions by depending on pitcher identity, and therefore understate the effects.

DT K

@OnePitchOneSoul

Jun 5

It's not a bad thing that models can guess pitchers per se. I think the interesting question is "how can we lessen the dependence on pitcher identities".

DT K

@OnePitchOneSoul

Jun 5

Are stuff models just memorizing pitchers? One fun test is to make the model predict pitcher names (instead of ERA). With @TJStats' tjStuff+, we get 49% accuracy using a *single* pitch, and 73% using the whole season. (And tjStuff+ is less overfit than others!) So theoretically, at most 73% of the predictiveness is banking on prior-season results (RV, xwOBA, etc.).
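A rough sketch of the pitcher-identification test, under heavy assumptions: synthetic pitchers, three made-up pitch features, and a nearest-centroid classifier standing in for a real stuff model. It only illustrates the comparison between guessing from one pitch and guessing from a whole season of pitches; none of the numbers relate to tjStuff+ or any real data.

```python
import math
import random

random.seed(0)

N_PITCHERS = 12
PITCHES_PER_SEASON = 200
NOISE = 1.5  # assumed pitch-to-pitch spread, same for every feature

# Hypothetical "true" pitch shape per pitcher: (velocity, horizontal break, vertical break).
centers = [(88.0 + i, (i * 7) % 10 - 5.0, (i * 3) % 8 + 8.0) for i in range(N_PITCHERS)]

def throw(center):
    """Sample one noisy pitch around a pitcher's true shape."""
    return tuple(random.gauss(mu, NOISE) for mu in center)

def mean_pitch(pitches):
    """Average a list of pitches feature by feature."""
    n = len(pitches)
    return tuple(sum(p[d] for p in pitches) / n for d in range(3))

def nearest(pitch, centroids):
    """Index of the centroid closest to the pitch (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(pitch, centroids[k]))

# "Training" season gives one centroid per pitcher; a second season is held out.
train = [[throw(c) for _ in range(PITCHES_PER_SEASON)] for c in centers]
test = [[throw(c) for _ in range(PITCHES_PER_SEASON)] for c in centers]
centroids = [mean_pitch(season) for season in train]

# Guess the pitcher from each held-out pitch on its own.
single_hits = sum(
    nearest(pitch, centroids) == k
    for k, season in enumerate(test)
    for pitch in season
)
single_acc = single_hits / (N_PITCHERS * PITCHES_PER_SEASON)

# Guess the pitcher from the average of the whole held-out season.
season_hits = sum(nearest(mean_pitch(season), centroids) == k for k, season in enumerate(test))
season_acc = season_hits / N_PITCHERS

print(f"single-pitch accuracy: {single_acc:.2f}")
print(f"whole-season accuracy: {season_acc:.2f}")
```

In a real version, the features would be the stuff model's inputs and the classifier would share its architecture, so the accuracy reflects how much pitcher identity the model could encode.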

DT K

@OnePitchOneSoul

May 30

If anyone is seeing this, I'm looking for some advice. I replicated FanGraphs' Stuff+. I scraped every article, tweet, and podcast from @enosarris and @choice_fielder to derive the same 19 features, same 5-stage architecture, same CatBoost, same train/test split. The result has a 0.9 correlation to Stuff+ and is just as predictive. Is it okay to open source this? Am I technically copying code here?

DT K

@OnePitchOneSoul

May 28

Hey @enosarris, I've been trying to faithfully replicate FanGraphs' Stuff+ model. My model is trained on 2020-2024 without tuning, and it seems to do about as well. Was the FanGraphs model tuned on 2025? Or tuned at all?

DT K

@OnePitchOneSoul

May 27

One reason why xwOBAcon isn't predictive: Pitchers can *generally* nudge EV/LA, but can't stop balls randomly landing here. Predictive wOBAcon solutions basically overweight barrels (@tangotiger @MaxSportsStudio) which doesn't help pitchers who don't control swing power. 1/3

DT K

@OnePitchOneSoul

May 27

We can make a more reliable xwOBAcon by *literally* using next-season xwOBAcon as the target. Since every pitcher has just 1 average xwOBAcon per season, we can use the chain rule to backpropagate this into a per-BIP loss. 2/3
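One way to read the chain-rule step, as a sketch under stated assumptions: if the season-level prediction is the mean of per-BIP predictions and the loss is squared error against next-season xwOBAcon, then every ball in play receives the same upstream gradient, 2 * (season mean - target) / n. The toy linear model, the feature scaling, and all numbers below are made up; the thread does not say which model or loss was used.

```python
# All numbers below are made up for illustration.
bips = [(0.2, -0.5), (1.1, 0.3), (-0.7, 0.9), (0.4, 0.1)]  # (scaled EV, scaled LA) per ball in play
target = 0.37            # assumed next-season xwOBAcon for this pitcher
w = [0.35, 0.05, -0.02]  # bias, EV weight, LA weight of a toy linear model

def features(bip):
    ev, la = bip
    return (1.0, ev, la)

def per_bip_value(w, bip):
    """Per-BIP prediction of the toy model."""
    return sum(wj * xj for wj, xj in zip(w, features(bip)))

def season_loss(w):
    """Squared error between the season-average prediction and the season-level target."""
    preds = [per_bip_value(w, b) for b in bips]
    return (sum(preds) / len(preds) - target) ** 2

def analytic_grad(w):
    """Chain rule: d(loss)/d(w) = sum over BIPs of d(loss)/d(pred_i) * d(pred_i)/d(w)."""
    n = len(bips)
    season_err = sum(per_bip_value(w, b) for b in bips) / n - target
    dloss_dpred = 2.0 * season_err / n  # identical for every BIP in the season
    return [sum(dloss_dpred * features(b)[j] for b in bips) for j in range(len(w))]

def numeric_grad(w, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grad = []
    for j in range(len(w)):
        up = list(w)
        up[j] += eps
        down = list(w)
        down[j] -= eps
        grad.append((season_loss(up) - season_loss(down)) / (2 * eps))
    return grad

grad_a = analytic_grad(w)
grad_n = numeric_grad(w)
max_gap = max(abs(a - b) for a, b in zip(grad_a, grad_n))
print("analytic gradient:", grad_a)
print("largest gap vs finite differences:", max_gap)
```

An autodiff framework would do this bookkeeping automatically once the per-BIP predictions are averaged within each pitcher-season before the loss is taken.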

DT K

@OnePitchOneSoul

May 27

This version stabilizes way faster than xwOBAcon (and is more predictive), which makes it viable for pre/in-season projections. It stabilizes as fast as the (EV, LA) combo and almost as fast as (K, BB). 3/3

DT K

@OnePitchOneSoul

Apr 28

I actually think posts like these are just engagement farming

s0murphy_

@sean0murphy

Apr 27

Radar guns are a lie. 95 is really 91. Look it up. It’s @mlb marketing.

Community note

MLB average fastball velocity has increased from 91.4 mph in 2008 to 94.3 mph in 2026 using consistent release-point measurements; radar gun changes account for only 1-2 mph of difference, not the full claimed inflation. baseballamerica.com/stories/mlb-ve… baseballamerica.com/stories/the-me…

DT K

@OnePitchOneSoul

31 Dec 2025

After sprint simulations, I predicted Safe Probability based on how far the batter is from the base (when the 1st baseman catches the ball). But I wanted to remove fielders from the equation. 1/5

DT K

@OnePitchOneSoul

31 Dec 2025

And the *batter* can decrease the probability of catching on base (by putting pressure on the 1st baseman). 4/5

DT K

@OnePitchOneSoul

31 Dec 2025

It's super interesting how most of the benefit of extra hustling comes from making plays closer, thereby increasing error probabilities. So hustling not only capitalizes on opposition mistakes, it directly promotes opposition mistakes! (Maybe we should use wOBA errors like RA9 instead of ERA? @tangotiger)

DT K

@OnePitchOneSoul

29 Dec 2025

I basically created Hustle using the batter's effort level running down the 1st base line. But how do you even measure "effort"? 1/10

DT K

@OnePitchOneSoul

6 Oct 2025

I made the finals of the 2025 #SMTDataChallenge! Paper: github.com/kimdoyo5/SMT2025/… Huge thanks to @Bbl_Astrophyscs and @BillyFryer42.

DT K

@OnePitchOneSoul

29 Dec 2025

We can use the simulations to derive Safe Probability Added (from putting in more or less effort). Turns out hustling makes a big difference in outcome: 4.1% per 10% effort (minor league data), which roughly means that if everyone gave 100% all the time, there would be an extra runner at 1st every 2 games or so. 9/10

DT K

@OnePitchOneSoul

29 Dec 2025

Thanks for reading! I skipped a lot of details in this thread, so feel free to see the paper (and the code!) here: github.com/kimdoyo5/SMT2025. The data came from @SMTlive as part of the 2025 #SMTDataChallenge. A lot of the inspiration came from @KaiFranke3, @BalchJackson, @JonahAnalytics, @kruth99, @justinochoi, @Pitching_Bot (and others not on Twitter!)
