Doyoung-Tom Kim | @UofT, Previously @Phillies R&D

Joined May 2020
Pinned Tweet
Returning to baseball research Twitter in 2026 be like:
Ranking teams by how easy it is to guess their pitchers from a single pitch (i.e. how unique they are). It's *twice* as easy to pick out a Rays pitcher as a Marlins pitcher!
By training a machine learning model to predict pitchers, we can find the most unique ones. Tyler Rogers comes in at #1, with 99.9% confidence from just a single pitch!
Yeah, the conclusion might've sounded too strong. But the point of stuff models is to predict out of sample (i.e. new pitchers). Is Misio great because his name is Misio or because he sits 101? Imo stuff models tend not to isolate the latter because the former adds predictiveness.
This is interesting, but not sure I agree with your conclusion. I as a human would be able to predict 100% just seeing 1 Misiorowski, Clase, Jansen etc. signature pitch. But if another pitcher replicated those exact same physics they’d be equally amazing.
It's more apparent for guys like Hendricks, Kershaw, and pitch-to-pitch relationship merchants. Was Brent Suter's changeup good in 2020-2021 because of, or despite, the lack of a velocity gap? I found that models like to avoid these questions by depending on pitcher identity, thereby understating the effects.
It's not a bad thing that models can guess pitchers per se. I think the interesting question is "how can we lessen the dependence on pitcher identities".
Are stuff models just memorizing pitchers? One fun test is to make the model predict pitcher names (instead of ERA). With @TJStats tjStuff+, we get 49% accuracy using a *single* pitch, and 73% using the whole season. (And tjStuff+ is less overfit than others!) So theoretically, at most 73% of the predictiveness is banking on prior-season results (RV, xwOBA, etc.).
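A minimal sketch of the kind of identity test described above, on synthetic data only. Everything here is an assumption for illustration: the three made-up features, the nearest-centroid classifier, and the helper names are hypothetical and are not tjStuff+ or the method actually used. It shows why guessing from a whole season of pitches tends to be easier than guessing from one pitch.

```python
import random
from collections import defaultdict

N_FEATURES = 3  # hypothetical stand-ins, e.g. velocity and two movement components


def make_pitches(n_pitchers, n_pitches, noise, rng):
    """Synthetic pitches: each pitcher gets a random 'signature' plus per-pitch noise."""
    pitches = []
    for pitcher in range(n_pitchers):
        centre = [rng.gauss(0.0, 2.0) for _ in range(N_FEATURES)]
        for _ in range(n_pitches):
            features = [c + rng.gauss(0.0, noise) for c in centre]
            pitches.append((pitcher, features))
    return pitches


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def group_by_pitcher(pitches):
    """Map each pitcher id to the list of that pitcher's feature vectors."""
    grouped = defaultdict(list)
    for pitcher, features in pitches:
        grouped[pitcher].append(features)
    return grouped


def predict_pitcher(centroids, features):
    """Nearest-centroid guess: the pitcher whose average pitch is closest."""
    def sq_dist(pitcher):
        return sum((a - b) ** 2 for a, b in zip(centroids[pitcher], features))
    return min(centroids, key=sq_dist)


def identity_accuracies(seed=0):
    """Return (single-pitch accuracy, whole-season accuracy) on synthetic data."""
    rng = random.Random(seed)
    pitches = make_pitches(n_pitchers=20, n_pitches=60, noise=1.0, rng=rng)
    train, test = pitches[0::2], pitches[1::2]  # alternate pitches into train/test

    centroids = {p: mean_vector(rows) for p, rows in group_by_pitcher(train).items()}

    # Guess the pitcher from one held-out pitch at a time.
    single_hits = sum(predict_pitcher(centroids, x) == p for p, x in test)
    single_acc = single_hits / len(test)

    # Guess the pitcher from the average of all of their held-out pitches.
    test_seasons = group_by_pitcher(test)
    season_hits = sum(
        predict_pitcher(centroids, mean_vector(rows)) == p
        for p, rows in test_seasons.items()
    )
    season_acc = season_hits / len(test_seasons)
    return single_acc, season_acc


if __name__ == "__main__":
    single_acc, season_acc = identity_accuracies()
    print(f"single pitch: {single_acc:.0%}, whole season: {season_acc:.0%}")
```

In a real version, the inputs would be a stuff model's features or outputs rather than random numbers, and the classifier could be anything; the toy only illustrates the single-pitch versus whole-season comparison.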
If anyone is seeing this, I'm looking for some advice. I replicated FanGraphs' Stuff+. I scraped every article, tweet, and podcast from @enosarris and @choice_fielder to derive the same 19 features, same 5-stage architecture, same CatBoost, same train/test split. The result has a .9 correlation to Stuff+ and is just as predictive. Is it okay to open source this? Am I technically copying code here?
Hey @enosarris, I've been trying to faithfully replicate the FanGraphs Stuff+ model. My model is trained on 2020-2024 without tuning, and it seems to do about as well. Was the FanGraphs model tuned on 2025? Or tuned at all?
One reason why xwOBAcon isn't predictive: Pitchers can *generally* nudge EV/LA, but can't stop balls randomly landing here. Predictive wOBAcon solutions basically overweight barrels (@tangotiger @MaxSportsStudio) which doesn't help pitchers who don't control swing power. 1/3
We can make a more reliable xwOBAcon by *literally* using next-season xwOBAcon as the target. Since every pitcher has just 1 average xwOBAcon per season, we can use the chain rule to backpropagate this into a per-BIP loss. 2/3
This version stabilizes way faster than xwOBAcon (and is more predictive), which makes it viable for pre/in-season projections. It stabilizes as fast as the (EV, LA) combo and almost as fast as (K, BB). 3/3
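One way to picture the chain-rule step from 2/3, as a rough sketch under stated assumptions: each ball in play gets a value from a tiny linear model, a pitcher-season prediction is the mean of those values, and the loss is the squared error against next-season xwOBAcon. The linear model, the squared-error loss, and all function names are hypothetical choices for illustration; the thread does not say which model or loss was actually used.

```python
def season_loss_and_bip_grads(bip_values, next_season_target):
    """Squared error on the season mean, plus its gradient for each ball in play.

    Chain rule: dL/dv_i = dL/dmean * dmean/dv_i = 2 * (mean - target) * (1 / n).
    """
    n = len(bip_values)
    season_pred = sum(bip_values) / n
    error = season_pred - next_season_target
    loss = error ** 2
    grad_per_bip = [2.0 * error / n for _ in bip_values]
    return loss, grad_per_bip


def bip_value(w, b, x):
    """Hypothetical per-BIP model: a linear score of the BIP features."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_bip_model(seasons, lr=0.5, epochs=500):
    """Plain gradient descent; `seasons` is a list of (bip_feature_rows, next_season_target)."""
    n_features = len(seasons[0][0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for bips, target in seasons:
            values = [bip_value(w, b, x) for x in bips]
            _, grads = season_loss_and_bip_grads(values, target)
            # Push each per-BIP gradient back into the model parameters.
            for g, x in zip(grads, bips):
                for j, xj in enumerate(x):
                    grad_w[j] += g * xj
                grad_b += g
        # Average the gradient over pitcher-seasons before stepping.
        w = [wj - lr * gj / len(seasons) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(seasons)
    return w, b


if __name__ == "__main__":
    # Toy pitcher-seasons with one made-up feature per BIP and a made-up target.
    toy = [([[1.0], [0.0]], 0.5), ([[1.0], [1.0]], 0.7), ([[0.0], [0.0]], 0.3)]
    print(train_linear_bip_model(toy))
```

The point of the sketch is only that a single season-level target still yields a gradient for every ball in play, so a per-BIP model can be trained against it.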
I actually think posts like these are just engagement farming
Radar guns are a lie. 95 is really 91. Look it up. It’s @mlb marketing.
Community note
MLB average fastball velocity has increased from 91.4 mph in 2008 to 94.3 mph in 2026 using consistent release-point measurements; radar gun changes account for only 1-2 mph of difference, not the full claimed inflation. baseballamerica.com/stories/mlb-ve… baseballamerica.com/stories/the-me…
31 Dec 2025
After sprint simulations, I predicted Safe Probability based on how far the batter is from the base (when the 1st baseman catches the ball). But I wanted to remove fielders from the equation. 1/5
29 Dec 2025
I basically created Hustle using the batter's effort level running down the 1st base line. But how do you even measure "effort"? 1/10
31 Dec 2025
And the *batter* can decrease the probability of catching on base (by putting pressure on the 1st baseman). 4/5
31 Dec 2025
It's super interesting how most of the benefit of extra hustling comes from making plays closer, thereby increasing error probabilities. So hustling not only capitalizes on opposition mistakes, it directly promotes opposition mistakes! (Maybe we should use wOBA errors like RA9 instead of ERA? @tangotiger)
6 Oct 2025
I made the finals of the 2025 #SMTDataChallenge! Paper: github.com/kimdoyo5/SMT2025/… Huge thanks to @Bbl_Astrophyscs and @BillyFryer42.
29 Dec 2025
We can use the simulations to derive Safe Probability Added (from putting more/less effort). Turns out hustling makes a big difference in outcome: 4.1% per 10% effort (minor league data), which roughly means if everyone gave 100% all the time, there would be an extra runner at 1st every 2 games or so. 9/10
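A back-of-the-envelope sketch of how the 4.1% per 10% effort figure could turn into runners per game. It assumes the effect is linear in effort, and both inputs (`plays_per_game` and `mean_effort`) are hypothetical placeholders, not numbers from the paper.

```python
# From the thread: 4.1% Safe Probability Added per 10% of extra effort.
SAFE_PROB_PER_UNIT_EFFORT = 0.041 / 0.10


def extra_runners_per_game(plays_per_game, mean_effort):
    """Expected extra runners at 1st if every batter ran at 100% effort.

    plays_per_game: hypothetical count of plays at 1st per game.
    mean_effort: hypothetical current average effort, between 0 and 1.
    Assumes Safe Probability Added scales linearly with effort.
    """
    effort_gap = 1.0 - mean_effort
    return plays_per_game * effort_gap * SAFE_PROB_PER_UNIT_EFFORT


if __name__ == "__main__":
    # Placeholder inputs only; swap in real values from the paper or data.
    print(extra_runners_per_game(plays_per_game=10, mean_effort=0.9))
```

Under this linear assumption, "an extra runner every 2 games" (0.5 per game) corresponds to `plays_per_game * (1 - mean_effort)` being roughly 1.2.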
29 Dec 2025
Thanks for reading! I missed a lot of details in this thread, so feel free to see the paper (and the code!) here: github.com/kimdoyo5/SMT2025. The data came from @SMTlive as part of the 2025 #SMTDataChallenge. A lot of the inspiration came from @KaiFranke3, @BalchJackson, @JonahAnalytics, @kruth99, @justinochoi, @Pitching_Bot (and others not on Twitter!)