Earlier this summer, I informally introduced a pitch quality metric called Expected Run Value (xRV). This was in a piece for South Side Hit Pen, Sports Illustrated’s White Sox site. The article analyzed two unsung heroes of Chicago’s bullpen, Evan Marshall and Jimmy Cordero, using several metrics including xRV. xRV began as a summer research project by Dr. Ryan Baranowski of Coe College, my classmate Jared White, and me.
Over the last few months, pitch quality metrics have been a hot topic, so I wanted to formally outline my model and how it can be useful in analyzing MLB pitchers. The model has several applications I want to address, and with the 2020 regular season wrapped up, I can now apply it to the entire abbreviated season. The important pieces are the steps used to build the model, an analysis of its results, and finally how we can apply those results to pitcher evaluation and future decision making.
The process starts with Bill Petti’s convenient Savant scraper function in the baseballr package, which returns the metrics the model needs for every pitch thrown in the 2020 regular season. After removing extraneous events that did not help us measure pitch quality, 244,681 pitches remained.
Each pitch was evaluated based on its run value, which was calculated differently depending on whether or not the ball was put in play. For a ball or strike, the run value is the change in run expectancy from the count before the pitch to the count (or strikeout or walk) that results from it. Back in 2014, this way of thinking by Dan Brooks and Harry Pavlidis spawned a run value by count matrix. From this, we can figure that in 2020, the most consequential strike for a pitcher was a strike in a full count (-.294 runs), and the least consequential was a strike in a 1-0 count (-.035 runs). Similarly for balls, the most consequential ball was also in a full count (.234 runs), and the least consequential was in an 0-2 count (.021 runs).
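The figures above are not mirror images of each other. A full-count strike is worth -.294 runs, but a full-count ball is worth only +.234. That pattern is consistent with scoring each pitch as the change in run expectancy from the count before it to the state after it. Below is a minimal sketch under that assumption. The run-expectancy table is an input, and the values in the usage lines are made up for illustration, not 2020 data. Two-strike fouls, which leave the count unchanged, are not handled.

```python
from typing import Dict


def state_after(balls: int, strikes: int, is_strike: bool) -> str:
    """Return the state reached after a ball or strike that is not put in play.

    States are count strings such as "1-2", plus the terminal states
    "K" (strikeout) and "BB" (walk).
    """
    if is_strike:
        return "K" if strikes == 2 else f"{balls}-{strikes + 1}"
    return "BB" if balls == 3 else f"{balls + 1}-{strikes}"


def pitch_run_value(balls: int, strikes: int, is_strike: bool,
                    run_exp: Dict[str, float]) -> float:
    """Change in run expectancy caused by a ball or strike in a given count."""
    before = run_exp[f"{balls}-{strikes}"]
    after = run_exp[state_after(balls, strikes, is_strike)]
    return after - before


# Made-up run-expectancy values for illustration only (not 2020 data).
toy_re = {"3-2": 0.10, "K": -0.19, "BB": 0.33, "0-2": -0.10, "1-2": -0.08}
full_count_strike = pitch_run_value(3, 2, True, toy_re)
full_count_ball = pitch_run_value(3, 2, False, toy_re)
```

With a real count matrix plugged in, the same two functions would reproduce the asymmetry described above: a strike and a ball in the same count no longer have to cancel out.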
Why does this matter? Because we don’t want to penalize a pitcher as much for throwing a ball in an 0-2 count as we do when he throws that same ball in a 3-2 count. The count matters because it changes run expectancy.
Furthermore, once a ball is put in play, we should not care about the actual outcome, whether a ball in play was a double in the gap or an out thanks to an optimal defensive alignment. The same is true whether a batted ball ended up as a fly out to the warning track in a big park or a home run in a smaller one. We therefore convert each batted ball’s xwOBA to a run value to measure quality of contact. This is done by subtracting the average wOBA for the count from the xwOBA value and then dividing by the 2020 wOBA scale.
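The batted-ball conversion can be sketched in a few lines. This assumes that “scaling by the wOBA scale” means dividing by it, which is the usual way to move from wOBA units to runs. The per-count average wOBA values and the wOBA scale are left as inputs, and no 2020 values are filled in here.

```python
from typing import Dict


def batted_ball_run_value(xwoba: float, balls: int, strikes: int,
                          avg_woba_by_count: Dict[str, float],
                          woba_scale: float) -> float:
    """Convert a batted ball's xwOBA into runs above the average for its count.

    The count's average wOBA is the baseline. Dividing by the wOBA scale
    moves the difference from wOBA units onto the run scale.
    """
    baseline = avg_woba_by_count[f"{balls}-{strikes}"]
    return (xwoba - baseline) / woba_scale
```

Because only xwOBA enters the formula, a well-struck ball caught at the warning track and the same ball clearing a short fence receive the same run value.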
To account for other factors that influence a pitch’s effectiveness, dummy variables were added for right-handed batters, right-handed pitchers, and fastballs. The handedness dummies account for the platoon advantage, and they also tell the model whether a pitch’s measured movement is breaking in toward the hitter or away from him. The fastball dummy variable matters because fastballs are so heterogeneous. For example, we don’t want the model to project Kyle Hendricks’ fastball as a good changeup based on its “stuff” metrics. Pitchers generally build their arsenals around the quality and characteristics of their fastball, so knowing that base is important. The fastball dummy variable also accounts for timing: hitters can be late against Hendricks’ fastball not because they can’t handle 87 mph, but because of his secondary offerings. We don’t want to trick the algorithm.
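The three dummies map directly onto Statcast’s `stand`, `p_throws`, and `pitch_type` columns. The post does not say which pitch types count as fastballs, so the `FASTBALL_CODES` set below (four-seamer, two-seamer, sinker, cutter) is an assumption for illustration. This is a sketch of the encoding, not the original code.

```python
from typing import Dict

# ASSUMPTION: which pitch_type codes count as "fastballs" is not stated in the
# post; four-seamers, two-seamers, sinkers and cutters are used for illustration.
FASTBALL_CODES = {"FF", "FT", "SI", "FC"}


def dummy_features(row: Dict[str, str]) -> Dict[str, int]:
    """Build the 0/1 indicators for one Statcast pitch row."""
    return {
        "rhb": int(row["stand"] == "R"),        # right-handed batter
        "rhp": int(row["p_throws"] == "R"),     # right-handed pitcher
        "fastball": int(row["pitch_type"] in FASTBALL_CODES),
    }
```

Pitch type is used only to set the fastball flag. It is not passed to the model as a label, which fits the agnostic design described next.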
Finally, to keep the model agnostic to everything other than each pitch’s physical characteristics, we did not label the pitches, either in buckets or individually.
The machine learning algorithm used was a random forest trained on a 50,000-observation sample, roughly 25% of the data set after removing NAs. The model was trained on “stuff” metrics (velocity, raw spin rate, vertical movement, horizontal movement) and command metrics (vertical location and horizontal location), along with the aforementioned dummy variables.
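Here is a sketch of the data preparation: assemble the feature set, drop rows with missing values, and draw a random training sample of 50,000. The Statcast column names mapped to each attribute are assumptions. The target column name `run_value` and the dummy columns `rhb`, `rhp`, and `fastball` are hypothetical, and the dummies are assumed to have been built beforehand. The random forest itself is not reimplemented here.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

# Assumed Statcast column names for the attributes described above.
STUFF = ["release_speed", "release_spin_rate", "pfx_z", "pfx_x"]
COMMAND = ["plate_z", "plate_x"]
DUMMIES = ["rhb", "rhp", "fastball"]  # hypothetical 0/1 columns built earlier
FEATURES = STUFF + COMMAND + DUMMIES
TARGET = "run_value"  # hypothetical name for the per-pitch run value column

Row = Dict[str, Optional[float]]


def is_complete(row: Row) -> bool:
    """True if every feature and the target are present and not NaN."""
    for col in FEATURES + [TARGET]:
        value = row.get(col)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return False
    return True


def training_sample(rows: List[Row], n: int = 50_000,
                    seed: int = 0) -> Tuple[List[List[float]], List[float]]:
    """Drop incomplete rows, draw n of them at random, and return (X, y)."""
    clean = [r for r in rows if is_complete(r)]
    sample = random.Random(seed).sample(clean, min(n, len(clean)))
    X = [[float(r[c]) for c in FEATURES] for r in sample]
    y = [float(r[TARGET]) for r in sample]
    return X, y
```

`X` and `y` are in the layout most random forest libraries expect. For example, scikit-learn’s `RandomForestRegressor().fit(X, y)` would accept them. The post does not say which implementation or hyperparameters were used.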
After confirming with a feature selection algorithm that all of the “stuff” and command attributes were significant at the 0.01 level, the model was ready to be run.
The top pitches of 2020 are listed below (min. 50 pitches of type):
| Rank | Pitcher | Team | Pitch Type | Pitches | xRV |
|---|---|---|---|---|---|
| 1 | Garrett Crochet | White Sox | 4-Seam Fastball | 70 | -.044 |
| 9 | Joe Kelly | Dodgers | Knuckle Curve | 98 | -.036 |
While a couple of these names may catch you by surprise, it should be known by now how nasty Garrett Crochet’s fastball is. The pitch averaged 100.1 mph with a spin rate of 2501 RPM and had the 8th highest rise of any 4-seamer in baseball. Its stuff metrics were so good that it simply did not matter that he tended to throw it right down the middle.
When he did happen to locate it well… good luck.
Valdez, Roe, Williams and Kelly’s pitches on this leaderboard likely aren’t a surprise to most, but how about 2-3-4?
Joely Rodriguez spent much of 2018 and all of 2019 playing in Japan after control issues in 2017 with multiple clubs. It appears as though he may have learned a really good changeup overseas. He consistently kept it below the knees of hitters, and he paired it with a fastball that was a top-50 pitch according to xRV.
The 88 4-seamers he threw ranked 41st among all pitches (97th percentile). The pitch’s velocity was in the 56th percentile, but its spin rate was only in the 10th percentile. It did, however, have impactful horizontal movement that the algorithm seemingly liked, plus plenty of pitches in the shadow and chase zones. He also threw plenty of center-cut fastballs, but this is where his changeup could be helping his fastball play up, an effect that future iterations of the model could try to account for.
Pierce Johnson is another reliever who returned from a stint in Japan with a new pitch. He changed the profile of his curve, making it slurvier with a velocity jump that followed his fastball. His curveball Whiff% was top-20 in baseball, and the Padres have him throwing it more often than not.
After Johnson, it is very interesting that Tyler Alexander’s cutter ranks fourth on this list. It was his fourth most frequent pitch and his third most frequent fastball variation. His 4-seamer was good, but his sinker was not. Unlike the three pitchers ahead of him on this leaderboard, Alexander did not have the same kind of overall success (lots of blue ink on his Savant page), except for those three perfect innings. But he threw only 7 cutters among the 39 pitches in that run of consecutive strikeouts.
He did throw Moustakas a nasty one to get his first strikeout of the streak.
He typically attacked lefties with a slider/cutter combo, and it worked: a .225/.244/.275 slash line will play. Alexander has always been a command-over-“stuff” pitcher (more on this later), and he looks to have run into some fly ball bad luck this year, meaning he could be an arm to watch in 2021. His cutter was usually down and away from the middle of the zone, with very few meatballs. Even as a “command guy,” he was in the 74th percentile of horizontal break with his cutter, so not bad. Finally, the cutter is yet another new pitch, added this season.
Devin Williams’ “changeup” really isn’t a changeup. I don’t believe a 2850 RPM changeup is allowed, so it should probably be labeled as a screwball moving forward, as many have suggested. Nonetheless, its nastiness has been well documented this year, and it has quite possibly made him the best reliever in baseball.
You may think 50 pitches is too low a minimum to confidently judge a pitch’s quality, so let’s raise it to 150 and include more names.
| Rank | Pitcher | Team | Pitch Type | Pitches | xRV |
|---|---|---|---|---|---|
| 5 | Aaron Nola | Phillies | Knuckle Curve | 278 | -.035 |
| 7 | Caleb Ferguson | Dodgers | 4-Seam Fastball | 216 | -.034 |
| 10 | Kevin Gausman | Giants | 4-Seam Fastball | 434 | -.033 |
| 12 | Caleb Baragar | Giants | 4-Seam Fastball | 239 | -.033 |
| 13 | Nik Turley | Pirates | 4-Seam Fastball | 190 | -.032 |
| 14 | Trevor Bauer | Reds | 4-Seam Fastball | 437 | -.032 |
| 15 | Jake McGee | Dodgers | 4-Seam Fastball | 280 | -.032 |
| 16 | Liam Hendriks | Athletics | 4-Seam Fastball | 242 | -.032 |
| 17 | Jacob deGrom | Mets | 4-Seam Fastball | 469 | -.032 |
| 18 | Gerrit Cole | Yankees | 4-Seam Fastball | 539 | -.031 |
| 20 | Lance McCullers Jr. | Astros | Knuckle Curve | 312 | -.030 |
| 21 | Brandon Woodruff | Brewers | 4-Seam Fastball | 362 | -.030 |
| 22 | Julio Urias | Dodgers | 4-Seam Fastball | 458 | -.030 |
This leaderboard can also be subsetted by pitch type.
| Rank | Pitcher | Team | Pitches | xRV |
|---|---|---|---|---|
| 5 | Colten Brewer | Red Sox | 196 | -.028 |
| 7 | Nathan Eovaldi | Red Sox | 218 | -.027 |
| 10 | Dallas Keuchel | White Sox | 280 | -.024 |
| 8 | Hyun Jin Ryu | Blue Jays | 292 | -.024 |
| 5 | Lance McCullers Jr. | Astros | 312 | -.030 |
Another application of this model is evaluating a pitcher’s whole arsenal, weighting each pitch type’s xRV by how often he threw it. This should tell us more about which pitchers were most effective in 2020 by making the best use of their best pitches, and it could also credit their teams for maximizing their potential output. For this leaderboard, I arbitrarily set a minimum of 400 total pitches. Only pitch types thrown at least 50 times were considered part of a pitcher’s arsenal.
Top 2020 Arsenals
| Rank | Pitcher | Team | Pitches | xRV |
|---|---|---|---|---|
| 14 | Nathan Eovaldi | Red Sox | 685 | -.025 |
Takeaways? A third of these pitchers are either Dodgers or Rays, the best teams in their respective leagues through the 2020 regular season.
Framber Valdez had the best arsenal in baseball when considering his curveball (100th percentile), his sinker (93rd percentile), and his changeup (50th percentile). His curve was a massive whiff generator while hitters pounded the latter two pitches into the ground. His curveball had the 23rd highest Whiff% (min. 100 pitches) while his sinker had the 13th highest GB% (min. 50 BBEs), and his changeup had the 15th highest GB% (min. 15 BBEs).
The rest of the leaderboard consists of some NL Cy Young hopefuls, a couple of power relievers, and intriguingly, 5 pitchers who could see the open market this winter in Smyly, Bauer, Gausman, Richards and Morton.
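The usage-weighted arsenal xRV behind this leaderboard can be sketched as follows. It assumes the input is one predicted xRV per pitch, and that the 400-pitch minimum counts every pitch a pitcher threw, not just those in qualifying pitch types. The post does not spell out either detail. Weighting each pitch type’s mean xRV by its share of usage works out to a plain average over every pitch in the arsenal.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def arsenal_xrv(pitches: Iterable[Tuple[str, str, float]],
                min_total: int = 400,
                min_type: int = 50) -> Dict[str, Tuple[int, float]]:
    """Usage-weighted xRV per pitcher.

    pitches: one (pitcher, pitch_type, predicted_xrv) tuple per pitch.
    Returns {pitcher: (pitches in arsenal, weighted mean xRV)}.
    """
    by_pitcher: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for pitcher, pitch_type, xrv in pitches:
        by_pitcher[pitcher][pitch_type].append(xrv)

    results: Dict[str, Tuple[int, float]] = {}
    for pitcher, types in by_pitcher.items():
        total = sum(len(v) for v in types.values())
        if total < min_total:
            continue  # pitcher does not meet the overall pitch minimum
        # Only pitch types thrown at least min_type times form the arsenal.
        arsenal = [v for v in types.values() if len(v) >= min_type]
        n = sum(len(v) for v in arsenal)
        if n == 0:
            continue
        # sum_i (n_i / n) * mean_i equals the mean over all arsenal pitches.
        results[pitcher] = (n, sum(sum(v) for v in arsenal) / n)
    return results
```

Sorting the returned values from most negative to least negative gives a leaderboard in the same spirit as the one above.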
The model as presented combines both “stuff” and command pitch attributes, but what if we want to try to isolate effective command? By giving every pitcher average stuff, we can see how well their command alone would still allow them to minimize xRV. While this method is probably cruder than other attempts to quantify command, holding “stuff” constant can still add context to some of the results in the leaderboards above.
For example, in an at-bat where a right-handed pitcher throws a 4-seamer to a right-handed batter, every pitcher would be equipped with a 93.9 mph 4-seam fastball with a 2317 RPM spin rate. We can then see which pitchers would do best with median (50th percentile) “stuff” metrics.
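Here is one way to sketch “holding stuff constant.” Each pitch’s stuff metrics are replaced with the median for its pitch type and handedness matchup, and its location is left untouched. The neutralized rows could then be scored with the already-trained model. The grouping follows the examples in the text (a right-handed 4-seamer to a right-handed batter, a left-handed cutter to a right-handed batter). The Statcast column names are assumptions.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

# Assumed Statcast columns for the "stuff" metrics; location columns are kept.
STUFF = ["release_speed", "release_spin_rate", "pfx_z", "pfx_x"]

Row = Dict[str, object]


def matchup_key(row: Row) -> Tuple[object, object, object]:
    """Group by pitch type plus pitcher and batter handedness."""
    return (row["pitch_type"], row["p_throws"], row["stand"])


def median_stuff(rows: List[Row]) -> Dict[Tuple, Dict[str, float]]:
    """Median of each stuff metric within every matchup group."""
    values = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for col in STUFF:
            values[matchup_key(row)][col].append(row[col])
    return {key: {col: median(v) for col, v in cols.items()}
            for key, cols in values.items()}


def with_average_stuff(rows: List[Row]) -> List[Row]:
    """Copy each row with its stuff metrics replaced by the group medians,
    leaving location (plate_x, plate_z) and all other fields unchanged."""
    medians = median_stuff(rows)
    return [{**row, **medians[matchup_key(row)]} for row in rows]
```

Because only the stuff columns change, any remaining difference in predicted xRV between pitchers comes from where they located and the matchup dummies.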
Remember how Tyler Alexander’s unassuming cutter was the 4th best overall pitch? This leaderboard will help explain why:
Holding Stuff Metrics Constant – Effect of Command on xRV (By Pitch Type)
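| Rank | Pitcher | Team | Pitch Type | Pitches | xRV |
|---|---|---|---|---|---|
| 4 | Ross Stripling | Dodgers/Blue Jays | Slider | 110 | -.032 |
| 6 | Brett Anderson | Brewers | Knuckle Curve | 75 | -.031 |
| 8 | Mike Kickham | Red Sox | Slider | 121 | -.031 |
| 9 | Jeffrey Springs | Red Sox | Slider | 107 | -.031 |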
By xRV, Alexander’s cutter was the best commanded pitch in baseball. Two-thirds of his cutters were thrown to right-handed hitters, and this is where he put them.
If Alexander, as a left-handed pitcher facing a right-handed hitter, were given an average cutter, it would be an 87 mph pitch with a 2259 RPM spin rate, not too dissimilar from his own. However, it wouldn’t have the 74th percentile horizontal movement his does. Even with all of its “stuff” metrics set to average, his cutter would still be a top-10 pitch in baseball.
For pitcher arsenals, we can do the same exercise to isolate the effect of each pitcher’s command on xRV. The same qualifications apply: a pitcher must have thrown at least 400 pitches, and only pitch types thrown at least 50 times are considered.
Holding Stuff Metrics Constant – Effect of Command on xRV (By Pitcher Arsenals)
| Rank | Pitcher | Team | Pitches | xRV |
|---|---|---|---|---|
| 3 | Nathan Eovaldi | Red Sox | 685 | -.022 |
| 7 | Ross Stripling | Dodgers/Blue Jays | 754 | -.021 |
| 23 | Travis Lakins Sr. | Orioles | 427 | -.019 |
Here’s where Tomlin located his top-three pitches.
It would be a challenge to locate 95% of your pitches any better. The model showed Tomlin as having just about average “stuff” anyway, so this exercise did not have much effect on his arsenal’s overall xRV. His cutter’s -.029 xRV ranked 76th among all pitches (95th percentile), but his curveball and 4-seamer did not fare nearly as well. This underscores the importance of above-average stuff, even with elite command.
The results of the different aspects of the model pass the eye test, but I wanted to do my due diligence in testing its performance. To do so, I calculated the model’s root mean squared error (RMSE), which was 0.063, a value that affirmed my confidence in the model. RMSE is on the same scale as the dependent variable, which in this case is run value.
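For reference, RMSE is the square root of the mean squared difference between actual and predicted run values. One way to put a figure like 0.063 in context is to compare it with a naive baseline that always predicts the mean run value. That baseline is my suggested yardstick, not something the original analysis reports.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the target (runs)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def baseline_rmse(actual: Sequence[float]) -> float:
    """RMSE of always predicting the mean run value, a simple yardstick."""
    mean = sum(actual) / len(actual)
    return rmse(actual, [mean] * len(actual))
```

A model RMSE meaningfully below `baseline_rmse` on pitches it was not trained on would indicate that the features carry real signal about run value.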
With a model that I’m confident in, I still realize that it’s not perfect.
A pitcher’s fastball can play up due to the quality of his secondary offerings. Moving forward, more specific feature selection for each pitch type could better capture the expected effectiveness of each pitch. Other aspects like sequencing, release point and pitch tunneling undoubtedly play a role in a pitch’s quality, so I will need to look into controlling for these factors in future iterations of the model. The key is to account for every factor that makes sense given what we are trying to measure, and no more; beyond that, overfitting becomes the issue. We don’t want a context-neutral model, but we also do not want to include every factor surrounding the pitch just because we can measure it.
As for the model’s application, I think it could help a team optimize its pitchers’ pitch usage based on xRV. A potentially more accurate measure of expected performance could also change the roles in which certain pitchers are used, which in turn could affect a pitcher’s value. Teams will look for more specific improvements for each pitcher through pitch design, but this model should provide a more detailed overview of performance.
In the near future, I plan to apply a version of this model to NCAA D1 pitches through my continued work with BaseballCloud, after first controlling for the significant differences between the two levels.