Making Pitching Decisions Based on Predictive Modeling

Baseball is all about matchups. It is called the most individual team sport for a reason, as pitchers and hitters are all alone when they compete against one another. Those matchups create scoring, either when hitters win several in a row or when one hitter makes the most of his and puts a ball into the seats. Either way, in order to win games, teams must win as many of these battles as they can, and they have to make the best decisions to do so. The decisions in question are which pitcher to bring in and when to take a pitcher out of the game.

The thinking on when to take starters out has changed a lot in the past few years, driven mainly by the idea that a starter struggles once he faces hitters for the third time. This has casual fans up in arms, saying that if a starter is throwing well, he should stay in the game. But analytics say that this shouldn’t necessarily be the case. Sure, there are some guys you can leave in, like the Jacob deGroms or the Gerrit Coles of the world, but a lot of the time, leaving a pitcher in too long can backfire.

This happened in Game 4 of the 2019 NLDS between the St. Louis Cardinals and the Atlanta Braves. Dakota Hudson started for the Cards and looked sharp through 4 innings, giving up only 1 run on 3 hits. But then the top of the 5th happened. Ozzie Albies came up to the plate for the third time and hit a home run.

Albies had been having a solid season, with a 117 wRC+ and .354 wOBA. This smash knocked Hudson out of the ballgame, and Tyler Webb struck out Freddie Freeman to end the inning. The Cardinals came out of it down 1. They did end up winning the game, but this could’ve easily ended their season, as they were down 2-1 in the series.

If the Cardinals had brought in a reliever to see hitters for the first time, this whole situation may have been different, and they very well could have held a 1-run lead. This led me to the question: How can teams objectively decide when to pull a pitcher out of a ballgame and which pitcher to bring in? I wanted to find out.


The best way to do this, I felt, was to build a model that could predict the wOBA of each plate appearance, giving teams a way to estimate what the outcome of a plate appearance will be. The model that fit best for this is a linear mixed model. It is very similar to linear regression, but it is able to adjust coefficients based on different factors, like times through the order or platoon matchups. I used four variables: the two just mentioned are the random effects, and pitcher wOBA and batter wOBA are the fixed effects. Here’s how the coefficients change based on the random factors:

These may be hard to comprehend at first, but they just show how much wOBA is gained or lost based on certain factors. When a left-handed pitcher faces a lefty hitter, wOBA goes down by .023, favoring the pitcher. The opposite is true when a left-handed pitcher faces a righty hitter: wOBA goes up by .014, favoring hitters. As you can see, in the 1st time through the order, pitchers are favored, but in the 2nd and 3rd, hitters gain an edge. What may be surprising is that pitchers regain an advantage in the 4th time through. This isn’t a real effect. Typically, if a pitcher goes through the order 4 times, he is either great, having a good game, or both. Small sample size is at work here, and if more pitchers got to the 4th time, this would likely swing back into the batters’ favor.
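To make the structure concrete, here is a minimal Python sketch of how a prediction of this kind could be assembled: a baseline from pitcher and batter wOBA, plus an adjustment for the platoon matchup and one for times through the order. This is not the fitted model. Only the two lefty-pitcher platoon adjustments (-.023 and .014) come from the numbers above; the intercept, the slopes, the righty-pitcher adjustments, and the individual times-through-the-order values are hypothetical placeholders, and the function name `predict_woba` is made up for illustration.

```python
# Sketch of an additive wOBA prediction with matchup adjustments.
# Only the two lefty-pitcher platoon values come from the article;
# every other number here is a hypothetical placeholder.

PLATOON_ADJ = {
    ("L", "L"): -0.023,  # article: lefty pitcher vs. lefty hitter
    ("L", "R"): 0.014,   # article: lefty pitcher vs. righty hitter
    ("R", "R"): -0.010,  # hypothetical
    ("R", "L"): 0.010,   # hypothetical
}

# Hypothetical times-through-the-order adjustments. The signs follow the
# article (pitchers favored the 1st and 4th time, hitters the 2nd and 3rd).
TTO_ADJ = {1: -0.010, 2: 0.003, 3: 0.007, 4: -0.005}

# Hypothetical fixed-effect terms, chosen so that two .320 players
# produce a .320 baseline before any adjustment.
INTERCEPT = -0.320
PITCHER_SLOPE = 1.0
BATTER_SLOPE = 1.0


def predict_woba(pitcher_woba, batter_woba, pitcher_hand, batter_hand, tto):
    """Return the baseline plus the platoon and times-through-order shifts."""
    baseline = (INTERCEPT
                + PITCHER_SLOPE * pitcher_woba
                + BATTER_SLOPE * batter_woba)
    return baseline + PLATOON_ADJ[(pitcher_hand, batter_hand)] + TTO_ADJ[tto]


# Lefty pitcher, lefty hitter, first time through, both players at .320.
print(f"{predict_woba(0.320, 0.320, 'L', 'L', 1):.3f}")  # 0.287
```

The point of the sketch is only that the matchup factors shift the prediction up or down by a fixed amount, regardless of who the two players are.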

Here’s how pitcher and hitter wOBA play a factor into the predictions:

It appears that they are about even in predicting performance, meaning that a great pitcher doesn’t have an advantage over a great hitter. This is interesting, as it cuts against the old saying that “good pitching beats good hitting.” Now that we’ve covered the specifics of the model, let’s move on to what really matters: does it work?


I decided to get rid of plate appearances with predicted wOBA values of more than .600, as most of these involved a position player pitching or extremely small sample sizes for either player that skewed the results. I didn’t do the same for the low outliers because many of those matchups actually looked reasonable. Jeff Mathis against Justin Verlander in the first time through deserves a very low predicted wOBA, and the model gives it .144.


Obviously, there’s no point in having a model if it doesn’t do anything, so we need to test this thing out! The way to do this was to partition the data, so I split 80% of the data into a training set and 20% into a test set. 
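For readers who want to see what an 80/20 partition looks like in practice, here is a small sketch using only the Python standard library. The function name `train_test_split`, the fixed seed, and the stand-in list of 1,000 row ids are all assumptions for illustration; the actual split may have been done with different tooling.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in data: 1,000 plate-appearance ids instead of real rows.
plate_appearances = list(range(1000))
train, test = train_test_split(plate_appearances)
print(len(train), len(test))  # 800 200
```

Shuffling before splitting matters here, since plate appearances pulled in date order would otherwise put the end of the season entirely in the test set.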

I started by fitting a model on my training set so that I could evaluate it on my test set. I ran it, and it gave me an RMSE of 0.534, but what exactly does that mean? Roughly speaking, it means a typical miss by my model is about .534 wOBA (RMSE weights large misses more heavily than a plain average would). That doesn’t sound very good, but remember, the actual result of a plate appearance can only take a handful of set wOBA values from 0 to 2.1, with nothing in between, so the model only gets a perfect prediction if it lands exactly on one of those values. For a walk, it’d have to predict a .690 wOBA for that plate appearance to be perfect, which wouldn’t happen often unless there’s an extremely small sample size.

I also needed to take into account the standard deviation of wOBA in my dataset, which was 0.539. If the RMSE was also 0.539, that would mean the model was doing no better than spitting out the mean wOBA every time. The two are very close together, which means this model definitely isn’t perfect, but it can still show some results. Also, with this model, I wanted to estimate results in plate appearances based on recent performance alone and not based on sequences that could happen within a certain plate appearance. Remember, this is trying to predict that appearance before it happens, so it wouldn’t make any sense to include something that happens during it.
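The comparison between RMSE and standard deviation can be checked with a few lines of Python. The sketch below shows that always predicting the mean gives an RMSE equal to the (population) standard deviation, and then turns the two reported numbers into a rough share of variance explained. The helper names and the five-outcome toy sample are assumptions for illustration, and treating the dataset's 0.539 as the test-set baseline is an approximation.

```python
import math
import statistics


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mean_baseline_rmse(actual):
    """RMSE of a 'model' that predicts the mean for every plate appearance."""
    mean = sum(actual) / len(actual)
    return rmse(actual, [mean] * len(actual))


def variance_explained(model_rmse, baseline_rmse):
    """Share of squared error removed relative to the mean baseline."""
    return 1 - (model_rmse / baseline_rmse) ** 2


# Toy outcomes: three outs, a walk (.690), and a 2.1 event.
toy = [0.0, 0.0, 0.69, 0.0, 2.1]
print(math.isclose(mean_baseline_rmse(toy), statistics.pstdev(toy)))  # True

# Using the RMSE (0.534) and standard deviation (0.539) from the article.
print(round(variance_explained(0.534, 0.539), 3))  # 0.018
```

Under those assumptions, the model removes a little under 2% of the squared error compared with predicting the mean, which fits the description of a model that is far from perfect but still carries some signal.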

Real Game Application

Let’s go back to the original plate appearance in the NLDS. As stated, Dakota Hudson really wasn’t looking bad up until that Albies homer: Dansby Swanson hit a ball 66.2 MPH and managed to get a double because of a bad hop off of 3rd base, Adam Duvall hit a hard ground ball that turned into an error, and Ronald Acuna Jr. hit an 84 MPH lineout. However, it was only Swanson’s and Duvall’s second time through the order, which may have led to those results. For Acuna and Albies, it was the 3rd time through, which typically ends badly for the pitcher.

Now how bad were those two matchups? Well, my model will tell you. Acuna had a predicted wOBA of .387 and Albies had a predicted wOBA of .366. Both are very high, which shows that a bad result was more likely for the pitcher than in the 1st time through, when the predictions were .370 for Acuna and .349 for Albies.

Now, what would have happened if the Cardinals had decided to bring in a reliever? Let’s go with their best reliever last year, Giovanny Gallegos. It would have been the Braves’ first time seeing him, and he was very effective last year. This leads to predicted wOBAs of .299 for Acuna and .278 for Albies. That’s below average and WAY below the original predictions with Hudson, roughly .09 wOBA lower for both. The Cardinals did end up bringing in Tyler Webb to end the inning, but the damage was already done.

This information would be very useful for Mike Shildt in that situation as he could make an objective decision on how to manage his bullpen. 


As I was writing this, I was watching Game 2 of the Dodgers and Brewers series, and I spotted another example of this happening. Brandon Woodruff had started to struggle, with 3 out of 4 batted balls in the 5th inning having exit velocities of at least 89 mph. That was only his second time through the order. He then had to face the leadoff man, Mookie Betts, for the third time, and Betts proceeded to lace a double.

It seems obvious, at least to me, that Woodruff should’ve been taken out before this at bat, but that didn’t happen. Josh Hader came in afterwards to get the final out, but similar to the Hudson example, the damage was already done. So what if Hader had been brought in to face Betts? Well, according to my model, there wouldn’t have been a huge difference: the predicted wOBA for Betts was .318 against Hader and .323 against Woodruff.

So why wasn’t there much of a difference? It goes back to platoon splits. Betts is a righty and Hader is a lefty, so the matchup favors Betts, and that’s why the prediction didn’t move much. Also, we can’t forget that Betts is an extremely good hitter, so that definitely plays into this as well.

How would this change if the Brewers had had the most dominant right-handed reliever in 2020, Devin Williams, available? In 2020, Williams had a 0.33 ERA and a .161 wOBA against. So if Betts versus Williams had happened, everything would have favored Williams: it’d be the first time through the order, a righty-righty matchup, and he pitched at an elite level this year. All of these factors lead to a predicted wOBA in that matchup of .208 (!), which is much, much lower than with either Hader or Woodruff.
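Comparing the options for a single matchup boils down to sorting the candidates by predicted wOBA. The sketch below does that for the Betts plate appearance using the three predictions quoted above; the function name `rank_options` and the dictionary layout are assumptions for illustration, not part of the model itself.

```python
def rank_options(predicted_woba_by_pitcher):
    """Sort (pitcher, predicted wOBA) pairs from best to worst for the defense."""
    return sorted(predicted_woba_by_pitcher.items(), key=lambda item: item[1])


# Predicted wOBA for Mookie Betts against each option, from the article.
betts_options = {
    "Woodruff (3rd time through)": 0.323,
    "Hader": 0.318,
    "Williams": 0.208,
}

ranked = rank_options(betts_options)
best_name, best_woba = ranked[0]
stay_in = betts_options["Woodruff (3rd time through)"]

for name, woba in ranked:
    # Negative numbers mean the option is better than leaving the starter in.
    print(f"{name}: {woba:.3f} ({woba - stay_in:+.3f} vs. leaving Woodruff in)")
```

The same function works for any set of relievers a manager wants to compare, as long as a prediction is available for each one.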

This not only shows how good Williams was, but also shows the value of having a reliever like him in your bullpen. Instead of trailing 3-0, the Brewers could have been down by only one while they were facing elimination.

Other Use

One other use of this model, to a lesser extent, besides deciding when to pull a pitcher or which pitcher to bring in, is that a team could figure out whether it wants to pinch hit. This really wouldn’t benefit a team too much, since pinch hitters are seeing a pitcher for the first time and they’re cold, but it’s definitely a way the model could be utilized.


Like I said earlier, this model isn’t perfect. It does not exactly predict what event will happen after a plate appearance, but, to be fair, no model can. When I validated this, the RMSE I got was okay, but it was the best this model could do with the information that I wanted to use.

Another thing that should be said is that there are pitchers who are “opposite” platoon guys, so if they’re a lefty, they can pitch just as well or better against righties than they can against lefties. My model doesn’t really take this into account, as it takes all of the data together and bases its adjustment to the coefficients on the overall averages.

Finally, like I showed earlier, in the 4th time through the order, the model made the adjustment that the pitchers have the advantage, even though this isn’t true. So if a hitter sees a pitcher for the 4th time, it won’t really be an accurate estimation of how that plate appearance is going to go. 


This model gives more support to the claim that managers should be a little careful with their pitchers in the third time through the order, as they’re more likely to be hit. I believe this model can definitely help teams make decisions on pitching changes and compare their options in order to come up with the most objective decision possible.

Special thanks to Ethan Moore for giving me tips and pushing me in the right direction for this project!

All stats from Baseball Savant
