What if I told you that the slash lines .147/.253/.365 and .237/.343/.455 weren’t all that different? Everyone would prefer the hitter with the second one, right? If you are a rational decision maker, the answer to that question should most definitely be yes. The first one is Gary Sanchez’s 2020 season, and the second one is Gary Sanchez’s 2020 season adjusted to his expected batting average on balls in play (xBABIP) instead of his actual batting average on balls in play (BABIP). Sanchez really struggled in that category this year, but his expected BABIP makes him out to be significantly better.

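The average MLB player’s BABIP is currently higher than it was fifty years ago. BABIP is calculated using the following equation:

$$\text{BABIP} = \frac{H - HR}{AB - K - HR + SF}$$

where H is hits, HR is home runs, AB is at bats, K is strikeouts, and SF is sacrifice flies.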
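For reference, here is a minimal Python sketch of the standard BABIP formula, (H - HR) / (AB - K - HR + SF). The function name and the sample season totals are made up for illustration; they are not any real player’s stats.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    # Home runs are removed from both the hits and the denominator,
    # because a home run never gives the defense a chance to make a play.
    return (h - hr) / balls_in_play

# Hypothetical season totals, not a real player's line.
print(round(babip(h=150, hr=30, ab=550, k=120, sf=5), 3))  # 0.296
```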

Even with all the advanced batted-ball data teams have on where hitters tend to hit the ball against certain pitchers, they have not been able to keep BABIP from rising. This is primarily because of the increase in exit velocity and hard-hit percentage. On top of that, a lot of the softly hit balls that would likely have been outs (and lowered BABIP) have turned into strikeouts instead, and since strikeouts are not balls in play, those outs drop out of the BABIP calculation entirely.
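A quick bit of hypothetical arithmetic shows why trading weak-contact outs for strikeouts pushes BABIP up: the hits stay the same, but the outs leave the denominator. The 400 balls in play and 120 hits below are invented round numbers.

```python
def babip_from_bip(hits_in_play: int, balls_in_play: int) -> float:
    """BABIP computed directly from hits on balls in play and balls in play."""
    return hits_in_play / balls_in_play

# Hypothetical hitter: 400 balls in play, 120 of them hits.
before = babip_from_bip(120, 400)
# Suppose 20 of his weak-contact outs become strikeouts instead:
# hits are unchanged, but those 20 outs are no longer balls in play.
after = babip_from_bip(120, 400 - 20)
print(f"{before:.3f} -> {after:.3f}")  # 0.300 -> 0.316
```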

Some of the factors that correlate well with BABIP are pretty obvious. For example, players who hit more ground balls and (especially) line drives will have a higher BABIP because those are more likely to be hits. In the last ten years, the league batting average on line drives has been well over .600, compared with close to .250 on ground balls and around .200 on fly balls. Exit velocity matters too: the harder you hit the ball, the more likely it is to become a hit. In 2019, balls hit below 90 mph went for hits 22.5% of the time, while balls hit 90 mph or harder went for hits 48.5% of the time. One last factor is speed. Fast players can beat out more ground balls and can force infielders to play in, which decreases their range. As you would imagine, speed doesn’t have an exceptionally big impact, but over time it does show up.
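As a rough illustration of how batted-ball mix drives BABIP, the sketch below weights the approximate league rates quoted above (.600 on line drives, .250 on grounders, .200 on fly balls) by a hitter’s share of each type. Treating those batting averages as hit rates on balls in play is an assumption (the fly-ball figure may include home runs, which BABIP excludes), and both hitter profiles are hypothetical.

```python
# Approximate league hit rates by batted-ball type, from the figures above.
# Using them as hit rates on balls in play is a simplifying assumption.
HIT_RATE = {"line_drive": 0.600, "ground_ball": 0.250, "fly_ball": 0.200}

def mix_babip(mix: dict[str, float]) -> float:
    """Weighted average of hit rates, given each type's share of balls in play."""
    total = sum(mix.values())
    return sum(HIT_RATE[kind] * share for kind, share in mix.items()) / total

# Two hypothetical hitters with different batted-ball profiles.
line_drive_hitter = {"line_drive": 0.25, "ground_ball": 0.45, "fly_ball": 0.30}
fly_ball_hitter = {"line_drive": 0.18, "ground_ball": 0.35, "fly_ball": 0.47}

print(round(mix_babip(line_drive_hitter), 3))
print(round(mix_babip(fly_ball_hitter), 3))
```

The line-drive-heavy profile lands around .32 and the fly-ball-heavy one around .29, which is the direction you would expect; neither number should be read as a real projection.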

Most would also think that pull percentage and opposite-field percentage matter, which is true, but going one step further helps even more. It turns out that ground ball while shifted percentage correlates the best with BABIP and is the most useful when trying to find someone’s true, or expected, BABIP.
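Here is one way to compute a ground-ball-while-shifted rate from batted-ball records. The `BattedBall` record type is hypothetical, and using all balls in play as the denominator is an assumption; published xBABIP models may define the stat differently.

```python
from dataclasses import dataclass

@dataclass
class BattedBall:
    # Hypothetical record format: batted-ball type and whether the
    # defense was shifted when the ball was put in play.
    kind: str  # e.g. "ground_ball", "line_drive", "fly_ball"
    shifted: bool

def gb_shifted_pct(balls_in_play: list[BattedBall]) -> float:
    """Share of balls in play that were ground balls hit into a shift (assumed definition)."""
    if not balls_in_play:
        return 0.0
    count = sum(1 for b in balls_in_play if b.kind == "ground_ball" and b.shifted)
    return count / len(balls_in_play)

sample = [
    BattedBall("ground_ball", True),
    BattedBall("ground_ball", True),
    BattedBall("ground_ball", False),
    BattedBall("line_drive", True),
    BattedBall("fly_ball", False),
]
print(gb_shifted_pct(sample))  # 0.4
```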

One main point of a shift is to turn batted balls that would have been hits into outs. John Moore has recently written two articles on shift effectiveness for BaseballCloud and concluded that shifts are undoubtedly having an impact on BABIP for ground balls. When you look at fly balls, however, there seems to be the opposite effect. Since 2017, when there has been any kind of outfield shift whatsoever (strategic, three outfielders to one side of second base, or four outfielders), hitters have had a higher batting average on fly balls. Outfield shifts are often newer than infield shifts, and the strategy doesn’t seem to have been perfected yet, since many teams still aren’t using them.
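To compare how one batted-ball type fares with and without a shift, you can split the batting average on that type by shift status. This sketch uses a hypothetical `BattedBall` record and a tiny invented sample; it is not John Moore’s data or method.

```python
from dataclasses import dataclass

@dataclass
class BattedBall:
    # Hypothetical record: batted-ball type, whether a shift was on,
    # and whether the ball went for a hit.
    kind: str
    shifted: bool
    hit: bool

def avg_by_shift(balls: list[BattedBall], kind: str) -> dict[bool, float]:
    """Batting average on one batted-ball type, keyed by shift status."""
    result = {}
    for shifted in (True, False):
        group = [b for b in balls if b.kind == kind and b.shifted == shifted]
        if group:  # skip a shift status with no batted balls of this type
            result[shifted] = sum(b.hit for b in group) / len(group)
    return result

# Invented sample: 5 fly balls against a shift, 5 without one.
sample = (
    [BattedBall("fly_ball", True, h) for h in (True, True, False, False, False)]
    + [BattedBall("fly_ball", False, h) for h in (True, False, False, False, False)]
)
print(avg_by_shift(sample, "fly_ball"))  # {True: 0.4, False: 0.2}
```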

It is important to look at what type of contact a hitter makes when pulling the ball or hitting it the other way, and at how the defense prepares for and reacts to that hitter, rather than just looking at all batted balls in general without context.

So BABIP isn’t purely random; some players consistently post higher BABIPs throughout their careers. It is not stable, however, and it is a major source of variation in player performance. A player can easily pick up extra bloop hits or a few perfectly placed ground balls that boost their BABIP, which in turn boosts their other stats. This is why expected BABIP (xBABIP) is important. A good way to predict BABIP is useful for many reasons, but the main one is that it helps you see through the noise and better predict next season’s BABIP.

Mike Podhorzer of FanGraphs built an xBABIP model a couple of years ago that was the first to have an adjusted R-squared above .5. His model takes into account several launch angle measures (line drive percentage, true fly ball percentage, infield fly ball percentage, and ground ball while shifted percentage), hard-hit percentage, and speed. It proved to be more predictive than BABIP alone, which is what makes it useful. People have continued to build on it, and xBABIP measures are better than ever. In this shortened season, a few hitters posted absurd BABIP numbers that don’t seem sustainable, and xBABIP backs that up.
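Since the model is judged by its adjusted R-squared, it helps to see what that number penalizes: adjusted R² shrinks plain R² according to how many predictors were used relative to the number of observations. Below is the standard formula in plain Python with toy inputs; it does not reproduce Podhorzer’s model or data.

```python
def r_squared(y: list[float], y_hat: list[float]) -> float:
    """Plain R-squared: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((actual - pred) ** 2 for actual, pred in zip(y, y_hat))
    ss_tot = sum((actual - mean_y) ** 2 for actual in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(y: list[float], y_hat: list[float], n_predictors: int) -> float:
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(y)
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r_squared(y, y_hat)) * (n - 1) / (n - n_predictors - 1)

# Toy numbers only: the same fit, credited to 1 vs. 3 predictors.
y = [1, 2, 3, 4, 5]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
print(round(r_squared(y, y_hat), 4))              # 0.99
print(round(adjusted_r_squared(y, y_hat, 1), 4))  # 0.9867
print(round(adjusted_r_squared(y, y_hat, 3), 4))  # 0.96
```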

In 2019, some of the hitters who most outperformed their expected BABIPs were Kris Bryant, Bryan Reynolds, and Yoan Moncada. All three have fallen short of their numbers from last year, and except for Reynolds, they are now much closer to their expected numbers. Last year, Reynolds’ BABIP was .058 higher than expected (.387 – .329), but this year he was one of the most unlucky hitters in MLB: his BABIP was .073 lower than expected (.231 – .304).
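The luck figures in this article are simply actual BABIP minus xBABIP. Here is a minimal sketch using Bryan Reynolds’ numbers quoted above; the function name is just for illustration.

```python
def babip_luck(actual: float, expected: float) -> float:
    """Positive: BABIP beat xBABIP (lucky). Negative: fell short (unlucky)."""
    return actual - expected

# Bryan Reynolds' (BABIP, xBABIP) by season, as quoted above.
reynolds = {2019: (0.387, 0.329), 2020: (0.231, 0.304)}
for season, (actual, expected) in reynolds.items():
    print(season, f"{babip_luck(actual, expected):+.3f}")
# 2019 +0.058
# 2020 -0.073
```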

Gary Sanchez and Edwin Encarnacion both had extremely disappointing 2020 seasons. They both struck out far more often than league average (36% of the time for Sanchez, 30% for Encarnacion) and hit under .160. They also both had horrible BABIP numbers: Sanchez finished the season at .159, and Encarnacion wasn’t far behind at .156. Since 1955, there have been just three worse BABIP seasons with at least as many at bats as they had. These guys aren’t the worst hitters of all time (Sanchez, at least, hits the ball very hard), and any time you finish that poorly in anything with as large a sample size as they had, you probably got unlucky. That was the case for both of them. They are exactly the kind of hitters at higher risk of a bad BABIP because of their speed and launch angle patterns (both have had BABIPs below league average over their careers), but neither should be this bad. Using Podhorzer’s xBABIP equation, Encarnacion was expected to have a BABIP of .198 (still really bad), while Sanchez’s should have been around .293, which is just about league average. Even assuming all of Sanchez’s robbed hits were singles, his xBABIP would put him at a .237 batting average with an OPS above league average, even with all the strikeouts.
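The slash-line adjustment can be sketched as follows, under the same assumption that every extra hit implied by xBABIP is a single. The function name and the counting stats in the example are hypothetical round numbers, not Sanchez’s actual line, and this is not necessarily the exact calculation behind the .237/.343/.455 figure.

```python
def xbabip_slash_line(ab, h, hr, k, sf, bb, hbp, total_bases, xbabip):
    """Return (AVG, OBP, SLG) after replacing actual BABIP with xBABIP.

    Assumes every extra hit implied by xBABIP is a single; the number of
    extra hits can be fractional, since it is an expectation.
    """
    balls_in_play = ab - k - hr + sf
    extra_singles = xbabip * balls_in_play - (h - hr)
    new_h = h + extra_singles
    new_tb = total_bases + extra_singles  # each single adds one total base
    avg = new_h / ab
    obp = (new_h + bb + hbp) / (ab + bb + hbp + sf)
    slg = new_tb / ab
    return avg, obp, slg

# Hypothetical hitter: 80 balls in play, only 12 non-HR hits among them.
avg, obp, slg = xbabip_slash_line(
    ab=150, h=22, hr=10, k=60, sf=0, bb=18, hbp=2, total_bases=55, xbabip=0.300
)
print(f"{avg:.3f}/{obp:.3f}/{slg:.3f}")  # 0.227/0.318/0.447
```

In this sketch, AVG and SLG rise by the same amount (extra singles divided by at bats), while OBP rises a little less because its denominator also includes walks, hit-by-pitches, and sacrifice flies.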

You can see how big a difference the gap between BABIP and xBABIP can make in season statistics. Some of the luckiest hitters this year were Marcell Ozuna (.093 higher than expected), Mike Yastrzemski (.087 higher than expected), Willy Adames (.079 higher than expected), and Michael Conforto (.069 higher than expected). All of them had career years in terms of wOBA and batting average. You would expect their numbers to regress toward the mean next year even if their talent and skill stay the same.

On the other hand, besides Sanchez and Bryan Reynolds, several other hitters were robbed of a lot of hits this year. The guys at the top of the list include Christian Yelich (.078 below expected), Nicholas Castellanos (.076 below expected), and Anthony Rizzo (.072 below expected). All three had career-worst numbers in wOBA and batting average. Some other big names right behind them include Matt Olson, Jose Altuve, Cody Bellinger, Carlos Santana, and Javy Baez.
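Ranking hitters by BABIP minus xBABIP is a simple sort. The sketch below uses only the gaps quoted in this article (with Sanchez’s and Encarnacion’s computed from their stated BABIP and xBABIP), so it ranks just these players, not the whole league.

```python
# 2020 BABIP minus xBABIP, taken from the figures quoted in this article.
luck_2020 = {
    "Marcell Ozuna": 0.093,
    "Mike Yastrzemski": 0.087,
    "Willy Adames": 0.079,
    "Michael Conforto": 0.069,
    "Christian Yelich": -0.078,
    "Nicholas Castellanos": -0.076,
    "Bryan Reynolds": -0.073,
    "Anthony Rizzo": -0.072,
    "Gary Sanchez": -0.134,       # .159 - .293
    "Edwin Encarnacion": -0.042,  # .156 - .198
}

# Sort from luckiest (largest positive gap) to unluckiest.
ranked = sorted(luck_2020.items(), key=lambda item: item[1], reverse=True)
luckiest = [name for name, gap in ranked if gap > 0]
unluckiest = [name for name, gap in reversed(ranked) if gap < 0]

print("Luckiest:", ", ".join(luckiest))
print("Unluckiest:", ", ".join(unluckiest))
```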

The difference between Sanchez’s two slash lines is an extreme example of what BABIP luck can do to a player, but BABIP luck is still one of the biggest reasons for large swings in a player’s numbers. Almost all of the big names who had down years in 2020 can be found on the unlucky end of this list, and the 60-game season only adds to the randomness. In many cases, this might be the year to think twice before hopping on or jumping off a bandwagon.
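One way to see why a 60-game season adds randomness is the binomial standard error of an observed BABIP, sqrt(p(1 - p) / n). Treating each ball in play as an independent trial with the same hit probability is a simplification, and the 150 and 400 ball-in-play totals below are hypothetical round numbers for a 60-game and a full season.

```python
import math

def babip_standard_error(true_babip: float, balls_in_play: int) -> float:
    """Binomial standard error of observed BABIP, treating each ball in play
    as an independent trial with the same hit probability (a simplification)."""
    return math.sqrt(true_babip * (1 - true_babip) / balls_in_play)

# Hypothetical ball-in-play totals for a 60-game vs. a 162-game season.
for games, bip in ((60, 150), (162, 400)):
    print(games, round(babip_standard_error(0.300, bip), 3))
```

With a true talent of .300, the standard error comes out around .037 for the short season versus about .023 for the full one, so the same hitter can post noticeably wider swings over 60 games.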
