In 1994, Tony Gwynn led the major leagues with a .394 batting average – the closest mark to .400 since Ted Williams. It was an incredible season, but there is a minor asterisk attached: he had only 419 at bats. Thanks to a strike-shortened season, Gwynn never got the chance to push his average over .400, nor did it have time to regress into the .380s.
While Gwynn’s 1994 batting average was the highest since Williams, George Brett’s 1980 season was arguably more impressive. He carried a .400 average as late as game 148 but ultimately fell short with a season mark of .390. Had the 1980 season been cut off like 1994’s, Brett would have been the last .400 hitter.
What is the purpose of this miniature history lesson? To show how much sample size factors into record breaking. To qualify for rate statistics, a batter needs 3.1 plate appearances per team game and a pitcher needs one inning pitched per team game. Longer seasons breed counting-statistic absurdity (home runs, RBIs, runs), while shorter seasons breed rate-statistic absurdity (batting average, ERA, slugging percentage).
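To see how far the bar drops, here is a quick sketch that computes the qualification thresholds for a full 162-game schedule, the 1994 Padres’ 117 games, and a 60-game season. The 60-game length and the round-half-up handling of fractional plate appearances are assumptions for illustration.

```python
# Rate-stat qualification rules from the text:
#   batters:  3.1 plate appearances (PA) per team game
#   pitchers: 1 inning pitched (IP) per team game
# Assumption: a fractional PA threshold rounds half up to a whole PA.

def qualifying_pa(team_games: int) -> int:
    """Minimum PA to qualify: round(3.1 * games), done in integer math."""
    return (31 * team_games + 5) // 10

def qualifying_ip(team_games: int) -> int:
    """Minimum IP to qualify: one inning per team game."""
    return team_games

for games in (162, 117, 60):
    print(f"{games:>3} games: {qualifying_pa(games)} PA for batters, "
          f"{qualifying_ip(games)} IP for pitchers")
```

For 162 games this gives the familiar 502 PA; a 60-game season drops the batting bar to 186 PA and the pitching bar to 60 IP.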
If Gwynn had broken the .400 barrier, there would have been an asterisk. As impressive as his season was, he would have benefited from abnormal circumstances, with his team playing only 117 games. From 1992 to 1996, Gwynn batted .357. Treating that as his true skill level and his 419 at bats in 1994 as binomial trials, there was a 6.5% chance his batting average would be .394 or higher. That probability is low, but by no means out of the realm of possibility. It is entirely possible he was simply a phenomenal player having a very lucky season.
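The 6.5% figure can be checked with a binomial tail sum. This is a minimal sketch, assuming each at bat is an independent trial with a .357 hit probability, and counting any average that rounds to .394 as a success (165 hits is the smallest such total, since 165/419 = .3938).

```python
import math

def binomial_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

AT_BATS = 419        # Gwynn's 1994 at bats (from the text)
SKILL = 0.357        # his 1992-1996 average, treated as true talent
HITS_FOR_394 = 165   # smallest hit total whose average rounds to .394

chance = binomial_tail(AT_BATS, SKILL, HITS_FOR_394)
print(f"P(average >= .394) = {chance:.1%}")
```

The result comes out close to the 6.5% quoted above.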
That was a 117-game season. 2020 is currently slated for a season roughly half that length. Whatever extra randomness crept into the shortened 1994 season will be amplified under the current plan. Fortunately – for the integrity of the record books and the statistical landmarks baseball holds dear – the 2020 brand of baseball is not prone to breaking the records people care about.
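To put a rough number on that amplification: the standard deviation of a binomial batting average is sqrt(p(1 - p)/n), so it grows as at bats shrink. The sketch below assumes Gwynn’s .357 skill and scales his 1994 pace (419 at bats in 117 games) to other season lengths; both the constant pace and the 60-game length are illustrative assumptions.

```python
import math

def average_sd(skill: float, at_bats: float) -> float:
    """Standard deviation of a binomial batting average: sqrt(p(1 - p) / n)."""
    return math.sqrt(skill * (1 - skill) / at_bats)

SKILL = 0.357
AB_PER_GAME = 419 / 117   # Gwynn's 1994 pace, assumed constant

for games in (162, 117, 60):
    ab = AB_PER_GAME * games
    print(f"{games:>3} games: ~{ab:.0f} AB, SD of average = {average_sd(SKILL, ab):.3f}")
```

Relative to 1994, a 60-game season widens the spread by a factor of sqrt(117/60), about 1.4: roughly .033 instead of .023 around the same true-talent average.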
With regard to batting average, expected batting average (xBA) indicates the top hitters have skill levels between .300 and .330 – not quite in the .350s like Gwynn. There are simply more strikeouts today than there were in 1994, and with the expanded use of the bullpen, there are fewer balls in play than in Gwynn’s time. Balls in play are volatile events; strikeouts are not. Assuming xBA represents skill and treating at bats as binomial trials, there was roughly a 0.012% chance of a .400 hitter in 2019. In 2020, I estimate that chance at 4.01% – more than 300 times as likely!
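A league-wide figure like this combines per-hitter chances: each hitter’s probability of batting .400 is a binomial tail at his xBA, and the chance that at least one hitter gets there is 1 minus the product of everyone falling short. This is a sketch under loud assumptions: the xBA values are hypothetical placeholders in the .300 to .330 range described above (not real Statcast data), hitters are treated as independent, and 550 and 210 at bats stand in for full and 60-game seasons.

```python
import math

def chance_of_400(skill: float, at_bats: int) -> float:
    """P(a true-talent `skill` hitter bats .400+ over `at_bats`), binomial model."""
    hits_needed = (4 * at_bats + 9) // 10   # ceil(0.4 * at_bats) in integer math
    return sum(
        math.comb(at_bats, h) * skill**h * (1 - skill) ** (at_bats - h)
        for h in range(hits_needed, at_bats + 1)
    )

def chance_any(probs) -> float:
    """P(at least one hitter reaches .400), assuming hitters are independent."""
    miss_all = 1.0
    for prob in probs:
        miss_all *= 1 - prob
    return 1 - miss_all

# HYPOTHETICAL xBA skill levels for a league's top hitters (illustrative only).
TOP_XBA = [0.330, 0.325, 0.320, 0.315, 0.310, 0.305, 0.300]

for label, at_bats in (("full season", 550), ("60 games", 210)):
    probs = [chance_of_400(skill, at_bats) for skill in TOP_XBA]
    print(f"{label}: P(any .400 hitter) = {chance_any(probs):.4%}")
```

It will not reproduce the 0.012% and 4.01% figures, which came from actual xBA values, but it shows why the shorter season multiplies the odds. The same product also shows why one top hitter sitting out lowers the league-wide chance: his term simply drops out.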
One key assumption is that the league’s batting-average skill level does not change from 2019 to 2020. Usually that would be relatively safe, but with players dropping like flies due to COVID, it may not be. The top hitters have between a 0.01% and 0.08% chance of reaching .400, so having someone like Mike Trout sit out reduces the overall chances. However, the chances of a .400 hitter increase if the skill level of pitching drops more than the skill level of hitting. If a few elite pitchers sit and the best hitters do not, the cumulative chances could creep upward.
There is an off chance that we will see the sacred .400 mark eclipsed, but this is simply not the era for it. As mentioned, the modern game is reliever-heavy and prone to walks, strikeouts, and home runs. The record for strikeouts per nine innings (K/9), which Gerrit Cole broke last year, is a strong candidate to fall again. Justin Verlander also quietly posted one of the best walks plus hits per inning pitched (WHIP) marks in history. While the conditions of 2020 make these and many other records unusually breakable, they do not carry the same weight as better-known statistics.
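Both of these are simple ratios, K/9 = 9 × K / IP and WHIP = (BB + H) / IP, but innings are written in box-score notation where “.1” and “.2” mean one and two outs, not tenths. Here is a minimal sketch that handles that convention exactly with fractions; the pitcher line in the example is hypothetical, chosen to land on a 15.0 K/9 pace.

```python
from fractions import Fraction

def innings(ip: str) -> Fraction:
    """Convert box-score innings ('212.1' = 212 innings plus one out) to exact innings."""
    whole, _, outs = ip.partition(".")
    outs_recorded = int(outs or 0)
    if outs_recorded > 2:
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) + Fraction(outs_recorded, 3)

def k_per_9(strikeouts: int, ip: str) -> float:
    """Strikeouts per nine innings: 9 * K / IP."""
    return float(9 * strikeouts / innings(ip))

def whip(walks: int, hits: int, ip: str) -> float:
    """Walks plus hits per inning pitched: (BB + H) / IP."""
    return float((walks + hits) / innings(ip))

# Hypothetical 60-inning line, not a real pitcher's stats.
print(k_per_9(100, "60.0"))   # 15.0
print(whip(12, 36, "60.0"))   # 0.8
```

Using exact fractions avoids the common mistake of reading “212.1” as 212.1 innings instead of 212⅓.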
2020 has incredible potential for producing unbreakable records. A small sample accentuating a historical trend is the perfect storm for statistical absurdity. In the current climate, that lies with starting pitchers accumulating strikeouts at high rates, and relievers… relieving?
Although relief pitchers are evaluated by the same statistics as starting pitchers, they have no comparable qualification line. No reliever meets the one-inning-per-team-game line, so relievers are simply excluded from those record books. It makes sense – for the longest time they were an afterthought. It also would not make sense to lower the pitcher qualification line far enough that Kirby Yates wins the ERA title. Having grown into a game not built for them, relievers are stuck chasing the counting statistics of holds and saves to get into the record books. Even if a reliever allowed only one run this year, it would break no record.
With that, 2020 will likely pass with no huge records broken and no landmarks established. The numbers will no doubt be wild – somebody will probably bat .360+ – but it will not give the thrill of the Sosa-McGwire home run chase. We may see batting averages and ERAs flirt with historical levels, but they will regress by the end of September.
Gerrit Cole could flirt with an absurd 15.0 K/9, but the metric has never been enshrined as important; to most fans it would seem contextless. Baseball people like counting: 73 home runs, 130 steals, 262 hits. Few rate statistics catch the public’s eye the same way.
Now that I am done ranting, let me inject a bit more optimism into the outlook for the 2020 season. As is obvious at this point, this will be an abnormal season. Maybe starting pitchers who would normally pace themselves decide the season is short enough to go all out, and suddenly they sustain an unexpectedly low ERA. This is unlikely, given that the current climate already encourages max-intent pitching, but the possibility is there.
One assumption that could prove faulty is that it is fair to treat events as binomials or multinomials. Many of the books I have read operate safely under this assumption, and it works well, at least for full-length seasons. However, it fails to account for the cat-and-mouse game the batter and pitcher play. If a player has changed significantly since he last played, his opponents are likely working from an outdated scouting report. This can be observed with league newcomers like Aaron Judge in 2017. Teams struggle to gather the sample sizes they need to learn how to attack him; then he struggles as he figures out how to counter-adjust. In an abridged season, this natural sequence of events may not have time to play out, and a player could over-perform not because of a lack of regression, but because of a lack of scouting.
If baseball does not get shut down due to COVID, this will definitely be the weirdest season on record. For many, that will mean not going to games. For those who enjoy the numbers intertwined with the game’s history, it will mean enjoying the most absurd case study in MLB history. Enjoy the randomness!