Improving Pythagorean Winning %

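The Pythagorean Win Percentage for baseball was created by Bill James to estimate a team’s expected winning percentage from its runs scored and runs allowed, so that it can be compared with the team’s actual winning percentage. His initial equation was:

Win % = RS^2 / (RS^2 + RA^2), where RS is runs scored and RA is runs allowed.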

Over the years this has held up, with slight changes to the exponent to get a higher correlation between expected wins and actual wins. Baseball Reference currently uses an exponent of 1.83, so that is what I used as a starting point. I noticed that, a lot of the time, a team whose record in 1-run games strays far from .500 ends up with a bigger gap between its expected and actual results. To adjust for this, I added a term I call “1-run variance” to Baseball Reference’s current equation.
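As a rough sketch of the baseline formula, here is the Pythagorean calculation with the exponent left as a parameter. The function name and argument names are my own for illustration; 1.83 is the Baseball Reference exponent mentioned above, and passing 2 gives the original Bill James version.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=1.83):
    """Expected winning percentage from runs scored and runs allowed.

    Hypothetical helper: the default exponent of 1.83 follows the
    Baseball Reference value quoted in the article.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("at least one run total must be positive")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)
```

A team that scores exactly as many runs as it allows comes out at .500 no matter which exponent is used.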

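The new equation is:

Updated Expected Win % = RS^1.83 / (RS^1.83 + RA^1.83) + (1-run Win % - 0.500) * (1-run Games / Total Games)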

Here is an example of calculating the “1-run variance”:

Take the 2020 Tampa Bay Rays, who finished with a .667 winning percentage, well above the .600 expected winning percentage listed on Baseball Reference.

As expected, this team’s overperformance relative to its expected results comes with a very high winning percentage in 1-run games (.737).

To calculate the “1-run variance” term, I used the following steps:

Step 1) Start with the team’s winning % in 1-run games (0.737)

Step 2) Subtract the average winning % in 1-run games. (0.737-0.500) = 0.237

Step 3) Take the team’s total number of 1-run games and divide by total games. (19/60) = 0.317

Step 4) Multiply the results of Step 2 and Step 3 together. (0.237*0.317) = 0.075

Step 5) Add the result of Step 4 to the above expected winning % (0.600+0.075) = 0.675
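The five steps above can be sketched in a few lines of Python. The function names are hypothetical, and the 14-5 record in 1-run games is not stated directly in the article; it is what a .737 mark over 19 games implies.

```python
def one_run_variance(one_run_wins, one_run_losses, total_games, league_avg=0.500):
    """Steps 1-4: 1-run win % above average, weighted by share of 1-run games."""
    one_run_games = one_run_wins + one_run_losses
    if total_games <= 0:
        raise ValueError("total_games must be positive")
    if one_run_games == 0:
        return 0.0  # no 1-run games, so no adjustment
    one_run_win_pct = one_run_wins / one_run_games      # Step 1
    above_average = one_run_win_pct - league_avg        # Step 2
    share_of_games = one_run_games / total_games        # Step 3
    return above_average * share_of_games               # Step 4


def updated_expected_win_pct(expected_win_pct, one_run_wins, one_run_losses, total_games):
    """Step 5: add the 1-run variance to the existing expected win %."""
    return expected_win_pct + one_run_variance(one_run_wins, one_run_losses, total_games)


# 2020 Rays: .600 expected win %, 19 one-run games out of 60,
# assumed 14-5 in those games (implied by the .737 figure).
rays = updated_expected_win_pct(0.600, 14, 5, 60)
print(f"{rays:.3f}")  # 0.675
```

Working from win and loss counts instead of the rounded .737 avoids carrying rounding error through the steps, and it lands on the same 0.675.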

Then you repeat this for every team and see how the new calculation stacks up against the current one.

I tested my changed equation on the past 10 full seasons, plus last season and this season, to see how its correlation with actual results compared with that of the current equation.

Other factors also keep expected winning % and actual winning % from being perfectly correlated, but I am confident that adding a factor for 1-run games produces higher correlations between predictions and results.

Across the years I looked at, the current equation has an average correlation (R^2) of 0.875; with my change, it goes up by 0.071 to 0.946.
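For anyone who wants to run the same kind of comparison, here is a minimal sketch of one way to get an R^2 for a season. It assumes R^2 means the squared Pearson correlation between each team's predicted and actual winning percentage; the function name is hypothetical, and this is not necessarily the exact calculation behind the numbers above.

```python
from math import fsum


def r_squared(predicted, actual):
    """Squared Pearson correlation between two equal-length lists of win %."""
    if len(predicted) != len(actual):
        raise ValueError("both lists must have one entry per team")
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two teams")
    mean_p = fsum(predicted) / n
    mean_a = fsum(actual) / n
    # Sums of squared deviations and cross-deviations from the means.
    cov = fsum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = fsum((p - mean_p) ** 2 for p in predicted)
    var_a = fsum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("win percentages must not all be identical")
    return (cov * cov) / (var_p * var_a)
```

You would call it once per season with the current expected win % and once with the updated expected win %, each paired with actual win %, then average the results across seasons.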

Many people have said that the results of one-run games are mostly random, but I do not believe that one bit. I believe better teams will find a way to win more 1-run games than worse teams, whether it is a good bullpen holding the lead in a close game or a good offense coming from behind to win it. Interestingly, Bill James concluded that “winning or losing close games is luck”, even though he did add that “it would be (his) opinion that it is probably not all luck”. To push back on this, the data shows clearly that it is not all luck, and that luck has less to do with it than many think.

The “Current” column is each team’s current winning percentage, next to their MLB ranking. The “Expected” column is the Pythagorean win % for each team without my updates. The “Updated Expected” column is the win % with my updates. The green boxes are predictions that are within 0.010 of the current percentages, and the red boxes are those more than 0.030 away from current. The “Expected” column has just 3 green boxes and 14 red boxes, while my column has 9 green boxes and 5 red boxes.
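The green and red rule can be written as a small helper. The function name is hypothetical, and treating a gap of exactly 0.010 as green is my assumption, since the description does not say how ties are handled.

```python
def box_color(predicted, current, green_within=0.010, red_over=0.030):
    """Classify a prediction by how far it is from the current win %."""
    gap = abs(predicted - current)
    if gap <= green_within:
        return "green"  # within 0.010 of current
    if gap > red_over:
        return "red"    # more than 0.030 away from current
    return "none"       # in between, so no highlight


# Reusing the 2020 Rays numbers from the worked example as an illustration:
print(box_color(0.600, 0.667))  # red
print(box_color(0.675, 0.667))  # green
```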

This added calculation to Pythagorean Win % is useful for getting estimates that are closer to teams’ current win percentages. It also means we don’t have to call every difference between expected and actual results “luck”.