Dec 4, 2006

This question was posed as part of a discussion on the Sons of Sam Horn website. Eric Van asked the question:

What we want to know is whether there's a correlation between standard deviation of the principal hitters, and two things:

    Actual Runs - Calculated Runs according to EqA, CR, RC or other formula
    Actual Wins - Pyth Wins

IOW, do consistent lineups score more or less runs than expected? Given however many runs they score, do consistent lineups win more or less games than expected?

I will attempt to address the first question: does the actual number of runs scored by a team depend significantly on whether the hitters are relatively uniform in quality, or vary greatly from batter to batter?

For the impatient among you, my conclusion is "No."

The basic data comes from baseball-reference.com's website. I used their list of the "main starting batters" for all American League teams during the period 1995-2006. For example, the 2006 Red Sox are listed like so:

Pos Player              Ag    G   AB    R    H  2B 3B  HR  RBI  BB   SO    BA   OBP   SLG  SB  CS GDP HBP  SH  SF IBB OPS+
---+-------------------+--+----+----+----+----+---+--+---+----+---+----+-----+-----+-----+---+---+---+---+---+---+---+----+
C   #Jason Varitek      34  103  365   46   87  19  2  12   55  46   87  .238  .325  .400   1   2  10   2   1   2   7   85
1B  Kevin Youkilis      27  147  569  100  159  42  2  13   72  91  120  .279  .381  .429   5   2  12   9   0  11   0  108
2B  Mark Loretta        34  155  635   75  181  33  0   5   59  49   63  .285  .345  .361   4   1  16  12   2   5   1   82
3B  Mike Lowell         32  153  573   79  163  47  1  20   80  47   61  .284  .339  .475   2   2  22   4   0   7   5  106
SS  Alex Gonzalez       29  111  388   48   99  24  2   9   50  22   67  .255  .299  .397   1   0   6   5   7   7   1   77
LF  Manny Ramirez       34  130  449   79  144  27  1  35  102 100  102  .321  .439  .619   0   1  13   1   0   8  16  168
CF  #Coco Crisp         26  105  413   58  109  22  2   8   36  31   67  .264  .317  .385  22   4   5   1   7   0   1   80
RF  *Trot Nixon         32  114  381   59  102  24  0   8   52  60   56  .268  .373  .394   0   2  10   7   0   5   1   98
DH  *David Ortiz        30  151  558  115  160  29  2  54  137 119  117  .287  .413  .636   1   0  12   4   0   5  23  164
---+-------------------+--+----+----+----+----+---+--+---+----+---+----+-----+-----+-----+---+---+---+---+---+---+---+----+

I also grabbed the overall team stats, using all players, for each team.

By the way, the most time-consuming portion of this work was dealing with the changing name of the Angels team: they went from the California Angels to the Anaheim Angels to the Los Angeles Angels during this brief span. Sigh.

I will use simple OPS, defined as the sum of On-Base Percentage (OBP) and Slugging Percentage (SLG), as a measure of the quality of a batter. It is well known that there is a very good correlation between the OPS of a team and the number of runs it scores over the course of a season. OPS isn't the best predictor of runs, but it's pretty good, and it's very easy to compute. Let's just look at how well it correlates with runs scored, using the overall OPS of all members of a team.

The linear fit is

    Runs = -782.46 + 2058.216*(overall OPS)

    scatter from fit          = 35 runs
    uncertainty in slope      = +/- 140 runs/OPS
    correlation coefficient r = 0.914

It's clear from the graph that some teams score more runs than this model predicts, and others score fewer. The question is -- are those differences from the basic model due to the inhomogeneity of the lineup? Suppose that they were; then we might expect lineups with a mixture of great and terrible hitters to score more runs (or fewer runs) than the model predicts. In that case, most of the points lying above (or below) the line would represent teams with a mixture of good and bad hitters.

One way to test this idea is to subtract from a team's ACTUAL number of runs scored the number of runs predicted based on its OPS. We can then plot this residual, in the sense

    residual = (actual runs scored) - (linear model using overall OPS)

as a function of the inhomogeneity of the main starting 9 batters. If the hypothesis is correct, we will see a correlation in the graph.
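As a quick illustration of this residual, here is a minimal Python sketch. It assumes the intercept and slope of the overall-OPS fit quoted above, and takes the 1995 Cleveland figures (overall OPS 0.839, 840 runs) from the tables later in this article. The function names are my own, chosen just for this example.

```python
# Coefficients of the linear fit of runs against overall team OPS (quoted above).
INTERCEPT = -782.46
SLOPE = 2058.216


def predicted_runs(overall_ops):
    """Runs predicted by the linear model for a given overall team OPS."""
    return INTERCEPT + SLOPE * overall_ops


def residual(actual_runs, overall_ops):
    """Actual runs minus model runs; negative means the team underperformed."""
    return actual_runs - predicted_runs(overall_ops)


# 1995 Cleveland: overall OPS 0.839, 840 runs scored.
cle_1995 = residual(840, 0.839)
print(round(cle_1995, 1))  # -> -104.4
```

A negative value means the team scored fewer runs than its OPS would suggest; here the result matches the "residual all" entry listed for that team.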

To quantify the inhomogeneity of the main 9 starters, I simply computed the standard deviation from their mean OPS during the season. In other words, using only the stats of the main 9 starting batters, I computed

$$\text{mean OPS} = \frac{\sum_{i=1}^{9} \text{OPS}_i}{9}$$

$$\text{stdev OPS} = \sqrt{\frac{\sum_{i=1}^{9} \text{OPS}_i^{2} \;-\; 9\,(\text{mean OPS})^{2}}{9 - 1}}$$

For example, in the case of the 2006 Red Sox, the main 9 starters had a mean OPS = 0.814 and a standard deviation of 0.143. By comparison, the overall team OPS was 0.786, so the bench players were (as a group) rightly kept on the bench.
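The Red Sox numbers can be checked directly from the table of main starters shown earlier. This is only a sketch: it assumes each batter's OPS is simply OBP + SLG as printed (three decimals), and the variable names are mine.

```python
import math

# (OBP, SLG) for the nine main starters of the 2006 Red Sox, from the table above.
starters = {
    "Varitek":  (0.325, 0.400),
    "Youkilis": (0.381, 0.429),
    "Loretta":  (0.345, 0.361),
    "Lowell":   (0.339, 0.475),
    "Gonzalez": (0.299, 0.397),
    "Ramirez":  (0.439, 0.619),
    "Crisp":    (0.317, 0.385),
    "Nixon":    (0.373, 0.394),
    "Ortiz":    (0.413, 0.636),
}

# OPS of each batter is the sum of on-base and slugging percentages.
ops = [obp + slg for obp, slg in starters.values()]
n = len(ops)

mean_ops = sum(ops) / n
# Sample standard deviation, using the same (n - 1) denominator as the formula above.
stdev_ops = math.sqrt((sum(x * x for x in ops) - n * mean_ops ** 2) / (n - 1))

print(f"mean OPS = {mean_ops:.3f}, stdev OPS = {stdev_ops:.3f}")
# -> mean OPS = 0.814, stdev OPS = 0.143
```

Note that this is an unweighted mean over batters: a starter with 365 at-bats counts as much as one with 635.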

So -- do the residuals in runs scored over a season correlate with the inhomogeneity of the main 9 batters?

In a word, no: there is no correlation here. A formal unweighted linear fit to the residuals as a function of stdev of starting 9's OPS is

    residuals = -1.811 + 17.713*(stdev OPS of starting 9)

    scatter                   = 34 runs
    uncertainty in slope      = +/- 182 runs/(stdev OPS)
    correlation coefficient r = 0.015
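For readers who want to reproduce fits of this kind, here is a minimal sketch of an unweighted least-squares line fit in plain Python. It is not the code used for this study. It assumes that "scatter" means the RMS of the residuals with n - 2 degrees of freedom and that the slope uncertainty is the usual standard error; the function name and the toy data are invented for illustration.

```python
import math


def linear_fit(xs, ys):
    """Unweighted least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Scatter: RMS of residuals about the line (assumed n - 2 degrees of freedom).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    scatter = math.sqrt(ss_res / (n - 2))

    # Standard error of the slope, and Pearson correlation coefficient.
    slope_err = scatter / math.sqrt(sxx)
    r = sxy / math.sqrt(sxx * syy)

    return {"intercept": intercept, "slope": slope,
            "scatter": scatter, "slope_err": slope_err, "r": r}


# Toy data, invented purely for illustration (NOT values from this study):
# x = stdev of starters' OPS, y = residual in runs.
toy_stdev = [0.08, 0.10, 0.12, 0.14, 0.16]
toy_resid = [-20.0, 35.0, -10.0, 15.0, -5.0]
fit = linear_fit(toy_stdev, toy_resid)
print(fit)
```

Feeding in the stdev OPS and residual columns for all team-seasons should give numbers comparable to those quoted above, provided the same definitions of scatter and slope uncertainty were used.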

It might be argued that I erred in the first step, when I found the linear relationship between runs scored and a team's overall OPS -- using all the players. Perhaps, one might say, I ought to have used only the OPS of the main 9 starters, and then examined the residuals from *that* linear model.

Okay, let's try that. I'll make a linear fit to the total number of runs scored by each team as a function of the OPS of the main 9 starters only.

The correlation is weaker here, as one would expect, since we are mixing the performance of all players with the statistics of just a subset. We have

    Runs = -522.20 + 1690.89*(OPS of 9 starters)

    scatter from fit          = 45 runs            (vs. 35)
    uncertainty in slope      = +/- 160 runs/OPS   (vs. 140)
    correlation coefficient r = 0.85               (vs. 0.91)

If we continue the procedure, and look at the residuals from *this* relationship as a function of the standard deviation in the starting 9's OPS, we find a slope which is visible to the eye in this case, but which is of marginal significance. The unweighted linear fit is

    residuals = -27.59 + 272.36*(stdev OPS of starting 9)

    scatter                   = 44 runs
    uncertainty in slope      = +/- 232 runs/(stdev OPS)
    correlation coefficient r = 0.177

Formally speaking, there is a very weak trend for teams with inhomogeneous lineups -- mixtures of great and terrible hitters with an average OPS of, say, 0.800 -- to score slightly more runs than a lineup of identical batters, all with an OPS of 0.800 each. However, I place little confidence in this result.
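One rough way to see why little confidence is warranted is to compare each fitted slope with its own uncertainty. The sketch below uses only the slopes, uncertainties and correlation coefficients quoted in the two residual fits above. The "about two standard errors" threshold mentioned afterwards is a common rule of thumb, not something derived in this study.

```python
# (slope, uncertainty in slope, correlation coefficient r) from the two fits above.
fits = {
    "residuals from overall-OPS model":  (17.713, 182.0, 0.015),
    "residuals from starters-OPS model": (272.36, 232.0, 0.177),
}

ratios = {}
for name, (slope, slope_err, r) in fits.items():
    # How many standard errors the slope lies from zero.
    ratios[name] = slope / slope_err
    # r squared: fraction of the variance in the residuals tied to stdev OPS.
    print(f"{name}: slope/uncertainty = {ratios[name]:.2f}, r^2 = {r * r:.3f}")
```

The ratios come out to roughly 0.1 and 1.2. Both are well short of the factor of about 2 usually asked for before calling a slope significant, and even in the second case the stdev of the starters' OPS accounts for only about 3 percent of the variance in the residuals.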

For the curious, I include here a few "top 10" and "bottom 10" lists generated during the course of this study. First, let's pick the teams which deviated most in a bad way -- underperformed, in other words -- from the linear model using overall team OPS.

#                                                                residual residual
#                       mean stdev               mean            starters      all
#---------------------------------------------------------------------------------
1995 CLE  starters ops 0.817 0.123  overall ops 0.839  runs  840    -19.3   -104.4
1995 BAL  starters ops 0.820 0.128  overall ops 0.768  runs  704   -160.3    -94.2
1995 BOS  starters ops 0.860 0.104  overall ops 0.808  runs  791   -141.0    -89.6
1995 TOR  starters ops 0.737 0.082  overall ops 0.735  runs  642    -82.0    -88.3
1995 MIN  starters ops 0.749 0.105  overall ops 0.760  runs  703    -41.3    -78.8
1995 CHW  starters ops 0.784 0.168  overall ops 0.785  runs  755    -48.5    -78.2
2006 TOR  starters ops 0.805 0.099  overall ops 0.811  runs  809    -30.0    -77.8
1995 KCR  starters ops 0.740 0.115  overall ops 0.721  runs  629   -100.1    -72.5
1995 NYY  starters ops 0.748 0.083  overall ops 0.777  runs  749      6.4    -67.8
1995 TEX  starters ops 0.752 0.125  overall ops 0.746  runs  691    -58.3    -62.0

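I think it no coincidence that almost all of these teams played in 1995. There was a jump in the average scoring of teams from the 1994 season to the 1995 season, as I show in an earlier study, which might signal some change in equipment or conditions. Note also that the 1995 season was shortened to 144 games, so season run totals from that year will naturally fall below a model fitted mostly to 162-game seasons.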

Let's look at the teams which overperformed the most relative to the prediction of their overall OPS:

#                                                                residual residual
#                       mean stdev               mean            starters      all
#---------------------------------------------------------------------------------
1997 ANA  starters ops 0.745 0.117  overall ops 0.760  runs  829     91.5     47.2
2003 KCR  starters ops 0.784 0.083  overall ops 0.763  runs  836     32.5     48.0
2001 SEA  starters ops 0.810 0.108  overall ops 0.805  runs  927     79.6     52.6
2001 OAK  starters ops 0.830 0.137  overall ops 0.784  runs  884      2.8     52.8
1998 NYY  starters ops 0.850 0.080  overall ops 0.822  runs  965     49.9     55.6
1996 MIN  starters ops 0.802 0.087  overall ops 0.778  runs  877     43.1     58.2
2000 CHW  starters ops 0.818 0.144  overall ops 0.826  runs  978    117.1     60.4
1999 CLE  starters ops 0.873 0.127  overall ops 0.839  runs 1009     55.1     64.6
2000 KCR  starters ops 0.793 0.116  overall ops 0.773  runs  879     60.3     70.5
1998 OAK  starters ops 0.747 0.092  overall ops 0.731  runs  804     63.1     81.9

Hmmm. No obvious pattern here. The fact that no team (e.g. Cleveland Indians) repeats during a stretch of consecutive years suggests to me -- again -- that there is no real correlation between residuals from predicted runs scored and homogeneity of batters.

We can make similar tables using the residuals from a linear fit to the OPS of the starting 9 players only. Here are the underperformers:

#                                                                residual residual
#                       mean stdev               mean            starters      all
#---------------------------------------------------------------------------------
1995 BAL  starters ops 0.820 0.128  overall ops 0.768  runs  704   -160.3    -94.2
1995 BOS  starters ops 0.860 0.104  overall ops 0.808  runs  791   -141.0    -89.6
1995 SEA  starters ops 0.845 0.146  overall ops 0.795  runs  796   -110.6    -57.8
1995 KCR  starters ops 0.740 0.115  overall ops 0.721  runs  629   -100.1    -72.5
2003 DET  starters ops 0.711 0.098  overall ops 0.675  runs  591    -89.0    -15.8
2005 DET  starters ops 0.786 0.070  overall ops 0.749  runs  723    -83.8    -36.1
1995 TOR  starters ops 0.737 0.082  overall ops 0.735  runs  642    -82.0    -88.3
2006 TBD  starters ops 0.763 0.085  overall ops 0.734  runs  689    -78.9    -39.3
2000 BAL  starters ops 0.824 0.054  overall ops 0.776  runs  794    -77.1    -20.7
2005 BAL  starters ops 0.784 0.080  overall ops 0.761  runs  729    -74.5    -54.8

Again, the 1995 season features many of the underperformers. The recent Detroit Tigers appear twice -- does that mean something? I don't know.

Finally, here are the top 10 overperforming teams by this metric.

#                                                                residual residual
#                       mean stdev               mean            starters      all
#---------------------------------------------------------------------------------
2004 BOS  starters ops 0.831 0.133  overall ops 0.832  runs  949     66.1     19.0
2003 OAK  starters ops 0.723 0.107  overall ops 0.744  runs  768     67.7     19.1
2000 SEA  starters ops 0.803 0.146  overall ops 0.803  runs  907     71.4     36.7
1996 CHW  starters ops 0.797 0.151  overall ops 0.807  runs  898     72.6     19.5
2001 SEA  starters ops 0.810 0.108  overall ops 0.805  runs  927     79.6     52.6
1996 DET  starters ops 0.724 0.133  overall ops 0.742  runs  783     81.0     38.3
1997 ANA  starters ops 0.745 0.117  overall ops 0.760  runs  829     91.5     47.2
2000 OAK  starters ops 0.805 0.141  overall ops 0.818  runs  947    108.0     45.8
1998 TEX  starters ops 0.796 0.118  overall ops 0.818  runs  940    116.3     38.8
2000 CHW  starters ops 0.818 0.144  overall ops 0.826  runs  978    117.1     60.4

As before, the lack of repeating teams suggests that there is no real effect.