Does the uniformity of a lineup affect its scoring significantly?

Michael Richmond
Dec 4, 2006

This question was posed as part of a discussion on the Sons of Sam Horn website. Eric Van asked the question:

What we want to know is whether there's a correlation between standard deviation of the principal hitters, and two things:

    Actual Runs - Calculated Runs according to EqA, CR, RC or other formula
    Actual Wins - Pyth Wins

IOW, do consistent lineups score more or less runs than expected? Given however many runs they score, do consistent lineups win more or less games than expected?

I will attempt to address the first question: does the actual number of runs scored by a team depend significantly on whether the hitters are relatively uniform in quality, or vary greatly from batter to batter?

For the impatient among you, my conclusion is "No."

The basic data comes from baseball-reference.com's website. I used their list of the "main starting batters" for all American League teams during the period 1995-2006. For example, the 2006 Red Sox are listed like so:

Pos Player              Ag   G   AB    R    H   2B 3B  HR  RBI  BB  SO   BA    OBP   SLG  SB  CS  GDP HBP  SH  SF IBB  OPS+
---+-------------------+--+----+----+----+----+---+--+---+----+---+----+-----+-----+-----+---+---+---+---+---+---+---+----+
C  #Jason Varitek       34  103  365   46   87  19  2  12   55  46   87  .238  .325  .400   1   2  10   2   1   2   7   85
1B  Kevin Youkilis      27  147  569  100  159  42  2  13   72  91  120  .279  .381  .429   5   2  12   9   0  11   0  108
2B  Mark Loretta        34  155  635   75  181  33  0   5   59  49   63  .285  .345  .361   4   1  16  12   2   5   1   82
3B  Mike Lowell         32  153  573   79  163  47  1  20   80  47   61  .284  .339  .475   2   2  22   4   0   7   5  106
SS  Alex Gonzalez       29  111  388   48   99  24  2   9   50  22   67  .255  .299  .397   1   0   6   5   7   7   1   77
LF  Manny Ramirez       34  130  449   79  144  27  1  35  102 100  102  .321  .439  .619   0   1  13   1   0   8  16  168
CF #Coco Crisp          26  105  413   58  109  22  2   8   36  31   67  .264  .317  .385  22   4   5   1   7   0   1   80
RF *Trot Nixon          32  114  381   59  102  24  0   8   52  60   56  .268  .373  .394   0   2  10   7   0   5   1   98
DH *David Ortiz         30  151  558  115  160  29  2  54  137 119  117  .287  .413  .636   1   0  12   4   0   5  23  164
---+-------------------+--+----+----+----+----+---+--+---+----+---+----+-----+-----+-----+---+---+---+---+---+---+---+----+

I also grabbed the overall team stats, using all players, for each team.

By the way, the most time-consuming portion of this work was dealing with the changing name of the Angels team: they went from the California Angels to the Anaheim Angels to the Los Angeles Angels during this brief span. Sigh.

I will use simple OPS, defined as the sum of On-base-Percentage (OBP) and Slugging Percentage (SLG), as a measure of the quality of a batter. It is well known that there is a very good correlation between the OPS of a team and the number of runs it scores over the course of a season. OPS isn't the best predictor of runs, but it's pretty good, and it's very easy to compute. Let's just look at how well it correlates with runs scored. In the graphs below, I'll use the overall OPS of all members of a team.

The linear fit is

     Runs = -782.46 + 2058.216*(overall OPS)
                   
                scatter from fit = 35 runs
                uncertainty in slope = +/- 140 runs/OPS
                correlation coefficient  r = 0.914
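The fit above reports an intercept, a slope, a scatter, a slope uncertainty and a correlation coefficient. Below is a minimal sketch of an unweighted least-squares fit that returns those five quantities. The article does not say exactly how "scatter" was computed, so treating it as the rms residual with n - 2 degrees of freedom is an assumption, and `linear_fit` is a hypothetical name, not the code used for this study.

```python
import math


def linear_fit(x, y):
    """Unweighted least-squares fit of y = a + b*x.

    Returns (a, b, scatter, sigma_b, r):
      scatter - rms of residuals about the line, using n - 2 degrees of
                freedom (an assumed convention)
      sigma_b - standard error of the slope
      r       - Pearson correlation coefficient
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired points")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    scatter = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    sigma_b = scatter / math.sqrt(sxx)
    r = sxy / math.sqrt(sxx * syy)
    return a, b, scatter, sigma_b, r
```

Calling `linear_fit(team_ops, team_runs)` on the team-season data would give numbers in the same form as the fit quoted above.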

It's clear from the graph that some teams score more runs than this model predicts, and others score fewer. The question is -- are those differences from the basic model due to the inhomogeneity of the lineup? Suppose that they were; then we might expect lineups with a mixture of great and terrible hitters to score more runs (or fewer runs) than the model predicts. In that case, most of the points lying above (or below) the line would represent teams with a mixture of good and bad hitters.

One way to test this idea is to subtract the number of runs predicted from a team's OPS from its ACTUAL number of runs scored. We can then plot this residual, defined as


   residual  =  (actual runs scored) - (linear model using overall OPS)

as a function of the inhomogeneity of the main starting 9 batters. If the hypothesis is correct, we will see a correlation in the graph.
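As a check on the residual definition, here is a small sketch that applies the overall-OPS model quoted above to three team-seasons taken from the tables later in this article. The function name `residual_runs` is hypothetical; the coefficients and team data are the article's own, and the results reproduce the "residual all" column of those tables.

```python
# Coefficients of the fit of runs against overall team OPS, quoted above.
INTERCEPT = -782.46
SLOPE = 2058.216


def residual_runs(actual_runs, overall_ops):
    """Actual runs minus the runs predicted by the overall-OPS linear model."""
    predicted = INTERCEPT + SLOPE * overall_ops
    return actual_runs - predicted


# (team-season, actual runs, overall team OPS) from the tables in this article.
for label, runs, ops in [("1995 CLE", 840, 0.839),
                         ("2006 TOR", 809, 0.811),
                         ("1998 OAK", 804, 0.731)]:
    print(label, f"{residual_runs(runs, ops):.1f}")
# 1995 CLE -104.4
# 2006 TOR -77.8
# 1998 OAK 81.9
```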

To quantify the inhomogeneity of the main 9 starters, I simply computed the standard deviation of their individual OPS values about their mean OPS for the season. In other words, using only the stats of the main 9 starting batters, I computed

      $$ \overline{\mathrm{OPS}} = \frac{1}{9} \sum_{i=1}^{9} \mathrm{OPS}_i $$

      $$ \sigma_{\mathrm{OPS}} = \sqrt{ \frac{ \sum_{i=1}^{9} \mathrm{OPS}_i^{2} - 9\,\overline{\mathrm{OPS}}^{2} }{ 9 - 1 } } $$

For example, in the case of the 2006 Red Sox, the main 9 starters had a mean OPS = 0.814 and a standard deviation of 0.143. By comparison, the overall team OPS was 0.786, so the bench players were (as a group) rightly kept on the bench.
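The Red Sox numbers can be reproduced directly from the OBP and SLG columns of the table near the top of this article. This sketch computes each starter's OPS as OBP + SLG, then the mean and the sample standard deviation (dividing by 9 - 1), both with Python's `statistics` module and with the shortcut sum-of-squares form used above. The dictionary name is illustrative only.

```python
import math
import statistics

# (OBP, SLG) of the 2006 Red Sox main starters, from the table above.
RED_SOX_2006 = {
    "Varitek": (0.325, 0.400), "Youkilis": (0.381, 0.429),
    "Loretta": (0.345, 0.361), "Lowell": (0.339, 0.475),
    "Gonzalez": (0.299, 0.397), "Ramirez": (0.439, 0.619),
    "Crisp": (0.317, 0.385), "Nixon": (0.373, 0.394),
    "Ortiz": (0.413, 0.636),
}

ops = [obp + slg for obp, slg in RED_SOX_2006.values()]
n = len(ops)

mean_ops = statistics.mean(ops)
stdev_ops = statistics.stdev(ops)  # sample standard deviation, divides by n - 1

# Same quantity via the shortcut: sum of squares minus n times the squared mean.
shortcut = math.sqrt((sum(v * v for v in ops) - n * mean_ops ** 2) / (n - 1))

print(f"mean OPS = {mean_ops:.3f}, stdev OPS = {stdev_ops:.3f}")
# mean OPS = 0.814, stdev OPS = 0.143
```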

So -- do the residuals in runs scored over a season correlate with the inhomogeneity of the main 9 batters?

In a word, no: there is no correlation here. A formal unweighted linear fit to the residuals as a function of stdev of starting 9's OPS is

       residuals  =  -1.811 + 17.713*(stdev OPS of starting 9)

                 scatter = 34 runs
                 uncertainty in slope = +/- 182 runs/(stdev OPS)
                 correlation coefficient  r = 0.015
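One quick way to see why this slope means nothing is to express it in units of its own uncertainty. The sketch below divides the quoted slope by its quoted uncertainty; `slope_significance` is a hypothetical helper, and the rule of thumb that a ratio below about 2 is consistent with zero is a common convention assumed here, not a criterion stated in this article.

```python
def slope_significance(slope, sigma_slope):
    """Slope divided by its uncertainty: how many standard errors from zero."""
    return slope / sigma_slope


# Slope and uncertainty of the residuals-vs-stdev fit quoted above.
z = slope_significance(17.713, 182.0)
print(f"slope is {z:.2f} standard errors from zero")
# slope is 0.10 standard errors from zero
```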

It might be argued that I erred in the first step, when I found the linear relationship between runs scored and a team's overall OPS -- using all the players. Perhaps, one might say, I ought to have used only the OPS of the main 9 starters, and then examined the residuals from that linear model.

Okay, let's try that. I'll make a linear fit to the total number of runs scored by each team as a function of the OPS of the main 9 starters only.

The correlation is weaker here, as one would expect, since we are mixing the performance of all players with the statistics of just a subset. We have

     Runs = -522.20 + 1690.89*(OPS of 9 starters)
                   
                scatter from fit = 45 runs                   (vs. 35)
                uncertainty in slope = +/- 160 runs/OPS      (vs. 140)
                correlation coefficient  r = 0.85            (vs. 0.91)
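Each team-season now has two residuals, one from each model, and the tables later in this article list both. This sketch computes the two for a pair of teams from those tables, using the starters' mean OPS with the starters-only model and the overall team OPS with the overall model. The names `residual`, `OVERALL_MODEL` and `STARTERS_MODEL` are hypothetical; the coefficients are the ones quoted in the text.

```python
# The two linear models quoted in the text, as (intercept, slope).
OVERALL_MODEL = (-782.46, 2058.216)   # runs vs. overall team OPS
STARTERS_MODEL = (-522.20, 1690.89)   # runs vs. mean OPS of the main 9 starters


def residual(runs, ops, model):
    """Actual runs minus the runs predicted by the given (intercept, slope) model."""
    intercept, slope = model
    return runs - (intercept + slope * ops)


# (team-season, runs, starters' mean OPS, overall OPS) from the tables in this article.
teams = [("1995 BAL", 704, 0.820, 0.768), ("2000 CHW", 978, 0.818, 0.826)]
for label, runs, ops9, ops_all in teams:
    r9 = residual(runs, ops9, STARTERS_MODEL)
    r_all = residual(runs, ops_all, OVERALL_MODEL)
    print(f"{label}: starters {r9:7.1f}   all {r_all:7.1f}")
# 1995 BAL: starters  -160.3   all   -94.2
# 2000 CHW: starters   117.1   all    60.4
```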

If we continue the procedure and look at the residuals from this relationship as a function of the standard deviation of the starting 9's OPS, the unweighted linear fit shows a slope which is visible to the eye in this case, but it is of marginal significance: the slope is only about 1.2 times its uncertainty.

       residuals  =  -27.59 + 272.36*(stdev OPS of starting 9)

                 scatter = 44 runs          
                 uncertainty in slope = +/- 232 runs/(stdev OPS)
                 correlation coefficient  r = 0.177

Formally speaking, there is a very weak trend for teams with inhomogeneous lineups -- mixtures of great and terrible hitters with an average OPS of, say, 0.800 -- to score slightly more runs than a lineup of identical batters, each with an OPS of 0.800. However, I place little confidence in this result.
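To put a size on this "very weak trend", one can ask how many extra runs the fit predicts for a team whose starters are spread out more widely. This sketch uses the slope and uncertainty from the starters-only residual fit; the range 0.05 to 0.15 in stdev OPS is an illustrative choice that roughly spans the values in the tables below (0.054 to 0.168), and `run_difference` is a hypothetical helper.

```python
SLOPE = 272.36        # runs per unit of stdev OPS, from the residual fit above
SIGMA_SLOPE = 232.0   # its quoted uncertainty


def run_difference(stdev_low, stdev_high):
    """Predicted extra runs, and their 1-sigma uncertainty, for a team whose
    starters' stdev OPS is stdev_high rather than stdev_low."""
    delta = stdev_high - stdev_low
    return SLOPE * delta, SIGMA_SLOPE * delta


gain, err = run_difference(0.05, 0.15)
print(f"{gain:.0f} +/- {err:.0f} runs")
# 27 +/- 23 runs
```

An effect of roughly 27 runs with an uncertainty of roughly 23 runs is hard to distinguish from zero, which fits the low confidence expressed above.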

For the curious, I include here a few "top 10" and "bottom 10" lists generated during the course of this study. First, let's pick the teams which deviated most in a bad way -- underperformed, in other words -- from the linear model using overall team OPS.

#                                                                       residual  residual
#                          mean  stdev                mean              starters  all
#--------------------------------------------------------------------------------------
 1995  CLE  starters ops  0.817  0.123  overall ops  0.839  runs   840   -19.3  -104.4 
 1995  BAL  starters ops  0.820  0.128  overall ops  0.768  runs   704  -160.3   -94.2 
 1995  BOS  starters ops  0.860  0.104  overall ops  0.808  runs   791  -141.0   -89.6 
 1995  TOR  starters ops  0.737  0.082  overall ops  0.735  runs   642   -82.0   -88.3 
 1995  MIN  starters ops  0.749  0.105  overall ops  0.760  runs   703   -41.3   -78.8 
 1995  CHW  starters ops  0.784  0.168  overall ops  0.785  runs   755   -48.5   -78.2 
 2006  TOR  starters ops  0.805  0.099  overall ops  0.811  runs   809   -30.0   -77.8 
 1995  KCR  starters ops  0.740  0.115  overall ops  0.721  runs   629  -100.1   -72.5 
 1995  NYY  starters ops  0.748  0.083  overall ops  0.777  runs   749     6.4   -67.8 
 1995  TEX  starters ops  0.752  0.125  overall ops  0.746  runs   691   -58.3   -62.0

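I don't think it is a coincidence that almost all of these teams played in 1995. There was a jump in the average scoring of teams from the 1994 season to the 1995 season, as I show in an earlier study, which might signal some change in equipment or conditions. Note also that the 1995 season was shortened to 144 games by the players' strike, so 1995 teams had fewer games in which to score runs than teams in a full 162-game season.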

Let's look at the teams which overperformed the most relative to the prediction of their overall OPS:

#                                                                       residual  residual
#                          mean  stdev                mean              starters  all
#--------------------------------------------------------------------------------------
 1997  ANA  starters ops  0.745  0.117  overall ops  0.760  runs   829    91.5    47.2 
 2003  KCR  starters ops  0.784  0.083  overall ops  0.763  runs   836    32.5    48.0 
 2001  SEA  starters ops  0.810  0.108  overall ops  0.805  runs   927    79.6    52.6 
 2001  OAK  starters ops  0.830  0.137  overall ops  0.784  runs   884     2.8    52.8 
 1998  NYY  starters ops  0.850  0.080  overall ops  0.822  runs   965    49.9    55.6 
 1996  MIN  starters ops  0.802  0.087  overall ops  0.778  runs   877    43.1    58.2 
 2000  CHW  starters ops  0.818  0.144  overall ops  0.826  runs   978   117.1    60.4 
 1999  CLE  starters ops  0.873  0.127  overall ops  0.839  runs  1009    55.1    64.6 
 2000  KCR  starters ops  0.793  0.116  overall ops  0.773  runs   879    60.3    70.5 
 1998  OAK  starters ops  0.747  0.092  overall ops  0.731  runs   804    63.1    81.9

Hmmm. No obvious pattern here. The fact that no team (the Cleveland Indians, for example) appears in two consecutive seasons suggests to me -- again -- that there is no real correlation between residuals from predicted runs scored and homogeneity of batters.
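The "does a team repeat in consecutive seasons?" check can be made mechanical. This sketch groups a top-10 list by team and reports every team listed in two consecutive years; `consecutive_repeats` is a hypothetical helper, and the input is the overperformer table just above. Note that teams may still appear in non-consecutive seasons (here Kansas City in 2000 and 2003, Oakland in 1998 and 2001).

```python
from collections import defaultdict


def consecutive_repeats(entries):
    """Return (team, year, year + 1) for each team listed in two consecutive seasons."""
    years_by_team = defaultdict(set)
    for year, team in entries:
        years_by_team[team].add(year)
    hits = []
    for team, years in sorted(years_by_team.items()):
        for y in sorted(years):
            if y + 1 in years:
                hits.append((team, y, y + 1))
    return hits


# Overperformers relative to the overall-OPS model, from the table above.
overperformers = [(1997, "ANA"), (2003, "KCR"), (2001, "SEA"), (2001, "OAK"),
                  (1998, "NYY"), (1996, "MIN"), (2000, "CHW"), (1999, "CLE"),
                  (2000, "KCR"), (1998, "OAK")]
print(consecutive_repeats(overperformers))
# []
```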

We can make similar tables using the residuals from a linear fit to the OPS of the starting 9 players only. Here are the underperformers:

#                                                                       residual  residual
#                          mean  stdev                mean              starters  all
#--------------------------------------------------------------------------------------
 1995  BAL  starters ops  0.820  0.128  overall ops  0.768  runs   704  -160.3   -94.2 
 1995  BOS  starters ops  0.860  0.104  overall ops  0.808  runs   791  -141.0   -89.6 
 1995  SEA  starters ops  0.845  0.146  overall ops  0.795  runs   796  -110.6   -57.8 
 1995  KCR  starters ops  0.740  0.115  overall ops  0.721  runs   629  -100.1   -72.5 
 2003  DET  starters ops  0.711  0.098  overall ops  0.675  runs   591   -89.0   -15.8 
 2005  DET  starters ops  0.786  0.070  overall ops  0.749  runs   723   -83.8   -36.1 
 1995  TOR  starters ops  0.737  0.082  overall ops  0.735  runs   642   -82.0   -88.3 
 2006  TBD  starters ops  0.763  0.085  overall ops  0.734  runs   689   -78.9   -39.3 
 2000  BAL  starters ops  0.824  0.054  overall ops  0.776  runs   794   -77.1   -20.7 
 2005  BAL  starters ops  0.784  0.080  overall ops  0.761  runs   729   -74.5   -54.8

Again, the 1995 season features many of the underperformers. The recent Detroit Tigers appear twice -- does that mean something? I don't know.

Finally, here are the top 10 overperforming teams by this metric.

#                                                                       residual  residual
#                          mean  stdev                mean              starters  all
#--------------------------------------------------------------------------------------
 2004  BOS  starters ops  0.831  0.133  overall ops  0.832  runs   949    66.1    19.0 
 2003  OAK  starters ops  0.723  0.107  overall ops  0.744  runs   768    67.7    19.1 
 2000  SEA  starters ops  0.803  0.146  overall ops  0.803  runs   907    71.4    36.7 
 1996  CHW  starters ops  0.797  0.151  overall ops  0.807  runs   898    72.6    19.5 
 2001  SEA  starters ops  0.810  0.108  overall ops  0.805  runs   927    79.6    52.6 
 1996  DET  starters ops  0.724  0.133  overall ops  0.742  runs   783    81.0    38.3 
 1997  ANA  starters ops  0.745  0.117  overall ops  0.760  runs   829    91.5    47.2 
 2000  OAK  starters ops  0.805  0.141  overall ops  0.818  runs   947   108.0    45.8 
 1998  TEX  starters ops  0.796  0.118  overall ops  0.818  runs   940   116.3    38.8 
 2000  CHW  starters ops  0.818  0.144  overall ops  0.826  runs   978   117.1    60.4

As before, few teams repeat (only Seattle, in 2000 and 2001, appears in consecutive seasons), which suggests that there is no real effect.
