Does the uniformity of a lineup affect its scoring significantly?

Michael Richmond
Dec 4, 2006 (revised Dec 6, 2006)

Revised to use Contextual Runs to predict a team's performance, rather than OPS. You can still read the original version using OPS if you wish.

This question was posed as part of a discussion on the Sons of Sam Horn website. Eric Van asked the question:

What we want to know is whether there's a correlation between standard deviation of the principal hitters, and two things:

    Actual Runs - Calculated Runs according to EqA, CR, RC or other formula
    Actual Wins - Pyth Wins

IOW, do consistent lineups score more or less runs than expected? Given however many runs they score, do consistent lineups win more or less games than expected?

I will attempt to address the first question: does the actual number of runs scored by a team depend significantly on whether the hitters are relatively uniform in quality, or vary greatly from batter to batter?

For the impatient among you, my conclusion is "No."


The basic data comes from baseball-reference.com's website. I used their list of the "main starting batters" for all American League teams during the period 1997-2006. For example, the 2006 Red Sox are listed like so:

Pos Player              Ag   G   AB    R    H   2B 3B  HR  RBI  BB  SO   BA    OBP   SLG  SB  CS  GDP HBP  SH  SF IBB  OPS+
---+-------------------+--+----+----+----+----+---+--+---+----+---+----+-----+-----+-----+---+---+---+---+---+---+---+----+
C  #Jason Varitek       34  103  365   46   87  19  2  12   55  46   87  .238  .325  .400   1   2  10   2   1   2   7   85
1B  Kevin Youkilis      27  147  569  100  159  42  2  13   72  91  120  .279  .381  .429   5   2  12   9   0  11   0  108
2B  Mark Loretta        34  155  635   75  181  33  0   5   59  49   63  .285  .345  .361   4   1  16  12   2   5   1   82
3B  Mike Lowell         32  153  573   79  163  47  1  20   80  47   61  .284  .339  .475   2   2  22   4   0   7   5  106
SS  Alex Gonzalez       29  111  388   48   99  24  2   9   50  22   67  .255  .299  .397   1   0   6   5   7   7   1   77
LF  Manny Ramirez       34  130  449   79  144  27  1  35  102 100  102  .321  .439  .619   0   1  13   1   0   8  16  168
CF #Coco Crisp          26  105  413   58  109  22  2   8   36  31   67  .264  .317  .385  22   4   5   1   7   0   1   80
RF *Trot Nixon          32  114  381   59  102  24  0   8   52  60   56  .268  .373  .394   0   2  10   7   0   5   1   98
DH *David Ortiz         30  151  558  115  160  29  2  54  137 119  117  .287  .413  .636   1   0  12   4   0   5  23  164
---+-------------------+--+----+----+----+----+---+--+---+----+---+----+-----+-----+-----+---+---+---+---+---+---+---+----+

I also grabbed the overall team stats, using all players, for each team.

By the way, the most time-consuming portion of this work was dealing with the changing name of the Angels team: they went from the California Angels to the Anaheim Angels to the Los Angeles Angels during this brief span. Sigh.

I will use simple OPS, defined as the sum of On-base-Percentage (OBP) and Slugging Percentage (SLG), as a measure of the quality of a batter. If batters have similar OPS, I judge them to be similar; if their OPS values differ greatly, I judge the hitters to be significantly different in quality.

In order to predict the performance of a team, I'll use a slightly modified version of Contextual Runs (CR).

Since my sources didn't list LOB (runners left on base), I set it to zero; sorry! I also scaled the result of the calculation by a constant factor so that, averaged over the period of the study, the prediction equaled the actual number of runs; that happened when I multiplied the CR value by 2.03.
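
As a sketch of that calibration step -- assuming the raw CR value for each team-season has already been computed elsewhere (the CR formula itself is not reproduced here), and using hypothetical array names -- the constant is just the ratio of total actual runs to total raw CR:

    # Calibration sketch: find the constant k such that, averaged over all
    # team-seasons in the study, k * (raw CR) equals the actual runs scored.
    # raw_cr and actual_runs are hypothetical lists, one entry per team-season.

    def calibrate_scale_factor(raw_cr, actual_runs):
        """Return k so that the mean prediction equals the mean of actual runs."""
        return sum(actual_runs) / sum(raw_cr)

    # k = calibrate_scale_factor(raw_cr, actual_runs)   # the study finds k = 2.03
    # scaled_cr = [k * cr for cr in raw_cr]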

So, if we determine the linear relationship between my "scaled CR" value and the actual number of runs scored, we find:

The linear fit is

     Runs = 204.6  +  0.745*(scaled CR)
                   
                scatter from fit = 28 runs
                uncertainty in slope = +/- 0.040 runs/(scaled CR)
                correlation coefficient  r = 0.943
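
For reference, here is a short Python sketch of one standard way to compute a fit like this and the quoted statistics (the article does not spell out its exact fitting procedure, and the array names below are hypothetical):

    import numpy as np

    def linear_fit_stats(x, y):
        """Unweighted least-squares fit y = a + b*x, plus the scatter about the fit,
        the formal uncertainty in the slope, and the correlation coefficient r."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        n = len(x)
        b, a = np.polyfit(x, y, 1)                                   # slope, intercept
        resid = y - (a + b * x)
        scatter = np.sqrt(np.sum(resid**2) / (n - 2))                # scatter about the fit
        slope_err = scatter / np.sqrt(np.sum((x - x.mean())**2))     # formal slope uncertainty
        r = np.corrcoef(x, y)[0, 1]
        return a, b, scatter, slope_err, r

    # hypothetical usage, one entry per AL team-season:
    #   a, b, scatter, slope_err, r = linear_fit_stats(scaled_cr, actual_runs)
    # which should yield numbers close to those quoted above.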

Those of you who read the earlier version will note that scaled CR does indeed predict actual runs scored better than OPS.

It's clear from the graph that some teams score more runs than this model predicts, and others score fewer. The question is -- are those differences from the basic model due to the inhomogeneity of the lineup? Suppose that they were; then we might expect lineups with a mixture of great and terrible hitters to score more runs (or fewer runs) than the model predicts. In that case, most of the points lying above (or below) the line would represent teams with a mixture of good and bad hitters.

One way to test this idea is to subtract from a team's ACTUAL number of runs scored the number of runs predicted based on its scaled CR. We can then plot this residual, in the sense


   residual  =  (actual runs scored) - (linear model using scaled CR)

as a function of the inhomogeneity of the main starting 9 batters. If the hypothesis is correct, we will see a correlation in the graph.
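
In code, using the fit coefficients quoted above, the residual for each team-season is simply (a minimal sketch; the array names are hypothetical):

    import numpy as np

    # Residual = actual runs minus the linear-model prediction from scaled CR,
    # using the intercept and slope quoted above.
    A, B = 204.6, 0.745

    def run_residuals(actual_runs, scaled_cr):
        actual_runs = np.asarray(actual_runs, float)
        scaled_cr = np.asarray(scaled_cr, float)
        return actual_runs - (A + B * scaled_cr)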

To quantify the inhomogeneity of the main 9 starters, I simply computed the standard deviation of their OPS values about their mean during the season. In other words, using only the stats of the main 9 starting batters, I computed

                    sum ( OPS of each batter)
      mean OPS  =  ---------------------------
                             9
                         (  sum( (OPS of each batter)^2 )  -  9*(mean OPS)^2  )
      stdev OPS  = sqrt  ( ------------------------------------------------- )
                         (                      (9 - 1)                       )

For example, in the case of the 2006 Red Sox, the main 9 starters had a mean OPS = 0.814 and a standard deviation of 0.143. By comparison, the overall team OPS was 0.786, so the bench players were (as a group) rightly kept on the bench.
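
As a check, here is a short Python sketch that reproduces those two numbers from the OBP and SLG columns of the 2006 Red Sox table above (OPS = OBP + SLG for each starter; the standard deviation uses the 9 - 1 denominator, as in the formula above):

    import numpy as np

    # OBP and SLG for the nine main starters of the 2006 Red Sox (from the table above),
    # in the same order: Varitek, Youkilis, Loretta, Lowell, Gonzalez,
    # Ramirez, Crisp, Nixon, Ortiz.
    obp = [0.325, 0.381, 0.345, 0.339, 0.299, 0.439, 0.317, 0.373, 0.413]
    slg = [0.400, 0.429, 0.361, 0.475, 0.397, 0.619, 0.385, 0.394, 0.636]

    ops = np.array(obp) + np.array(slg)           # OPS = OBP + SLG for each starter

    print("mean OPS  = %.3f" % ops.mean())        # about 0.814
    print("stdev OPS = %.3f" % ops.std(ddof=1))   # about 0.143 (sample stdev, n-1)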

So -- do the residuals in runs scored over a season correlate with the inhomogeneity of the main 9 batters?

In a word, no: there is no correlation here. A formal unweighted linear fit to the residuals, as a function of the stdev of the starting 9's OPS, is

       residuals  =  2.334  - 23.711*(stdev OPS of starting 9)

                 scatter = 28 runs
                 uncertainty in slope = +/- 149 runs/(stdev OPS)
                 correlation coefficient  r = -0.024
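
The same fitting recipe sketched earlier applies here; a compact, self-contained version (again with hypothetical array names, one entry per team-season) looks like:

    import numpy as np

    def fit_residuals_vs_spread(residuals, stdev_ops):
        """Unweighted linear fit of run residuals against the stdev of the starting 9's OPS."""
        x = np.asarray(stdev_ops, float)
        y = np.asarray(residuals, float)
        slope, intercept = np.polyfit(x, y, 1)
        r = np.corrcoef(x, y)[0, 1]
        return intercept, slope, r

    #   intercept, slope, r = fit_residuals_vs_spread(residuals, stdev_ops)
    # An r this close to zero is the basis for the "no correlation" conclusion.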


For the curious, I include here a few "top 10" and "bottom 10" lists generated during the course of this study. First, let's pick the teams which deviated most in a bad way -- underperformed, in other words -- from the linear model using scaled CR.

#                                                                                            
#                          mean  stdev                mean                                  residual
#---------------------------------------------------------------------------------------------------
 1996  MIN  starters ops  0.802  0.087  overall ops  0.778  runs   877  scaled cr   808.0    -70.3 
 2001  SEA  starters ops  0.810  0.108  overall ops  0.805  runs   927  scaled cr   881.5    -65.5 
 1997  NYY  starters ops  0.813  0.109  overall ops  0.798  runs   891  scaled cr   837.8    -62.1 
 2000  KCR  starters ops  0.793  0.116  overall ops  0.773  runs   879  scaled cr   823.9    -60.4 
 1999  CLE  starters ops  0.873  0.127  overall ops  0.839  runs  1009  scaled cr  1000.7    -58.7 
 1998  OAK  starters ops  0.747  0.092  overall ops  0.731  runs   804  scaled cr   730.7    -54.9 
 2002  ANA  starters ops  0.789  0.095  overall ops  0.774  runs   851  scaled cr   793.9    -54.8 
 2003  SEA  starters ops  0.757  0.114  overall ops  0.754  runs   795  scaled cr   732.1    -44.8 
 1996  NYY  starters ops  0.808  0.071  overall ops  0.796  runs   871  scaled cr   837.5    -42.3 
 2005  TOR  starters ops  0.744  0.037  overall ops  0.738  runs   775  scaled cr   708.8    -42.2 

I see no patterns here.

Let's look at the teams which overperformed the most relative to the prediction of the scaled CR model:

#                          mean  stdev                mean                                  residual
#---------------------------------------------------------------------------------------------------
 2002  DET  starters ops  0.691  0.096  overall ops  0.679  runs   575  scaled cr    556.9     44.6 
 2000  TOR  starters ops  0.832  0.162  overall ops  0.810  runs   861  scaled cr    942.5     46.0 
 2002  BAL  starters ops  0.717  0.054  overall ops  0.712  runs   667  scaled cr    682.4     46.1 
 2003  CHW  starters ops  0.796  0.101  overall ops  0.777  runs   791  scaled cr    850.9     47.7 
 2005  CHW  starters ops  0.768  0.068  overall ops  0.747  runs   741  scaled cr    786.8     49.9 
 2005  TEX  starters ops  0.815  0.081  overall ops  0.797  runs   865  scaled cr    959.4     54.5 
 2005  BAL  starters ops  0.784  0.080  overall ops  0.761  runs   729  scaled cr    778.3     55.6 
 1999  DET  starters ops  0.792  0.061  overall ops  0.761  runs   747  scaled cr    804.5     57.1 
 2006  TBD  starters ops  0.763  0.085  overall ops  0.734  runs   689  scaled cr    726.8     57.2 
 2003  DET  starters ops  0.711  0.098  overall ops  0.675  runs   591  scaled cr    598.5     59.6 

Hmmm. Detroit appears 3 times, and Baltimore and the White Sox twice each. This repetition might suggest that there is some feature of these teams which causes them to do better than scaled CR would predict, but it could very well be chance.