Saturday, November 27, 2010

Improving a Team's Pitching

I have already written two posts, one on the best way of improving a team and one on improving a team's hitting. In this post, I want to do much the same as in the hitting post, but this time with pitching statistics. I am again going to run a linear regression model to determine which statistics are most strongly associated with pitching performance, which will show us which statistics can best be used to improve pitching.

In this model, instead of trying to estimate runs scored, I am going to use ERA as the dependent variable (DV). Runs against is a possibility, but it includes the effect of defense as well as pitching. Since we want the effect of statistics on pitching alone, it is not an appropriate DV in this scenario. We again need to be careful in our selection of independent variables so as to avoid collinearity.

Pitching statistics are almost the opposite of hitting statistics. Good hitters generally fall into two categories: those who can get on base, and those who can hit for power. Good pitchers are those who allow few baserunners and few home runs. We can measure these qualities with two statistics. The first is walks plus hits per inning pitched (WHIP), which measures the average number of baserunners a pitcher allows per inning. The second is home runs allowed, which does not cover all extra-base hits, but should separate pitchers who do and do not allow many home runs, and will hopefully be a decent proxy for extra-base hits in general. Finally, I am also going to include strikeouts as a predictor, because pitchers with high strikeout rates are valuable, and a pitcher with more strikeouts may allow fewer runs because he relies less on his defense. When we run the regression, we get the following:

             Estimate     Std. Error   t value   Pr(>|t|)
Intercept    3.2986958    0.4207396     7.840    6.45e-14 ***
WHIP         0.4431320    0.2483891     1.784    0.0753   .
SO          -0.0014423    0.0001842    -7.829    6.97e-14 ***
HR           0.0117201    0.0008172    14.342    < 2e-16  ***
R2 = 0.551
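
For readers who want to see what "running the regression" involves, here is a minimal sketch of ordinary least squares in plain Python. It is not the code used for this post. The function names `fit_ols` and `r_squared` are my own, and the toy numbers are made up so that the answer is known in advance; they are not baseball data.

```python
def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: list of predictor lists (no intercept column).
    y:    list of responses.
    Returns [intercept, b1, b2, ...].
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def r_squared(rows, y, coefs):
    """Share of the variance in y explained by the fitted values."""
    fitted = [coefs[0] + sum(b * v for b, v in zip(coefs[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Made-up toy data (NOT baseball data), generated exactly as y = 1 + 2*x1 - 3*x2
toy_rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
toy_y = [1 + 2 * a - 3 * b for a, b in toy_rows]

toy_coefs = fit_ols(toy_rows, toy_y)
toy_r2 = r_squared(toy_rows, toy_y, toy_coefs)
```

With real team-season data, each row would hold that team's WHIP, strikeouts, and home runs allowed, and `y` would hold its ERA. In practice a statistics package does this in one line and also reports the standard errors and p-values shown in the table.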

As you can see from the R2 value, this regression explains far less of the variability in ERA than the hitting regression explained in runs scored. However, if we replace WHIP with the number of hits and the number of walks given up, we get a much better fit:

             Estimate      Std. Error   t value    Pr(>|t|)
Intercept   -3.410e+00     2.855e-01    -11.942    <2e-16 ***
Hits         3.815e-03     1.547e-04     24.661    <2e-16 ***
BB           2.463e-03     1.414e-04     17.424    <2e-16 ***
SO          -4.711e-05     1.063e-04     -0.443    0.658
HR           5.298e-03     4.567e-04     11.601    <2e-16 ***
R2 = 0.8906

Now the R2 value is almost as high as in the hitting regressions. All of the variables are significant except strikeouts, so when we take strikeouts out of the regression we get the following:

             Estimate     Std. Error   t value   Pr(>|t|)
Intercept   -3.5115171    0.1703878    -20.61    <2e-16 ***
Hits         0.0038502    0.0001321     29.14    <2e-16 ***
BB           0.0024598    0.0001410     17.45    <2e-16 ***
HR           0.0053015    0.0004561     11.62    <2e-16 ***
R2 = 0.8905
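
As a quick check on how to read these tables, the t value column is just the estimate divided by its standard error. The sketch below recomputes it from the printed numbers of the four-predictor model (hits, walks, strikeouts, home runs). The cutoff of 2 is only the usual large-sample rule of thumb for the 5% level, which I am assuming here for illustration, and small differences from the table come from the rounding of the printed values.

```python
# (estimate, standard error) pairs copied from the regression on Hits, BB, SO and HR
second_model = {
    "Intercept": (-3.410e+00, 2.855e-01),
    "Hits": (3.815e-03, 1.547e-04),
    "BB": (2.463e-03, 1.414e-04),
    "SO": (-4.711e-05, 1.063e-04),
    "HR": (5.298e-03, 4.567e-04),
}

# t value = estimate / standard error
t_values = {name: est / se for name, (est, se) in second_model.items()}

# Rule of thumb (an assumption, not an exact test): |t| below about 2
# means the coefficient is not distinguishable from zero at the 5% level.
weak = [name for name, t in t_values.items() if abs(t) < 2]

for name, t in t_values.items():
    print(f"{name:<10} t = {t:8.3f}")
print("Not significant by the rule of thumb:", weak)
```

Only strikeouts fall below the cutoff, which is why they are the one variable dropped.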

We can see how insignificant strikeouts were in the regression, because when we remove them the R2 value decreases by only 0.0001 (0.01 percentage points). We can now determine which variables impact pitching the most. One more hit given up is associated with a 0.00385 increase in ERA, one more walk given up is associated with a 0.00246 increase in ERA, and one more home run given up is associated with a 0.00530 increase in ERA. Since teams give up vastly different numbers of hits, walks, and home runs, we must also look at the mean of each to determine which will most affect ERA. The mean number of hits given up by a team in a single season is 1469.9, the mean number of walks is 540.2, and the mean number of home runs is 172.0. If we multiply the means by the coefficients, we get that, on average, hits increase team ERA by 5.66, walks increase ERA by 1.33, and home runs increase ERA by 0.91. Obviously, we are only looking at statistics that negatively impact (increase) ERA, so the numbers look very high on their own: we are not inputting statistics such as outs or double plays that would positively impact (lower) ERA. In the model, that offset is carried by the negative intercept of -3.51.
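
The arithmetic in the paragraph above can be reproduced with a few lines of Python. The coefficients, intercept, and means are the ones reported in this post; the variable names are just my own labels. Adding the intercept back in is my addition, to show what the model predicts for a team that is exactly average in all three statistics.

```python
# Coefficients and intercept from the final regression (Hits, BB, HR)
intercept = -3.5115171
coefs = {"Hits": 0.0038502, "BB": 0.0024598, "HR": 0.0053015}

# Mean season totals per team, as quoted in the post
means = {"Hits": 1469.9, "BB": 540.2, "HR": 172.0}

# Average contribution of each statistic to ERA = coefficient * mean
contrib = {name: coefs[name] * means[name] for name in coefs}

# Predicted ERA for a team at the mean of every predictor
era_at_means = intercept + sum(contrib.values())

for name, value in contrib.items():
    print(f"{name:<5} adds {value:.2f} to ERA on average")
print(f"Predicted ERA at the means: {era_at_means:.2f}")
```

The three contributions sum to about 7.90, and the intercept brings the prediction for an average team down to about 4.39.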

From the results we can easily see that hits are the statistic that most impacts a team's ERA. The obvious solution for a team would be to give up fewer hits, but how? One way would be to acquire pitchers with greater command, which might let those pitchers "nibble" more and make hitters swing at worse pitches. This would probably increase walks, though, and we already saw that walks are also bad for ERA. A better solution would be to acquire pitchers with a low batting average against and a low batting average on balls in play (BABIP), although it has been shown that BABIP fluctuates from year to year and may not be consistent for any pitcher. Pitchers also want to give up fewer home runs, but if they can reduce the number of hits against them, this should in turn reduce the number of home runs against them.
