Thursday, November 25, 2010

Improving a team's Hitting

As a follow-up to my last post, which showed that improving a team through hitting or through pitching is equally valuable, I wanted to look at how a team should improve its hitting. There are different ways to score runs, and because higher run totals lead to higher win totals, I want to figure out which way of improving a team's hitting leads to the most additional runs, and accordingly the most additional wins.

Similar to the last post, I am going to run a linear regression model, except this time I am going to use "Runs scored" as the dependent variable. Why not simply use "Wins"? If we were to run a regression model with hitting statistics as predictors and Wins as the dependent variable, we would have a much higher standard error and a much lower R2 value, because wins also depend on pitching and defense, so hitting statistics alone explain relatively little of the variability in wins.

So if we have runs as the DV, we need appropriate hitting statistics for the independent variables. This is trickier to figure out than expected, as we cannot use statistics that are highly correlated with each other, or the regression model will experience "multicollinearity". What this means is that although the overall regression will predict the dependent variable nicely, we will not be able to tell which independent variables are accounting for the variability in the dependent variable. Although this may sound hard to prevent, it can be fairly straightforward, as a quick example will show. If we were to use on-base percentage (OBP), slugging percentage (SLG), and on-base plus slugging (OPS) as predictors for runs (or wins), our equation would have a multicollinearity problem. The overall regression may result in a low p-value, showing that we have predicted runs well, but each statistic individually would have a high p-value. We would not be able to tell which statistic is heavily influencing runs, because OPS is a redundant variable: it is simply the sum of OBP and SLG, so it measures what those two are already measuring. The best course of action is to remove it from the equation.

This regression demonstrates the collinearity issue:

             Estimate   Std. Error   t value   Pr(>|t|)
Intercept     -5.8651       0.2167   -27.070     <2e-16 ***
OBP           22.4591      17.5672     1.278      0.202
SLG           15.0106      17.5936     0.853      0.394
OPS           -4.2768      17.5879    -0.243      0.808

R2 = 0.9089
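
To see why OPS adds nothing new, here is a minimal Python sketch. The six batting lines are made-up values for illustration, not the data behind the regression above, and the `pearson` helper is my own. It checks that OPS is reproduced exactly by OBP + SLG and that it is strongly correlated with each of the other two predictors.

```python
# Hypothetical (OBP, SLG) pairs for six teams; made-up values, not the post's data.
teams = [
    (0.310, 0.380),
    (0.320, 0.400),
    (0.325, 0.420),
    (0.335, 0.410),
    (0.340, 0.440),
    (0.350, 0.450),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


obp = [t[0] for t in teams]
slg = [t[1] for t in teams]
ops = [o + s for o, s in zip(obp, slg)]  # OPS is defined as OBP + SLG

# Largest gap between OPS and OBP + SLG: zero, so OPS carries no new information.
max_gap = max(abs(p - (o + s)) for p, o, s in zip(ops, obp, slg))

r_ops_obp = pearson(ops, obp)
r_ops_slg = pearson(ops, slg)

print("max |OPS - (OBP + SLG)|:", max_gap)
print("corr(OPS, OBP):", round(r_ops_obp, 3))
print("corr(OPS, SLG):", round(r_ops_slg, 3))
```

When one predictor can be rebuilt from the others like this, the model has no way to split the credit between them, which is what the inflated standard errors in the table reflect.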

In short, we need to pick our independent variables carefully so that they are not collinear. I am going to use the following statistics to try to predict runs: OBP, SLG, and stolen base percentage (SB%). Although there are many different statistics to choose from, I am using these three because they represent the three basic ways to improve your team's hitting: get on base more, hit for more power, or become a more successful team running the bases. When we run the regression we get the following:

             Estimate   Std. Error   t value   Pr(>|t|)
Intercept     -6.0689       0.2246   -27.022    < 2e-16 ***
OBP           17.9231       0.9630    18.612    < 2e-16 ***
SLG           10.7619       0.4947    21.756    < 2e-16 ***
SBperc         0.3956       0.1387     2.852    0.00463 **

R2 = 0.9111


So these three factors explain over 91% of the variability in runs per game. Although the R2 value is only slightly higher than in the first regression, which included OPS, all three statistics are now significant, whereas none of them were before. All three factors have significant effects on runs per game, and OBP has the largest coefficient. A ten-point increase in OBP (e.g. from .350 to .360, which is one percentage point) is associated with an increase of about 0.179 runs per game. A ten-point increase in SLG (e.g. from .450 to .460) is associated with an increase of about 0.108 runs per game. Finally, a one percentage point increase in SB% (e.g. from 70% to 71%) is associated with an increase of about 0.00396 runs per game.
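
One way to read the coefficients is to plug in a batting line and change one input at a time. The Python sketch below uses the fitted coefficients from the table above. The sample batting line (.350 OBP, .450 SLG, 70% SB) is only illustrative, the function name is my own, and it assumes SB% entered the model as a proportion (0.70 rather than 70).

```python
# Coefficients copied from the OBP / SLG / SBperc regression table (runs per game).
INTERCEPT = -6.0689
B_OBP = 17.9231
B_SLG = 10.7619
B_SB = 0.3956


def predicted_runs_per_game(obp, slg, sb_rate):
    """Fitted runs per game; sb_rate is assumed to be a proportion (0.70 = 70%)."""
    return INTERCEPT + B_OBP * obp + B_SLG * slg + B_SB * sb_rate


# Illustrative baseline batting line, not a real team.
base = predicted_runs_per_game(0.350, 0.450, 0.70)

# Change one input at a time and compare against the baseline.
gain_obp = predicted_runs_per_game(0.360, 0.450, 0.70) - base  # OBP .350 -> .360
gain_slg = predicted_runs_per_game(0.350, 0.460, 0.70) - base  # SLG .450 -> .460
gain_sb = predicted_runs_per_game(0.350, 0.450, 0.71) - base   # SB% 70% -> 71%

print(f"baseline runs per game: {base:.3f}")
print(f"gain from +.010 OBP:    {gain_obp:.4f}")
print(f"gain from +.010 SLG:    {gain_slg:.4f}")
print(f"gain from +1% SB:       {gain_sb:.5f}")
```

Because the model is linear, each gain is just the coefficient times the change in that input, and it does not depend on the baseline chosen.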

So what does this mean? Improving a team's hitting means scoring more runs per game, and according to this model the most effective way to score more runs per game is to increase OBP. So the best way to improve a team's hitting is to acquire players who will get on base more often, whether through a hit, a walk, or a hit-by-pitch. Acquiring players who hit for power will also positively impact a team's hitting, but not as much as acquiring players who get on base. So if a team had a limited budget and could only acquire one or two significant players, it should try to acquire the players who can most improve the team's OBP.
