Saturday, June 4, 2011

Differentiating between Luck and Skill Part II

The last post explained how we used batted ball statistics to estimate how much of a batter's batting average reflects his skill. In this post, I want to show how much of a batter's difference from his career mean is due to luck and how much is due to skill.

The first thing to do is to figure out a hitter's career batting average. This is, with a large enough sample size, his "average skill". So a hitter with ten seasons in the majors and a career .300 batting average is a ".300 hitter". If he is batting over .300, then he is having a better season than usual, and some mix of skill and luck explains why. If he is batting under .300, then maybe he is getting unlucky, or maybe he is losing some skill as he ages.

Once we have a hitter's career average, his actual batting average for every season, and his predicted average for every season from the regression equation we have already run, we can graph all three. The career average is the baseline, and the difference between the career average and the predicted average is the batter's difference due to skill. The difference between the hitter's predicted average and his actual average is the difference due to luck. Anyone familiar with statistics will recognize the procedure: the differences between a variable's mean and its predicted values are what make up the "explained sum of squares", and the differences between the predicted values and the observed values are what make up the "residual sum of squares". The batter's skill is explained, and his luck is a residual, or error, which is unexplained.
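To make the split concrete, here is a minimal Python sketch of the arithmetic described above. The function name `decompose` and the example hitter are my own illustrations, not part of the regression itself; averages are converted to whole "points" (thousandths) so the subtraction stays exact.

```python
def decompose(career_avg, predicted_avg, actual_avg):
    """Split a season's deviation from the career average into
    (skill, luck), both measured in points of batting average."""
    # Convert .300-style averages to integer points (300) to avoid float noise.
    career, predicted, actual = (
        round(x * 1000) for x in (career_avg, predicted_avg, actual_avg)
    )
    skill = predicted - career   # explained by the batted ball regression
    luck = actual - predicted    # residual, unexplained by the regression
    return skill, luck


# Hypothetical .300 career hitter with a .310 predicted and .325 actual season.
skill, luck = decompose(0.300, 0.310, 0.325)
print(f"skill: {skill:+d} points, luck: {luck:+d} points")
```

By construction the two pieces always add up to the total gap between the actual average and the career average.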

The first example I want to use is a player very familiar to the Blue Jays: Vernon Wells. He spent his entire career in Toronto until being traded to the Angels this past offseason. He had a couple of good seasons, as well as some bad seasons, so he should be a good example, showing variance in both actual and predicted AVGs. He is a career .277 hitter, so the blue line in the graph below is his "average skill" as a hitter. The red line shows his observed averages over each season from 2002-2011 (excluding 2008 when he missed a lot of time due to injuries), and the green line is the regression's predicted values for his average.

Vernon Wells AVG from 2002-2011

There are a lot of interesting things to see on the graph. Although Wells is a career .277 hitter, he has had only two full seasons hitting above that (as well as hitting .300 in 2008 in limited action). In 2003, his second full season, he hit .317 even though his predicted average was only .282. This means that of the extra 40 points of batting average above his career mean (.317-.277), 5 were due to skill and 35 were due to luck, or randomness. The regression saw him not as a .317 hitter but as more of a .282 hitter, and it was right: the next season he batted only .272. It is interesting to note that in both of his first two full seasons, his batted ball statistics suggested that he was a .282 hitter, but he hit 42 points higher in his second season than in his first. That seems to show the luck he had in his second year. This also happened in 2005-06, when the regression predicted he was a .271 hitter, but he hit .269 in 2005 and then .303 in 2006. Unfortunately, Wells was seen as a much better hitter than he actually was, and was rewarded with a huge contract that the Jays had to unload for players of lesser ability than Wells.

Another thing to notice on the graph is how far Vernon has fallen off this year. He is currently hitting only .183 (through Thursday's games), but the regression is expecting him to be hitting .247. This means that although he has lost about 30 points of skill, from .277 to .247, he has also been very unlucky, hitting 64 points lower than expected. This is still a small sample size, so we will need to see how he performs the rest of the season to truly judge whether he has simply lost it or has just been unlucky.
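The Wells numbers quoted above can be tabulated with a short Python sketch. Only the seasons whose actual and predicted averages are stated in the text are included, and the helper name `breakdown` is just an illustration.

```python
CAREER = 277  # Wells' career average, in points (.277)

# year: (actual, predicted), in points, as quoted in the text above
SEASONS = {
    2003: (317, 282),
    2005: (269, 271),
    2006: (303, 271),
    2011: (183, 247),  # through Thursday's games
}


def breakdown(career, actual, predicted):
    """Return (skill, luck) in points relative to the career average."""
    return predicted - career, actual - predicted


for year, (actual, predicted) in SEASONS.items():
    skill, luck = breakdown(CAREER, actual, predicted)
    print(f"{year}: skill {skill:+d}, luck {luck:+d}, total {actual - CAREER:+d}")
```

The 2003 and 2011 rows reproduce the figures discussed above: 5 points of skill and 35 of luck in 2003, and 30 points of lost skill plus 64 points of bad luck so far in 2011.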

Now that we have seen a good example, we can move on to what this entire exercise was all about: figuring out how much of Jose Bautista's increase in average so far this year is due to luck and how much is due to skill. Bautista is a career .252 hitter, so any average above that means that he has either been getting lucky or has become more skilled. Last year, the regression predicted that he would hit .261, and he actually hit .260, showing that all of his increase in batting AVG was skill, and he was actually slightly unlucky (by 1 point, more due to randomness than anything else).

Jose Bautista AVG from 2010-2011

And that leads us to this year. Bautista is currently hitting .363 (through Thursday's games), an amazing 111 points above his career average. The regression predicted that he would be hitting .326 at this point, so 74 points are due to skill and only 37 are due to luck. This means that 2/3 of his increase in batting average is due to skill. Obviously it is still a somewhat small sample size, and we need to see how he finishes the year, but we can say with confidence that this increase in batting average is not just a fluke. All of the keys to increasing batting average that I mentioned in the last post are evident in Bautista this year. He has increased his GB/FB rate, increased his line drive % by 5%, decreased his FB% by 8%, increased his HR/FB rate by 8%, and decreased his strikeout percentage by 3%. These have all led to a predicted increase in average, and thus we can see that the increase is mostly due to skill.
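As a quick check on the 2/3 figure, here is a small Python sketch using only the Bautista averages quoted in this post. The function name `skill_share` is an illustrative choice of mine.

```python
CAREER = 252  # Bautista's career average, in points (.252)


def skill_share(career, actual, predicted):
    """Return (skill, luck, share), where share is the fraction of the
    gap between actual and career average that the regression explains."""
    skill = predicted - career
    luck = actual - predicted
    share = skill / (actual - career)
    return skill, luck, share


# 2011 through Thursday's games: actual .363, predicted .326
skill, luck, share = skill_share(CAREER, actual=363, predicted=326)
print(f"skill {skill:+d}, luck {luck:+d}, skill share {share:.0%}")
```

The same function applied to last year (actual .260, predicted .261) gives 9 points of skill and 1 point of bad luck, which is why the share there comes out above 100%.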

This post was a mission to answer the question "can we differentiate between luck and skill in a batter's AVG?", and the answer turned out to be an emphatic yes. We showed that much of a hitter's variation in batting average can be attributed to skill, especially in the case of Jose Bautista. As I said in an earlier post on Bautista, "Bautista probably won't end the year hitting .350, but we can reasonably expect him to finish the year hitting .320 or so." That was my gut feeling, and it is nice to know that the numbers back up that statement. He may well end up hitting over .350 this year, but it is more reasonable to expect him to hit somewhere around .330. He has fundamentally changed his approach in the last two years, and last year reaped the benefits by hitting 54 home runs. This year, he is still hitting home runs, but has now also become a high average hitter, due mostly to skill.
