In my post yesterday, I talked about how baseball teams can become successful on the field and also turn a profit. I said that the statistic $/WAR is very useful in measuring how cost-effective a player is during a season, by measuring how much it cost to pay the player per extra win he generated for the team. Today, I am going to show just a few of the results I have obtained from looking at the past three years (2007-2009), and some of the amazing (really amazing) results that can be found through the simple statistic $/WAR.
The post today is going to focus on $/WAR at different ages. I will do upcoming posts on $/WAR by teams, by individual players, and possibly by positions. Age is a huge factor in many baseball decisions: it is generally accepted that a player's peak is somewhere around 28 or 29, and that once a player passes a certain age, his skills begin to decline. Teams constantly have to decide whether to keep an older player, especially if his skill set has already shown signs of dropping, and whether to keep younger players who have yet to hit their peak. That's why I believe this post is so important; as we will find out soon, it could unlock the secret to developing successful baseball teams that are also cost-effective.
To figure out how $/WAR differs across ages, I first had to collect the data. For each year, I looked at all players with a WAR greater than or equal to 1.0, as I wanted to focus on the players who had a significant impact on their team that year (also, a player with a negative WAR will probably not be around a major league team for very long!). For each age, I summed the WAR and the salaries of all qualifying players, then divided total salary by total WAR to get the $/WAR for that age. I then graphed the results and added a trendline to determine how strong the relationship is between age and $/WAR. You can see the graphs below.
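The aggregation step described above could be sketched roughly as follows. This is a minimal Python illustration, not the spreadsheet actually used for this post; the function name `dollars_per_war_by_age` and the sample rows are made up purely to show the arithmetic and are not real player data.

```python
from collections import defaultdict


def dollars_per_war_by_age(players, min_war=1.0):
    """Return {age: total salary / total WAR} for qualifying players.

    `players` is an iterable of (age, war, salary) tuples. Players with
    WAR below `min_war` are dropped, mirroring the 1.0 cutoff in the post.
    """
    war_sum = defaultdict(float)
    salary_sum = defaultdict(float)
    for age, war, salary in players:
        if war >= min_war:
            war_sum[age] += war
            salary_sum[age] += salary
    # Divide the summed salaries by the summed WAR for each age.
    return {age: salary_sum[age] / war_sum[age] for age in sorted(war_sum)}


# Hypothetical rows (age, WAR, salary in dollars), NOT real players.
sample = [
    (24, 2.0, 400_000),
    (24, 3.0, 600_000),
    (30, 4.0, 8_000_000),
    (30, 0.5, 5_000_000),  # below the 1.0 WAR cutoff, so excluded
    (35, 1.0, 10_000_000),
]

result = dollars_per_war_by_age(sample)
for age, value in result.items():
    print(age, round(value))
```

Note that this divides the sum of salaries by the sum of WAR for each age, rather than averaging each player's individual $/WAR, so one player with a very low WAR cannot dominate the figure for his age group.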
This first graph is from 2007. In this graph, the R² value is 0.87052, which shows a pretty strong relationship between age and $/WAR. (R² is the coefficient of determination: put simply, it shows how closely the two variables are related, and more mathematically it is the percentage of the variance in variable Y ($/WAR) that can be explained by variable X (age).) The ages range from 22 to 42, skipping 41 because no 41-year-olds qualified. I left out age 21 because only one player (Daric Barton of the A's) was that age, and such a small sample size skewed the results. We can see that for the first six or so years the trendline is very accurate, before the $/WAR jumps higher than the line and then fluctuates from age 33 on.
This next graph is from 2008, and the R² value is a little lower, at 0.86708. This still means there is a strong relationship between age and $/WAR. The ages range from 22 to 37, as there was only one player (Cameron Maybin, 21) younger than 22 and only three players (Jim Edmonds, 38, Mark Grudzielanek, 38, and Jeff Kent, 40) older than 37. It is a very similar graph to the first one, just with a little less variance in the later years.
The final graph is from 2009. The R² value is at its highest, at 0.94014, which shows an extremely strong relationship between age and $/WAR. Other than a peak between ages 31 and 33, the trendline fits the data pretty accurately. One other thing to note is that all of the trendlines I have used so far have been "power" regression lines, with the formula Y = ax^b, where a and b are constants. I am using the power regression line (as opposed to a linear, logarithmic, polynomial or exponential model) because the increase in $/WAR is small to begin with, before growing bigger and bigger as the player gets older (for example, the increase in $/WAR from 22 to 23 will be much smaller than the increase from 34 to 35). An exponential model is also a good fit, but the other models do not work very well, and so far the power model has been the best for each graph.
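For anyone who wants to reproduce a power trendline outside of a charting tool, here is a minimal sketch. It assumes the common approach of taking logs of both variables and fitting a straight line by least squares, since Y = ax^b becomes ln Y = ln a + b ln X; I am not claiming this is exactly what any particular charting program does internally. The function name `fit_power` and the check data are illustrative only.

```python
import math


def fit_power(xs, ys):
    """Fit y = a * x**b by least squares on (ln x, ln y).

    Returns the constants (a, b). All x and y values must be positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                        # slope in log-log space
    a = math.exp(mean_y - b * mean_x)    # intercept, mapped back from ln a
    return a, b


# Synthetic check data generated from y = 2 * x**3 (not real $/WAR values).
ages = list(range(22, 43))
synthetic = [2 * age ** 3 for age in ages]
a, b = fit_power(ages, synthetic)
print(round(a, 6), round(b, 6))
```

Because the synthetic points lie exactly on a power curve, the fit should recover the constants it was generated from; real $/WAR data will of course scatter around the line.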
What all of these graphs have shown is that as a player gets older, his $/WAR continues to increase, which means he becomes less cost-effective. This agrees with previous evidence that a player's skill set tends to decline as he ages while his salary keeps climbing. Now that I have looked at each year individually, I want to look at the aggregate, with all three years combined.
The following table summarizes the WAR, salary, and $/WAR for each age for all of the players in the three years:
When we graph the data in the table, we get this graph:
The R² value is 0.93809, which again shows a very strong relationship between age and $/WAR. Again, the largest fluctuations occur in the latter stages of a player's career, when he could be either on a long contract (financially secure) or on a series of one-year deals (playing for a job). However, the most interesting part of this graph is the stretch between the ages of 22 and 32. The points look like a perfect match for an exponential model, so let's see what happens if we simply graph from age 22 to 32 with an exponential trendline.
Wow! An R² value of 0.99019, which means that 99% of the variance in $/WAR can be explained by age. This is about as strong a relationship as you can possibly get! What this means is this: on an aggregate level, age alone predicts almost exactly what $/WAR will be from ages 22 to 32. If this carries over to individual players (which it should!), teams will be able to predict a player's value to the team and pay him accordingly. This is critical to becoming a good team on the field and also making a profit. Teams that overpay for talent may be successful, but they will not be profitable (unless they are the Yankees).
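Here is a rough sketch of how an exponential trendline and its R² could be computed by hand. It assumes the usual approach of regressing ln Y on X, since Y = a·e^(bX) becomes ln Y = ln a + bX, and it reports R² for that log-transformed fit, which may differ slightly from an R² computed on the raw dollar values. The function name `fit_exponential` and the check data are made up for illustration and are not the numbers from my table.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on (x, ln y).

    Returns (a, b, r2), where r2 is the coefficient of determination
    of the straight-line fit in log space. All y values must be positive.
    """
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((v - mean_y) ** 2 for v in ly)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(xs, ly))
    b = sxy / sxx                        # growth rate per year of age
    a = math.exp(mean_y - b * mean_x)    # scale constant
    r2 = (sxy * sxy) / (sxx * syy)       # squared correlation of x and ln y
    return a, b, r2


# Synthetic check data from y = 100000 * exp(0.3 * age), ages 22 to 32.
ages = list(range(22, 33))
synthetic = [100_000 * math.exp(0.3 * age) for age in ages]
a, b, r2 = fit_exponential(ages, synthetic)
print(round(b, 6), round(r2, 6))
```

Once a and b are known, the predicted aggregate $/WAR at a given age is simply `a * math.exp(b * age)`.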
What we have found here is that players tend to follow a very exact pattern in their cost-effectiveness during the first 10 or so years of their career. We can use this information to predict how young players will perform over the next few years.
In my next post later this week, I am going to explore these results with some young players and try to predict how they will perform over the next few years. It will be interesting to see if this does in fact work on a micro level and not just a macro level.