As I traveled home from college for Christmas break, I started to ponder fantasy baseball. The baseball season is only a few months away, which means it's almost time to start preparing for fantasy baseball. My fantasy baseball history has been mediocre at best, with a few pleasant successes mixed in with a lot of middling results. But this is the year I am going to change that. This year, I am going to develop a system to exploit any market inefficiencies, to properly value players, and to determine the optimal pitcher-player drafting strategy. If it doesn't end up working, then I will return to fighting for that last playoff spot. If it does, then I will be on top of the (fantasy) world.
So what strategy am I going to use to dominate my fantasy baseball league? I got the idea from the seminal paper by Berri, Schmidt, and Brook in which they run regressions to find the value of each basketball play and determine how much each contributes to a player's team winning from the coefficients. (Berri, David J., Schmidt, Martin B., and Brook, Stacey L. The Wages of Wins: Taking Measure of the Many Myths in Modern Sport. Stanford, CA: Stanford Business, 2007, p. 107. Print.) While the idea was not new to me, and I had seen the paper before, for some reason I started to wonder if this could work when applied to my fantasy baseball team. If I could find just how much a marginal increase in each statistic will add to my total winning percentage, then I can properly value players on draft day.
There are a couple of steps in order to determine each player's value. The first is to properly project what his final statistics are going to be in 2012. This is one of the most important steps, as an estimate far from what the player will actually produce will harm my drafting strategy. To properly predict a player's statistics, I am going to combine his statistics from recent seasons with projections for the upcoming season. I am going to use weighted averages of the player's last three seasons to determine past production as it relates to future production, where a player's season three years ago is weighted less than two years ago, and two years ago less than last year. This will be one part of a player's projected production. The other part will be actual predictions for next year. Thankfully, I do not have to make these myself, as people much smarter than I have already done it (or soon will). The most useful projections for the upcoming season can be found on FanGraphs' projections page. There are a few options to choose from, and although I am not sure which ones I will use, I will probably take those that are available and average them to try to find some sort of middle ground, dampening the effect of any potential outliers. The options on FanGraphs are Marcel projections (from Tom Tango), which also use weighted averages of players' past production to determine future projections, ZiPS projections from Baseball Think Factory, fan projections, and RotoChamp projections. Bill James projections are also available on each player's page, and I may also use ESPN or Yahoo! projections in my calculations.
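The projection step above can be sketched in a few lines of Python. This is only a rough illustration: the 3/4/5 season weights, the 50/50 split between past production and the projection-system consensus, and the home-run numbers are all assumptions I made up for the example, not values I have settled on.

```python
def weighted_past(seasons, weights=(3, 4, 5)):
    """Weighted average of the last three seasons, ordered oldest to newest.

    The (3, 4, 5) weights are an assumed example; the only requirement
    from the text is that more recent seasons count for more.
    """
    return sum(w * s for w, s in zip(weights, seasons)) / sum(weights)


def blended_projection(seasons, system_projections, past_share=0.5):
    """Combine weighted past production with the average of several projection systems.

    past_share is a hypothetical knob: 0.5 means an even split.
    """
    past = weighted_past(seasons)
    consensus = sum(system_projections) / len(system_projections)
    return past_share * past + (1 - past_share) * consensus


# Made-up home-run totals for one hitter (oldest season first)
hr_last_three = [20, 24, 30]
# Made-up home-run projections from three different systems
hr_projections = [26, 28, 30]

projected_hr = blended_projection(hr_last_three, hr_projections)
print(projected_hr)
```

Counting stats like home runs can be averaged directly like this. Rate stats such as batting average or ERA would probably need to be weighted by at-bats or innings as well, which this sketch does not do.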
The next step after being comfortable with a player's projections is to properly evaluate what these statistics will mean to a fantasy team. As there are two main types of leagues (rotisserie and head-to-head), as well as many different categories available to use in a fantasy baseball league, I am going to take the simple approach and evaluate the 5x5 categories in a standard roto league. (A brief overview of Rotisserie baseball is available here and another explanation on scoring can be found here.) The 5x5 categories for hitting are batting average, runs, RBI, home runs, and steals. The categories for pitching are wins, saves, strikeouts, ERA, and WHIP. I will need to go back a few years into a league's history (hopefully at least five) to get enough data points, and I will collect total points for each team (if I were to do this for a H2H league I would collect team records), as well as teams' aggregate totals in each of the ten categories.
The next step would be to run a regression, with the independent variables being each of the 5x5 categories and the dependent (y) variable being total points. The result would be a regression with a coefficient on each category indicating how much that category contributes to the total team winning percentage (in this case, points). I may also include some dummy variables for each team or each year to try to account for differences that are not captured by the category variables.
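A minimal sketch of that regression, using ordinary least squares from NumPy. I do not have the league history collected yet, so the data here is randomly generated: the 60 team-seasons, the standardized category totals, and the "true" weights are all invented so the example runs on its own. The function name `fit_category_weights` is also just something I made up for illustration.

```python
import numpy as np

CATEGORIES = ["AVG", "R", "RBI", "HR", "SB", "W", "SV", "K", "ERA", "WHIP"]


def fit_category_weights(X, y):
    """Ordinary least squares of total points on the ten category totals.

    X: one row per team-season, one column per category.
    y: total roto points for each team-season.
    Returns the intercept and a dict mapping category name to coefficient.
    """
    design = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[0], dict(zip(CATEGORIES, coefs[1:]))


# Synthetic stand-in for five seasons of a twelve-team league (NOT real data).
rng = np.random.default_rng(0)
n_team_seasons = 60
X = rng.normal(size=(n_team_seasons, len(CATEGORIES)))  # standardized totals

# Invented "true" weights; ERA and WHIP are negative because lower is better.
true_weights = np.array([3, 2, 2, 3, 2, 2, 2, 3, -3, -3], dtype=float)
y = 55 + X @ true_weights + rng.normal(scale=0.5, size=n_team_seasons)

intercept, weights = fit_category_weights(X, y)
for name, value in weights.items():
    print(f"{name:>4}: {value:+.2f}")
```

With real league data, X would hold each team's actual season totals and y its final points. Team or year dummy variables could be appended as extra 0/1 columns in X, though that would need the name mapping in the function to be extended.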
Once I have run the regression and obtained coefficients, I can apply these coefficients to each player's statistics (obviously, hitting coefficients for hitters and pitching coefficients for pitchers only). Multiplying each estimated coefficient by the player's projection in that category and summing the results will give me an estimate of how many wins, or team points, each player will be worth. I can then rank by total value, and the differences between my rankings and the rankings of whatever fantasy baseball site I am on will be the market inefficiency which I can exploit. For example, if I believed that Jose Bautista would be the fourth most valuable player in 2012, but the rankings had him as only the ninth most valuable, I should acquire him, as he will be worth more to me than to the other owners because of asymmetric information. I can also pass on players who are ranked higher but who I believe will not produce as much as the other owners expect.
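Here is a small sketch of the valuation and ranking step. Everything in it is hypothetical: the coefficients are placeholders rather than regression output, and the three hitters, their projections, and their site rankings are invented for the example.

```python
# Placeholder hitting coefficients (points per unit of each stat); NOT regression output.
HITTING_COEFS = {"R": 0.02, "HR": 0.05, "RBI": 0.02, "SB": 0.04, "AVG": 40.0}

# Invented projections for three made-up hitters.
projections = {
    "Hitter A": {"R": 100, "HR": 40, "RBI": 105, "SB": 8, "AVG": 0.290},
    "Hitter B": {"R": 95, "HR": 20, "RBI": 80, "SB": 35, "AVG": 0.300},
    "Hitter C": {"R": 80, "HR": 30, "RBI": 95, "SB": 2, "AVG": 0.270},
}

# Invented rankings from a fantasy site (1 = most valuable).
site_rank = {"Hitter C": 1, "Hitter A": 2, "Hitter B": 3}


def player_value(stats, coefs):
    """Sum of coefficient times projected stat across all categories."""
    return sum(coefs[cat] * stats[cat] for cat in coefs)


values = {name: player_value(stats, HITTING_COEFS) for name, stats in projections.items()}

# Rank players by my estimated value, highest first.
ordered = sorted(values, key=values.get, reverse=True)
my_rank = {name: position for position, name in enumerate(ordered, start=1)}

# Positive gap: the site ranks the player lower than I do (a potential bargain).
gap = {name: site_rank[name] - my_rank[name] for name in my_rank}

for name in ordered:
    print(f"{name}: value={values[name]:.2f}, mine={my_rank[name]}, "
          f"site={site_rank[name]}, gap={gap[name]:+d}")
```

Players with a positive gap are the ones I would target, and players with a negative gap are the ones I would let someone else draft.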
There are a few problems that I have already touched on briefly. The first problem is the player projections, as they are highly volatile and you can never be completely sure about a player's production or health in the upcoming season. The next issue is the difference between my league and the hypothetical 5x5 rotisserie league. This will obviously have to be accounted for, as player rankings will be largely based on what categories the league is using. A strategy for winning in a 5x5 league may not win in a sixteen-category league. I will have to adjust the model accordingly, but once the basic model has been completed I can simply tweak it by adding or subtracting categories. Finally, there may be collinearity in my regression model, as described on pages 10-11 of this paper.
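One simple way to look for collinearity before trusting the coefficients is to check the pairwise correlations between categories. The sketch below is just that, a quick screen and not the method from the paper mentioned above. The data is synthetic, with RBI deliberately built to track home runs, and the 0.8 cutoff is an arbitrary assumption.

```python
import numpy as np


def correlated_pairs(X, names, threshold=0.8):
    """Return (name_i, name_j, correlation) for column pairs with |r| above threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns are the variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


# Synthetic team totals for 60 team-seasons (NOT real data).
rng = np.random.default_rng(1)
hr = rng.normal(200, 25, size=60)
rbi = 3.5 * hr + rng.normal(0, 20, size=60)  # constructed to follow HR closely
sb = rng.normal(120, 30, size=60)            # generated independently of HR and RBI

X = np.column_stack([hr, rbi, sb])
flagged = correlated_pairs(X, ["HR", "RBI", "SB"])
for a, b, r in flagged:
    print(f"{a} and {b} are highly correlated (r = {r:.2f})")
```

If a pair like this shows up in the real data, the individual coefficients on those two categories would be hard to separate, and I might have to drop or combine one of them.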
That being said, if I can address the issues and temper their impact on the regression, this model could be very successful. Finding undervalued players in the fantasy baseball "labor" market is somewhat similar to the strategy the Oakland Athletics employed in the early 2000s as detailed in Moneyball. Although I am not looking for a certain type of player that is undervalued (e.g. a high on-base player), I may find certain players undervalued, such as low-win, low-ERA pitchers. I am simply looking for outliers, players who are worth more than they are currently valued at on fantasy baseball sites. If I can find these outliers and acquire them for less than they are worth, there is a high likelihood that the upcoming fantasy baseball season will be successful. I've had enough of the 8th place finishes. I am ready to try something new, just like the A's, and see if it can vault me to the top.
An in-depth view on Major League Baseball and the Toronto Blue Jays from a statistical perspective
Saturday, December 17, 2011
Sunday, December 4, 2011
Risk Aversion in MLB and its Impact on Winning Percentage
This is a paper I wrote for my Sports Economics class. I attempted to measure risk aversion of owners and GMs in baseball using starting payroll percentage, and then tested whether risk aversion was correlated with winning percentage.
Risk Aversion in MLB