Thursday, October 14, 2010

Predicting Playoff Series

Now that the first round of the playoffs is over, we are left with only four teams: the Yankees, the Rangers, the Phillies, and the Giants. In the American League Championship Series, the Rangers are hosting the Yankees, while in the NLCS the Phillies are hosting the Giants. The Phillies and Yankees are the prohibitive favorites to win their respective series and advance to the World Series for a rematch of last year's matchup. But what are the chances of each of the four teams winning their series?

To calculate the odds of each team advancing, I used three different statistics. The first was the team's regular season record in 162 games, which I converted to a probability between 0 and 1 (each team is actually between 55% and 60%). The second is their Pythagorean Win-Loss record, which uses runs scored and runs allowed to predict what each team's record should have been. I again converted the Pythagorean Win-Loss record to a probability. Finally, I used the team's overall record, combining the regular season record and playoff record through the Division Series, and converted it to a probability. All three are used to predict the probability of each team advancing, as three estimates should produce a more accurate result than one. The summary for each team is found below:
 
Texas Rangers
Regular Season record: 90 wins - 72 losses (win probability = 0.55556)
Pythagorean W-L = 91-71 (0.56173)
Overall record: 93-74 (0.55689) - Won their first round series 3-2.

New York Yankees
Regular Season record: 95-67 (0.58642)
Pythagorean W-L = 97-65 (0.59877)
Overall Record: 98-67 (0.59394) - Won their first round series 3-0.

Philadelphia Phillies
Regular Season record: 97-65 (0.59877)
Pythagorean W-L = 95-67 (0.58642)
Overall Record: 100-65 (0.60606) - Won their first round series 3-0.

San Francisco Giants
Regular Season record: 92-70 (0.56790)
Pythagorean W-L = 94-68 (0.58025)
Overall Record: 95-71 (0.57229) - Won their first round series 3-1.
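
For anyone who wants to reproduce the conversion, here is a minimal Python sketch of how each record becomes a probability: wins divided by total games. The dictionary layout and the function name `win_probability` are hypothetical choices for illustration, not the output of any particular software; the records themselves are the ones listed above.

```python
# Win-loss records copied from the team summaries above.
# The structure and names here are illustrative assumptions.
records = {
    "Rangers":  {"regular": (90, 72), "pythagorean": (91, 71), "overall": (93, 74)},
    "Yankees":  {"regular": (95, 67), "pythagorean": (97, 65), "overall": (98, 67)},
    "Phillies": {"regular": (97, 65), "pythagorean": (95, 67), "overall": (100, 65)},
    "Giants":   {"regular": (92, 70), "pythagorean": (94, 68), "overall": (95, 71)},
}


def win_probability(wins, losses):
    """Convert a win-loss record to a probability between 0 and 1."""
    return wins / (wins + losses)


# Build a table of probabilities, rounded to five decimals like the summaries.
probabilities = {
    team: {kind: round(win_probability(*wl), 5) for kind, wl in recs.items()}
    for team, recs in records.items()
}

for team, probs in probabilities.items():
    print(team, probs)
```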

The next part gets a little tricky, as I used a statistical software package to compare the win probabilities. For each win probability, I generated 1000 random data points from a normal distribution with mean = win probability and standard deviation = 1. Because every win probability is between 0.55 and 0.61, these data sets are very close to the standard normal distribution of mean = 0 and standard deviation = 1 (if you want to know more about the standard normal distribution, you can see Wikipedia), just shifted slightly to the right. What this means is that about 68% of the data points will be between win probability - 1 and win probability + 1, and since we are creating two different data sets, there should be some difference between the sets if they have different means. The standard deviation represents the variation in performance, as a team plays a different game every night and does not always perform the same. I used 1000 data points because the more data points there are, the more precise the estimate.

I then subtracted the paired data points for the two teams that were playing against each other. The difference was "home team" - "away team", so if the difference was >= 0, it meant the home team was "better" than the away team, or in my analysis, that the home team won the series. If the difference of "home team" - "away team" was < 0, then it meant that the away team would win the series. Finally, I counted the non-negative and negative differences to determine how many times out of 1000 each team would win the series. I did this for each set of win probabilities (regular season, Pythagorean, and overall) to find three estimates of the probability of each team winning a series, found below.
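
The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the statistical package used for the numbers in this post: the function name `simulate_series`, the `seed` argument, and the use of Python's `random` module are all hypothetical choices, and a different seed will give somewhat different counts.

```python
import random


def simulate_series(home_p, away_p, n=1000, sd=1.0, seed=None):
    """Count how many of n paired draws favor the home team.

    Each team gets a draw from a normal distribution centered on its
    win probability with standard deviation sd. The home team "wins"
    when home draw - away draw >= 0. (Hypothetical helper.)
    """
    rng = random.Random(seed)
    home_wins = 0
    for _ in range(n):
        difference = rng.gauss(home_p, sd) - rng.gauss(away_p, sd)
        if difference >= 0:
            home_wins += 1
    return home_wins


# ALCS with regular season records: Rangers (home) vs. Yankees (away).
n = 1000
rangers_wins = simulate_series(90 / 162, 95 / 162, n=n, seed=2010)
print(f"Rangers win {rangers_wins} of {n}, Yankees win {n - rangers_wins}")
```

Because only 1000 draws are used, running this with different seeds should move the count up or down by a few dozen.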

ALCS
Using regular season record: Rangers win series 474 times = 47.4%, Yankees win series 52.6%
Using Pythagorean regular season record: Rangers win series 495 times = 49.5%, Yankees win series 50.5%
Using regular season and playoff record: Rangers win series 483 times = 48.3%, Yankees win series 51.7%

NLCS
Using regular season record: Phillies win series 527 times = 52.7%, Giants win series 47.3%
Using Pythagorean regular season record: Phillies win series 506 times = 50.6%, Giants win series 49.4%
Using regular season and playoff record: Phillies win series 529 times = 52.9%, Giants win series 47.1%

So in the ALCS, the Yankees win the series between 50.5% and 52.6% of the time (51.6% on average) according to these estimates. In the NLCS, the Phillies win the series between 50.6% and 52.9% of the time (52.07% on average). These estimates are slightly lower than some other estimates (like here), but this is probably because the standard deviation of 1 is very large compared to the gap of a few hundredths between the teams' win probabilities, so the two simulated data sets overlap almost completely and every matchup comes out close to 50%. Using more data points does not push the probability towards 50%; it only makes the estimate less noisy. With 1000 data points, some of the spread between the three estimates for each series is simply random variation.
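
Under the same model, the chance that the home team's draw beats the away team's can also be worked out without simulation, since the difference of two independent normal draws with standard deviation 1 is itself normal with standard deviation sqrt(2). The sketch below is a rough check under that assumption rather than part of the analysis above: the function names are hypothetical, and it relies on `statistics.NormalDist`, which requires Python 3.8 or later.

```python
from math import sqrt
from statistics import NormalDist


def exact_home_win_probability(home_p, away_p, sd=1.0):
    """P(home draw - away draw >= 0) for two independent normal draws."""
    # The difference is normal with mean home_p - away_p and
    # standard deviation sd * sqrt(2).
    difference = NormalDist(mu=home_p - away_p, sigma=sd * sqrt(2))
    return 1.0 - difference.cdf(0.0)


def simulation_standard_error(p, n=1000):
    """Approximate sampling error of a proportion estimated from n draws."""
    return sqrt(p * (1.0 - p) / n)


# ALCS with regular season records: Rangers (home) vs. Yankees (away).
p_rangers = exact_home_win_probability(90 / 162, 95 / 162)
print(f"Rangers: {p_rangers:.3f}, Yankees: {1 - p_rangers:.3f}")
print(f"Standard error with 1000 draws: {simulation_standard_error(p_rangers):.3f}")
```

With the regular season records this gives the Rangers roughly a 49% chance, and the sampling error of a 1000-draw simulation is about 1.6 percentage points, which is comparable to the spread between the three estimates for each series.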

So my prediction is that both series should be pretty close, but in line with popular opinion, the Phillies and Yankees should prevail and meet in the World Series for a second straight year.
