Sunday, September 26, 2010

Jose Bautista

Jose Bautista has had an incredible year. On Friday night he hit his 51st and 52nd home runs of the season, which is already 5 better than George Bell's previous franchise record of 47 in a season. He has become the first player since 2007 to hit at least 50 home runs and is also currently second in the AL in walks. He has seemingly come out of nowhere to start hitting home runs left and right, and as a result many people have questioned the legitimacy of his season. The question everyone wants answered is how a 29-year-old player, with a career high of 16 home runs in a season, can suddenly hit 50+. Only one player in history, Cecil Fielder, had ever hit 50 home runs in a season without having previously hit at least 20 in a season. In this post I want to try to explain how it is possible for Bautista to have such a breakout season, without involving the dreaded s-word.

Bautista himself claims that the increase in his home run total is due to regular playing time instead of being a utility player (increased confidence), better pitches to hit, and a change in his swing. I am going to focus mainly on the last of these: how he could hit so many more home runs simply by changing his swing to maximize his strengths and minimize his weaknesses. There are two main factors in hitting home runs: hitting a lot of fly balls, and hitting a higher percentage of those fly balls out of the ballpark (by getting lucky, or by getting stronger and getting lucky). The first factor, FB%, varies hitter by hitter, as some hitters are ground ball hitters while others are fly ball hitters. The second factor, HR/FB, is mainly due to luck. Similar to BABIP, a hitter has some control over his HR/FB rate, but it fluctuates around the league average of 10.6% (if you want to read more about HR/FB, you can visit the Sabermetrics Library here).

If you have watched Bautista hit any home runs at all this year, you can see how he has changed his swing. He simply waits for a pitch in one location (almost always inside). If the pitch is not there he will not swing at it, but if it is there he will take an extremely hard swing. As a result, he is sacrificing contact for power (although, surprisingly enough, he is hitting for a career high in batting average this year). This means that he is very patient at the plate, drawing 98 walks so far this year, and will simply mash any mistake pitch. He has also added a pronounced uppercut to his swing, adding loft to the balls that he hits and thus increasing his FB%. He has also increased his HR/FB rate this year, partly by luck, but also because he is swinging for the fences every time he steps into the batter's box.

This picture shows all (well, the first 49) of Bautista's 2010 home run landing spots, courtesy of Hit Tracker. He has not hit a single home run to the right of dead center field this year, nor in any other year. This is again a critical part of Bautista's success: he looks to hit inside pitches for power to left field, while he either lays off pitches on the outer half of the plate or looks to hit them for contact.

What I want to do now is figure out how many of Bautista's home runs this year are from his conscious adjustment at the plate, and how many are mainly due to luck. To do this, I am going to compare his FB% and HR/FB rates from before this year with this year's, holding one constant while determining how many home runs the other rate contributed (it will make more sense when I introduce the numbers).

The first thing to figure out is Bautista's predicted 2010 numbers based on his past career numbers. Given his number of plate appearances this year (currently 649; we are going to assume he will get to 675 by the end of the year), we can predict his home runs, FB%, and HR/FB. His career FB% (before 2010) is 42.8% - this means that 42.8% of the balls he puts in play are fly balls (as opposed to ground balls or line drives). It is important not to misinterpret this number as the % of PA that are fly balls - that would grossly inflate his predicted home runs. His career HR/FB rate (again, before 2010) is 10.4%, which is very similar to the league average of 10.6%. So if he were to have 675 PA this year with his career rates, he would hit 19.54 home runs.
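
To make the warning about FB% concrete, here is a minimal Python sketch. It assumes the simple model home runs = balls in play x FB% x HR/FB; the post does not state how many balls in play it used, so the sketch only back-solves the count implied by the 19.54 figure. The function and variable names are illustrative, not from the original analysis.

```python
# Figures quoted in the post (pre-2010 career rates, projected 2010 playing time).
PLATE_APPEARANCES = 675
CAREER_FB_RATE = 0.428      # fly balls per ball in play
CAREER_HR_PER_FB = 0.104    # home runs per fly ball
POST_PROJECTION = 19.54     # home runs the post projects at career rates


def expected_home_runs(balls_in_play: float, fb_rate: float, hr_per_fb: float) -> float:
    """Assumed model: balls in play x share that are fly balls x share of fly balls that are home runs."""
    return balls_in_play * fb_rate * hr_per_fb


# Misreading FB% as a share of plate appearances inflates the projection.
naive_projection = expected_home_runs(PLATE_APPEARANCES, CAREER_FB_RATE, CAREER_HR_PER_FB)

# Back-solve the balls-in-play count implied by the post's 19.54 projection.
implied_balls_in_play = POST_PROJECTION / (CAREER_FB_RATE * CAREER_HR_PER_FB)

print(f"projection if FB% is applied to all PA: {naive_projection:.2f}")
print(f"balls in play implied by the post's figure: {implied_balls_in_play:.0f}")
```

Applying the rates to every plate appearance overshoots the post's projection by roughly half, which is the inflation the paragraph above warns about.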

The next thing to calculate is his predicted 2010 numbers using this year's splits (we could almost use his numbers right now, but there are 8 games left). With 675 PA, a FB% of 54.8%, and a HR/FB of 21.8%, he is predicted to hit 54.08 home runs. In 2010 he has increased his FB% by 12 percentage points (from 42.8% to 54.8%) and his HR/FB rate by 11.4 percentage points (from 10.4% to 21.8%). These two increases result in an increase of 34.54 home runs this year over his career-average projection. The question I want to answer is what percentage of that increase is due to skill, and what percentage is due to "luck"?

To determine how much is due to Bautista's swing adjustment, we want to hold his HR/FB rate constant (keep it at his career average) and set his FB% to 54.8%, his rate for this year. This will show the effect of Bautista increasing his fly ball rate, which is a "skill", without increasing his HR/FB rate, which is due mostly to luck. So with 675 PA, FB% of 54.8%, and HR/FB of 10.4%, Bautista would have hit 25.83 home runs this season. This means that 6.29 extra home runs (25.83-19.54) came purely from Bautista's changed swing at the plate.

To figure out how many home runs came from Bautista being "lucky", we are going to hold his FB% constant at a career average of 42.8%, and increase his HR/FB rate to 21.8%, his rate this year. In 675 PA he would then hit 41.08 home runs, which means that 21.54 more home runs came from his HR/FB rate increasing, which has a lot to do with luck. However, since he has changed his swing to become much more powerful, he would have an increase in his HR/FB rate anyway, but for the purpose of this study we will credit these home runs to luck.

Finally, as you may have figured out from the math, there are a couple of home runs which are unaccounted for. If we take the base of 19.54 home runs, and add the increases of 6.29 and 21.54, Bautista would have hit 47.37. But we previously stated that his projection is 54.08 home runs, so we are missing 6.71 home runs. These home runs are found through the "interaction" term, which is when both Bautista's FB% and HR/FB increase. We can safely credit these to "skill", as I believe that his HR/FB rate has increased because of the change in his swing.
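
The bookkeeping in the last few paragraphs can be sketched in a few lines of Python. This uses only the four projected totals quoted in this post (career rates, new FB% only, new HR/FB only, and both); the function name and dictionary keys are illustrative assumptions.

```python
def decompose(base: float, fb_only: float, hr_fb_only: float, both: float) -> dict:
    """Split the gain from `base` to `both` into two single-rate effects and an interaction."""
    fb_effect = fb_only - base            # raise FB% only, hold HR/FB at the career rate
    hr_fb_effect = hr_fb_only - base      # raise HR/FB only, hold FB% at the career rate
    interaction = both - base - fb_effect - hr_fb_effect  # what is left when both rise together
    return {
        "base": base,
        "fb_effect": fb_effect,
        "hr_fb_effect": hr_fb_effect,
        "interaction": interaction,
    }


# Projected home run totals quoted in the post for 675 PA.
parts = decompose(base=19.54, fb_only=25.83, hr_fb_only=41.08, both=54.08)

# The post credits the base, the FB% effect, and the interaction to skill.
skill = parts["base"] + parts["fb_effect"] + parts["interaction"]
luck = parts["hr_fb_effect"]

for name, value in parts.items():
    print(f"{name}: {value:.2f}")
print(f"skill: {skill:.2f}, luck: {luck:.2f}")
```

Because home runs are modelled as a product of the two rates, raising both at once yields more than the sum of the two separate effects, and that surplus is the interaction term.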

What this all means is that overall, Bautista has hit at least 32.54 home runs this season due to his skill and new swing mechanics, while at most he has hit 21.54 home runs due to luck, although that number is probably much lower. So what he is doing this year should not be a fluke. Even if his HR/FB rate drops all the way back down to his career average of 10.4% next year, the model above still gives him about 26 home runs, and he should reach 30 or more if he keeps any of the HR/FB gain. The large majority of his home runs this year have indeed come because of his adjustments at the plate, and possibly because of other less tangible factors such as improved confidence and better pitches to hit (although those could be measured in an exhaustive study).

What I would like to conclude is that: a) Bautista's season is no fluke, and he should return to the 30 or 40 home run club next season, b) I cannot say that he is not taking steroids, but I can say that they are not the reason he has hit so many home runs this season, and c) this season is going to cost the Blue Jays (or some other team) a lot of money, and I do believe that Bautista has at least a couple more good years left in him. I hope this post clears up at least a little bit of the shock and disbelief at Bautista's incredible season. It is good to know that there are ways to measure why and how he is hitting all of these home runs.

Friday, September 24, 2010

Fact of the Week VII: 2010 - Year of the Pitcher?

As has already been reported in many, many places, 2010 has been called the year of the pitcher. (You can view just some examples from ESPN, Fanhouse, and Time.) Although 1968 is known as THE year of the pitcher, when an expanded strike zone and a high mound favored pitchers, 2010 has become the year of the return of dominant pitching.

It started early on, when Ubaldo Jimenez threw the first no-hitter on April 17. Then Dallas Braden and Doc Halladay threw perfect games within three weeks of each other in May. Edwin Jackson threw a no-hitter in June, and finally Matt Garza threw yet another no-hitter in July. In all, there have been 5 no-hitters and 23 one-hitters so far in 2010. This is actually the record for most no-hitters or one-hitters in a season, passing 1988, which had 26. So we can see that 2010 has been a year filled with dominant pitching performances.

However, another amazing thing about this season is how often there have been games where both teams have pitched extremely well. This shows up in the number of 1-0 games we have seen this year: there have already been 59, which is tied for the 6th most in a single season and is the most in any season since 1976.

So we can see that there have been a very high number of extraordinary pitching performances this year. We could compare different stats such as ERA or WHIP to see how 2010 stacks up compared to different years in terms of overall pitching performance, but that is not what I wanted to figure out. I just wanted this post to show that 2010 has in fact been the year of the return of dominant pitching.

Saturday, September 18, 2010

Fact of the Week VI: Home Run Streaks (Part II)

This fact of the week is a follow-up to my post on Wednesday detailing the Blue Jays' current home run streak. Jose Bautista hit a home run in the 6th inning of tonight's win over the Red Sox, which is significant in two ways. First, it is his 48th of the season, which is a new Blue Jays single season record, passing George Bell's 47 in 1987. Second, it keeps the Blue Jays' home run streak alive, now at 18. It is still short of the 23 they posted in 2000, but it is slowly creeping closer.

I just wanted to do some quick calculations on how likely the Jays' streak is now that it is at 18 games and counting. They have now hit 228 home runs in 147 games, hitting at least one in 109 games for a per-game probability of 0.7415. For 18 straight games, the probability is 0.7415^18, which equals 0.459%. This means that they would hit home runs in 18 consecutive games once every 217.8 "sets" of 18 games, and there are 145 sets of 18 games per season. So given their home run productivity this year, they were predicted to hit home runs in 18 straight games 0.6659 times. This is interesting, considering their prediction for 16 straight games was 1.138, which shows just how hard it is to continue streaks like these for longer and longer periods of time.

Now I wanted to do a calculation based on the longest home run streak by a team ever. This list details the longest home run streaks by any team since 1920 (there would be no long streaks before then with the dead ball era anyway). The record is held by the 2002 Texas Rangers, who hit home runs in 27 consecutive games. I wanted to quickly calculate the odds of that team accomplishing such a feat, as I feel they will be quite low. The 2002 Rangers were nothing out of the ordinary, finishing 72-90, dead last in the AL West, except for one thing: they could mash home runs (their top two HR hitters were A-Rod with 57 and Rafael Palmeiro with 43....hmm). They hit 230 home runs, hitting at least one in 122 out of 162 games. So the probability of them hitting a home run in any given game was 0.753, and the probability of them hitting a home run in 27 straight games was 0.753^27, which is equal to 0.000473, or 0.0473%. This means that, on average, they would hit home runs in 27 games straight once out of every 2,114.4 sets of 27 games. Given that there are 136 "sets" of 27 games in a season, they would be expected to accomplish this feat an average of 0.0643 times that season. What this means is that even with their home run production (which is one of the top home run totals ever), the chance of them hitting home runs in 27 straight games at some point in the season was at most about 6%.
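
The 0.0643 figure is an expected count over overlapping 27-game windows, so it overstates the chance that at least one such streak happens. As a hedged cross-check, here is a small Python sketch that computes that chance directly, assuming every game is independent with the same per-game probability (122/162). The function name and the dynamic-programming approach are my own illustration, not part of the original calculation.

```python
def prob_streak_at_least(n_games: int, streak_len: int, p: float) -> float:
    """Chance of at least one run of `streak_len` straight successes in `n_games` independent games."""
    # state[j] = probability the current run is j games long and no full streak has happened yet
    state = [0.0] * streak_len
    state[0] = 1.0
    achieved = 0.0
    for _ in range(n_games):
        new_state = [0.0] * streak_len
        for run, prob in enumerate(state):
            new_state[0] += prob * (1 - p)       # a homerless game resets the run
            if run + 1 == streak_len:
                achieved += prob * p             # the streak is completed in this game
            else:
                new_state[run + 1] += prob * p   # the run grows by one game
        state = new_state
    return achieved


p_game = 122 / 162                                    # 2002 Rangers: games with at least one HR
expected_windows = (162 - 27 + 1) * p_game ** 27      # the post's "sets" calculation
exact = prob_streak_at_least(162, 27, p_game)

print(f"expected number of 27-game windows: {expected_windows:.4f}")
print(f"chance of at least one 27-game streak: {exact:.4f}")
```

Under these assumptions the direct figure lands well below the expected count, because streak windows overlap and tend to arrive in clusters.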

So we will see how long the Jays can continue their current streak. I predicted on Wednesday that the streak would end sometime this weekend, and I still believe that will probably be the case. It was a great night for the Jays, with the win over Boston, the continuation of the streak, and Bautista breaking the single season Jays record for home runs.

Wednesday, September 15, 2010

Home Run Streaks

Last night, the Jays hit a home run in their 16th consecutive game, which seems to be a fairly impressive streak. I wanted to find out whether it is in fact impressive, and just how difficult it is to do.

First of all, this streak is now the second longest HR streak by the Jays in club history. It is also the second longest HR streak in MLB so far this year. The 16 games in a row is only surpassed by the 23 games in a row in 2000. So while this isn't exactly uncharted territory, they do have a good streak going. What makes this streak really interesting is that, out of all the Jays' HR streaks of at least 13 games, it is the only one where they have a losing record (they are currently 5-11 in the streak; the next worst is when they went 7-7 in 14 games in 1996). Also interesting is that they have only scored 73 runs in the 16 games, which works out to 4.56 runs per game, the lowest of the ten times they have hit home runs in at least 13 straight games.

Another interesting fact is that while they have hit 31 home runs in the 16 games (1.94/game, as opposed to 1.51 HR/game the rest of the season), they are scoring fewer runs per game than they did before the streak (they averaged 4.65 runs/game in their first 129 games and are averaging 4.56 runs/game in the last 16). So while they are hitting more home runs, the runs produced by those home runs seem to be just about the only runs they are scoring.

The last thing I wanted to do was figure out how difficult it is to hit home runs in 16 straight games. The Jays have hit 226 home runs so far this year, and have hit home runs in 107 of the 145 games they have played (here is a summary of every home run they have hit so far if you are so interested). So the probability of them hitting a home run in any given game is 0.738, or 73.8%. That means that the probability of them hitting a home run in n straight games (treating each game as independent) is simply 0.738^n, as the probability of them hitting a home run in two games is 0.738*0.738, in three games 0.738*0.738*0.738, and so on up to n. So the probability of them hitting a home run in 16 straight games is 0.738^16, which is equal to 0.774%. What this means is that out of 1000 "sets" of 16 games, the Blue Jays would hit a home run in each game 7.74 times, or one out of every 129.2 sets. Considering that there are 147 sets of 16 games in each season (games 1-16, 2-17, 3-18, ..., 146-161, 147-162), we can see that this should happen about 1.138 times this season.
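
The arithmetic above can be reproduced with a short Python sketch. It assumes, as the calculation does, that each game is independent and that the per-game rate stays fixed; the function name and return layout are illustrative.

```python
def streak_summary(games_with_hr: int, games_played: int, streak_len: int,
                   season_games: int = 162) -> dict:
    """Per-game HR probability, streak probability, window count, and expected streak windows."""
    p_game = games_with_hr / games_played        # share of games with at least one home run
    p_streak = p_game ** streak_len              # assumes independent games
    windows = season_games - streak_len + 1      # games 1-16, 2-17, ..., 147-162
    return {
        "p_game": p_game,
        "p_streak": p_streak,
        "one_in": 1 / p_streak,
        "windows": windows,
        "expected": windows * p_streak,
    }


# 2010 Blue Jays through 145 games: a home run in 107 of them, streak of 16.
jays = streak_summary(games_with_hr=107, games_played=145, streak_len=16)

print(f"per-game probability: {jays['p_game']:.3f}")
print(f"16-game streak probability: {jays['p_streak']:.3%}")
print(f"windows in a season: {jays['windows']}")
print(f"expected streak windows: {jays['expected']:.3f}")
```

The sketch keeps the unrounded per-game rate, so its later decimals can differ very slightly from figures computed with the rounded 0.738.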

So what we can see by looking at the math is that although this streak of home runs is impressive, it is certainly not out of the ordinary and mathematically probably should have happened at least once this season. Now, keep in mind that the Blue Jays are hitting home runs at a mind-blowing pace this year (on pace for 252.5), in fact very close to the record for most home runs by a team in a single season (the 1997 Seattle Mariners hold the record with 264 HRs). So the chance of the 2010 Blue Jays hitting home runs in 16 straight games is a lot higher than the chance of any other Blue Jays team doing so. That is why this is the second longest streak in club history. It remains to be seen how long they can continue the streak, but don't be surprised if it ends tonight or during the weekend series with the Red Sox.

Sunday, September 12, 2010

Fact of the Week V: Walk off hits

This is the Sunday edition of the fact of the week. Unfortunately I couldn't post on Friday, but this week's fact deals with walk off hits. Today, Adam Lind hit a walk off home run off of Rafael Soriano of the Rays, which got me wondering how many walk off hits the Jays have had this year.

Surprisingly, today's home run was only the third walk off hit this year for the Jays, and the first walk off home run. It was also the first walk off hit not by Aaron Hill, who has had the only two walk off singles this year, on June 5th against the Yankees and August 27th against the Tigers. Both were in extra innings (14th and 11th, respectively), so today's was also the first walk off hit in the 9th inning.

Considering that there have already been 176 walk off hits (about 6 per team) and 65 walk off home runs (about 2 per team) this year (through Saturday night), the Jays have had only about half as many of the most exciting plays in baseball as the average MLB team.

Sunday, September 5, 2010

The Cost of Winning: Individual Players

In my last post on the topic of winning and staying profitable, I found out that over the last 3 years (2007-2009), there has been quite a pattern going on for the $/WAR for players from age 22 to 32. Today, I want to use the macro example and try to fit it to a micro level, and test it on individual players.

I have picked 5 players who fit the criteria of the model. The first criterion is that the player must be born no earlier than 1981, so that we will be able to predict their next couple of seasons (we can only predict until age 32, after which a player's performance becomes erratic). The second is that the player must have played in the league for at least 3 years (so mostly born no later than 1987), so that we have some data points that we can fit a model to. Finally, the player must be under contract until at least 2013, so that we can compute the player's WAR from our estimates for $/WAR and the player's salary. We could predict a player's $/WAR for the next few years, but without a salary we would not be able to predict their WAR. The whole point of this exercise is to try to predict how a player will perform in the next few years, and in our case the performance variable is WAR, so we do need a salary. So what we are looking for are young players locked into long contracts.

The five players I have picked are as follows: Curtis Granderson, Evan Longoria, Nick Markakis, Dustin Pedroia, and Mark Reynolds. I am sure there are other players we could have evaluated, but these were the first five players that I came across that fit the criteria. I picked five because it will show varying results, but having many more than five would be too much work to do for a simple study. We will analyze them player-by-player.

What I have done for each player is found a second-order polynomial function (in other words, a quadratic model) that fits the player's $/WAR information for their years from the beginning of their career to 2010. I then extrapolated the function to the last year that they are under contract (or in Granderson's case, until he turns 33), and compared the player's individual function with the league function. The league function (which I modified slightly from my last post) is also a quadratic function, with the equation y = 37720x^2 - 144374x + 328152, where x is a number starting at 1 that represents a player's age (x = 1 corresponds to age 22, the youngest age in the model), and y is the player's $/WAR for that year. I then found the arithmetic mean of the two, and set that value as the player's predicted $/WAR. Finally, I took three predicted $/WAR values for each player per year (the league value and the player value, which give the minimum and maximum, plus their arithmetic mean) and divided the player's salary by each to get the player's predicted minimum, maximum, and average WAR for each season. If this all sounds confusing, the examples, along with the graphs and tables, should hopefully clear things up.

First up is Curtis Granderson. The graph below shows the league trendline (green), Granderson's trendline (red), and his observed $/WAR values up to 2010 and predicted $/WAR values until 2013 (blue). He has played in the league since 2006, tied with Nick Markakis for the longest of the five players, so we have a lot of data for him. His trendline has the equation y = 369618x^2 - 1441988x + 1285573, which we can see trends upward much faster than the league average. The trendline has an R^2 value of 0.98354 for Granderson's first five years in the league. Next year (2011), he has a salary of $8,250,000, and the graph predicts a $/WAR of $4,011,999.50, which means that his predicted WAR is 2.1. In 2012, his predicted WAR is 1.7 (salary of $10 million) and in 2013 it is 1.6 (salary of $13 million).
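
For readers who want to check the Granderson numbers, here is a minimal Python sketch of the procedure. Two things are assumptions on my part rather than statements from the post: that the league curve uses x = age - 21 (so age 22 is x = 1), and that 2011 is Granderson's age-30 season and sixth year in the league. With those inputs the sketch reproduces the $4,011,999.50 figure; the function names are illustrative.

```python
LEAGUE_COEFFS = (37720, -144374, 328152)           # league $/WAR trendline from the post
GRANDERSON_COEFFS = (369618, -1441988, 1285573)    # Granderson's $/WAR trendline from the post


def quadratic(coeffs: tuple, x: int) -> float:
    """Evaluate y = a*x^2 + b*x + c."""
    a, b, c = coeffs
    return a * x * x + b * x + c


def predict_war(salary: float, player_coeffs: tuple, year_in_league: int, age: int) -> dict:
    """Predicted $/WAR (mean of player and league curves) and the WAR range implied by a salary."""
    player_cost = quadratic(player_coeffs, year_in_league)
    league_cost = quadratic(LEAGUE_COEFFS, age - 21)   # assumption: x = 1 at age 22
    mean_cost = (player_cost + league_cost) / 2
    low_cost, high_cost = sorted((player_cost, league_cost))
    return {
        "dollars_per_war": mean_cost,
        "min_war": salary / high_cost,    # the pricier curve gives the lower WAR
        "avg_war": salary / mean_cost,
        "max_war": salary / low_cost,
    }


# Assumed inputs: 2011 is Granderson's sixth year in the league and his age-30 season.
granderson_2011 = predict_war(8_250_000, GRANDERSON_COEFFS, year_in_league=6, age=30)

print(f"predicted $/WAR: {granderson_2011['dollars_per_war']:,.2f}")
print(f"predicted WAR: {granderson_2011['avg_war']:.1f} "
      f"(range {granderson_2011['min_war']:.1f} to {granderson_2011['max_war']:.1f})")
```

Calling `predict_war` with the 2012 and 2013 salaries (and the year and age each advanced by one) gives the 1.7 and 1.6 WAR figures quoted above. The sketch does not guard against a trendline value at or below zero, which would make the division meaningless.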

Next up is Evan Longoria. 2010 is only his third year in the majors, so his data is relatively limited. His trendline, with the equation y = 44144x^2 - 180679x + 268113, has an R^2 value of exactly 1.00, but that is only because it is based on 3 data points (a quadratic can always pass exactly through three points). Longoria's graph is interesting because it is the only one where the league trendline and the player trendline actually converge. This means that as we predict further into the future, the gap between the minimum and maximum predictions narrows. One of the reasons for this is that Longoria has a very reasonable contract, which means that his $/WAR trendline will not climb steeply until far in the future. Another reason is that Longoria's production has so far been very close to the league average (in terms of $/WAR), so the two trendlines are very similar.

Next is Nick Markakis. Like Granderson, 2010 is his fifth year in the majors, so we have more data points leading to more accurate results. His trendline has the equation y = 295244x^2 - 1131898x + 1000953, and it has an R^2 value of 0.98564. One interesting point is that Markakis' $/WAR stays almost constant from this year to next, and then increases. This is because so far Markakis has outproduced the league average, and when we are predicting, we take into account the league average. So next year the league brings down his predicted $/WAR.

Next-to-last is Dustin Pedroia. Pedroia has played in MLB since 2007, so we have a decent amount of data on him. His trendline y = 114641x^2 - 328223x + 297908 has an R^2 value of 0.99907. Pedroia's $/WAR has seen a fairly steady increase over his first four years, and the increase between 2010 and 2011 is only slightly smaller before the steady rise resumes. This means that we can be fairly confident that he will continue to have a slightly higher $/WAR every year (avoiding injury of course).

Finally, the last player I looked at is Mark Reynolds. Reynolds is an unusual case, as his career has been very up and down, starting with a 2.0 WAR in his rookie season (2007), down to 0.9, back up to 2.2, and finally this year his WAR will end around 1.5. So fittingly, his trendline y = 28239x^2 - 56377x + 273693 has an R^2 value of only 0.38815, which is by far the weakest fit of the five players. However, the trendline is predicting a big jump from 1.5 WAR this year to 6.6 next year. We will see if Reynolds can bounce back and become a consistently great player.

Finally, I have put together a table that summarizes each player's minimum, maximum, and average predicted WAR each year until 2014. The first number in each year is the minimum WAR, which means the player should at least reach this value for that year. The second number is the maximum WAR, which is the most the player should be expected to reach during that year. Finally, the third number is the average WAR, which is based on the arithmetic mean of the league average $/WAR and the player's $/WAR. One quick note on the table: Granderson turns 33 before the 2014 season, and Mark Reynolds' contract only runs to 2013, so we cannot predict the 2014 values for those two players.

One interesting thing to look at in the table is the range of each player's predicted WAR (the range is the difference between the minimum and maximum values). Some entries, like Nick Markakis next year, have a large range between the minimum and maximum (a range of 10.4). Others, like Evan Longoria in 2014, have no range, as the minimum and maximum values are the same. We always need to keep in mind the R^2 value of each player's trendline, as a small range (suggesting a more precise prediction) doesn't mean anything if the trendline does not fit the player's data well.

So we started with an idea: the way to run a successful MLB team is to have players that are good at helping the team win, while also being cost-effective. We figured out the statistic $/WAR, and found that there is a strong correlation between age and $/WAR. From there, we used the league trendline to try to predict individual player performances for the next few years. We will have to wait and see how correct these predictions are; hopefully they are at least moderately right.

I am going to do one more post on $/WAR and some other observations, mostly about the correlation between different teams' $/WAR and how successful they have been in the past few years.

Friday, September 3, 2010

Fact of the Week IV: Strikeouts

Tonight was Brandon Morrow's last start of 2010, as the Jays are capping his innings around 150 for the year to make sure his arm is in great shape for next year. Although Morrow had a rough start to the year (his ERA ballooned to 6.69 at one point), he has pitched extremely well since June and has had a remarkable season, ending with a 10-7 record and a 4.49 ERA.

However, the thing that Brandon Morrow does best is strike batters out. He struck out 17 Rays in one game, and ended the year with 178 Ks. That puts him 11th all-time for Blue Jays pitchers for strikeouts in a season (tonight's stats are not included). You may notice that on that list, he also has fewer innings pitched than most pitchers, so we can look at his SO/9 IP, which is the average number of strikeouts a pitcher records per 9 innings (a complete game).

Morrow ended the year with 178 Ks in 146.1 IP (that is, 146 and one-third innings), which means that his SO/9 IP was an astounding 10.95. This ranks as the second highest season SO/9 IP ever recorded by a Blue Jays pitcher with at least 100 IP (Duane Ward had a SO/9 IP of 11.07 in 1991). If we look purely at starting pitchers, Morrow is only the third pitcher (after Roger Clemens and A.J. Burnett) to post a SO/9 IP of greater than 9 (meaning more than one strikeout per inning), and has the highest SO/9 IP of any Blue Jays starter ever.
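
As a quick check on the 10.95 figure, here is a small Python sketch of the SO/9 IP calculation. It assumes the usual box-score convention that the digit after the decimal point in innings pitched counts outs (thirds of an inning), so 146.1 means 146 and one-third innings; the function names are illustrative.

```python
def innings_to_float(ip: str) -> float:
    """Convert box-score innings such as '146.1' (146 innings plus 1 out) to a real number."""
    whole, _, outs = ip.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


def strikeouts_per_nine(strikeouts: int, ip: str) -> float:
    """Average strikeouts per nine innings pitched."""
    return 9 * strikeouts / innings_to_float(ip)


morrow_k9 = strikeouts_per_nine(178, "146.1")
print(f"Morrow 2010 SO/9 IP: {morrow_k9:.2f}")
```

Treating 146.1 as a plain decimal instead of 146 and one-third would nudge the rate slightly higher, which is why the conversion step is kept separate.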

So we can see that although Morrow didn't have the greatest season ever, it was certainly a step in the right direction. He is well set up for next year (as this previous post details) with a fresh arm, confidence gained from this year, and the ability to strike out almost any hitter in the American League on any given night.