Monday, August 30, 2010

The Cost of Winning: Age and $/WAR

In my post yesterday, I talked about how baseball teams can become successful on the field and also turn a profit. I said that the statistic $/WAR is very useful in measuring how cost-effective a player is during a season, by measuring how much it cost to pay the player per extra win he generated for the team. Today, I am going to show just a few of the results I have obtained from looking at the past three years (2007-2009), and some of the amazing (really amazing) results that can be found through the simple statistic $/WAR.

The post today is going to focus on $/WAR at different ages. I will do upcoming posts on $/WAR by teams, by individual players, and possibly by positions. Age is a huge factor in many baseball decisions, as it has been proven that a player's peak is somewhere around 28 or 29, and once a player hits a certain age, their skill begins to decline. Many decisions must be made on whether a team should keep around an older player, especially if his skill set has already shown signs of dropping, and if they should keep around younger players who have yet to hit their peak. That's why I believe that this post is so important; as we will find out soon, it could unlock the secret to developing successful baseball teams that are also cost-effective.

To figure out how $/WAR differs across ages, I first had to collect the data. I looked at all players with a WAR greater than or equal to 1.0 for each year, as I wanted to view those players that had a significant impact on their team for that year (also, a player with a negative WAR will probably not be around a major league team for very long!). I summed the WAR and salaries for all players at each age, then divided total salary by total WAR to get the $/WAR for each age. I then graphed the results and added a trendline to determine how much of a relationship there is between age and $/WAR. You can see the graphs below.
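The aggregation described above can be sketched in a few lines of Python. This is only an illustration: the player rows are made up and are not taken from the data set, and the function name is my own.

```python
from collections import defaultdict

# Hypothetical sample rows: (player, age, WAR, salary in dollars).
# These numbers are invented for illustration, not taken from the study.
players = [
    ("Player A", 24, 3.0, 450_000),
    ("Player B", 24, 1.5, 1_200_000),
    ("Player C", 31, 2.0, 9_000_000),
    ("Player D", 31, 0.4, 5_000_000),   # excluded: WAR below 1.0
    ("Player E", 35, 1.0, 12_000_000),
]

def dollars_per_war_by_age(rows, min_war=1.0):
    """Sum salary and WAR per age for qualifying players, then divide."""
    war_total = defaultdict(float)
    salary_total = defaultdict(float)
    for _name, age, war, salary in rows:
        if war >= min_war:          # keep only players with WAR >= 1.0
            war_total[age] += war
            salary_total[age] += salary
    return {age: salary_total[age] / war_total[age] for age in sorted(war_total)}

by_age = dollars_per_war_by_age(players)
for age, cost in by_age.items():
    print(f"age {age}: ${cost:,.0f} per WAR")
```

Note that this divides the summed salaries by the summed WAR at each age, rather than averaging each player's individual $/WAR, which matches the description above.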

This first graph is from 2007. In this graph, the R² value (the coefficient of determination, which put simply shows the strength of the relationship between the two variables, and more mathematically is the proportion of the variance in variable Y ($/WAR) that can be explained by variable X (age)) is 0.87052, which shows a pretty strong correlation between age and $/WAR. The ages range from 22 to 42 (without 41, as no qualifying players were 41 years old). Age 21 is left out because only one player (Daric Barton of the A's) was 21, and that small sample size skewed the results. We can see that for the first six or so years the trendline is very accurate, before the $/WAR jumps higher than the line and then fluctuates from age 33 on.
This next graph is from 2008, and the R² value is a little lower, at 0.86708. This still means there is a strong correlation between age and $/WAR. The ages range from 22 to 37, as there was only one player (Cameron Maybin, 21) younger than 22 and three players (Jim Edmonds, 38, Mark Grudzielanek, 38, and Jeff Kent, 40) older than 37. It is a very similar graph to the first graph, just with a little less variance in the later years.
The final graph is from 2009. The R² value is at its highest, at 0.94014, which shows an extremely strong correlation between age and $/WAR. Other than a peak between ages 31-33, the trendline is pretty accurate with the data. One other thing to note is that all of the trendlines I have used so far have been "power" regression lines, with the formula Y = ax^b, where a and b are constants. I am using the power regression line (as opposed to a linear, logarithmic, polynomial or exponential model) because the increase in $/WAR is only small to begin with, before growing bigger and bigger as the player gets older (for example, the increase in $/WAR from 22-23 will be much smaller than the increase in $/WAR from 34-35). An exponential model is also a good fit, but the other models do not work very well, and so far the power model has been the best for each graph.
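For anyone who wants to reproduce a power trendline outside of a spreadsheet, here is a minimal sketch. It assumes the usual approach of fitting a straight line to (ln x, ln y) by least squares, and it reports R² on the log-transformed values; a spreadsheet's trendline may not compute things in exactly the same way. The data points are hypothetical and follow a perfect power law on purpose, so the fit is easy to check.

```python
import math

def fit_power(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (ln x, ln y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                   # slope in log-log space is the exponent
    a = math.exp(my - b * mx)       # intercept in log-log space is ln(a)
    return a, b

def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

# Hypothetical data that follows y = 0.5 * age**4 exactly (not real $/WAR).
ages = [22, 25, 28, 31, 34]
costs = [0.5 * age ** 4 for age in ages]

a, b = fit_power(ages, costs)
log_actual = [math.log(y) for y in costs]
log_fitted = [math.log(a * x ** b) for x in ages]
r2 = r_squared(log_actual, log_fitted)
print(f"a = {a:.3f}, b = {b:.3f}, R^2 = {r2:.5f}")
```

With real $/WAR figures the points will scatter around the curve, and R² will drop below 1 accordingly.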

What all of these graphs have shown is that as a player gets older, his $/WAR continues to increase, which means he is less cost-effective. This agrees with all previous evidence that shows that a player's skill set tends to decrease as he ages while he makes a higher and higher salary per year. Now that I have looked at each year individually, I want to look at the aggregate, with all three years combined.

The following table summarizes the WAR, salary, and $/WAR for each age for all of the players in the three years:
When we graph the data in the table, we get this graph:
The R² value is 0.93809, which again shows a very strong correlation between age and $/WAR. Again, the largest fluctuations occur in the latter stages of a player's career, when he could be either on a long contract (financially secure) or on a series of one-year deals (playing for a job). However, the most interesting part of this graph is between the ages of 22 and 32. The points look like a perfect match for an exponential model, so let's see what happens if we simply graph from age 22 to 32 with an exponential trendline.
Wow! An R² value of 0.99019, which means that 99% of the variance in $/WAR can be explained by age. This is about as strong of a correlation as you can possibly get! What this means is this: we can predict, with 99% accuracy (on an aggregate level), exactly what a player's $/WAR will be from ages 22-32. If this works (which it should!), teams will be able to predict a player's value to the team and pay them accordingly. This is critical to becoming a good team on the field and also making a profit. Teams that overpay for talent may be successful, but they will not be profitable (unless they are the Yankees).

What we have found here is that players tend to follow a very exact pattern in their cost-effectiveness during the first 10 or so years of their career. We can use this information to predict how young players will perform over the next few years.

In my next post later this week, I am going to explore these results with some young players and try to predict how they will perform over the next few years. It will be interesting to see if this does in fact work on a micro level and not just a macro level.

Sunday, August 29, 2010

The Cost of Winning: Staying Successful and Profitable in Baseball

The key to running a successful major league baseball team is essentially maximizing the performance on the field (e.g. the more wins the more success the franchise has) while also maximizing the profit of the organization. There are really two ways to achieve financial success: either spend the minimum on players and staff, and gain money even though the team is terrible (see the Pittsburgh Pirates), or spend a ton of money, become a very successful team, and reap the benefits on the balance sheet (see the New York Yankees; even though that story is six years old, the balance sheet still shows the huge difference between the two franchises' finances).

The best case scenario, however, is to draft well, develop your young players, and have them blossom into stars while you still only have to pay them near the MLB minimum salary. The Tampa Bay Rays are a good recent example of this, as they drafted and developed young stars such as Carl Crawford, Evan Longoria, James Shields (not quite a star but still performed very well in the 2008 playoffs), and Matt Garza. They won the AL East in 2008, and made it to the World Series before losing to the Phillies, while still turning a profit (a lot of that due to the current revenue sharing agreement in the MLB). However, the one problem with this is that eventually, your young stars will need to be paid a lot of money. The Rays will run into this problem this summer, when Carl Crawford and Carlos Pena, both free agents, will be due hefty raises that the Rays probably won't be able to afford. So how do teams sustain both success on the field and on the balance sheet for multiple years?

The real key to success is to find players that cost only a small amount (relative to their peers) while putting up good numbers on the field. A good way to measure the performance of a player over a season or career is by Wins Above Replacement (WAR), which measures how much better the player is compared to a AAA-level replacement player. It is very convenient to use just this one stat to compare how good each player is to another. You can view the 2009 MLB WAR leaders here. The highest WAR per season is usually under 10 (Albert Pujols led with 9.6 in 2008 and 9.2 in 2009). This means that Pujols is "worth" approximately 9-10 wins for the Cardinals, because if they did not have him playing first (and only had a AAA replacement), they would lose 9-10 more games per year.

While WAR measures the performance of the player, teams also need to keep in mind how cost-effective the player is. For example, is it worth spending around $30 million on Alex Rodriguez, who is only worth about 4-5 extra wins per year, when you can spend that money on three or four above average players who will each be worth 2-3 (or more) wins per year? If you are the Yankees, the answer is yes, but for most teams, A-Rod is not a viable option (and that's just in terms of money, we haven't even got into the chemistry and happiness of the team when he's around!). So teams need to also view the salary of the player, and when combined with WAR, a very useful statistic for measuring success is $/WAR. This statistic measures how much money it costs for an extra win per player. For example, last year Pujols earned $13,870,949 in salary while being worth an extra 9.6 wins. So the Cardinals paid $1,444,890.52 per extra win that Albert Pujols was worth. The key to being successful is to have players with a low $/WAR, which means that it costs a low amount of money (again, relative to their peers) for each extra win that the player is worth.
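The Pujols arithmetic above is easy to check. Here is a small sketch; the function name is my own, and the guard against a non-positive WAR is an added assumption, since the ratio stops making sense there.

```python
def dollars_per_war(salary, war):
    """Cost in dollars of each win above replacement."""
    if war <= 0:
        # Assumption: a zero or negative WAR has no meaningful cost per win.
        raise ValueError("$/WAR is only meaningful for a positive WAR")
    return salary / war

# Figures quoted in the paragraph above.
pujols = dollars_per_war(13_870_949, 9.6)
print(f"${pujols:,.2f} per win")   # $1,444,890.52 per win
```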

With the $/WAR statistic in hand, in the next few days I am going to explore the conclusions that can be drawn from analyzing the past couple of years with $/WAR, and post the results. What the study should show is that the successful teams will either have a high WAR regardless of cost (e.g. the Yankees and Red Sox, who spend a lot of money to make money), or a low $/WAR, which most teams need to have in order to be successful on the field while making a profit. I am also going to explore how to predict a player's $/WAR, which, if successful, will be the key to running a successful team. If you can predict a player's cost-effectiveness, then you will be able to obtain successful players at lower costs, thus exploiting the market for baseball players and making your franchise successful on the field and on the balance sheet.

Friday, August 27, 2010

Fact of the Week III: Home Run, Double, and Stolen Base in the Same Game

Unfortunately I moved back to college this week, so I have not been able to write any posts since Saturday. Hopefully I can get a few more off this weekend, but since it's Friday, here's your fact of the week.

In honor of Jose Bautista tonight, who currently has a home run, double, two walks, and a stolen base, here are all of the Blue Jays players with a HR, 2B, and SB in the same game. It has only been done 38 times (Bautista's game tonight has yet to be included) in Blue Jays history, the most recent time being Alex Rios in July 2008.

What makes this trifecta so difficult to accomplish is that a player cannot steal a base after a home run or triple (well, only home, which very rarely happens), and only rarely will steal third base after a double. Almost all of the players on the list either have a single accompanying the 2B and HR, or at least one walk, which is when they probably got their stolen base. Tonight, Bautista walked in the first inning and then stole second, and only later hit his double and home run.

Saturday, August 21, 2010

The Art of Clutch: Part II

Earlier this week I posted what I think is a fairly accurate method of calculating how clutch a player is. Today I am going to post some of the results I obtained from the formula and player statistics for the last three years (2007-2009). I will be analyzing this year's statistics once the season ends.

The requirement for inclusion in the study is that the hitter must qualify for the batting title (502 PAs). Each year around 150 players qualified, for an overall sample of 464 hitters (that is, 464 seasons' worth of hitting; some hitters qualified twice and some even three times). As I mentioned in the last post, out of the 464 results, there was only one negative number and four numbers greater than 2 (none greater than 2.2). This means that but for one outlier, every player drove in more runs per plate appearance in the clutch situation (2 outs and RISP) than in all other situations.

So enough with the explanations, here are the results. The graph for each year follows the best and worst for each year. (Note: the graphs don't show every player, just every third or fourth player. The blue points represent the players' clutch factors and the red points represent the players' player clutch).

In 2007, there were 162 players who qualified for the batting title (and thus this study), and the league average "player clutch" was 0.210 (the average player drove in approximately .210 RBIs more in clutch situations per PA than all situations). As a reminder, each player's individual player clutch was then divided by the league clutch (.210) to figure out the clutch factor for each player. The clutch factor represents how a player performed in the clutch compared to the league. A ClF of 1.0 means that the player was just average in his increase in performance. A number of 2 means that the player increased his performance twice as much as the league average. Finally, a number of .250 means that a player was 4 times worse than the league average in increasing his performance (but still did increase his performance, only numbers less than 0 mean the players performed worse in the clutch).

The most clutch player (MCP) in 2007 was Jim Thome of the White Sox, by just a hair over Brad Hawpe of the Rockies. Thome had a clutch factor of 2.185, due to his crazy clutch RBI/PA of .61 (22 RBIs in 42 PAs with 6 IBBs). He is actually the "King of Clutch", with the highest ClF out of any player in the three years. Hawpe finished with a clutch factor of 2.184, with a clutch RBI/PA of .59 (49 RBIs, 89 PAs, 6 IBBs). Other top finishers were Ichiro (2.016 ClF), Ryan Howard (1.983 ClF), and Pudge Rodriguez (1.879 ClF). Interestingly enough, both in 2007 and 2008 Howard had exactly 111 PAs in clutch situations, which was the most PA by any player in the three year period (and his 52 clutch RBIs in 2007 rank as the most by any player in the three years).  

On the other side of the coin, in 2007 the least clutch player (LCP) was Dan Uggla of the Marlins. He had 88 RBIs in 728 PAs (with no IBBs) in the season, and only 12 RBIs in 93 PAs in the clutch. This led to a player clutch of only 0.009 (he had an RBI/PA of 0.12088 for the season and 0.12903 for clutch situations; subtracting his non-clutch rate of 76 RBIs in 635 PAs, about 0.1197, from the clutch rate gives the 0.009), and a clutch factor of 0.044. The second worst player was Rickie Weeks of the Brewers, with a ClF of 0.062, as he only had 5 RBIs in 63 PAs in the clutch. Other LCPs were Jason Varitek (0.233 ClF), Jeff Kent (0.275 ClF), Michael Cuddyer (0.280 ClF), and Vernon Wells of the Jays with a ClF of just 0.299.
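As a sanity check, here is a rough Python sketch of the calculation using Uggla's numbers. It assumes my reading of the method from Part I: clutch RBI per PA (with intentional walks removed) minus the same rate over all non-clutch plate appearances, divided by the league figure. The function names are my own, and the league value of 0.210 is the rounded one quoted above, so the clutch factor only lands close to the reported 0.044 rather than matching it digit for digit.

```python
def rbi_rate(rbi, pa, ibb=0):
    """RBIs per plate appearance, with intentional walks removed."""
    return rbi / (pa - ibb)

def player_clutch(rbi, pa, ibb, c_rbi, c_pa, c_ibb):
    """Clutch RBI rate minus the RBI rate in all non-clutch situations.

    Assumption: non-clutch totals are the season totals minus the
    clutch totals, for RBIs, PAs, and IBBs alike.
    """
    clutch_rate = rbi_rate(c_rbi, c_pa, c_ibb)
    other_rate = rbi_rate(rbi - c_rbi, pa - c_pa, ibb - c_ibb)
    return clutch_rate - other_rate

# Dan Uggla, 2007: 88 RBIs in 728 PAs overall, 12 RBIs in 93 clutch PAs.
uggla_pc = player_clutch(rbi=88, pa=728, ibb=0, c_rbi=12, c_pa=93, c_ibb=0)

league_pc_2007 = 0.210              # rounded league average quoted above
uggla_clf = uggla_pc / league_pc_2007

print(f"player clutch: {uggla_pc:.4f}")
print(f"clutch factor: {uggla_clf:.4f}")
```

The same two functions can be pointed at the league totals to get the league average, since that is described as the same calculation with league-wide numbers.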


In 2008, there were 147 players that qualified for the batting title, and the league average player clutch was 0.201. The MCP was Josh Hamilton of the Rangers, with a clutch factor of 1.916. He had 130 RBIs in 704 PAs (with 9 IBBs) during the season, but really stepped up his game in the clutch, driving in 44 runs in 90 PAs (with 6 IBBs). Other MCPs were Alexei Ramirez (1.901 ClF), Johnny Damon (1.884 ClF), and Grady Sizemore (1.853 ClF).

The LCP during 2008 was Milton Bradley of the Rangers, with an astonishing clutch factor of -0.112. Yes, Milton Bradley was the only player in the three years with a negative ClF, meaning that he was actually worse when the pressure increased. He drove in 77 runs in 509 PAs (13 IBBs) during the season, but only had 8 RBIs in 65 PAs (6 IBBs) in clutch situations. Thus, Milton Bradley has been crowned the "King of Choke", with a performance that no other player even came close to. Other LCPs included Paul Konerko (0.043 ClF) and Adrian Beltre (0.087 ClF).


Finally, last year there were 155 players who qualified for the batting title, with an average player clutch of .213 (the highest league average of the three years). The MCP of 2009 was Hanley Ramirez of the Marlins, with a clutch factor of 2.120 (the third highest total). He had 106 RBIs in 652 PAs (13 IBBs), but had 34 RBIs in 65 PAs (6 IBBs) in the clutch. Other MCPs include Joey Votto (1.796 ClF), J.D. Drew (1.721 ClF), and Edgar Renteria (1.661 ClF).

Lastly, we find that Bengie Molina was the LCP of 2009. He had a clutch factor of 0.137, with only 14 clutch RBIs in 78 clutch PAs. Other LCPs include Troy Tulowitzki (.260 ClF), Mark Teahen (.303 ClF), and Jason Kendall (.306 ClF).



These are the results I have obtained for the past three years using my clutch statistic. It is very interesting to see which players stepped up their game in the clutch, while other players (Milton Bradley!) performed only marginally better, or sometimes even worse. I am going to do another post later on with some interesting patterns and phenomena that we can see in the data, most importantly, whether or not the statistic shows that players can perform consistently in the clutch.

Friday, August 20, 2010

Fact of the Week II: Home Runs

We all know that Jose Bautista has been having a great season. He is currently leading the league in home runs with 37, 6 more than any other player. But we probably didn't realize just how great of a season it has been in comparison to all other Blue Jays seasons.

Only eight times has a player for the Blue Jays hit at least 40 home runs, and only six players have done it (Carlos Delgado did it three times, in 1999, 2000, and 2003). Bautista already has 37 home runs, and is on pace for 50, which would be the first time a Blue Jays batter has hit 50 home runs in a season (the highest single season total is 47 by George Bell in 1987). Bautista should at least hit 48 home runs to break the single season record.

Pretty good, right? It actually gets better. Bautista is not only hitting a lot of home runs, he is actually getting on base a ton too (he is currently tied with Daric Barton for the AL lead in walks with 73). Bautista is already only the third player in franchise history with at least 37 home runs and an on-base percentage of at least .370. If he can keep his OBP at .370 and hit just eight more home runs, he will have the most home runs by a Jays player with an OBP of at least .370.

So we can see that this season has been a very special one for Jose Bautista and the Blue Jays. Although he has not hit for a high average, he has excelled in a few key areas, such as hitting home runs, walking, and also throwing out baserunners (he is second in the MLB this year with 10 assists, only one behind Shin-Soo Choo). Hopefully he can play this well the rest of the season and for the next couple of years.

Thursday, August 19, 2010

The Art of Clutch: Part I

In today's post I am going to discuss something that has been debated by baseball fans and statisticians for many years: whether or not a certain player is "clutch". Certainly, there are some players out there (Derek Jeter anyone?) with the label of being a "clutch" player, but so far there have not been any studies done that prove that some players are more "clutch" than others. Today I am going to attempt to figure out an appropriate statistic that will measure how clutch a player is, and then later this week I will present my results and any conclusions that can be drawn from the findings.

*Note: I am only measuring batting statistics, a study on pitchers being clutch might come later

The first thing I wanted to determine was an appropriate statistical equation to measure how "clutch" a player is. First of all, what exactly is clutch? It is the ability to score runs in clutch situations, those with high leverage. The clutch situation I am going to be measuring is with runners in scoring position and two outs. There are some, shall I say, more statistically accurate measures of clutch, but the situation of 2 outs and RISP happens multiple times per game, giving us a large enough sample size to be able to draw reasonable conclusions. So the first thing I want to measure is the RBIs per plate appearance for hitters with 2 outs and RISP. One adjustment to make is to eliminate intentional walks from the equation, as when hitters are intentionally walked they are not given the opportunity to drive in runs, so they should not be penalized. I am also going to put a C in front of each statistic that is measured from clutch situations to denote that it is a "clutch" statistic. So the first part of the equation is going to be C RBI divided by (C PA minus C IBB).

Now, we need to expand upon our definition of clutch a little bit. Being clutch not only means driving in runs, which varies widely with the hitter; it means that the hitter raises his game as the pressure of the game rises. This means that a hitter who is clutch should have a higher RBI/PA when the game is on the line as opposed to his overall RBI/PA. We can measure this by taking the RBI/PA for clutch situations and subtracting the RBI/PA for all other situations. This number will represent the increase in RBI/PA that a player will have in clutch situations over regular situations. The formula is a little convoluted, but all pieces are necessary. We can see that the first part is just the piece from above, and the second part is figuring out the RBI/PA for all non-clutch situations.
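Written out from the description above, the player clutch statistic would look roughly like this. The exact treatment of intentional walks in the second term is my assumption; it removes them from the non-clutch plate appearances the same way the first term removes them from the clutch ones.

```latex
\text{Player Clutch} =
  \frac{C\,RBI}{C\,PA - C\,IBB}
  \;-\;
  \frac{RBI - C\,RBI}{(PA - IBB) - (C\,PA - C\,IBB)}
```

This reading is consistent with the Dan Uggla figures in Part II, where 12/93 minus 76/635 comes out to about 0.009.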


The final step for our equation is to normalize the statistic onto a scale of one. To do this, we are going to divide the individual player's clutch statistic by the league average statistic, which is just the same calculations with the league totals for all eligible players used as the numbers in the formula.


What the resulting number will express is the average increase in "clutch" for each player, relative to league average. A number of one means that the player is average in clutch situations, that is, the player drives in an average number of extra runs per plate appearance in clutch situations. A number less than one means the player is a below average clutch player, and a number greater than one means the player is an above average clutch player. The resulting numbers will almost always be between 0 and 2; a number less than 0 means the player is downright miserable in clutch situations, and a number greater than 2 means the player is absolutely amazing in clutch situations.

So this is the process to determine how clutch a player is. It will be interesting to view the results, and determine whether or not a player can consistently maintain how clutch they are from season to season.

Monday, August 16, 2010

Park Factor at SkyDome

In a post last week, I said the following:
SkyDome (well, the Rogers Centre) does seem like it's a fairly hitter-friendly park (in the early 2000s, its "Park Factor" was over 100 consistently, which means it favored hitters, while in the last couple of years it has been below 100 a couple of times).
I have thought about the statement a lot, and decided to do some more research on the topic, as I do not like making statements without valid statistical analysis backing me up.

You can read all about ballpark adjustments and factors here. That post will show you how to calculate the ballpark factor for a park over one year or multiple years, but since the statistics have already been calculated and are already on the corresponding team and year page, you can take them from there. The numbers will all be around 100, as 100 represents a "medium" ballpark, with no advantages to batters or pitchers. If the number is above 100, the ballpark favors batters, and if it is below 100, it favors pitchers.

I have taken statistics from each year for the Blue Jays since 1990. The SkyDome opened midway through 1989, and since the statistics will be skewed for that year I decided not to include it, but since then the Jays have played all of their home games at the SkyDome. I took the ballpark factor for both multi-year and one year, the number of home runs and number of extra base hits at the SkyDome and away each year (81 games, except for 1994 and 1995 because of the strike, and 2010, which is projected), and finally the batting average and batting average on balls in play for the Jays at home and away. I have constructed a couple of graphs which I believe show that the SkyDome is in fact a fairly friendly hitter's park. (Click on any graph to enlarge it and see more detail.)

The first graph is all home statistics. I wanted to compare the park factor to the number of home runs the Jays hit at home each year. As we can see by the trendlines, the park factor and number of home runs have no correlation at all, as the park factor has stayed fairly constant (actually decreasing by about 0.2 per year), while the home runs have varied widely, with an average gain of 1.3 more home runs per year. This graph shows that the park factor calculations take into account a lot more than just home run totals.

Now I am going to look at home and away splits. Each of the following graphs shows a different statistic presented in home vs away fashion, and the goal is to get a visual representation of whether or not the SkyDome is in fact a hitter's park. Although on average, teams should hit better at home than on the road (familiarity of the stadium and its intricacies, fans cheering instead of booing, the comfort of home, etc.), the actual increase is hard to quantify, and thus, we must assume that home and away statistics should be even. We are also assuming that the road statistics reflect an average ballpark factor of 100, since we are comparing home and away statistics to determine the "friendliness" of the SkyDome to batters.

The first graph shows the number of home runs hit by the Blue Jays at home and on the road each season from 1990 to 2010. This graph is a pretty clear indication that the Jays hit a lot more home runs at home than on the road, especially in the last decade or so. The trendlines show that approximately 1.3 more home runs are hit at home per year, while only about 0.5 more home runs are hit on the road per year. So, we can say with good certainty that the SkyDome is more home run friendly than the average American League ballpark (since the large majority of their away games were played in the AL).
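The "home runs per year" figures above are the slopes of linear trendlines. Here is a minimal sketch of how such a slope can be computed with ordinary least squares. The home and away totals below are hypothetical and chosen only to give round slopes; they are not the Blue Jays' actual numbers.

```python
def linear_trend(xs, ys):
    """Least-squares slope and intercept of y against x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical season totals, for illustration only.
years = [2005, 2006, 2007, 2008, 2009]
home_hr = [80, 82, 81, 85, 86]
away_hr = [78, 79, 77, 80, 80]

home_slope, _ = linear_trend(years, home_hr)
away_slope, _ = linear_trend(years, away_hr)
print(f"home trend: {home_slope:+.1f} HR per year")
print(f"away trend: {away_slope:+.1f} HR per year")
```

A steeper home slope than away slope is what the graph is showing: the gap between home and road home runs has been widening over time.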


The next graph will compare the extra base hits at home and on the road. Although we already looked at home runs, it is important to look at extra base hits because we want to determine whether the balls that are not home runs on the road fall in the gaps for doubles or triples or whether they are caught. If they find gaps, then the away stadium is still mostly hitter friendly, but if the balls that usually leave the park at home get caught on the road, then we know for sure that the SkyDome is much more hitter friendly for power statistics. We can see from the graphs that almost every year there are more extra base hits at home than on the road, other than a couple of isolated years in the 90s. So since there are more home runs and at least as many extra base hits at home, we can safely say that the SkyDome is an above-average hitter friendly park.


The last thing I want to look at is the batting average splits. We have seen that the SkyDome is hitter friendly for power, but we also want to find out if it is hitter friendly for average. I chose to look at both batting average and batting average on balls in play, since batting average can be very inconsistent (although team batting statistics for 81 games is a fairly large sample size). Although the trendlines are almost constant in all of the cases below, we can take the average average for each statistic from them. The average batting average at home is 0.267, and on the road it is 0.264. For BABIP, at home the average is 0.297, while on the road it is 0.294. Although these are only small differences, over the course of 21 years they can become significant changes. Although we may not be able to say that hitting for average at the SkyDome is easier than on the road (see the above explanation for why hitting at home may be easier), we can say that it is at least just as easy at home as on the road.

So after taking a look at the home vs away splits for the last 21 years, I am confident that what I said last week is true. I am glad that my analysis shows that what I said was true, and that I wasn't just saying false facts. SkyDome is in fact a hitter friendly park in comparison to other American League parks, both for power and somewhat for average.

Friday, August 13, 2010

RA Dickey's One-Hitter

Just wanted to do a quick post on RA Dickey's one-hitter against the Phillies tonight. What made the game so special was that the only Phillies hitter to get a hit off of Dickey was the opposing pitcher, Cole Hamels. It made me wonder: when was the last time, and how rare is it, for a pitcher to throw a one-hit, complete game shutout, allowing only a single hit to the opposing pitcher and less than two baserunners?

This table will help us figure it out, but then we need to cross-reference those games with games on a list like this (but for every year; that one only includes games in 2010). The second table is all games in which the 9 hitter for an NL team (presumably the pitcher) got at least one hit, but also did not score or drive in any runs (which would eliminate the shutout). So what we are doing is cross-referencing all one-hitters in the NL since 1920 with all games in which an NL pitcher got a hit.

You actually have to go back quite a ways to find the last such occurrence, which was a game on August 18, 2003 between the Rockies and Mets. That day, Steve Trachsel of the Mets (interesting that both today and the last time was by a Mets pitcher) gave up only one hit and two baserunners (the other was on an error), and the only hit was a double by opposing pitcher Chin-hui Tsao of the Rockies.

It would take a lot of research to find out every occurrence of a one-hitter with only the pitcher getting a hit, but it is definitely a rare case. Going back to 1985, there have only been three occurrences, the two above as well as this game on September 21, 1986 (in which Padres pitcher Jimmy Jones threw a perfect game except for one hit he gave up to opposing Astros pitcher Bob Knepper). So tonight's gem by RA Dickey really was a special event.

UPDATE: I have found two more games that had only one hit by the opposing pitcher, one on May 1, 2006 (which would be the most recent one), and one on June 8, 1992. The first game had a hit and three walks, and the second one had a hit and four walks, which is why they did not turn up in my previous searches. Still, only five occurrences in twenty-five years is very rare, so we will probably be waiting awhile for the next one.

Fact of the Week I: Grand Slams

Every Friday, I am going to introduce a fact that is interesting to me, but may or may not be interesting to you. So this week, I am going to look at grand slams.

So far this year, the Blue Jays have only hit two grand slams. Considering that there have already been 91 grand slams hit so far this season, the fact that the Jays have only hit two is surprising, especially considering they are leading the league in home runs by a wide margin.

Perhaps even more surprising is that the Jays have already given up five grand slams. Since the average major league team should hit and give up the same amount of grand slams (about 3 per team so far), it is interesting that the Jays have given up three more than they have hit (again, especially considering their home run exploits). And from July 3 to July 18, they had actually given up 5 and not hit one, before Yunel Escobar hit his 1st home run of the season (and 1st as a Jay).

So there is the inaugural fact of the week. Grand slams are always interesting, and although so far this year there has been an interesting set of results for the Jays, the data set is simply not yet large enough to conclude that there is a definitive pattern going on. Thus, the grand slam belongs in the fact of the week.

Best, Worst Performances vs the Jays

As a follow-up to my post yesterday, where I analyzed the best and worst performances by Jays' players, today I am going to analyze the best and worst performances so far this year by players against the Blue Jays. Again, I am going to use WPA as a measuring stick, as it is the most useful game-to-game statistical tool.

The best hitting performance against the Jays so far belongs to Mark Teahen of the White Sox on April 12. Teahen went 3 for 5, with a single, triple, and home run, and 3 RBIs. His leadoff home run in the top of the 9th inning tied the game at 7, and raised the probability of the White Sox winning by 33%. Then, in the 11th inning, Mark Kotsay led off with a single, and Teahen drove him in with a triple, again raising the probability of the White Sox winning by 33%. Overall, he had a WPA of 0.761 on the day, easily the highest WPA by a Jays' opponent this season.

The best pitching performances against the Jays so far can be found here. Ervin Santana's complete game win on April 18 (boy, was that a bad week for the Jays!) has been the best performance so far. In the Angels' 3-1 win, Santana only allowed 4 hits, and the only run was an Adam Lind solo home run with 2 outs in the bottom of the ninth inning. Santana never faced more than 4 hitters in an inning, and before the home run had retired 17 Jays in a row (a Lind single in the 4th was the last hit). He steadily gained WPA over the course of the game, and since the Angels were already up 3-0 and only needed one more out to win when Lind hit the home run, his WPA barely dropped. He ended up with a WPA of 0.675, pretty easily the top pitching performance against the Jays this year. Surprisingly, the performance only merited a Game Score of 81, and is only the 4th best pitching performance (measured by Game Score) this year by an Angels' pitcher.

Now that we have taken a look at the best performances (well, the worst from the Jays' perspective), we can take a look at the worst performances so far this year against the Jays. This shows the worst hitting performances, and Mike Sweeney's 0 for 5 performance on May 19 leads the way. He was already 0 for 3 with a strikeout when he came to the plate in the bottom of the 7th in a 3-2 game (Jays leading) with runners on second and third and 2 out, and when he popped out to second the Jays' probability of winning increased by 13%. Then, in the bottom of the 9th, with the Jays still leading 3-2, there were runners on 1st and 2nd and two outs when Sweeney flew out to deep left, increasing the Jays' probability of winning by 17% (to 100%). Those two outs were key in his WPA of -0.381 for the game. His aLI was 2.790 for the game, which means that he batted in high-pressure situations, so the key outs he made sent his WPA tumbling.

Finally, the worst pitching performance against the Jays so far this year belongs to Bobby Jenks on May 9. Jenks entered the game in the top of the 9th inning with the White Sox winning 7-5, and did not record an out while giving up 4 hits and 4 runs (3 earned). The inning went like this: ground-rule double, single, home run, single, and that was it for Jenks as Scott Linebrink replaced him. The 3-run home run by Fred Lewis single-handedly increased the Jays' winning probability by 51%, from 31% to 82%. Overall, Jenks had a WPA of -0.767, and the Jays ended up winning the game 9-7. Interestingly, Jonathan Papelbon's disaster of a ninth inning yesterday finished second on the list of worst pitching performances with a WPA of -0.739. I will be doing a post later this week about last night's game and blown saves.

So there are your best and worst performances by a Jays' opponent so far in 2010. The best performances seemed to occur mostly in April, while the worst performances happened in May (which probably correlates with the Jays' 12-12 record in April and 19-10 record in May). Surprisingly, none of the games discussed yesterday were repeats today, but some of them were lower down on the list of good and bad performances.

Thursday, August 12, 2010

Best, Worst Jays Performances so far in 2010

I thought it would be interesting to do a quick study on the best and worst batting and pitching performances so far this year. A quick way to measure the value of a performance is WPA, which is Win Probability Added*. It sums the value of each play made by a certain player in a game. For a quick example, say that Vernon Wells comes up in an inning with the Jays' chance of winning at 45% (e.g. down by one in the middle innings). If Wells hits a home run, and the chance of winning increases to 50%, then the value of the home run is 5%. If you add up all of these changes (positive and negative) in the chance of winning across a player's plays in the game, you will get his WPA. It is a little confusing at first, but it is an extremely useful (if not the most useful) statistic for measuring a player's contribution to winning individual games or his value over a season or career.

*Quick note: I am using the WPA from Baseball-Reference, which may differ from that of Bill James or FanGraphs as each site has a slightly different WPA formula, but all numbers end up very close
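To make the arithmetic concrete, here is a minimal sketch of the idea in Python. The function name and the second play are made up for illustration; the sites mentioned above each compute the underlying win probabilities with their own models, which this sketch does not attempt to reproduce.

```python
def win_probability_added(plays):
    """Sum the change in win probability over a player's plays.

    `plays` is a list of (before, after) win probabilities for the
    player's own team, each a fraction between 0 and 1.
    """
    return sum(after - before for before, after in plays)


# The Vernon Wells example from the text: a home run moves the Jays
# from a 45% chance of winning to 50%.
wells_home_run = [(0.45, 0.50)]
print(round(win_probability_added(wells_home_run), 3))  # 0.05

# Hypothetical game: the home run above, then a strikeout that drops
# the Jays from 50% to 47% (the strikeout numbers are invented).
wells_game = [(0.45, 0.50), (0.50, 0.47)]
print(round(win_probability_added(wells_game), 3))  # 0.02
```

Good plays add to the total and bad plays subtract from it, which is why a single disastrous inning can produce a large negative WPA.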

The first thing we are going to look at is the best performances for Jays hitters. This shows that the most valuable game so far this year by a Jays hitter was Adam Lind against the Cleveland Indians on May 5. The Jays won the game 5-4, as Lind hit a 2-run home run with 2 outs in the 9th inning and the Jays down by 1 run. For the game, he went 2 for 4 with a walk, a run, and 2 RBIs. FanGraphs shows that the home run had a WPA value of .724, as the probability of the Jays winning the game went from 8% to 80.5%. That game by Adam Lind, and especially the home run, is the most valuable hitting performance by any Jays player so far this year.

The next thing to look at is the best performance by a Jays pitcher. Any guesses on what the result might be? Yep, it was Brandon Morrow's gem on Sunday against the Rays that had the highest WPA, which corresponds to what I described here this week. One of the biggest reasons it was so high was aLI, or the Average Leverage Index. aLI measures the average pressure that a player faces across the situations he plays in, with 1 being average pressure, under 1 being low pressure, and above 1 being high pressure. (An interesting sidenote is that the top 12 pitching performances are from starters, while we will see that 6 of the 7 worst pitching performances are from relievers. This has a lot to do with aLI, as the best performances usually come when pitchers throw 8 or 9 shutout innings in a close ball game, while the worst performances usually come when relievers enter a close game and get lit up.) Morrow had an aLI of 1.454, which was also higher than any of the other performances on the list. A big reason it was so high is that the Jays were only winning 1-0 throughout the entire game, so Morrow could not take any innings off because of a big lead, but had to pitch hard the entire game.
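As a rough sketch of how an average leverage figure can be read, the Python below averages a list of per-play leverage values and labels the result. The function names and the per-play values are invented for illustration; only the 1.454 figure and the meaning of "1 = average pressure" come from the post.

```python
from statistics import mean


def average_leverage_index(leverage_by_play):
    """Average the leverage index over every play a player was involved in."""
    return mean(leverage_by_play)


def describe_pressure(ali):
    """Label an aLI value relative to the average-pressure baseline of 1."""
    if ali > 1:
        return "high pressure"
    if ali < 1:
        return "low pressure"
    return "average pressure"


# Invented per-play leverage values, purely to show the averaging step.
made_up_plays = [0.5, 1.0, 1.5]
print(average_leverage_index(made_up_plays))  # 1.0

# Morrow's aLI as quoted in the text.
print(describe_pressure(1.454))  # high pressure
```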

Now, the flip side. We have looked at the best performances of this year so far, now we need to look at the worst. First up, the worst hitting performances so far. Alex Gonzalez's June 23 game against St. Louis comes in as the worst performance so far with a WPA of -0.376. Gonzalez went 0 for 3 with a walk, and grounded into 2 double plays. Both double plays came after leadoff singles by Vernon Wells (in the 4th and 9th innings), and since the Jays ended up with a 1-0 loss, any run would have helped a lot. The 9th inning double play was the play with the largest change in WPA for the Jays, as the Jays went from having a 34.2% chance of winning to only a 4.9% chance of winning (WPA of -0.294). That play, along with his hitless night, gives him the top spot in the worst hitting performance of this year.

Finally, we are going to look at the worst pitching performance of this year. I don't know about you, but when I think about bad pitching so far this year, one name pops into my head: Kevin Gregg. Gregg has gotten a lot of saves (25 so far), but has also blown a couple (4 so far, resulting in an 86% save percentage). But to me, it always seems as if he pitches well under the least pressure, that is, if he comes into the 9th with a 3-run lead against the Orioles, he will always get the save, but if he comes into the 9th with a 1-run lead on the Yankees, he will blow it. And when we take a look at the leaderboard, Gregg's name is all over it. He actually holds the top 3 spots for worst pitching performances, but the one that takes the cake is his performance against Seattle on May 20. He came in with a 3-1 lead, and then: single, single, walk, walk, sac fly, single, game over. Gregg's line: 0.1 IP, 3 H, 2 BB, 3 ER, BS, and L. He threw 25 pitches, only 12 for strikes, and his WPA was an astoundingly bad -0.905, as the Jays had over a 90% chance of winning when he entered, and ended up losing.
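For anyone who wants to check the save percentage quoted above, here is a small sketch. The function name is my own; it treats save opportunities as saves plus blown saves.

```python
def save_percentage(saves, blown_saves):
    """Share of save opportunities (saves + blown saves) that were converted."""
    opportunities = saves + blown_saves
    if opportunities == 0:
        raise ValueError("no save opportunities")
    return saves / opportunities


# Gregg's numbers from the text: 25 saves, 4 blown saves.
print(f"{save_percentage(25, 4):.0%}")  # 86%
```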

There are your best and worst performances so far in 2010. An interesting note is that all performances came in 1-run games, with the Jays winning 5-4 and 1-0, and losing 1-0 and 4-3 in the respective order of the performances. This again has to do with aLI. The higher the aLI, the higher the probability that a single play (or a couple plays in a row) can swing the entire outcome of the game. And although we measured everything in terms of WPA, I believe that the results we have come up with should pass the "gut-check"; that is, if you were to eyeball some of the best and worst performances so far this year, many of these performances would be at the very top (or very bottom).

We will check back on this post at the end of the year, to find out if the results will still be the best and worst performances after 162 games. Also, tomorrow I will be working on a post similar to this one, for the best and worst performances by opposing players when they are facing the Jays. I would not be surprised if many of the same games we looked at today will be involved tomorrow.

Cycles and Games oh so Close

In this morning's post, I talked a little bit about players in their MLB debuts hitting for the cycle (which has never been done). This led me to research Blue Jays hitting for the cycle. I remember back in 2001 when Jeff Frye did it (interestingly enough, future Blue Jay Frank Catalanotto was on the Rangers when Frye hit for the cycle against them, and we'll talk a little more about him later), so I knew it had at least been done before, but I was interested in digging a little deeper. This shows the number of times a Jay has hit for the cycle, which has happened twice, with Frye and Kelly Gruber back in 1989.

Considering that there have been 111 cycles since 1977 (the year the Jays joined MLB), you would think that they would have hit more than 2 (the average major league team hit 3.7 cycles over that span). Now obviously some parks are easier to hit cycles in than others (I always think of Fenway Park, with the Green Monster for home runs and doubles and the "triangle" out in right-center field for triples), but Skydome (well, the Rogers Centre) does seem like a fairly hitter-friendly park (in the early 2000s, its "Park Factor" was over 100 consistently, which means it favored hitters, while in the last couple of years it has been below 100 a couple of times). So I wondered, how close have the Jays been to hitting cycles over the years?

The first, and most obvious thing to look at, is the number of times a Jays hitter has been only a triple away from the cycle. It happened on Saturday, in JP Arencibia's debut (which I talked about earlier today), when he had a single, a double, and two home runs (Edwin Encarnacion also fell just a triple short of the cycle on Saturday). This actually happens fairly frequently, probably even more often than you would think, as there have actually been 257 occasions where a Jays hitter has fallen just a triple short of the cycle, including 11 times already this year. On average, a Jays hitter has fallen short of the cycle by a triple 7.56 times per year.

Since triples are so rare, it is to be expected that most of the time a player misses a cycle will be because of the triple. The next most difficult hit to get is the home run, and this shows that Blue Jays hitters have missed the cycle by a home run 73 times (2.15 times per year). It last happened on June 16, when Fred Lewis had 4 hits, including two singles, a double, and a triple, but couldn't quite complete the cycle. Whereas players usually miss the triple because they aren't quite fast enough, players who miss by a home run are players who aren't power hitters.

By contrast, those who miss the cycle by only a double are usually players who just don't get enough at-bats or just can't quite stretch a single into a double, especially if they end up with 4 hits or more. This shows that 23 Blue Jays over the years (0.68 per year) have missed the cycle by just a double. In every case but one (Ed Sprague with two home runs in 1996), the player ended up with either 3 hits (single, triple, home run) or 4 hits (2 singles, triple, home run).

The most painful non-cycle occurs when a player hits a double, triple, and home run, but misses by just a single. These are rare occurrences, and as this shows, it has only happened 13 times in Blue Jays history. In all 13 games, the player only ended up with 3 hits, all for extra bases. The last time it happened was to Frank Catalanotto in 2005, who actually missed the cycle by a single in 2005, a triple twice in 2003, and a home run in 2003 and 2004. The most recent player who has missed the cycle by every single type of hit has been Vernon Wells, who missed it by a single in 2003; by a double in 2002; by a triple once in 2002, four times in 2003, and once each in 2006, 2007, 2008, and 2010; and by a home run twice in 2002 and once each in 2003 and 2009.

So in total there have been 368 games in Jays history (an average of 10.82 a year) where a player was one hit away from the cycle, yet only twice did the player actually achieve the cycle - only 0.54% of the time! It seems as though the Jays franchise is due for another cycle soon, and already this year there have been 12 instances where a player was one hit shy of the cycle, which is the highest number since 2006, when there were 16 such instances (this year is on pace for 18, which would be the highest total since 2002).
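The per-year averages and the 0.54% figure can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it assumes 34 seasons (1977 through 2010), takes the total of 368 as stated above rather than recomputing it from the four counts, and treats the 0.54% as 2 cycles out of 368 near misses plus the 2 cycles themselves.

```python
SEASONS = 34  # assumption: 1977 through 2010 inclusive

# Near-miss counts quoted in the post, keyed by the hit that was missing.
near_misses = {"triple": 257, "home run": 73, "double": 23, "single": 13}

for missing_hit, games in near_misses.items():
    print(f"missed by a {missing_hit}: {games / SEASONS:.2f} per season")
# missed by a triple: 7.56 per season
# missed by a home run: 2.15 per season
# missed by a double: 0.68 per season
# missed by a single: 0.38 per season

TOTAL_NEAR_MISSES = 368  # total as stated in the post
CYCLES = 2

per_season = TOTAL_NEAR_MISSES / SEASONS
conversion_rate = CYCLES / (TOTAL_NEAR_MISSES + CYCLES)

print(f"{per_season:.2f} near misses per season")        # 10.82
print(f"{conversion_rate:.2%} of chances became cycles")  # 0.54%
```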

I believe that we will see another cycle for the Jays by the end of 2011. The way that they have been hitting the ball the last couple of years, and especially this year, I think that it may happen before the end of this year but more likely sometime next year. They have been extremely unlucky so far in their quest for cycles (though not as unlucky as the Marlins and Padres!), and at some point the law of averages must balance out.

JP Arencibia and MLB Debuts

A couple of weeks ago, Alex Rodriguez fouled a pitch straight back, which hit John Buck in the finger, causing Buck to exit the game. He was later placed on the disabled list, which led to the call-up of top Jays catching prospect JP Arencibia. Arencibia was drafted 21st overall by the Jays in the 2007 MLB draft, and last Saturday he finally started his first major league game. He hit a home run on the first pitch of his first major league at-bat, then proceeded to get a double, single, and another home run. He finished the game 4 for 5 with two home runs, 3 runs, and 3 RBIs. The Jays won the game 17-11, and his debut was immediately praised everywhere, from Jays nation to leading off SportsCenter. But just how good was his debut?

To start off, Arencibia became the first Blue Jay to have at least 4 hits in his MLB debut. He also became only the 15th player in MLB history with at least four hits in his debut. He was only the second player since 1998 to get 4 hits in a debut (Wilson Ramos, now of the Nationals, got 4 hits in his debut for the Twins in May), and had the most RBIs of any player with at least 4 hits in a debut.

But not only did Arencibia get 4 hits, he also hit for power, with 3 out of the 4 hits going for extra bases. He became only the 4th player in major league history with 2 HRs in his debut, and the first since 1999 (and only the 2nd since 1964). Out of those players with 2 HRs (Mark Quinn in 1999, Bert Campaneris in 1964, and Bob Nieman in 1951), he was the only player with 4 total hits (the other 3 only had 3 hits). So he hit with power as well as any other player ever making their debut, but hit for average better than any player with power.

Related to the number of home runs he hit is the total number of bases he hit for. He actually had the most total bases ever hit in a major league debut (11), and became only the second player in history with at least 10 TB in his major league debut, after Mark Quinn in 1999. Finally, he became only the 8th player to be one hit shy of the cycle, and the 6th to be a triple shy of the cycle (2 other players were a home run short of the cycle).
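The total bases figure is easy to verify by hand, but here is a short sketch that does it from the list of hits in his debut. The helper names are my own.

```python
# Bases earned by each type of hit.
BASES = {"single": 1, "double": 2, "triple": 3, "home run": 4}


def total_bases(hits):
    """Add up the bases for a list of hits."""
    return sum(BASES[hit] for hit in hits)


def missing_for_cycle(hits):
    """Return the hit types still needed to complete the cycle."""
    return set(BASES) - set(hits)


# Arencibia's debut, in the order described: HR, double, single, HR.
arencibia_debut = ["home run", "double", "single", "home run"]

print(total_bases(arencibia_debut))        # 11
print(missing_for_cycle(arencibia_debut))  # {'triple'}
```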

To wrap up, among all major league debuts he became the first Blue Jay and 15th player ever to get at least 4 hits, the first Blue Jay and only the 4th player ever to hit at least 2 HRs, and had the most total bases ever in a major league debut. I believe that without question it was the best debut for any Blue Jay hitter ever, as well as either the best or at least one of the top 5 debuts for any hitter ever. So when you were watching the game, or watching the highlights afterward, you may have believed that what he was doing was special, but the statistics show just how special his debut really was.

Wednesday, August 11, 2010

Young Guns and Strikes

I was reading an interesting piece by Tom Verducci of SI the other day (you can read it here), and it got me wondering: how well is the Jays' rotation set up for now, but more importantly, for the future?

The article points out that the key to winning in the AL East is having pitchers that can throw lots of strikes, and also generate lots of strikeouts. We are going to look at statistics for pitchers 25 and under, who qualify as young: either prospects still trying to break into the major leagues or pitchers who have just established themselves in the past few years. We will look at both the number of strikes and strikeouts individually, but I think the easiest place to start would be with wins.

Wins are widely regarded as a very flawed statistic, but if your young pitchers are getting more wins than in other years, it should be a good sign that you have some good young pitchers who are able to pitch and win at the major league level. This table shows the number of wins each year by pitchers on the Blue Jays that are 25 or younger (turning 25 before June 30 of the season). As the table shows, the 2009 Blue Jays had 23 total wins from pitchers 25 and under, mostly from Ricky Romero, Brett Cecil, and Marc Rzepczynski. That total ranked 9th all time for the Blue Jays, which shows that there was some promise there (as a comparison, in 2008 Jays pitchers 25 and under won only 13 games). You can see the 2009 totals for all major league teams here, which shows that the Jays ranked 14th. However, this year there has been a big step up. There have already (barely 2/3 of the way through the season) been 26 occasions where a pitcher 25 and under has gotten the win this year. Again, the familiar names are Romero, Cecil, and now Brandon Morrow. The Jays are currently in 2nd place among all major league teams for 2010. If the pitchers continue at this pace for the rest of the season, they will shatter the record of 34 set in 1982, and get to around 39 or 40. This shows that, in terms of wins, the rotation is set up well for both next year and many years to come.
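The "on pace for" projection is simple proportional scaling, sketched below. The function name is my own. The first call assumes exactly two-thirds of a 162-game season (108 games) has been played; the second assumes the 112-game mark that the Jays had reached around the time of writing, which gives a slightly lower figure that would still beat the record of 34.

```python
def projected_total(current, games_played, season_games=162):
    """Scale a running total to a full season at the current pace."""
    return current * season_games / games_played


# 26 wins with exactly two-thirds of the season (108 of 162 games) played.
print(projected_total(26, 108))            # 39.0

# Same 26 wins, assuming 112 games have been played instead.
print(round(projected_total(26, 112), 1))  # 37.6
```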

Now that we have seen what wins have told us about the future, we can look at some less flawed statistics, such as strikes and strikeouts. Whereas wins depend on the pitcher as well as both the offense (scoring runs) and the defense (making plays for the pitcher), strikes depend almost entirely on the pitcher's own performance. The first point mentioned in the article is that pitchers in the AL East need to get a lot of strikeouts. If we look at the strikeouts per nine innings for pitchers 25 and under for the Jays, we can see that the past few years have been very good. Last year, there were 5 pitchers (who started at least 60% of their games) 25 or under who struck out at least 6 batters per nine innings. If we look at all teams in the major leagues, we can see that the Jays were second only to the Marlins, who had 6 pitchers that fit. So far this year, the Jays have 3 pitchers (Cecil, Morrow, and Romero) 25 and under that are averaging at least 6 strikeouts per 9 innings. Again, if we look at all teams, the Jays are tied for first with the Braves, Reds, Dodgers, and A's. So in terms of strikeouts, their young pitchers are performing better than ever, which shows that the future is bright for their rotation.
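For reference, here is a sketch of the filter described above: strikeouts per nine innings of at least 6, for pitchers who started at least 60% of their games. The function names and the sample pitcher line are hypothetical and are not taken from any real player's record. Innings are passed as outs recorded to avoid the ambiguity of baseball's ".1" and ".2" innings notation.

```python
def strikeouts_per_nine(strikeouts, outs_recorded):
    """Strikeouts per nine innings, with innings given as outs recorded."""
    innings = outs_recorded / 3
    return 9 * strikeouts / innings


def is_qualifying_starter(games, games_started, strikeouts, outs_recorded,
                          min_start_share=0.60, min_k_per_nine=6.0):
    """Apply the two thresholds used in the text."""
    start_share = games_started / games
    k_per_nine = strikeouts_per_nine(strikeouts, outs_recorded)
    return start_share >= min_start_share and k_per_nine >= min_k_per_nine


# Hypothetical pitcher: 150 strikeouts in 180 innings (540 outs),
# starting 30 of his 32 appearances.
print(strikeouts_per_nine(150, 540))             # 7.5
print(is_qualifying_starter(32, 30, 150, 540))   # True
```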

Finally, we are going to look at the other point mentioned in the article: throwing lots of strikes. Although it may seem counterintuitive to look at both strikeouts and % of strikes, there are cases (52% strikes, 8.5 SO per 9 IP) where a pitcher may get a lot of strikeouts but not throw a high percentage of strikes. If we look at the % of strikes thrown by pitchers 25 and under, we can get an idea of how good they can be versus the AL East. This table shows the number of pitchers 25 and under each season for the Jays that threw strikes at least 60% of the time. Unfortunately, this data is only available from 2000 onwards, but we can still see that the last two years have the most pitchers 25 and under throwing at least 60% strikes (5 and 4, respectively). Again, if we want to view all teams in the major leagues, we can look at 2009 here and 2010 here. The Jays rank in the top 6 in both years, so we can see that their young pitchers are not only striking out hitters, but also throwing a lot of strikes.
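The reason for checking both numbers can be shown with a small sketch that applies the two thresholds separately. The function names are my own, and the 62-of-100 pitch count is invented; the 52% / 8.5 case is the one mentioned above.

```python
def strike_percentage(strikes, pitches):
    """Share of pitches thrown for strikes."""
    return strikes / pitches


def pitcher_profile(strike_pct, k_per_nine,
                    min_strike_pct=0.60, min_k_per_nine=6.0):
    """Check the strike-throwing and strikeout thresholds independently."""
    return {
        "throws_strikes": strike_pct >= min_strike_pct,
        "gets_strikeouts": k_per_nine >= min_k_per_nine,
    }


# The case from the text: lots of strikeouts, but too few strikes.
print(pitcher_profile(0.52, 8.5))
# {'throws_strikes': False, 'gets_strikeouts': True}

# Invented example: 62 strikes in 100 pitches clears the 60% bar.
print(strike_percentage(62, 100) >= 0.60)  # True
```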

It is always interesting to read something and then use statistical analysis to either back up what you just read, or completely falsify it. In this case, we can see that the article is correct in stating that the Jays have the pitching to win in the AL East based upon throwing strikes and getting strikeouts. It is exciting to know that the rotation is set up very well for the next couple of years, and if the team continues to hit like this year in the future, the team should do very well.

The Jays After 112 Games

Just thought I would post a quick note here about how the Jays have done this year in comparison to previous years.

This table shows that after 112 games this year, their 59 wins tie them for the 9th best total in Jays history. That is, comparing the first 112 games of every Jays season ever (the past 34 seasons), they have won the 9th most games this year.

Even more impressively, it is their highest win total in the first 112 games since the new millennium. The last time they won more games was 1999, when they won 61 of their first 112 games. So even though this was supposed to be a "rebuilding" year, and it still is to some extent, the Jays are doing very well in comparison to their recent history.

Tuesday, August 10, 2010

Brandon Morrow and Other Blue Jays Pitching Gems

Well, now is as good a time as any to write my first post. On Sunday, I went to my first Blue Jays game this season, and happened to witness one of the greatest pitching performances by a Blue Jays pitcher ever. What a game to see! As you probably already know, Brandon Morrow pitched a complete game, one-hit shutout with 17 (17!) strikeouts, and gave up the only hit with two outs in the ninth inning.

So the question is: where does this rank in terms of the greatest games ever pitched by a Blue Jay? Before Sunday, the best game ever pitched was probably on September 2, 1990, when Dave Stieb no-hit the Cleveland Indians for what is still the only no-hitter in Blue Jays history.

Here is a list of all games in which a Blue Jays pitcher pitched at least 9 innings and gave up 1 or no hits.

There have been a total of 15 games in which a Blue Jays pitcher threw a complete game allowing no more than one hit. It is interesting to note that the only time a run was scored against the pitcher in these games was the September 27, 1998 game, where Roy Halladay had a no-hitter through 8 innings before giving up a home run to Bobby Higginson in the 9th (in his rookie season, no less!). But Morrow did not simply throw a complete game one-hit shutout. He also struck out seventeen batters. Here is another list of games in which a Blue Jays pitcher has thrown a one-hitter with at least 10 strikeouts.

As you can see, only three times has a pitcher for the Jays gotten at least 10 strikeouts and only given up one hit. And they have all happened this year! Morrow on Sunday, Brett Cecil in May (8 IP), and Ricky Romero in April (also 8 IP). Pretty amazing.

So this game was the first time a Blue Jays pitcher has thrown a complete game one-hitter (also a shutout) with at least 10 strikeouts. And again, he struck out 17 batters, which is the second highest total in Blue Jays history, only behind Roger Clemens' 18 on August 25, 1998 (he's also the first Blue Jays pitcher besides Roger Clemens to get at least 15 K's).

I think the conclusion here is that this was probably either the best or second-best game ever pitched by a Jays' pitcher (up for grabs with the no-hitter). If you want to judge a game by Bill James' Game Score, it ranks as the highest game score ever recorded by a Blue Jays pitcher. It was an amazing performance, and we all hope that Morrow can continue pitching this well the rest of this season and his career!
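For the curious, here is a sketch of Bill James' original Game Score formula applied to Morrow's line. The hit, run, strikeout, and innings totals come from the post; the walk total of 2 is an assumption that is not stated above, so treat the resulting score as conditional on it.

```python
def game_score(outs, hits, earned_runs, unearned_runs, walks, strikeouts):
    """Bill James' original Game Score for a starting pitcher."""
    innings_completed = outs // 3
    score = 50
    score += outs                            # +1 per out recorded
    score += 2 * max(0, innings_completed - 4)  # +2 per inning completed after the 4th
    score += strikeouts                      # +1 per strikeout
    score -= 2 * hits                        # -2 per hit allowed
    score -= 4 * earned_runs                 # -4 per earned run
    score -= 2 * unearned_runs               # -2 per unearned run
    score -= walks                           # -1 per walk
    return score


# Morrow: 9 innings (27 outs), 1 hit, no runs, 17 strikeouts.
# walks=2 is an assumption, not a figure from the post.
print(game_score(outs=27, hits=1, earned_runs=0, unearned_runs=0,
                 walks=2, strikeouts=17))  # 100
```

Because strikeouts add to the score and hits subtract twice as much, a high-strikeout one-hitter can match or outscore a no-hitter with fewer strikeouts, which is why this game can top the list.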