Does being on a more experienced team influence or
increase the performance of rookie players in Major League
Baseball?
Adam Rothstein Jesse Cox
May 3, 2016
Abstract
This paper examines the effect of a new MLB player’s team experience on the player’s performance in
subsequent years. Using MLB player and team statistics from the 2013 and 2014 seasons, we took a
sample of rookie players entering the league no earlier than 2011 and used a multiple regression and
a logistic regression to estimate, respectively, the amount and the probability of an increase in skill
and proficiency, as measured by each player’s batting average. We find that our models explain
little of the variation in this outcome. A bigger, more comprehensive data set would let us drop
observations that skew the results, such as players with a negative change in batting average.
1 Introduction/Literature Review
Drafting players for any major league sport is the highlight of the preseason for loyal
sports fans and carries a lot of weight for how the team may perform that year. Being able to pick
and draft players who are not only top performers in their league now but also have the potential
to become even better is pivotal. But what about the current performance of the team they join?
We speculate that a player’s improvement is not entirely up to the player and their potential, but
may also depend on the experience and notoriety of the team around them.
We know that when a player joins the MLB, their batting average is lower than it will be
at their career high (Sommers). Sommers finds that a player’s batting average increases from the
point they join the MLB until they reach a career maximum, after which it slowly declines, and
that this process takes 7 to 9 years. Sommers’s study took into account the possibility of injury
and the minimum number of at-bats, both of which tend to skew these results. Our study focuses on
how a player’s team can influence their improvement, or more precisely the rate at which they
improve.
Horowitz finds that MLB team owners want teams to be evenly matched because close wins and
losses, or rather the potential for a close match, drive ticket sales, which are a major source of
profit for them (Horowitz). The study also finds that talent disparities have been much lower in
recent years (Horowitz). Finally, it concludes that competition did not drive performance to its
peak (Horowitz). We disagree with this conclusion: even though talent disparities were found to be
smaller than in previous years, we see the same teams making it to the playoffs year after year. We
also disagree with the finding that competition does not drive improvement. We theorize that a
more experienced team will have a positive spillover effect on new players, that the drive to
perform up to par will be higher on such a team, and that this will cause those players to improve
at a faster rate.
2 Data
We used MLB statistics from baseball-reference.com. We took a sample of players from the
2013 season who were playing their first year in the MLB. From these first-year players we
excluded everyone over the age of 25, to better capture younger players who still have more room
for improvement. We used each player’s batting average during this season as their starting
batting average. Once we had narrowed our list to these players, we found them on the 2014 rosters
and pulled their batting averages for 2014.
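The sample-construction steps above can be sketched in Python. This is a minimal illustration under stated assumptions: the record layout (`name`, `age`, `rookie`, `ba`), the player names and all batting averages are hypothetical placeholders rather than data pulled from baseball-reference.com, and matching 2013 and 2014 records by name is a simplification of the manual roster lookup described in the text.

```python
# Hypothetical 2013 player records; names, fields and values are illustrative only.
players_2013 = [
    {"name": "Player A", "age": 23, "rookie": True, "ba": 0.241},
    {"name": "Player B", "age": 27, "rookie": True, "ba": 0.262},
    {"name": "Player C", "age": 22, "rookie": False, "ba": 0.275},
    {"name": "Player D", "age": 24, "rookie": True, "ba": 0.230},
]

# Hypothetical 2014 batting averages, keyed by player name.
ba_2014 = {"Player A": 0.258, "Player B": 0.270, "Player D": 0.219}

MAX_AGE = 25  # players over 25 are excluded, as described in the text


def build_sample(players, ba_next):
    """Keep first-year players aged 25 or younger who also appear in the next season."""
    sample = []
    for p in players:
        if p["rookie"] and p["age"] <= MAX_AGE and p["name"] in ba_next:
            sample.append({
                "name": p["name"],
                "ba_2013": p["ba"],               # starting batting average
                "ba_2014": ba_next[p["name"]],    # following-season batting average
            })
    return sample


sample = build_sample(players_2013, ba_2014)
```

Player B is dropped for being over 25 and Player C for not being a first-year player, so only Players A and D remain in the sample.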
All 30 MLB teams are represented within our sample of players, which gives us a lot of
variability between teams. With this we expected the teams to differ widely in batting averages,
runs, playoff wins and World Series wins. We used data from the 162 regular-season games that
every team played in 2014. In addition to these 2014 regular-season games, we used data from the
2010, 2011, 2012, 2013 and 2014 playoff and World Series games to try to capture the notoriety, or
longer-term experience, a team has to offer.
2.1 Dependent Variables
We used two main dependent variables in our modeling and analysis. We began with just
one dependent variable, Percentage Change in BA, which was simply the change in batting average
for our sample of rookie players from 2013 to 2014. It didn’t take long before we saw that not
every player saw an increase in batting average, as we had assumed from our predictions and from
the literature on batting average changes over time. To try to correct for this, we created a
dummy variable, Positive Change, indicating whether the change in batting average was positive,
leaving us with two dependent variables for our modeling.
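A minimal sketch of how the two dependent variables could be computed for one player, assuming batting averages are stored as decimals (for example 0.250). Because the text describes Percentage Change in BA as the change in batting average, the hypothetical helper `dependent_variables` returns both the raw difference and a percentage change; which of the two matches the paper's variable is an assumption. Coding a change of exactly zero as 0 in the dummy is also an assumption.

```python
def dependent_variables(ba_2013, ba_2014):
    """Return (raw change, percentage change, Positive Change dummy) for one player.

    Hypothetical helper: the raw change is ba_2014 - ba_2013, the percentage change
    is that difference relative to the 2013 average, and the dummy is 1 only when
    the change is strictly positive.
    """
    change = ba_2014 - ba_2013
    pct_change = 100.0 * change / ba_2013
    positive_change = 1 if change > 0 else 0
    return change, pct_change, positive_change


# A hypothetical player who moves from .250 to .275:
change, pct_change, positive_change = dependent_variables(0.250, 0.275)
```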
2.2 Explanatory Variables
Average Batting Age on Team: The average batting age on each team is the average age of the
players who bat for that team. We think this will be significant because older players have,
presumably, been in the league longer and therefore have more experience in their profession.
This experience in turn should translate into a greater improvement in individual performance.
The mean age for batters on MLB teams in 2014 was about 28, with a standard deviation of 1.3. The
range is from 25 to about 30, which gives a rather good spread for possibly explaining some of the
change in player performance (Figure 1).
Team Batting Average: This variable is the batting average of the player’s 2014 MLB team. We
realize that the batting averages of individual players feed into this variable, but each team
consists of enough players that we feel the resulting small amount of correlation isn’t enough to
throw off our results. The mean team batting average for MLB teams in 2014 was 28.1, with a
relatively low standard deviation of only about 1.3. This means that all teams lie within a range
of about 8 (Figure 1).
Team Average Runs per Game: This variable is the average number of runs per game that the MLB
team scored over the course of the 2014 regular season. Our thinking is that a team that scores
more runs per game will consist of better players and have a higher batting average overall. The
summary statistics for this variable show a mean of 4 with a standard deviation of only .3, so with
so little variation across teams it doesn’t seem likely to be very significant (Figure 1).
Team Total Season Runs 2014: This is the total number of runs each team scored over the 2014
season. For the same reason we thought average runs per game would be significant, we thought
this would be too. While average runs per game had a relatively poor spread, total season runs had
a mean of about 616 and a standard deviation of 52, which makes for a much better spread than runs
per game (Figure 1).
Team Rank: The team rank variable is a numerical value from 1 to 5 that corresponds to each
team’s ranking within its division. There are five teams per division and six divisions in all. The
average rank was three, which is above the expected value for an even spread and could be indicative
of better teams recruiting more rookie players (in 2013, at least) (Figure 1).
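For reference, under an even spread in which each division rank from 1 to 5 is equally likely, the expected rank is simply the mean of the five values. A short check using exact fractions:

```python
from fractions import Fraction

ranks = range(1, 6)  # division ranks 1 (best) through 5 (worst)

# Expected rank when every rank is equally likely: (1 + 2 + 3 + 4 + 5) / 5.
expected_rank = Fraction(sum(ranks), len(ranks))
```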
Team Win Percentage: The team’s win percentage is calculated by dividing the team’s number of
regular-season wins by 162 (the number of regular-season games each team plays per year). With a
spread of 5% and a mean of 48%, the range seemed large enough that we expected this variable to be
significant (Figure 1).
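The win-percentage calculation is a single division; this sketch adds a range check on the number of wins. The 81-win and 90-win teams in the example are hypothetical.

```python
GAMES_PER_SEASON = 162  # regular-season games each team plays


def win_percentage(wins, games=GAMES_PER_SEASON):
    """Share of regular-season games won, as a decimal between 0 and 1."""
    if not 0 <= wins <= games:
        raise ValueError("wins must be between 0 and the number of games played")
    return wins / games


print(win_percentage(81))  # a hypothetical .500 team; prints 0.5
```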
Post Season Performance: The better teams in the league play in the postseason and have a
chance to compete in the World Series. We thought this would be a very interesting variable to
include, since teams that perform better often make multiple and consecutive postseason
appearances, which suggests a certain notoriety associated with some teams. In an attempt to
capture the possible effect of this, we created a variable that measures performance in the
postseason. If a team ended the regular season without making the playoffs, it received a value
of 1. If a team made the playoffs but was eliminated before the World Series, it received a value
of 2. If a team made a World Series appearance but lost, it received a value of 3. Finally, the team
that won the World Series received a value of 4. We looked at the five most recent years (2010,
2011, 2012, 2013 and 2014) and summed these values for each team. A team that never made the
playoffs, and thus had the lowest score each year, would have a value of 5. If a team won the
World Series all five years, it would have a value of 20. Of course, no team won all five years, so
the range for this variable is from 5 to 14, with a standard deviation of 2 (Figure 1).
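The scoring scheme above can be written as a small function. Only the 1 to 4 point values and the five-year window come from the text; the outcome labels and the example team's record are hypothetical.

```python
# Points per season, following the scheme described in the text.
SCORES = {
    "missed_playoffs": 1,
    "lost_in_playoffs": 2,      # made the playoffs but did not reach the World Series
    "lost_world_series": 3,
    "won_world_series": 4,
}
YEARS = (2010, 2011, 2012, 2013, 2014)


def postseason_score(outcomes):
    """Sum the yearly points for one team; `outcomes` maps each year in YEARS to a label."""
    missing = [year for year in YEARS if year not in outcomes]
    if missing:
        raise ValueError(f"missing outcomes for years: {missing}")
    return sum(SCORES[outcomes[year]] for year in YEARS)


never_made_it = {year: "missed_playoffs" for year in YEARS}
example_team = {  # hypothetical five-year record
    2010: "missed_playoffs",
    2011: "lost_in_playoffs",
    2012: "won_world_series",
    2013: "lost_in_playoffs",
    2014: "lost_world_series",
}
```

The minimum possible score is 5 and the maximum is 20, matching the bounds stated in the text; the hypothetical team above scores 1 + 2 + 4 + 2 + 3 = 12.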
3 Problems with Data
Once we computed summary statistics for the data we compiled, we found that a player’s change
in batting average was not always positive (Figure 1). This makes sense given the fairly low range
and standard deviation of batting averages in general: not every player will improve consistently
over their first years. To try to correct for this, we coded whether each player had a positive
change in batting average as a binary outcome and modeled it with a logistic regression. We felt
that doing this would let us capture an association between our explanatory variables and the odds
of improvement that would stand apart from a basic linear regression.
3.1 Experiment
We used two econometric models in our study to determine the effect of experience on
individual player development. The first was a multivariate OLS regression of the first dependent
variable, Percentage Change in BA, on our explanatory variables.
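$$
Y = \beta_0 + \beta_1\,\mathrm{AvgBAge} + \beta_2\,\mathrm{TmBA} + \beta_3\,\mathrm{TmAvgR/G} + \beta_4\,\mathrm{TmTR} + \beta_5\,\mathrm{TmRank} + \beta_6\,\mathrm{TmW\%} + \beta_7\,\mathrm{PSeason} + \varepsilon
$$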
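To make the first model concrete, here is a minimal pure-Python sketch of estimating the β coefficients by ordinary least squares through the normal equations (Z'Z)b = Z'y, where Z is the data matrix with an intercept column, together with the R² used to judge fit. The paper does not say which software produced its regression output; this illustrates the technique and is not the authors' code, and a statistics package would normally be used instead. The function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors may be perfectly collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """Return [b0, b1, ..., bk]; each row of X holds the k regressors (no intercept)."""
    Z = [[1.0] + list(row) for row in X]  # prepend the intercept column
    k = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(ZtZ, Zty)


def r_squared(X, y, beta):
    """Share of the variance in y explained by the fitted linear model."""
    preds = [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

With real data, X would hold one row per rookie with the seven team variables from the equation, and y would hold that player's change in batting average. If two regressors are exact multiples of each other, the normal equations become singular and `solve` raises an error.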
The second was a logistic regression in which we regressed our dummy variable, Positive
Change, on our explanatory variables. With this we attempted to estimate how each variable is
associated with the odds of a positive change in batting average.
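$$
\operatorname{logit}(Y) = \beta_0 + \beta_1\,\mathrm{AvgBAge} + \beta_2\,\mathrm{TmBA} + \beta_3\,\mathrm{TmAvgR/G} + \beta_4\,\mathrm{TmTR} + \beta_5\,\mathrm{TmRank} + \beta_6\,\mathrm{TmW\%} + \beta_7\,\mathrm{PSeason} + \varepsilon
$$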
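A minimal pure-Python sketch of fitting a logistic regression by maximum likelihood with Newton-Raphson. The model is for the log-odds that Positive Change equals 1. This illustrates the technique and is not the authors' code. The function names are hypothetical, and the small data set in the usage note is synthetic. Newton-Raphson is one standard way to fit a logit. It fails to converge if the predictors perfectly separate the 0s from the 1s.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def sigmoid(t):
    """Numerically stable logistic function, mapping log-odds to a probability."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logit_fit(X, y, max_iter=50, tol=1e-10):
    """Return [b0, b1, ..., bk] maximizing the logistic log-likelihood (y values are 0 or 1)."""
    Z = [[1.0] + list(row) for row in X]  # prepend the intercept column
    k = len(Z[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [sigmoid(sum(b * z for b, z in zip(beta, row))) for row in Z]
        # Gradient of the log-likelihood: Z'(y - p).
        grad = [sum((yi - pi) * row[i] for row, yi, pi in zip(Z, y, p)) for i in range(k)]
        # Information matrix: Z' W Z with weights p(1 - p).
        info = [[sum(pi * (1 - pi) * row[i] * row[j] for row, pi in zip(Z, p))
                 for j in range(k)] for i in range(k)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

As a usage example, fitting `logit_fit([[0], [1], [2], [3], [4], [5]], [0, 0, 1, 0, 1, 1])` gives a positive slope, because higher values of the single predictor go with more 1s.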
4 Results
After running a multivariate OLS regression of the percentage change in batting average for
individual players on our explanatory variables, we obtained an R² of only .0984. All explanatory
variables were significant except for the player’s team rank and team win percentage. Team runs
per game, team batting average, postseason performance and total team runs were significant only
at the .10 level; the remaining variable, average team batting age, was significant at the .05 level
or better (Figure 4). Team batting age was the most significant variable and has a negative
coefficient.
Our logistic model also had a very low pseudo R² value, at .0862. Again we have explanatory
variables that are not significant: the team rank, the team batting average, the average team
batting age, and the team win percentage. The remaining variables were significant at the .05
level or better (Figure 5).
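The text does not say which pseudo R² it reports. One widely used definition is McFadden's, which compares the log-likelihood of the fitted model with that of an intercept-only model. The sketch below assumes that definition; the function names and the small example are hypothetical.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given predicted probabilities p in (0, 1)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))


def mcfadden_r2(y, p_model):
    """McFadden's pseudo R-squared: 1 - LL(model) / LL(intercept-only model)."""
    p_null = sum(y) / len(y)  # the intercept-only model predicts the sample share of 1s
    ll_model = log_likelihood(y, p_model)
    ll_null = log_likelihood(y, [p_null] * len(y))
    return 1.0 - ll_model / ll_null
```

A model that predicts no better than the sample share of 1s scores 0. Values near 0, like the one reported above, mean the explanatory variables add little beyond that baseline.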
5 Main Findings/Marginal Effects
Within the first model, we see that average team batting age, team batting average and total
runs all had a negative relationship with the percentage change in the player’s batting average.
The negative coefficient on average team batting age goes against our prediction that older players
have a spillover effect on newer players because of their experience. Team batting average and
total runs in the season also go against our prediction that teams that perform better during the
season will produce higher-performing rookie players. We do find a positive relationship with runs
per game and postseason performance. This is consistent with our predictions, because a team that
scores more runs in a game also hits more balls in a game, and teams that consistently make it to
the postseason have an influence on their newer players.
In our logistic model we saw only three significant variables: total team runs, team runs per
game and postseason performance. Total team runs in the season has a negative effect on the odds
of a player having a positive change in batting average, while runs per game and better postseason
performance have a positive effect on those odds. This does not match what we predicted. We would
have liked to see the odds increase with older teammates and a higher team batting average, to
better illustrate the possible effect of experience and competition.
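Logit coefficients are in log-odds units. Two common ways to read them are the odds ratio, exp(β), and the marginal effect on the probability, p(1 − p)β, evaluated at a chosen predicted probability p. In the sketch below, the coefficient 0.5 and the probability 0.6 are hypothetical values for illustration. They are not taken from Figure 5.

```python
import math


def odds_ratio(beta_j):
    """Multiplicative change in the odds of Y = 1 for a one-unit increase in x_j."""
    return math.exp(beta_j)


def marginal_effect(beta_j, p):
    """Approximate change in Pr(Y = 1) per unit of x_j where the predicted probability is p."""
    return p * (1 - p) * beta_j


# Hypothetical coefficient of 0.5 on postseason performance:
print(round(odds_ratio(0.5), 3))            # 1.649
print(round(marginal_effect(0.5, 0.6), 3))  # 0.12
```

A positive coefficient gives an odds ratio above 1, which raises the odds of improvement. A negative coefficient, like the one on total team runs, gives an odds ratio below 1.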
6 Conclusion
We believe that the variables that were not significant in the multivariate regression (team
rank and team win percentage) were insignificant because they are correlated with each other: the
team with the highest win percentage was the top-ranked team within its league. If we had used the
win percentage within each division and the rank across the entire league, we might have seen
better results.
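One quick way to check the suspected correlation between team win percentage and team rank is a Pearson correlation coefficient. Since rank is ordinal, a Spearman rank correlation is a reasonable alternative. The division in this sketch is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical division: win percentages and within-division ranks (1 = best).
win_pct = [0.580, 0.543, 0.512, 0.481, 0.438]
rank = [1, 2, 3, 4, 5]

r = pearson(win_pct, rank)
```

Within a division, rank is determined by win percentage, so a strongly negative value is expected. Two regressors that are this closely related are hard to separate in a regression.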
Most of the variables in the logistic regression that were not significant are again explained by
the correlation between win percentage and team rank. If we had the access or the time to compile a
bigger data set for this experiment, we feel we would have been able to drop the observations that
were giving us the most trouble.
To refine this study, a larger and cleaner data set is required. We only took players from one
rookie class and followed them over a single season. Ideally, we would take all the rookie players
who entered across several years and possibly measure their change over two years instead of one,
to try to smooth out the negative values that threw us off early on. The problem with putting
together a larger data set for this information is that it is time-consuming, since the data are not
all in one place: players are traded and switch teams from season to season.
This sampling technique could also be applied to studies of the effectiveness of rule or policy
changes within the MLB. For example, drug policies were made stricter in 2004, so a sample of all
new players from 2004 to date, taken in this manner, could be compared with a sample from the ten
years before the new rules. Since younger players are less risk-averse and have much more pressure
on them to perform, they are more likely to try performance-enhancing drugs, and it would be
interesting to see whether the rule changes in the MLB had an effect on the rate or level of
improvement in batting average.
7 References
Albert, J. (2010). Baseball Data at Season, Play-by-Play, and Pitch-by-Pitch Levels. Retrieved May 2, 2016, from http://www.amstat.org/publications/jse/v18n3/albert.pdf
Bradbury, J. C., & Drinen, D. J. (2008, April). Pigou at the Plate. Retrieved May 2, 2016, from http://jse.sagepub.com/content/9/2/211.abstract
Furnald, N. A., & O'Hara, M. E. (2012, April 16). The Impact of Age on Baseball Players’ Performance. Retrieved May 2, 2016, from http://www.colgate.edu/portaldata/imagegallerywww/21c0d00240984995941f9ae8013632ee/ImageGallery/2012/theimpactofageonbaseballplayersperformance.pdf
Horowitz, I. (2000, May). The Impact of Competition on Performance Disparities in Organizational Systems. Retrieved May 2, 2016, from https://ideas.repec.org/a/sae/jospec/v1y2000i2p151176.html
Houser, A. (2005). Which Baseball Statistic Is the Most Important When Determining Team Success? Retrieved May 2, 2016, from https://www.iwu.edu/economics/PPE13/houser.pdf
Sommers, P. M. (2008). The Changing Hitting Performance Profile in Major League Baseball, 1966–2006. Retrieved May 2, 2016, from http://www.researchgate.net/publication/247739825_The_Changing_Hitting_Performance_Profile_in_Major_League_Baseball_19662006
8 Appendix
Figure 1 Summary Statistics
Figure 2
Figure 3
Figure 4 Multivariate Regression Output
Figure 5 Logistic Regression Output