NCSSORS Presentations

Featured Speakers

Roland Beech of the Dallas Mavericks


Sig Mejdal of the St. Louis Cardinals

Oral Presentations

The intra-match home advantage in Australian Rules football

by Richard Ryall & Anthony Bedford

The existence of home advantage in Australian Rules football (AFL) has been well documented in previous literature. This advantage typically refers to the net advantage of several factors which, generally speaking, have a positive effect on the home team and a negative effect on the away team. However, this practice excludes the in-course dynamics of home advantage throughout the match including the interrelationship between pre-game and in-game team characteristics. The aim of the present study is to calculate the intra-match home advantage for each quarter in AFL by incorporating the interaction between team quality and current score. Archival AFL data was obtained from seasons 2000 to 2009 which consisted of year, round, quarter, (nominal) home team, away team, home team score and away team score. Analysis of variance (ANOVA) on margin of victory was used to determine if there was a distinct difference between team quality (favourite/underdog) within current score (ahead/behind). Since the in-game team characteristics (current score) are likely to be caused by pre-game characteristics (team quality) the margin of victory is adjusted for team quality. The results provide marginal evidence that home underdogs in the third quarter irrespective of whether they were ahead or behind at half time receive a greater advantage than home favourites. Furthermore, home advantage is greatest in the final quarter when there is a high level of uncertainty about the outcome of the match.

Pitcher Accuracy through Catcher Spotting: Assessing Rater Reliability

by Andrew C. Thomas

Pitcher intent, as measured by the position of the catcher's glove before a pitch is thrown, is an element of baseball that is regularly observed by commentators ("he's missing his spots") but remains an uncaptured aspect of statistical analysis of the game, offering many potential aspects on pitcher performance that have yet to be exploited. There has yet to be a systematic way of collecting this data for public consumption, a far from trivial task, that can potentially be conducted by an automated video analysis system, or a collection of manual raters with expert judgement. We address two potential manual methods of data collection through video playback, demonstrate the validity and reliability of each measure on multiple raters, and conclude by discussing a broader program of data collection based on these methods.

Optimal Dynamic Clustering Through Relegation and Promotion: How to Design a Competitive Sports League

by Martin L. Puterman and Qingchen Wang

This paper investigates how to best use a relegation-promotion system and a schedule to create a competitive sports league. It develops a statistical/mathematical model of a multi-division hierarchical sports league made up of teams with intrinsic skill levels (ISLs) that change from year to year. Since team skill changes over time, some modification in division (or cluster) composition is necessary to optimize competitiveness. This is accomplished through promoting teams with the best records at the end of a season to a higher division and relegating teams with the worst records to lower divisions. Such mechanisms are fundamental to the English football league and the PGA Tour/Nationwide Tour. Using data from the National Basketball Association (in which there is no relegation system) we develop statistical models for year-to-year variability in ISLs and for match outcomes based on the ISLs of the two teams. Within this framework, we develop a multiple season simulation model to investigate the effect of the number of teams relegated and promoted, the schedule and variability of year to year rankings on competitiveness of the divisions. For the NBA data, we find that in a three division league with ten teams in each division, relegating and promoting three teams at the end of the season results in the most competitive divisions measured in terms of the long run average within division standard deviation of ISLs and percentage of teams assigned to the correct division. The effect of schedule on the optimal relegation number is minimal.

Dynamic Effort, Sustainability, Myopia, and 110% Effort

by Stephen Shmanske

By definition, giving 100% effort all of the time is sustainable, but begs the question of how to define 100% effort. As a corollary, once a benchmark for defining 100% effort is chosen, it may be possible, even optimal, to give a greater amount of effort for a short period of time, while recognizing that this level of effort is not sustainable. This dynamic effort provision problem is analyzed in the context of effort and performance by National Basketball Association (NBA) players over the course of a season. Within this context, several benchmarks for sustainable effort are considered, but these are rejected by the data. Meanwhile, the data are consistent with the proposition that NBA players put forth optimal effort, even if such effort is not always sustainable.

Drafts versus Auctions in the Indian Premier League

by Tim Swartz

This paper examines the 2008 player auction used in the Indian Premier League (IPL). An argument is made that the auction was less than satisfactory and that future auctions be replaced by a draft where player salaries are determined by draft order. The salaries correspond to quantiles of a lognormal distribution whose parameters are set according to team payroll constraints. The draft procedure is explored in the context of the IPL auction and in various sports including basketball, highland dance and golf.
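
As a rough illustration of salaries set as lognormal quantiles under a payroll constraint, here is a minimal Python sketch. The function name, the midpoint rule for assigning a quantile to each draft slot, and the numbers in the example are all assumptions made for illustration; the abstract does not give the author's actual parameterization.

```python
import math
from statistics import NormalDist


def draft_salaries(n_picks, payroll, sigma):
    """Salary per draft slot as a lognormal quantile, scaled to a team payroll.

    Assumption (not from the abstract): slot i (0 = first pick) is assigned
    the quantile at the midpoint of the i-th of n equal-probability bins,
    with the highest quantile going to the first pick.
    """
    std_normal = NormalDist()
    z = [std_normal.inv_cdf((n_picks - i - 0.5) / n_picks) for i in range(n_picks)]
    # A lognormal quantile is exp(mu + sigma * z); factor out exp(mu).
    weights = [math.exp(sigma * zi) for zi in z]
    # Choose mu so that the salaries sum exactly to the payroll.
    mu = math.log(payroll / sum(weights))
    return [math.exp(mu) * w for w in weights]


# Hypothetical example: 8 draft slots, a payroll of 5,000,000, sigma = 0.6.
salaries = draft_salaries(n_picks=8, payroll=5_000_000, sigma=0.6)
```

Under these assumptions, the spread parameter sigma controls how steeply salaries fall with draft position, while the payroll fixes the overall scale.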

Reconsideration of the best batting order in baseball: Is the order to maximize the expected number of runs really the best?

by Nobuyoshi Hirotsu

In previous studies for analyzing the batting order of baseball games, the order is evaluated by its expected number of runs scored in a game, under the Markov chain model on the D’Esopo and Lefkowitz runner advancement model. However, the order to maximize the expected number of runs may not be the best order in the sense that it may not get more than 0.5 in probability of winning the game against other possible batting orders. In this way, the best batting order is reconsidered and we have tried to find a better order than the order which maximizes the expected number of runs. In this paper, we concretely show the existence of such orders based on the data of Major League teams, and discuss the difference between the best orders and the order to maximize the expected number of runs.
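
The distinction between maximizing expected runs and maximizing win probability can be seen in a toy example. The two run distributions below are invented for illustration and are not derived from the Markov chain model or the Major League data used in the paper; the sketch also assumes the two run totals are independent.

```python
from itertools import product


def expected_runs(dist):
    """Mean of a discrete run distribution given as {runs: probability}."""
    return sum(runs * p for runs, p in dist.items())


def win_probability(dist_a, dist_b):
    """P(A scores strictly more than B), assuming independent run totals."""
    return sum(
        pa * pb
        for (ra, pa), (rb, pb) in product(dist_a.items(), dist_b.items())
        if ra > rb
    )


# Hypothetical distributions: order A is boom-or-bust, order B is steady.
order_a = {0: 0.7, 10: 0.3}  # higher expected runs
order_b = {2: 1.0}           # lower expected runs

mean_a, mean_b = expected_runs(order_a), expected_runs(order_b)
p_a_beats_b = win_probability(order_a, order_b)
p_b_beats_a = win_probability(order_b, order_a)
```

Here order A has the larger expected number of runs, yet it beats order B less than half the time, which is the kind of reversal the abstract describes.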

Uncovering Football’s Best Goalscorers in the English Premier League, La Liga, Serie A, Bundesliga, Eredivisie, and Ligue 1 for the 2009-2010 Season

by Joel Oberstone

Who are the best goalscorers in football? This paper examines the 20 leading goal scorers in each of the six top leagues in Europe and explores whether it is reasonable to assume that the best performance should be based solely on the single figure of merit of the number of goals scored during a season. Methodologies are developed that allow an accurate comparison between a player in one league, where the average number of goals scored may be significantly more than those of a player from a less prodigious league, as well as account for the discrepancy of the 34 game season in the Bundesliga and Eredivisie as compared to the English Premier, La Liga, Serie A, and Ligue 1 leagues that play 38. Secondly, an expanded approach using a set of multiple factors that considers a player’s contribution to overall team scoring is developed and determines if any of the measures in this array (1) make statistically significant contributions to the primary factor of individual goal scoring and (2) possibly provide a broader, yet more useful perspective of a player as goalscorer.

An Empirical NFL Draft Value Chart

by Michael Schuckers

Each year the National Football League (NFL) holds a draft to assign eligible players to each of the league's teams. The order of selections is based upon a team's previous performance and trades of these selections. The current NFL Draft Value Chart is one way to assess the value of a given selection and the equivalency of trade value for these selections. Recently, Massey and Thaler (2010) wrote about the market inefficiencies of the NFL draft in terms of the Draft Value Chart and player salaries. In this paper we look at the value of a particular draft selection based upon measures of player performance rather than based upon market value. Using a loess regression approach applied to the response metrics, games played and games started, we derive a new draft value chart based upon player performance.

Poster Presentations

The Ratio of Relative Importance: What Dictates Play in the NFL? A Look at the Importance of Offense, Defense, and Special Teams

by Keith Goldner

Over the past three years, there have been almost 18,000 drives during NFL games. During each drive, one team’s offense is on the field while the other team’s defense plays—with special teams peppered throughout the drives on individual plays. Using an efficiency metric of Net Expected Points, we look at the relative importance of offense, defense, and special teams to determine what dictates play in the NFL. In addition, we develop a method to better understand our efficiency metric.

Semi-Automated Collection of Pitch Location and Intent in Baseball

by Andrew Klein and Andrew C. Thomas

An analysis of the pitcher's intent in baseball should ideally depend upon information given by the catcher before the pitch is thrown. Expert judgment has already identified the importance of this factor (e.g. the pitcher is "missing his spots" when unsuccessful in hitting the target). We describe the underpinnings of an automated video analysis system that uses semi-supervised learning methods to identify the catcher position — specifically, the catcher's glove position. The analysis begins with video of a single pitch, supervised by a human controller; this information is then incorporated into one of a selection of learning algorithms and applied to subsequent pitches with minimal involvement from the controller. This is designed to be the first step in creating a public database of pitch intent, to be coupled with existing sources of pitch physics, so that analysts may better evaluate pitcher performance.

On the Extension of the Pythagorean Expectation for Association Football

by Howard Hamilton

The Pythagorean expectation originates from baseball analytics research and uses offensive and defensive scoring statistics to evaluate team performance relative to statistical expectations. This research presents an extension of the Pythagorean expectation to league competitions in association football. The principal extensions are the calculation of the probability of a drawn result, the reformulation of the probability of an outright win, and the use of expected points instead of wins. The extended Pythagorean is applied to professional soccer leagues in Europe, Asia, and North America, and is shown to predict very well the number of points won in a league competition from the offensive and defensive goal statistics of each team. The results across the leagues examined indicate the existence of a universal Pythagorean exponent that describes the shape of the goal distributions. The research study concludes with a study of the effects of expected goal values and the variance of a team's scoring distribution on the accuracy of the Pythagorean expectation.
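
For readers unfamiliar with the starting point, the classic baseball form of the Pythagorean expectation can be sketched in a few lines of Python. This shows only the original win-fraction formula with the traditional exponent of 2; the football extension described above (draw probability, expected points, and the fitted exponent) is not specified in the abstract and is not reproduced here. The function name and the example totals are illustrative.

```python
def pythagorean_expectation(scored, allowed, exponent=2.0):
    """Classic form: expected win fraction = scored^g / (scored^g + allowed^g)."""
    s = scored ** exponent
    a = allowed ** exponent
    return s / (s + a)


# Hypothetical season totals: 800 runs scored, 700 runs allowed.
expected_win_fraction = pythagorean_expectation(800, 700)
```

A team that scores exactly as much as it concedes gets an expected win fraction of 0.5, and swapping the two totals gives the complementary fraction.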

Adjusted Odds Ratios for Evaluating NBA Players based on Plus/Minus Statistics

by Douglas Okamoto

Adopting the plus/minus statistic from ice hockey, an NBA player is credited plus one or more points whenever his team scores while he is on the basketball court. Conversely, he is debited minus one or more points whenever the opposing team scores. Throughout the NBA season, the league’s better players are consistently pluses, having positive plus/minus statistics as reported by yahoo!sports and In his web blog, Prof. Rosenbaum (2004) estimates adjusted plus/minus statistics by fitting linear regression models to a team’s point differential per possession with players on the floor during a possession as independent variables. Crude or unadjusted odds ratios estimate the relative probabilities of a player having a positive plus/minus in a win, versus a negative plus/minus in a loss. The purpose of this presentation is to estimate adjusted odds ratios by fitting stratified logistic regression models to the logit transform of games won or lost, a binary response variable, with individual player contributions, pluses or minuses, as explanatory variables. Home and away games are twin strata with teams playing 41 home games and 41 road games during an 82-game regular season. Adjusted odds ratios and their 95 percent confidence intervals are estimated for each of the Los Angeles Lakers during the 2008-2009 and 2009-2010 regular seasons. Additionally, adjusted odds ratios are compared with another metric, wins produced, for the 15 players, including Kobe Bryant and Pau Gasol of the Lakers, who made the recently announced first, second and third All NBA teams.
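
The crude (unadjusted) odds ratio mentioned above can be computed directly from a 2x2 table of game results against the sign of a player's plus/minus. The Python sketch below uses the standard log-based (Woolf) 95 percent interval; the counts are invented for illustration, and the adjusted odds ratios in the presentation come from stratified logistic regression, which this sketch does not attempt.

```python
import math


def crude_odds_ratio(win_plus, win_minus, loss_plus, loss_minus, z=1.96):
    """Crude odds ratio and approximate 95% CI from a 2x2 table.

    win_plus:   wins in which the player's plus/minus was positive
    win_minus:  wins in which it was negative
    loss_plus:  losses in which it was positive
    loss_minus: losses in which it was negative
    """
    odds_ratio = (win_plus * loss_minus) / (win_minus * loss_plus)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / win_plus + 1 / win_minus + 1 / loss_plus + 1 / loss_minus)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical 82-game season for one player (made-up counts).
odds_ratio, ci_lower, ci_upper = crude_odds_ratio(45, 12, 8, 17)
```

An interval that excludes 1 would suggest an association between the player's plus/minus sign and the game result, before any adjustment for home and away strata.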

Understanding Intercontinental Contrasts in Stadium Ages

by David Haddock

The median U.S. home is 36 years old, but as of 2010 Major League Baseball (MLB) and National Football League (NFL) stadia each average roughly 21 years of age. Completely new edifices, sometimes in a different city, repeatedly supplant relatively new stadia. U.S. cities, counties, and even states invest heavily to build and renovate professional sports stadia, often for use by wealthy franchises.

In stark contrast, the median age of English Premier League (EPL) soccer stadia this season will be 110 years. English soccer teams frequently renovate their stadia, but almost never abandon their hometown. EPL teams, especially the wealthy ones, bear much or all stadium construction and renovation costs.

This paper argues that those intercontinental contrasts are related. North American sports leagues are cartels. To avoid costly formation of an entire league, an entrant requires the acquiescence of the existing teams acting through their league, and leagues persistently threaten to withhold or withdraw all representation from a city absent below-cost provision of cutting-edge facilities. In most of the world, England being an archetype, an entrant earns or loses its league place independently of other teams via the promotion-and-relegation system. Given a few years for adjustment, a lower division team can unilaterally win promotion into a higher division if city characteristics merit that level of play. The presence of alternative opportunities for league representation under promotion-and-relegation mitigates the pressure to engage in the intercity competition that forces lower level U.S. governments to provide underpriced facilities for generally profitable sports enterprises.


Winning NBA Games in Regular Season versus Playoffs: Different Ball Games?

by Masaru Teramoto & Chad L. Cross

What is the most important factor in winning games in the National Basketball Association (NBA): offense, defense, turnovers, or rebounding? Is the phrase “Defense wins championships” true? Is how teams win games in the NBA different between the regular season and the playoffs? To answer these questions, we examined the relative importance of performance factors in winning basketball games in the past 10 years of the NBA, using a multiple linear regression and a logistic regression analysis. The performance factors analyzed in this study included: overall efficiency (offensive and defensive ratings) and the Four Factors (effective field goal percentage, turnover percentage, rebound percentage, and free throw rate). The results of the data analysis indicate that efficient offense and defense are both essential to be successful in the regular season and the playoffs, but the importance of defense in winning games may be greater in the playoffs than in the regular season. Shooting efficiency on both ends of the floor seems to be the most critical aspect of the game in the regular season as well as the playoffs. In addition, committing fewer turnovers could be another key to winning games, especially in the regular season. It appears that defense becomes more important for winning playoff series as a team advances further in the post-season. Rebounding may play a significant role in deciding the outcome of the Conference Finals where two teams most likely have similar shooting efficiency and turnover rates.

Monte Carlo Simulation for High School Football Playoff Seed Projection

by R. Drew Pasteur

In Ohio high school football, playoff teams are selected and seeded using an objective point system. Teams receive points for each victory, with more points awarded for beating larger schools; additional points are awarded for each game won by a previously defeated opponent. Roughly one-fourth of the state’s 700+ teams qualify, and high seeds earn the right to host first-round games. Even in the final week of the season, a team’s playoff chances can depend on the outcomes of dozens of other games, making direct computation of playoff probabilities impractical. To make such predictions, we first estimate win probabilities for each remaining regular-season game, using a predictive ranking algorithm. We then use these probabilities in multiple simulations of all such games, and compute the associated playoff points for each such trial. This allows us to approximate the playoff probabilities for all teams, and also to project the most likely playoff seeds in each region. We also estimate conditional probabilities that hinge on the number of future games won by a contending team. For the 2009 season, we correctly predicted 78% of the playoff teams after just four weeks of the season. Going into the final week, we predicted 93% of the berths and 92% of the host teams, as well as 86% of the seeds within one seeding line. We also compare the model weekly against season-to-date playoff point standings, and find statistically-significant differences in correct predictions.
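
The simulation step described above can be sketched in Python as follows. This is a simplified stand-in, not the author's model: it assumes win probabilities are already available, awards a fixed number of points per win (ignoring the second-level points earned through defeated opponents), and breaks ties at random. Team names, point values, and probabilities are hypothetical.

```python
import random
from collections import Counter


def playoff_probabilities(points, games, n_qualify, n_trials=10_000, seed=0):
    """Estimate each team's chance of finishing in the top n_qualify.

    points: dict mapping team -> current playoff points
    games:  list of (team_a, team_b, p_a_wins, points_for_win) tuples
    """
    rng = random.Random(seed)
    qualified = Counter()
    for _ in range(n_trials):
        totals = dict(points)
        # Play out every remaining game once using its win probability.
        for team_a, team_b, p_a, award in games:
            winner = team_a if rng.random() < p_a else team_b
            totals[winner] += award
        # Rank by points; ties are broken randomly (an assumption).
        ranked = sorted(totals, key=lambda t: (totals[t], rng.random()), reverse=True)
        for team in ranked[:n_qualify]:
            qualified[team] += 1
    return {team: qualified[team] / n_trials for team in points}


# Hypothetical four-team region with two playoff berths and two games left.
current_points = {"A": 20.0, "B": 10.0, "C": 9.0, "D": 1.0}
remaining_games = [("B", "C", 0.6, 3.0), ("A", "D", 0.9, 2.0)]
probs = playoff_probabilities(current_points, remaining_games, n_qualify=2)
```

In this toy region, team A qualifies in every trial and the second berth goes to whichever of B and C wins their game, so B's estimate should land near its 0.6 win probability.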

Never Too Late To Win

by Adam Gold

Sports fans can be freed from cheering against their favorite team for a draft pick. Draft orders based on how teams perform after becoming mathematically eliminated from playoff contention trigger highly competitive atmospheres that inspire fans with passion and optimism. The worst teams receive the handicap of playing more games to earn the top draft pick. Using the reverse standings to determine draft order creates the distressing paradox where success and failure become synonymous. Each fan’s right to cheer for their favorite team, from the first game to the last, is more important than the attempt to list teams from the worst to best. Advanced mathematical formulas and rigorous computer algorithms create the demand for professional sports franchises to make the pursuit of winning unlimited.

Short and Long-Run Labor Market Effects of Age Eligibility Rules: Evidence from Women's Professional Tennis

by Ryan Rodenberg

Age is often used in law and public policy as a low-cost proxy for competency, maturity, and ability. Age is also used in numerous sport (and non-sport) labor markets to determine workplace eligibility. We exploit the enactment of the women’s professional tennis minimum age rule (AR) in 1995 to estimate the effects of ARs on short-run and long-run labor market outcomes. We find no evidence that the AR has had a beneficial effect on players’ career longevity or success, and weak evidence that players subject to the AR actually had worse outcomes than those not subject to the rule. Our results suggest that sport governing bodies should revisit “one size fits all” eligibility rules that are paternalistic in nature.

Perception is Not Reality: An Analytical Framework for Evaluating Specific Allegations of NBA Referee Bias

by Ryan Rodenberg

The 2007 gambling scandal involving former NBA referee Tim Donaghy put allegations of basketball referee bias in the spotlight. Using methods similar to Rodenberg and Lim (2009) and Winston (2009), this paper analyzes specific allegations of referee bias by former Miami Heat head coach Pat Riley against NBA referees Steve Javie and Derrick Stafford. In the course of analyzing eight years’ worth of game-level data involving the Miami Heat, I find that neither Javie nor Stafford had a significant adverse effect on Heat team performance or exhibited bias against Riley’s team. In fact, the Heat performed slightly better than predicted when Stafford officiated their games. Our results provide empirical evidence consistent with the “bias blind spot” and other theories of human perception in economics and psychology that are grounded in the finding that individuals with a vested interest in certain self-enhancing or self-justifying outcomes may reach generalized conclusions unsupported by actual evidence.

Defining the Performance Coefficient in Golf

by Andy Hoegh

Unlike many other sports where only the top ten or twenty participants have a realistic shot at victory, when 144 players tee it up at a PGA tournament every participant has a legitimate chance at winning. In golf even the greatest players lose more often than they win, and long-shots and unknowns win some of the most prestigious events. With such parity, random chance plays a large part in determining the winner. This is evident from the world rankings of the four major champions in 2009: 69th, 71st, 33rd, and 110th. While statistical modeling is commonplace in many sports, particularly baseball, the golf world is largely untapped. Using historical data from one of golf’s major championships, the Masters, this paper establishes a technique for modeling hole-by-hole results. This research allows for derivation of the performance coefficient, which quantifies the level of performance within the player’s capability. For instance, the 2009 Masters winner, the 69th rated Angel Cabrera, only had a seven percent chance of defeating both Phil Mickelson and Tiger Woods over seventy-two holes. However, his performance coefficient of .01 signifies that he performed close to his optimal performance. While Tiger Woods and Phil Mickelson performed above average with performance coefficients of .35 and .22 respectively, on this given week they were unable to better Cabrera.

Defense Wins in the NBA

by Sam McLaughlin

The best 10-11 defensive performers each season and the greatest 10 overall players each season in the NBA can be identified with the All-Defensive First and Second Team(s) (ADFT & ADST), and the All-NBA First and Second Team(s) (ANFT & ANST), respectively. I have analyzed data from the 1968-2009 seasons to find the rate at which the ADFT & ADST honorees win championships, in comparison with the rate at which the ANFT & ANST honorees win championships. Within this study, I have found that the All-Defensive group wins championships at a 25.7% higher rate than the All-NBA group. I have also found that the 25 greatest defenders of the past 35 years have won 10% more championships per capita than the 25 greatest overall players of the past 35 seasons. Logically one would assume that the All-NBA honorees should win titles at a higher rate based on their understood game impact; however, All-Defensive honorees win championships at a 25.7% higher rate per capita since 1968.

Dependence relationships between on field performance, wins, and payroll in Major League Baseball: Evidence from 1985-2009

by Derek Stimel

We examine the dependence and direction of dependencies between on the field performance variables, winning percentage, and payroll in Major League Baseball using team data from 1985-2009. We give particular focus to the relationship between winning and payroll. Our method is to employ the PC algorithm, which is an implementation of graph theoretic methods in order to identify these dependence relationships. Results indicate that winning percentage directly depends on fielding percentage, on-base percentage, and saves while payroll directly depends on fielding percentage, strike outs against, and winning percentage. Using these results we estimate panel models to assess the magnitudes of the relationships. Further, we estimate a system based on these relationships to examine the effects of winning and payroll on each other over time using impulse responses. Those responses show that an increase in payroll can temporarily increase winning but not permanently so. Finally, we offer some cautions in interpreting the results and some suggestions for future research.

The Relationship Between Leader Experience and Team Performance in Cross-Sectional and Longitudinal Designs

by Thomas Timmerman

The purpose of this study was to examine the relationship between leader experience and team performance. Major League Baseball managers from 1903 to 2006 provided the context within which the relationships were studied. Experience was conceptualized in terms of games managed and seasons managed at the occupational and organizational levels. Cross-sectional analyses revealed a small but significant positive relationship between all types of experience and team performance after controlling for team ability. Analyzing the data longitudinally with random coefficient modeling, however, revealed only one significant experience-performance relationship. After controlling for team ability, there is evidence that managers improve team performance until a peak at about 1200 games. There is no evidence of a positive linear relationship of any kind or that this improvement transfers to other teams.

Estimating NHL Scoring Rates

by Sam Buttrey, Alan Washburn, and Wilson Price

We propose a model for estimating the probability that a particular team wins a National Hockey League game. Data from is used to estimate offensive and defensive scoring rates for each team. The model includes adjustments for the advantage associated with the different manpower situations (as arise during power plays). Scoring rates are then converted into probability predictions under a Poisson model. The model includes an indicator for home-ice advantage and is easy to extend and to fit in standard statistical software. We evaluate the model’s utility for handicapping using published odds for all the games of the 2008-2009 season. We also examine the extent to which this model might inform tactical decisions, specifically the optimal moment at which to pull the goaltender. This decision requires knowledge of 6-skaters-on-5 and 5-on-6 scoring rates.
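
The step from scoring rates to outcome probabilities can be illustrated with a short Python sketch. It assumes the two teams' goal counts are independent Poisson variables with known per-game rates and truncates the sums at a fixed number of goals; the rates below are hypothetical, and the authors' handling of manpower situations, home ice, and overtime is not reproduced.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson model with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def outcome_probabilities(home_rate, away_rate, max_goals=25):
    """Return (P(home ahead), P(tied), P(away ahead)) at the end of regulation.

    Assumes independent Poisson goal counts; scores above max_goals are ignored.
    """
    home = [poisson_pmf(k, home_rate) for k in range(max_goals + 1)]
    away = [poisson_pmf(k, away_rate) for k in range(max_goals + 1)]
    p_home = sum(home[h] * away[a] for h in range(max_goals + 1) for a in range(h))
    p_tied = sum(home[k] * away[k] for k in range(max_goals + 1))
    p_away = sum(home[h] * away[a] for a in range(max_goals + 1) for h in range(a))
    return p_home, p_tied, p_away


# Hypothetical per-game scoring rates for the home and away teams.
p_home, p_tied, p_away = outcome_probabilities(home_rate=3.1, away_rate=2.6)
```

Because regular-season NHL games cannot end tied, a full model would still need a rule for allocating the tied probability between the teams, which this sketch leaves open.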

Using Nonlinear Regression Analysis to Explain Performance in Baseball

by Jeff Hamrick and John Rasp

Linear regression models are commonplace and often useful for analyzing sports-related data. However, many sports phenomena are inherently nonlinear in nature. We propose to replace linear regression analysis and the Pearson correlation coefficient with a nonlinear regression model and a local correlation function. These tools can provide better models for sports data and generate deeper insights into their underlying dynamics. We illustrate this approach through applications to a variety of problems in baseball, including deciding whether championships are won on pitching or defense, and determining the age at which baseball players' hitting abilities peak. Nonlinear regression is powerfully general and we will demonstrate how it can be used to open up new avenues of investigation in the quantitative analysis of sports.

Shootout or Crapshoot: An Analysis of the NHL Shootout after Five Years

by Michael E. Schuckers and Chris Wells

For the 2005-6 season, the National Hockey League (NHL) instituted a new method for determining a winner for regular season games. The shootout system that was introduced has now been in place for five seasons. This past season the Philadelphia Flyers, the eventual Eastern Conference playoff champions, made the NHL postseason only by beating the New York Rangers in a shootout after the last regular season game. In this paper we analyze the results of every NHL shootout over the past five seasons. We look at the impact of both individuals - shooters and goalies - and teams over this period. Our primary analysis considers the effects of individual shooters and goalies using a Bayesian logistic regression. The aim of this analysis is to determine whether or not shootout goals are determined by skill or luck.

A New Approach to Basketball Analytics

by Nima Shaahinfar

While much of basketball analytics focuses on the value of individual players, this research paper analyzes the relationship between the division of roles on teams and success in the NBA in an effort to better understand and define chemistry in basketball. It is commonly understood that efficient basketball is winning basketball and this study takes the next step to determine how efficient basketball is best achieved by introducing standard deviations to the regression analysis, where higher standard deviations in various statistical categories represent more well-defined, or broadly distributed, roles. Using regular season NBA data from the 1997-1998 season, when the three-point line was last moved, through 2009-2010, and 2009-2010 NBA playoff game data, to calculate standard deviations in various statistical categories of different groups of players on each team, regression analysis shows that a broader distribution of roles is more successful and leads to more efficient basketball.

NFL Prediction Using Neural Networks

by John A. David, R. Drew Pasteur, Michael C. Janning, and M. Saif Ahmad

Our research analyzes the ability of a neural network model to predict the outcome of regular-season NFL games. This model uses only readily-available statistics, such as passing yards, rushing yards, and fumbles lost. A key component of this model is the use of statistical differentials to compare teams. For example, the offensive passing yards gained by one team are compared to the defensive passing yards allowed by an opposing team. By using principal component analysis and derivative based analysis, we determined which statistics influence our model the most. We assessed the performance of the model by comparing its predictions to those of media members and the Las Vegas oddsmakers. We also consider the absolute error in predicting the margin of each game. Measuring over the second half of the season, we obtained an average prediction accuracy of 58.1% for 2006, 66.1% for 2007, 65.0% for 2008, and 68.8% for 2009, across 10 different realizations of the model for each season. The standard deviation in each year was less than 1.3%.

How Sports Collectables Retain Their Value: The Case of the 1957 Baseball Cards

by Thomas H. Thompson and Kabir Sen

Before the space race and computer games, 1957 was a simpler time. Baseball card collectors, mostly young boys, had few distractions to advance their hobby. In 1957 there were 16 major league baseball teams. The Kansas City Athletics, moved from Philadelphia in 1955, was the westernmost team. For others west of Saint Louis, baseball appetites were fed with the TV Game of the Week with Dizzy Dean, KMOX (the St. Louis radio station), the Sporting News, baseball cards, or any combination. The young baseball fans of the period were the “baby boomers” who most likely were the major collectors in 1957. We examine several factors that influence the values of the Topps 1957 baseball card issue through the years and show how this baseball card set avoided the price collapse of the post-1980 baseball card issues.

Stochastic Modeling and Optimization in Baseball

by Brad Null

In this work, we present a hierarchical stochastic model of the game of baseball aimed at improving our understanding of the various sources of uncertainty in this arena as well as how they impact outcomes and optimal decision making. Taking as input a version of the nested Dirichlet based model of the distribution of player abilities first presented in this forum (NCSSORS 2008), we fit play result outcomes using a linear model which provides insight into the relative influence of the batter, pitcher, and certain environmental factors (such as the home field advantage) in various contexts. Using these play prediction models to derive transition probabilities for a series of Markov chain representations of a baseball game, we develop techniques to predict game outcomes with respect to these models. We also derive win probabilities for teams in these games (as well as distributions over entire seasons), and show that these predictions add information to the gambling line, overcoming particular biases of the market. Finally, we evaluate context specific decision problems for specific games with respect to a number of in-game strategies such as bunts, intentional walks, and lineup selection. We also use the model to estimate the value of specific players to specific teams, allowing for a model of player “fit” in baseball. This marginal evaluation can be used to evaluate potential trades, free agent acquisitions, and other roster changes, allowing teams to use such methods to optimize roster moves and potentially save millions of dollars.
