by Brad Null
In this study, we introduce the nested Dirichlet distribution and propose a method for using it to model the abilities of Major League Baseball (MLB) players. We define fourteen distinct outcome types for a typical plate appearance (excluding intentional walks and bunt attempts) and assume that every player has an underlying fourteen-dimensional ability vector, x, in which each element is the probability that the player experiences the corresponding outcome type in a typical plate appearance. We then use maximum likelihood to fit a nested Dirichlet prior joint distribution on x for all MLB batters (excluding pitchers) over the period 2003 to 2006. Because the nested Dirichlet, like the Dirichlet distribution, is a conjugate prior for multinomial data, this model also yields a nested Dirichlet posterior distribution for every player. We evaluate these posteriors as a forecasting tool against 2007 results, and we evaluate point estimates of OPS (on-base percentage plus slugging percentage) derived from the model for all players. Our results indicate that the model’s accuracy is near that of some popular projection systems, even without incorporating potentially useful information such as age effects, stadium influences, and Minor League data. We conclude by discussing possible extensions to the model.
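The conjugacy property mentioned above can be illustrated with the ordinary Dirichlet distribution, which the abstract names as the simpler relative of the nested Dirichlet. The sketch below is not the nested model itself. It assumes a toy setting with four outcome types rather than fourteen, and the prior parameters, outcome labels, and counts are made-up values chosen only for illustration. The function names are hypothetical.

```python
import numpy as np


def dirichlet_posterior(prior_alpha, counts):
    """Conjugate update: a Dirichlet(alpha) prior combined with observed
    multinomial counts gives a Dirichlet(alpha + counts) posterior."""
    prior_alpha = np.asarray(prior_alpha, dtype=float)
    counts = np.asarray(counts, dtype=float)
    if prior_alpha.shape != counts.shape:
        raise ValueError("prior_alpha and counts must have the same shape")
    return prior_alpha + counts


def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution: alpha_i / sum(alpha)."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha / alpha.sum()


# Hypothetical toy data (assumption): four outcome types, e.g.
# out, walk, single, extra-base hit. Not the paper's fourteen types.
prior_alpha = np.array([68.0, 9.0, 17.0, 6.0])  # made-up prior parameters
counts = np.array([400, 60, 100, 40])           # made-up plate appearance counts

post_alpha = dirichlet_posterior(prior_alpha, counts)
post_mean = dirichlet_mean(post_alpha)          # point estimate of the ability vector
prior_mean = dirichlet_mean(prior_alpha)
observed_rate = counts / counts.sum()
```

In this toy setting each posterior mean lies between the prior mean and the player's observed rate, which is the shrinkage behavior that makes a fitted prior useful for forecasting players with limited data.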
Brad Null is a doctoral candidate in the Department of Management Science and Engineering at Stanford University. His research interests include forecasting and decision optimization in sports, design and analysis of algorithms, and game theory and mechanism design as applied to sports and the Internet. From 2006 to 2007, Brad also worked as Senior Director of Strategic Operations with the Golden Baseball League. He holds an MS and a BA from Stanford University and is co-author of the book The Summer that Saved Baseball.