Statistical Commentary

 

Reading this information is not necessary for playing and enjoying Doc Dawson's Full Season Baseball.  It is provided for readers who have an interest in the statistical properties of the game.  Although some of the material is technical, other portions are more easily understood.

Design

I designed Full Season Baseball to accomplish the following objectives:

  • The expected value of games won for each team is the same as that team's actual number of games won, provided the same schedule is played.  Of course, when playing the game there will be some variation caused by the random nature of dice; see below.
  • The league-average number of runs scored per team per game is correct for a given season, and can vary from year to year.  (Note: for this reason, cards from different years cannot necessarily be compared.)
  • The overall distribution of runs scored has the correct shape, according to the literature on the subject.  Note to users of the previous version of the game from the early 2000s:  The distribution has changed from a truncated normal to a three-parameter Weibull distribution, following the more recent research of Steven J. Miller of Brown University.  This change improved the tail probabilities, fixing the one remaining dissatisfaction I had with the original version of the game.
  • As in real baseball, an unlimited number of runs is possible, but high scores are only as common as in real baseball (i.e., the tail probabilities are approximately correct; see above).
  • The game is as simple as possible to play, can be played in a reasonable amount of time, and is great fun for everyone!
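The run-distribution objectives above can be illustrated with a minimal sketch.  It assumes a shifted (three-parameter) Weibull whose scale is chosen so that the mean matches a league average.  The league average of 4.5, the shape of 1.8, and the shift of -0.5 are hypothetical values picked for illustration; they are not the parameters used on the game's cards.

```python
import math
import random

# Hypothetical parameters for illustration only (not the game's actual values).
LEAGUE_RUNS_PER_GAME = 4.5   # assumed league average for some season
SHAPE = 1.8                  # assumed Weibull shape parameter
SHIFT = -0.5                 # assumed location, so rounding yields 0, 1, 2, ...

# Pick the scale so the shifted Weibull mean equals the league average:
#   mean = SHIFT + SCALE * Gamma(1 + 1/SHAPE)
SCALE = (LEAGUE_RUNS_PER_GAME - SHIFT) / math.gamma(1.0 + 1.0 / SHAPE)


def sample_runs(rng: random.Random) -> int:
    """Draw one team's runs for one game by rounding a shifted Weibull draw."""
    x = SHIFT + rng.weibullvariate(SCALE, SHAPE)  # (scale, shape) order
    return int(math.floor(x + 0.5))               # nearest whole number of runs


rng = random.Random(2024)
scores = [sample_runs(rng) for _ in range(20000)]
mean_runs = sum(scores) / len(scores)
share_15_plus = sum(s >= 15 for s in scores) / len(scores)

print(f"simulated mean runs per game: {mean_runs:.2f}")
print(f"share of games with 15+ runs: {share_15_plus:.4f}")
```

Under these assumed values there is no upper limit on the score, yet very high scores remain rare, which is the behavior the design objectives describe.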

Additional Analysis

Variation

One of the things that makes Full Season Baseball fun to play is the variation associated with random events such as rolling dice.  Although the team with the best record has the best chance of winning its league or division if the same schedule is replayed, it is not guaranteed to do so.  In a tight pennant race, the second-place team may have a pretty good chance of winning as well.  However, if a team won its division by, say, 15 games, it is very unlikely that some other team will win the replay.  More on this is said under Distributions below.
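As a rough illustration of the point about tight versus lopsided races, the sketch below estimates the chance that a trailing team finishes ahead of the leader in a replay.  It is an approximation under stated assumptions, not the game's own calculation: each team's win total is treated as normal with variance games × p × (1 − p), the two totals are treated as independent (they are not quite, since the teams play each other), and ties and other teams are ignored.  The records used are hypothetical.

```python
import math


def prob_trailer_finishes_ahead(wins_leader: int, wins_trailer: int,
                                games: int = 162) -> float:
    """Normal-approximation chance that the trailing team out-wins the leader.

    Assumes independent, normally distributed win totals (a simplification).
    """
    p1 = wins_leader / games
    p2 = wins_trailer / games
    # Variance of the difference of two independent win totals.
    var_diff = games * p1 * (1 - p1) + games * p2 * (1 - p2)
    z = (wins_leader - wins_trailer) / math.sqrt(var_diff)
    # Standard normal probability below -z.
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


tight_race = prob_trailer_finishes_ahead(92, 90)   # 2-game margin
runaway = prob_trailer_finishes_ahead(98, 83)      # 15-game margin

print(f"2-game margin:  {tight_race:.2f}")
print(f"15-game margin: {runaway:.2f}")
```

With these hypothetical records, the 2-game margin leaves the trailing team a substantial chance, while the 15-game margin leaves it only a small one.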

Strength of Schedule

Note that the first item listed under Design depends on the schedule played.  Full Season Baseball can tell the difference between teams with the same record that have different strengths of schedule.  A team with an 82–80 record in a strong division will have a better card than a team with an 82–80 record in a weak division.  If the usual schedule is replayed, though, each team's expected number of wins is still 82.

Distributions

In Full Season Baseball, the number of games won in a season by a particular team is actually a sum of many binomial random variables with various probabilities of success.  However, its distribution is approximately binomial, with probability of success on a single trial equal to the team's winning percentage for that season (assuming replay of the same schedule).  This, in turn, can be approximated by a normal distribution with mean equal to the number of games that team won in the season being replayed and standard deviation equal to the square root of (the number of games played) × (the team's winning percentage) × (1 minus the team's winning percentage).  This works out to a standard deviation of about 5.48 to 6.36 games won for a 162-game schedule, or slightly less for a 154-game schedule.  The higher value is for teams that win about half their games, and the lower value is for teams with very low or very high winning percentages (roughly .250 or .750).

Thus, about 68% of teams will finish within ±6 wins of their actual number of wins for the season.  About 95% will finish within ±12 wins of their actual total, and about 99.7% will finish within ±19 wins of their actual total.  This means that although a team that went 81–81 could still win or lose 100 or more games, the probability of such an event is about 3 in 1000.

One final word of caution in interpreting these results: because the win-loss records of all the teams in a league are not independent of one another, the percentages stated above should not necessarily be interpreted as a good model for single seasons (especially those with a small number of teams).  Collectively over many seasons, however, the results tend to follow the above analysis very closely.
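The standard deviation formula above is easy to check numerically.  The following is a minimal sketch; the function names are illustrative, and the winning percentage of .246 is simply a value that reproduces the lower end of the quoted range.

```python
import math


def win_sd(games: int, win_pct: float) -> float:
    """Standard deviation of wins: sqrt(games * pct * (1 - pct))."""
    return math.sqrt(games * win_pct * (1.0 - win_pct))


def prob_within(games: int, win_pct: float, margin: float) -> float:
    """Normal-approximation chance of finishing within +/- margin wins."""
    sd = win_sd(games, win_pct)
    return math.erf(margin / (sd * math.sqrt(2.0)))


sd_even = win_sd(162, 0.500)      # team that wins half its games
sd_extreme = win_sd(162, 0.246)   # team with a very low winning percentage
sd_154 = win_sd(154, 0.500)       # same .500 team on a 154-game schedule

print(f"162 games, .500 team: {sd_even:.2f}")
print(f"162 games, .246 team: {sd_extreme:.2f}")
print(f"154 games, .500 team: {sd_154:.2f}")
print(f"chance a .500 team lands within 19 wins: {prob_within(162, 0.5, 19):.4f}")
```

Multiples of the standard deviation (one, two, and three) give the ±6, ±12, and ±19 win bands quoted above, rounded to whole games.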

Volume of Play

There have been more than 130 seasons of major league baseball if we start counting from the 1876 National League season.  In that period of time, many unusual events have occurred.  That number of seasons is minuscule, though, when compared to the number of seasons that can be simulated by everyone playing Full Season Baseball.  Once a thousand seasons have been recreated, a few of those once-in-a-thousand-seasons events that have not occurred in major league baseball will likely have occurred for someone playing this game.  These events could include a team scoring 35 or 40 runs, a game lasting 40 innings, a team compiling a record of 140–22, or a team with a 100-loss season winning 100 games in the replay.  Thus, if you hear some incredible stories from your friends, don't be too surprised.  Some of them might even be true!
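The arithmetic behind this can be sketched briefly.  Assuming, for illustration, that a particular event has a 1-in-1000 chance per replayed season and that replays are independent, the chance of seeing it at least once grows quickly with the number of seasons played.

```python
def prob_at_least_once(p_per_season: float, seasons: int) -> float:
    """Chance an event with per-season probability p occurs at least once.

    Assumes independent replays (an illustrative simplification).
    """
    return 1.0 - (1.0 - p_per_season) ** seasons


p_130 = prob_at_least_once(0.001, 130)     # roughly the span of real history
p_1000 = prob_at_least_once(0.001, 1000)   # a thousand replayed seasons
p_5000 = prob_at_least_once(0.001, 5000)

for seasons, p in [(130, p_130), (1000, p_1000), (5000, p_5000)]:
    print(f"{seasons:>5} seasons: {p:.3f}")
```

Since many different kinds of rare events are possible, the chance that at least some of them turn up across a thousand replays is higher still.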

All-Time Best and Worst

Because the expected number of wins for each team in the game equals its actual number of wins, a complete recreation of the history of major league baseball is likely to produce an all-time best team that is better, and an all-time worst team that is worse, than any in true major league history.  The reason is that the twentieth century's worst team, the 1916 Philadelphia Athletics at 36–117, has a nearly 50% chance of losing more than 117 games (and an equal chance of losing fewer than 117 games, with a slight chance of losing exactly 117 games).  But teams like the 1935 Boston Braves (38–115), the 1962 New York Mets (40–120), and others also have a chance of compiling a worse winning percentage, making the probability of an all-time worst team with a winning percentage under .235 more than 50%.  The situation is similar for the best winning percentage.  The only alternative to this situation is to change the first design objective.  That would, in my opinion, be an undesirable change.  Something similar to think about: because of the actual distribution of World Series championships, the record number of World Series championships in a complete replay of the twentieth century would likely be smaller than the actual record.
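The claim about the all-time worst team can be checked with a minimal sketch.  It assumes each team's replay wins follow a single binomial distribution at its actual winning percentage and that the three seasons are independent.  Both are simplifications of how the game works, and only the three teams named above are included.

```python
import math


def prob_wins_at_most(games: int, win_pct: float, max_wins: int) -> float:
    """Exact binomial probability of winning max_wins games or fewer."""
    lose_pct = 1.0 - win_pct
    return sum(
        math.comb(games, k) * win_pct**k * lose_pct ** (games - k)
        for k in range(max_wins + 1)
    )


# Actual (wins, losses) as quoted in the text.
teams = {
    "1916 Philadelphia Athletics": (36, 117),
    "1935 Boston Braves": (38, 115),
    "1962 New York Mets": (40, 120),
}
THRESHOLD = 0.235  # winning percentage to fall below

prob_below = {}
for name, (wins, losses) in teams.items():
    games = wins + losses
    # Largest win total whose winning percentage is strictly under THRESHOLD.
    max_wins = math.ceil(THRESHOLD * games) - 1
    prob_below[name] = prob_wins_at_most(games, wins / games, max_wins)

# Chance that at least one of the three finishes under THRESHOLD.
prob_any = 1.0 - math.prod(1.0 - p for p in prob_below.values())

for name, p in prob_below.items():
    print(f"{name}: {p:.2f}")
print(f"at least one under {THRESHOLD}: {prob_any:.2f}")
```

Even with only these three teams, the combined probability exceeds 50%; adding other poor teams from the historical record would raise it further.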

Parting Thoughts

Observing pennant races as they develop in this game can also help build one's intuition for basic random processes.  Just as in real baseball, teams get on winning streaks and losing streaks.  Some teams seem to play steadily throughout the season, while others collapse or come on strong at the end.  The realism of the pennant races is nothing short of amazing!  I truly hope you enjoy Doc Dawson's Full Season Baseball.