
Statistical Commentary

Reading
this information is not necessary for playing and enjoying Doc Dawson's Full
Season Baseball. It is provided for those
individuals who have an interest in the statistical properties of the
game. Although some of the material is
technical, other portions are more easily understood.

Design

I designed Full Season Baseball to accomplish the following objectives:

Additional Analysis

Variation

One of the
things that makes Full Season Baseball fun to play is the variation
associated with random events such as rolling dice. Although the team with the best record has
the best chance of winning its league or division if the same schedule is
replayed, it is not guaranteed to do so.
In a tight pennant race, the second-place team may have a pretty good
chance of winning as well. However, if
a team won its division by, say, 15 games, it is very unlikely that some
other team will win the replay. More will be said
on this later.

Strength of Schedule

Notice that in the first item listed under Design, the schedule played is important. Full Season Baseball can tell the
difference between teams with the same record that have different strengths
of schedule. A team with an 82–80
record in a strong division will have a better card than a team with an 82–80
record in a weak division. If the
usual schedule is replayed, though, each team's expected number of wins is
still 82.

Distributions

In Full
Season Baseball, the number of games won in a season by a particular
team is actually the sum of many different binomial variables with
various probabilities of success.
However, in this case its distribution is approximately binomial, with
probability of success on a single trial equal to the team's winning
percentage for that season (assuming replay of the same schedule). This, in turn, can be approximated by a
normal distribution with mean equal to the number of games that team won in
the season being replayed and standard deviation equal to the square root of
the quantity (the number of games played) × (the team's winning percentage) ×
(1 minus the team's winning percentage).
This works out to a standard deviation of about 5.48 to 6.36 games won
for a 162-game schedule, or slightly less for a 154-game schedule. The higher value is for teams that win
about half their games, and the lower value is for teams that have very low or
very high winning percentages. Thus,
about 68% of the teams will finish within +/-6 wins of their actual number of
wins for the season. About 95% will
finish within +/-12 wins of their actual total, and about 99.7% will finish
within +/-19 wins of their actual total.
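
The standard deviation formula and the normal approximation described above can be checked with a short calculation. This is only a sketch under the simple model in the text, not the game's own engine: the function names `win_sd` and `share_within` are made up for this example, and the 81-81 team over 162 games is an assumed test case.

```python
from math import sqrt
from statistics import NormalDist


def win_sd(games: int, win_pct: float) -> float:
    """Standard deviation of wins: sqrt(games * pct * (1 - pct))."""
    return sqrt(games * win_pct * (1.0 - win_pct))


def share_within(games: int, wins: int, margin: int) -> float:
    """Normal-approximation share of replays finishing within +/- margin wins."""
    dist = NormalDist(mu=wins, sigma=win_sd(games, wins / games))
    return dist.cdf(wins + margin) - dist.cdf(wins - margin)


# Illustrative case (an assumption for this sketch): an 81-81 team, 162 games.
sd_even = win_sd(162, 0.5)  # about 6.36
shares = {m: share_within(162, 81, m) for m in (6, 12, 19)}

# Two-sided chance of finishing 19 or more wins away from 81
# (100+ wins or 100+ losses), with no continuity correction.
p_extreme = 1.0 - shares[19]

print(f"sd = {sd_even:.2f}")
for margin, share in shares.items():
    print(f"within +/-{margin} wins: {share:.3f}")
print(f"100+ wins or losses: {p_extreme:.4f}")
```

Because the margins of 6, 12, and 19 wins are rounded to whole games, the computed shares come out close to, but not exactly at, the familiar 68%, 95%, and 99.7% figures.
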
Under this approximation, a team that went 81-81 could still win or
lose 100 or more games, but the probability of such an event is about 3 in
1000. One final word of caution in
interpreting these results: because the win-loss records of all the teams in
a league are not independent of one another, the percentages stated above should
not necessarily be interpreted as a good model for single seasons (especially
those with a small number of teams).
Collectively over many seasons, however, the results tend to follow the above
analysis very closely.

Volume of Play

There have
been more than 130 seasons of major league baseball if we start counting from
the 1876 National League season. In
that period of time, many unusual events have occurred. That number of seasons is minuscule, though, when compared to
the number of seasons that can be simulated by everyone playing Full Season
Baseball. Once a thousand seasons have
been recreated, a few once-in-a-thousand-seasons events that
have not occurred in major league baseball will likely have occurred
for someone playing this game. These events
could include a team scoring 35 or 40 runs, a game lasting 40 innings, a team
compiling a record of 140-22, or a team with a 100-loss season winning 100
games in the replay. Thus, if you hear
some incredible stories from your friends, don't be too surprised. Some of them might even be true!

All-Time Best and Worst

Because the expected number of wins for
each team is the same as the number it won in the actual season, a complete
recreation of the history of major league baseball is likely to produce an
all-time best team and an all-time worst team that are better and worse, respectively, than any in true
major league history.
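
As a rough check of this claim for the worst team, the sketch below uses the three poor records cited in this section (36–117, 38–115, and 40–120). It rests on assumptions that are not the game's actual mechanics: each replayed season is treated as a plain binomial with the team's actual winning percentage, the three replays are treated as independent, and the helper name `prob_pct_below` is invented for the example.

```python
from math import comb


def prob_pct_below(wins: int, losses: int, ref_wins: int, ref_games: int) -> float:
    """P(replayed winning pct < ref_wins / ref_games) under a binomial model."""
    n = wins + losses
    p = wins / n
    total = 0.0
    for k in range(n + 1):
        # k / n < ref_wins / ref_games, compared in integers to avoid rounding.
        if k * ref_games < ref_wins * n:
            total += comb(n, k) * p**k * (1.0 - p) ** (n - k)
    return total


# Actual records cited in the text; the benchmark is 36 wins in 153 games (.235).
records = {
    "1916 Philadelphia Athletics": (36, 117),
    "1935 Boston Braves": (38, 115),
    "1962 New York Mets": (40, 120),
}
probs = {team: prob_pct_below(w, l, 36, 153) for team, (w, l) in records.items()}

# Chance that at least one of the three replays falls below .235,
# assuming the replays are independent of one another.
p_none = 1.0
for prob in probs.values():
    p_none *= 1.0 - prob
p_any = 1.0 - p_none

for team, prob in probs.items():
    print(f"{team}: {prob:.3f}")
print(f"at least one below .235: {p_any:.3f}")
```

Under these assumptions the Athletics alone fall short of their actual mark a little less than half the time, and adding the other two teams pushes the combined chance above one half.
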
The reason is that the twentieth century's worst team, the 1916 Philadelphia
Athletics at 36–117, has a nearly 50% chance of losing more than 117 games
(and a similar chance of losing fewer than 117 games, with a slight chance of
losing exactly 117 games). But teams
like the 1935 Boston Braves (38–115), the 1962 New York Mets (40–120), and
others also have a chance of compiling a worse winning percentage, making the
probability of an all-time worst team with winning percentage under .235 more
than 50%. The situation is similar for
the best winning percentage. The only
alternative to this situation is to change the first design objective. That would, in my opinion, be an
undesirable change. Something similar
to think about: because of the actual distribution of World Series championships,
the record number of World Series championships in a complete replay of
the twentieth century would likely be smaller than the actual record.

Parting Thoughts

Observing pennant races as they develop in this game can also help build one's intuition about basic random processes. Just as in real baseball, teams get on winning streaks and losing streaks. Some teams seem to play steadily throughout the season, while others collapse or come on strong at the end. The realism of the pennant races is nothing short of amazing! I truly hope you enjoy Doc Dawson's Full Season Baseball.