bettingexpert Blog
How To Build A Monte Carlo Simulation
What is a Monte Carlo Simulation? How can it help you project end-of-season points totals
and finishing positions? Today on the blog Zach Slaton introduces Monte Carlo simulations
and shows us how to develop one.
By Zach Slaton
Published: 7th June 2013
Updated: 26th February 2014
This is the fourth post in Zach Slaton's series explaining how to use simple-but-effective
statistical concepts that can help provide a richer understanding of the data already at
your fingertips. The first post in the series dealt with how linear regression prediction
intervals can yield deeper insights, the second post explained how to use exponential
regression to quantify rare events like goal scoring totals, and the third post explained
how ordered logistic regression can be used to forecast individual match outcomes.
Today Zach explains how individual match outcome likelihoods can be used to simulate
the outcome of all the remaining fixtures in a season.
In my last post in this series I explained how an ordered logistic regression could be built
to explain soccer match outcomes, and even provided several examples of the types of
inputs I've included in the ordered logistic regression models I have built over time.
These models are highly useful in understanding the potential impact statistically
significant predictors may have on the likelihood of a match ending in a win, tie, or loss.
But how can those individual building blocks be assembled to form a comprehensive
forecast for how all of the teams in a league may sit relative to each other over the next
week, next month, or at the end of the season? There is an enormous (though finite)
number of point combinations that could be realised, given that there are 380 matches in
a 20-team league's season, each match could end in a loss, tie, or win for each team, and
no match has the odds of each outcome evenly split into thirds. How can an analyst
make sense of such a range of possible outcomes?
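To put a number on that range: from the home side's point of view each match has three
possible results, so a 380-match season has 3^380 possible result sequences. A quick check
in Python (used here purely for illustration; the workflow in this post is Excel-based):

```python
import math

MATCHES = 380           # fixtures in a 20-team double round-robin season
OUTCOMES_PER_MATCH = 3  # home win, draw, away win

# Exact number of distinct win/draw/loss sequences for a whole season
combinations = OUTCOMES_PER_MATCH ** MATCHES

# Order of magnitude via logarithms, rather than printing every digit
magnitude = MATCHES * math.log10(OUTCOMES_PER_MATCH)

print(f"{len(str(combinations))} digits, roughly 10^{magnitude:.1f}")
```

That is a 182-digit number, which is why sampling seasons at random, rather than
enumerating every combination, is the practical route.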
Introducing Monte Carlo Simulation
One answer to this complexity is Monte Carlo simulation. As the name implies, Monte
Carlo simulation is essentially a model of chance. Wikipedia describes it as:
"a broad class of computational algorithms that rely on a repeated random sampling to
obtain numerical results, i.e. by running simulations many times over in order to calculate
those same probabilities heuristically just like actually playing and recording your results
in a real casino situation [...] Monte Carlo methods are mainly used for three distinct
problems: optimisation, numerical integration, and generation of samples from a
probability distribution."
The repeated random simulations of individual inputs can thus project the likelihood of
an aggregate outcome if one has the probability of outcome(s) for each event. Such an
approach may sound intimidating, but a solution can be found in the much-maligned-but-
infinitely-useful Microsoft Excel.
Simulating Individual Match Results
To start, assume that the analyst interested in the aggregate outcome has created a
model in their statistical tool of choice. In this case, it's a model that projects the
likelihood of winning, tying (drawing), or losing a match. The model is applied to each
match in a league season, in this case Major League Soccer in the United States.
The first order of business is to create a random outcome for each match. The method
used in this example is Excel's RAND function, which returns a random number between
0 and 1. The output of the RAND function is then compared to the match outcome
probabilities using the following logic:
r = RAND()   (one draw per match, kept in its own cell)
IF r < Probability of Loss
    THEN match outcome = loss
ELSE IF r < (Probability of Loss + Probability of Tie/Draw)
    THEN match outcome = tie/draw
ELSE match outcome = win
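For readers who prefer code to spreadsheets, here is a minimal Python sketch of the same
comparison. It assumes the match model supplies the probabilities of a loss and a draw (the
win probability is whatever remains); `simulate_match` is a hypothetical helper name, and
Python's `random.random()` stands in for RAND, since both return a uniform value in [0, 1).

```python
import random

def simulate_match(p_loss: float, p_draw: float, rng: random.Random) -> str:
    """Return 'loss', 'draw' or 'win' for one match, mirroring the RAND logic.

    p_loss and p_draw come from the match model (e.g. an ordered logistic
    regression); the win probability is implied as 1 - p_loss - p_draw.
    """
    if not (0 <= p_loss and 0 <= p_draw and p_loss + p_draw <= 1):
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    r = rng.random()  # a single uniform draw in [0, 1), like one RAND() cell
    if r < p_loss:
        return "loss"
    if r < p_loss + p_draw:
        return "draw"
    return "win"

rng = random.Random(42)  # fixed seed so the run is reproducible
results = [simulate_match(0.25, 0.30, rng) for _ in range(100_000)]
print({o: results.count(o) / len(results) for o in ("loss", "draw", "win")})
```

The important design detail is that one random value feeds both comparisons. Writing RAND()
twice inside one Excel formula would produce two different draws and skew the outcome
frequencies, so the draw is best kept in its own cell.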
A screenshot of just such a setup is provided below.
Now that the analyst has a random outcome assigned to every match in a season, how
should they go about creating a Monte Carlo simulation and how many random
simulations of the season should they run?
Last things first: the answer is that it depends. For a typical season most analysts run
10,000 simulations. This number is often found to strike the right balance between
simulation run time (a couple of hours) and model resolution, given the number of
interactions arising from the individual matches.
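One way to make "model resolution" concrete (this framing is an addition, not taken from
the post): a finishing-position probability estimated from N independent simulated seasons
has a standard error of sqrt(p(1 - p)/N). The Python sketch below evaluates that formula for a
few run counts, using p = 0.5 as the worst case; `mc_standard_error` is a hypothetical name.

```python
import math

def mc_standard_error(p: float, n_sims: int) -> float:
    """Standard error of a probability estimated from n_sims independent simulated seasons."""
    return math.sqrt(p * (1 - p) / n_sims)

for n in (1_000, 10_000, 100_000):
    # p = 0.5 maximises p * (1 - p), so this is the widest the error gets
    print(n, round(mc_standard_error(0.5, n), 4))
```

At 10,000 runs the worst-case standard error is 0.005, about half a percentage point; ten
times as many runs would only shrink it by a factor of about 3.2 (the square root of 10).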
Utilising Pivot Tables to Roll Up Match Results
Now first things last: Microsoft Excel offers a solution for running those 10,000
simulations. Pivot table functionality within Excel is the perfect way to roll up the results
from the individual matches into point totals, goal differential, and win/draw/loss outcome
counts. These totals are achieved by creating pivot tables with team/club on the rows
and either match outcome or points on the columns. In either case, the values within the
pivot table are the sums of either match outcomes or points. See the example below.
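Outside Excel, the same roll-up is a group-and-sum. The Python sketch below assumes each
simulated match is recorded as a hypothetical (home, away, home_outcome) tuple and awards
3/1/0 points for a win/draw/loss; goal differential is left out because a win/draw/loss model
alone does not generate goals. The team names are placeholders.

```python
from collections import defaultdict

POINTS = {"win": 3, "draw": 1, "loss": 0}
FLIP = {"win": "loss", "draw": "draw", "loss": "win"}  # away side's view of a home result

def roll_up(results):
    """Sum points and W/D/L counts per team, like a pivot table with teams on the rows.

    `results` is a list of (home, away, home_outcome) tuples, where
    home_outcome is 'win', 'draw' or 'loss' from the home side's view.
    """
    table = defaultdict(lambda: {"win": 0, "draw": 0, "loss": 0, "points": 0})
    for home, away, outcome in results:
        for team, team_outcome in ((home, outcome), (away, FLIP[outcome])):
            table[team][team_outcome] += 1
            table[team]["points"] += POINTS[team_outcome]
    return dict(table)

season = [("Seattle", "Portland", "win"), ("Portland", "LA", "draw"), ("LA", "Seattle", "loss")]
print(roll_up(season))
```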
The other benefit of using a pivot table is that refreshing it counts as a calculation within
Excel, and the RAND function recalculates each time there is a calculation anywhere in an
Excel workbook. This means that 10,000 simulated seasons can be created with the
RAND function, a few linked pivot tables, and fewer than twenty lines of Visual Basic
code of the kind that could be learned in a first-level computer science course: do/while
loops of copy/paste commands that capture the projected table of each simulated season.
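The do/while loop can be sketched in Python as well. Rather than copying and pasting a
recalculated table, each pass simply plays every remaining fixture once and stores the
resulting points table. The fixture format (home, away, probability of a home loss,
probability of a draw), the function names, and the team labels are all hypothetical.

```python
import random

POINTS = {"win": 3, "draw": 1, "loss": 0}
FLIP = {"win": "loss", "draw": "draw", "loss": "win"}

def simulate_season(fixtures, rng):
    """Play every remaining fixture once and return {team: simulated points}.

    Each fixture is (home, away, p_home_loss, p_draw).
    """
    points = {}
    for home, away, p_loss, p_draw in fixtures:
        r = rng.random()  # one uniform draw per match
        outcome = "loss" if r < p_loss else "draw" if r < p_loss + p_draw else "win"
        for team, team_outcome in ((home, outcome), (away, FLIP[outcome])):
            points[team] = points.get(team, 0) + POINTS[team_outcome]
    return points

def run_simulations(fixtures, n_runs=10_000, seed=0):
    """Stand-in for the VBA copy/paste loop: one stored points table per run."""
    rng = random.Random(seed)
    return [simulate_season(fixtures, rng) for _ in range(n_runs)]

fixtures = [("A", "B", 0.3, 0.3), ("B", "C", 0.4, 0.25), ("C", "A", 0.35, 0.3)]
runs = run_simulations(fixtures)
print(len(runs), runs[0])
```

Passing a fixed seed makes a batch of runs reproducible, which is handy when checking results.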
Doing so should produce results that look like this:
The 10,000 simulations of the remaining fixtures must now be added to the point totals,
match outcomes, and goal differential to date. This can be done with Excel's VLOOKUP
function referencing another pivot table built from the results to date, and adding the
returned value to the value for the same attribute in the projected results. Auto-filling the
columns with VLOOKUP formulas provides projected values for all of the variables,
and all that's left to do is sort the results by run, then by point total, then by the league's
tiebreakers.
Doing this sort ensures data stays within the respective run in which it was generated,
and it provides projected table positions within each season.
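A sketch of the combine-and-sort step for a single run follows. It assumes both the results
to date and the simulated results are stored as hypothetical team -> (points, wins) mappings.
The tiebreakers used (points, then wins, then a random draw) are placeholders; a real
implementation would apply the league's own rules, such as goal difference.

```python
import random

def rank_run(current, simulated, rng):
    """Combine results to date with one simulated run; return teams best-first.

    `current` and `simulated` map team -> (points, wins). Teams are ordered by
    points, then wins, then a random draw. These tiebreakers are placeholders
    for the league's real rules, such as goal difference.
    """
    totals = {}
    for team, (points, wins) in current.items():
        sim_points, sim_wins = simulated.get(team, (0, 0))
        totals[team] = (points + sim_points, wins + sim_wins)
    # The random third key is drawn once per team and only matters for exact ties
    return sorted(totals, key=lambda t: (*totals[t], rng.random()), reverse=True)

current = {"A": (10, 3), "B": (9, 2), "C": (8, 2)}
simulated_runs = [
    {"A": (0, 0), "B": (3, 1), "C": (1, 0)},
    {"A": (3, 1), "B": (0, 0), "C": (3, 1)},
]
rng = random.Random(1)
finishing_orders = [rank_run(current, run, rng) for run in simulated_runs]
print(finishing_orders)
```

Ranking each run separately is the code equivalent of sorting by run first: positions are
only ever compared within the season that produced them.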
All that's left to generate is the likelihood of each team's finishing position, and another
pivot table of table position versus team can do this. In this case the pivot table plots
teams on the rows and table position in the columns and values. The pivot table's values
will need to be changed to a count rather than a sum (the model is measuring how many
times a team is projected to finish in a given table position), and the "Show data as" field
should be set to "% of row".
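The count and "% of row" settings boil down to simple frequencies. In the Python sketch
below, `finishing_orders` is a hypothetical list holding one ranked list of teams per simulated
run; because every team appears exactly once per run, dividing each count by the number of
runs gives the same row percentages as the pivot table.

```python
from collections import Counter

def position_probabilities(finishing_orders):
    """Share of runs in which each team finished in each table position.

    Mirrors a pivot table with teams on the rows, positions on the columns,
    values counted and shown as "% of row".
    """
    counts = {}
    for order in finishing_orders:
        for position, team in enumerate(order, start=1):
            counts.setdefault(team, Counter())[position] += 1
    n_runs = len(finishing_orders)
    return {
        team: {pos: c / n_runs for pos, c in sorted(counter.items())}
        for team, counter in counts.items()
    }

finishing_orders = [["B", "A", "C"], ["A", "C", "B"], ["A", "B", "C"], ["B", "A", "C"]]
for team, row in sorted(position_probabilities(finishing_orders).items()):
    print(team, row)
```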
The resultant pivot table should look like this:
That's it. That is all that is required to build a Monte Carlo simulation. Users of the
simulation can now update its inputs (matches played versus upcoming fixtures) as
frequently as they like, run "what if" studies for the next week's matches, and produce any
other variety of forecast. The process can become highly automated and take less than 10
minutes a week to update if special attention is paid to the Excel workbook's
construction. Even the process of combining prior matches and future fixtures with
VLOOKUP and sort functions can be automated by someone with only the most basic
programming skills, via Excel's macro recorder.
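A "what if" study amounts to fixing one fixture's result before re-running the simulation.
One way to express that, under the same threshold logic as the RAND comparison, is to
overwrite the match's probabilities with values that can only produce the chosen outcome.
The fixture format (home, away, probability of a home loss, probability of a draw) and the
function names below are hypothetical.

```python
import random

# (p_home_loss, p_draw) pairs that force one outcome under the threshold logic:
# a uniform draw r in [0, 1) is always < 1.0 and never < 0.0.
FORCED = {"loss": (1.0, 0.0), "draw": (0.0, 1.0), "win": (0.0, 0.0)}

def force_result(fixtures, home, away, outcome):
    """Copy of fixtures with one match fixed to `outcome` (from the home side's view)."""
    return [
        (h, a, *FORCED[outcome]) if (h, a) == (home, away) else (h, a, p_loss, p_draw)
        for h, a, p_loss, p_draw in fixtures
    ]

def play(p_loss, p_draw, rng):
    """Simulate one match from the home side's view."""
    r = rng.random()
    return "loss" if r < p_loss else "draw" if r < p_loss + p_draw else "win"

fixtures = [("A", "B", 0.3, 0.3), ("B", "C", 0.4, 0.25)]
what_if = force_result(fixtures, "A", "B", "draw")
rng = random.Random(7)
print(what_if[0], {play(*what_if[0][2:], rng) for _ in range(1000)})
```

The rest of the pipeline stays unchanged: the modified fixture list is simulated, combined
with results to date, and ranked exactly as before.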
Applications of Monte Carlo Simulation
Here are some examples of how this very basic approach can be utilised in competition
forecasting.
Transfer Price Index Simulations of the English Premier League Season
Transfer Price Index's mSq model, which utilises venue and relative squad costs as
inputs, was used to forecast the most likely final table positions of each club on a weekly
basis. This model quantified individual match outcomes' impacts on each team's likely
finishing position (it wasn't just Manchester United's win over City in October that swung
the title their way), as well as just how much of an advantage a club might have
surrendered along the way (see Tottenham's 80%+ likelihood of a top-four finish after
beating Arsenal in early March, and how much it fell away over the final two-and-a-half
months of the season).
MLS Eastwood Index
Blogger Martin Eastwood created the Eastwood Index as a way to know where teams
stand relative to each other, how results against clubs of various levels of quality
impact a team's rating, and how the ratings difference between two clubs can help
predict future match outcomes.
This model has been applied to MLS, and Monte Carlo simulations have been used
to quantify things like the impact the Seattle Sounders' poor start had on the danger (or
lack thereof) of their missing the league playoffs.
CONCACAF World Cup Qualification
Finally, Monte Carlo simulations can even be used to run a post-mortem "what if" using
others' forecasts of match outcomes after the matches are completed. One such source of
match forecasts is bookmaker odds. Bookmakers are looking to maximise their
profit, so they often don't forecast more than one match in advance, or only a few
matches in advance if the schedule is compact. As an example, Monte Carlo methods
have been paired with bookmaker odds to help analyse the likelihood of current point
totals within CONCACAF's final round of World Cup qualifying.
While everyone knows Mexico has struggled from match to match, it turns out that
bookmakers only foresaw Mexico's current total of three points or fewer in 10% of the
aggregate outcomes contained in their forecasts. Meanwhile, the United States' four points
put them squarely within bookmaker expectations.
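A rough Python sketch of pairing bookmaker odds with Monte Carlo simulation follows. It
assumes decimal odds listed as (win, draw, loss) for one team, converts them to probabilities
by taking 1/odds and rescaling so the three values sum to 1 (a simple proportional way of
removing the bookmaker's margin, not necessarily the method behind the figures above), and
estimates how often the team ends on a given points total or fewer. The odds and function
names are invented for illustration.

```python
import random

def implied_probabilities(decimal_odds):
    """Convert (win, draw, loss) decimal odds to probabilities that sum to 1.

    The raw 1/odds values add up to more than 1 (the bookmaker's margin), so
    they are rescaled proportionally; other margin-removal methods exist.
    """
    raw = [1 / o for o in decimal_odds]
    total = sum(raw)
    return [r / total for r in raw]

def prob_points_at_most(odds_per_match, threshold, n_runs=10_000, seed=0):
    """Monte Carlo estimate of P(total points <= threshold) over the given matches."""
    rng = random.Random(seed)
    probs = [implied_probabilities(o) for o in odds_per_match]
    hits = 0
    for _ in range(n_runs):
        points = 0
        for p_win, p_draw, _p_loss in probs:
            r = rng.random()
            points += 3 if r < p_win else 1 if r < p_win + p_draw else 0
        if points <= threshold:
            hits += 1
    return hits / n_runs

# Hypothetical odds for four qualifiers, from one team's point of view
odds = [(1.5, 4.0, 7.0), (2.2, 3.2, 3.4), (1.4, 4.5, 8.0), (2.6, 3.1, 2.8)]
print(prob_points_at_most(odds, threshold=3))
```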
Conclusion
Using Monte Carlo simulation methods allows analysts to properly measure and model
discrete events like soccer matches, and then roll the results of those discrete events up
to a bigger forecast over a season or more.
More importantly, Monte Carlo simulation methods provide a probabilistic outlook to
such forecasts, allowing the analyst to express their level of statistical certainty (or
uncertainty) in the forecast. This is key to thinking in a noisy, uncertain sport like soccer,
and as this post has attempted to explain, it's not too complex an analysis to set up. All
that's needed is a probabilistic model, a tool like Microsoft Excel for storing results, and a
bare minimum of programming capability.