Monte Carlo Simulation Soccer

  • bettingexpert Blog

    How To Build A Monte Carlo Simulation

    What is a Monte Carlo Simulation? How can it help you project

    end of season points totals and finishing positions? Today on the

    blog Zach Slaton introduces Monte Carlo simulations and shows

    us how to develop one.

    By Zach Slaton

    Published: 7th June 2013

    Updated: 26th February 2014

  • This is the fourth post in Zach Slaton's series explaining how to use simple-but-effective

    statistical concepts that can help provide a richer understanding of the data already at

    your fingertips. The first post in the series dealt with how linear regression prediction

    intervals can yield deeper insights, the second post explained how to use exponential

    regression to quantify rare events like goal scoring totals, and the third post explained

    how ordered logistic regression can be used to forecast individual match outcomes.

    Today Zach explains how individual match outcome likelihoods can be used to simulate

    the outcome of all the remaining fixtures in a season.

    In my last post in this series I explained how an ordered logistic regression could be built

    to explain soccer match outcomes, and even provided several examples of the types of

    inputs I've included in the ordered logistic regression models I have built over time.

    These models are highly useful in understanding the potential impact statistically

    significant predictors may have on the likelihood of a match ending in a win, tie, or loss.

    But how can those individual building blocks be assembled to form a comprehensive

    forecast for how all of the teams in a league may sit relative to each other over the next

    week, next month, or at the end of the season? There appears to be a nearly infinite

    number of point combinations that could be realised given there are 380 matches in a

    20-team league's season, each match could end in a loss, tie, or win for each team, and

    no match has the odds of each outcome evenly split into thirds. How can an analyst

    make sense of such a range of possible outcomes?
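
    To put a rough number on that range, the short Python sketch below counts the distinct
    sequences of match results in such a season. It assumes a double round robin (20 teams,
    each meeting 19 opponents home and away, which gives the 380 matches mentioned above)
    and three possible results per match. The variable names are illustrative only.

```python
# Each of the 380 matches has three possible results, so a season
# has 3**380 distinct sequences of results.
MATCHES = 380            # 20 teams x 19 opponents, home and away
OUTCOMES_PER_MATCH = 3   # win, draw, or loss from the home side's view

season_paths = OUTCOMES_PER_MATCH ** MATCHES
digits = len(str(season_paths))
print(f"3^{MATCHES} has {digits} digits")
```

    The count is finite but far too large to enumerate, which is why sampling is the
    practical route.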

    Introducing Monte Carlo Simulation

    One answer to this complexity is Monte Carlo simulation. As the name implies, Monte

    Carlo simulation is essentially a model of chance. Wikipedia describes it as:

    "a broad class of computational algorithms that rely on repeated random sampling to

    obtain numerical results, i.e. by running simulations many times over in order to calculate

    those same probabilities heuristically just like actually playing and recording your results

    in a real casino situation ... Monte Carlo methods are mainly used for three distinct

    problems: optimisation, numerical integration, and generation of samples from a

    probability distribution."

    The repeated random simulations of individual inputs can thus project the likelihood of

    an aggregate outcome if one has the probability of outcome(s) for each event. Such an

    approach may sound intimidating, but a solution can be found in the much-maligned-but-

    infinitely-useful Microsoft Excel.

    Simulating Individual Match Results

    To start, assume that the analyst interested in the aggregate outcome has created a

    model in their statistical tool of choice. In this case, it's a model that projects the

    likelihood of winning, tying (drawing), or losing a match. The model is applied to each

    match in a league season, in this case Major League Soccer in the United States.

    The first order of business is to create a random outcome for each match, and the

    method used within this example is Excel's RAND function, which creates a random number

    between 0 and 1. The output of the RAND function is then compared to the match

    outcomes using the following logic:

    IF RAND < Probability of Loss

    THEN match outcome = loss

    ELSE

    IF RAND < (Probability of Loss + Probability of Tie/Draw)

    THEN match outcome = tie/draw

    ELSE match outcome = win

    A screenshot of just such a setup is provided below.
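
    For readers working outside a spreadsheet, the same comparison logic can be sketched in
    Python. This is a minimal illustration rather than the workbook itself: the function name
    simulate_match and the probabilities 0.25 and 0.30 are hypothetical, and Python's
    random.random() stands in for Excel's RAND.

```python
import random

def simulate_match(p_loss, p_draw, rng=random):
    """Return 'loss', 'draw' or 'win' for one team, given its outcome probabilities.

    The win probability is implied as 1 - p_loss - p_draw.
    """
    r = rng.random()              # uniform draw in [0, 1), standing in for RAND
    if r < p_loss:
        return "loss"
    if r < p_loss + p_draw:
        return "draw"
    return "win"

# Hypothetical probabilities for a single match (not taken from any real model).
rng = random.Random(42)
sample = [simulate_match(0.25, 0.30, rng) for _ in range(10_000)]
share = {k: sample.count(k) / len(sample) for k in ("loss", "draw", "win")}
print(share)
```

    Over many draws, the share of each outcome should settle close to the probabilities fed
    in, which is a quick sanity check on the logic.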

  • Now that the analyst has a random outcome assigned to every match in a season, how

    should they go about creating a Monte Carlo simulation and how many random

    simulations of the season should they run?

    Last things first: the answer is that it depends. For a typical season most analysts run

    10,000 simulations. This number is often found to offer the proper balance between

    a simulation duration of a couple of hours and model resolution given the number of

    interactions due to each individual match.
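
    One way to think about "resolution" is the sampling error of an estimated probability.
    The sketch below uses the standard binomial standard error, sqrt(p(1 - p) / N), which
    assumes the simulated seasons are independent. It is offered as a rule of thumb, not as
    the reasoning behind the 10,000 figure, and the function name is illustrative.

```python
import math

def standard_error(p, n_runs):
    """Standard error of a probability estimated from n_runs independent simulations."""
    return math.sqrt(p * (1.0 - p) / n_runs)

for n_runs in (100, 1_000, 10_000, 100_000):
    # p = 0.5 is the worst case because it maximises p * (1 - p).
    print(n_runs, round(standard_error(0.5, n_runs), 4))
```

    At 10,000 runs the worst-case standard error is 0.005, or half a percentage point, and
    each tenfold increase in runs shrinks it only by a factor of about 3.2.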

    Utilising Pivot Tables to Roll Up Match

    Results

    Now first things last: Microsoft Excel offers a solution for running those 10,000

    simulations. Pivot table functionality within Excel is the perfect way to roll up the results

    from the individual matches into point totals, goal differential, and win/draw/loss outcome

    counts. These totals are achieved by creating pivot tables with team/club on the rows

    and either match outcome or points on the columns. In either case, the values within the

    pivot table are the sums of either match outcome or points. See the example below.
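
    The roll-up a pivot table performs can be approximated with a plain dictionary. In the
    sketch below, the team names and results are made up, the 3/1/0 points scheme is assumed,
    and goal differential is left out because this example simulates outcomes only.

```python
from collections import defaultdict

POINTS = {"win": 3, "draw": 1, "loss": 0}   # assumed points scheme

# Hypothetical simulated results: (team, outcome from that team's point of view).
results = [
    ("Seattle", "win"), ("Portland", "loss"),
    ("Seattle", "draw"), ("Dallas", "draw"),
    ("Portland", "win"), ("Dallas", "loss"),
]

def roll_up(results):
    """Sum outcome counts and points per team, like a pivot table with teams on the rows."""
    table = defaultdict(lambda: {"win": 0, "draw": 0, "loss": 0, "points": 0})
    for team, outcome in results:
        table[team][outcome] += 1
        table[team]["points"] += POINTS[outcome]
    return dict(table)

table = roll_up(results)
for team, row in sorted(table.items()):
    print(team, row)
```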

    The other benefit of using a pivot table is that refreshing it is a calculation within Excel,

    and the RAND function re-calculates each time there is a calculation elsewhere in an

    Excel workbook. This means that 10,000 simulated seasons can be created with the

    RAND function, a few linked pivot tables, and fewer than twenty lines of Visual Basic

    code of the kind that could be learned in a first-level computer science course: do/while

    loops of copy/paste commands that capture the projected table of each simulated season.
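
    The loop described above can also be sketched in Python instead of Visual Basic. The
    fixtures, probabilities, and function names here are hypothetical. Each run re-draws every
    remaining fixture and appends one row per team, playing the role of the copy/paste step.

```python
import random
from collections import defaultdict

POINTS = {"win": 3, "draw": 1, "loss": 0}             # assumed points scheme
FLIP = {"win": "loss", "draw": "draw", "loss": "win"}  # away side's view of a result

# Hypothetical remaining fixtures: (home, away, P(home loss), P(draw)).
fixtures = [
    ("Seattle", "Portland", 0.25, 0.30),
    ("Dallas", "Seattle", 0.30, 0.30),
    ("Portland", "Dallas", 0.20, 0.25),
]

def simulate_season(fixtures, rng):
    """Simulate every remaining fixture once and return projected points per team."""
    points = defaultdict(int)
    for home, away, p_loss, p_draw in fixtures:
        r = rng.random()
        outcome = "loss" if r < p_loss else "draw" if r < p_loss + p_draw else "win"
        points[home] += POINTS[outcome]
        points[away] += POINTS[FLIP[outcome]]
    return dict(points)

def run_simulations(fixtures, n_runs, seed=0):
    """Return one row per (run, team), mirroring the tables pasted run after run."""
    rng = random.Random(seed)
    rows = []
    for run in range(n_runs):
        for team, pts in simulate_season(fixtures, rng).items():
            rows.append({"run": run, "team": team, "points": pts})
    return rows

rows = run_simulations(fixtures, n_runs=10_000)
print(len(rows))
```

    Keeping the run number on every row is what later allows results to be sorted and ranked
    within the season that generated them.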

    Doing so should produce results that look like this:

    The 10,000 simulations of the remaining fixtures now must be added to the point totals,

    match outcomes, and goal differential to date. This can be done via Excel's VLOOKUP

    command referencing another pivot table built using the results to date, and adding the

    returned value to the value for the same attribute in the projected results. Auto-filling the

    columns with VLOOKUP commands provides projected values for all of the variables,

    and all that's left to do is sort the results by run, then point total, then by the league's

    tiebreakers.

    Doing this sort ensures data stays within the respective run in which it was generated,

    and it provides projected table positions within each season.
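
    A rough Python equivalent of the lookup-and-sort step follows. All figures are invented,
    and goal differential to date is used as the only tiebreaker purely for illustration; a
    real league's tiebreaker rules would need to be substituted.

```python
# Hypothetical results to date and projected points from two simulated runs.
points_to_date = {"Seattle": 20, "Portland": 22, "Dallas": 22}
goal_diff_to_date = {"Seattle": 5, "Portland": 3, "Dallas": 7}
projected = [
    {"run": 0, "team": "Seattle", "points": 6},
    {"run": 0, "team": "Portland", "points": 1},
    {"run": 0, "team": "Dallas", "points": 1},
    {"run": 1, "team": "Seattle", "points": 0},
    {"run": 1, "team": "Portland", "points": 4},
    {"run": 1, "team": "Dallas", "points": 3},
]

def final_tables(projected, points_to_date, goal_diff_to_date):
    """Add results to date to each projected row, then rank teams within each run."""
    rows = [
        {
            "run": row["run"],
            "team": row["team"],
            "points": row["points"] + points_to_date[row["team"]],
            "goal_diff": goal_diff_to_date[row["team"]],
        }
        for row in projected
    ]
    # Sort by run, then points (descending), then the assumed tiebreaker (descending).
    rows.sort(key=lambda r: (r["run"], -r["points"], -r["goal_diff"]))
    position, current_run = 0, None
    for row in rows:
        position = 1 if row["run"] != current_run else position + 1
        current_run = row["run"]
        row["position"] = position
    return rows

ranked = final_tables(projected, points_to_date, goal_diff_to_date)
for row in ranked:
    print(row)
```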

  • All that's left to generate is the likelihood of each team's finish position, and another pivot

    table of table position versus team can do this. In this case the pivot table plots teams on

    the rows and table position in the columns and values. The pivot table's values will need

    to be changed to a count rather than a sum (the model is measuring how many times a

    team is projected to finish in a table position), and the "Show data as:" field should be

    marked as "% of row".
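
    The count-then-percentage-of-row calculation can be sketched as follows. The ranked rows
    are made up and cover only four runs so the arithmetic is easy to follow by hand; the
    function name is illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical ranked rows from four simulated runs: (team, finishing position).
ranked = [
    ("Seattle", 1), ("Dallas", 2), ("Portland", 3),
    ("Portland", 1), ("Dallas", 2), ("Seattle", 3),
    ("Seattle", 1), ("Portland", 2), ("Dallas", 3),
    ("Dallas", 1), ("Seattle", 2), ("Portland", 3),
]

def position_likelihoods(ranked):
    """Count finishes per (team, position), then express each count as a share of its row."""
    counts = defaultdict(Counter)
    for team, position in ranked:
        counts[team][position] += 1
    return {
        team: {pos: n / sum(row.values()) for pos, n in sorted(row.items())}
        for team, row in counts.items()
    }

likelihoods = position_likelihoods(ranked)
for team, row in sorted(likelihoods.items()):
    print(team, row)
```

    Each team's row sums to 1, matching what the "% of row" setting produces in the pivot
    table.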

    The resultant pivot table should look like this:

  • That's it. That is all that is required to build a Monte Carlo simulation. Users of the

    simulation can now update its inputs (matches played versus upcoming fixtures) as

    frequently as they like, run "what if" studies for the next week's matches, and produce any

    other variety of forecast. The process can become highly automated and take less than 10

    minutes a week to update if special attention is paid to the Excel workbook's

    construction. Even with the most basic programming skills, a person can automate the

    process of combining prior matches and future fixtures with VLOOKUP and sort functions

    via Excel's record macro function.

    Applications of Monte Carlo Simulation

    Here are some examples of how this very basic approach can be utilised in competition

    forecasting.

    Transfer Price Index Simulations of the English Premier

    League Season

    Transfer Price Index's mSq model, which utilises venue and relative squad costs as

    inputs, was used to forecast the most likely final table positions of each club on a weekly

    basis. This model quantified individual match outcomes' impacts on each team's likely

    finish position (it wasn't just Manchester United's win over City in October that swung

    the title their way), as well as just how much of an advantage a club might have surrendered

    along the way (see Tottenham's 80%+ likelihood of a Top 4 finish after beating Arsenal in early

    March and how much it fell away over the final two-and-a-half months of the season).

  • MLS Eastwood Index

    Blogger Martin Eastwood created the Eastwood Index as a way to know where teams

    stand relative to each other, how results against clubs with various levels of quality

    impact a team's rating, and how the ratings difference between two clubs can help

    predict future match outcomes.

    This model has been applied to MLS, and the Monte Carlo simulations have been used

    to quantify things like the impact the Seattle Sounders' poor start had on the danger (or

    lack thereof) of not making the league playoffs.

    CONCACAF World Cup Qualification

  • Finally, Monte Carlo simulations can even be used to run a post-mortem "what if" using

    others' forecast match outcomes after the matches are completed. One source for

    such match forecasts is bookmaker odds. Bookmakers are looking to maximise their

    profit, so they often don't forecast more than one match in advance, or only a few

    matches in advance if the schedule is compact. As an example, Monte Carlo methods

    have been paired with bookmaker odds to help analyse the likelihood of current point

    totals within CONCACAF's final round of World Cup qualifying.

    While everyone knows Mexico has struggled from match to match, it turns out that

    bookmakers only foresaw Mexico's current three points or fewer in 10% of the aggregate

    outcomes contained in their forecasts. Meanwhile, the United States' four points put

    them squarely within bookmaker expectations.
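
    A post-mortem of this kind can be sketched by simulating one team's completed matches
    many times and counting how often the points total lands at or below what actually
    happened. The probabilities below are placeholders rather than real bookmaker figures,
    the 3/1/0 points scheme is assumed, and the function name is illustrative.

```python
import random

# Hypothetical (P(loss), P(draw)) pairs for one team's completed matches,
# standing in for probabilities implied by bookmaker odds.
match_probs = [(0.20, 0.30), (0.35, 0.30), (0.15, 0.25), (0.40, 0.30)]

def share_at_or_below(match_probs, threshold, n_runs=10_000, seed=0):
    """Estimate how often the team's simulated points total is <= threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        points = 0
        for p_loss, p_draw in match_probs:
            r = rng.random()
            if r >= p_loss + p_draw:
                points += 3      # win
            elif r >= p_loss:
                points += 1      # draw
        hits += points <= threshold
    return hits / n_runs

print(share_at_or_below(match_probs, threshold=3))
```

    A small share suggests the actual total sits in the tail of what the forecasts implied,
    while a share near the middle suggests the team is performing roughly as expected.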

    Conclusion

    Using Monte Carlo simulation methods allows analysts to properly measure and model

    discrete events like soccer matches, and then roll the results of those discrete events up

    to a bigger forecast over a season or more.

    More importantly, Monte Carlo simulation methods provide a probabilistic outlook to

    such forecasts, allowing the analyst to express their level of statistical certainty (or

    uncertainty) in the forecast. This is key to thinking in a noisy, uncertain sport like soccer,

    and, as this post has attempted to explain, it's not too complex an analysis to set up. All

    that's needed is a probabilistic model, a tool like Microsoft Excel for storing results, and a

    bare minimum of programming capability.