Wednesday, October 20, 2010

Monte Carlo vs. Decision Trees – Round One

By Jan Schulze, Vice President, Software Development

I’d like to begin this article by stating that, although this topic is a discussion centered on methodology, it can become a very emotional subject. Disciples on either side of the debate can get very charged up about the merits of one approach over the other.

So, I’d like to emphasize that I was raised to believe in the virtues of Monte Carlo analysis techniques for all things probabilistic. I saw great utility in this approach and even wrote my own Monte Carlo simulator. I came late to Decision Tree analysis, and now have a much deeper appreciation for its merits, believing that each approach has its place. Over the next few newsletters, I intend to point out the strengths and weaknesses of both approaches, beginning, in this issue, with a quantitative comparison.

A Two-Dimensional Plot of the Solution Space

To begin, let’s use the two-dimensional plot shown in Figure A to represent the full set of potential outcomes for a given opportunity, one outcome of which is displayed on the plot. Rather than relying on a single potential outcome, our task as analysts is to describe the full risk profile of the opportunity. Monte Carlo and Decision Tree techniques typically do this in two different ways. Let’s explore.


Figure A - Two-Dimensional Problem Space

A Monte Carlo Approach
First, let’s look at Monte Carlo. A Monte Carlo model generates new potential outcomes in the solution space at random; given enough time and computing power to run a large number of trials, we will uncover many possible results and explore the extreme corners of the plot. This can be very useful if we are considering “black swan” events (low-probability, high-impact occurrences). With a valid, representative model, this can provide powerful insights and identify areas of potential concern.
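To make that concrete, here is a minimal sketch of the kind of sampling loop a Monte Carlo model runs. The two axes (reserves and cost) and their distributions are illustrative assumptions of mine, not the model behind Figure B.

```python
import random

def sample_outcome():
    # One random point in the two-dimensional solution space.
    # The axes and their distributions are illustrative only.
    reserves = random.lognormvariate(0.0, 0.5)
    cost = random.normalvariate(100.0, 20.0)
    return reserves, cost

# With enough trials, the random cloud of points gradually fills the
# plot, including the extreme corners where "black swan" outcomes live.
trials = [sample_outcome() for _ in range(10_000)]
```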


Figure B - Monte Carlo Model

A Decision Tree Approach
By comparison, a Decision Tree model generates solutions in a controlled way. Instead of defining complete distributions, each uncertainty is typically represented as three branches on the tree: the P10, P50, and P90 outcomes. When we combine the uncertainties, we might create a plot like the one seen in Figure C. The tree combinations cover the decision space more symmetrically.
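As a sketch of what those end nodes look like, the following enumerates every branch combination for two uncertainties; the P10/P50/P90 values are made-up placeholders, not data from the article.

```python
from itertools import product

# Each uncertainty collapses to three branches, weighted 30/40/30.
branches = [("P10", 0.30), ("P50", 0.40), ("P90", 0.30)]

# Two illustrative uncertainties; the values are placeholders.
reserves = {"P10": 50.0, "P50": 100.0, "P90": 180.0}
cost = {"P10": 80.0, "P50": 100.0, "P90": 130.0}

# The tree's end nodes are every combination of branches: a small,
# symmetric grid of weighted points rather than a random cloud.
for (r, pr), (c, pc) in product(branches, branches):
    print(f"reserves={reserves[r]:5.1f}  cost={cost[c]:5.1f}  prob={pr * pc:.2f}")
```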


Figure C – Decision Tree Model

Comparing the Two
Now, let us delve into the problem space a little more. If we run the same number of Monte Carlo trials as Decision Tree nodes, we might end up with a plot that would look something like Figure D, where the Monte Carlo analysis is unlikely to explore the space as thoroughly.

If we assume that the opportunity is characterized by five key uncertainties, each with three possible outcomes, we end up with 243 combinations (3 × 3 × 3 × 3 × 3 = 243). Assuming we use a 30% / 40% / 30% distribution for the probability of each outcome, the chance of all five uncertainties being low at the same time becomes quite small (30% × 30% × 30% × 30% × 30% = 0.243%). The Decision Tree dots at the edges of the plot represent these extreme outcomes. Are they close enough to the axes?
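Before answering, here is a quick check of that arithmetic:

```python
n_uncertainties, n_branches = 5, 3

combinations = n_branches ** n_uncertainties   # 3**5 = 243 end nodes
p_all_low = 0.30 ** n_uncertainties            # 0.3**5 = 0.00243

print(combinations)        # 243
print(f"{p_all_low:.3%}")  # 0.243%, roughly 1 chance in 411
```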

The answer depends on the questions you are trying to answer about the opportunity. If your primary concern is the expected reserves distribution, this mapping is probably okay; but if you’re concerned about the remote chance of a well blow-out, this decision tree mapping may not capture the extremes well enough.


Figure D – Monte Carlo vs. Decision Tree

If we now look at the probability of any single one of these extreme outcomes occurring, we are saying that there is roughly a 1 in 400 chance of seeing that result (30% × 30% × 30% × 30% × 30% = 0.243%). In other words, we would need to run a minimum of 400 perfectly placed Monte Carlo trials to achieve coverage that matches that of a Decision Tree. But Monte Carlo analysis selects points randomly; the coverage is unlikely to be “perfect”. To compensate, we need to increase the number of trials so that we don’t end up with skewed results.
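A quick calculation shows why 400 random trials are not enough on their own: the chance that 400 random trials never land on a given extreme corner is still substantial.

```python
p_corner = 0.30 ** 5                    # 0.243% chance per trial
p_miss_in_400 = (1 - p_corner) ** 400   # chance that 400 random trials
                                        # never hit this corner at all
print(f"{p_miss_in_400:.0%}")           # about 38%
```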

For instance, if we use the analogy that Monte Carlo analysis is like the game “pin the tail on the donkey”, we could end up with a skewed result at the end of the game where we’ve pinned the tail on the right front hoof more often than not, creating not only a painful donkey but an inaccurate view of where the tail could actually be.

The Issue of Sampling and Iterations
How much sampling is required in a Monte Carlo run to ensure that we get the same coverage of the result space as a Decision Tree? I’ve worked the problem manually, by repeating a Monte Carlo analysis and increasing the number of trials each time, until the results matched those of the Decision Tree (that is, until the distribution of results matched AND until the Monte Carlo analysis produced points at the extremes that matched or exceeded those of the Decision Tree).

I was only able to match results after I increased the number of Monte Carlo trials to one hundred times the 400-trial minimum derived above. For Monte Carlo analysis to match the results of the five-uncertainty Decision Tree example (with 243 combinations, or outcomes), we would need to run 400 × 100 = 40,000 trials.
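One way to reproduce the flavor of that experiment is a coupon-collector style simulation: count how many random trials pass before every one of the 243 combinations has been sampled at least once. This sketch is my own simplified illustration, not the original study; the 30/40/30 weights are as above.

```python
import random

def trials_to_full_coverage(n_uncertainties=5, n_branches=3):
    weights = [0.30, 0.40, 0.30]
    target = n_branches ** n_uncertainties   # 243 combinations
    seen, trials = set(), 0
    while len(seen) < target:
        outcome = tuple(random.choices(range(n_branches), weights,
                                       k=n_uncertainties))
        seen.add(outcome)
        trials += 1
    return trials

# Typically thousands of trials; the rare all-low / all-high corners
# are the last cells a random sampler stumbles into.
print(trials_to_full_coverage())
```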

But how often would we actually run 40,000 Monte Carlo trials on a five-uncertainty evaluation? This might not even be computationally viable with a complex model. 500 trials, yes. 1,000, yes. 5,000, maybe. But 40,000? Are we routinely running enough trials to truly understand the results space, or are we falsely chasing precision and erroneously feeling that we are getting a better result simply because we have expended more effort to achieve it?

Comparing the two methods now, we see that Decision Tree analysis provides broad coverage of the problem space with a near-symmetrical distribution of points, while Monte Carlo analysis can give us the same coverage if we are prepared to run enough trials. Granted, at the end of all those trials the granularity is much finer than we get with a Decision Tree but, if we can arrive at the same conclusions with over 99% less computational effort (243 end nodes versus 40,000 trials), why wouldn’t we take it?

What if we are only interested in finding the expected value (or risk-weighted value) of the results space? How many Monte Carlo runs does it take to match the results of the Decision Tree? Far fewer. In fact, with a large number of uncertainties, it’s likely that the Monte Carlo evaluation will converge on the expected value with fewer trials than the Decision Tree has end nodes, because the Monte Carlo points will tend to cluster around the average. But remember, our mission as analysts is to map out the full risk profile of the potential results. For this, we’ll need to run a much larger number of iterations.
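A toy model makes the convergence point concrete. Here the outcome is simply the sum of five independent 30/40/30 uncertainties, and the values are illustrative placeholders of mine:

```python
import random

values, weights = [50.0, 100.0, 180.0], [0.30, 0.40, 0.30]

# Decision Tree expected value: exact, from the weighted end nodes.
# Because the outcome is a sum, it reduces to five times the
# per-uncertainty mean.
tree_ev = 5 * sum(v * w for v, w in zip(values, weights))   # 545.0

# Monte Carlo estimates converge on the same number quickly, because
# random points cluster near the middle of the distribution.
for n in (100, 1_000, 10_000):
    mc_ev = sum(sum(random.choices(values, weights, k=5))
                for _ in range(n)) / n
    print(n, round(mc_ev, 1), "vs tree", tree_ev)
```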

The Right Tool for the Problem You are Trying to Solve
It is for these reasons that I believe Decision Trees lend so much value to the typical decision problems we see in the petroleum, pharmaceutical and other sectors we support, and why I undertook the challenge of building a new Decision Tree tool.

As ever, which tool is the right tool depends on the problem you are trying to solve. In my view, one of the major reasons Monte Carlo analysis tools have tended to be more popular is that they have historically been easier to use.

Right-click on an input in your spreadsheet model, enter a couple of numbers and presto, it’s done. Entering distributions on ten uncertainties is easy. By comparison, walking through that same ten-level decision tree with 59,049 end-node branches might become painful (if your software tool can even do it). Yet, for all the reasons I’ve stated above, Decision Trees can be very valuable, insightful, efficient and (depending upon the decision problem) better than Monte Carlo analysis.

As a software developer, I took this as a challenge: bridge the gap between the ease of use of a Monte Carlo tool and the unique attributes of Decision Tree analysis, and you might have a powerful new analytic tool. TreeTop™ is my attempt to address this challenge; I will be very interested to hear your thoughts on my efforts.

I appreciate your comments on this article and look forward to more dialogue in this area.

1 comment:

Andrew said...

Hi, Jan.

One useful way to think about decision trees vs Monte Carlo simulation is to look at the probability density functions (pdfs) that underpin the uncertainties that you're modeling.

Let's start with the premise that there's a set of straightforward mathematical functions (normal, lognormal, extreme value, Poisson, etc.) that we can use to characterize the distribution - and for simplicity, let's assume the distributions aren't correlated for the moment.

I've found it useful to suggest to people that both simulation and decision trees are approximations to the analytic solution of the joint pdf that we could calculate if we knew sufficiently advanced calculus. The decision tree uses three areas centered on the P10, P50, and P90 values (with 30/40/30 weightings) to make use of Swanson's rule; the simulation approach is random sampling, with more trials generating another approximation to the distribution, one which will vary slightly from run to run. Both are approximations, though - and both are imperfect descriptions of reality.
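As a quick check of how good those approximations are, here is a small sketch comparing the 30/40/30 weighting of P10/P50/P90 against the exact mean of a lognormal; the shape parameter is an arbitrary choice on my part.

```python
from scipy.stats import lognorm

dist = lognorm(s=0.6)   # shape parameter is an arbitrary choice
p10, p50, p90 = dist.ppf([0.10, 0.50, 0.90])

# Swanson's rule: mean ~ 0.3*P10 + 0.4*P50 + 0.3*P90
swanson_mean = 0.30 * p10 + 0.40 * p50 + 0.30 * p90
print(round(swanson_mean, 3), "vs exact", round(dist.mean(), 3))
```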

If reality (or your best modeled construct of reality) contains significant discontinuities (for example low probability, high impact events), you had better make sure your quantitative model is equipped to handle it. For example, what if house prices level off, or begin to fall - what happens to the value of mortgage backed securities or collateralized debt obligations? My understanding of the modeling work done on these by rating agencies is that the models blew up in these circumstances, and perhaps this led to a pretty expensive managerial outcome - because the models weren't helpful, managers ignored the (real) probability that the outcome would happen.

In other words, the technical challenge of selecting appropriate distribution functions, correlations, number of trials, etc. is only part of the problem. A more significant question is how good a representation the model is of the underlying problem, and whether it (potentially) captures outcomes which span the potential outcome space. Get the thinking right, and the maths will follow.