Posted on March 7, 2013 by Tony Welsh

Some of you may have noticed that the illustrations in this blog post were all based upon 1 million iterations, or trials. I did this to ensure that the histograms produced were nice and smooth, but this number is overkill for most practical purposes. However, it is essential to do a sufficient number of trials to get reliable results. This in turn means that the speed of your risk analysis software is critical, especially if you are dealing with a large network.

How many trials is sufficient depends upon what we want to know. Typically we want to know the probability of completing the project (or of achieving a particular milestone) by a certain date or for a certain cost. The reliability of such an estimate will depend upon the number of trials we do.

To establish how the number of trials affects the reliability of our estimates, first consider just one trial. There are only two possibilities – the date either will or won’t be met – so we can represent the result as a number which is either 1 (if we meet the date) or 0 (if we don’t). Such a binary value follows the Bernoulli distribution (a binomial distribution with a single trial), and it is fairly easy to show that its variance (the square of the standard deviation) is p times (1-p), where p is the probability of meeting the date.
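
For anyone who wants to convince themselves of the p(1-p) result, here is a minimal Python sketch (purely illustrative, with an assumed p of 0.8) comparing the sample variance of a large number of 0/1 outcomes with p times (1-p):

```python
import random
import statistics

p = 0.8        # assumed probability of meeting the date
n = 100_000    # number of simulated binary outcomes

# 1 if the simulated trial meets the date, 0 if it does not
outcomes = [1 if random.random() < p else 0 for _ in range(n)]

print("sample variance:", statistics.pvariance(outcomes))  # close to 0.16
print("p * (1 - p):    ", p * (1 - p))                      # exactly 0.16
```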

Now consider doing a number of trials, n. Since variances are additive, the variance of the total over n trials is n times p times (1-p), and its standard deviation is the square root of this. Our estimate of p is the total divided by n, so the standard deviation (the standard error) of our estimate is given by

error = √( p(1-p) / n )

Let us suppose that p is 0.8. Then the above formula gives us the following errors for different values of n:

Trials   Error
    10   0.1265
   100   0.0400
 1,000   0.0126
10,000   0.0040
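
The table can be reproduced directly from the formula. Here is a small Python sketch (again purely illustrative) that prints the error √(p(1-p)/n) for p = 0.8 and the same values of n:

```python
import math

p = 0.8  # assumed probability of meeting the date

for n in (10, 100, 1_000, 10_000):
    error = math.sqrt(p * (1 - p) / n)   # standard error of the estimate
    print(f"{n:>6}  {error:.4f}")
```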

Of course we do not actually know the true value of p. Fortunately the formula is not very sensitive to the exact value of p, so if n is not too small we can plug in an estimate of p. In the above example, if we did 100 trials and 72 of them met the date, our estimate of p would be 0.72 and the estimated error would be 0.045.
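
In code, that plug-in calculation is just the arithmetic from the paragraph above:

```python
import math

met, n = 72, 100                            # trials that met the date, total trials

p_hat = met / n                             # estimated p = 0.72
error = math.sqrt(p_hat * (1 - p_hat) / n)  # approximately 0.045
print(p_hat, round(error, 3))
```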

For most purposes an error of around 0.01 should be acceptable, so between 1,000 and 10,000 iterations should be enough. Note however that this will not guarantee nice smooth histograms of the sort obtained using 100,000 or more iterations.

Also note that the above does not help much if we turn the question around: instead of wanting to know the probability of meeting a particular date, we may want to know the date which we have an 80% chance of meeting, often called the “P80” point. The standard error in that date depends upon the slope of the cumulative distribution function at that point. Full Monte does have an option to show a confidence interval around this curve. In the example below, it is set to 68%, which corresponds approximately to one standard deviation each side of the estimate. (This example relates to cost, but it works the same way for dates.) Read in the vertical direction, this range agrees with the calculations above; but because the binomial points are interpolated, one can also read off, in the horizontal direction, the error in our estimate of the P80 point.
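
Full Monte produces this confidence band for you, but the idea can be sketched in a few lines of Python. The cost model below is entirely hypothetical (a stand-in for a real risk-analysis run), and the function is a generic illustration of reading the binomial error band horizontally to get a range on the P80 point, not Full Monte's actual implementation:

```python
import math
import random

random.seed(1)

def simulate_costs(n_trials):
    """Hypothetical project-cost model -- a stand-in for a real simulation run."""
    return [sum(random.triangular(8_000, 12_000, 9_500) for _ in range(4))
            for _ in range(n_trials)]

def p80_with_band(costs, target=0.80, z=1.0):
    """Estimate the P80 cost, then translate the binomial error on the
    cumulative probability (z = 1 gives roughly a 68% band) into a
    horizontal range of costs."""
    costs = sorted(costs)
    n = len(costs)
    p80 = costs[int(target * n) - 1]                 # empirical 80th percentile
    err = z * math.sqrt(target * (1 - target) / n)   # vertical error band
    lo = costs[max(0, int((target - err) * n) - 1)]  # read the band horizontally
    hi = costs[min(n - 1, int((target + err) * n) - 1)]
    return p80, lo, hi

print(p80_with_band(simulate_costs(100)))
```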

In the above case, the range of values is approximately + or – $800, so we would expect the estimated P80 point to be between $39,700 and $41,300 68% of the time. As a check, I did the whole simulation 10 times and got values between $39,250 and $40,833. Nine of them were in the expected range, but note that this range is based just on our estimate of p.
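
The repeated-run check can be mimicked the same way. This is a self-contained sketch using the same hypothetical cost model as above, not the actual project data behind the figures:

```python
import random

random.seed(2)

def simulate_costs(n_trials):
    """Same hypothetical cost model as in the previous sketch."""
    return [sum(random.triangular(8_000, 12_000, 9_500) for _ in range(4))
            for _ in range(n_trials)]

# Repeat the whole 100-trial simulation ten times and look at the spread
# of the P80 estimates.
p80s = []
for _ in range(10):
    costs = sorted(simulate_costs(100))
    p80s.append(costs[int(0.8 * len(costs)) - 1])

print("lowest P80 estimate: ", round(min(p80s)))
print("highest P80 estimate:", round(max(p80s)))
```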

Whatever way you look at it, doing just 100 trials is not good enough. The picture changes considerably if we do 1000 iterations:

Note that the histogram is much smoother, and the 68% confidence interval around the s-curve is much narrower. We also have a better estimate of the true P80 point, which is actually $40,000. Doing 10,000 or even 100,000 trials would give us even greater confidence in our results, as well as producing smoother histograms.
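
The narrowing of the band is just the 1/√n effect from the error formula; the vertical error at the 80% level shrinks from 0.04 to about 0.0126 when we go from 100 to 1,000 trials:

```python
import math

for n in (100, 1_000):
    print(n, round(math.sqrt(0.8 * 0.2 / n), 4))   # 0.04 versus 0.0126
```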

So, back to the question of speed. The above indicates that we should probably do at least 10,000 trials. How long this takes will depend upon a number of factors, the most important being the number of tasks in the project network. The software also has a major impact. I will only speak for Full Monte, but clients have told us that it is 100 to 1000 times faster than competitors. (One client aborted a run using a rather expensive piece of software after 24 hours!)

The speed of the processor also matters of course, and my tests were done on a moderately fast laptop with a 2.3GHz i5 processor. (This was state of the art when I bought it but is middle-of-the-road now.) I ran tests with various numbers of trials on networks ranging in size from 100 to over 10,000 tasks, and derived the following approximate formula:

where S is the time in seconds, T1000 is the number of tasks in thousands, and N1000 is the number of trials in thousands.

So, for example, doing 1,000 trials on a 1,000-task network takes about 18.5 seconds, while doing 2,000 trials on the same network would take about 31 seconds. Note that there is a considerable overhead independent of the number of trials, which militates even more in favor of doing as many trials as you reasonably can. As they say, “your mileage may vary,” but this should be a good guide.
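
For the 1,000-task network quoted above, the fixed overhead and the cost per 1,000 trials can be backed out of those two timings. The little helper below is only a straight-line extrapolation for that one network size (the general formula also depends on the number of tasks), but it illustrates the arithmetic:

```python
# Two measured points for a 1,000-task network, taken from the text:
#   1,000 trials -> about 18.5 seconds
#   2,000 trials -> about 31 seconds
t_1k_trials, t_2k_trials = 18.5, 31.0

per_1000_trials = t_2k_trials - t_1k_trials   # ~12.5 s per extra 1,000 trials
overhead = t_1k_trials - per_1000_trials      # ~6 s independent of trial count

def estimated_seconds(n_trials):
    """Rough run-time estimate for THIS 1,000-task network only --
    a linear extrapolation, not the general formula."""
    return overhead + per_1000_trials * (n_trials / 1_000)

print(estimated_seconds(10_000))  # roughly 131 seconds for 10,000 trials
```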