avatarDarío Weitz

Free AI web copilot to create summaries, insights and extended knowledge, download it at here

7275

Abstract

<p id="bb35">The logic behind the MCS is described in the following lines of code:</p><div id="6c3a"><pre>list_of_costs = <span class="hljs-selector-attr">[]</span> <span class="hljs-keyword">for</span> run <span class="hljs-keyword">in</span> <span class="hljs-built_in">range</span>(Number_of_Replications): acum_cost = <span class="hljs-number">0</span> <span class="hljs-keyword">for</span> <span class="hljs-selector-tag">i</span> <span class="hljs-keyword">in</span> <span class="hljs-built_in">range</span>(Number_of_Component_Costs): cost = np<span class="hljs-selector-class">.random</span><span class="hljs-selector-class">.triangular</span>(minimum<span class="hljs-selector-attr">[i]</span>, most_like<span class="hljs-selector-attr">[i]</span>, maximum<span class="hljs-selector-attr">[i]</span>, <span class="hljs-number">1</span>) acum_cost +=cost
Total_Cost = <span class="hljs-built_in">round</span>(<span class="hljs-attribute">float</span>(acum_cost),<span class="hljs-number">1</span>)

list_of_costs<span class="hljs-selector-class">.append</span>(Total_Cost)</pre></div><p id="a501">We used Numpy and Scipy to obtain a summary of key descriptive statistical measures, particularly a <i>Percentile Table.</i></p><div id="50c6"><pre>media = <span class="hljs-built_in">round</span>(<span class="hljs-built_in">np</span>.<span class="hljs-built_in">mean</span>(list_of_costs),<span class="hljs-number">2</span>)

stand = <span class="hljs-built_in">round</span>(<span class="hljs-built_in">np</span>.<span class="hljs-built_in">std</span>(list_of_costs),<span class="hljs-number">2</span>) <span class="hljs-built_in">var</span> = <span class="hljs-built_in">round</span>(<span class="hljs-built_in">np</span>.<span class="hljs-built_in">var</span>(list_of_costs),<span class="hljs-number">2</span>) std_error = <span class="hljs-built_in">round</span>(sem(list_of_costs),<span class="hljs-number">2</span>)</pre></div><div id="5ef5"><pre><span class="hljs-built_in">median</span> = <span class="hljs-built_in">round</span>(<span class="hljs-built_in">np</span>.<span class="hljs-built_in">median</span>(list_of_costs),<span class="hljs-number">2</span>) skew = <span class="hljs-built_in">round</span>(stats.skew(list_of_costs),<span class="hljs-number">2</span>) kurt = <span class="hljs-built_in">round</span>(stats.<span class="hljs-built_in">kurtosis</span>(list_of_costs),<span class="hljs-number">2</span>)</pre></div><div id="313d"><pre><span class="hljs-attr">dof</span> = Number_of_Replications - <span class="hljs-number">1</span>
<span class="hljs-attr">t_crit</span> = np.abs(stats.t.ppf((<span class="hljs-number">1</span>-confidence)/<span class="hljs-number">2</span>,dof))</pre></div><div id="1ef9"><pre><span class="hljs-attr">half_width</span> = round(stand *t_crit/np.sqrt(Number_of_Replications),<span class="hljs-number">2</span>)
<span class="hljs-attr">inf</span> = media - half_width <span class="hljs-attr">sup</span> = media + half_width

<span class="hljs-attr">inf</span> = round(float(inf),<span class="hljs-number">2</span>) <span class="hljs-attr">sup</span> = round(float(sup),<span class="hljs-number">2</span>)</pre></div><div id="319a"><pre>t = <span class="hljs-built_in">PrettyTable</span>(<span class="hljs-selector-attr">[<span class="hljs-string">'Statistic'</span>, <span class="hljs-string">'Value'</span>]</span>) t<span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-attr">[<span class="hljs-string">'Trials'</span>, Number_of_Replications]</span>) t<span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-attr">[<span class="hljs-string">'Mean'</span>, media]</span>) t<span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-attr">[<span class="hljs-string">'Median'</span>, median]</span>) t<span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-attr">[<span class="hljs-string">'Variance'</span>, var]</span>) t<span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-attr">[<span class="hljs-string">'Stand Dev'</span>, stand]</span>) t<span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-attr">[<span class="hljs-string">'Skewness'</span>, skew]</span>) t<span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-attr">[<span class="hljs-string">'Kurtosis'</span>, kurt]</span>) t<span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-attr">[<span class="hljs-string">'Half Width'</span>, half_width]</span>) t<span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-attr">[<span class="hljs-string">'CI inf'</span>, inf]</span>) t<span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-attr">[<span class="hljs-string">'CI sup'</span>, sup]</span>) <span class="hljs-function"><span class="hljs-title">print</span><span class="hljs-params">(t)</span></span></pre></div><div id="edbb"><pre><span class="hljs-attribute">percents</span> = np.percentile(list_of_costs,[<span class="hljs-number">10</span>, <span class="hljs-number">20</span>, <span class="hljs-number">30</span>, <span class="hljs-number">40</span>, <span class="hljs-number">50</span>, <span class="hljs-attribute">60</span>, <span class="hljs-number">70</span>, <span class="hljs-number">80</span>, <span class="hljs-number">90</span>, <span class="hljs-number">100</span>])</pre></div><div id="ceeb"><pre><span class="hljs-selector-tag">p</span> = <span class="hljs-built_in">PrettyTable</span>(<span class="hljs-selector-attr">[<span class="hljs-string">'Percentile (%)'</span>, <span class="hljs-string">'Cost'</span>]</span>) <span class="hljs-selector-tag">p</span><span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-attr">[<span class="hljs-string">' 10 '</span>, percents[0]</span>]) <span class="hljs-selector-tag">p</span><span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-attr">[<span class="hljs-string">' 20 '</span>, percents[1]</span>]) <span class="hljs-selector-tag">p</span><span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-attr">[<span class="hljs-string">' 30 '</span>, percents[2]</span>]) <span class="hljs-selector-tag">p</span><span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-attr">[<span class="hljs-string">' 40 '</span>, percents[3]</span>]) <span class="hljs-selector-tag">p</span><span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-attr">[<span class="hljs-string">' 50 '</span>, percents[4]</span>]) <span class="hljs-selector-tag">p</span><span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-attr">[<span class="hljs-string">' 60 '</span>, percents[5]</span>]) <span class="hljs-selector-tag">p</span><span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-attr">[<span class="hljs-string">' 70 '</span>, percents[6]</span>]) <span class="hljs-selector-tag">p</span><span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-at

Options

tr">[<span class="hljs-string">' 80 '</span>, percents[7]</span>]) <span class="hljs-selector-tag">p</span><span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-attr">[<span class="hljs-string">' 90 '</span>, percents[8]</span>]) <span class="hljs-selector-tag">p</span><span class="hljs-selector-class">.add_row</span>(<span class="hljs-selector-attr">[<span class="hljs-string">' 100 '</span>, percents[9]</span>]) <span class="hljs-function"><span class="hljs-title">print</span><span class="hljs-params">(p)</span></span></pre></div><p id="01f4">Finally, we draw two charts: a histogram of the outcome variable (<i>Total Cost</i>) including a green vertical line indicating the media, and another vertical line (in red) indicating the most likely value. The second chart is a cumulative histogram indicating the probability that the final cost will be lower than the most likely value.</p><div id="a8e5"><pre><span class="hljs-attr">percentile_of_most_like</span> = round(percentileofscore(list_of_costs, Sum_most_like),<span class="hljs-number">2</span>)</pre></div><div id="1e31"><pre><span class="hljs-attr">n_bins</span> = <span class="hljs-number">20</span></pre></div><div id="9d90"><pre>fig, ax = plt.subplots(figsize=(8, 6)) ax.hist(list_of_costs, histtype =<span class="hljs-string">'bar'</span>, <span class="hljs-attribute">bins</span>=20, color = <span class="hljs-string">'c'</span>, <span class="hljs-attribute">edgecolor</span>=<span class="hljs-string">'k'</span>, <span class="hljs-attribute">alpha</span>=0.65, density = <span class="hljs-literal">True</span>) # <span class="hljs-attribute">density</span>=<span class="hljs-literal">False</span> show counts</pre></div><div id="a272"><pre>ax.axvline(media, <span class="hljs-attribute">color</span>=<span class="hljs-string">'g'</span>, <span class="hljs-attribute">linestyle</span>=<span class="hljs-string">'dashed'</span>, <span class="hljs-attribute">linewidth</span>=3) ax.axvline(Sum_most_like, <span class="hljs-attribute">color</span>=<span class="hljs-string">'r'</span>, <span class="hljs-attribute">linestyle</span>=<span class="hljs-string">'dashed'</span>, <span class="hljs-attribute">linewidth</span>=3) ax.text(48,0.185, <span class="hljs-string">'Mean - - -'</span>, color = <span class="hljs-string">'green'</span> ) ax.text(48,0.155, <span class="hljs-string">'Most Like - - -'</span>, color = <span class="hljs-string">'red'</span> )</pre></div><div id="e85c"><pre>ax<span class="hljs-selector-class">.set_title</span>(<span class="hljs-string">'Frequency Chart'</span>) ax<span class="hljs-selector-class">.set_ylabel</span>(<span class="hljs-string">'Probability'</span>) ax<span class="hljs-selector-class">.set_xlabel</span>(<span class="hljs-string">'Total Cost (MM*1000 US)'</span>) ax<span class="hljs-selector-class">.grid</span>(axis = <span class="hljs-string">'y'</span>)</pre></div><div id="7048"><pre>plt.<span class="hljs-keyword">show</span>()</pre></div><div id="0d20"><pre>fig, ax = plt.subplots(figsize=(8, 6)) n, bins, patches = ax.hist(list_of_costs, n_bins, <span class="hljs-attribute">density</span>=<span class="hljs-literal">True</span>, <span class="hljs-attribute">histtype</span>=<span class="hljs-string">'bar'</span>, <span class="hljs-attribute">cumulative</span>=<span class="hljs-literal">True</span>, <span class="hljs-attribute">edgecolor</span>=<span class="hljs-string">'k'</span>, alpha = 0.65) ax.hlines(percentile_of_most_like/100, 46, Sum_most_like, <span class="hljs-attribute">color</span>=<span class="hljs-string">'r'</span>, <span class="hljs-attribute">linestyle</span>=<span class="hljs-string">'dashed'</span>, <span class="hljs-attribute">linewidth</span>=2) ax.text(46 + 0.1, percentile_of_most_like/100 + 0.1, <span class="hljs-string">'Cumulative Probability:'</span> + str(percentile_of_most_like) +<span class="hljs-string">'%'</span>,<span class="hljs-attribute">rotation</span>=360)</pre></div><div id="3ef9"><pre>ax<span class="hljs-selector-class">.set_title</span>(<span class="hljs-string">'Cumulative Histogram'</span>) ax<span class="hljs-selector-class">.set_ylabel</span>(<span class="hljs-string">'Cumulative Probability'</span>) ax<span class="hljs-selector-class">.set_xlabel</span>(<span class="hljs-string">'Total Cost (MM*1000 US)'</span>) ax<span class="hljs-selector-class">.grid</span>(axis = <span class="hljs-string">'y'</span>)</pre></div><div id="e0a5"><pre>plt.<span class="hljs-keyword">show</span>()</pre></div><h1 id="5d47">Analysis</h1><p id="89b4">We made 3000 replications appending the total cost of each replication to a list (<i>list_of_costs.append(Total_Cost</i>). Then, we calculated the key descriptive statistical measures:</p><figure id="01e1"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*o4LOM9J5Peet3FC5Zlj7HA.png"><figcaption>Table 3: a Statistic Report</figcaption></figure><p id="8301">Table 3 shows that the mean and the median have practically the same value, the half-width interval is almost negligible, and the distribution has zero skewness and almost zero kurtosis.</p><figure id="ec55"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*O74aINQlj6DEdTnuMKRi5g.png"><figcaption>Table 4: a Percentile Report</figcaption></figure><p id="44ca">Table 4 shows the Percentile Report. We see that the chance that the total cost will be less than 53.0 MM U$S is equal to or less than 30%.</p><p id="bdf6">Figure 1 shows the distribution of our outcome (<i>Total Cost</i>) as a histogram with 20 bins. The mean cost and the most likely one are indicated with green and red vertical lines respectively. Remember that a histogram provides a visual representation of the distribution of a dataset: location, spread, skewness, and kurtosis of the data. Clearly, the appearance of the distribution matches the values indicated in the Statistic Report.</p><figure id="ef32"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*c8UUz0E4pp1UwmrErsSPEw.png"><figcaption>Figure 1: a frequency chart made by the author with Matplotlib</figcaption></figure><p id="0086"><b>Questions involving risk can be answered with a cumulative histogram</b> (Fig. 2). The red horizontal line corresponds to the percentile of the most likely value. There is a 44.1% (100–55.9) probability that our forecasted costs will be exceeded. Now, <b>we can numerically assess the risk associated with our investment project.</b></p><figure id="b098"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*aJHBi07G6Au8BoFEuOFZ-A.png"><figcaption>Fig. 2: a cumulative histogram made by the author with Matplotlib</figcaption></figure><p id="61a4">Properly developed, a Monte Carlo Simulation is a relatively simple numerical technique that allows a risk analysis to be performed on highly challenging projects.</p><p id="4da1">Don’t forget to tip the author, particularly if you add this article to a list.</p><h1 id="3c7f">References</h1><p id="79c8">[1]: <a href="https://en.wikipedia.org/wiki/Triangular_distribution">https://en.wikipedia.org/wiki/Triangular_distribution</a></p></article></body>

Monte Carlo Simulation

Part 2: Risk Analysis

Photo by Meriç Dağlı on Unsplash

In the previous article of this series, we defined a Monte Carlo Simulation (MCS) as “a sampling experiment whose aim is to estimate the distribution of a quantity of interest that depends on one or more stochastic input variables”. We build a mathematical model of the problem under study and estimate its outcomes through repeated sampling. The method provides a distribution of simulated outcomes from which charts and summary statistics can be easily obtained.

On the other hand, risk is defined as the probability of occurrence of an undesirable outcome. Quantitative risk analysis is a numerical procedure for estimating the risk of any project by numeric resources.

So, if a Monte Carlo Simulation provides a range of possible outcomes and their corresponding probabilities, we can use it to quantitatively account for risk in decision making.

We will illustrate a Monte Carlo simulation to account for risk in an investment project.

Risk Analysis for an Investment Project

Mayor investment projects, return on a stock portfolio, planning construction time schedules, and equivalents are classical situations where a risk analysis is mandatory. We must be able to answer questions such as: What is the probability that the costs will outweigh the benefits? What is the probability that we will incur a financial loss? What is the probability that the project will be completed on time? We should have estimators of these questions before we invest the first dollar.

Let’s suppose that the decision to implement an important investment project is in our hands. The project has six cost components: international tradable goods, local tradable goods, labor, equipment and machinery, land acquisition, taxes, and financial costs. The sum of these components results in the final cost of the project.

Our first task is to obtain an estimate of the most likely value of each of these costs. Table 1 shows the most likely values for each cost component. The corresponding total cost is 54.56 MM U$S.

Table 1: most likely values

Of course, if we are absolutely certain of the accuracy of our cost calculations, the only remaining task is to sum the six components to show the final cost. But this situation rarely arises in practice when the project is of a certain size. In a real project, every component will show a certain degree of randomness, and we must perform a risk analysis to quantify the probability of undesirable outcomes.

In the absence of rigorous data, it is customary to use a triangular distribution as a rough approximation of the unknown input distributions. The triangular distribution depends only on three parameters: the minimum (a), the most likely ©, and the maximum (b). The distribution is simple, flexible, and it can assume a variety of shapes.

Parameters and formulas for the Triangular Distribution, source: Wikipedia [1]
Charts of the PDF & CDF of a Triangular Distribution, source: Wikipedia [1]

Table 2 shows the value of the parameters for each cost component.

Table 2: parameters for each component

Python code for a MCS

The following Python code allows us to develop an MCS with a statistic summary and a frequency histogram:

First, we imported the Numpy library as np, Scipy for some statistics calculations, and Matplotlib and PrettyTable for the final outputs.

@author: darwt
"""
# Import Modules
import numpy  as np
from scipy import stats
from scipy.stats import sem
from scipy.stats import percentileofscore
import matplotlib.pyplot as plt
from prettytable import PrettyTable
  
your_path = your path 

The three following lists describe the parameters for the triangular distribution:

minimum   = [13.80, 4.80, 4.14, 15.00, 3.29, 1.44]
most_like = [16.25, 5.33, 6.44, 19.23, 4.43, 2.88]
maximum   = [17.20, 7.20, 7.45, 22.50, 6.97, 4.32]
Sum_most_like = sum(most_like)
 
Number_of_Component_Costs = 6

To obtain valid results with an MCS, we need to make a very large number of replications. We also defined the level of confidence for our Confidence Interval (CI).

Number_of_Replications = 3000
confidence = 0.95

The logic behind the MCS is described in the following lines of code:

list_of_costs = []
for run in range(Number_of_Replications):
    acum_cost = 0
    for i in range(Number_of_Component_Costs):
        cost = np.random.triangular(minimum[i], 
                                    most_like[i],  maximum[i], 1)
        acum_cost +=cost
        
    Total_Cost = round(float(acum_cost),1)
    
    list_of_costs.append(Total_Cost)

We used Numpy and Scipy to obtain a summary of key descriptive statistical measures, particularly a Percentile Table.

media = round(np.mean(list_of_costs),2)
stand = round(np.std(list_of_costs),2)
var   = round(np.var(list_of_costs),2) 
std_error = round(sem(list_of_costs),2)
median = round(np.median(list_of_costs),2)
skew   = round(stats.skew(list_of_costs),2)
kurt   = round(stats.kurtosis(list_of_costs),2)
dof  = Number_of_Replications - 1    
t_crit = np.abs(stats.t.ppf((1-confidence)/2,dof))
half_width =  round(stand *t_crit/np.sqrt(Number_of_Replications),2)  
inf = media - half_width
sup = media + half_width  
                      
inf = round(float(inf),2)
sup = round(float(sup),2)
t = PrettyTable(['Statistic', 'Value'])
t.add_row(['Trials', Number_of_Replications])
t.add_row(['Mean', media])
t.add_row(['Median', median])
t.add_row(['Variance', var])
t.add_row(['Stand Dev', stand])
t.add_row(['Skewness', skew])
t.add_row(['Kurtosis', kurt])
t.add_row(['Half Width', half_width])
t.add_row(['CI inf', inf])
t.add_row(['CI sup', sup])
print(t)
percents = np.percentile(list_of_costs,[10, 20, 30, 40, 50, 
                                        60, 70, 80, 90, 100])
p = PrettyTable(['Percentile (%)', 'Cost'])
p.add_row(['  10  ', percents[0]])
p.add_row(['  20  ', percents[1]])
p.add_row(['  30  ', percents[2]])
p.add_row(['  40  ', percents[3]])
p.add_row(['  50  ', percents[4]])
p.add_row(['  60  ', percents[5]])
p.add_row(['  70  ', percents[6]])
p.add_row(['  80  ', percents[7]])
p.add_row(['  90  ', percents[8]])
p.add_row([' 100  ', percents[9]])
print(p)

Finally, we draw two charts: a histogram of the outcome variable (Total Cost) including a green vertical line indicating the media, and another vertical line (in red) indicating the most likely value. The second chart is a cumulative histogram indicating the probability that the final cost will be lower than the most likely value.

percentile_of_most_like = round(percentileofscore(list_of_costs, Sum_most_like),2)
n_bins = 20
fig, ax = plt.subplots(figsize=(8, 6))
ax.hist(list_of_costs,  histtype ='bar', bins=20, color = 'c',
        edgecolor='k', alpha=0.65,
        density = True)  # density=False show counts
ax.axvline(media, color='g', linestyle='dashed', linewidth=3)
ax.axvline(Sum_most_like, color='r', 
           linestyle='dashed', linewidth=3)
ax.text(48,0.185, 'Mean - - -', color = 'green'  )
ax.text(48,0.155, 'Most Like - - -', color = 'red'  )
ax.set_title('Frequency Chart')
ax.set_ylabel('Probability')
ax.set_xlabel('Total Cost (MM*1000 U$S)')
ax.grid(axis = 'y')
plt.show()
fig, ax = plt.subplots(figsize=(8, 6))
n, bins, patches = ax.hist(list_of_costs, n_bins,
                           density=True, histtype='bar',
                           cumulative=True, edgecolor='k',
                           alpha = 0.65)
ax.hlines(percentile_of_most_like/100, 46, Sum_most_like, 
          color='r', linestyle='dashed', linewidth=2)
ax.text(46 + 0.1, percentile_of_most_like/100 + 0.1, 
        'Cumulative Probability:' + str(percentile_of_most_like) +'%',rotation=360)
ax.set_title('Cumulative Histogram')
ax.set_ylabel('Cumulative Probability')
ax.set_xlabel('Total Cost (MM*1000 U$S)')
ax.grid(axis = 'y')
plt.show()

Analysis

We made 3000 replications appending the total cost of each replication to a list (list_of_costs.append(Total_Cost). Then, we calculated the key descriptive statistical measures:

Table 3: a Statistic Report

Table 3 shows that the mean and the median have practically the same value, the half-width interval is almost negligible, and the distribution has zero skewness and almost zero kurtosis.

Table 4: a Percentile Report

Table 4 shows the Percentile Report. We see that the chance that the total cost will be less than 53.0 MM U$S is equal to or less than 30%.

Figure 1 shows the distribution of our outcome (Total Cost) as a histogram with 20 bins. The mean cost and the most likely one are indicated with green and red vertical lines respectively. Remember that a histogram provides a visual representation of the distribution of a dataset: location, spread, skewness, and kurtosis of the data. Clearly, the appearance of the distribution matches the values indicated in the Statistic Report.

Figure 1: a frequency chart made by the author with Matplotlib

Questions involving risk can be answered with a cumulative histogram (Fig. 2). The red horizontal line corresponds to the percentile of the most likely value. There is a 44.1% (100–55.9) probability that our forecasted costs will be exceeded. Now, we can numerically assess the risk associated with our investment project.

Fig. 2: a cumulative histogram made by the author with Matplotlib

Properly developed, a Monte Carlo Simulation is a relatively simple numerical technique that allows a risk analysis to be performed on highly challenging projects.

Don’t forget to tip the author, particularly if you add this article to a list.

References

[1]: https://en.wikipedia.org/wiki/Triangular_distribution

Simulation
Monte Carlo Simulation
Python
Data Science
Recommended from ReadMedium