Illustration of Central Limit Theorem Using Monte-Carlo Simulation

**Probability distribution of the sample mean of a uniform distribution using Monte-Carlo simulation.**

The Central Limit Theorem (CLT) is one of the most important theorems in statistics and data science. The CLT states that, for a sample of size N drawn from a probability distribution with finite variance, the sample mean is a random variable that is approximately normally distributed for large N, with mean equal to the population mean and standard deviation equal to the population standard deviation divided by the square root of N.

We can illustrate the central limit theorem using the uniform distribution. Any other probability distribution with finite variance, such as the normal, Poisson, or binomial distribution, would work as well.
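As a quick check that the choice of distribution is not essential, here is a minimal sketch that repeats the idea with an exponential distribution (rate 1, so the population mean and standard deviation are both 1). The seed, the sample size N = 200, the number of replicates B = 5000, and the name exp_means are illustrative assumptions, not part of the original analysis.

```r
set.seed(1)   # assumed seed, used only so the run is reproducible

N <- 200      # sample size (illustrative choice)
B <- 5000     # number of Monte-Carlo replicates (illustrative choice)

# Each replicate draws N values from an exponential distribution with rate 1
# (population mean 1, population standard deviation 1) and keeps the sample mean.
exp_means <- replicate(B, mean(rexp(N, rate = 1)))

mean(exp_means)   # should be close to the population mean, 1
sd(exp_means)     # should be close to the CLT prediction below
1 / sqrt(N)       # CLT prediction: sigma / sqrt(N), about 0.0707
```

Even though the exponential distribution is strongly skewed, the simulated standard deviation of the sample mean should land close to the predicted value.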

The Uniform Distribution

Let's consider a uniform distribution defined on the range [a, b]. The probability distribution function, mean, and standard deviation are taken from the Wolfram Mathematica website.

a) Population mean

For a uniform distribution, the population mean is given by

$$\mu = \frac{a + b}{2}$$

b) Population standard deviation

For a uniform distribution, the population standard deviation is given by

$$\sigma = \frac{b - a}{\sqrt{12}}$$
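For readers who want to see where the factor of 12 comes from, the standard deviation can be derived in a few lines from the density f(x) = 1/(b - a) on [a, b]. The symbols μ and σ denote the population mean and standard deviation; this is a sketch of the standard textbook derivation rather than something taken from the original article.

$$
\begin{aligned}
\mu &= \int_a^b \frac{x}{b - a}\,dx = \frac{a + b}{2}, \\
E[X^2] &= \int_a^b \frac{x^2}{b - a}\,dx = \frac{b^3 - a^3}{3(b - a)} = \frac{a^2 + ab + b^2}{3}, \\
\sigma^2 &= E[X^2] - \mu^2 = \frac{a^2 + ab + b^2}{3} - \frac{(a + b)^2}{4} = \frac{(b - a)^2}{12}, \\
\sigma &= \frac{b - a}{\sqrt{12}}.
\end{aligned}
$$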

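The Central Limit Theorem states that the mean of a sample of size N is a random variable with mean value

$$\mu_{\bar{x}} = \mu = \frac{a + b}{2}$$

and standard deviation given by

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} = \frac{b - a}{\sqrt{12 N}}$$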

Let us now illustrate the calculation by considering a uniform distribution with a = 0 and b = 100. We shall illustrate the Central Limit Theorem by considering the population (N → Infinity) and two samples, one with N = 100 and the other with N = 1000.

I. Analytical Results

a) Population Mean

Using the equations above, the population mean for a uniform distribution with a = 0 and b = 100 is

$$\mu = \frac{0 + 100}{2} = 50$$

b) Population Standard Deviation

Similarly, the population standard deviation of a uniform distribution with a = 0 and b = 100 is

$$\sigma = \frac{100 - 0}{\sqrt{12}} \approx 28.9$$

c) Sample 1 with N = 100

$$\mu_{\bar{x}} = 50, \qquad \sigma_{\bar{x}} = \frac{100}{\sqrt{12 \times 100}} \approx 2.89$$

d) Sample 2 with N = 1000

$$\mu_{\bar{x}} = 50, \qquad \sigma_{\bar{x}} = \frac{100}{\sqrt{12 \times 1000}} \approx 0.913$$
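The analytical values for the population and for both sample sizes can be reproduced with a few lines of R. This is a minimal sketch; the names pop_mean, pop_sd, and the helper sd_of_mean are hypothetical and introduced only for illustration.

```r
a <- 0     # lower bound of the uniform distribution
b <- 100   # upper bound of the uniform distribution

pop_mean <- (a + b) / 2          # population mean of U(a, b)
pop_sd   <- (b - a) / sqrt(12)   # population standard deviation of U(a, b)

# Hypothetical helper: standard deviation of the sample mean for sample size N
sd_of_mean <- function(N) pop_sd / sqrt(N)

pop_mean                     # 50
round(pop_sd, 2)             # 28.87
round(sd_of_mean(100), 3)    # 2.887
round(sd_of_mean(1000), 3)   # 0.913
```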

II. Monte-Carlo Simulation Results

For the Monte-Carlo simulations, we approximate the population by a very large sample of size 10,000.

a) Population Mean

pop_mean <- mean(runif(10000,0,100))

The output is 50.0, which agrees with our analytical result.

b) Population Standard Deviation

pop_sd <- sd(runif(10000,0,100))

The output is 28.9, which agrees with our analytical result.
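Because runif draws new random numbers on every call, the exact outputs vary slightly from run to run. A minimal sketch of a reproducible variant is shown here: it fixes a seed and computes both statistics from the same simulated population. The seed value and the name population are assumptions made for illustration.

```r
set.seed(42)   # assumed seed, used only so the run is reproducible

# One simulated population of 10,000 draws from U(0, 100)
population <- runif(10000, min = 0, max = 100)

pop_mean <- mean(population)   # expected to be close to 50
pop_sd   <- sd(population)     # expected to be close to 100 / sqrt(12), about 28.87

pop_mean
pop_sd
```

Computing both values from one vector also ensures that the mean and the standard deviation describe the same simulated population.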

c) Monte-Carlo Code for two Samples with N = 100 and N = 1000

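library(tidyverse)  # provides ggplot2 and %>% used for the plot in part d)

a <- 0    # lower bound of the uniform distribution
b <- 100  # upper bound of the uniform distribution

# Draw a sample of size x from U(a, b) and return its mean
mean_function <- function(x) mean(runif(x, a, b))

B <- 10000  # number of Monte-Carlo replicates

# B sample means, each from a sample of size N = 100
sample_1 <- replicate(B, mean_function(100))

# B sample means, each from a sample of size N = 1000
sample_2 <- replicate(B, mean_function(1000))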

Outputs from Monte-Carlo Simulation

mean(sample_1)

yields 50.0 which is consistent with our analytical results.

sd(sample_1)

yields 2.83 which is consistent with our analytical results (2.89).

mean(sample_2)

yields 50.0 which is consistent with our analytical results.

sd(sample_2)

yields 0.888 which is consistent with our analytical results (0.913).
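The simulated and analytical standard deviations can also be compared side by side. This is a self-contained sketch under stated assumptions: the seed and the names sizes, simulated, analytical, and comparison are illustrative, and the numbers it prints will differ slightly from those quoted above because they depend on the random draws.

```r
set.seed(123)   # assumed seed, used only so the run is reproducible

a <- 0
b <- 100
B <- 10000             # number of Monte-Carlo replicates
sizes <- c(100, 1000)  # the two sample sizes considered in the text

# Monte-Carlo estimate of the standard deviation of the sample mean for each N
simulated <- sapply(sizes, function(N) sd(replicate(B, mean(runif(N, a, b)))))

# Value predicted by the CLT: sigma / sqrt(N), with sigma = (b - a) / sqrt(12)
analytical <- (b - a) / sqrt(12 * sizes)

comparison <- data.frame(N = sizes,
                         simulated = simulated,
                         analytical = analytical,
                         rel_diff = abs(simulated - analytical) / analytical)
print(comparison)
```

With 10,000 replicates, the relative difference between the two columns is expected to be on the order of one percent.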

d) Generate Probability Distributions of the Mean for N = 100 and N = 1000.

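# Long-format data frame: one row per simulated sample mean, labelled by sample size
X <- data.frame(sample_size = rep(c("N=100", "N=1000"), times = c(B, B)),
                mean = c(sample_1, sample_2))

# Overlaid density curves of the sample means for the two sample sizes
X %>% ggplot(aes(mean, color = sample_size)) +
    geom_density(aes(mean, fill = sample_size), alpha = 0.2) +
    theme_bw()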

The figure shows that the sample mean is a random variable that is approximately normally distributed, with a mean equal to the population mean and a standard deviation given by the population standard deviation divided by the square root of the sample size. Since the standard deviation of the sample mean (its uncertainty) is inversely proportional to the square root of the sample size, the precision of the calculated mean value increases with larger sample sizes.
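One simple way to check the normality of the sample means numerically, rather than by eye, is to count how many of them fall within 1 and 1.96 standard deviations of the population mean and compare the fractions with the normal-distribution values. This is a minimal sketch; the seed and the names sample_means, mu, se, within_1, and within_196 are assumptions made for illustration.

```r
set.seed(2024)   # assumed seed, used only so the run is reproducible

a <- 0
b <- 100
N <- 100     # sample size
B <- 10000   # number of Monte-Carlo replicates

sample_means <- replicate(B, mean(runif(N, a, b)))

mu <- (a + b) / 2              # population mean
se <- (b - a) / sqrt(12 * N)   # standard deviation of the sample mean

# Fraction of simulated means within 1 and 1.96 standard deviations of mu
within_1   <- mean(abs(sample_means - mu) <= 1 * se)
within_196 <- mean(abs(sample_means - mu) <= 1.96 * se)

c(within_1 = within_1, within_196 = within_196)

# Reference fractions for an exact normal distribution (about 0.683 and 0.950)
c(pnorm(1) - pnorm(-1), pnorm(1.96) - pnorm(-1.96))
```

If the sample means are close to normally distributed, the two pairs of fractions should agree to within about a percentage point.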

Implications of the Central Limit Theorem

We’ve shown that the sample mean of a probability distribution with finite variance is a random variable with mean value equal to the population mean and standard deviation of the mean given by:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}$$

Based on this equation, we can observe that as the sample size N → Infinity, the uncertainty (standard deviation) of the mean goes to zero. This means that the larger our dataset, the better, as larger samples lead to a smaller variance error.
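To make the rate of this decrease concrete, the following sketch tabulates the standard deviation of the mean for the uniform distribution with a = 0 and b = 100 over several sample sizes. The particular values of N and the names sigma and sd_of_mean are illustrative assumptions.

```r
sigma <- (100 - 0) / sqrt(12)          # population standard deviation of U(0, 100)
N <- c(10, 100, 1000, 10000, 100000)   # illustrative sample sizes

sd_of_mean <- sigma / sqrt(N)          # standard deviation of the sample mean

data.frame(N = N, sd_of_mean = round(sd_of_mean, 3))

# A 100-fold increase in N reduces the standard deviation of the mean 10-fold
sd_of_mean[1] / sd_of_mean[3]          # 10, up to floating-point rounding
```

Because the uncertainty shrinks only with the square root of N, halving it requires four times as much data.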

In summary, we’ve discussed how the central limit theorem can be illustrated using Monte-Carlo simulation. The central limit theorem is one of the most important theorems in statistics and data science, so for a practicing data scientist, familiarity with its mathematical foundations is very important.