# Hypothesis Testing #1c — Anderson-Darling Test using R

A normality test determines whether sample data are drawn from a normally distributed population. Several tests are available for this purpose, and we will look at each one in detail.

**Assuming you have installed RStudio Desktop…**

**Anderson-Darling Test**

The Anderson-Darling Test (A-D Test) was first introduced by the American statisticians Theodore Wilbur Anderson and Donald A. Darling in 1952 as a modification of the Kolmogorov-Smirnov Test. It is similar to the K-S Test except in two respects: it places more emphasis on the tails of the distribution, and the approach to building the empirical distribution differs.

**Installation**

The A-D Test in R is not part of the base packages. You will need to install the package `nortest` and load it into your environment.

```
install.packages('nortest')
library(nortest)
```

The A-D Test within R has the following code:

**ad.test**

```
function (x)
{
    DNAME <- deparse(substitute(x))
    x <- sort(x[complete.cases(x)])
    n <- length(x)
    if (n < 8)
        stop("sample size must be greater than 7")
    logp1 <- pnorm((x - mean(x))/sd(x), log.p = TRUE)
    logp2 <- pnorm(-(x - mean(x))/sd(x), log.p = TRUE)
    h <- (2 * seq(1:n) - 1) * (logp1 + rev(logp2))
    A <- -n - mean(h)
    AA <- (1 + 0.75/n + 2.25/n^2) * A
    if (AA < 0.2) {
        pval <- 1 - exp(-13.436 + 101.14 * AA - 223.73 * AA^2)
    }
    else if (AA < 0.34) {
        pval <- 1 - exp(-8.318 + 42.796 * AA - 59.938 * AA^2)
    }
    else if (AA < 0.6) {
        pval <- exp(0.9177 - 4.279 * AA - 1.38 * AA^2)
    }
    else if (AA < 10) {
        pval <- exp(1.2937 - 5.709 * AA + 0.0186 * AA^2)
    }
    else pval <- 3.7e-24
    RVAL <- list(statistic = c(A = A), p.value = pval,
        method = "Anderson-Darling normality test",
        data.name = DNAME)
    class(RVAL) <- "htest"
    return(RVAL)
}
```

Let’s break the code down a little with the help of GPT-4.0.

1. We extract the name of the variable `x` and convert it into `DNAME` using `deparse`.
2. We obtain all complete cases of `x`, ensuring no missing values are included.
3. We calculate the number of these complete cases and ensure that there are at least 8.
4. We standardize the values by transforming them to have a mean of 0 and a standard deviation of 1.
5. We generate a series of probabilities for these standardized values using the cumulative distribution function of the normal distribution, keeping the logarithms of these probabilities (`logp1`).
6. A similar process is repeated for the negatives of the standardized values (`logp2`), effectively creating a mirror image of the probabilities obtained in step 5.
7. We create the variable `h` by: 1) generating the integers from 1 to `n`, where `n` is the number of non-missing values of `x`; 2) multiplying these values by 2 and subtracting 1; and 3) multiplying this series of odd numbers by the sum of the logarithmic values from step 5 and the reverse of those from step 6. The reversal aligns the other tail of the distribution with the logarithmic values from step 5.
8. The values in `h` reflect each observation's contribution to the Anderson-Darling Test, with higher values indicating a greater deviation from the mean relative to the standard deviation.
9. We calculate the Anderson-Darling test statistic `A` by subtracting the mean of `h` from the negated sample size `-n`. This statistic is what determines the p-value.
10. We then adjust `A` to `AA`, scaling the test statistic based on the sample size `n`.
11. The p-value is generated according to the conditions shown in the R code. The constants used are derived from previous research specifically for the Anderson-Darling Test. (For more information, refer to https://apps.dtic.mil/sti/tr/pdf/ADA079807.pdf)
12. Finally, we create the variable `RVAL`, which stores the test statistic, p-value, name of the test, and the name of the variable `x`.
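The steps above can also be followed outside R. Below is a sketch that translates the nortest source into Python, step by step, assuming NumPy and SciPy are available; the function name `ad_test_sketch` is my own, not part of any library:

```python
import numpy as np
from scipy.stats import norm

def ad_test_sketch(x):
    """Sketch of nortest::ad.test translated to Python (constants copied from the R source)."""
    x = np.asarray(x, dtype=float)
    x = np.sort(x[~np.isnan(x)])           # complete cases, sorted
    n = len(x)
    if n < 8:
        raise ValueError("sample size must be greater than 7")
    z = (x - x.mean()) / x.std(ddof=1)     # standardize; R's sd() divides by n-1
    logp1 = norm.logcdf(z)                 # log P(Z <= z)
    logp2 = norm.logcdf(-z)                # mirror image: log P(Z >= z)
    h = (2 * np.arange(1, n + 1) - 1) * (logp1 + logp2[::-1])
    A = -n - h.mean()                      # the A statistic
    AA = (1 + 0.75 / n + 2.25 / n**2) * A  # small-sample adjustment
    if AA < 0.2:
        pval = 1 - np.exp(-13.436 + 101.14 * AA - 223.73 * AA**2)
    elif AA < 0.34:
        pval = 1 - np.exp(-8.318 + 42.796 * AA - 59.938 * AA**2)
    elif AA < 0.6:
        pval = np.exp(0.9177 - 4.279 * AA - 1.38 * AA**2)
    elif AA < 10:
        pval = np.exp(1.2937 - 5.709 * AA + 0.0186 * AA**2)
    else:
        pval = 3.7e-24
    return A, pval
```

As a sanity check, the `A` value this produces should agree with the statistic reported by `scipy.stats.anderson(x, dist='norm')`, which implements the same unadjusted computation.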

As with the Shapiro-Wilk Test, a question arises regarding the sample size requirement: why must we have a sample size of at least 8? Let’s seek assistance from GPT-4.0 on this matter.

**Statistical Power and Precision:** Smaller sample sizes tend to provide less reliable estimates of a population’s properties due to higher variability. In the context of the Anderson-Darling test, a sample size of less than 8 might not provide enough data to accurately assess the distribution’s deviation from normality. Larger sample sizes generally provide more statistical power and precision in detecting departures from the normal distribution.

**Test Calibration:** The Anderson-Darling test involves comparing the calculated test statistic against critical values or using it to calculate a p-value. These critical values and p-value calculations are based on the assumption of a sufficiently large sample size. With very small samples, the distribution of the test statistic can deviate significantly from what the test assumes, leading to inaccurate results.

**Empirical Basis:** The cutoff of 8 is likely based on empirical studies and simulations that have shown the test to perform adequately with this minimum sample size. Statistical tests often have such guidelines, established through research and practical application, to ensure their reliability.

**Handling Extreme Values and Tails of the Distribution:** The Anderson-Darling test places more emphasis on the tails of the distribution compared to other normality tests. With too few data points, it becomes difficult to adequately assess the behavior in the tails of the distribution, which is crucial for this test.

I guess it is wise to heed the advice of the experts who have done countless simulations.

Let’s generate some random data as an example:

**One-Sample Test (Prepare arbitrary data)**

```
set.seed(98765)
# Generate normally distributed data
mean_value <- 50
std_dev <- 10
sample_size <- 1000
# Create a normal distribution
data_normal <- rnorm(sample_size, mean=mean_value, sd=std_dev)
random_noise <- runif(sample_size, min=-5, max=5) # Uniform distribution noise
x <- data_normal + random_noise
```

We then execute the test, assuming your data is stored in the `x` variable. We want to check whether the created dataset follows a normal distribution.

`ad.test(x)`

Result: the output reports the ‘A’ statistic and the p-value. Let’s study the ‘A’ value a little.
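For readers working in Python, an analogous check can be sketched with SciPy. Note two assumptions: `scipy.stats.anderson` reports the statistic with critical values rather than a p-value, and NumPy's random generator will not reproduce the exact draws from R's `set.seed(98765)`:

```python
import numpy as np
from scipy.stats import anderson

rng = np.random.default_rng(98765)             # seed chosen to echo the R example
data_normal = rng.normal(loc=50, scale=10, size=1000)
random_noise = rng.uniform(-5, 5, size=1000)   # uniform noise, as in the R snippet
x = data_normal + random_noise

res = anderson(x, dist='norm')
print("A =", res.statistic)
print("critical values:", res.critical_values)  # at the 15%, 10%, 5%, 2.5%, 1% levels
```

Instead of a p-value, you compare the statistic against the critical value at your chosen significance level: a statistic above the 5% critical value would lead you to reject normality at that level.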

Unlike the Kolmogorov-Smirnov Test, where the empirical distribution is calculated through step-wise accumulation (for more information, please refer to this article: https://readmedium.com/hypothesis-testing-1b-kolmogorov-smirnow-test-k-s-test-cef227e525ec), the A-D Test constructs its empirical distribution by standardizing the data points from the original dataset and then mapping them onto a normal distribution under the assumption of normality.

We then generate a perfectly modeled normal distribution based on the mean and standard deviation of the data. Since these functions are cumulative distribution functions, we aim to find their integral output — the area under the curve. We calculate the difference between the areas under the empirical distribution curve and the perfect normal distribution curve and square this difference. This calculation is equivalent to observing the variance between both models — the empirical distribution based on the original data and the perfectly modeled normal distribution.

We then derive the weight function from the perfectly modeled normal distribution, which is based on the mean and standard deviation of the original data. This calculation enables the test statistic to assign greater weight to the tails of the distribution. For example, a significant deviation from the empirical distribution to the perfectly modeled normal distribution at the left tail has a greater impact on the test statistic compared to a deviation in the middle of the distribution.
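In symbols, the quantity described above is the Anderson-Darling statistic, where $F_n$ is the empirical distribution function, $F$ is the fitted normal CDF, and the denominator is the weight function that inflates deviations in the tails:

```
A^2 = n \int_{-\infty}^{\infty} \frac{\left[F_n(x) - F(x)\right]^2}{F(x)\left(1 - F(x)\right)} \, dF(x)
```

Near the centre of the distribution $F(x)(1 - F(x))$ is close to its maximum of 0.25, while in either tail it approaches zero, so the same squared deviation contributes far more to $A^2$ when it occurs in a tail.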

In summary for this section, our objective is to determine the ratio of the variance between the empirical distribution and the perfectly modeled normal distribution to the weight function, depending on the position of `x`. For instance, when `x` is at the left tail, the weight function will be adjusted to focus more on the left tail compared to when `x` is in the middle of the distribution. Therefore, if there is a high variance when `x` is at the left tail, and considering that a weight function is applied, its contribution to the test statistic `A²` will be significantly higher compared to a test statistic without the weight function.
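To see this tail emphasis numerically, here is a small Python sketch evaluating the weight function 1/(F(x)(1 - F(x))) at the centre versus a tail point; the helper name `ad_weight` is my own:

```python
from scipy.stats import norm

def ad_weight(z):
    """The A-D weight 1/(F(z) * (1 - F(z))) for a standard normal CDF F."""
    F = norm.cdf(z)
    return 1.0 / (F * (1.0 - F))

print(ad_weight(0.0))  # centre of the distribution: 1 / (0.5 * 0.5) = 4
print(ad_weight(3.0))  # deep in the tail: far larger
```

The weight is 4 at the centre and grows without bound toward either tail, which is exactly why tail deviations dominate the statistic.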

This is the logic for the A-D Test.

Interestingly, the p-value is calculated from approximations fitted to empirical simulations conducted over the years. Rather than consulting a dedicated ‘Anderson-Darling’ distribution table, the adjusted test statistic is plugged directly into polynomial formulas to determine the p-value, as demonstrated in the code below.

```
if (AA < 0.2) {
    pval <- 1 - exp(-13.436 + 101.14 * AA - 223.73 * AA^2)
}
else if (AA < 0.34) {
    pval <- 1 - exp(-8.318 + 42.796 * AA - 59.938 * AA^2)
}
else if (AA < 0.6) {
    pval <- exp(0.9177 - 4.279 * AA - 1.38 * AA^2)
}
else if (AA < 10) {
    pval <- exp(1.2937 - 5.709 * AA + 0.0186 * AA^2)
}
else pval <- 3.7e-24
```

Unfortunately, decoding years of simulations conducted by statisticians would be a lengthy process. For now, we must accept these constants as they are. However, by examining the coding of the test in R, we can observe that when the adjusted test statistic `AA` is less than 0.34, the p-value is derived as one minus the exponential of a polynomial in `AA`. Conversely, when `AA` is 0.34 or more, we obtain the p-value directly from the exponential of a polynomial in `AA`.
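Evaluating this piecewise mapping at a few values of `AA` makes its behaviour concrete. The sketch below copies the constants verbatim from the R source; the function name `ad_pvalue` is mine:

```python
import math

def ad_pvalue(AA):
    """Piecewise p-value mapping used by nortest::ad.test (constants from the R source)."""
    if AA < 0.2:
        return 1 - math.exp(-13.436 + 101.14 * AA - 223.73 * AA**2)
    elif AA < 0.34:
        return 1 - math.exp(-8.318 + 42.796 * AA - 59.938 * AA**2)
    elif AA < 0.6:
        return math.exp(0.9177 - 4.279 * AA - 1.38 * AA**2)
    elif AA < 10:
        return math.exp(1.2937 - 5.709 * AA + 0.0186 * AA**2)
    return 3.7e-24

# p-value falls steadily as the adjusted statistic grows
for aa in (0.1, 0.3, 0.5, 1.0, 2.0):
    print(aa, ad_pvalue(aa))
```

Despite the four separate formulas, the mapping decreases smoothly across the branch boundaries, which is what allows the test to report a continuous p-value from a purely empirical fit.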

**p-value Interpretation**

Null Hypothesis (H0): The data follows the normal distribution. This hypothesis posits that there is no significant difference between the observed data and the expected values under the normal distribution.

Alternative Hypothesis (H1): The data does not follow the normal distribution. Under this hypothesis, it is assumed that there are significant deviations in the data from what would be expected if it were drawn from the normal distribution.

A higher test statistic generally leads to a smaller p-value. In this context, a high test statistic that results in a p-value of 0.05 or lower typically warrants the rejection of the null hypothesis. On the other hand, lower test statistics, which correspond to p-values greater than 0.05, provide insufficient evidence to reject the null hypothesis. It’s important to note that the Anderson-Darling Test places additional emphasis on the tails of the normal distribution. This means that even slight deviations of the empirical distribution from the ideal normal distribution at the tails can lead to higher test statistics, and consequently, smaller p-values are more common. This often results in rejecting the null hypothesis in favour of the alternative, suggesting that the data does not follow a normal distribution.

The rationale behind the test’s design, particularly the emphasis on the tails of the distribution, invites discussion. It’s assumed that events occurring in the tails are more likely to be rare or chance occurrences. Consequently, deviations in these areas are deemed more significant than those in the middle of the distribution, where data is more abundant. Essentially, we attribute greater significance to unusual events at the tails because they are less likely to happen by chance, making any deviation there more notable.

But is this assumption valid? Considering that these tail events are chance occurrences, one could argue that they are inherently random and thus more prone to deviations compared to the middle of the distribution. Given their susceptibility to variability, it might be expected that chance events would deviate more than those in the more stable, central part of the distribution. So, why then do we place greater emphasis on these tail events? The higher weighting may not necessarily reflect an increased importance but rather a heightened sensitivity to detect deviations from the expected norm.

There are two considerations that warrant further scrutiny in the context of the Anderson-Darling test. Firstly, the test does not rely on a standard, universally applicable distribution for analyzing the data. Instead, the distribution used for determining the p-value is the result of empirical studies and simulations conducted over many years. Secondly, it’s debatable whether it is necessary to place heightened sensitivity on detecting deviations at the tails of the distribution. While these deviations are deemed significant in the Anderson-Darling test, it raises the question of whether such emphasis on the tails is always justified, especially considering that extreme values might not always represent more crucial or informative deviations from the expected norm.

Daniel started off his career as a senior list researcher with a British publishing firm. Back then, his role involved contact sourcing through the internet and performing data entry into the Microsoft Dynamics CRM system (Microsoft Dynamics CRM 3.0). Progressively, he explored the option of using Visual Basic scripting within Excel to automate the contact sourcing process.

He successfully developed and implemented the scripts, leading to a 95% increase in data entry efficiency. He then moved on to take on the role of a CRM executive with Fuji Xerox Singapore.

As a CRM executive, he liaised with third-party vendors for technical enhancement of the CRM system (Microsoft Dynamics CRM 4.0 and 365). He also performed functional enhancement of the CRM system for hundreds of end users.

His notable achievement was the development of a CRM bot that led to a 98% improvement in data quality and data integrity in the CRM system. Following his Master’s studies in Consumer Insight with Nanyang Business School, he took on the role of an Analytics instructor with Singapore Management University. He prepared class notes and technical walkthroughs, and taught Analytics to undergraduate students from various disciplines. Subsequently, he took on various roles as a consultant in the consultancy, manufacturing and information technology industries in Singapore.

He travelled to Paris, London, Sri Lanka, Japan and Malaysia to fulfill his role as a consultant. The cultural and professional exchanges between local and overseas data analytics professionals gave him a very good overview of the expectations and motivations of people around the world. He also had a chance to relocate to the United States for one year, focusing particularly on Operations Management.

Prior to his current freelance status, he took on the role of Data Science Lead in a Singaporean software company. His primary role was to develop Artificial Intelligence using logic, data science and machine learning techniques through in-depth, full-stack scripting. He also developed customized reporting for his customers. In his view, 95% of today’s reporting can be automated, which can free up staff from daily manual work.

He holds a Bachelor of Science in Marketing (BSc Marketing, Pass with Merit) from Singapore University of Social Sciences (where he graduated as Valedictorian), a Master of Science in Marketing and Consumer Insights (MSc Marketing and Consumer Insights) from Nanyang Technological University, and a Doctor of Business Administration (DBA) from the Swiss School of Business and Management.