Is There A Statistical Method To Test A Claim?

Use Data Science To Test Your Hypothesis

In this article, I will be explaining the core concepts of hypothesis analysis.

Hypothesis analysis helps researchers attain deeper insight about their data. Consequently, it allows them to make better decisions as they are backed by a set of mathematically calculated measures.

Article Aim

Hypothesis analysis is a well-known concept and is used extensively by researchers, statisticians and quantitative analysts.

It allows us to follow a set of formal steps to perform a calculated analysis of our data. It is also widely used in the machine learning and data science world.

We can use Hypothesis Analysis to formally test our hypothesis.


Photo by Maarten van den Heuvel on Unsplash

What Are The Formal Statistical Steps?

1. State The Hypothesis

2. Collect Samples

3. Choose Alpha

4. Choose Test Tail

5. Choose Test Statistics

6. Calculate Test Statistics

7. State The Claim

Let’s Understand The Steps

Step 1. State the Hypothesis — Null & Alternative

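In practice, two hypotheses are stated about the data:

One which is assumed to be true unless the data provide strong evidence against it; known as the Null Hypothesis (H0)
One which contradicts the Null Hypothesis and is the claim we are looking for evidence to support; known as the Alternative Hypothesis (Ha)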

It is essential to ensure that both the Null and Alternative hypotheses are quantifiable so that they can be measured during the verification stage.

Importantly, the null and alternative hypotheses cannot both be true at the same time. Hence the null and alternative hypotheses are mutually exclusive.

Step 2. Gather The Sample To Represent Population

We are often required to assess and make a judgement about a population of data. As testing every observation in a population is often impractical or impossible, a representative sample is chosen instead.

The sample is chosen such that it is the best possible representation of the population under test. The success of hypothesis analysis depends on the quality of the chosen sample.
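As a rough illustration of drawing a sample from a population, the sketch below takes a simple random sample, which is only one of several possible sampling schemes. The population of job durations is made up for the example and generated with a fixed seed.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: durations (in minutes) of 10,000 batch jobs.
population = [rng.gauss(5.0, 1.0) for _ in range(10_000)]

# Simple random sample: every job has the same chance of being picked.
sample = rng.sample(population, k=30)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(f"population mean={population_mean:.2f}, sample mean={sample_mean:.2f}")
```

The sample mean will usually be close to the population mean but not identical to it, which is why the later steps quantify how much difference can be put down to chance.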

Sample Has A Probability Distribution

A number of measures can be calculated once a sample is collected.

For instance, mean, variance, kurtosis, skewness and standard deviation are a common set of measures which can be calculated on a sample.

A sample can be thought of as a random variable having its own probability distribution, patterns and trends.

We can collect a number of samples and work out their means, standard deviations and variances to gain better insight into the data.

Mean of a sample is the sum of all observed values in the sample divided by the number of observations in the sample. It is the first moment.
Variance of a sample tells a statistician about the dispersion of the random variable around its mean. It is the second central moment. When calculating the variance, the denominator is chosen to be the sample size minus 1 (n - 1) to ensure that the estimate is unbiased.
Standard Deviation is the square root of the variance of the sample.
Standard Error is the standard deviation of a sample statistic across repeated samples. For the sample mean, it is estimated as the sample standard deviation divided by the square root of the sample size.
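The measures listed above can be computed with Python's standard library. This is a minimal sketch on made-up numbers; the standard error shown is the standard error of the mean, taken here as the sample standard deviation divided by the square root of the sample size.

```python
import math
import statistics

# Hypothetical batch-job durations in hours (made-up numbers for illustration).
sample = [5.8, 6.1, 5.9, 6.4, 6.0, 5.7, 6.2, 6.3]

n = len(sample)
mean = statistics.mean(sample)          # sum of values / number of observations
variance = statistics.variance(sample)  # sample variance, divides by n - 1
std_dev = statistics.stdev(sample)      # square root of the sample variance
std_error = std_dev / math.sqrt(n)      # standard error of the mean

print(f"n={n} mean={mean:.3f} variance={variance:.4f} "
      f"std_dev={std_dev:.4f} std_error={std_error:.4f}")
```

Note that `statistics.variance` already uses the n - 1 denominator, whereas `statistics.pvariance` divides by n and is meant for a full population.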

Step 3: Let’s consider a valid level of significance — Alpha value

What is Alpha?

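Alpha is the level of significance in Hypothesis analysis. To elaborate, Alpha is the probability of rejecting the Null Hypothesis when it is actually true that we are willing to tolerate.

It is the threshold for the decision: the Null Hypothesis is rejected only when a result at least as extreme as the observed one would occur with probability below Alpha if the Null Hypothesis were true.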

The level of significance can be 1% or 5% for example.
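One way to make the level of significance concrete is to turn it into a cut-off value. The sketch below assumes the test statistic follows a standard normal distribution and that only the upper end of the distribution matters; the function name `upper_tail_cutoff` is just an illustrative choice.

```python
from statistics import NormalDist

def upper_tail_cutoff(alpha):
    """z value with a fraction `alpha` of the standard normal curve above it."""
    return NormalDist().inv_cdf(1 - alpha)

for alpha in (0.05, 0.01):
    print(f"alpha={alpha:.2f} -> cut-off z={upper_tail_cutoff(alpha):.3f}")
```

A smaller level of significance pushes the cut-off further out (roughly 1.645 at 5% and 2.326 at 1%), so stronger evidence is needed before the Null Hypothesis is rejected.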

Step 4: Is your test 1 tail or 2 tail

Alternative Hypothesis can take two forms:

One tail or Two tail

One tail Alternative Hypothesis Test:

One tail Alternative Hypothesis tests are uni-directional tests. For instance, let’s assume that you are an investor and want to test if the returns of the construction sector are greater than the returns of the pharmaceutical sector so that you can make an informed decision before you invest your millions.

Your test is one directional as it only asks whether the returns of one sector are greater than the returns of the other, and not whether they are smaller.

Two tail Alternative Hypothesis Test:

Two tail alternative hypothesis tests are bidirectional tests, where a statistician is interested in checking whether a value is equal to a stated value or not.

The results of the test can move in either direction.

For example, assume the Null Hypothesis states that on average, a batch job in an IT system takes 5 minutes to complete. On the other hand, Alternative Hypothesis can be that on average, a batch job in the IT system does not take 5 minutes.

Hence average time can move in either direction.

You might find that it takes on average 6 minutes or 4 minutes for the job to complete.
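The practical difference between the two forms shows up in how the p-value is computed from the same test statistic. The following sketch assumes a z statistic that follows a standard normal distribution; the function name, the tail labels and the value z = 1.8 are all hypothetical.

```python
from statistics import NormalDist

def p_value(z, tail):
    """Return the p-value of a z statistic for the given kind of test."""
    cdf = NormalDist().cdf(z)
    if tail == "greater":      # one tail: Ha says the value is larger
        return 1 - cdf
    if tail == "less":         # one tail: Ha says the value is smaller
        return cdf
    if tail == "two-sided":    # two tail: Ha says the value is different
        return 2 * min(cdf, 1 - cdf)
    raise ValueError(f"unknown tail: {tail}")

z = 1.8  # hypothetical test statistic
print(f"one tail: {p_value(z, 'greater'):.4f}")
print(f"two tail: {p_value(z, 'two-sided'):.4f}")
```

With z = 1.8 the one tail p-value is about 0.036 and the two tail p-value is about 0.072, so at a 5% level of significance the same data would reject the Null Hypothesis in the one tail test but not in the two tail test.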

Step 5: Select Appropriate Statistics: T vs Z vs CHI vs F

A set of questions can be asked to figure out an appropriate test statistic:

Is the data made up of frequencies (counts of observations per category)? If so, use the chi-square test.
Is the population variance known? If the answer is Yes then use the Z statistic, otherwise use the Student T statistic.
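The two questions above can be written down as a small decision helper. This is only a sketch of the rule of thumb as stated in this article; the function and parameter names are made up, and real-world test selection usually involves further checks such as sample size and distribution shape.

```python
def choose_test_statistic(is_frequency_data, population_variance_known):
    """Map the two questions above to the name of a test statistic."""
    if is_frequency_data:
        return "chi-square"
    if population_variance_known:
        return "z"
    return "t"

print(choose_test_statistic(is_frequency_data=False, population_variance_known=True))
```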

Each test statistic has its own formula, which I have explained in my other blog.

Step 6: Calculate The Test Statistics

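Based on the test statistic chosen in step 5, apply the formula and calculate the value. Then compare the value with the critical value implied by the level of significance or, equivalently, compare its p-value with the level of significance.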
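As an illustration of this step, here is a sketch of the Z statistic for a single mean, using the usual formula z = (sample mean - hypothesised mean) / (standard deviation / sqrt(n)). The numbers are hypothetical and loosely follow the 5-minute batch job example; the test is two tailed at a 5% level of significance.

```python
import math
from statistics import NormalDist

def z_statistic(sample_mean, hypothesised_mean, std_dev, n):
    """Distance of the sample mean from the hypothesised mean, in standard errors."""
    standard_error = std_dev / math.sqrt(n)
    return (sample_mean - hypothesised_mean) / standard_error

# Hypothetical figures: 36 jobs averaging 5.4 minutes, known std dev 1.2 minutes.
z = z_statistic(sample_mean=5.4, hypothesised_mean=5.0, std_dev=1.2, n=36)

alpha = 0.05
critical_z = NormalDist().inv_cdf(1 - alpha / 2)  # two tail cut-off

print(f"z={z:.2f}, critical value=+/-{critical_z:.2f}")
print("Reject H0" if abs(z) > critical_z else "Fail to reject H0")
```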

If you want to understand how to measure the test statistics:

Statistical Hypothesis Testing


Step 7: State Decision

Based on the results of the calculation in step 6, state whether the Null Hypothesis is rejected or not rejected.

This set of steps depends on the sample that was chosen and on how good the tests were.

This implies that there is always a chance that an error was made. For example, the tests could end up rejecting the Null Hypothesis when it is right, or could fail to reject the Null Hypothesis when the Alternative Hypothesis is right.

There Can Be Errors

Types Of Errors:

In Hypothesis Analysis, there are two types of errors:

Type 1 And Type 2

Type 1 error: Null Hypothesis was correct but the analysis rejected it. The probability of this error is the level of significance, Alpha.
Type 2 error: Null Hypothesis was wrong but the analysis failed to reject it.
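A small simulation can show what a Type 1 error looks like in practice. The sketch below repeatedly draws samples from a population where the Null Hypothesis really is true (mean 0, known standard deviation 1) and counts how often a one tail Z test at the chosen level of significance rejects it anyway. The function name, trial count and seed are arbitrary choices for the example.

```python
import math
import random
from statistics import NormalDist

def type1_error_rate(alpha, trials=5_000, n=30, seed=0):
    """Estimate how often a true Null Hypothesis is rejected."""
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(1 - alpha)  # one tail critical value
    rejections = 0
    for _ in range(trials):
        # The Null Hypothesis is true here: the data really have mean 0, std dev 1.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = sample_mean / (1 / math.sqrt(n))
        if z > cutoff:
            rejections += 1
    return rejections / trials

print(f"estimated Type 1 error rate: {type1_error_rate(alpha=0.05):.3f}")
```

The estimated rate should land close to the chosen level of significance, since that level is by construction the probability of rejecting a Null Hypothesis that is true.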

Hypothesis Analysis Explained With An Example

Let’s assume you are an IT manager in a hedge fund. One of your critical systems runs an overnight batch and it has slowed down significantly.

The batch now takes on average 12 hours to complete each day. The support team has flagged the issue and you are looking for alternatives to the current IT system.

As there is a cost associated with running batches to test the hypothesis, the IT management concludes that it only makes sense to replace the existing framework with a new framework if the new framework ensures that, on average, each batch job completes in less than 6 hours.

This implies that if the test concludes that a job takes longer than 6 hours on average then the management will not accept the new IT framework.

An external consultancy contacts you and offers their framework, which they claim would ensure that on average each batch job completes in 6 hours.

Before you accept it blindly, you decide to test the hypothesis. You get the framework installed on a test environment. You then decide to run a sample of jobs; some at night and some in the mornings.

Test

A sample of 30 batch jobs is chosen. Let x be the time of a batch job in the sample.

Null Hypothesis: Mean job time is less than or equal to 6 hours
Alternative Hypothesis: Mean job time is greater than 6 hours
You can see that your Alternative Hypothesis is one tailed, as you are only interested in whether the mean job time turns out to be greater than 6 hours.

You then run the 30 batch jobs and calculate the mean and variance of your sample. With a sample of 30 jobs, the sample variance is commonly treated as a reasonable estimate of the population variance, so you can test using the Z statistic. There is always room for error, and the amount you are prepared to tolerate is the level of significance. You decide that the level of significance is 1%, so you will reject the Null Hypothesis only if the calculated Z value falls in the most extreme 1% of the distribution, which here is the upper tail.

Perform the Z statistic calculation, which has a well-known formula
State your decision
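To see the last two steps end to end, here is a sketch of the batch job test, taking the Null Hypothesis as a mean job time of at most 6 hours and the Alternative Hypothesis as a mean job time greater than 6 hours. The article does not give the measured job times, so the 30 durations below are entirely made up, and the sample standard deviation is assumed to stand in for the population standard deviation.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical completion times (hours) for the 30 test jobs; made-up data.
job_hours = [
    6.2, 6.5, 5.9, 6.8, 6.1, 6.4, 6.0, 6.7, 6.3, 6.6,
    5.8, 6.9, 6.2, 6.4, 6.1, 6.5, 6.3, 6.0, 6.7, 6.2,
    6.4, 6.6, 5.9, 6.3, 6.5, 6.1, 6.8, 6.2, 6.4, 6.3,
]

claimed_mean = 6.0  # H0: mean job time <= 6 hours, Ha: mean job time > 6 hours
alpha = 0.01        # level of significance chosen in the example

n = len(job_hours)
sample_mean = mean(job_hours)
# Assumption: with n = 30 the sample standard deviation is used in place of
# the population standard deviation, so a Z statistic is calculated.
standard_error = stdev(job_hours) / math.sqrt(n)
z = (sample_mean - claimed_mean) / standard_error

critical_z = NormalDist().inv_cdf(1 - alpha)  # upper tail cut-off
p_value = 1 - NormalDist().cdf(z)
reject_null = z > critical_z

print(f"mean={sample_mean:.2f}h z={z:.2f} critical={critical_z:.2f} p={p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

With these made-up numbers the sample mean is about 6.34 hours and the Z value is well above the 1% cut-off, so the Null Hypothesis would be rejected and the new framework would not be accepted. A stricter variant would use the Student T statistic with 29 degrees of freedom, since the population variance is not actually known.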

This set of easy-to-follow steps can be used to decide whether the data support a hypothesis or not. It helps one make informed, risk-averse decisions.

Summary

The article highlighted the concept of Hypothesis Analysis which is used in a number of fields including risk management, finance, stats and artificial intelligence.

Furthermore, it helps researchers gain better insight into the data. Hope it helps.