Farhad Malik

Is There A Statistical Method To Test A Claim?

Use Data Science To Test Your Hypothesis

In this article, I will be explaining the core concepts of hypothesis analysis.

Hypothesis analysis helps researchers gain deeper insight into their data. Consequently, it allows them to make better decisions, because those decisions are backed by a set of mathematically calculated measures.

Article Aim

Hypothesis analysis is a well-known concept and is used extensively by researchers, statisticians and quantitative analysts.

It allows us to follow a set of formal steps to perform a calculated analysis of our data. It is also widely used in the machine learning and data science world.

We can use hypothesis analysis to formally test our hypothesis.

What Are The Formal Statistical Steps?

  1. State The Hypothesis
  2. Collect Samples
  3. Choose Alpha
  4. Choose Test Tail
  5. Choose Test Statistics
  6. Calculate Test Statistics
  7. State The Decision

Let’s Understand The Steps

Step 1. State the Hypothesis — Null & Alternative

In practice, two competing hypotheses are stated about the data:

  • The default assumption, which is presumed to be true until the data provides strong evidence against it; known as the Null Hypothesis (H0)
  • The competing claim, which is accepted only if the data provides enough evidence to reject the null; known as the Alternative Hypothesis (Ha)

It is essential to ensure that both the null and alternative hypotheses are quantifiable so that they can be measured during the verification stage.

Importantly, the null and alternative hypotheses cannot both be true at the same time. Hence the null and alternative hypotheses are mutually exclusive.

Step 2. Gather The Sample To Represent Population

We are often required to assess and make a judgement about a population of data. As testing every observation in a population is usually impractical and often impossible, a representative sample is chosen instead.

The sample is chosen such that it is the best available representation of the population under test. The success of hypothesis analysis depends on the quality of the chosen sample.

Sample Has A Probability Distribution

A number of measures can be calculated once a sample is collected.

For instance, the mean, variance, kurtosis, skewness and standard deviation are common measures which can be calculated on a sample.

A statistic computed from a sample, such as the sample mean, can be thought of as a random variable with its own probability distribution, because its value changes from one sample to the next.

We can collect a number of samples and work out their means, standard deviations and variances to gain better insight into the data.

  • Mean of a sample is the sum of all observed values in the sample divided by the number of observations in the sample. It is the first moment.
  • Variance of a sample tells a statistician about the dispersion of the observations around their mean. It is the second central moment. When calculating the sample variance, the denominator is chosen to be the size of the sample minus 1 (n - 1) to ensure that the calculated value is an unbiased estimate of the population variance.
  • Standard Deviation is the square root of the variance of the sample.
  • Standard Error is the standard deviation of a sample statistic across repeated samples. For the sample mean, it is estimated as the sample standard deviation divided by the square root of the sample size.
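As a minimal sketch, the measures listed above can be computed with Python's standard library. The run times below are hypothetical values invented for illustration; they are not data from this article.

```python
import math
import statistics

# Hypothetical batch-job run times in hours (illustrative values only).
sample = [5.8, 6.1, 6.4, 5.9, 6.3, 6.0, 6.2, 5.7]

n = len(sample)
mean = statistics.fmean(sample)          # sum of observations / n
variance = statistics.variance(sample)   # sample variance, divides by n - 1
std_dev = statistics.stdev(sample)       # square root of the sample variance
std_error = std_dev / math.sqrt(n)       # estimated standard error of the mean

print(f"n={n} mean={mean:.3f} variance={variance:.3f} "
      f"std_dev={std_dev:.3f} std_error={std_error:.3f}")
```

Note that `statistics.variance` uses the n - 1 denominator, while `statistics.pvariance` divides by n and is meant for a full population.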

Step 3: Let’s consider a valid level of significance — Alpha value

What is Alpha?

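Alpha is the level of significance in hypothesis analysis. To elaborate, alpha is the probability of rejecting the Null Hypothesis when it is actually true that you are willing to tolerate.

It acts as the decision threshold: the Null Hypothesis is rejected only when a result at least as extreme as the one observed would occur with a probability below alpha if the Null Hypothesis were true.

The level of significance can be 1% or 5%, for example.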

Step 4: Is your test 1 tail or 2 tail

Alternative Hypothesis can take two forms:

One tail or Two tail

One tail Alternative Hypothesis test:

One tail alternative hypothesis tests are uni-directional tests. For instance, let’s assume that you are an investor and want to test whether the returns of the construction sector are greater than the returns of the pharmaceutical sector, so that you can make a conscious decision before you invest your millions.

Your test is one directional because you only care whether one sector's returns are greater than the other's, not whether they differ in either direction.

Two tail Alternative Hypothesis test:

Two tail alternative hypothesis tests are bidirectional tests: the statistician is interested in checking whether a value differs from the hypothesized one, in either direction.

The results of the test can move in either direction.

For example, assume the Null Hypothesis states that on average, a batch job in an IT system takes 5 minutes to complete. On the other hand, Alternative Hypothesis can be that on average, a batch job in the IT system does not take 5 minutes.

Hence average time can move in either direction.

You might find that it takes on average 6 minutes or 4 minutes for the job to complete.
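To see how the choice of tail changes the rejection threshold, here is a small sketch that computes the critical values of a standard normal (Z) statistic. The 5% level of significance is chosen purely for illustration, and the use of the Z statistic is an assumption made for the example.

```python
from statistics import NormalDist

alpha = 0.05                 # illustrative level of significance
std_normal = NormalDist()    # standard normal: mean 0, standard deviation 1

# One tail: all of alpha sits in a single tail of the distribution.
one_tail_critical = std_normal.inv_cdf(1 - alpha)

# Two tail: alpha is split equally between the lower and upper tails.
two_tail_critical = std_normal.inv_cdf(1 - alpha / 2)

print(f"one tail: reject if z > {one_tail_critical:.3f}")
print(f"two tail: reject if |z| > {two_tail_critical:.3f}")
```

Because a two tail test splits alpha across both tails, its critical value is further from zero than the one tail critical value at the same level of significance.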

Step 5: Select Appropriate Statistics: T vs Z vs CHI vs F

A set of questions can be asked to figure out an appropriate test statistic:

  • Is the data made up of frequencies (counts per category)? If so, then use the chi-square test.
  • Is the population variance known? If the answer is yes then use the Z statistic, otherwise use the Student T statistic.
  • Are you comparing the variances of two populations? If so, then use the F statistic.

Each of the test statistics has its own formula, which I have explained in my other blog.
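The two questions about frequencies and variance can be written down as a tiny decision helper. This is only a sketch of that rule of thumb; the function name and its parameters are hypothetical, and real test selection also depends on sample size and distributional assumptions.

```python
def choose_test_statistic(is_frequency_data: bool,
                          population_variance_known: bool) -> str:
    """Return the test statistic suggested by the two questions.

    Hypothetical helper for illustration only.
    """
    if is_frequency_data:
        # Counts per category are compared with expected counts.
        return "chi-square"
    if population_variance_known:
        return "Z"
    # Variance must be estimated from the sample itself.
    return "Student T"


print(choose_test_statistic(is_frequency_data=False,
                            population_variance_known=False))
```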

Step 6: Calculate The Test Statistics

Based on the test statistic chosen in step 5, apply the formula and calculate the value. Then compare the calculated value with the critical value implied by the level of significance, or equivalently compare its p-value with alpha.
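As an illustration of this step, the sketch below runs a two tail Z test on the earlier batch job example, where the Null Hypothesis is that a job takes 5 minutes on average. The job times and the population standard deviation are hypothetical values invented for the example.

```python
import math
from statistics import NormalDist, fmean

# Hypothetical job times in minutes (illustrative values only).
job_times = [5.2, 4.8, 5.5, 5.1, 4.9, 5.3, 5.0, 5.4]

hypothesized_mean = 5.0   # value stated by the Null Hypothesis
population_sigma = 0.3    # assumed known population standard deviation
alpha = 0.05              # illustrative level of significance

n = len(job_times)
sample_mean = fmean(job_times)

# Z statistic: distance of the sample mean from the hypothesized mean,
# measured in standard errors.
z = (sample_mean - hypothesized_mean) / (population_sigma / math.sqrt(n))

# Two tail p-value: probability of a result at least this extreme
# in either direction if the Null Hypothesis is true.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_null = p_value < alpha
print(f"z={z:.3f} p_value={p_value:.3f} reject_null={reject_null}")
```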

Step 7: State Decision

Based on the results of the calculation in step 6, state whether the Null Hypothesis is rejected or not rejected.

This set of steps depends on the sample that was chosen and on how good the tests were.

This implies that there is always a chance that an error was made. For example, the test could end up rejecting the Null Hypothesis when it is true, or could fail to reject the Null Hypothesis when the Alternative Hypothesis is true.

There Can Be Errors

Types Of Errors:

In Hypothesis Analysis, there are two types of errors:

Type 1 And Type 2

  1. Type 1 error: The Null Hypothesis was correct but the analysis rejected it
  2. Type 2 error: The Null Hypothesis was wrong but the analysis failed to reject it
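A short simulation can make the Type 1 error concrete. In the sketch below the Null Hypothesis is true by construction, yet a one tail Z test still rejects it in roughly alpha of the repeated experiments. All numbers (mean, standard deviation, sample size, number of trials) are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(42)  # fixed seed so the run is repeatable

alpha = 0.05
true_mean, sigma, n = 6.0, 1.0, 30            # Null Hypothesis is true here
critical = NormalDist().inv_cdf(1 - alpha)    # one tail (upper) critical value

trials = 5000
false_rejections = 0
for _ in range(trials):
    # Draw a sample from a population where the Null Hypothesis holds.
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    z = (fmean(sample) - true_mean) / (sigma / math.sqrt(n))
    if z > critical:
        false_rejections += 1                 # a Type 1 error

type1_rate = false_rejections / trials
print(f"observed Type 1 error rate: {type1_rate:.3f} (alpha = {alpha})")
```

Lowering alpha reduces the rate of Type 1 errors, but for a fixed sample size it makes Type 2 errors more likely.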

Hypothesis Analysis Explained With An Example

Let’s assume you are an IT manager in a hedge fund. One of your critical systems runs an overnight batch and it has slowed down significantly.

The batch now takes on average 12 hours to complete daily. The support team has flagged the problem and you are looking for alternatives to the current IT system.

As there is a cost associated with running batches to test the hypothesis, the IT management concludes that it only makes sense to replace the existing framework with the new framework if the new framework ensures that, on average, each batch job completes within 6 hours.

This implies that if the test concludes that a job takes longer than 6 hours then the management will not accept the new IT framework.

An external consultancy contacts you and offers its framework, claiming that on average each batch job would complete in 6 hours.

Rather than accepting the claim blindly, you decide to test the hypothesis. You get the framework installed on a test environment. You then decide to run a sample of jobs; some at night and some in the mornings.

Test

A sample of 30 batch jobs is chosen. Let x be time of a batch job in a sample.

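  • Null Hypothesis (H0): The mean job time under the new framework is less than or equal to 6 hours
  • Alternative Hypothesis (Ha): The mean job time under the new framework is greater than 6 hours
  • You can see that your Alternative Hypothesis is one tailed, as you only care whether the mean job time turns out to be greater than 6 hours.

You then run the 30 batch jobs and calculate the mean and variance of your sample. With 30 observations, the sample variance is commonly treated as a reasonable stand-in for the population variance, so you can test using the Z statistic; strictly speaking, the Student T statistic applies when the variance is estimated from the sample.

There is always room for error, and the level of significance sets how much of it you are willing to tolerate. You decide that the level of significance is 1%, so you will reject the Null Hypothesis only if the calculated statistic falls in the most extreme 1% of the upper tail, which for the Z statistic means a value above roughly 2.33.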

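  • Perform the Z statistic calculation: z = (sample mean - 6) / (standard deviation / √30)
  • State your decision: reject the Null Hypothesis if z is above the critical value, otherwise do not reject it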
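The test above can be sketched end to end in Python. The article does not give the measured results, so the sample mean and standard deviation below are hypothetical numbers used only to show the mechanics.

```python
import math
from statistics import NormalDist

n = 30                    # number of batch jobs in the sample
hypothesized_mean = 6.0   # hours, from the Null Hypothesis
alpha = 0.01              # level of significance chosen in the example

# Hypothetical sample results (not from the article).
sample_mean = 6.4         # hours
sample_std_dev = 1.2      # hours, used as a stand-in for the population value

# Z statistic for the sample mean.
z = (sample_mean - hypothesized_mean) / (sample_std_dev / math.sqrt(n))

# One tail (upper) critical value and p-value.
critical = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z)

reject_null = z > critical
print(f"z={z:.3f} critical={critical:.3f} p_value={p_value:.4f}")
print("Reject H0" if reject_null else "Do not reject H0")
```

With these made-up numbers the sample mean is above 6 hours, but not by enough to reject the Null Hypothesis at the 1% level. A different sample could lead to the opposite decision.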

This set of easy-to-follow steps can be used to assess whether the data supports a hypothesis or not. It helps one make conscious, risk-averse decisions.

Summary

The article highlighted the concept of hypothesis analysis, which is used in a number of fields including risk management, finance, statistics and artificial intelligence.

Furthermore, it helps researchers gain better insight into the data. Hope it helps.
