Dr. Daniel Koh



Explaining the p-value using apples in a basket

Most of us have come across the p-value at some point in our lives, either in school or while reading scientific articles. Researchers often use the p-value to decide whether a conjecture is true, and this conjecture is usually expressed as a hypothesis.

A hypothesis is nothing more than a guess, and we validate the truthfulness of a guess by challenging it. Imagine I have a basket full of red and green apples in a supermarket. They are all mixed up. I would like you to guess whether there are more green apples than red apples in that single basket. We do not know whether this guess is true (perhaps the room is dark or you are struggling to differentiate the colors). We need to validate the guess before we can accept it as true.

Researchers often use statistical methods to challenge the hypothesis, and one of the values they look out for is the p-value. A p-value tells us how likely it is that what we observe happened purely by chance. In the example of the apples in the supermarket, suppose we pick green apples nineteen times in a row and a red apple on the twentieth pick. We can then say that picking the red apple happened by chance. In this scenario, we ask ourselves why a red apple was not picked in the first nineteen picks. There must be so many green apples in the basket that a random pick is far more likely to be a green apple, and it is only by chance that we pick the red apple.
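To put a number on the nineteen-greens example, here is a minimal Python sketch. It assumes a balanced basket as the "nothing special is going on" baseline and assumes that every pick is independent (for instance, each apple is put back before the next pick). Both assumptions, and the function name `tail_probability`, are illustrative and not part of the explanation above. The sketch computes the chance of seeing nineteen or more green apples in twenty picks under that baseline, which is how a statistician would usually phrase the p-value for this guess.

```python
from math import comb


def tail_probability(greens_seen: int, picks: int, p_green: float = 0.5) -> float:
    """Chance of at least `greens_seen` green apples in `picks` independent picks.

    `p_green` is the assumed chance that a single pick is green;
    0.5 stands for a perfectly balanced basket (an assumption).
    """
    return sum(
        comb(picks, k) * p_green**k * (1 - p_green) ** (picks - k)
        for k in range(greens_seen, picks + 1)
    )


# Nineteen greens and one red out of twenty picks, balanced basket assumed.
p_value = tail_probability(greens_seen=19, picks=20)
print(f"Chance of 19 or more greens in 20 picks from a balanced basket: {p_value:.6f}")
```

Under these assumptions the chance is tiny, which is why such a result makes a balanced basket hard to believe.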

Notice that the ratio of green apples to red apples matters. To challenge our hypothesis, we reason as follows: since we picked a red apple only by chance, the number of green apples must be significantly greater than the number of red apples. Assuming that the ratio of apples in the basket represents all apples in the supermarket, we can say that there are significantly more green apples than red apples in the supermarket.

Put simply, the p-value is the likelihood of picking the red apple by chance, and this likelihood is very low when there are significantly more green apples than red apples in the basket. We will only accept a guess as true when a red apple is picked only once in all twenty rounds of picking. 1 out of 20 gives us 5%.
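The 5% figure is usually applied as a cut-off: a result counts as convincing only if a balanced basket would produce it less than 5% of the time. The sketch below, under the same illustrative assumptions of a balanced baseline and independent picks, scans the possible green counts in twenty picks and reports the smallest count that clears the cut-off. The names `ALPHA`, `chance_of_at_least` and `smallest_convincing_count` are hypothetical.

```python
from math import comb
from typing import Optional

ALPHA = 0.05  # the 1-in-20 cut-off
PICKS = 20


def chance_of_at_least(greens: int, picks: int = PICKS) -> float:
    """Chance of `greens` or more green apples when the basket is balanced.

    With a balanced basket, each of the 2**picks colour sequences is equally likely.
    """
    return sum(comb(picks, k) for k in range(greens, picks + 1)) / 2**picks


def smallest_convincing_count(picks: int = PICKS, alpha: float = ALPHA) -> Optional[int]:
    """Smallest green count whose chance under a balanced basket is below `alpha`."""
    for greens in range(picks + 1):
        if chance_of_at_least(greens, picks) < alpha:
            return greens
    return None  # no count is rare enough, e.g. when there are too few picks


for greens in range(13, PICKS + 1):
    print(f"{greens} or more greens: {chance_of_at_least(greens):.4f}")
print("Smallest convincing count:", smallest_convincing_count())
```

Read this way, the cut-off is a statement about how rare the whole result would be under a balanced basket, rather than about any single pick.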

There are many controversial arguments about this school of thought. Nonetheless, in this short article, I hope to help you understand what the p-value is, and why it matters to data science.

While data scientists use the p-value to determine the truthfulness of a hypothesis, we also need to know the environment in which the data was collected and the method of data collection, to ascertain whether the p-value is a valid measurement of the truthfulness of a hypothesis. At times, there are simply too many factors in the environment that might influence the picking of the apples. For example, the apples in the basket may have been placed there by someone who intentionally took more green apples than red apples, even though the ratio of apples in the supermarket is balanced. The p-value may affirm the hypothesis, but the basket is not a representation of all the apples in the supermarket.
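A small sketch can make the collection problem concrete. It assumes a balanced supermarket, a basket filled by someone who favoured green apples so that 80% of the basket is green (a made-up figure), independent picks, and a cut-off of 15 or more greens in 20 picks, which a balanced basket produces less than 5% of the time. All names and numbers here are hypothetical.

```python
from math import comb


def chance_of_convincing_result(p_green: float, picks: int = 20, threshold: int = 15) -> float:
    """Chance of `threshold` or more greens in `picks` independent picks,
    when a single pick is green with probability `p_green`."""
    return sum(
        comb(picks, k) * p_green**k * (1 - p_green) ** (picks - k)
        for k in range(threshold, picks + 1)
    )


SUPERMARKET_GREEN_SHARE = 0.5  # the supermarket is balanced
BASKET_GREEN_SHARE = 0.8       # hypothetical: the basket was filled with a bias toward green

fair = chance_of_convincing_result(SUPERMARKET_GREEN_SHARE)
biased = chance_of_convincing_result(BASKET_GREEN_SHARE)

print(f"Picking from a fair sample of the supermarket: {fair:.4f}")
print(f"Picking from the biased basket:                {biased:.4f}")
```

Under these assumptions the biased basket yields a "convincing" result most of the time, even though the supermarket itself is balanced. The calculation is sound for the basket; it is the step from basket to supermarket that fails.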

Data scientists need to avoid misusing the p-value. One approach is to work with other team members to understand the environment in which the data was collected and the method used to collect it. For this reason, a data scientist can never work alone: the valuable information given by team members can help the data scientist validate the use of the p-value. We need to be more careful and take a step back before using the p-value to decide the truthfulness of a hypothesis.

Note: “Now, some of you may question what I have written. I kindly ask my peers to allow me to explain the p-value in the simplest terms possible. I intend to help the layperson understand the significance of it. I acknowledge the existence of a distribution, with its left tail and right tail, and how a skew affects the truthfulness of the p-value. I prefer to leave this complex statistical explanation to a one-to-one conversation.”

(Source: https://danthescientist.net/training/283/)

Dr. Daniel Koh

Daniel started off his career as a senior list researcher with a British publishing firm. Back then, his role involved sourcing contacts through the internet and entering data into the Microsoft Dynamics CRM system (Microsoft Dynamics CRM 3.0). Progressively, he explored the option of using Visual Basic scripting within Excel to automate the contact sourcing process.

He successfully developed and implemented the scripts, leading to a 95% increase in data entry efficiency. He then moved on to take on the role of a CRM executive with Fuji Xerox Singapore.

As a CRM executive, he liaised with third-party vendors for technical enhancement of the CRM system (Microsoft Dynamics CRM 4.0 and 365). He also performed functional enhancement of the CRM system for hundreds of end users.

His notable achievement was the development of the CRM boy that led to a 98% improvement in data quality and data integrity in the CRM system. Following his Master's studies in Consumer Insight with Nanyang Business School, he took on the role of an Analytics instructor with Singapore Management University. He prepared class notes and technical walkthroughs, and taught Analytics to undergraduate students from various disciplines. Subsequently, he took on various consulting roles in the consultancy, manufacturing and information technology industries in Singapore.

He travelled to Paris, London, Sri Lanka, Japan and Malaysia to fulfill his role as a consultant. The cultural and professional exchanges between local and overseas data analytics gave him a very good overview of the expectations and motivations of people around the world. He also had a chance to relocate to the United States for one year, focusing particularly on Operations Management.

Prior to his current freelance status, he took on the role of Data Science Lead in a Singaporean software company. His primary role was to develop Artificial Intelligence using logic, data science and machine learning techniques through in-depth, full-stack scripting. He also developed customized reporting for his customers. In his view, 95% of today’s reporting can be automated, which can free up staff from daily manual work.

He holds a Bachelor of Science in Marketing (BSc. Marketing Pass with Merit) from Singapore University of Social Sciences (from which he graduated as valedictorian), a Master of Science in Marketing and Consumer Insights (MSc. Marketing and Consumer Insights) from Nanyang Technological University, and a Doctor of Business Administration (DBA) from Swiss School of Business and Management.

A Message from DataFrens…

Thanks for being a part of our community!

Do join us here at: https://www.datafrens.sg/

Read all our DataFrens articles here at: https://medium.com/datafrens-sg
