Why spend so much time talking about hypotheses?

Summary

The provided content discusses the importance of understanding hypotheses and their tests in data science, emphasizing the distinction between chance and pattern to uncover truths in data.

Abstract

The article delves into the significance of hypotheses in data science, equating it to the pursuit of understanding the order and logic inherent in nature. It underscores the necessity for data scientists to discern the role of chance and pattern in data, drawing an analogy with the random selection of apples from a basket. The author references the "Tenth Man Rule" from the movie "World War Z" to illustrate the value of challenging consensus opinions and considering alternative explanations. The piece also introduces Dr. Daniel Koh, detailing his professional journey and contributions to the field of data science, including his development of automated processes and his role as a consultant and educator. The article concludes with an invitation to join the DataFrens community and engage with their content.

Opinions

The author believes that data science, as a branch of science, is about discovering the inherent order and logic in nature, which is not completely chaotic.
There is an emphasis on the need for data scientists to understand the interplay between chance and pattern to accurately interpret data.
The "Tenth Man Rule" is presented as a metaphor for the importance of considering dissenting views in data analysis, suggesting that a consensus may not always reflect the truth.
The author suggests that recognizing patterns is not enough; one must also evaluate the likelihood of events, as demonstrated by the example of picking apples from a basket.
Dr. Daniel Koh's career is highlighted to showcase the practical applications of data science and the impact of automation and data analytics in various industries.
The article conveys an opinion that a significant portion of reporting tasks can be automated, thereby increasing efficiency and freeing up human resources for more complex tasks.
The author advocates for cultural and professional exchanges in data analytics to gain a global perspective on the field.
The piece concludes with a call to action for readers to engage with the DataFrens community, indicating the author's view on the value of community and shared learning in data science.

Why spend so much time talking about hypotheses?

A few months ago, I explained the p-value using the example of apples in the supermarket (https://readmedium.com/explaining-the-p-value-using-apples-in-a-basket-24117f2ac4be). Then I shared a segment of my doctoral dissertation explaining null and alternative hypotheses (https://readmedium.com/null-hypothesis-and-alternative-hypothesis-722a80feb6be). Why am I focusing all my efforts on explaining hypotheses and their tests?

Data science is science, and science is the discovery of nature. Nature has its own system and preset logic; nothing in nature is designed to be completely chaotic. There is order. When we think of order, we think of chance versus non-chance, or pattern. We cannot call an event a pattern when the series of activities behind it happened chaotically. For this reason, data scientists (we do call ourselves scientists, after all) need to understand the significance of chance and pattern in our daily lives.

In my previous two articles, I wrote about chance and non-chance. When we randomly pick apples from a basket full of red and green apples, and the ratio of red to green apples over a series of 20 picks is roughly 50:50, we attribute each pick to chance. But if the first nineteen picks are all red and only the twentieth is green, chance becomes a less likely explanation than pattern. (Picking red apples nineteen times in a row suggests that the basket holds more red apples than green ones. Hence, the outcome does not happen by chance alone.) However, we also need to ask how likely we are to pick a red apple on the twentieth pick. To say that a green or red apple is equally likely on the twentieth pick is to say that the pick is governed by chance, but that does not necessarily mean we observe the truth. The basket of apples could, for example, have changed overnight without the scientist’s knowledge.
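To put rough numbers on the apple example, here is a minimal sketch in Python. It rests on assumptions that are not stated in the article: the null hypothesis is a 50:50 basket, and every pick is independent (picks with replacement, or a basket large enough that removing an apple barely changes the ratio). The function name `prob_at_least_k_red` is hypothetical and only used for illustration.

```python
from fractions import Fraction
from math import comb


def prob_at_least_k_red(n: int, k: int, p_red: Fraction = Fraction(1, 2)) -> Fraction:
    """Probability of at least k red apples in n independent picks.

    Assumption: each pick is red with probability p_red, independently
    of every other pick (a binomial model).
    """
    return sum(
        comb(n, i) * p_red**i * (1 - p_red) ** (n - i)
        for i in range(k, n + 1)
    )


# Chance of nineteen red apples in a row if the basket really is 50:50.
p_19_red = Fraction(1, 2) ** 19

# One-sided p-value: at least 19 reds out of 20 picks under the 50:50 hypothesis.
p_value = prob_at_least_k_red(20, 19)

print(f"P(19 reds in a row) = {p_19_red} (about {float(p_19_red):.2e})")
print(f"P(at least 19 reds in 20 picks) = {p_value} (about {float(p_value):.2e})")
```

Under these assumptions, nineteen consecutive reds would occur about twice in a million tries, which is why chance alone is a poor explanation. The calculation says nothing about whether the basket changed overnight; that possibility sits outside the model and has to be questioned separately.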

In the movie “World War Z”, Mossad Chief Jurgen Warmbrunn mentioned the Tenth Man Rule. Quoting him:

If nine of us who get the same information arrived at the same conclusion, it’s the duty of the tenth man to disagree. No matter how improbable it may seem. The tenth man has to start thinking about the assumption that the other nine are wrong.

In daily conversation, we call this playing devil’s advocate.

While we observe a pattern of nine people agreeing on the same conclusion, the tenth man’s dissent may be just as valid. Nine people agreeing looks like a pattern, with a lower probability of being due to chance, but it does not necessarily reflect the truth. Warmbrunn was the tenth man: he correctly rejected the belief held by the other nine and deduced the presence of the undead, allowing Israel to plan and execute its defense well before other countries.

And this is exactly how scientists, data or not, have to think.

Dr. Daniel Koh

Daniel started his career as a senior list researcher with a British publishing firm. Back then, his role involved sourcing contacts through the internet and entering data into the Microsoft Dynamics CRM system (Microsoft Dynamics CRM 3.0). Progressively, he explored using Visual Basic scripting within Excel to automate the contact sourcing process.

He successfully developed and implemented the scripts, leading to a 95% increase in data entry efficiency. He then moved on to the role of CRM executive with Fuji Xerox Singapore.

As a CRM executive, he liaised with a third-party vendor on technical enhancements of the CRM system (Microsoft Dynamics CRM 4.0 and 365). He also performed functional enhancements of the CRM system for hundreds of end users.

His notable achievement was the development of the CRM boy, which led to a 98% improvement in data quality and data integrity in the CRM system. Following his Master’s studies in Consumer Insight with Nanyang Business School, he took on the role of Analytics instructor with Singapore Management University. He prepared class notes and technical walkthroughs, and taught Analytics to undergraduate students from various disciplines. Subsequently, he took on various consulting roles in the consultancy, manufacturing and information technology industries in Singapore.

He travelled to Paris, London, Sri Lanka, Japan and Malaysia to fulfill his role as a consultant. The cultural and professional exchanges between local and overseas data analytics practitioners gave him a very good overview of the expectations and motivations of people around the world. He also had the chance to relocate to the United States for one year, focusing particularly on Operations Management.

Prior to his current freelance status, he was the Data Science Lead at a Singaporean software company. His primary role was to develop Artificial Intelligence using logic, data science and machine learning techniques through in-depth, full-stack scripting. He also developed customized reporting for his customers. In his view, 95% of today’s reporting can be automated, which can free up staff from daily manual work.

He holds a Bachelor of Science in Marketing (BSc. Marketing Pass with Merit) from Singapore University of Social Sciences (from which he graduated as Valedictorian), a Master of Science in Marketing and Consumer Insights (MSc. Marketing and Consumer Insights) from Nanyang Technological University, and a Doctor of Business Administration (DBA) from Swiss School of Business and Management.


A Message from DataFrens…

www.DataFrens.sg

Data Introduced Us Frenship Bonded Us


A Place for All DataFrens to Blog about Data Stuff…www.DataFrens.sg