Dr. Daniel Koh

Summary

The web content discusses the F (Fisher) Distribution in statistics, its application in comparing variances between groups, and the career journey of Dr. Daniel Koh, who has contributed significantly to CRM systems and data analytics.

Abstract

The article "Probability Theory #7 — F (Fisher) Distribution" delves into the concept of the F Distribution, a statistical tool used to analyze the differences in variances between two sets of data. It is particularly useful when comparing test scores, immigration patterns, or other group characteristics to determine if observed differences are statistically significant and not due to random chance. The author, Dr. Daniel Koh, shares his fascination with the F Distribution, emphasizing the importance of meaningful differences between groups while maintaining similarities within them. The text also provides a professional background of Dr. Koh, highlighting his achievements in enhancing data entry efficiency, data quality, and his global consulting experience in data analytics. His career has spanned roles from research to education to leadership in data science, reflecting a deep commitment to the field of analytics and operational management.

Opinions

  • The author expresses a personal preference for the F Distribution, finding it logical and meaningful for group comparisons.
  • Dr. Koh values the significance of within-group similarities and between-group differences for the integrity of data analysis.
  • The article treats a difference as a non-chance pattern when a result at least that extreme would arise by chance 5% of the time or less.
  • The author implies that understanding the factors behind group differences, such as in immigration patterns, is crucial for creating clearer profiles, which could be of interest to immigration officers.
  • Dr. Koh's notable achievements in CRM system enhancements and data analytics automation indicate his belief in the importance of efficiency and data integrity in business operations.
  • The author's global experience has provided him with insights into the diverse expectations and motivations of data analytics professionals worldwide.
  • Dr. Koh advocates for the automation of reporting processes, estimating that 95% of today's reporting could be automated to save time and resources.
  • The author's educational background, including a Doctor of Business Administration, reflects his dedication to continuous learning and expertise in marketing and consumer insights.

Probability Theory #7 — F (Fisher) Distribution

The F distribution, also known as the Fisher–Snedecor distribution, is about comparing variances: it describes how the ratio of two independent variance estimates behaves. Let’s have GPT-4 explain it in straightforward terms.

Imagine you’re comparing two groups of students from different schools, School A and School B. You want to know if one school consistently has better test scores than the other.

Think of the F distribution as a tool that helps you decide whether the differences in test scores between the two schools are real or just a result of random chance.

In other words, we compare two groups of data (here, students from two schools) to see whether one group really scores better, and we want to answer that question scientifically, with a statistical test rather than a hunch.
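Strictly speaking, an F statistic is a ratio of two variance estimates. As a minimal sketch (plain Python, standard library only), the snippet below divides the sample variance of made-up School A scores by that of made-up School B scores; the function name `variance_ratio` and all the numbers are hypothetical. Assuming the two samples are independent and drawn from normal populations with equal variances, this ratio follows an F distribution with (n_A - 1, n_B - 1) degrees of freedom.

```python
from statistics import variance


def variance_ratio(sample_a, sample_b):
    """Return (F, df1, df2) for the ratio of two sample variances.

    F = s_a^2 / s_b^2, where each s^2 uses the n - 1 denominator,
    so the degrees of freedom are (n_a - 1, n_b - 1).
    """
    if len(sample_a) < 2 or len(sample_b) < 2:
        raise ValueError("each sample needs at least two values")
    f = variance(sample_a) / variance(sample_b)
    return f, len(sample_a) - 1, len(sample_b) - 1


# Hypothetical test scores, made up for illustration only.
school_a = [72, 75, 78, 80, 85]
school_b = [60, 70, 80, 90, 100]

f, df1, df2 = variance_ratio(school_a, school_b)
print(f"F = {f:.3f} with ({df1}, {df2}) degrees of freedom")
```

A ratio near 1 means the two schools' scores are similarly spread out. Here the hypothetical School B scores are far more spread out, so F comes out well below 1 (24.5 / 250 = 0.098).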

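What is interesting about this distribution is that the F statistic behind it compares two kinds of variation: how much members within each group differ among themselves, and how much the group averages differ between groups. A real difference between groups stands out when members within each group do not differ much among themselves, but the averages of the groups differ a lot.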

Personally, this is one of my favourite distributions. It makes total sense: members assigned to the same group should be similar to each other, because they describe the same group. That is what makes a group meaningful.

At the same time, groups should differ from each other, because a grouping is only meaningful when the groups are distinct.
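The "similar within, different between" idea is what the one-way ANOVA F statistic measures: the mean square between groups divided by the mean square within groups. The sketch below computes it with Python's standard library; the function name `one_way_anova_f` and the scores are hypothetical, made up for illustration. A large F means the group averages differ much more than the spread inside each group would lead us to expect.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    F = mean square between groups / mean square within groups.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n <= k:
        raise ValueError("need at least two non-empty groups and more values than groups")
    group_means = [mean(g) for g in groups]
    grand_mean = mean(x for g in groups for x in g)
    # Between groups: how far each group average sits from the overall average,
    # weighted by the number of members in the group.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: how far each member sits from its own group's average.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no variation within groups; F is undefined")
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical scores, made up for illustration.
# Tight within each school, far apart between schools:
clear = [[70, 72, 74], [80, 82, 84], [90, 92, 94]]
# Widely spread within each school, similar averages:
overlapping = [[60, 75, 90], [62, 77, 92]]

print(one_way_anova_f(clear))        # → (75.0, 2, 6)
print(one_way_anova_f(overlapping))  # F is about 0.027
```

In the first case the three schools are internally consistent and clearly apart, so F is large; in the second the spread inside each school swamps the small gap between averages, so F is close to zero.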

Now the question is: how large a difference must we observe before we can say, scientifically speaking, that the groups really differ?

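I have gone to great lengths to discuss chance and non-chance (pattern) in my previous writings. When we say we want to observe a non-chance pattern, we want the probability of seeing a result at least this extreme by chance alone, if there were no real difference between the groups, to be 5% or less. Here, we compute the ratio of the variation between groups to the variation within groups (the F statistic) and compare it with the model data, which technically we call the F distribution: it describes how that ratio behaves when the groups do not really differ. Do we observe a significant shift away from the model data? If the observed ratio lies so far in the upper tail that a value at least this large would occur by chance 5% of the time or less, the answer is yes, and we treat the difference as a pattern.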
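To apply the 5% rule, we need the probability of getting an F statistic at least as large as the one observed if the groups did not really differ, that is, the upper tail of the F distribution. A minimal sketch using only Python's standard library computes this tail through the regularized incomplete beta function, evaluated with a standard continued-fraction expansion (modified Lentz method). The helper names (`f_sf`, `reg_inc_beta`, `_beta_cf`) and the example F value and degrees of freedom are hypothetical, chosen for illustration.

```python
import math

_TINY = 1e-300  # guard against division by zero in the continued fraction


def _nonzero(v):
    """Replace values too close to zero with a tiny number (modified Lentz method)."""
    return v if abs(v) >= _TINY else _TINY


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued-fraction part of the regularized incomplete beta function."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _nonzero(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered coefficient of the fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        h *= d * c
        # Odd-numbered coefficient of the fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("continued fraction did not converge")


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use I_x(a, b) = 1 - I_{1-x}(b, a) on the side where the fraction converges fast.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """P(F >= f) for an F(df1, df2) random variable: the p-value of an F test."""
    if f <= 0.0:
        return 1.0
    # Upper tail of F(df1, df2) = I_{df2 / (df2 + df1 * f)}(df2 / 2, df1 / 2)
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


# Hypothetical result: F = 75.0 with (2, 6) degrees of freedom,
# e.g. three groups of three scores each.
f_stat, df_between, df_within = 75.0, 2, 6
p_value = f_sf(f_stat, df_between, df_within)
print(f"p-value = {p_value:.6f}")
if p_value <= 0.05:
    print("At the 5% level, treat the group difference as a pattern.")
else:
    print("At the 5% level, the difference could plausibly be chance.")
```

If SciPy is available, `scipy.stats.f.sf(f_stat, df_between, df_within)` should return the same tail probability; the hand-written version is only meant to show where the number comes from.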

One of the most useful applications of this distribution is identifying what makes groups of nationalities distinct, for example in immigration. Ideally, we should observe certain similarities among fellow countrymen. At the same time, Americans who grew up in the USA with American parents differ quite a lot from Asians who grew up in Asia with Asian parents. The key question is this:

What factors truly determine these differences?

I’m pretty sure immigration officers would be keen to know these factors, as they would give a clearer profile of visitors.

Dr. Daniel Koh

Daniel started off his career as a senior list researcher with a British publishing firm. Back then, his role involved sourcing contacts online and entering data into the Microsoft Dynamics CRM system (Microsoft Dynamics CRM 3.0). Over time, he explored using Visual Basic scripting within Excel to automate the contact sourcing process.

He successfully developed and implemented the scripts, leading to a 95% increase in data entry efficiency. He then moved on to become a CRM executive with Fuji Xerox Singapore.

As a CRM executive, he liaised with a third-party vendor on technical enhancements to the CRM system (Microsoft Dynamics CRM 4.0 and 365). He also carried out functional enhancements to the CRM system for hundreds of end users.

His notable achievement was the development of the CRM boy, which led to a 98% improvement in data quality and data integrity in the CRM system. Following his Master’s studies in Consumer Insight at Nanyang Business School, he became an Analytics instructor at Singapore Management University. He prepared class notes and technical walkthroughs, and taught Analytics to undergraduate students from various disciplines. Subsequently, he took on various consulting roles in the consultancy, manufacturing and information technology industries in Singapore.

He travelled to Paris, London, Sri Lanka, Japan and Malaysia in his role as a consultant. The cultural and professional exchanges with local and overseas data analytics professionals gave him a very good overview of the expectations and motivations of people around the world. He also had the chance to relocate to the United States for one year, focusing on Operations Management.

Before going freelance, he was the Data Science Lead at a Singaporean software company. His primary role was to develop artificial intelligence solutions using logic, data science and machine learning techniques through in-depth, full-stack scripting. He also developed customized reporting for his customers. In his view, 95% of today’s reporting can be automated, freeing staff from daily manual work.

He holds a Bachelor of Science in Marketing (BSc. Marketing Pass with Merit) from Singapore University of Social Sciences (where he graduated as valedictorian), a Master of Science in Marketing and Consumer Insights (MSc. Marketing and Consumer Insights) from Nanyang Technological University, and a Doctor of Business Administration (DBA) from the Swiss School of Business and Management.

Data Science
Statistics
Probability