Discussion of hypothesis testing using the correlation between technology adoption and the likelihood of getting scammed online
In this article, I expand on the discussion from my previous article, “Why all the time spent talking about hypotheses”.
Suppose you have a set of data on the number of people falling for online scams. You arrange it as a distribution by technology adoption rate: people who do not use technology at all sit at the far left, and people whose technology skills are so sophisticated that they can counter-scam the scammers sit at the far right. In the middle are the everyday users, who adopt technology but take few precautions to protect themselves from scams. To test my assumptions about the relationship between technology adoption and the number of people scammed, we state the following null hypothesis:
H₀: The number of people scammed is not correlated with technology adoption.
We use probability theory to test whether we can reject this null hypothesis. We start by assuming the null hypothesis is true, that is, that the number of people scammed is unrelated to technology adoption, and ask how likely it would be to observe data at least as extreme as ours under that assumption. If that probability (the p-value) is 5% or less, chance alone is an unlikely explanation for the pattern we see. As a result, we reject the null hypothesis and conclude that the number of people scammed IS correlated with technology adoption.
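The article does not name a specific test. One common option for grouped data like this is Pearson's chi-square test of independence, which asks whether the share of people scammed differs across the low, middle and high adoption groups; unlike a linear correlation coefficient, it does not assume the relationship is a straight line. The sketch below uses made-up counts (not real data), and the p-value shortcut exp(-x/2) is exact only for 2 degrees of freedom, which is what a 3 × 2 table gives. For other table shapes, a library routine such as `scipy.stats.chi2_contingency` would be the usual choice.

```python
import math

# Hypothetical counts for illustration only (NOT real survey data).
# Each row is an adoption group; columns are [scammed, not_scammed].
OBSERVED = {
    "low adopters": [10, 90],
    "middle adopters": [40, 60],
    "high adopters": [8, 92],
}


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            # Count we would expect in this cell if H0 (independence) were true.
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof


def p_value_two_dof(stat):
    # With exactly 2 degrees of freedom, P(chi-square >= stat) = exp(-stat / 2).
    return math.exp(-stat / 2)


ALPHA = 0.05
stat, dof = chi_square_statistic(OBSERVED)
assert dof == 2  # 3 adoption groups x 2 outcomes, so the shortcut applies
p_value = p_value_two_dof(stat)
decision = "reject H0" if p_value <= ALPHA else "fail to reject H0"
print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p_value:.2e} -> {decision}")
```

With these invented counts the middle group is scammed far more often, so the p-value falls well below 0.05. If every group had the same scam rate, the statistic would be 0 and the p-value 1.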
Next, let’s express my conjecture about this relationship in formal logic:
∀X∃x[(Q(x)∨R(x))⟹S(x)]
Read in plain English, this statement says that:
Among all members of the public X, there exists an individual x who is either a low adopter Q(x) or a high adopter R(x) of technology, and from this we infer (approximately, at this point) that x has a low probability of being scammed online, S(x).
In other words, we are saying that those who do not use technology at all, and those who are highly skilled with it, are the ones less likely to be scammed online.
Now, let’s assume this statement has been proven correct: the more people use technology in a non-expert manner, the more likely they are to be scammed. I am making this assumption for the sake of discussion, not as a scientific finding.
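To make Q(x), R(x) and S(x) concrete, the sketch below assumes a hypothetical adoption scale from 0 (no technology) to 10 (expert) with a made-up scam rate that peaks in the middle. The cut-offs for "low adopter", "high adopter" and "low probability of being scammed" are illustrative assumptions, not values from the article. It checks the implication at every adoption level and also computes Pearson's correlation coefficient, which comes out as exactly zero for this symmetric inverted-U even though scam risk clearly depends on adoption. That is worth keeping in mind when a null hypothesis is phrased as "not correlated".

```python
# Hypothetical illustration: adoption level 0 (no technology) to 10 (expert),
# with a made-up inverted-U scam rate per 100 users. NOT real data.
ADOPTION_LEVELS = list(range(11))
scam_rate = {a: a * (10 - a) for a in ADOPTION_LEVELS}  # peaks at level 5


def Q(a):  # low adopter (hypothetical cut-off)
    return a <= 1


def R(a):  # high adopter (hypothetical cut-off)
    return a >= 9


def S(a):  # "low probability of being scammed" (hypothetical threshold)
    return scam_rate[a] < 10


# Check (Q(x) or R(x)) => S(x) at every level; "p => q" means "not p or q".
implication_holds = all((not (Q(a) or R(a))) or S(a) for a in ADOPTION_LEVELS)


def pearson(xs, ys):
    """Pearson's linear correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


r = pearson(ADOPTION_LEVELS, [scam_rate[a] for a in ADOPTION_LEVELS])
print(implication_holds, r)  # prints: True 0.0
```

The single-letter function names mirror the article's predicates rather than usual Python naming.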
In my previous article, I wrote about the contrapositive, so let’s rewrite the statement in its contrapositive form:
∀X∃x[~S(x)⟹(~Q(x)∧~R(x))]
Read in plain English, this statement says that:
Among all members of the public X, there exists an individual x who does not have a low probability of being scammed online, ~S(x), and we infer that this individual is neither a low adopter Q(x) nor a high adopter R(x) of technology.
Put simply, the contrapositive says that the people most likely to be scammed are those who fall between the low and high adopters of technology.
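By De Morgan's law, ~(Q(x) ∨ R(x)) is the same as ~Q(x) ∧ ~R(x), so the "neither ... nor" reading corresponds to a conjunction. A small truth-table check over every combination of Q, R and S, written in Python purely as an illustration, confirms that (Q ∨ R) ⟹ S is equivalent to ~S ⟹ (~Q ∧ ~R) but not to ~S ⟹ (~Q ∨ ~R). The check is per individual x, so the quantifiers do not affect it.

```python
from itertools import product


def implies(p, q):
    # Material implication: p => q is false only when p is true and q is false.
    return (not p) or q


rows = list(product([False, True], repeat=3))  # every (Q, R, S) combination

original = [implies(q or r, s) for q, r, s in rows]
contrapositive = [implies(not s, (not q) and (not r)) for q, r, s in rows]
or_version = [implies(not s, (not q) or (not r)) for q, r, s in rows]

print(original == contrapositive)  # prints True: logically equivalent
print(original == or_version)      # prints False: not equivalent
```

The two differ, for example, when Q is true, R is false and S is false: the original statement is violated, but the "or" version is still satisfied.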
So far so good! We have the original statement and its contrapositive (which are logically equivalent, so they stand or fall together), and assuming the statistical analysis supports them, we arrive at the most important question:
What is the probability of NOT being scammed online for those in between the low and high adopters of technology?
To answer it, you perform a separate study on individuals who were NOT scammed (the previous study was on those who were scammed).
I would argue that this probability (of NOT being scammed) is much higher than we think. Why? Because we tend to assume that being scammed online as a result of technology adoption (which makes sense in everyday life) is a more likely scenario than the opposite. When we think of NOT being scammed, we attribute it to many more factors. For example, an individual who uses technology every day but is neither a low nor a high adopter is less likely to be scammed online if he or she is a “once bitten, twice shy” person (someone who was scammed once and is now more careful).
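One way to frame this question is with conditional probability. The sketch below assumes made-up counts from a hypothetical follow-up study of middle adopters only, split by whether each person had been scammed before, and combines the two groups with the law of total probability. It shows how a factor such as prior experience can raise the overall probability of NOT being scammed; the numbers are illustrative, not findings.

```python
from fractions import Fraction

# Hypothetical follow-up study of middle adopters (made-up counts, NOT real data).
# Key: prior scam history -> (people in group, people NOT scammed this period).
middle_adopters = {
    "scammed before ('once bitten')": (200, 180),
    "never scammed before": (800, 560),
}

total = sum(n for n, _ in middle_adopters.values())

# Law of total probability:
# P(not scammed | middle) = sum over h of P(not scammed | middle, h) * P(h | middle)
p_not_scammed = sum(
    Fraction(not_scammed, n) * Fraction(n, total)
    for n, not_scammed in middle_adopters.values()
)
p_scammed = 1 - p_not_scammed  # complement rule

for history, (n, not_scammed) in middle_adopters.items():
    print(f"P(not scammed | {history}) = {float(Fraction(not_scammed, n)):.2f}")
print(f"P(not scammed | middle) = {float(p_not_scammed):.2f}")
print(f"P(scammed | middle) = {float(p_scammed):.2f}")
```

Exact fractions are used so the two probabilities add up to exactly 1 without floating-point rounding.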
This raises the question of whether the null hypothesis is truly meaningful. Frankly, we could do this exercise all day and write a 20-page exposition about the null hypothesis. But can we test it?
This is a challenge for scientists. Our instinct is to test a proposition, accept the result and settle on it, but a proposition is more than a one-line sentence. Should we expand the null hypothesis into a 20-page exposition that we cannot test in real life? Or should we write a one-line null hypothesis and expect it to represent the truth? What is the standard in scientific studies? For centuries, we have not had a definitive answer to it.
This is exactly the question we need to ask ourselves as data scientists. There are all kinds of significance tests, whether for point or interval hypotheses. But rejecting the null hypothesis can never, on its own, establish the truth.
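A small simulation makes this point concrete. It assumes, hypothetically, that the scam rate is identical (20%) in the low, middle and high adoption groups, so the null hypothesis is true by construction; a chi-square test at the 5% level should still reject it in roughly 5% of simulated studies. A rejection therefore tells us the data would be unusual if the null hypothesis were true, not that the alternative is the truth. The group size, scam rate, seed and number of simulated studies are arbitrary choices for illustration.

```python
import math
import random


def chi_square_p_value(table):
    """Chi-square test of independence for a 3 x 2 table (2 degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    # For 2 degrees of freedom the chi-square upper-tail probability is exp(-x/2).
    return math.exp(-stat / 2)


def simulate_rejection_rate(n_studies=2000, group_size=100, scam_prob=0.2,
                            alpha=0.05, seed=42):
    """Share of simulated studies that reject H0 even though H0 is TRUE."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_studies):
        table = []
        for _group in range(3):  # low, middle and high adopters, same scam rate
            scammed = sum(rng.random() < scam_prob for _ in range(group_size))
            table.append([scammed, group_size - scammed])
        if chi_square_p_value(table) <= alpha:
            rejections += 1
    return rejections / n_studies


rate = simulate_rejection_rate()
print(f"Rejected a true null hypothesis in {rate:.1%} of simulated studies")
```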
This is why we are seeing a rise in qualitative research, which derives truth from in-depth observation and study, typically using corpora and manuscripts from ethnographic studies and interviews.
Perhaps we as data scientists need to broaden our scope and carefully examine what truly matters instead of taking the data on hand and deriving ‘truth’.

Daniel started his career as a senior list researcher with a British publishing firm. Back then, his role involved sourcing contacts on the internet and performing data entry into the Microsoft Dynamics CRM system (Microsoft Dynamics CRM 3.0). Over time, he explored using Visual Basic scripting within Excel to automate the contact sourcing process.
He successfully developed and implemented the scripts, leading to a 95% increase in data entry efficiency. He then took on the role of CRM executive with Fuji Xerox Singapore.
As a CRM executive, he liaised with third-party vendors on technical enhancements to the CRM system (Microsoft Dynamics CRM 4.0 and Dynamics 365). He also carried out functional enhancements to the CRM system for hundreds of end users.
His notable achievement was the development of the CRM boy, which led to a 98% improvement in data quality and data integrity in the CRM system. Following his Master's studies in Consumer Insight at Nanyang Business School, he took on the role of Analytics instructor with Singapore Management University, where he prepared class notes and technical walkthroughs and taught Analytics to undergraduate students from various disciplines. Subsequently, he took on various consulting roles in the consultancy, manufacturing and information technology industries in Singapore.

He travelled to Paris, London, Sri Lanka, Japan and Malaysia in his role as a consultant. These cultural and professional exchanges with data analytics practitioners locally and overseas have given him a good overview of the expectations and motivations of people around the world. He also had the chance to relocate to the United States for one year, where he focused on Operations Management.
Prior to his current freelance status, he was the Data Science Lead at a Singaporean software company. His primary role was to develop Artificial Intelligence solutions using logic, data science and machine learning techniques through in-depth, full-stack scripting. He also developed customized reports for his customers. In his view, 95% of today’s reporting can be automated, freeing staff from daily manual work.

He holds a Bachelor of Science in Marketing (BSc. Marketing Pass with Merit) from the Singapore University of Social Sciences, where he graduated as valedictorian; a Master of Science in Marketing and Consumer Insights (MSc. Marketing and Consumer Insights) from Nanyang Technological University; and a Doctor of Business Administration (DBA) from the Swiss School of Business and Management.
A Message from DataFrens…
Thanks for being a part of our community!