Josep Ferrer

Summary

Big Data encompasses large, complex datasets that challenge traditional processing methods, with its characteristics defined by the three Vs: volume, velocity, and variety, and more recently, value and veracity.

Abstract

The concept of Big Data refers to datasets so vast and complex that conventional data processing tools are inadequate. It is characterized by the three Vs: high volume, high velocity, and a wide variety of data types. The evolution of Big Data is traced from historical data analysis efforts, such as John Graunt's work during the bubonic plague, to modern-day challenges posed by the Internet and personal computing devices. The term gained prominence in the mid-2000s as the volume of data generated by online services like Facebook and YouTube necessitated new technologies for data handling. Big Data's potential lies in its ability to reveal patterns and correlations, leading to predictions and insights across various fields. However, Big Data is not merely about data size; it also involves the data's mixed, unstructured nature and rapid accumulation, which traditional software cannot handle. For data to be considered Big Data, it must be voluminous, diverse, and rapidly generated or processed, often ranging from terabytes to petabytes. The true value of Big Data lies in its veracity and the ability to extract meaningful insights, which can lead to more informed business decisions and the development of new products.

Opinions

  • The author suggests that the term "Big Data" is often misunderstood or misused, with many people claiming expertise without a true understanding of its complexities.
  • There is an emphasis on the historical progression of data analysis, acknowledging that while the term Big Data is new, the practice of analyzing data for decision-making is centuries old.
  • The author points out that Big Data is not just about the size of the dataset but also about the challenges it presents in terms of processing and analysis, requiring specialized tools and techniques.
  • The article implies that Big Data has become a valuable asset, particularly for tech companies that analyze it to enhance efficiency and innovate products.
  • The author advocates for a critical approach to what is considered Big Data, encouraging readers to differentiate between truly massive and complex datasets and those that are simply large.
  • The author highlights the importance of the additional Vs—value and veracity—in Big Data, stressing that data must be truthful and contain meaningful information to be useful.

What makes big data really be BIG Data?

And everything you need to know about this new buzzword.

Image by Fullvector on Freepik

Everyone is talking about Big Data right now; it seems to be the new trend. Imagine you are giving a speech, on the fanciest stage you can think of, about any topic at all. Everyone in the audience is dying of boredom... But all of a sudden you say it, BIG DATA, and everyone stares at you. Out of the blue, the whole room is completely focused on what you say.

Congrats! You just used the one and only word — Big Data. Everyone talks about it… everyone seems to love it… but, do they really know what it is?

I changed jobs two months ago, and I suddenly found out that everyone thinks they know what Big Data is, but they actually do not. Surprisingly, I did not either. So I said to myself: okay, maybe this is a tricky one.

So let’s find out together what Big Data is, and why literally everyone wants to be part of it!

Big Data — The origins

Big Data refers to data that is so large, fast, or complex that it is difficult or impossible to process using traditional methods. Although the term Big Data itself is relatively new, the practice of studying data, whether small or big, to infer meaningful information is actually quite old. For centuries, people have tried to use data analysis to support their decision-making.

Data Analytics precedents

Surprisingly enough, we can trace the historical “beginning” of data analytics back to the 1660s, when John Graunt dealt with overwhelming amounts of information while studying the bubonic plague, which was haunting Europe at the time. Graunt is often credited as the first person to use statistical data analysis, and he is regarded as a founding father of human demography, epidemiology, and vital statistics.

Data becomes a BIG Problem

Later, in the early 1800s, the field of statistics expanded to include collecting and analyzing data. However, 1880 brought the first overwhelming data-management problem: the US Census Bureau estimated that it would take more than eight years to handle and process all the data collected during that year’s census, as you can still read on their official webpage. Fortunately, Herman Hollerith soon developed the Hollerith Tabulating Machine, which was ready in time for the 1890 census and drastically reduced this calculation work. These first tabulating machines opened the world’s eyes to the very idea of data processing.

Standardization of Data Processing

Throughout the 20th century, data analysis evolved at an unexpected speed. During World War II, driven by the desperate need to crack Nazi codes, the British built Colossus, which could scan 5,000 characters per second, reducing the workload from weeks to mere hours. Colossus was the first real data processor. After that, the first large data sets appeared between the 1960s and ’70s, when the world of data was just getting started with the first data centers and the development of the relational database. In 1965, the US government built the first data center, with the intention of storing millions of fingerprint sets and tax returns.

Internet and Personal Computers

ARPANET began on Oct 29, 1969, when a message was sent from UCLA’s host computer to the host computer at the Stanford Research Institute. Since then, technology has developed at a breathtaking speed: the first personal computers arrived around 1977, the all-conquering Nokia 3310 crash-landed on shop shelves around 2000, the first iPhone was released in 2007, and so on. Our modern society has been flooded with new electronic devices that both produce and collect data. And with the growth of the Internet of Things, data now streams into businesses and to individuals at an unprecedented rate and must be handled in a timely manner.

Picture by PCH-Vector

A new concept is born — Big Data

Around the mid-2000s, people began to realize just how much data users were generating through Facebook, YouTube, and many other online services. This constant flow of data needed to be handled somehow, that is, collected and processed, and new technologies had to be developed to do so. This is when the concept of Big Data gained momentum, building on the now-mainstream definition articulated by industry analyst Doug Laney in the early 2000s: Big Data is data that contains greater variety, arriving in increasing volumes and with more velocity. These are also known as the three Vs.

Volume: The amount of data matters. With Big Data, you’ll have to process high volumes of low-density data: it contains a lot of useful hidden information, but that information is scattered, and much of the data is unstructured, meaning it comes without a predefined order or structure.

Velocity: Data needs to be collected really fast. Velocity is the rate at which data is received and (hopefully) acted on. Some internet-enabled smart products operate in real time or near real time, and they require real-time evaluation and action.

Variety: Data comes in all types of formats, from structured, numeric data in traditional databases to unstructured text documents, emails, videos, audio, stock ticker data, and financial transactions. Data can literally come in any form, which means tools need to be versatile and prepared for unknown types.

Just to put it simply:

Big Data refers to data sets or combinations of data sets whose size (Volume), complexity (Variety) and growth rate (Velocity) make it difficult to collect, manage, process and analyze them using conventional technologies and tools like relational databases or conventional statistics, within the time it takes for them to be useful.

The potential here is that if we crunch true Big Data, we can try to establish patterns and correlations between seemingly random events in the world. Then, by formulating and testing hypotheses, we can start to understand causality and obtain predictions and deep insights. Additionally, these massive volumes of data can be used to address business problems that couldn’t be tackled before.

Is Big Data just a lot of Data?

If this question was stuck in your mind, now you have your answer: certainly not. Big Data is not just more data. It is so much data, so mixed and unstructured, and accumulating so rapidly, that traditional techniques and conventional software (like Excel) do not really work.

So the first sign that what you are dealing with is not Big Data is being able to manage your data set in a plain Excel sheet. I am sorry to tell you, but what you have is definitely not Big Data.

But now let’s imagine that I have a data set with many thousands of entries: is it Big Data? Definitely not. According to Rob Kitchin and Gavin McArdle’s research, Big Data volumes typically range from terabytes to petabytes. However, Big Data does not only consist of big databases; these data sets need to be challenging in other ways too. One of the main conclusions of Kitchin and McArdle is that the key definitional boundary markers for Big Data are the traits of velocity and exhaustivity, rather than volume itself.
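To make the volume part of this discussion concrete, here is a minimal Python sketch of a back-of-the-envelope size check. Everything in it is an illustrative assumption rather than an official test: the helper names, the 200 bytes per row, and the decimal terabyte are made up for the example. The spreadsheet limit is the row cap of a single modern Excel worksheet. Keep in mind that it only looks at volume, which, as noted above, is not enough on its own to call something Big Data.

```python
# Rough volume check. The thresholds are illustrative assumptions,
# not a formal definition of Big Data.
EXCEL_MAX_ROWS = 1_048_576  # row limit of a single modern Excel worksheet
TERABYTE = 10**12           # decimal terabyte, in bytes


def estimate_size_bytes(rows: int, bytes_per_row: int) -> int:
    """Rough raw size of a tabular data set (ignores compression and indexes)."""
    return rows * bytes_per_row


def describe(rows: int, bytes_per_row: int) -> str:
    """Return a very rough label based on volume alone."""
    if rows <= EXCEL_MAX_ROWS:
        return "fits in one spreadsheet"
    if estimate_size_bytes(rows, bytes_per_row) < TERABYTE:
        return "large, but below the terabyte range"
    return "terabyte range or above"


# Hypothetical data sets, assuming about 200 bytes per row.
print(describe(50_000, 200))
print(describe(500_000_000, 200))
print(describe(20_000_000_000, 200))
```

Even the last case would still need the velocity and exhaustivity traits to count as Big Data in Kitchin and McArdle's sense.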

The value — and truth — of big data

Today, Big Data has become capital. A large part of the value that some of the world’s biggest tech companies offer comes from their data, which they’re constantly analyzing to become more efficient and develop new products. However, all this data needs to have some intrinsic value. This is why Big Data should have two more characteristics: value and veracity. Data is of no use until its value is discovered, and once that value is found, we need to be able to rely on it. This is why data needs to contain meaningful information, and this information needs to be truthful.

Finding value in Big Data isn’t just about analyzing it — which is a whole other benefit. It is an entire discovery process that requires insightful analysts, business users, and good executives who know how to ask the right questions, recognize patterns, infer good assumptions, and predict behavior.

Big Data Use Cases

Recent technological breakthroughs have exponentially reduced the cost of data storage and computing, making it easier and less expensive to store more data than ever before. With an increased volume of Big Data now cheaper and more accessible, it is possible to make more accurate and precise business decisions. Hence, Big Data can be applied to literally anything you can imagine. Some of the most common use cases are:

  • Recommendation Engines: We are all used to personalized recommendations by now. Netflix, Instagram, Spotify... all popular digital services take advantage of Big Data to offer a personalized selection to every customer. With its scalability and power to process massive amounts of data, Big Data enables companies to analyze billions of clicks and views from you and other users like you to produce the best recommendations. Over time, through machine learning and predictive analytics, the recommendations become better tailored to the user’s taste.
  • Customer Experience: The race for customers is on. Big Data provides retailers with a more accurate view of the customer experience. This information can be used to improve their operations and products, further enhancing that experience. Big Data analytics can be used to deliver personalized offers, reduce customer churn, and proactively handle issues.
  • Pricing Analytics and Optimization: Companies need to know the true profitability of their customers, how markets can be segmented, and the potential of future opportunities. End-to-end profit and margin analysis can help identify pricing improvement opportunities and areas where profits may be leaking.
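
The "users like you" idea behind recommendation engines can be sketched in a few lines of Python. This is only a toy illustration on a hypothetical, hand-made viewing history: the user names, titles, and the `recommend` helper are all assumptions for the example, and it is not how Netflix or Spotify actually work. It scores each unseen title by how much the users who watched it overlap with you.

```python
from collections import Counter

# Hypothetical viewing history: user -> set of titles they watched.
history = {
    "ana": {"A", "B", "C"},
    "ben": {"A", "B", "D"},
    "cleo": {"B", "D", "E"},
    "dan": {"F"},
}


def recommend(user, history, top_n=2):
    """Suggest unseen titles, weighted by taste overlap with other users."""
    seen = history[user]
    scores = Counter()
    for other, items in history.items():
        if other == user:
            continue
        overlap = len(seen & items)  # shared titles measure similarity
        if overlap == 0:
            continue  # ignore users with nothing in common
        for item in items - seen:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("ana", history))
```

With four users this loop is trivial. The Big Data challenge is running the same kind of comparison over billions of clicks and views, which is where scalable tooling comes in.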

BIG Data or just “Data”

All in all, Big Data has been growing really fast in recent years. This is why the world seems to have caught Big Data fever, calling anything remotely similar (basically anything containing a little bit of data) Big Data. Sometimes, we should just stop, breathe a bit, and reconsider how we assess problems.

Next time you hear something related to Big Data, stop for a moment, think it over, and decide whether it is really BIG Data or just some “Data”.
