Dr Mehmet Yildiz (Tech)

Big Data and Its Management for Proper Use in Business

A high-level, simplified, architectural, and design approach to managing large amounts of data for digital transformation

Photo by clark cruz from Pexels

In this article, I provide a pragmatic architectural panorama of the Big Data management lifecycle in a digital transformation context, traversing its distinct phases and offering key definitions and critical solution considerations that summarize key points from my published books.

Digital transformation is a comprehensive overhaul of organizational processes, activities, and models through the strategic integration of digital technologies. It involves leveraging cutting-edge tech tools, data analytics, cloud computing, and emerging technologies to enhance efficiency, customer experiences, and business outcomes.

The goal is to adopt digital solutions and fundamentally reshape how a business operates, adapts to market dynamics, and delivers value, ensuring long-term relevance and competitiveness in a rapidly evolving digital landscape.

As a pivotal catalyst for artificial intelligence, cognitive computing, and various AI subsets like machine learning, deep learning, expert systems, and neural networks, Big Data solutions emerge as indispensable players in the global business landscape.

In this intricate balance of technology, a profound grasp of the Big Data lifecycle and the ability to architect and design solutions with pragmatic rigor become essential skills for AI professionals and business stakeholders.

Drawing from architectural experience gained through successful and failed projects, the insights I present here carry the weight of practical wisdom.

The realization that architecting Big Data solutions with pragmatism and rigor significantly contributes to the delivery of high-quality AI and Cognitive solutions is a central theme.

To maintain conceptual clarity, my introduction begins by outlining data architecture at a high level. Amidst the plethora of definitions found in data management literature, textbooks, and user-generated content, my interpretation aligns with the context, content, and purpose of this article.

What is data architecture?

Data architecture is the description and manipulation of data collected from diverse sources, navigating data sets, practices, and platforms from a current state to a future state. This is accomplished using established frameworks and models, ultimately crafting business insights from the data solutions.

The term ‘description’ stands out in this definition, emphasizing the articulation of the data’s life cycle — from collection to processing, storage, utilization, and archival.

Big Data needs to be architected by solution architects and designed by specialists. A Big Data solution architect assumes the responsibility of crafting this architectural description, mapping the trajectory from a current to a target state.

Equally crucial is the term ‘manipulation,’ encapsulating the intricate balance of moving and transforming data structures, items, groups, and stores. This is where specialists play a critical role depending on project requirements.

This encompasses pivotal architectural and design activities, like integrating data artifacts into the application landscape, managing communications, interactions, data flow, analysis, source and target locations, and understanding data consumption profiles.

Getting to Grips with Data and Big Data

Data refers to raw facts, information, or statistics collected and stored for various purposes. It can take diverse forms, such as numbers, text, images, or any other representation of facts.

Data is the foundation for generating insights, making informed decisions, and supporting various operations in computing and information systems.

The interpretation and analysis of data lead to meaningful information, providing a basis for understanding patterns, trends, and relationships that contribute to informed decision-making and problem-solving across different domains.

Big Data is everywhere, but it’s not our typical data. What makes it stand out are things like how much of it there is, how fast it comes at us, the different types it can be, how trustworthy it is, and the overall complexity it brings to the data world.

In a nutshell, Big Data is like a treasure hunt for valuable business insights in a jungle of data. And you need some serious tools and skills to navigate it all. Let’s dive into the ABCs of Big Data.

First off, let’s agree on what Big Data is. Gartner, a big name in the tech world, sums it up as data that’s high in volume, speed, and variety. Basically, it’s a ton of information that needs some smart handling to make sense of it and make good decisions.

Now, let’s break down the V-words that make Big Data what it is.

Volume: Think of this as the size of the data — how much of it is floating around. We’re talking about loads of terabytes, petabytes, or even exabytes — no fixed rule on how big is big, though.

Velocity: This is about how fast data is popping up. Real-time data from phones, social media, IoT gadgets, you name it. It’s like a data race happening in the blink of an eye.

Variety: Big Data isn’t picky. It comes in all shapes and sizes — structured, semi-structured, and unstructured. Think transaction records, website logs, and even videos and pictures.

Veracity: Quality matters. Keeping data accurate and reliable can be a real challenge with all that data flying around. But it’s crucial to make sense of it and get useful insights.

Value: This is the end game for Big Data. The whole point is to turn all that chaos into something valuable for businesses. A smart approach and input from everyone involved make this happen.

Sometimes, I tease my friends, saying Big Data isn’t for the faint-hearted. It comes in more data types and makes it harder to extract the good stuff. Think of it as a puzzle with a massive number of pieces.

Although Big Data is like a cousin to traditional data, we can’t handle it with the same old tricks. We need new methods and tools to tackle its unique traits. The usual ways just won’t cut it.

So, the whole process involves grabbing a bunch of data from different sources, storing it, analyzing it, searching through it, moving it around, sharing it, updating it, visualizing it, and keeping it all in check. Sounds like a lot, right?

The funny thing is, the main deal with Big Data isn’t the sheer amount of data itself. It’s more about using smart analytics techniques to dig out the gold from this mountain of information.

There are different types of analytics — descriptive tells us what has happened, predictive looks into the future, prescriptive suggests actions to take, and diagnostic figures out why something happened.

Building Blocks of Data from Concept to Reality

Let’s talk about how we handle data — it’s like creating a delicious cake layer by layer. As Big Data architects, we take a top-down approach, breaking it down into three essential layers: conceptual, logical, and physical.

First up is the conceptual layer. Think of it as the big picture, where we outline the business entities connected to the data. What are the key players in this data game? That’s what we will figure out here.

Moving down to the logical layer, we get into the nitty-gritty. This layer dives into how these data entities relate to each other. Imagine it as connecting the dots between different aspects of your data world. What connects to what? That’s what the logical layer is all about.

Last but not least, we hit the physical layer. Here’s where things get real. The physical layer is all about the nuts and bolts, the actual mechanisms and functions that bring the data to life. It’s like turning those concepts and relationships into something tangible and functional.
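To make the layers concrete, here is a minimal Python sketch built around a hypothetical “a customer places orders” entity; the names and tables are invented for illustration, not taken from any specific project.

```python
# A minimal sketch of the three layers, using a hypothetical "Customer" entity.
import sqlite3
from dataclasses import dataclass

# Conceptual layer: the business entity, described in business terms.
# "A customer places orders" -- no data types, no storage details yet.

# Logical layer: entities and their relationships, still platform-neutral.
@dataclass
class Customer:
    customer_id: int
    name: str

@dataclass
class Order:
    order_id: int
    customer_id: int  # relates each order to a customer
    amount: float

# Physical layer: the actual mechanism -- concrete tables on a real engine.
ddl = """
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE "order" (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    amount REAL
);
"""
conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
```

The same idea scales up: the conceptual layer stays in business language, the logical layer pins down entities and relationships, and the physical layer commits to a real engine and its data types.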

Now, let’s zoom out and look at the bigger picture — the lifecycle management. It’s how we nurture these layers throughout their journey.

An Overview of the Big Data Journey

Being a Big Data architect is like being the captain of a ship. It is crucial to understand the entire journey, from start to finish. We play a key role in all phases of the Big Data lifecycle, guiding the technical ship through its various stages.

Our roles may shift in each phase, but the big picture is to oversee the entire journey. Based on my experience and insights from industry sources, the Big Data lifecycle mirrors the traditional data lifecycle but with a Big Data twist.

Here’s a breakdown of the distinct phases we sail through:

1. Foundations: Setting the groundwork for the entire journey.

2. Acquisition: Bringing in the data from various sources.

3. Preparation: Getting the data ready for its next adventure.

4. Input and Access: Managing how data comes in and who gets to access it.

5. Processing: Crunching the numbers and making sense of the data.

6. Output and Interpretation: Sharing the insights gained from the data.

7. Storage: Finding a safe home for all that valuable information.

8. Integration: Making sure all the pieces fit together seamlessly.

9. Analytics and Visualization: Using tools to understand and showcase the data.

10. Consumption: Letting the right people make use of the data.

11. Retention, Backup, and Archival: Safeguarding the data for the long haul.

12. Destruction: Knowing when it’s time to bid farewell to certain data.

These phases might go by different names in different teams, but the essence remains the same. There’s no one-size-fits-all approach to the Big Data lifecycle, as the field is still evolving. We’re taking lessons from traditional data management and tailoring them for the unique challenges of Big Data.

Consider these phases as guiding stars in your journey, and feel free to customize them based on your organization’s needs. The key is to stay flexible and adapt to the ever-changing data seascape.

Phase 1: Foundations — Laying the Groundwork

Imagine we’re building a house; Phase 1 is where we prepare the land and set the foundation. In the Big Data world, Foundations are all about understanding the lay of the land and getting everything ready for the data journey.

Here’s what this phase involves:

Figuring out what data we need and why.

Defining the boundaries of our data solutions.

Identifying who does what in our data adventure.

Getting our technical landscape ready for action.

Thinking about both technical and non-technical factors.

Understanding the dos and don’ts of data in our organization.

This phase is like creating a detailed plan for our data journey. It’s not a one-person show; a project manager, the Big Data solution architect, and some data specialists join forces to craft a robust plan.

The project manager compiles the Project Definition Report (PDR), a document with all the nitty-gritty details — plans, funding, resources, risks, assumptions, issues, and dependencies. The Big Data architect steps in to provide the essential solution overview, shaping the blueprint for our data-driven project.

So, think of Phase 1 as setting the stage for a blockbuster data adventure — with a solid plan, we’re ready to dive into the next phases of our Big Data journey.

Phase 2: Data Acquisition — Gathering the Treasures

Now that we’ve laid our foundation, it’s time to gather the data treasures for our Big Data adventure. Think of it as setting sail to discover the data seas.

Here’s what happens in Data Acquisition:

Data comes from all directions — internal, external, structured, semi-structured, or even in the wild, unstructured forms like video and audio.

Experts in data and database administration take charge of collecting these valuable datasets.

While data specialists do the heavy lifting, the Big Data architect ensures things are ship-shape. We’re not just observers; we’re leaders in this phase.

We start setting the rules — ensuring data behaves, stays secure, and respects privacy. Quality control is our compass.

The lead Big Data architect, joined by enterprise and business architects, spearheads the strategy. We lead the charge in making crucial decisions, documenting use cases, and specifying the technical nitty-gritty.

Team Effort: For big business solutions, we might enlist help. Domain architects and data specialists join the crew, each playing a vital role.

At this phase, picture us as captains steering the ship through uncharted data waters. With strategy, leadership, and a stellar team, we’re ready to hoist the sails and explore the vast seas of information in the next phases.
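As a small illustration of acquisition, here is a hedged Python sketch that pulls one internal, structured source and one external, semi-structured source; the file path and API URL are hypothetical placeholders, and pandas and requests are assumed installed.

```python
import pandas as pd
import requests

# Internal, structured source: a departmental CSV export (hypothetical path).
internal = pd.read_csv("exports/sales_2023.csv")

# External, semi-structured source: a JSON API feed (hypothetical URL).
resp = requests.get("https://api.example.com/v1/events", timeout=30)
resp.raise_for_status()
external = pd.json_normalize(resp.json())  # flatten nested JSON records

# Tag provenance at the point of collection -- useful for the later
# governance, quality-control, and privacy checks this phase sets up.
internal["source"] = "internal_csv"
external["source"] = "external_api"
```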

Phase 3: Data Preparation — Polishing the Gems

Now that we’ve hauled in our data treasures, it’s time to polish them to perfection. Think of it as cleaning and shining the gems we’ve discovered in the data mines.

Here’s what happens in Data Preparation:

The data, in its raw, uncut form, gets a thorough cleaning. Some call it cleansing; others prefer preparation. We’re making it sparkle.

We turn into data detectives, hunting for inconsistencies, errors, and duplicates. Anything that doesn’t belong is flagged.

Redundant, duplicated, incomplete, and incorrect data? They’re shown the door. Our goal is pristine, usable datasets.

While data specialists roll up their sleeves, the Big Data solution architect oversees the operation. We ensure the cleaning process aligns with the grand vision.

Detailed cleaning work is often handled by data specialists. They’re the experts in the art of data preparation and cleaning.

At this phase, think of us as master jewelers, meticulously refining each gem for maximum brilliance. With a keen eye for detail, our data sets are now polished and ready for the next phases of our data adventure.
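To ground the cleaning step, here is a minimal pandas sketch; the column names and values are invented, and real preparation pipelines are considerably richer.

```python
import pandas as pd

raw = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, None],
    "email": ["a@x.com", "a@x.com", "B@Y.COM", None, "d@z.com"],
    "amount": ["10.5", "10.5", "7", "not_a_number", "3.2"],
})

clean = (
    raw.drop_duplicates()                              # remove duplicated rows
       .dropna(subset=["customer_id"])                 # drop incomplete records
       .assign(
           email=lambda d: d["email"].str.lower(),     # normalize inconsistencies
           amount=lambda d: pd.to_numeric(d["amount"], errors="coerce"),
       )
       .dropna(subset=["amount"])                      # discard incorrect values
)
```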

Phase 4: Data Input and Access — Guiding Data to its Destination

Our polished data gems are ready to embark on their journey to designated havens. Welcome to the crucial phase of guiding and accessing our refined data.

Here’s the plan on Data Input and Access:

Data input is about dispatching our refined data to its intended destinations. Whether it’s a CRM application, a data lake for the brilliant minds of data scientists, or a data warehouse for specific departments, each gem finds its home.

In this phase, our data specialists step in. They work their magic, transforming the raw data into a format that’s not just usable but optimal for its chosen destination.

Accessing data comes in various flavors. Think relational databases, flat files, or the ever-popular NoSQL, especially in the realm of Big Data. It’s like choosing the right key to unlock the data vault.

While the captain of the ship is the Big Data solution architect, we trust our skilled crew — data specialists and database administrators — to handle the nitty-gritty details. They execute the input and access plans under our strategic guidance.

Picture us as air traffic controllers at this phase, ensuring each data gem takes off smoothly, following its flight plan to the precise destination. Our refined data is on the move, ready to make waves in the realms of CRM, data lakes, and warehouses.
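Here is a hedged sketch of dispatching one prepared dataset to two destinations, a relational store and a file-based lake zone; the database name and paths are hypothetical.

```python
import os
import sqlite3
import pandas as pd

df = pd.DataFrame({"customer_id": [1, 2], "amount": [10.5, 7.0]})

# Relational destination (e.g., feeding a CRM-style application database).
conn = sqlite3.connect("crm.db")  # hypothetical database
df.to_sql("transactions", conn, if_exists="append", index=False)

# File destination (e.g., a landing zone that data scientists query later).
os.makedirs("lake", exist_ok=True)
df.to_json("lake/transactions.json", orient="records", lines=True)

# Access then varies by consumer: SQL for applications, file reads for
# analytics -- the "right key" for each data vault.
apps_view = pd.read_sql("SELECT * FROM transactions", conn)
```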

Phase 5: Data Processing in a Nutshell

Enter the realm of Data Processing, where the raw data undergoes a magical transformation. It’s not just about deciphering the language of data; it’s about orchestrating a symphony of insights.

Here’s the score for Data Processing:

It all starts with the raw data, a bit like discovering a hidden treasure. We process it, molding it into a readable format that carries both form and context. This transformation opens the door to interpreting data using the chosen analytics tools in our arsenal.

Meet our trusty companions in this phase — Hadoop MapReduce, Impala, Hive, Pig, Spark SQL, HBase, and Spark Streaming. Each tool plays a unique note in our symphony of data processing. Whether it’s batch processing or real-time magic, we’ve got the tools.

Data processing isn’t just about making data readable. It’s an art that involves annotation, integration, aggregation, and representation. Annotation labels the data, integration weaves together diverse data sources, and aggregation compiles data into harmonious datasets.

Data may change its attire based on the audience. Processed data struts its stuff in data lakes, enterprise networks, and connected devices. The format adapts to consumer needs, ensuring a perfect fit.

Dive deeper, and we find ourselves in the domain of advanced processing techniques. Spark MLib, Spark GraphX, and a plethora of machine learning tools stand ready for a more profound exploration of our data sets.

It’s a team effort. While the lead Big Data solution architect orchestrates the symphony, the performers — data specialists, stewards, engineers, and scientists — take center stage. Everyone brings a unique skill set to the table, creating a harmonious blend of expertise.

Data Processing is a journey where raw data finds its melody, transforming into a symphony of insights ready to captivate the audience.
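As a taste of the batch side of this symphony, here is a minimal PySpark sketch using Spark SQL functions; the input path and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("processing-sketch").getOrCreate()

# Raw events land as JSON; processing gives them form and context.
events = spark.read.json("lake/raw_events/")  # hypothetical path

processed = (
    events
    .withColumn("day", F.to_date("timestamp"))  # annotation: derive a label
    .groupBy("day", "country")                  # aggregation into datasets
    .agg(F.count("*").alias("events"),
         F.avg("latency_ms").alias("avg_latency_ms"))
)

processed.write.mode("overwrite").parquet("lake/processed_events/")
```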

Phase 6: Data Output and Interpretation

Enter the Data Output and Interpretation phase, where the curtain rises on the data’s performance for business users. No fluff, just the essentials:

Ready for Consumption: Here, data takes its final form, tailored for easy consumption by business users. It can be dressed in plain text, graphs, processed images, or video files — whatever suits the audience.

Data Ingestion: This phase doubles as the Data Ingestion process in some circles. Once the data is ready, it’s off to the next stop — storage. Think of it as the data preparing for its grand entrance into the archives.

Real-time or Batch: Depending on the setting, data can be ingested in real-time or in batches. Tools like Sqoop, Flume, and Spark Streaming take the spotlight, handling the influx of data gracefully.

Interpreting the Act: The main act here is interpretation. This involves scrutinizing the ingested data, extracting meaningful information, and answering the burning questions related to Big Data business solutions.

In a nutshell, Phase 6 is the grand reveal — the moment when data steps into the spotlight, ready to deliver its insights to the eagerly waiting audience of business users.
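To illustrate the real-time flavor of ingestion, here is a minimal Spark Structured Streaming sketch. It uses the built-in rate source so it runs without extra connectors; in practice, the source would be Kafka, Flume, or a similar feed.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ingestion-sketch").getOrCreate()

# The "rate" source generates timestamped rows continuously.
stream = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

# Interpret records as they arrive: a rolling count per 30-second window.
counts = (
    stream
    .withWatermark("timestamp", "1 minute")
    .groupBy(F.window("timestamp", "30 seconds"))
    .count()
)

query = (counts.writeStream
         .outputMode("update")
         .format("console")   # in production: a table, topic, or storage sink
         .start())
query.awaitTermination(60)    # run for about a minute, then stop
query.stop()
```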

Phase 7: Data Storage — The Digital Vault

As we wrap up the Data Output phase, it’s time to tuck our data away securely in purpose-built storage units.

Storage Blueprint: Designated storage units, carefully outlined in the data platform, stand ready. This blueprint considers critical non-functional aspects — capacity, scalability, security, compliance, performance, and availability.

Storage Infrastructure: These units can take the form of Storage Area Networks (SAN), Network-Attached Storage (NAS), or Direct-Attached Storage (DAS). Data and database administrators step in to manage stored data, granting access to specified user groups.

Underlying Technologies: Big Data storage doesn’t just stop at basic units. It extends to sophisticated technologies like database clusters, relational data storage, or specialized systems like HDFS and HBase, known for their open-source prowess.

File Formats Matter: Don’t forget the fine print — file formats matter. Text, binary, or specialized formats like Sequence, Avro, and Parquet play a crucial role in shaping the storage design phase.
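As a quick illustration of why format matters, here is a hedged pandas sketch comparing a plain-text export with a columnar Parquet file; it assumes pandas and pyarrow are installed, and the data is invented.

```python
import pandas as pd

df = pd.DataFrame({"sensor": ["a", "b"] * 50_000,
                   "reading": range(100_000)})

df.to_csv("readings.csv", index=False)          # plain-text format
df.to_parquet("readings.parquet", index=False)  # columnar, compressed

# Parquet typically stores this far more compactly and can read back a
# single column without scanning the rest -- which matters at Big Data scale.
subset = pd.read_parquet("readings.parquet", columns=["reading"])
```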

Phase 7 ensures our data is securely stowed away, accessible when needed, and poised for the next leg of its data journey.

Phase 8: Data Integration — Bridging the Digital Divides

In the conventional realm, storing data is often the final chapter. However, in Big Data’s epic, another chapter unfolds — Data Integration. Here’s the play-by-play:

The Integration Saga: Stored data isn’t meant to live in isolation. It craves interaction with different systems for varied purposes. Enter Data Integration, a critical architectural move in the Big Data realm.

Architectural Complexity: Data integration isn’t a walk in the park; it’s a complex dance. Big Data architects step onto the stage, choreographing the use of diverse data connectors. Think ODBC, JDBC, Kafka, DB2, Amazon S3, Netezza, Teradata, Oracle — the list goes on. The chosen connectors depend on the data sources in play.

Connect-the-Dots: Some data models yearn for a harmonious blend — integrating data lakes with data warehouses or data marts. Application integration also takes the spotlight, aligning Big Data seamlessly with dashboards, Tableau, websites, or various data visualization applications.

Overlapping Realms: The integration act may share the spotlight with its next-door neighbor, Data Analytics, as they tag-team to unveil insights.

Phase 8 marks the bridge-building moment, where data connections intertwine, setting the stage for the grand symphony of insights to come.
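Here is a hedged PySpark sketch of one such connector in action, reading a warehouse table over JDBC and blending it with lake data; the URL, table, credentials, and join column are hypothetical, and the matching JDBC driver jar must be on the Spark classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("integration-sketch").getOrCreate()

# Pull a table from a (hypothetical) warehouse over a JDBC connector.
warehouse_orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://warehouse.example.com:5432/sales")
    .option("dbtable", "public.orders")
    .option("user", "etl_user")
    .option("password", "***")  # use a secrets manager in practice
    .load()
)

# Blend the warehouse extract with data already sitting in the lake.
lake_events = spark.read.parquet("lake/processed_events/")
joined = warehouse_orders.join(lake_events, on="day", how="left")
```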

Phase 9: Data Analytics and Visualization

In the vast expanse of integrated data, a pivotal chapter unfolds — Data Analytics and Visualization. Let’s demystify this phase:

Integrated data isn’t a mere spectator; it craves interpretation and revelation. Data Analytics takes center stage, a linchpin in extracting business value from the Big Data saga. Visualization becomes the artist, translating raw data into meaningful insights.

Armed with a toolbox, analytics and visualization unfold. Scala, Python, and R notebooks emerge as the unsung heroes, weaving a narrative of insights in my Big Data tales. Python steals the spotlight, a versatile virtuoso empowering machine learning endeavors.

In your business symphony, a chief data scientist might lead the orchestra of data analytics. They steer the ship, ensuring the team harmonizes with the business goals.

While Big Data solution architects play a supporting role, their watchful eyes ensure the lifecycle unfolds with architectural rigor. A dance between data scientists and architects, ensuring insights are not just discovered but architecturally sound.
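As a tiny example of descriptive analytics meeting visualization, here is a pandas and matplotlib sketch; the data and column names are invented for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "day": pd.date_range("2024-01-01", periods=30, freq="D"),
    "events": [100 + i * 3 + (i % 7) * 10 for i in range(30)],
})

# Descriptive analytics: what has happened, in summary statistics.
print(df["events"].describe())

# Visualization: translate the numbers into a picture a stakeholder can read.
df.plot(x="day", y="events", title="Daily events (sample)")
plt.tight_layout()
plt.savefig("daily_events.png")
```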

Phase 10: Data Consumption

After the analytical crucible, data metamorphoses into consumable information. This is where internal and external users, including esteemed customers, step into the limelight.

The aftermath of data analytics isn’t just insights; it’s a consumable feast of information. This is the phase where the organization opens its doors, offering the gleaned knowledge to those within and beyond its walls.

But the banquet is not a chaotic spread. Policies, rules, regulations, principles, and guidelines govern this feast. Here, architectural input becomes imperative — defining the framework that orchestrates the consumption process.

Imagine data consumption as a well-regulated service provision. The bodies of data governance step in, crafting regulations that govern how this service is provisioned. It’s a structured approach, ensuring that data isn’t just consumed but consumed responsibly.

In this symphony of consumption, the lead Big Data Solution Architect assumes the role of a guiding conductor. They lead and facilitate the creation of these architectural policies, rules, principles, and guidelines. It’s not just about serving data; it’s about serving it right.

Phase 10 is where data transcends analytics, becoming a consumable entity — a narrative crafted with architectural precision.

Phase 11: Retention, Backup, and Archival

In this phase, the focus sharpens on safeguarding critical data. We’re talking about a strategic choreography of retention, backup, and archival practices — a meticulous dance of protection and compliance.

The necessity of safeguarding critical data is acknowledged universally. To achieve this, we employ established data backup strategies, methods, techniques, and tools. It’s not just a backup; it’s a systematic orchestration of data protection.

Here, the Big Data solution architect takes the lead in identifying, documenting, and obtaining approval for the strategies that will shape retention, backup, and archival decisions. It’s a role that requires not just technical expertise but a keen understanding of business requirements.

While the architect outlines the blueprint, the detailed design might be delegated. Infrastructure architects, with the support of domain specialists in data, databases, storage, and recovery, become the craftsmen shaping the protective infrastructure.

Sometimes, data needs to be preserved for regulatory or business reasons. This is where archival decisions come into play. A documented data retention strategy takes center stage, receiving the nod of approval from governing bodies, especially enterprise architects. Implementation becomes a collaborative effort involving infrastructure architects and storage specialists.

In Phase 11, it’s not just about securing data; it’s about architecting a fortress of protection and compliance — an essential chapter in the lifecycle where data durability is etched with precision.
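To show the shape of such a strategy in code, here is a minimal retention-check sketch; the datasets and retention periods are hypothetical, since real values come from regulators and governance bodies.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = {                 # illustrative policy values only
    "transaction_logs": 365 * 7,   # e.g., seven years for financial records
    "web_clickstream": 90,
}

def due_for_archival(dataset: str, created: datetime) -> bool:
    """True when a record has outlived its retention window."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS[dataset])
    return created < cutoff

print(due_for_archival("web_clickstream",
                       datetime(2023, 1, 1, tzinfo=timezone.utc)))
```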

Phase 12: Data Destruction — Finale and Re-Beginning

In the grand finale of the data lifecycle, destruction takes the spotlight — a choreographed act driven by regulatory demands and industry nuances.

Enter regulatory requirements, setting the stage for the scheduled demise of specific data types. Timelines are defined, and compliance becomes the guiding force. It’s not just a phase; it’s a legal script that demands meticulous adherence.

The plot thickens as the specific requirements for data destruction change based on the industry landscape. What applies to one may not resonate with another. Flexibility becomes a key player, adapting the script to the unique scenes of each business organization.

While the data lifecycle suggests a chronological order, the reality may embrace a more dynamic dance. Some phases gracefully overlap, orchestrating a parallel ballet. It’s a nuanced performance, synchronized yet flexible.

The lifecycle presented here serves as a guideline — a roadmap that offers awareness of the overarching process. However, it’s not a rigid script. Think of it as sheet music, waiting for the skilled conductor — your data solution team — to infuse it with life.

As the curtains fall on the lifecycle, the essence of destruction is not just an end; it’s a prelude to renewal. It’s the acknowledgment that, in the world of data, even endings pave the way for new beginnings.

For more information about digital transformation, you may check the Digital Transformation Handbook for Solution Architects, summarized by Dr Mehmet Yildiz on Medium.

Thank you for reading my perspectives.

If you are interested in technology, business, and leadership topics, I invite you to follow my new account and subscribe to my mailing list on Substack.

I have written numerous articles from my main account that might inform and inspire you. My topics include brain and cognitive function, significant health conditions, valuable nutrients, the ketogenic lifestyle, self-healing, weight management, writing/reading, and humor. 100+ Insightful Life Lessons from My Circles for the Last 50+ Years

I publish my health and wellness stories on EUPHORIA. My posts do not include professional or health advice. I only document my reviews, observations, experiences, and perspectives to provide information and create awareness. Be Your Own Therapist in 10 Steps.

To get my post in your inbox, you may subscribe to my content. If you are a writer, you are welcome to join my publications by sending a request via this link. I support 24K+ writers who contribute to my publications on this platform.

Big Data
Data
Data Science
Technology
Digital Transformation