The Secret to Modernizing Data Analytics Platforms
What Are the Key Elements Used to Meet the Challenges?

This article explains what the cloud means for Data Science, Big Data and related fields. Nowadays, even Data Warehouses and Data Lakes are available as SaaS (e.g. Google’s BigQuery or Amazon’s Redshift). Many services can be activated and used with the click of a button, often at lower cost and with little setup effort. The cloud also usually offers significantly more performance and computing power than you could provide in your own data center.
Cloud vs. On-Premise
Cloud computing refers to the provision of software and computing resources via the Internet. The application is hosted in the provider’s data center and made available to the user via an encrypted connection. Unlike with its counterpart, the on-premise model, the software is not installed and maintained in the company’s own data center [1]. Because no infrastructure of their own is required, topics such as Machine Learning, Deep Learning or Big Data become feasible and interesting even for small and medium-sized companies.
Myths around the Cloud
A common myth about cloud solutions is that administrators lose control over their infrastructure. This is not true. They only hand over tasks relating to the operation and maintenance of servers and operating systems, and can use the time gained for more valuable work. Many users also believe that they have to either move all their applications to the cloud or work entirely with locally operated servers. In fact, hybrid models are common: a company can, for example, outsource its email server to the cloud while continuing to run its database and file servers locally [2].
New Paradigms
Here are the three most important new paradigms that the cloud is creating in the area of data.
Move the Compute to the Data
Instead of moving data from one layer or system to the next, data is simply loaded into a data lake and processed there by various tools. Read more about it here [3]. This can streamline the data science process remarkably. Every data scientist and engineer knows how time-consuming that process can be, so having everything you need in one cloud environment, or even in a single service, simplifies it significantly.

This approach also simplifies programmability: for example, analysts can carry out machine learning tasks using only SQL, as Google BigQuery allows with its built-in ML functions. If you want to dive deeper into data platform modernization, you might find this article [4] interesting.
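To illustrate the SQL-only idea in Google BigQuery, here is a minimal sketch in Python that assembles two BigQuery ML statements: one that trains a logistic regression with CREATE MODEL and one that scores rows with ML.PREDICT. The dataset, table and column names (shop.customers, churned, and so on) are hypothetical and chosen purely for illustration, and the helper functions are assumptions of this sketch, not part of any library. The point is that both training and prediction run where the data already lives.

```python
from typing import Sequence


def create_model_sql(model: str, source_table: str, label: str,
                     features: Sequence[str]) -> str:
    """Build a BigQuery ML statement that trains a logistic regression.

    Training runs inside the warehouse, so no rows leave the source table.
    """
    columns = ", ".join([*features, label])
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'logistic_reg', input_label_cols = ['{label}']) AS\n"
        f"SELECT {columns}\n"
        f"FROM `{source_table}`"
    )


def predict_sql(model: str, source_table: str, features: Sequence[str]) -> str:
    """Build a query that scores rows with ML.PREDICT, again purely in SQL."""
    columns = ", ".join(features)
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`, "
        f"(SELECT {columns} FROM `{source_table}`))"
    )


# Hypothetical dataset, table, and column names, used for illustration only.
FEATURES = ["age", "orders_last_year"]
train = create_model_sql("shop.churn_model", "shop.customers", "churned", FEATURES)
score = predict_sql("shop.churn_model", "shop.customers", FEATURES)

print(train)
print(score)
```

Actually running these statements requires a BigQuery project and credentials, for example through the BigQuery console or the google-cloud-bigquery client. That step is left out here so the sketch runs without cloud access.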
Data as a Product
Data is the new oil, but how can new products be developed from it? The key is for the company to become data-driven. Learn more about it here [5]. Data-driven products are products that primarily use data to contribute to a company’s goals. Examples include collecting and selling user data, or providing APIs on a platform shared with business partners in order to make individual processes more efficient.
Self Service Analysis
Self-service analytics platforms are becoming more and more important. With easy-to-use and readily available business intelligence and analytics tools, employees without a technical background are empowered to work with data and gain useful insights. The resulting analyses and reports support the various areas of the company in their decision-making. To enable self-service business intelligence, access to the data warehouse must be provided via intuitive tools and applications, so that the specialist departments can access, prepare and analyze data themselves. When business intelligence applications are first introduced in the company, care should already be taken to ensure that the system supports self-service.

Advantages of the Cloud
- Agility: Environments can be created quickly and easily in the cloud.
- Elasticity: Resources can be added and removed on demand.
- Reduced IT Administration Effort: No need to operate and maintain IT resources; upgrades are performed by the provider.
- Access and Availability: Cloud platforms provide a high level of availability and offer a wide range of access options.
- Security: Cloud platform providers offer “state-of-the-art” security.
- Increased Data Availability: Reliable availability of IT services through professional operation and maintenance of IT by the provider.
These advantages apply to IT departments in general, but cloud technology is especially useful for topics around data. The cloud makes big data feasible in the first place, because it provides enough resources at a reasonable cost. It also offers SaaS solutions such as BI tools, ML services and many others that help an organization become data-driven.
Conclusion
Every company can benefit from the cloud in different ways. But especially in the area of data, with topics such as Big Data, Data Analytics and Machine Learning, the cloud is a driver, if not a prerequisite. Thanks to the cloud, start-ups and medium-sized companies in particular can address these topics relatively quickly and inexpensively. Without the cloud, this would be much more difficult because of the required infrastructure and the resulting costs. Still, I recommend that you think about possible use cases in advance, identify the expected benefits and then build a data analytics platform in the cloud that fits the company.
Sources and Further Readings
[1] Cisco, What Is a Data Center (2021)
[2] Gartner, The Top 10 Cloud Myths (2021)
[3] Christian Lauer, Bring Machine Learning to the Data (2021)
[4] Christian Lauer, How to set up a modern Data Analytics Platform (2021)
[5] Christian Lauer, Why You Should Build Up a Data-Driven Corporate Culture (2021)
