The Bullshit of DevOps: We Developers Simply Don’t Want to Do Operations!
I originally considered titling this article “DevOps is dead, platform engineering is the future,” but that felt too absolute. In the end I settled on calling DevOps “nonsense,” which I admit is not the most civilized wording. This article re-examines DevOps and platform engineering, looks at each concept in turn, and focuses on some of the core ideas that platform engineering advocates.
The goal of DevOps
The concept of DevOps was proposed in 2009. It emphasizes team collaboration, automation tools, and process improvement to increase the speed and quality of software development and deployment. Yet nearly 15 years later, it has not achieved its goals as expected. Within our company, too, the cost of software delivery remains high. Whether with earlier deployment tools such as J-ONE and JDOS or with the current Xingyun Deployment platform, daily deployment and release still carry a real cost for developers. And the problem seems to go beyond tooling.
DevOps is at heart a concept: it emphasizes team collaboration so that development and operations teams work closely together. Although it stresses the importance of automation and tools, it does not point to a specific way of getting there. This is the gap in which Platform Engineering emerged. It is no longer possible to verify who first proposed it, but in July 2022 a tweet declaring “DevOps is dead; long live Platform Engineering” spread quickly through DevOps circles at home and abroad and drew a widespread response.
Platform Engineering is a new approach to operations which holds that internal development platforms should give developers self-service capabilities. One of its core ideas is to shield developers from the complexity of infrastructure while providing flexible toolchains and workflows. Using the platform’s basic capabilities, developers can solve problems on their own without waiting for the platform team to step in, so development teams work more efficiently and the speed and quality of software delivery improve.
Definition of platform engineering
Platform engineering is the discipline of designing and building toolchains and workflows that provide self-service capabilities for software engineering organizations in the cloud-native era. The integrated products provided by platform engineers are typically referred to as “internal developer platforms,” covering operational needs throughout the entire lifecycle of an application. (Definition from platformengineering.org)
There are various definitions of platform engineering, but most share a common theme: they advocate self-service to reduce the complexity and uncertainty of the underlying infrastructure and tooling, streamline workflows, and lower the cognitive cost for end users, thereby improving both user experience and productivity.
Platform engineering and DevOps are concepts in the fields of software development and operations. They both focus on improving the efficiency and quality of software development and deployment, but their emphasis and methods differ. Platform engineering emphasizes building reusable platform architectures, providing scenario-based capabilities, and offering a self-service experience. On the other hand, DevOps focuses on team collaboration, automation tools, and process improvements to enhance the speed and quality of software development and deployment.
In 2023, Gartner named platform engineering one of its top strategic technology trends. In its recently released Top 10 Technology Trends for 2024 report, Gartner listed platform engineering again and moved it up a level, a sign that platform engineering is gaining further recognition in the industry.
Why don’t developers want to do operations?
DevOps emphasizes team collaboration and encourages developers to take on certain operational tasks. However, why is this often difficult to achieve in reality? I believe there are several reasons for this:
• Focus on core development tasks: Developers usually prefer to concentrate on day-to-day software development and have little time or energy left for anything else, since extra duties would slow down their regular work.
• Unfamiliarity or lack of interest: Developers may not have enough experience to handle operational work, or they may not be interested in such work, leading to a lack of enthusiasm for operations.
• Heavy burden and complexity of operations: Operational work touches the production environment, so its responsibility and impact are significant. An operational mistake can have serious consequences such as system failures, service interruptions, or data loss, so taking on operations brings developers additional pressure and responsibility. Operational work also includes many tedious and complex tasks, as well as 24/7 on-call shifts.
• Lack of user-friendly tools and platform support: Without easy-to-use and efficient automation tools and platforms, operational work will rely more on manual operations, increasing the cost and complexity of operations.
The reasons above may explain why developers are reluctant to take on operational tasks. Next, let’s look at the essence of operations work.
The essence of operations and maintenance work
The key focus of operations and maintenance work is to ensure the security and stable operation of the system. It not only requires 24/7 monitoring of the stability of the online environment but also involves handling various daily operational tasks. These tasks may include resource management, routine inspections, fault troubleshooting and repair, ticket processing, etc.
Recently, several major tech companies have suffered significant production outages that drew a great deal of attention across the industry.
These outages are a strong warning to everyone, because every company faces similar challenges in keeping production stable.
Some reflections
Safety in Production, Alarm Bells Ringing: When handling production issues, we must not chase speed and convenience alone. Every operation on a live system deserves a healthy sense of respect.
Safety in Production, Everyone is Responsible: Whether it is faulty logic in code written by a developer or a mistake made by operations staff during an upgrade, either can ultimately cause immeasurable losses for the company.
The hardest part of keeping production stable is not the technology but getting countless details right. Ensuring stability requires significant investment, yet the work is hard to get recognized: how do you measure success when success means nothing happened? There is an old internet joke that goes roughly, “Those who write code without bugs often go unnoticed or may even get fired, while colleagues who frequently introduce bugs tend to thrive because they are busy fixing bugs every day.”
So one reason developers are unwilling to take on operations is indeed the heavy responsibility for production stability. At the same time, operational work is burdensome and lacks suitable tools and platforms to support it.
Platform Engineering has been proposed as a new concept aimed at addressing these issues and improving the software delivery process. So what are the key factors that would let Platform Engineering succeed where DevOps has struggled?
Key Factors for Platform Engineering Success
How to promote platform engineering within the company
Platform engineering is a relatively new concept, but Gartner has highlighted it for two consecutive years, which makes it something we have to take seriously. To promote platform engineering within the company, I believe the following aspects need to be clarified:
• Platform Scope: There are many internal tools. First establish which ones are the authoritative, certified tools, then invest in and iterate on those continuously, rather than letting teams build their own and duplicate effort and cost.
• Platform Culture: Who is the platform ultimately built for, and whom does it serve? Developers are our customers, and the customer comes first. Establish a platform culture that primarily serves developers while also meeting the needs of company management.
• Platform Objectives: The core objective is scenario-based self-service: build tools around concrete scenarios so that developers can serve themselves on the platform.
• Platform Owner: Within an enterprise, the internal developer platform (IDP) cannot be centralized in a single department, so it is crucial to assign a specific owner to each scenario and eliminate unclear boundaries of responsibility.
• Source of Requirements: Requirements should come from developers’ actual needs, with their user experience in mind. Avoid large version upgrades that force system and resource migrations and add to the cost of using the platform.
• Platform API: Internal platforms should naturally expose rich APIs to meet internal development needs, along with detailed documentation that developers can rely on.
So far we have looked at how to promote platform engineering internally from a high level. Next, let’s explore what qualities the tools built under platform engineering should have:
What kind of tools are built under platform engineering?
I believe internal tools matter even more than consumer-facing products. Consumers can choose between products, but internal staff have little choice. At most they can complain a few times, and then they still have to keep using the tools. To create tools that satisfy internal staff, I think at least the following key attributes are needed:
• Productization: Internal tool platforms must be built as products and positioned to serve the entire group, not just a few individuals or a few dozen people within one department. Only by treating all R&D colleagues in the group as the target users can the tools be developed well.
• User Experience: Emphasize user experience. Providing a basic GUI and API capabilities is not enough; the platform should also shield users from complex backend logic to reduce the cost of use.
• Integration: Integration here does not just mean gathering various tools onto a platform through a tool marketplace, as Xingyun/Taishan currently do. That only completes the first phase. The focus should be on R&D usage scenarios, seen from an application-centric workspace. For example, a deployment should bring together observability views such as monitoring, logs, contingency plans, and alarms, so that users can meet every need of that scenario in one place.
• Self-Service: Users should be able to do everything they need without help from platform colleagues. Withdrawing cash at a bank counter requires assistance from bank staff, whereas at an ATM we can withdraw money entirely on our own.
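To make the integration and self-service attributes above more concrete, here is a minimal Python sketch. Everything in it is hypothetical: InternalPlatform, DeployRequest, Workspace, the replica defaults, and the link paths are illustrative assumptions, not the API of Xingyun, Taishan, or any real platform. The point is only the shape of the interaction: the developer states intent (application, version, environment), the platform fills in infrastructure details, and the result is an application-centric workspace with the related views in one place.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class DeployRequest:
    """All a developer has to supply: intent, not infrastructure details."""
    app: str
    version: str
    environment: str = "staging"


@dataclass
class Workspace:
    """Application-centric view handed back after a deployment."""
    app: str
    version: str
    environment: str
    replicas: int
    views: Dict[str, str] = field(default_factory=dict)  # view name -> link


class InternalPlatform:
    """Hypothetical in-memory stand-in for an internal developer platform."""

    # Platform-owned defaults (assumed values) that developers never set by hand.
    REPLICAS_BY_ENV = {"staging": 1, "production": 3}
    # Views attached to every deployment so one scenario is covered in one place.
    VIEWS = ("monitoring", "logs", "alarms", "contingency-plans")

    def __init__(self) -> None:
        self.workspaces: Dict[Tuple[str, str], Workspace] = {}

    def deploy(self, request: DeployRequest) -> Workspace:
        """Self-service entry point: no ticket and no platform engineer involved."""
        if request.environment not in self.REPLICAS_BY_ENV:
            raise ValueError(f"unknown environment: {request.environment!r}")
        base = f"/apps/{request.app}/{request.environment}"
        workspace = Workspace(
            app=request.app,
            version=request.version,
            environment=request.environment,
            replicas=self.REPLICAS_BY_ENV[request.environment],
            views={name: f"{base}/{name}" for name in self.VIEWS},
        )
        self.workspaces[(request.app, request.environment)] = workspace
        return workspace


if __name__ == "__main__":
    platform = InternalPlatform()
    ws = platform.deploy(DeployRequest("order-service", "1.4.2", "production"))
    print(f"{ws.app} {ws.version} running with {ws.replicas} replicas")
    for name, link in ws.views.items():
        print(f"  {name}: {link}")
```

In a real platform the deploy call would sit behind an HTTP API or CLI and talk to schedulers and monitoring systems. The sketch keeps everything in memory so that the division of responsibility stays visible: the request carries no infrastructure fields at all.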
Internal development team under the platform engineering division
In the context of platform engineering, internal development teams commonly run into the following four problems:
• Productization: Internal tools are especially prone to customization when requirements are not kept under control. After some time, they may turn into bespoke products for a few individuals or small departments.
• Priority: The team often receives high-priority demands from “a certain C-x boss.”
• Recognition: Because the team is removed from the business, its value is hard to measure. Over time, doubts arise about the value of its output.
• Repetitive Construction: Internal tools and platforms are especially prone to being built more than once by different teams.
I believe that internal platform teams should adhere to the following key points:
• Continuously collect user requirements and plan the long-term roadmap for the platform.
• Improve user manuals and best practices to enhance user experience.
• Maintain an open mindset and provide APIs.
• Continuously promote and operate the platform you are responsible for (you don’t just give birth to a child, you also have to raise it).
• To address repetitive construction, strengthen joint development efforts and avoid getting stuck building small, self-indulgent “personal/departmental tools.”
The future of platform engineering
Major companies such as Google, Spotify, Netflix, and Walmart are actively adopting platform engineering within their organizations. In November 2023, CNCF officially released its Platform Engineering Maturity Model, which defines four levels across five dimensions. The model is relatively coarse-grained: it mainly evaluates aspects such as team/personnel, platform applications, user experience, self-service, and platform iteration, without breaking platform functionality down in detail.
Gartner predicts that by 2026, 80% of software engineering organizations will have established platform teams as internal providers of reusable services, components, and tools for application delivery, and that 75% of them will offer developer self-service portals.
It is evident that platform engineering is not just a trend; it is the future of software delivery.