Jeremiah Lowin

Summary

Prefect introduces an open-source framework that aims to simplify data engineering by reducing the time engineers spend on defensive ("negative") coding, leaving more time for the functional ("positive") code they were hired to write.

Abstract

Prefect is an open-source framework designed to streamline data engineering by providing tools that address both the creative and the defensive sides of the discipline. The Prefect team reports that data engineers typically spend 90% of their time anticipating failures and writing defensive code, which the team calls "negative data engineering." The post argues that cutting this share to 80% would double engineers' positive productivity, and that Prefect can help by automating best practices and handling common issues that appear unique in isolation but show clear patterns in aggregate. Shaped by the author's experience with Apache Airflow and the broader data community, Prefect is presented as a simple yet powerful system that can detect when something is wrong, much as the human brain recognizes an ungrammatical sentence.

Opinions

  • The Prefect team values transparency and inclusivity, sharing updates with both internal and external stakeholders.
  • They believe that there is significant leverage in improving the efficiency of negative engineering tasks.
  • Prefect's approach to risk management in data engineering is informed by the idea that a set of generalized tools and concepts can effectively handle a wide range of unknown issues.
  • The creators of Prefect have identified universal patterns in data engineering challenges through extensive engagement with the community.
  • They assert that most data applications can be constructed from a simple vocabulary of building blocks, which Prefect facilitates.
  • Prefect positions itself as both a provider of components (the "hardware store") and as a guide (the "store manager") to ensure the success of data engineering projects.
  • The team at Prefect takes pride in tackling problems that users find particularly frustrating and time-consuming.

Image by Courtney Corlew on Unsplash

Positive and Negative Engineering

Don’t Panic.

Prefect has a culture of transparency, which involves sharing news — both good and bad — with our employees and investors. In keeping with those values and the spirit of open-source, we’d like to also include the broader community when possible. This post is excerpted from our August 2018 investor update.

Hello, world! Our team is excited to announce Prefect, an open-source framework for building robust data applications. Prefect was inspired by observing frictions between data engineers and data scientists, and solves these problems with a functional API for defining and executing data workflows. Prefect is currently being used by our Lighthouse Partners — email us if you’re interested!

We’ve got a lot to share, but I want to kick off this blog by revisiting our team’s motivation and what we’re so excited about.

The biggest problem data engineers face is a Sisyphean task that we call “negative data engineering.”

  • Positive data engineering is what we typically think engineers do: write code to achieve an objective.
  • Negative data engineering is when engineers write defensive code to make sure the positive code actually runs. For example: what happens if data arrives malformed? What if the database goes down? What if the computer running the code fails? What if the code succeeds but the computer fails before it can report the success? Negative engineering is characterized by needing to anticipate this infinity of possible failures.
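
To make the distinction concrete, here is a minimal sketch in plain Python. It does not use Prefect’s API; the names `parse_amount`, `safe_parse_amount`, and `run_with_retries` are hypothetical and exist only for this illustration. The positive code is a single line, while the code around it exists only to cope with malformed data and a source that is briefly unavailable.

```python
import time


def parse_amount(raw):
    """Positive engineering: the one line of logic we actually care about."""
    return float(raw)


def safe_parse_amount(raw):
    """Negative engineering: guard against malformed or missing input."""
    try:
        return parse_amount(raw)
    except (TypeError, ValueError):
        # Malformed data: signal "no value" instead of crashing the pipeline.
        return None


def run_with_retries(task, *args, max_retries=3, delay_seconds=0.0):
    """Negative engineering: retry a task that may fail transiently.

    Only ConnectionError is retried (for example, a database that is
    briefly down); any other exception propagates immediately.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return task(*args)
        except ConnectionError:
            if attempt == max_retries:
                raise  # Out of retries: surface the failure to the caller.
            time.sleep(delay_seconds)
```

Even in this toy case the defensive code outweighs the functional code several times over, and it still ignores the harder failures in the list above, such as the machine dying before it can report success.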

Engineers tell us they typically spend 90% of their time on negative or defensive issues, and just 10% on the positive solutions they were hired to build. That means there’s extraordinary leverage in focusing on negative engineering: if we can reduce the negative share to just 80%, we can effectively double engineers’ positive productivity, because they can spend 20% of their time on functional code.
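
The arithmetic behind that claim can be checked in a few lines. This is only an illustrative sketch: the function name `positive_productivity_gain` is made up for this post, and the percentages are the rough figures engineers report to us, not measurements.

```python
def positive_productivity_gain(negative_before_pct, negative_after_pct):
    """Return how many times more positive time an engineer gets.

    Both arguments are the percentage of time (0-100) spent on negative
    engineering, before and after an improvement.
    """
    positive_before = 100 - negative_before_pct
    positive_after = 100 - negative_after_pct
    if positive_before <= 0:
        raise ValueError("positive share before the change must be above zero")
    return positive_after / positive_before


# Negative share drops from 90% to 80%: positive time goes from 10% to 20%.
print(positive_productivity_gain(90, 80))  # 2.0
```

The leverage comes from the small starting base: a ten-point reduction in negative work is a modest change to the negative share, but it doubles the positive one.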

However, if you look across the data landscape, you’ll see hardly any acknowledgment of this problem. Most people will tell you it’s simply too hard for a third party to solve these issues because, by their very nature, they are so specific to a company’s unique business practices.

At Prefect, we know better. Thinking about a multitude of implausible but critical negative outcomes is something I’ve done my entire career as a risk manager. In risk, one doesn’t attempt to predict and hedge every possible result; instead, one develops a repertoire of concepts and tools that is specific enough to be useful, but generalized enough to robustly handle the unknown. One of the key requirements to doing that effectively is being able to gather relevant experience as fast as possible.

For the past three years, I’ve been a PMC member of Apache Airflow, the most widely used open-source software for data engineering workflows. In addition to giving me valuable insight into a variety of technical challenges, that role means I’ve received thousands of emails from data engineers and scientists looking for help with their problems. Through those conversations, I gained particular insight into the negative engineering problem. In isolation, each issue does, in fact, appear unique. But in aggregate, striking patterns appear: the same universal problems manifesting over and over. For a long time, I attempted to solve these issues within the confines of Airflow; when I reached Airflow’s limits, I started designing Prefect. That was almost two years ago.

Today, Prefect is the codification of the patterns we observe in modern data engineering. We’ve worked very hard to build a system that can automatically enable best practices, even for data applications it’s never seen before. To see how this works, consider how you can immediately recognize that “the sky is blue” is right and “sky is blue the” is wrong without memorizing every combination of words in the English language. Just as your brain has broad rules for language, Prefect can detect when something is wrong even when we can’t pinpoint exactly what or why. This capability makes negative engineering much easier and saves our users unbelievable amounts of time and headache.

Prefect is an exercise in simplicity. Negative engineering problems are not always complex, or sophisticated, or difficult. On the contrary: they are often minor, annoying, and repetitive. Consequently, they fall through the sieve we use to identify major issues, even though their aggregate impact is extraordinary. At Prefect, our discovery has been that most data applications can be decomposed into a simple vocabulary, and by focusing on those basic building blocks, we can solve negative issues without sacrificing any of the power or sophistication that positive engineering demands. Our users are granted a creative license to combine those blocks in fascinating and unexpected ways, and Prefect serves as the lighthouse keeping them safe.

At our core, we provide two things. One is our open-source framework, which operates like a hardware store: stocked with all the necessary components for building great data applications. The other is our platform logic, which we think of as the store manager: guiding users to the right tools and making sure their projects are successful. With these two things working together, we can offer a compelling solution for both positive and negative engineering problems.

Fundamentally, we love to solve a problem that our users hate to have.

We’ve posted a brief technical introduction to Prefect, and can’t wait to share more very soon.

Happy engineering!
