Sarah Floris

Summary

Prefect is emerging as a superior alternative to Apache Airflow and Azure Data Factory for data pipeline orchestration, offering a more Pythonic approach, robust testing capabilities, diverse infrastructure models, an open-source community, and a free tier for moderate usage.

Abstract

The article discusses the author's transition to a new company where they are evaluating their data pipeline orchestration options. After finding Azure Data Factory's historical data monitoring to be faulty, they consider Airflow, which the author has experience with, but finds it cumbersome due to its complex server setup and non-Pythonic DAG implementation. Prefect is introduced as an alternative that stands out due to its simplicity, built-in testing capabilities, flexibility in infrastructure models that keep code within the user's environment, strong open-source community, and a generous free tier. The author highlights the ease of defining tasks and flows in a Pythonic manner with Prefect, the importance of testing in Prefect's design philosophy, its infrastructure agnosticism, and the supportive community behind it. Prefect's model is showcased through examples of code and references to comprehensive documentation and active community channels.

Opinions

  • The author is dissatisfied with Azure Data Factory due to its unreliable historical data monitoring.
  • Airflow is criticized for its complex setup and unintuitive workflow design, which hinders productivity.
  • Prefect is highly praised for its simplicity, allowing tasks and flows to be set up in a way that is natural for Python developers.
  • The emphasis on testing in Prefect is seen as a significant advantage, with built-in testing capabilities for flows and tasks.
  • Prefect's infrastructure models are viewed favorably as they keep code within the user's environment, providing a secure and flexible deployment model.
  • The author values the responsive and collaborative open-source community surrounding Prefect, which aids in troubleshooting and feature development.
  • The free tier offered by Prefect is considered a major selling point for small businesses and individual developers.

Airflow, there’s a new competitor in town.

Make your data pipeline orchestration future-proof with Prefect.

Photo by Boris Stefanik on Unsplash

I just started a new position at a great company, and we are re-evaluating our orchestration options. For context, we are on Azure and use Azure Data Factory. We are not too happy with this setup: we do not have extensive historical data, and the monitoring has been faulty, e.g. a pipeline shows as successful when it actually failed. Our options? Airflow or Prefect.

Now, Airflow also sucks. As someone who has used it before, I can say that setting up the Airflow server is not an easy task, and DAGs have to sync with the server before you can test whether they work. The DAGs are not set up in a Pythonic manner and do not allow for easy transfer of data between tasks unless you set up XCom. And lastly, our code gets pushed along to Airflow.

That left us, well, with Prefect. And oh gosh, has Prefect changed my mind about workflow management.

Let’s go over the top five reasons to choose Prefect:

  1. Simplicity
  2. Testing capabilities
  3. Their diverse infrastructure models
  4. Open Source Community
  5. Free (For the Most Part)

1. Simplicity

Tasks and flows are set up the way you would write ordinary Python, and you can add decorators too. I took this example from their GitHub page to show how simple it is.

from prefect import task, Flow, Parameter


@task(log_stdout=True)
def say_hello(name):
    print("Hello, {}!".format(name))


with Flow("My First Flow") as flow:
    name = Parameter('name')
    say_hello(name)


flow.run(name='world') # "Hello, world!"
flow.run(name='Marvin') # "Hello, Marvin!"

2. Testing Capabilities

They definitely believe in tests. Prefect's own codebase has a multitude of tests, so if you are wondering what tests they run on their flows or tasks, you can look them up (see here for more details).

But that is not all. You can easily add your own tests to tasks and flows.

from prefect import Task, Flow

class Extract(Task):
    def run(self):
        return 10

class Load(Task):
    def run(self, x):
        print(x)

flow = Flow("testing-example")

e = Extract()
l = Load()

flow.add_edge(upstream_task=e, downstream_task=l, key="x")
# Added this test for the example
state = flow.run()
assert state.result[e].result == 10

Need some more examples? Here’s the testing documentation.
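A pattern that pairs well with the flow-level assertion above is to keep each task's logic in a plain Python function, so it can be unit-tested without running a flow at all. The sketch below is only an illustration in standard-library Python: `extract_value`, `format_for_load`, and the test functions are hypothetical names, not part of Prefect, and wrapping them in tasks is deliberately left out.

```python
# Hypothetical helpers that mirror the Extract/Load example above.
# Nothing here imports Prefect; the point is that the logic is testable on its own.


def extract_value():
    """Return the same value that Extract.run returns in the example."""
    return 10


def format_for_load(x):
    """Build the line a load step might print for an upstream value."""
    return "loaded value: {}".format(x)


def test_extract_returns_expected_value():
    assert extract_value() == 10


def test_load_formats_upstream_value():
    assert format_for_load(extract_value()) == "loaded value: 10"


if __name__ == "__main__":
    # Run the checks directly; a test runner such as pytest would also collect them.
    test_extract_returns_expected_value()
    test_load_formats_upstream_value()
    print("all checks passed")
```

Tests like these run in milliseconds and need no orchestrator, while the `flow.run()` assertion above still covers how the tasks are wired together.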

They also have integrations with data build tool (dbt), Great Expectations, and the like. These libraries allow you to easily test code that has already been written.

3. Different Infrastructure Models

The code does not leave your environment regardless of whether the infrastructure is run in Prefect Cloud or Prefect Core.

Their documentation also clearly explains the circumstances under which Prefect keeps data, and it provides a case study on Actium Health.

4. Open Source Community

You will have an open-source, responsive community to ask questions. I have personally already turned to their Slack channel for advice and have looked through their GitHub and articles, which cover a multitude of scenarios.

Prefect has over 200 contributors. Missing an integration? Feel free to create it and contribute the code. They are also open to PRs for new features.

5. Free (For the Most Part)

The best part of Prefect is that it allows you to use the service for up to 20,000 free runs every month. That’s right. That means you as a developer or small business could use this service for free.
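To get a feel for what 20,000 runs a month covers, here is a rough back-of-the-envelope sketch. It assumes that every scheduled execution of a flow counts as one run and that a month has 30 days; how runs are actually counted is defined on the pricing page, so treat the function names and the counting rule here as assumptions.

```python
MINUTES_PER_DAY = 24 * 60
FREE_RUNS_PER_MONTH = 20_000  # figure quoted in this article


def runs_per_month(interval_minutes, days=30):
    """Scheduled runs in a month for one flow on a fixed interval (assumed counting rule)."""
    return (MINUTES_PER_DAY // interval_minutes) * days


def fits_free_tier(intervals_minutes, days=30):
    """Return (total monthly runs, whether that total stays within the free allowance)."""
    total = sum(runs_per_month(i, days) for i in intervals_minutes)
    return total, total <= FREE_RUNS_PER_MONTH


# One flow every 5 minutes plus one hourly flow.
total, within_budget = fits_free_tier([5, 60])
print(total, within_budget)  # 9360 True
```

Under these assumptions, a flow every 5 minutes (8,640 runs) plus an hourly flow (720 runs) uses less than half of the allowance, while a single flow running every minute (43,200 runs) would exceed it.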

References

  1. https://github.com/prefecthq/prefect
  2. https://www.prefect.io/pricing/
  3. https://readmedium.com/the-prefect-hybrid-model-1b70c7fd296
  4. https://readmedium.com/prefect-could-be-perfect-a318b9b1ad6e

TL;DR

Prefect is an excellent workflow management and orchestration tool that encourages testing and reliability.

So what are you waiting for? Let’s create great flows with Prefect together.
