Airflow, there’s a new competitor in town.
Make your data pipeline orchestration future-proof with Prefect.
I just started a new position at a great company, and we are re-evaluating our options. To provide some context, we are on Azure and use Azure Data Factory. We are not too happy with this setup: we do not have extensive historical data, and the monitoring has proven faulty, e.g. a pipeline is reported as successful when it actually failed. Our options? Airflow or Prefect.
Now, Airflow also sucks. As someone who has used it before, I can say that setting up the Airflow server is not an easy task, and DAGs have to sync with the server before you can test whether they work. DAGs are not written in a Pythonic manner and do not allow for easy transfer of data between tasks unless you set up XCom. And lastly, our code gets pushed along to Airflow.
That left us, well, with Prefect. And oh gosh, has Prefect changed my mind about workflow management.
Let’s go over the top five reasons to choose Prefect:
- Simplicity
- Testing capabilities
- Their diverse infrastructure models
- Open Source Community
- Free (For the Most Part)
1. Simplicity
Tasks and flows are written much like ordinary Python, and you can use decorators too. I took this example from their GitHub page to show how simple it is.
from prefect import task, Flow, Parameter

@task(log_stdout=True)
def say_hello(name):
    print("Hello, {}!".format(name))

with Flow("My First Flow") as flow:
    name = Parameter('name')
    say_hello(name)

flow.run(name='world') # "Hello, world!"
flow.run(name='Marvin') # "Hello, Marvin!"

2. Testing Capabilities
They definitely believe in tests. Prefect's own codebase comes with a multitude of tests, so if you are wondering what they run against flows or tasks, you can look it up (see here for more details).
But that is not all. You can easily add your own tests to tasks and flows.
from prefect import Task, Flow

class Extract(Task):
    def run(self):
        return 10

class Load(Task):
    def run(self, x):
        print(x)

flow = Flow("testing-example")
e = Extract()
l = Load()
# Pass Extract's return value to Load as the keyword argument "x"
flow.add_edge(upstream_task=e, downstream_task=l, key="x")

# Added this test for the example
state = flow.run()
assert state.result[e].result == 10

Need some more examples? Here’s the testing documentation.
They also have integrations with data build tool (dbt), Great Expectations, and the like. These libraries allow you to easily test the code that has already been created.
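One habit that pairs well with this is keeping the logic of each task in a plain Python function, so it can be checked without starting any orchestrator at all. The sketch below is only an illustration of that idea: `extract`, `transform`, and the test function are hypothetical names of my own, not anything from Prefect, and the functions could later be wrapped with the `@task` decorator in the same way as `say_hello` above.

```python
# Hypothetical task logic kept as plain functions (not Prefect API).
def extract():
    """Return raw records; hard-coded here purely for illustration."""
    return [1, 2, 3]


def transform(records):
    """Multiply every record by ten."""
    return [r * 10 for r in records]


def test_transform_multiplies_by_ten():
    # The logic is verified without a flow, a server, or a scheduler.
    assert transform(extract()) == [10, 20, 30]


# Call the test directly; a runner such as pytest would also discover it.
test_transform_multiplies_by_ten()
```

The flow-level assertion shown earlier then only has to confirm that the tasks are wired together correctly.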
3. Different Infrastructure Models
Your code does not leave your environment, regardless of whether you orchestrate with Prefect Cloud or Prefect Core.
Their documentation also clearly explains the circumstances under which Prefect keeps data and provides a case study for Actium Health.
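To make the idea concrete, here is a toy model of that split, written in plain Python. It is a sketch under my own assumptions and not Prefect's actual architecture or API: `Orchestrator`, `run_locally`, and the two job functions are hypothetical names. The point is only that the hosted side receives a flow name and a state, while the code and the data stay in the local process.

```python
from dataclasses import dataclass, field


@dataclass
class Orchestrator:
    """Hypothetical stand-in for the hosted side: stores metadata only."""
    states: dict = field(default_factory=dict)

    def report(self, flow_name, state):
        self.states[flow_name] = state


def run_locally(flow_name, fn, orchestrator):
    """Hypothetical stand-in for the local side: runs code, reports state."""
    try:
        fn()
        orchestrator.report(flow_name, "Success")
    except Exception:
        orchestrator.report(flow_name, "Failed")


def nightly_job():
    rows = [("alice", 42)]  # this data never reaches the orchestrator
    return len(rows)


def broken_job():
    raise ValueError("boom")


hosted = Orchestrator()
run_locally("nightly", nightly_job, hosted)
run_locally("broken", broken_job, hosted)
print(hosted.states)  # {'nightly': 'Success', 'broken': 'Failed'}
```

Note that the failing job is recorded as failed, which is exactly the kind of monitoring signal we were missing in our previous setup.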
4. Open Source Community
You will have an open-source, responsive community to ask questions. I have personally already turned to their Slack channel for advice and have looked through their GitHub and articles, which cover a multitude of scenarios.
Prefect has over 200 contributors. Missing an integration? Feel free to create it and contribute the code. They are also open to PRs for new features.
5. Free (For the Most Part)
The best part of Prefect is that it allows you to use the service for up to 20,000 free runs every month. That’s right. That means you as a developer or small business could use this service for free.
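To get a feel for what that allowance covers, a quick back-of-the-envelope calculation helps. The snippet below is a rough sketch that assumes every scheduled execution counts as exactly one run and that a month has 30 days; how runs are actually counted is defined on the pricing page, so treat the function names and the counting rule as my assumptions.

```python
FREE_RUNS_PER_MONTH = 20_000  # figure quoted above


def monthly_runs(flows, runs_per_day, days=30):
    """Estimate runs per month, assuming one run per scheduled execution."""
    return flows * runs_per_day * days


def fits_free_tier(flows, runs_per_day, days=30):
    """True if the estimate stays within the free allowance."""
    return monthly_runs(flows, runs_per_day, days) <= FREE_RUNS_PER_MONTH


print(monthly_runs(10, 24))    # 7200: ten hourly flows
print(fits_free_tier(10, 24))  # True
print(fits_free_tier(30, 24))  # False: thirty hourly flows give 21600
```

Under these assumptions, ten flows running every hour use a little over a third of the free allowance.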
References
- https://github.com/prefecthq/prefect
- https://www.prefect.io/pricing/
- https://readmedium.com/the-prefect-hybrid-model-1b70c7fd296
- https://readmedium.com/prefect-could-be-perfect-a318b9b1ad6e
TL;DR
Prefect is an excellent workflow management and orchestration tool that encourages testing and reliability.
So what are you waiting for? Let’s create great flows with Prefect together.






