Tutorial Series: Catchup vs Backfilling in Airflow
Backfilling and catchup are two Airflow mechanisms for handling DAG runs whose scheduled times have already passed. Both are important for making sure that every interval of a DAG actually gets executed.

Here is an overview of each concept:
Backfilling:
Backfilling refers to the process of executing historical DAG runs that were not executed at their scheduled times. This can happen due to various reasons, such as system downtime, misconfiguration, or deliberate pauses.
Backfilling ensures that all historical data is processed even if the DAG was not active during that period.
Use Cases:
- Initial Setup: When a new DAG is created, you might want to run it on historical data.
- System Downtime: If the system is down or tasks fail to execute, you need to rerun the missed tasks.
- Corrections: If a bug is discovered and fixed, backfilling can be used to rerun the tasks with the corrected logic.
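To make the idea concrete, here is a minimal sketch in plain Python of how one might work out which daily runs are missing and therefore need a backfill. It is not an Airflow API: the helper name missing_daily_runs and the sample dates are hypothetical, and Airflow keeps track of run state on its own.

```python
from datetime import date, timedelta


def missing_daily_runs(start, end, completed):
    """Return the dates in [start, end] that have no completed run.

    Hypothetical helper for illustration only; not part of Airflow.
    """
    missing = []
    current = start
    while current <= end:
        if current not in completed:
            missing.append(current)
        current += timedelta(days=1)
    return missing


# Assumed sample data: the DAG ran on June 1 and June 4 only.
completed_runs = {date(2023, 6, 1), date(2023, 6, 4)}
to_backfill = missing_daily_runs(date(2023, 6, 1), date(2023, 6, 5), completed_runs)
print(to_backfill)
```

In Airflow 2.x, a date range like this is typically handed to the airflow dags backfill command through its start-date and end-date options, so that Airflow creates the missing runs itself.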
Catchup Mechanism:
Catchup is a feature in Airflow that allows DAGs to run any missed intervals, effectively catching up to the current schedule.
Key Parameters:
- catchup (default True): When set to True, Airflow will run all the backlogged DAG runs that were missed. When set to False, Airflow will only run the most recent DAG run and skip any past scheduled intervals.
- start_date: The date from which the DAG will start running. If the start_date is in the past and catchup is True, Airflow will attempt to backfill the missed intervals.
- schedule_interval: The frequency at which the DAG should run. This is used to determine the intervals that need to be caught up.
Catchup allows Airflow to automatically run missed intervals, ensuring the DAG’s schedule is up to date.
How It Works:
- If catchup is enabled, Airflow will sequentially run all the DAG runs from the start_date to the current date according to the schedule_interval.
- If catchup is disabled, Airflow will only execute the latest scheduled run and ignore any missed intervals.
Example:
from airflow import DAG
from datetime import datetime

# Arguments applied to every task in the DAG
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2023, 6, 1),
}

dag = DAG(
    'example_dag',
    default_args=default_args,
    description='A simple DAG',
    schedule_interval="@daily",
    catchup=True,  # Set this to False to skip missed intervals (disables catchup)
)

Comparison:

- With catchup=False, Airflow skips the missed intervals, schedules only the most recent one, and continues from the current date onward.
- With catchup=True, Airflow will run all missed intervals from the start_date to the current date.
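
The difference between the two settings can be sketched in plain Python. This is a simplified model, not Airflow's scheduler code: the function scheduled_intervals and the value chosen for "now" are assumptions made for illustration, and the model treats an interval as runnable only once it has fully elapsed.

```python
from datetime import datetime, timedelta


def scheduled_intervals(start_date, now, catchup, step=timedelta(days=1)):
    """List the start of each fully elapsed interval that would be run.

    Hypothetical model of the catchup flag; not an Airflow function.
    """
    starts = []
    current = start_date
    # An interval counts only when its end is not in the future.
    while current + step <= now:
        starts.append(current)
        current += step
    if catchup:
        return starts      # every missed interval is run
    return starts[-1:]     # only the most recent complete interval


start = datetime(2023, 6, 1)
now = datetime(2023, 6, 5, 12, 0)  # assumed "current" time

all_runs = scheduled_intervals(start, now, catchup=True)
latest_only = scheduled_intervals(start, now, catchup=False)
print(len(all_runs), latest_only)
```

With these sample values, the catchup=True case yields four daily intervals (June 1 through June 4), while the catchup=False case yields only the June 4 interval.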






