How To Fix Task received SIGTERM signal In Airflow
Fixing the SIGTERM signal in Apache Airflow tasks

Introduction
While migrating DAGs from Airflow 1 (v1.10.15) to Airflow 2 (v2.2.5), I spent a lot of time trying to figure out an error that some of the DAGs kept raising and that wasn’t informative at all.
WARNING airflow.exceptions.AirflowException: Task received SIGTERM signal
INFO - Marking task as FAILED.
Even though I spent some time trying out possible solutions that I found online, none of them seemed to work for me.
In today’s article I will go through a few potential solutions to the SIGTERM signal that is sent to tasks, causing Airflow DAGs to fail. Depending on your configuration and your specific use case, a different solution may work for you, so make sure to go through each proposed solution carefully and try it out.
DAG run timeout
One reason why your task may be receiving a SIGTERM signal is a short dagrun_timeout value. The DAG class takes this argument, which specifies how long a DagRun should be up before timing out / failing, so that new DagRuns can be created. Note that the timeout is only enforced for scheduled DagRuns.
For DAGs containing many long-running tasks, there’s a chance that dagrun_timeout is exceeded. The actively running tasks will then receive a SIGTERM signal so that the DAG can fail and a new DagRun can be executed.
You can check the duration of a DagRun in the Airflow UI. If it is greater than the dagrun_timeout value specified when creating the DAG instance, you can increase the timeout to a reasonable amount of time for your specific use case.
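As a rough illustration of that comparison, the sketch below takes a few DagRun durations (as you might read them off the Airflow UI) and checks them against the configured timeout. The function names and the 1.5x headroom factor are assumptions made for this example; they are not part of Airflow.

```python
from datetime import timedelta
from typing import List


def exceeds_dagrun_timeout(run_duration: timedelta, dagrun_timeout: timedelta) -> bool:
    """Return True if a DagRun ran longer than the configured timeout."""
    return run_duration > dagrun_timeout


def suggested_timeout(observed: List[timedelta], headroom: float = 1.5) -> timedelta:
    """Longest observed run scaled by a headroom factor (an arbitrary choice)."""
    return max(observed) * headroom


# Durations copied by hand from the UI (illustrative values only).
observed_runs = [timedelta(minutes=50), timedelta(minutes=80)]
current_timeout = timedelta(minutes=60)

for duration in observed_runs:
    if exceeds_dagrun_timeout(duration, current_timeout):
        print(f"Run of {duration} exceeded dagrun_timeout={current_timeout}")

print(f"Candidate dagrun_timeout: {suggested_timeout(observed_runs)}")
```

Treat the result as a starting point only; the right value depends on how much variation you expect between runs.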
Note that this configuration applies to the DAG as a whole, so you need to come up with a value that allows enough time for all the tasks included in your DAG to run.
from datetime import datetime, timedelta
from airflow.models.dag import DAG

with DAG(
    'my_dag',
    start_date=datetime(2016, 1, 1),
    schedule_interval='0 * * * *',
    # Fail the DagRun if it has been running for more than 60 minutes
    dagrun_timeout=timedelta(minutes=60),
) as dag:
    ...
Running out of memory
Another possibility is that the machine that is currently running an Airflow Task runs out of memory. Depending on how you deployed Airflow you may need to inspect the memory usage of the workers and make sure that they do have sufficient memory.
For instance, if your deployment is in the cloud, you may have to check whether any of the Kubernetes pods were evicted. Pods are usually evicted when their node is starved of resources, and this may be the reason why your Airflow task is receiving a SIGTERM signal.
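One way to get a first reading on memory usage is to log the peak resident memory from inside the task itself. The following is a minimal sketch that relies on Python's standard resource module (available on Unix-like systems only); peak_rss_mib and my_task_callable are hypothetical names, and the task body is a placeholder.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of the current process, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes, macOS reports it in bytes.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def my_task_callable() -> int:
    """Hypothetical task body that logs its peak memory before returning."""
    data = [0] * 1_000_000  # stand-in for the real work of the task
    print(f"Peak RSS so far: {peak_rss_mib():.1f} MiB")
    return len(data)


my_task_callable()
```

This only measures the current process, not any child processes it spawns, so compare the figure against the worker's memory limit with that in mind.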
Metadata Database draining the CPU
Another commonly reported issue that may be causing Airflow Tasks to receive SIGTERM signals is the CPU usage on the metadata database.
By default, Airflow uses SQLite, which is intended for development purposes only. For anything beyond that, Airflow supports PostgreSQL, MySQL, or MSSQL as the database backend.
There’s a chance that the CPU usage on the database is at 100%, and this may be the reason why your Airflow tasks are receiving a SIGTERM signal. If this is the case, you should consider increasing the value of the job_heartbeat_sec configuration (or the AIRFLOW__SCHEDULER__JOB_HEARTBEAT_SEC environment variable), which is set to 5 seconds by default.
job_heartbeat_sec
Task instances listen for external kill signal (when you clear tasks from the CLI or the UI), this defines the frequency at which they should listen (in seconds).
In the Airflow configuration file airflow.cfg make sure to specify this configuration under the scheduler section as illustrated below.
[scheduler]
job_heartbeat_sec = 20
Alternatively, you can modify the value of this configuration through the corresponding environment variable:
export AIRFLOW__SCHEDULER__JOB_HEARTBEAT_SEC=20
If CPU consumption at the database level was the issue, increasing this configuration should significantly reduce CPU usage.
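The environment variable name above follows Airflow's AIRFLOW__{SECTION}__{KEY} naming convention (note the double underscores). The small sketch below builds such a name and checks whether an override is currently set in the environment; airflow_env_var and env_override are hypothetical helper names, not Airflow functions.

```python
import os
from typing import Mapping, Optional


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name for an Airflow config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def env_override(section: str, key: str,
                 environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the override value from the environment, or None if unset."""
    return environ.get(airflow_env_var(section, key))


name = airflow_env_var("scheduler", "job_heartbeat_sec")
print(name, "=", env_override("scheduler", "job_heartbeat_sec"))
```

A quick check like this can help when a value in airflow.cfg does not seem to take effect, since an environment variable takes precedence over the file.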
Disable “Mini Scheduler”
By default, the task supervisor process attempts to schedule more tasks of the same Airflow DAG in order to improve performance and help the DAG finish in less time.
This behaviour is configured through the schedule_after_task_execution option, which defaults to True.
schedule_after_task_execution
Should the Task supervisor process perform a “mini scheduler” to attempt to schedule more tasks of the same DAG. Leaving this on will mean tasks in the same DAG execute quicker, but might starve out other dags in some circumstances.
Due to a bug in Airflow, the chances of tasks being killed by the LocalTaskJob heartbeat were pretty high. Therefore, one possible solution is to simply disable the mini scheduler.
In your Airflow configuration file airflow.cfg, you need to set schedule_after_task_execution to False.
[scheduler]
schedule_after_task_execution = False
Alternatively, this configuration can be overridden through the AIRFLOW__SCHEDULER__SCHEDULE_AFTER_TASK_EXECUTION environment variable:
export AIRFLOW__SCHEDULER__SCHEDULE_AFTER_TASK_EXECUTION=False
If this was the problem in your case, then you may also want to consider upgrading Airflow to a version in which this bug was fixed.
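To double-check the values written to airflow.cfg, you can parse the file as a regular INI file. This is a minimal sketch using Python's standard configparser on an inline fragment rather than Airflow's own configuration loader, so it does not see environment variable overrides. read_scheduler_settings is a hypothetical helper, and the fallback values mirror the defaults mentioned in this article.

```python
import configparser

# Inline stand-in for the relevant part of airflow.cfg.
CFG_TEXT = """
[scheduler]
job_heartbeat_sec = 20
schedule_after_task_execution = False
"""


def read_scheduler_settings(cfg_text: str) -> dict:
    """Read the two scheduler options discussed here from INI-style text."""
    # Interpolation is disabled so that '%' characters are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return {
        "job_heartbeat_sec": parser.getint(
            "scheduler", "job_heartbeat_sec", fallback=5
        ),
        "schedule_after_task_execution": parser.getboolean(
            "scheduler", "schedule_after_task_execution", fallback=True
        ),
    }


print(read_scheduler_settings(CFG_TEXT))
```

To run it against a real deployment, read the contents of your airflow.cfg file and pass them in instead of CFG_TEXT.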
Final Thoughts
In today’s tutorial we discussed the SIGTERM signal that is occasionally sent to Airflow tasks, causing DAGs to fail. We went through a few potential reasons why this may be happening and showed how to overcome the problem depending on your specific use case.
Note that your setup may suffer from more than one of the problems discussed in this tutorial, so you may have to apply a combination of these solutions in order to get rid of the SIGTERM signal.