Airflow SLA Setup

Prem Vishnoi(cloudvala)
Feb 12, 2024


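Airflow lets you set Service Level Agreements (SLAs) on tasks within your DAGs. An SLA is the time by which a task is expected to have succeeded; it does not enforce completion or stop a slow task, it flags the miss so you can act on it. Here's an overview of how to set up SLAs in Airflow: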

1. Task-Level SLA:

  • Use the sla parameter when defining a task operator:

Python

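from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.dummy import DummyOperator

# SLAs are only checked on scheduled runs, so the DAG needs a schedule.
dag = DAG(
    "example_dag",
    start_date=datetime(2024, 2, 12),
    schedule_interval="@daily",
)

# task1 is expected to succeed within 1 hour of the run's scheduled start.
task1 = DummyOperator(task_id="task1", dag=dag, sla=timedelta(hours=1))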


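This sets an SLA of 1 hour for task1. The hour is counted from the scheduled start of the DAG run, not from the moment the task itself starts. If task1 has not succeeded by then, Airflow records an SLA miss (visible under Browse > SLA Misses in the UI), emails the addresses in the task's email setting if SMTP is configured, and invokes the DAG's sla_miss_callback if one is defined.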
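Because the SLA clock starts at the scheduled start of the DAG run rather than at task start, it can help to work out the deadline by hand. The sketch below models that check in plain Python without importing Airflow; `sla_deadline` and `missed_sla` are hypothetical helper names, and the logic is a simplification of what the scheduler does, not its actual implementation.

```python
from datetime import datetime, timedelta


def sla_deadline(logical_date: datetime, interval: timedelta, sla: timedelta) -> datetime:
    """Deadline = scheduled start of the run (logical date + interval) + SLA."""
    scheduled_start = logical_date + interval
    return scheduled_start + sla


def missed_sla(finished_at: datetime, deadline: datetime) -> bool:
    """In this simplified model, a task misses its SLA if it finishes after the deadline."""
    return finished_at > deadline


# Daily run for 2024-02-12: it is scheduled to start at 2024-02-13 00:00.
deadline = sla_deadline(datetime(2024, 2, 12), timedelta(days=1), timedelta(hours=1))
print(deadline)  # 2024-02-13 01:00:00
print(missed_sla(datetime(2024, 2, 13, 0, 45), deadline))  # False
print(missed_sla(datetime(2024, 2, 13, 1, 30), deadline))  # True
```

A task that starts late because upstream tasks were slow can therefore miss its SLA even if its own runtime is short.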

2. SLA Miss Callback:

  • Define a function to be called when an SLA is missed:

Python

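def sla_miss_callback(dag, task_list, blocking_task_list, slas, blocking_tis):
    # Custom logic to handle SLA misses, e.g., send notifications
    print(f"SLA missed in DAG {dag.dag_id} for tasks:\n{task_list}")

# SLA callbacks are DAG-level: pass sla_miss_callback=... to DAG(...), or set it on the dag object
dag.sla_miss_callback = sla_miss_callback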
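As a slightly fuller sketch, the callback below builds a readable alert message from the arguments Airflow passes in. Airflow calls an SLA miss callback with five positional arguments (dag, task_list, blocking_task_list, slas, blocking_tis). The `SimpleNamespace` objects are stand-ins for Airflow's DAG and SLA miss records so the snippet runs without a scheduler, and `format_sla_alert` is a hypothetical helper name.

```python
from datetime import datetime
from types import SimpleNamespace


def format_sla_alert(dag, slas):
    """Build a one-line-per-task summary of the missed SLAs."""
    lines = [f"SLA missed in DAG '{dag.dag_id}':"]
    for miss in slas:
        lines.append(f"- {miss.task_id} (run {miss.execution_date.isoformat()})")
    return "\n".join(lines)


def sla_miss_callback(dag, task_list, blocking_task_list, slas, blocking_tis):
    """Callback with the five-argument signature used for SLA misses."""
    message = format_sla_alert(dag, slas)
    print(message)  # swap in an email, Slack, or pager call here
    return message


# Stand-in objects that mimic only the attributes used above (assumption).
fake_dag = SimpleNamespace(dag_id="example_dag")
fake_slas = [SimpleNamespace(task_id="task1", execution_date=datetime(2024, 2, 12))]
result = sla_miss_callback(fake_dag, "task1", "", fake_slas, [])
```

In a real DAG file, the function would be handed to Airflow through the `sla_miss_callback` argument of `DAG(...)`.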


3. Global SLA Configuration:

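  • airflow.cfg has no option for a default SLA value. To give every task in a DAG the same SLA, put it in the DAG's default_args; a task that passes its own sla overrides it:
default_args = {"sla": timedelta(hours=1)}
  • What airflow.cfg does control is whether SLAs are checked at all, via check_slas in the [core] section (enabled by default in Airflow 2):
[core]
check_slas = True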
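A DAG-wide default SLA is usually expressed through `default_args`, and an `sla` passed directly to a task takes precedence over it. The plain-Python sketch below models that precedence; `resolve_sla` is a hypothetical helper for illustration, not an Airflow function.

```python
from datetime import timedelta

# DAG-level defaults, as they would be passed to DAG(default_args=...).
default_args = {"sla": timedelta(hours=1), "retries": 1}


def resolve_sla(default_args, **task_kwargs):
    """Return the SLA a task ends up with: explicit argument first, then default_args."""
    merged = {**default_args, **task_kwargs}
    return merged.get("sla")


print(resolve_sla(default_args))                             # 1:00:00
print(resolve_sla(default_args, sla=timedelta(minutes=10)))  # 0:10:00
```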

4. SLA Considerations:

  • SLAs only apply to scheduled DAG runs, not manually triggered runs.
  • Use realistic SLAs considering task complexity and dependencies.
  • Monitor SLAs and adjust them as needed for optimal workflow performance.
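One way to arrive at a realistic SLA is to derive it from how long past runs actually took. The sketch below is only a heuristic under stated assumptions: `suggest_sla` is a hypothetical helper, the 1.25 safety margin is arbitrary, and the history values are made up for illustration. Each value should be the time from the run's scheduled start to the task's completion, since that is what the SLA is measured against.

```python
from datetime import timedelta


def suggest_sla(past_completion_offsets, margin=1.25):
    """Suggest an SLA: slowest observed completion offset times a safety margin."""
    slowest = max(past_completion_offsets)
    return timedelta(seconds=slowest.total_seconds() * margin)


# Illustrative history: scheduled start -> task finished, for three past runs.
history = [timedelta(minutes=20), timedelta(minutes=32), timedelta(minutes=24)]
print(suggest_sla(history))  # 0:40:00
```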

Complete Example:

from datetime import timedelta, datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago

# default arguments applied to every task in the DAG
default_args = {
    'owner': 'DataJek',
    'depends_on_past': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

# create a DAG that runs every 2 minutes
dag = DAG(
    'Example10',
    default_args=default_args,
    description='Example DAG 10',
    schedule_interval='*/2 * * * *',
    catchup=False,
    start_date=days_ago(2))

# create task1: it sleeps for 65 seconds but has a 1-second SLA,
# so every scheduled run misses its SLA
task1 = BashOperator(
    task_id='task1',
    bash_command='sleep 65',
    dag=dag,
    sla=timedelta(seconds=1))

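To verify the SLA miss from the command line (af is assumed here to be a shell alias for the airflow CLI; the commands use the Airflow 1.10 syntax, with Airflow 2 equivalents noted in the comments):

# Wait for Airflow to start up.

# List all DAGs and verify you see Example10
# (Airflow 2: airflow dags list)
af list_dags

# Run Example10 by unpausing it
# (Airflow 2: airflow dags unpause Example10)
af unpause Example10

# A DAG run is scheduled every 2 minutes. Wait for at least one
# run to complete
# (Airflow 2: airflow dags list-runs -d Example10)
af list_dag_runs Example10

# Retrieve the list of SLA misses and verify the output contains
# Example10. In Airflow 2 the same list is in the UI under
# Browse > SLA Misses and requires login.
curl http://localhost:8080/admin/slamiss/ | grep Example10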
