Apache Airflow for Data Pipelines: Build and Operate at Scale

A practical course for teams that need to design, run, and operate production-grade Airflow pipelines. The agenda skips basic intros and focuses on architecture, DAG authoring, integrations, monitoring, security, scaling, and troubleshooting. Participants leave with repeatable patterns that fit on-premises or cloud deployments.

What will you learn?

You will structure Airflow for reliable scheduling and execution, write clear DAGs with the TaskFlow API, and integrate databases, APIs, and cloud services. You will add monitoring, security, and scaling patterns that keep pipelines healthy as usage grows. By the end, you will run production-ready workflows with an operating model your team can sustain.

  • Design Airflow architecture, executors, and environments for your context
  • Author clean DAGs with operators, sensors, hooks, and robust scheduling
  • Integrate data sources and cloud platforms with monitoring and alerting
  • Secure, scale, and troubleshoot pipelines using proven production practices

Requirements:

  • Comfortable with Python and SQL
  • Familiarity with containers and a public cloud is helpful
  • Access to non-sensitive example data and endpoints

Course Outline*:

*We know each team has its own needs and specifications. That is why we can adapt the training outline to your needs.

Module 1: Core architecture and execution model

  • Scheduler, web server, and worker roles and how they coordinate
  • DAGs, tasks, and operators as units of work
  • Executors and backends: Local, Celery, Kubernetes, and their selection tradeoffs
  • Metadata database and queues as system backbone
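
To ground these pieces, here is a minimal sketch of a DAG with two tasks and one dependency. The dag_id, commands, and schedule are placeholders, and Airflow 2.4+ syntax (the "schedule" argument) is assumed.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="architecture_demo",          # placeholder name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(task_id="load", bash_command="echo loading")

    # The scheduler resolves this dependency; workers execute each task.
    extract >> load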

Module 2: Installation and configuration

  • Environment choices: local, containers, and managed or self-hosted cloud
  • Configuring executors and queues for throughput and reliability
  • Setting up metadata stores and Airflow connections
  • Packaging dependencies and provider management

Module 3: Operating the UI and CLI

  • Web UI for DAG views, task graphs, and logs
  • Monitoring runs, retries, SLAs, and backfills
  • CLI for administration: users, variables, pools, and deployments
  • Role-aware access to views and actions

Module 4: Authoring and managing DAGs

  • TaskFlow API for readable, Python-native pipelines
  • Operator, sensor, and hook patterns for external systems
  • Dependencies, schedules, and calendars, including catchup rules
  • Idempotency and data-aware schedules
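
As an illustration of the TaskFlow style covered here, a minimal sketch follows; the function names and data are placeholders, and Airflow 2.x is assumed.

from datetime import datetime

from airflow.decorators import dag, task


@dag(start_date=datetime(2024, 1, 1), schedule="@daily", catchup=False)
def taskflow_etl_demo():
    @task
    def extract():
        return [1, 2, 3]

    @task
    def transform(values):
        return sum(values)

    @task
    def load(total):
        print(f"loaded total={total}")

    # Return values flow between tasks via XCom automatically.
    load(transform(extract()))


taskflow_etl_demo()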

Module 5: Data and cloud integrations

  • Connecting to databases, files, APIs, and message queues
  • Building ETL or ELT pipelines with modular tasks
  • AWS, GCP, and Azure providers for storage, compute, and serverless
  • Parameterization for environments and tenants
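
A hedged sketch of a hook-based extract step; it assumes the apache-airflow-providers-postgres package, a connection named "my_postgres" defined in Airflow, and a hypothetical "orders" table.

from airflow.decorators import task
from airflow.providers.postgres.hooks.postgres import PostgresHook


@task
def extract_orders():
    # Used inside a DAG or @dag-decorated function.
    hook = PostgresHook(postgres_conn_id="my_postgres")
    # get_records runs the query and returns rows as a list of tuples;
    # keep result sets small, since they travel through XCom.
    return hook.get_records("SELECT id, amount FROM orders WHERE amount > 0")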

Module 6: Monitoring and observability

  • Task logs and real-time views for health
  • Metrics export: Prometheus scraping and Grafana dashboards
  • Alerts and notifications via email, Slack, and webhooks
  • Run history, audit trails, and lineage signals
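
A minimal sketch of alerting via a failure callback; the print statement stands in for a real Slack, email, or webhook notification, and all names are placeholders.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator


def notify_on_failure(context):
    # Airflow passes the task instance context to the callback.
    ti = context["task_instance"]
    print(f"ALERT: {ti.dag_id}.{ti.task_id} failed for {context['ds']}")


with DAG(
    dag_id="monitored_pipeline_demo",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"on_failure_callback": notify_on_failure},
) as dag:
    BashOperator(task_id="might_fail", bash_command="exit 1")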

Module 7: Security foundations

  • RBAC models and least privilege access
  • Authentication with SSO, OAuth, or LDAP
  • Secrets management: HashiCorp Vault and cloud secret stores
  • Network controls for workers, databases, and external calls
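
A sketch of reading secrets through Airflow's Variable and Connection abstractions, so whichever secrets backend is configured (Vault, a cloud secret store, or the metadata database) supplies the values; "api_token" and "warehouse_db" are placeholder names.

from airflow.decorators import task
from airflow.hooks.base import BaseHook
from airflow.models import Variable


@task
def call_external_api():
    # Values are resolved by the configured secrets backend, not hard-coded.
    token = Variable.get("api_token")
    conn = BaseHook.get_connection("warehouse_db")
    print(f"calling {conn.host} with a token of length {len(token)}")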

Module 8: Scaling Airflow

  • Parallelism, concurrency, pools, and queues without starvation
  • CeleryExecutor and KubernetesExecutor selection and tuning
  • Deploying on Kubernetes with Helm and best practice values
  • Cost and performance guardrails
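
A sketch of per-DAG and per-task throttling controls; the pool "warehouse" would need to be created via the UI or CLI first, and Airflow 2.2+ is assumed for max_active_tasks.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="scaling_controls_demo",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
    max_active_runs=1,    # one run of this DAG at a time
    max_active_tasks=8,   # cap concurrent tasks within a run
) as dag:
    BashOperator(
        task_id="heavy_query",
        bash_command="echo querying",
        pool="warehouse",  # shared pool throttles access to the database
    )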

Module 9: Production best practices

  • Version control and CI/CD for DAGs and providers
  • Testing strategies: unit, integration, and end-to-end orchestration checks
  • Reliable scheduling with SLAs and calendars
  • Performance-minded DAG design
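
A common CI smoke test, sketched here under the assumption that DAG files live in a dags/ folder: fail the build if any DAG fails to import.

from airflow.models import DagBag


def test_no_dag_import_errors():
    dagbag = DagBag(dag_folder="dags/", include_examples=False)
    assert dagbag.import_errors == {}, f"Import errors: {dagbag.import_errors}"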

Module 10: Troubleshooting and optimization

  • Debugging failed tasks and stuck DAGs from logs and events
  • Optimizing task runtime, retries, and backoff
  • Avoiding common pitfalls: XCom bloat, overly chatty sensors, and giant DAGs
  • Safe backfills and reruns
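
A sketch of retry and backoff tuning on a flaky task; the URL, retry counts, and delays are illustrative only.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="retry_tuning_demo",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    BashOperator(
        task_id="flaky_api_call",
        bash_command="curl --fail https://example.com/health",
        retries=3,                              # retry a few times before failing the run
        retry_delay=timedelta(minutes=2),       # initial wait between attempts
        retry_exponential_backoff=True,         # then roughly 4, 8, ... minutes
        max_retry_delay=timedelta(minutes=30),  # cap the backoff
    )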

Module 11: Operations and governance

  • Change control, approvals, and rollout patterns
  • Ownership, on-call rotations, and runbooks for incidents
  • Data quality gates and contract checks in pipelines
  • Capacity planning and upgrade strategy
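
A sketch of a data quality gate; the table, connection id, and threshold are placeholders, and the Postgres provider package is assumed.

from airflow.decorators import task
from airflow.exceptions import AirflowFailException
from airflow.providers.postgres.hooks.postgres import PostgresHook


@task
def check_row_count(min_rows: int = 1000):
    # Used inside a DAG; failing here blocks downstream tasks until the data contract is met.
    hook = PostgresHook(postgres_conn_id="warehouse_db")
    count = hook.get_first("SELECT COUNT(*) FROM daily_orders")[0]
    if count < min_rows:
        raise AirflowFailException(f"Expected at least {min_rows} rows, got {count}")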

Module 12: Handover and roadmap

  • Standard templates for DAGs, connections, and alerts
  • Playbooks for scale-out or cloud migration
  • Readiness checklist for production promotion
  • Ninety-day improvement plan and scorecard

Hands-on learning with expert instructors at your location for organizations.

4.347€*
Level: Intermediate
Duration: 21 hours (3 days)
Training customized to your needs
Immersive hands-on experience in a dedicated setting
*Price may vary depending on the number of participants, changes to the outline, location, etc.

Master new skills guided by experienced instructors from anywhere.

5.020€*
Level: Intermediate
Duration: 21 hours (3 days)
Training customized to your needs
Reduced training costs
*Price may vary depending on the number of participants, changes to the outline, location, etc.