Name: Apache Airflow for Data Pipelines: Build and Operate at Scale
Availability: InStock

Apache Airflow for Data Pipelines: Build and Operate at Scale

A practical course for teams that need to design, build, and operate production-grade Airflow pipelines. The agenda skips basic introductions and focuses on architecture, DAG authoring, integrations, monitoring, security, scaling, and troubleshooting. Participants leave with repeatable patterns that fit on-premises or cloud deployments.

What will you learn?

You will structure Airflow for reliable scheduling and execution, write clear DAGs with the TaskFlow API, and integrate databases, APIs, and cloud services. You will add monitoring, security, and scaling patterns that keep pipelines healthy as usage grows. By the end, you will run production-ready workflows with an operating model your team can sustain.

Design Airflow architecture, executors, and environments for your context
Author clean DAGs with operators, sensors, hooks, and robust scheduling
Integrate data sources and cloud platforms with monitoring and alerting
Secure, scale, and troubleshoot pipelines using proven production practices

Requirements:

Comfortable with Python and SQL
Familiarity with containers and a public cloud is helpful
Access to non-sensitive example data and endpoints

Course Outline*:

*We customize the course outline and content to your specific needs and relevant use cases.

Module 1: Core architecture and execution model

Scheduler, web server, and worker roles and how they coordinate
DAGs, tasks, and operators as the unit of work
Executors and backends: Local, Celery, and Kubernetes, with selection tradeoffs
Metadata database and queues as the system backbone

Module 2: Installation and configuration

Environment choices: local, containers, managed services, and self-hosted cloud
Configuring executors and queues for throughput and reliability
Setting up metadata stores and Airflow connections
Packaging dependencies and provider management

Module 3: Operating the UI and CLI

Web UI for DAG views, task graphs, and logs
Monitoring runs, retries, SLAs, and backfills
CLI for administration: users, variables, pools, and deployments
Role-aware access to views and actions

Module 4: Authoring and managing DAGs

TaskFlow API for readable, Python-native pipelines
Operator, sensor, and hook patterns for external systems
Dependencies, schedules, and calendars, including catchup rules
Idempotency and data-aware schedules

Module 5: Data and cloud integrations

Connecting to databases, files, APIs, and message queues
Building ETL or ELT pipelines with modular tasks
AWS, GCP, and Azure providers for storage, compute, and serverless
Parameterization for environments and tenants

Module 6: Monitoring and observability

Task logs and real-time views for health
Metrics export: Prometheus scraping and Grafana dashboards
Alerts and notifications via email, Slack, and webhooks
Run history, audit trails, and lineage signals

Module 7: Security foundations

RBAC models and least privilege access
Authentication with SSO, OAuth, or LDAP
Secrets management: HashiCorp Vault and cloud secret stores
Network controls for workers, databases, and external calls

Module 8: Scaling Airflow

Parallelism, concurrency, pools, and queues without starvation
CeleryExecutor and KubernetesExecutor selection and tuning
Deploying on Kubernetes with Helm and best-practice values
Cost and performance guardrails

Module 9: Production best practices

Version control and CI/CD for DAGs and providers
Testing strategies: unit, integration, and end-to-end orchestration checks
Reliable scheduling with SLAs and calendars
Performance-minded DAG design

Module 10: Troubleshooting and optimization

Debugging failed tasks and stuck DAGs from logs and events
Optimizing task runtime, retries, and backoff
Avoiding common pitfalls: XCom bloat, overly chatty sensors, and giant DAGs
Safe backfills and reruns

Module 11: Operations and governance

Change control, approvals, and rollout patterns
Ownership, on-call, and runbooks for incidents
Data quality gates and contract checks in pipelines
Capacity planning and upgrade strategy

Module 12: Handover and roadmap

Standard templates for DAGs, connections, and alerts
Playbooks for scale-out or cloud migration
Readiness checklist for production promotion
Ninety-day improvement plan and scorecard

Hands-on learning with expert instructors at your location for organizations.

4.347€*

Level: Intermediate

Training customized to your needs

Immersive hands-on experience in a dedicated setting

*Price may vary depending on the number of participants, changes to the outline, location, etc.

Master new skills guided by experienced instructors from anywhere.

5.020€*

Level: Intermediate

Training customized to your needs

Reduced training costs

*Price may vary depending on the number of participants, changes to the outline, location, etc.
