*We know each team has its own needs and specifications, so we can tailor this training outline to fit them.
Module 1: Core architecture and execution model
- Scheduler, web server, and worker roles and how they coordinate
- DAGs, tasks, and operators as the unit of work
- Executors and backends: Local, Celery, and Kubernetes, and the tradeoffs in choosing among them
- Metadata database and queues as the system backbone
Module 2: Installation and configuration
- Environment choices: local, containers, and managed or self-hosted cloud
- Configuring executors and queues for throughput and reliability
- Setting up metadata stores and Airflow connections
- Packaging dependencies and provider management
Module 3: Operating the UI and CLI
- Web UI for DAG views, task graphs, and logs
- Monitoring runs, retries, SLAs, and backfills
- CLI for administration: users, variables, pools, and deployments
- Role-aware access to views and actions
Module 4: Authoring and managing DAGs
- TaskFlow API for readable, Python-native pipelines
- Operator, sensor, and hook patterns for external systems
- Dependencies, schedules, and calendars, including catchup rules
- Idempotency and data-aware scheduling
Module 5: Data and cloud integrations
- Connecting to databases, files, APIs, and message queues
- Building ETL or ELT pipelines with modular tasks
- AWS, GCP, and Azure providers for storage, compute, and serverless
- Parameterization for environments and tenants
Module 6: Monitoring and observability
- Task logs and real-time health views
- Metrics export: Prometheus scraping and Grafana dashboards
- Alerts and notifications via email, Slack, and webhooks
- Run history, audit trails, and lineage signals
Module 7: Security foundations
- RBAC models and least privilege access
- Authentication with SSO, OAuth, or LDAP
- Secrets management: HashiCorp Vault and cloud secret stores
- Network controls for workers, databases, and external calls
Module 8: Scaling Airflow
- Parallelism, concurrency, pools, and queues without starvation
- CeleryExecutor and KubernetesExecutor selection and tuning
- Deploying on Kubernetes with Helm and best-practice values
- Cost and performance guardrails
Module 9: Production best practices
- Version control and CI/CD for DAGs and providers
- Testing strategies: unit, integration, and end-to-end orchestration checks
- Reliable scheduling with SLAs and calendars
- Performance-minded DAG design
Module 10: Troubleshooting and optimization
- Debugging failed tasks and stuck DAGs from logs and events
- Optimizing task runtime, retries, and backoff
- Avoiding common pitfalls: XCom bloat, overly chatty sensors, and giant DAGs
- Safe backfills and reruns
Module 11: Operations and governance
- Change control, approvals, and rollout patterns
- Ownership, on call, and runbooks for incidents
- Data quality gates and contract checks in pipelines
- Capacity planning and upgrade strategy
Module 12: Handover and roadmap
- Standard templates for DAGs, connections, and alerts
- Playbooks for scale out or cloud migration
- Readiness checklist for production promotion
- Ninety-day improvement plan and scorecard