*Every team has its own needs and specifications, so we can tailor the training outline to yours.
Module 1: Cluster architecture and workload placement
- Namespaces, resource quotas, limit ranges, and best practices for requests and limits
- Node pools, taints and tolerations, node affinity and anti-affinity
- Pod topology spread, disruption budgets, and graceful rollouts
- Cluster add-ons overview: CNI, CSI, ingress, metrics, and policy controllers
Module 2: Networking deep dive
- Service types (ClusterIP, NodePort, LoadBalancer, headless) and when to use each
- CoreDNS, service discovery patterns, and debugging name resolution
- NetworkPolicies: from simple allow lists to namespace isolation
- Ingress and gateway controllers: routing, TLS, and zero-downtime changes
Module 3: Storage and stateful applications
- Volumes, PersistentVolumeClaims, storage classes, and dynamic provisioning
- StatefulSets with stable identities, ordered updates, and scaling notes
- Backup and restore patterns: snapshots, PVC migration, and disaster recovery basics
- Performance hints for IOPS, throughput, and cache-friendly configs
Module 4: Autoscaling and capacity management
- HPA and VPA: roles, signals, and safe target settings
- Cluster autoscaler behavior and node group design
- Cost and performance tradeoffs: bin packing and overcommit guidance
- Warm paths for spikes: pre-scaling, buffer nodes, and PDB alignment
Module 5: Security foundations for clusters and workloads
- RBAC design: roles, aggregation, impersonation, and least privilege
- Pod Security Admission profiles and privilege escalation prevention
- Image security: minimal base images, signing, provenance, and admission checks
- Secrets and config hardening: encryption at rest, mounting patterns, and rotation
Module 6: Observability and troubleshooting
- Events, describe, logs, exec, and port-forward usage with intent
- Metrics and alerting: golden signals, kube-state-metrics, and recording rules
- Tracing for services and jobs: sampling choices and context propagation
- Common failure scenarios: CrashLoopBackOff, Pending, ImagePullBackOff, and OOMKilled
Module 7: Reliability patterns
- Readiness and liveness probes that represent real health
- Timeouts, retries, budgets, and circuit breakers with sidecars or a service mesh
- Rollout strategies: blue-green, canary, and tuning of maxSurge and maxUnavailable
- Multi-cluster basics: regional resilience, service discovery, and traffic steering
Module 8: Secure delivery and policy as code
- GitOps workflow: repositories, reconciliation, and drift detection
- Policy controllers: gating unsafe configs with admission policies
- Supply chain visibility: SBOMs, vulnerability reports, and attestation
- Runbooks, SLOs, and a ninety-day improvement plan