How do you detect and remediate IaC drift continuously rather than only at apply time?
Infrastructure as Code (Terraform, Ansible) · Advanced level
Answer
Continuous drift detection means regularly comparing the desired state declared in IaC with live infrastructure, outside normal apply windows. I combine scheduled read-only plans, HCP Terraform (formerly Terraform Cloud) drift detection or an equivalent pipeline, cloud-provider configuration tracking such as AWS Config, and policy scanners. Alerts from these checks create tickets or pull requests so that remediation goes through review.
Technical explanation
Drift checks should be read-only by default and alert rather than auto-remediate risky changes.
Some drift is expected when another controller owns a field; define ownership before alerting.
Track mean time to remediate (MTTR) for drift so teams know whether detection actually improves operations.
Keep source manifests or IaC definitions readable enough that reviewers can understand the final desired state.
Use overlays, modules, or roles for reuse, but keep environment-specific differences explicit and reviewable.
Validate generated output in CI before applying it through kubectl, Argo CD, Terraform, or Ansible.
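The MTTR point above can be made concrete with a small shell sketch. It assumes a hypothetical log file in which each line holds a drift ID, a detection time, and a resolution time as Unix epoch seconds; the file format and the function name drift_mttr_minutes are illustrative and not part of any Terraform tooling.

```shell
#!/bin/sh
# Hypothetical input format (an assumption for illustration):
#   <drift_id> <detected_epoch_seconds> <resolved_epoch_seconds>
# Unresolved drift has "-" in the third column and is skipped.

# Print the mean time to remediate, in whole minutes, for resolved drift events.
drift_mttr_minutes() {
  awk '
    $3 != "-" && $3 >= $2 { total += $3 - $2; n++ }
    END {
      if (n == 0) { print "n/a"; exit }
      printf "%d\n", (total / n) / 60
    }
  ' "$1"
}
```

Running the function against the log on a schedule and charting the result is one way to see whether detection is shortening remediation time.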
Hands-on example
1. Set up continuous drift handling for each Terraform workspace.
2. Schedule a read-only plan job per workspace:
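terraform init -input=false
# Default to 0 so that status is defined when the plan reports no changes.
status=0
terraform plan -detailed-exitcode -input=false -out=drift.tfplan || status=$?
# A plan file is written only when the plan succeeds (exit code 0 or 2).
if [ "$status" -eq 0 ] || [ "$status" -eq 2 ]; then terraform show -json drift.tfplan > drift.json; fi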
3. Interpret exit code 0 as no changes (no drift), 2 as a non-empty diff (drift, or configuration that has not been applied yet), and 1 as an error requiring investigation.
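4. To update state to match live infrastructure without changing any resources, review and then apply a refresh-only plan:
terraform plan -refresh-only -out=refresh.tfplan
terraform show refresh.tfplan   # review the detected changes before accepting them
terraform apply refresh.tfplan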
5. Open a ticket that classifies each drifted resource as one of: revert (re-apply the code), codify (update the code to match the change), ignore (the field is externally owned), or remove from Terraform ownership.
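The steps above could be wrapped in a small script for a scheduled job. This is a minimal sketch, not a definitive implementation: the function names classify_plan_exit and check_drift and the DRIFT_PLAN_FILE variable are hypothetical, and it assumes terraform init has already run in the workspace directory.

```shell
#!/bin/sh
# Map the exit code of `terraform plan -detailed-exitcode` to a status word.
classify_plan_exit() {
  case "$1" in
    0) echo "clean" ;;   # empty diff
    2) echo "drift" ;;   # non-empty diff: drift or unapplied configuration
    *) echo "error" ;;   # plan failed, or terraform could not run
  esac
}

# Run a plan in the current directory and print its status.
# DRIFT_PLAN_FILE is a hypothetical variable name used only in this sketch.
check_drift() {
  plan_file="${DRIFT_PLAN_FILE:-drift.tfplan}"
  status=0
  terraform plan -detailed-exitcode -input=false -out="$plan_file" \
    >/dev/null 2>&1 || status=$?
  classify_plan_exit "$status"
}
```

A scheduler could call check_drift in each workspace and open a ticket only when it prints drift, treating error as a separate alert about the pipeline itself.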
More Infrastructure as Code (Terraform, Ansible) interview questions
- What is Infrastructure as Code, and what problems does it solve over click-ops?
- What is the difference between declarative and imperative IaC, and where do Terraform and Ansible fall?
- What is the difference between configuration management and provisioning?
- What is Terraform, and what is the core plan/apply workflow?
- What does terraform init do?
- What is the Terraform state file, and why is it critical?
- Why should state be stored remotely, and what backend would you use on AWS?
- What is state locking, and why does it matter for teams?