How would you orchestrate a rolling, health-checked upgrade across servers with Ansible?
Infrastructure as Code (Terraform, Ansible) · Advanced level
Answer
For a rolling, health-checked upgrade, I use serial to batch hosts, drain each host from traffic, upgrade packages or deploy artifacts, restart services through handlers, run health checks, re-add the host, and fail fast if health does not recover.
Technical explanation
The workflow should remove a host from service before mutation and add it back only after health passes.
Handlers should restart services only when files or packages changed.
A failed batch should stop the rollout and leave enough capacity serving traffic.
Prefer idempotent modules over shell so repeated runs are safe and change reporting is meaningful.
Separate reusable role logic from inventory-specific variables so the same automation works across environments.
Run lint, syntax checks, check mode where useful, and staged rollouts before production-wide changes.
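The batching and stop-on-failure points above map onto play-level keywords. The following is a minimal sketch, not a definitive configuration: the group name app is taken from the example playbook, while the batch sizes and the failure threshold are illustrative assumptions to tune for your own capacity.

```yaml
# Sketch only: batch sizes and threshold are assumptions, not recommendations.
- name: Rolling upgrade with conservative failure settings
  hosts: app
  # Ramp up: one canary host first, then 25% of the group per batch.
  serial:
    - 1
    - "25%"
  # The threshold must be exceeded, not equaled, so 0 aborts the play
  # as soon as any host in the current batch fails.
  max_fail_percentage: 0
  tasks:
    - name: Confirm the host is reachable before changing anything
      ansible.builtin.ping:
```

Starting with a single-host batch keeps the blast radius small: if the canary fails, the play ends before the larger batches are touched.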
Hands-on example
1. Write a playbook that upgrades the app servers one batch at a time, draining each host before it is changed.
2. Playbook skeleton:
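- name: Rolling app upgrade
  hosts: app
  serial: 2                 # upgrade two hosts per batch
  max_fail_percentage: 20   # abort once more than 20% of a batch has failed
  tasks:
    - name: Drain host from load balancer
      # Site-specific load balancer command; runs on the control node.
      ansible.builtin.command: /usr/local/bin/lbctl drain {{ inventory_hostname }}
      delegate_to: localhost

    - name: Upgrade app package
      ansible.builtin.package:
        name: myapp
        state: latest   # "present" would not upgrade an already-installed package
      notify: Restart app

    - name: Run the restart handler now if the package changed
      ansible.builtin.meta: flush_handlers

    - name: Wait for health
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}:8080/health"
        status_code: 200
      register: health
      until: health.status == 200
      retries: 12
      delay: 5

    - name: Add host back to load balancer
      ansible.builtin.command: /usr/local/bin/lbctl enable {{ inventory_hostname }}
      delegate_to: localhost

  handlers:
    - name: Restart app
      # Assumes the service is also named myapp.
      ansible.builtin.service:
        name: myapp
        state: restarted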
3. Test against a staging group with serial: 1, then increase batch size after measuring recovery time.
4. Confirm a failed health check stops the rollout before most hosts are touched.
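One way to move from a staging batch size of 1 to larger production batches without editing the playbook is to template the serial keyword. This is a hedged sketch: rollout_batch is a hypothetical variable name, and it is assumed to be supplied as an extra variable, since serial is resolved at play level rather than per host.

```yaml
# Sketch only: "rollout_batch" is a hypothetical variable, passed with -e.
- name: Rolling app upgrade with a configurable batch size
  hosts: app
  # Falls back to one host at a time when no batch size is given.
  serial: "{{ rollout_batch | default(1) }}"
  max_fail_percentage: 0
  tasks:
    - name: Show which host is being upgraded in this batch
      ansible.builtin.debug:
        msg: "Upgrading {{ inventory_hostname }}"
```

A staging run could leave the variable unset to get the default of 1, and a later production run could pass something like -e rollout_batch=2 once recovery time has been measured.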