How would you orchestrate a rolling, health-checked upgrade across servers with Ansible?

Infrastructure as Code (Terraform, Ansible) · Advanced level

Answer

For a rolling, health-checked upgrade, I use serial to batch hosts, drain each host from traffic, upgrade packages or deploy artifacts, restart services through handlers, run health checks, re-add the host, and fail fast if health does not recover.

Technical explanation

The workflow should remove a host from service before mutation and add it back only after health passes.

Handlers should restart services only when files or packages changed.
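A minimal sketch of that pattern, assuming a hypothetical template myapp.conf.j2, a hypothetical destination path, and a service named myapp. The handler is queued only when the template task reports a change:

```yaml
- name: Configure app and restart only on change
  hosts: app
  tasks:
    - name: Render app config
      ansible.builtin.template:
        src: myapp.conf.j2            # hypothetical template name
        dest: /etc/myapp/myapp.conf   # hypothetical path
        mode: "0644"
      notify: Restart app             # queued only when this task reports "changed"

  handlers:
    - name: Restart app
      ansible.builtin.service:
        name: myapp                   # assumed service name
        state: restarted
```

By default, notified handlers run at the end of the play. In a rolling upgrade the restart usually has to happen before the health check, which is what meta: flush_handlers is for.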

A failed batch should stop the rollout and leave enough capacity serving traffic.
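One way to express this at the play level is sketched below. The batch sizes are illustrative assumptions, not recommendations, and the single task is a placeholder for the real drain, upgrade, and health-check steps:

```yaml
- name: Rolling upgrade with strict failure limits
  hosts: app
  serial:
    - 1        # canary batch: a single host first
    - "25%"    # then a quarter of the group per batch
  max_fail_percentage: 0   # any failed host in a batch aborts the play
  tasks:
    - name: Placeholder for drain, upgrade, health check, re-enable
      ansible.builtin.debug:
        msg: "Upgrading {{ inventory_hostname }}"
```

Ansible aborts only when the failure percentage is exceeded, not when it is equaled, so the threshold has to be chosen with the batch size in mind. Batch size should also be small enough that the hosts still in rotation can carry the traffic while one batch is drained.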

Prefer idempotent modules over shell so repeated runs are safe and change reporting is meaningful.
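As a rough illustration, assuming a package and service both named myapp and a hypothetical migration script that writes a marker file when it finishes:

```yaml
- name: Prefer modules over shell
  hosts: app
  tasks:
    # Avoid: a raw shell install reports "changed" on every run.
    # - name: Install app with shell
    #   ansible.builtin.shell: apt-get install -y myapp

    - name: Ensure app package is installed
      ansible.builtin.package:
        name: myapp
        state: present

    - name: Ensure app service is enabled and running
      ansible.builtin.service:
        name: myapp
        state: started
        enabled: true

    - name: Run migration only if the marker file is absent
      ansible.builtin.command:
        cmd: /usr/local/bin/myapp-migrate    # hypothetical script
        creates: /var/lib/myapp/.migrated    # hypothetical marker; task is skipped once it exists
```

When a command is unavoidable, creates (or changed_when) keeps the change report honest, so a second run shows no changes if nothing needed doing.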

Separate reusable role logic from inventory-specific variables so the same automation works across environments.
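A sketch of that split, shown as three files in one listing. The file paths, the role name app_upgrade, the variable names, and the port values are all assumptions for illustration, and the group_vars files only apply if the inventory defines groups named staging and production:

```yaml
# group_vars/staging.yml (hypothetical file)
app_package: myapp
app_health_port: 8080
---
# group_vars/production.yml (hypothetical file)
app_package: myapp
app_health_port: 8443
---
# roles/app_upgrade/tasks/main.yml (hypothetical role)
- name: Upgrade app package
  ansible.builtin.package:
    name: "{{ app_package }}"
    state: latest

- name: Wait for health
  ansible.builtin.uri:
    url: "http://{{ inventory_hostname }}:{{ app_health_port }}/health"
    status_code: 200
  register: health
  until: health.status == 200
  retries: 12
  delay: 5
```

The role contains no environment-specific values, so promoting a change from staging to production means switching the inventory, not editing tasks.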

Run lint, syntax checks, check mode where useful, and staged rollouts before production-wide changes.

Hands-on example

1. Define the goal: upgrade the app group in small batches, keeping each host out of the load balancer until it passes its health check.

2. Playbook skeleton:

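- name: Rolling app upgrade
  hosts: app
  serial: 2                  # upgrade two hosts per batch
  max_fail_percentage: 20    # with a batch of 2, one failed host (50%) aborts the play

  tasks:
    - name: Drain host from load balancer
      ansible.builtin.command: /usr/local/bin/lbctl drain {{ inventory_hostname }}
      delegate_to: localhost

    - name: Upgrade app package
      ansible.builtin.package:
        name: myapp
        state: latest        # "present" would not upgrade an already installed package
      notify: Restart app

    # Run the restart handler now, before the health check, not at the end of the play.
    - name: Flush handlers
      ansible.builtin.meta: flush_handlers

    - name: Wait for health
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}:8080/health"
        status_code: 200
      register: health
      until: health.status == 200
      retries: 12            # up to 12 attempts, 5 seconds apart
      delay: 5

    - name: Add host back to load balancer
      ansible.builtin.command: /usr/local/bin/lbctl enable {{ inventory_hostname }}
      delegate_to: localhost

  handlers:
    # The notify above needs a matching handler; the service name "myapp" is assumed.
    - name: Restart app
      ansible.builtin.service:
        name: myapp
        state: restarted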

3. Test against a staging group with serial: 1, then increase batch size after measuring recovery time.

4. Confirm a failed health check stops the rollout before most hosts are touched.
