What is max_fail_percentage, and how does it protect a rollout?

Infrastructure as Code (Terraform, Ansible) · Advanced level

Answer

max_fail_percentage is a play-level keyword that aborts the play when the share of failed hosts in the current batch exceeds the configured percentage. It protects a rollout by stopping a bad change at the failing batch instead of letting it continue across the rest of the fleet.

Technical explanation

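The threshold is evaluated per batch, as defined by serial, and it must be exceeded rather than merely equaled. A batch that crosses it halts the play before later batches are touched, which limits the damage to the hosts already processed.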
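
To make the batch arithmetic concrete, here is a minimal sketch. The web group and the ping task are placeholders, not part of the original example. Because the percentage has to be exceeded, the value is set just below the failure ratio that should abort the play.

```yaml
# Sketch only: the `web` group and the ping task are placeholders.
- name: Batch threshold illustration
  hosts: web
  serial: 4                  # each batch holds 4 hosts
  max_fail_percentage: 49    # 1 of 4 failed = 25% -> play continues
                             # 2 of 4 failed = 50% -> exceeds 49, play aborts
  tasks:
    - name: Placeholder task that may fail on some hosts
      ansible.builtin.ping:
```

With max_fail_percentage: 50 in the same play, two failures out of four would only equal the threshold, so the play would carry on to the next batch.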

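Tune it to fleet size and service redundancy: the more spare capacity the service has, the more failures a batch can tolerate before the rollout should abort.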

For stricter orchestration, set any_errors_fatal so that a single failed host stops the play for all hosts.
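
As a rough illustration of the stricter setting, the sketch below marks a play so that one failure ends it for every host. The db group and the migration script path are hypothetical.

```yaml
# Sketch only: the `db` group and the script path are hypothetical.
- name: Change where one failure must halt everything
  hosts: db
  any_errors_fatal: true     # a failure on any host stops the play for all hosts
  tasks:
    - name: Apply migration
      ansible.builtin.command: /usr/local/bin/migrate --apply
```

This suits steps where a partial result is worse than no result, whereas max_fail_percentage suits fleets that can absorb a few failed hosts.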

Prefer idempotent modules over shell so repeated runs are safe and change reporting is meaningful.
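
The difference shows up in change reporting. The sketch below assumes Debian-style hosts in the app group, and nginx is only an illustrative package.

```yaml
# Sketch only: assumes Debian-style hosts; nginx is an illustrative package.
- name: Idempotent versus non-idempotent tasks
  hosts: app
  become: true
  tasks:
    # Reports "changed" on every run, even when the package is already installed.
    - name: Install nginx with shell (avoid)
      ansible.builtin.shell: apt-get install -y nginx

    # Reports "changed" only when the package actually had to be installed.
    - name: Install nginx with a module (prefer)
      ansible.builtin.package:
        name: nginx
        state: present
```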

Separate reusable role logic from inventory-specific variables so the same automation works across environments.
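
One possible layout is sketched below. The file paths and variable names are hypothetical and would differ per repository. The inventory supplies the values, and the role task only references them.

```yaml
# Sketch only: file paths and variable names are hypothetical.
# inventories/staging/group_vars/app.yml
app_package_name: myapp
app_health_port: 8080
---
# roles/app/tasks/main.yml (the same task file serves every environment)
- name: Wait for health
  ansible.builtin.uri:
    url: "http://{{ inventory_hostname }}:{{ app_health_port }}/health"
    status_code: 200
```

A production inventory would carry its own group_vars file with different values, leaving the role untouched.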

Run lint, syntax checks, check mode where useful, and staged rollouts before production-wide changes.

Hands-on example

1. Orchestrate a rolling update that uses max_fail_percentage to stop the rollout when too many hosts in a batch fail.

2. Playbook skeleton:

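- name: Rolling app upgrade
  hosts: app
  serial: 2                  # two hosts per batch
  max_fail_percentage: 20    # 1 of 2 failed = 50%, which exceeds 20 and aborts
  tasks:
    - name: Drain host from load balancer
      ansible.builtin.command: /usr/local/bin/lbctl drain {{ inventory_hostname }}
      delegate_to: localhost

    - name: Upgrade app package
      ansible.builtin.package:
        name: myapp
        # "present" would leave an already installed package untouched;
        # "latest" upgrades it if the underlying package manager supports that state.
        state: latest
      notify: Restart app

    - name: Run pending handlers before the health check
      ansible.builtin.meta: flush_handlers

    - name: Wait for health
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}:8080/health"
        status_code: 200
      retries: 12
      delay: 5
      register: health
      until: health.status == 200

    - name: Add host back to load balancer
      ansible.builtin.command: /usr/local/bin/lbctl enable {{ inventory_hostname }}
      delegate_to: localhost

  handlers:
    # Assumed handler: the skeleton notifies "Restart app" without defining it.
    # The service name myapp is a guess based on the package name.
    - name: Restart app
      ansible.builtin.service:
        name: myapp
        state: restarted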

3. Test against a staging group with serial: 1, then increase batch size after measuring recovery time.

4. Confirm a failed health check stops the rollout before most hosts are touched.
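
One way to grow the batch size without editing the play between runs is to give serial a list, as in the sketch below. The batch sizes are illustrative, not a recommendation, and the debug task stands in for the real upgrade tasks.

```yaml
# Sketch only: batch sizes are illustrative; the debug task is a placeholder.
- name: Rolling app upgrade with growing batches
  hosts: app
  serial:
    - 1        # canary: a single host first
    - 2        # then two hosts
    - "25%"    # then a quarter of the group per batch until all hosts are done
  max_fail_percentage: 20    # evaluated against each batch in turn
  tasks:
    - name: Placeholder for the upgrade tasks
      ansible.builtin.debug:
        msg: "Upgrading {{ inventory_hostname }}"
```

The last entry in the list is reused for every remaining batch, so a failure on the canary host aborts the play while only one host has been touched.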
