Last updated: 1/31/2026

Ansible Production Runbook

1. Overview

Ansible is an agentless infrastructure automation platform used for configuration management, application deployment, and infrastructure-as-code. This runbook provides operational procedures for deploying, maintaining, and troubleshooting Ansible in production environments.

Key Capabilities:

  • Infrastructure provisioning and configuration
  • Application deployment and updates
  • Multi-tier orchestration
  • Disaster recovery automation
  • Compliance and security automation

2. Standard Deployment Configuration

2.1 Ansible Control Node Setup

The control node is the machine where Ansible executes playbooks. Recommended specifications:

System Requirements:

  • OS: Linux (Ubuntu 20.04 LTS or CentOS 8+)
  • Python: 3.8+
  • CPU: 2+ cores
  • Memory: 4GB+ RAM
  • Storage: 50GB+ for playbooks, inventories, logs
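
The requirements above can be checked before installing anything. The following is a minimal sketch, assuming a Linux control node where nproc, /proc/meminfo, and df are available. The function names (at_least, python_ok, preflight) and the 3800 MB memory floor (an assumed tolerance, since a nominal 4 GB machine reports slightly less) are illustrative and not part of any Ansible tooling.

```shell
# at_least VALUE MINIMUM: succeeds when VALUE >= MINIMUM (integers only).
at_least() {
  [ "$1" -ge "$2" ]
}

# python_ok MAJOR MINOR: succeeds for Python 3.8 or newer.
python_ok() {
  [ "$1" -gt 3 ] || { [ "$1" -eq 3 ] && [ "$2" -ge 8 ]; }
}

# preflight [PATH]: checks the local machine; PATH is where playbooks and logs will live.
preflight() {
  status=0
  cores=$(nproc)
  mem_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
  disk_gb=$(df -Pk "${1:-/}" | awk 'NR == 2 {print int($4 / 1024 / 1024)}')
  if command -v python3 >/dev/null 2>&1; then
    py_major=$(python3 -c 'import sys; print(sys.version_info[0])')
    py_minor=$(python3 -c 'import sys; print(sys.version_info[1])')
    python_ok "$py_major" "$py_minor" || { echo "FAIL: Python $py_major.$py_minor is older than 3.8"; status=1; }
  else
    echo "FAIL: python3 not found"; status=1
  fi
  at_least "$cores" 2 || { echo "FAIL: $cores CPU core(s), need 2+"; status=1; }
  # Assumed tolerance: MemTotal is a little below the installed RAM size.
  at_least "$mem_mb" 3800 || { echo "FAIL: $mem_mb MB RAM, need about 4 GB"; status=1; }
  at_least "$disk_gb" 50 || { echo "FAIL: $disk_gb GB free, need 50+"; status=1; }
  return "$status"
}
```

Calling preflight /opt prints one FAIL line per unmet requirement and returns non-zero if any check fails.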

Installation:

#!/bin/bash
# Install Ansible on Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y python3 python3-pip git

# Install Ansible and essential collections
# (the version specifier is quoted so the shell does not treat ">" as a redirect)
pip3 install 'ansible>=2.11'
pip3 install ansible-lint  # For linting

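# Clone the repository first (git clone needs an empty target directory)
sudo mkdir -p /opt/ansible
sudo chown "$USER" /opt/ansible
git clone <your-ansible-repo> /opt/ansible
cd /opt/ansible

# Create any directories the repository does not already provide
mkdir -p playbooks roles inventories logs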

Essential Configuration (ansible.cfg):

[defaults]
# Inventory and host settings
inventory = inventories/production/hosts
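# Disabling host key checking exposes connections to man-in-the-middle attacks;
# prefer True once the managed hosts are in known_hosts
host_key_checking = False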
remote_user = ansible
private_key_file = ~/.ssh/ansible_key

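# Execution settings
# (inline comments in ansible.cfg must start with ";"; "#" is only a comment at line start)
# Parallel task execution
forks = 10
# SSH timeout in seconds
timeout = 30
log_path = /var/log/ansible.log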

# Performance optimization
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_cache
fact_caching_timeout = 86400

# Display settings
force_color = True
display_skipped_hosts = False
deprecation_warnings = False

[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
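
Ansible's INI parser treats "#" as a comment marker only at the start of a line; inline comments must use ";", otherwise the trailing text becomes part of the value. A small check can flag such lines before they cause confusing type errors. This is a sketch, and the function name inline_hash_comments is an illustrative assumption.

```shell
# inline_hash_comments FILE: prints "LINE_NUMBER: text" for every setting line
# that carries an inline "#" comment.
inline_hash_comments() {
  awk '
    /^[ \t]*[#;]/ { next }              # whole-line comments are valid
    /^[ \t]*\[/   { next }              # section headers
    /=.*[ \t]#/   { print NR ": " $0 }  # setting followed by an inline "#"
  ' "$1"
}
```

Running inline_hash_comments ansible.cfg prints nothing when the file is clean.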

2.2 SSH Key Setup

Ansible requires SSH access to all managed hosts:

# Generate SSH key for Ansible user (if not exists)
ssh-keygen -t ed25519 -f ~/.ssh/ansible_key -N ""

# Set proper permissions
chmod 600 ~/.ssh/ansible_key
chmod 700 ~/.ssh

# Copy public key to all managed hosts
for host in web1.example.com web2.example.com; do
  ssh-copy-id -i ~/.ssh/ansible_key.pub ansible@$host
done

# Verify connectivity
ansible all -i inventories/production/hosts -m ping
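
SSH refuses private keys that other users can read, so it can help to check the key mode before running the connectivity test. The sketch below assumes GNU coreutils stat (the -c option); the function name key_perms_ok is illustrative.

```shell
# key_perms_ok FILE: succeeds when FILE exists and its mode is 600 or 400.
key_perms_ok() {
  [ -f "$1" ] || return 1
  mode=$(stat -c '%a' "$1") || return 1   # GNU stat assumed
  [ "$mode" = "600" ] || [ "$mode" = "400" ]
}
```

For example: key_perms_ok ~/.ssh/ansible_key || chmod 600 ~/.ssh/ansible_key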

2.3 Inventory Configuration

Static Inventory (inventories/production/hosts):

[webservers]
web1.example.com
web2.example.com
web3.example.com

[databases]
db1.example.com db_primary=true
db2.example.com db_primary=false

[caching]
cache1.example.com

[all:vars]
ansible_user=ansible
ansible_ssh_private_key_file=~/.ssh/ansible_key
ansible_python_interpreter=/usr/bin/python3

Dynamic Inventory (AWS example - aws_ec2.yml):

plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
  - us-west-2

filters:
  tag:Environment: production

keyed_groups:
  - key: tags.Role
    separator: _
  - key: placement.region
    prefix: aws

2.4 Group Variables

Define variables for host groups:

inventories/production/group_vars/webservers.yml:

---
# Web server configuration
http_port: 80
https_port: 443
app_version: "2.0.0"

# Performance tuning
max_connections: 1000
worker_processes: 4

# Monitoring
enable_monitoring: true
metrics_port: 9090

inventories/production/group_vars/databases.yml:

---
# Database configuration
db_port: 5432
db_backup_enabled: true
db_backup_schedule: "0 2 * * *"  # 2 AM daily

# Replication
db_replication: true
replication_lag_threshold: 10  # seconds

3. Standard Playbook Structure

3.1 Directory Layout

ansible-project/
├── ansible.cfg
├── inventories/
│   ├── production/
│   │   ├── hosts
│   │   ├── group_vars/
│   │   │   ├── all.yml
│   │   │   ├── webservers.yml
│   │   │   └── databases.yml
│   │   └── host_vars/
│   ├── staging/
│   └── development/
├── roles/
│   ├── common/
│   │   ├── tasks/main.yml
│   │   ├── handlers/main.yml
│   │   ├── vars/main.yml
│   │   ├── defaults/main.yml
│   │   └── templates/
│   ├── webserver/
│   ├── database/
│   └── monitoring/
├── playbooks/
│   ├── site.yml
│   ├── deploy.yml
│   ├── upgrade.yml
│   └── maintenance.yml
├── templates/
├── files/
├── group_vars/
└── requirements.yml

3.2 Main Playbook Template (site.yml)

---
- name: Deploy infrastructure
  hosts: all
  gather_facts: yes
  any_errors_fatal: true
  
  vars:
    app_name: myapp
    app_version: "2.0.0"
    # "environment" is a reserved play keyword, so the variable uses a distinct name
    deploy_environment: production

  pre_tasks:
    - name: Validate prerequisites
      assert:
        that:
          - ansible_os_family == 'Debian' or ansible_os_family == 'RedHat'
          - ansible_memtotal_mb >= 2048
          - ansible_processor_vcpus >= 2
        fail_msg: "Host does not meet minimum requirements"

    - name: Check connectivity
      wait_for_connection:
        delay: 1
        timeout: 10

    # now() is evaluated at task time; ansible_date_time is fixed when facts are
    # gathered and can be stale when fact caching is enabled
    - name: Record deployment start
      set_fact:
        deploy_start_time: "{{ now(utc=true, fmt='%Y-%m-%dT%H:%M:%SZ') }}"

  roles:
    - role: common
      tags: [common, setup]

    - role: monitoring
      tags: [monitoring, setup]

    - role: application
      tags: [app, deploy]
      vars:
        version: "{{ app_version }}"

  post_tasks:
    - name: Health check
      uri:
        url: "http://{{ inventory_hostname }}:8080/health"
        status_code: 200
      register: health_check
      until: health_check.status == 200
      retries: 5
      delay: 10
      when: "'webservers' in group_names"

    # A registered variable is defined even when its task was skipped,
    # so test for "not skipped" instead of "defined"
    - name: Verify deployment
      assert:
        that:
          - health_check.status is defined
          - health_check.status == 200
        fail_msg: "Application health check failed"
      when: health_check is not skipped

    - name: Record deployment end
      set_fact:
        deploy_end_time: "{{ now(utc=true, fmt='%Y-%m-%dT%H:%M:%SZ') }}"

    # post_tasks only run when all earlier tasks succeeded, so no result check
    # is needed; run_once sends a single message from the control node
    - name: Send deployment notification
      mail:
        host: smtp.example.com
        port: 587
        to: devops@example.com
        subject: "Deployment completed: {{ app_name }} v{{ app_version }}"
        body: |
          Deployment Summary:
          - Application: {{ app_name }}
          - Version: {{ app_version }}
          - Environment: {{ deploy_environment }}
          - Start: {{ deploy_start_time }}
          - End: {{ deploy_end_time }}
          - Status: SUCCESS
      run_once: true
      delegate_to: localhost

4. Deployment Procedures

4.1 Pre-Deployment Checklist

Before any deployment, verify:

# 1. Syntax check
ansible-playbook --syntax-check playbooks/site.yml

# 2. Inventory validation
ansible-inventory -i inventories/production/hosts --list

# 3. Connectivity test
ansible all -i inventories/production/hosts -m ping

# 4. Gather facts (optional, can take time)
ansible all -i inventories/production/hosts -m setup

4.2 Dry-Run Deployment

Always test with --check mode first:

# Dry-run on staging
ansible-playbook -i inventories/staging playbooks/site.yml --check --diff

# Test on single host first
ansible-playbook -i inventories/production playbooks/site.yml \
  --check --diff \
  --limit web1.example.com

4.3 Production Deployment

Execute deployment with proper logging:

# Full deployment
export ANSIBLE_LOG_PATH=/var/log/ansible/deployment-$(date +%Y%m%d-%H%M%S).log
ansible-playbook -i inventories/production playbooks/site.yml -v

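# Rolling deployment (one host at a time)
# ansible-playbook has no --serial option; batching is set with the "serial" play keyword.
# This assumes the play declares: serial: "{{ rolling_batch | default('100%') }}"
# (rolling_batch is an illustrative variable name)
ansible-playbook -i inventories/production playbooks/site.yml \
  --limit webservers \
  -e "rolling_batch=1" \
  -v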

# Deployment with tags
ansible-playbook -i inventories/production playbooks/site.yml \
  --tags "app,deploy" \
  -v
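
The ANSIBLE_LOG_PATH export above only works if the log directory already exists. A small helper can create it and build the timestamped file name in one step. This is a sketch; the function name deployment_log_path is an illustrative assumption.

```shell
# deployment_log_path [DIR]: creates DIR if needed and prints
# DIR/deployment-YYYYmmdd-HHMMSS.log (DIR defaults to /var/log/ansible).
deployment_log_path() {
  dir=${1:-/var/log/ansible}
  mkdir -p "$dir" || return 1
  printf '%s/deployment-%s.log\n' "$dir" "$(date +%Y%m%d-%H%M%S)"
}
```

For example: export ANSIBLE_LOG_PATH="$(deployment_log_path)" before calling ansible-playbook.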

4.4 Rollback Procedure

If deployment fails:

# Revert to previous version
export ROLLBACK_VERSION="1.9.0"
ansible-playbook -i inventories/production playbooks/rollback.yml \
  -e "app_version=$ROLLBACK_VERSION" \
  -v

# Verify rollback (the application runs on the web servers; -f makes curl fail on HTTP errors)
ansible webservers -i inventories/production -m shell -a "curl -fsS http://localhost:8080/version"
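
A mistyped ROLLBACK_VERSION can turn a rollback into an accidental upgrade. The sketch below validates the version string and confirms it is older than the current release before the playbook runs. It assumes plain MAJOR.MINOR.PATCH version numbers; the function names is_release_version and version_lt are illustrative.

```shell
# is_release_version STRING: succeeds for digits-only MAJOR.MINOR.PATCH.
is_release_version() {
  awk -v v="$1" 'BEGIN { exit !(v ~ /^[0-9]+\.[0-9]+\.[0-9]+$/) }'
}

# version_lt A B: succeeds when A is strictly older than B,
# comparing each field numerically from left to right.
version_lt() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 3; i++) {
      if (x[i] + 0 < y[i] + 0) exit 0
      if (x[i] + 0 > y[i] + 0) exit 1
    }
    exit 1   # equal versions are not "older"
  }'
}
```

For example: is_release_version "$ROLLBACK_VERSION" && version_lt "$ROLLBACK_VERSION" "2.0.0" || echo "refusing to roll back"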

5. Monitoring & Health Checks

5.1 Monitoring Configuration

Add monitoring role to all hosts:

roles:
  - role: monitoring
    vars:
      metrics_port: 9090
      enable_prometheus: true
      enable_logging: true

5.2 Critical Metrics to Monitor

Metric                            Threshold   Action
Ansible playbook execution time   > 30 min    Investigate
SSH connection failures           > 5%        Check network
Task failure rate                 > 0%        Review logs
Playbook success rate             < 99%       Alert team
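
The thresholds above can be applied mechanically once the four values are collected. How they are collected is site-specific and not covered here. The following is a sketch in which the function name evaluate_metrics and its argument order are illustrative assumptions.

```shell
# evaluate_metrics DURATION_MIN SSH_FAIL_PCT TASK_FAIL_PCT SUCCESS_PCT
# Prints one line per breached threshold and returns non-zero if any is breached.
evaluate_metrics() {
  awk -v dur="$1" -v sshfail="$2" -v taskfail="$3" -v success="$4" 'BEGIN {
    breached = 0
    if (dur + 0 > 30)     { print "execution time > 30 min: investigate"; breached = 1 }
    if (sshfail + 0 > 5)  { print "SSH connection failures > 5%: check network"; breached = 1 }
    if (taskfail + 0 > 0) { print "task failure rate > 0%: review logs"; breached = 1 }
    if (success + 0 < 99) { print "playbook success rate < 99%: alert team"; breached = 1 }
    exit breached
  }'
}
```

For example, evaluate_metrics 45 0 0 100 reports only the execution-time breach.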

5.3 Health Check Commands

# Check last deployment status
ansible-playbook -i inventories/production playbooks/health_check.yml

# Verify all hosts are reachable
ansible all -i inventories/production -m ping

# Check specific service status (--check reports the state without starting the service)
ansible webservers -i inventories/production -m systemd -a "name=myapp state=started" --check

# Review recent deployment logs
tail -n 100 -f /var/log/ansible.log

6. Maintenance & Upgrades

6.1 Ansible Version Upgrade

# Check current version
ansible --version

# Backup current installation
pip3 freeze > requirements-backup.txt

# Upgrade Ansible
pip3 install --upgrade ansible

# Verify upgrade
ansible --version

# Test with staging inventory
ansible-playbook -i inventories/staging playbooks/site.yml --syntax-check

6.2 Collection Updates

# List installed collections and their versions
ansible-galaxy collection list

# Update all collections
ansible-galaxy collection install --upgrade -r requirements.yml

# Update specific collection
ansible-galaxy collection install --upgrade community.general

6.3 Role Updates

Roles should be version-controlled. Update via Git:

# Pull latest changes
git pull origin main

# Verify everything the pull brought in (ORIG_HEAD is the commit checked out before the pull)
git diff ORIG_HEAD HEAD

# Test updated roles
ansible-playbook -i inventories/staging playbooks/site.yml --check --diff

7. Troubleshooting

7.1 Common Issues & Solutions

Issue: SSH Connection Failed
  Symptom: UNREACHABLE! => {... "msg": "Failed to connect to the host via ssh..."}
  Solution: Verify SSH key permissions (chmod 600 ~/.ssh/ansible_key), check network connectivity, and confirm the host is reachable (ping <host>).

Issue: Python Not Found
  Symptom: fatal: [host]: FAILED! => {"msg": "python3 not found..."}
  Solution: Install Python with the raw module and privilege escalation (ansible all -i hosts -b -m raw -a "apt-get install -y python3"), or set ansible_python_interpreter in the inventory.

Issue: Privilege Escalation Failed
  Symptom: fatal: [host]: FAILED! => {"msg": "sudo: sorry..."}
  Solution: Ensure the connecting user (remote_user) has passwordless sudo, or pass --ask-become-pass. Check the sudoers file with visudo.

Issue: Task Timeout
  Symptom: fatal: [host]: FAILED! - TimeoutError...
  Solution: Increase the timeout (timeout = 60 in ansible.cfg) and check host performance.

Issue: Variable Not Defined
  Symptom: fatal: [host]: FAILED! => {"msg": "The variable 'xyz' is not defined..."}
  Solution: Check the variable spelling, verify its scope (defaults/vars/group_vars), and inspect it with debug: var=xyz.

Issue: Handler Not Triggered
  Symptom: Service restart doesn't occur even with notify
  Solution: Handlers run only when the notifying task reports "changed". Confirm the notify value matches the handler name exactly. Handlers run at the end of the play; use meta: flush_handlers to run them earlier.

Issue: Template Syntax Error
  Symptom: fatal: [host]: FAILED! => {"msg": "TypeError - unsupported..."}
  Solution: Validate the Jinja2 template syntax and test locally (jinja2 template.j2, which requires the separate jinja2-cli package). Jinja2 has no try-except, so guard optional variables with the default filter or "is defined" tests.

Issue: Inventory Parse Error
  Symptom: [WARNING] Unable to parse /inventory as an inventory source
  Solution: Validate the YAML/INI syntax, check the file encoding (UTF-8), and verify file permissions.

7.2 Debugging Techniques

Enable Verbose Output:

# Single verbose (-v): Task results
ansible-playbook playbooks/site.yml -i inventories/production -v

# Double verbose (-vv): Adds task file paths and more detail on inputs
ansible-playbook playbooks/site.yml -i inventories/production -vv

# Triple verbose (-vvv): Adds connection details such as the SSH commands run
ansible-playbook playbooks/site.yml -i inventories/production -vvv

# Maximum verbose (-vvvv): Adds full connection (SSH) debugging output
ansible-playbook playbooks/site.yml -i inventories/production -vvvv

Debug Specific Tasks:

# Use debug module to inspect variables
- name: Debug variables
  debug:
    msg: |
      Variable 1: {{ var1 }}
      Variable 2: {{ var2 }}
    verbosity: 2  # Only shown with -vv

# Use pause to inspect state
- name: Pause for debugging
  pause:
    prompt: "Press ENTER to continue"

# Register output for inspection
- name: Run command
  shell: /opt/app/bin/check-status
  register: status_output

- name: Show output
  debug:
    var: status_output

Step-by-Step Execution:

# Execute tasks one at a time with confirmation
ansible-playbook playbooks/site.yml -i inventories/production --step

# Start from specific task
ansible-playbook playbooks/site.yml -i inventories/production \
  --start-at-task="Task Name"

# Run only specific tags
ansible-playbook playbooks/site.yml -i inventories/production \
  --tags "debug,important"

Review Logs:

# Show Ansible logs
tail -n 100 -f /var/log/ansible.log

# Search for errors
grep ERROR /var/log/ansible.log

# Show the most recent deployment log
tail -n 200 "$(ls -t /var/log/ansible/deployment-*.log | head -n 1)"
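
Searching for ERROR misses hosts that only appear as failed or unreachable in the play recap. The sketch below lists those hosts from a log file. It assumes recap lines of the form "host : ok=N changed=N unreachable=N failed=N ...", optionally preceded by the log prefix; the function name failed_hosts is illustrative.

```shell
# failed_hosts LOGFILE: prints each host whose recap line reports a
# failed or unreachable count above zero.
failed_hosts() {
  awk '
    /ok=[0-9]+/ && /failed=[0-9]+/ {
      host = ""; bad = 0
      for (i = 1; i <= NF; i++) {
        # The host name is the field just before the standalone ":".
        if (host == "" && $i == ":" && i > 1) host = $(i - 1)
        if ($i ~ /^(failed|unreachable)=[0-9]+$/) {
          split($i, kv, "=")
          if (kv[2] + 0 > 0) bad = 1
        }
      }
      if (bad && host != "") print host
    }
  ' "$1"
}
```

For example: failed_hosts /var/log/ansible.log | sort -u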

7.3 Essential Commands

# List all hosts in inventory
ansible-inventory -i inventories/production/hosts --list

# Display inventory in graph format
ansible-inventory -i inventories/production/hosts --graph

# Get facts from specific host
ansible <hostname> -i inventories/production -m setup

# Run adhoc command on all webservers
ansible webservers -i inventories/production -m shell -a "systemctl status myapp"

# Copy file to all hosts
ansible all -i inventories/production -m copy -a "src=file.txt dest=/tmp/"

# Execute playbook with extra variables
ansible-playbook playbooks/site.yml -e "app_version=2.0.0 env=prod"

# Check syntax without executing
ansible-playbook playbooks/site.yml --syntax-check

# Show what would change (dry-run)
ansible-playbook playbooks/site.yml --check --diff

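# Get execution time summary (ansible-playbook has no --stats option;
# the profile_tasks callback from ansible.posix prints per-task timings)
ANSIBLE_CALLBACKS_ENABLED=ansible.posix.profile_tasks ansible-playbook playbooks/site.yml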

8. Disaster Recovery

8.1 Backup Strategy

Backup Configuration:

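# Daily backup of all playbooks and inventories
# (a crontab entry must be a single line, and cron's default /bin/sh may not expand braces)
0 2 * * * tar -czf /backups/ansible-$(date +\%Y\%m\%d).tar.gz /opt/ansible/playbooks /opt/ansible/roles /opt/ansible/inventories /opt/ansible/requirements.yml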

# Backup to remote storage (S3)
0 3 * * * aws s3 sync /opt/ansible s3://ansible-backups/ --delete
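
A backup is only useful if it can be read back. The sketch below checks that an archive is readable and contains the three directories the backup job is meant to capture. The function name backup_ok is an illustrative assumption.

```shell
# backup_ok ARCHIVE: succeeds when ARCHIVE can be listed and contains
# playbooks/, roles/, and inventories/ entries at any depth.
backup_ok() {
  listing=$(tar -tzf "$1" 2>/dev/null) || return 1
  for dir in playbooks roles inventories; do
    printf '%s\n' "$listing" | grep -Eq "(^|/)${dir}/" || return 1
  done
}
```

For example: backup_ok /backups/ansible-20240131.tar.gz || echo "backup is unreadable or incomplete"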

8.2 Recovery Procedures

Restore from Backup:

# List available backups
ls -la /backups/ansible-*.tar.gz

# Restore to specific date
# (tar stores members without the leading "/", as opt/ansible/..., so extract relative to /)
tar -xzf /backups/ansible-20240131.tar.gz -C /

# Verify restored content
ls -la /opt/ansible/playbooks/

Verify Critical Hosts:

# After recovery, verify connectivity
ansible all -i inventories/production -m ping

# Run health check playbook
ansible-playbook -i inventories/production playbooks/health_check.yml

# Manual verification
ssh ansible@<host> "systemctl status myapp"

9. Security Best Practices

9.1 Secrets Management

# Encrypt sensitive files
ansible-vault encrypt inventories/production/group_vars/databases.yml

# Edit encrypted file
ansible-vault edit inventories/production/group_vars/databases.yml

# View encrypted content
ansible-vault view inventories/production/group_vars/databases.yml

# Run playbook with vault password
ansible-playbook playbooks/site.yml --vault-password-file ~/.vault-pass

# Encrypt a single value for use in a vars file
# (the secret is read from stdin, which keeps it out of shell history)
ansible-vault encrypt_string --stdin-name 'db_password'

9.2 SSH Key Security

# Restrict SSH key permissions
chmod 600 ~/.ssh/ansible_key
chmod 700 ~/.ssh

# Use SSH agent for key management
eval $(ssh-agent)
ssh-add ~/.ssh/ansible_key

# Verify SSH key is loaded
ssh-add -l

9.3 Audit Logging

# Enable Ansible logging in config
log_path = /var/log/ansible.log

# Review deployment logs
grep -i "changed" /var/log/ansible.log

# Archive logs for compliance
tar -czf /archive/ansible-logs-$(date +%Y%m).tar.gz /var/log/ansible.log

10. Performance Tuning

10.1 Optimization Settings

[defaults]
# Parallelization (increase for large inventories)
forks = 20

# Fact caching (improves performance significantly)
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_cache
fact_caching_timeout = 86400

# Connection pooling
[ssh_connection]
pipelining = True
control_path = /tmp/ansible-ssh-%%h-%%p-%%r

10.2 Performance Monitoring

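# Show execution time per task
# (ansible-playbook has no --stats option; the profile_tasks callback from the
# ansible.posix collection prints per-task timings, slowest first)
ANSIBLE_CALLBACKS_ENABLED=ansible.posix.profile_tasks ansible-playbook playbooks/site.yml

# Profile slow roles as well
ANSIBLE_CALLBACKS_ENABLED=ansible.posix.profile_tasks,ansible.posix.profile_roles \
  ansible-playbook playbooks/site.yml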

# Monitor system during execution
watch -n 1 'ps aux | grep ansible'

Appendix: Reference Commands

# Inventory Operations
ansible-inventory -i hosts --list              # List all hosts
ansible-inventory -i hosts --graph             # Graph view

# Playbook Operations
ansible-playbook playbooks/site.yml            # Run playbook
ansible-playbook playbooks/site.yml --check    # Dry-run
ansible-playbook playbooks/site.yml --diff     # Show changes
ansible-playbook playbooks/site.yml -v         # Verbose output
ansible-playbook playbooks/site.yml --step     # Step through

# Adhoc Commands
ansible all -m ping                            # Ping all hosts
ansible webservers -m setup                    # Get host facts
ansible all -m shell -a "date"                 # Run command
ansible all -m copy -a "src=file dest=/tmp"    # Copy file

# Vault Operations
ansible-vault encrypt file.yml                 # Encrypt file
ansible-vault decrypt file.yml                 # Decrypt file
ansible-vault edit file.yml                    # Edit encrypted file
ansible-vault view file.yml                    # View encrypted file

# Collection Management
ansible-galaxy collection list                 # List collections
ansible-galaxy collection install community.general  # Install
ansible-galaxy collection install -r requirements.yml # From file

Contact & Escalation


Last Updated: January 31, 2026
Maintained By: Infrastructure & Automation Team