Infrastructure as Code with Ansible⚓︎

Overview⚓︎

This article provides a complete guide to implementing Infrastructure as Code (IaC) using Ansible to manage Ubuntu servers from a Synology NAS. The solution enables consistent, version-controlled server provisioning and ongoing management, including automated maintenance, Docker container orchestration, security hardening, and disaster recovery.

Architecture⚓︎

Control Machine⚓︎

  • Platform: Synology NAS
  • Ansible Runtime: Docker container (cytopia/ansible:latest)
  • Project Directory: /volume2/ansible-infrastructure
  • SSH Key Location: /volume2/ansible-infrastructure/ssh-keys

Target Servers⚓︎

  • Operating System: Ubuntu 24.04 LTS
  • Authentication: SSH key-based
  • Privilege Escalation: sudo with password (--ask-become-pass)

Data Protection⚓︎

  • Infrastructure State: Fully managed via Ansible, reproducible from code
  • Application Data: Backed up with custom scripts that preserve all Docker volumes
  • Recovery Strategy: Rebuild with Ansible + restore from backup

Project Structure⚓︎

/volume2/ansible-infrastructure/
├── ansible.cfg                    # Ansible configuration
├── inventory/
│   ├── xenlab-local.yml           # Local server inventory
│   └── portable-vps.yml           # VPS inventory
├── playbooks/
│   ├── discover-server.yml        # System discovery
│   ├── local-infrastructure.yml   # Local server deployment
│   ├── vps-infrastructure.yml     # VPS deployment
│   ├── verify-base-system.yml     # Base system verification
│   ├── verify-docker.yml          # Docker role verification
│   ├── verify-scripts.yml         # Scripts role verification
│   └── verify-cron.yml            # Cron role verification
├── roles/
│   ├── xenlab-local/              # Consolidated local server role
│   │   ├── defaults/main.yml      # Role defaults
│   │   ├── tasks/main.yml         # All tasks
│   │   └── templates/             # Configuration templates
│   └── vps-applications/          # Consolidated VPS role
│       ├── defaults/main.yml      # Role defaults
│       ├── tasks/main.yml         # All tasks
│       └── templates/             # Configuration templates
├── group_vars/
│   ├── local_servers.yml          # Local server group variables
│   └── vps_servers.yml            # VPS server group variables
└── ssh-keys/
    ├── ansible_key                # Private SSH key
    └── ansible_key.pub            # Public SSH key

Step-by-Step Implementation⚓︎

1. Initial Setup⚓︎

Create Project Structure⚓︎

mkdir -p /volume2/ansible-infrastructure/{inventory,playbooks,roles,group_vars,ssh-keys}

Generate SSH Keys⚓︎

ssh-keygen -t ed25519 -f /volume2/ansible-infrastructure/ssh-keys/ansible_key -N ""

Configure Ansible⚓︎

Create /volume2/ansible-infrastructure/ansible.cfg:

[defaults]
host_key_checking = False
inventory = inventory/
private_key_file = ssh-keys/ansible_key
roles_path = roles
stdout_callback = yaml
retry_files_enabled = False
gathering = smart

[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
pipelining = True

Create Inventories⚓︎

Local Server (/volume2/ansible-infrastructure/inventory/xenlab-local.yml):

---
all:
  children:
    local_servers:
      hosts:
        xenlab:
          ansible_host: 100.102.180.92
          ansible_user: dave
          ansible_ssh_private_key_file: ssh-keys/ansible_key

VPS Server (/volume2/ansible-infrastructure/inventory/portable-vps.yml):

---
all:
  children:
    vps_servers:
      hosts:
        portable-vps:
          ansible_host: YOUR_VPS_IP
          ansible_user: unifiadmin
          ansible_ssh_private_key_file: ssh-keys/ansible_key

2. Consolidated Role Architecture⚓︎

Xenlab-Local Role (roles/xenlab-local/)⚓︎

Purpose: Complete local server configuration with comprehensive variable handling

Architecture:

  • Role defaults (defaults/main.yml): Comprehensive defaults ensure consistent deployments
  • Group variables (group_vars/local_servers.yml): Override defaults with environment-specific values
  • Template safety checks: Handle undefined variables gracefully
  • Task conditionals: Skip optional features when variables unavailable

Features:

  • Base system hardening and package installation
  • User and group management with retry logic
  • Docker environment setup with latest version detection
  • Custom automation scripts deployment
  • Scheduled task management
  • Robust deployment handling for fresh Ubuntu systems
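The retry logic mentioned above can be sketched as an Ansible task using `until`/`retries`. This is a hypothetical illustration under stated assumptions — the task name, user, and `managed_user` variable are not taken from the actual role:

```yaml
# Hypothetical sketch: retry user management to ride out transient
# /etc/passwd lock contention on freshly provisioned systems.
# managed_user is an illustrative variable, not from the actual role.
- name: Ensure managed user exists (with retry logic)
  ansible.builtin.user:
    name: "{{ managed_user | default('dave') }}"
    groups: docker
    append: true
  register: user_result
  until: user_result is succeeded
  retries: 5
  delay: 10
```

The `until`/`retries`/`delay` combination re-runs the task until it succeeds, which is what lets a fresh Ubuntu system settle (cloud-init, unattended-upgrades) without failing the play.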

VPS-Applications Role (roles/vps-applications/)⚓︎

Purpose: Complete VPS configuration with comprehensive variable handling

Architecture:

  • Role defaults (defaults/main.yml): All variables defined with sensible defaults
  • Group variables (group_vars/vps_servers.yml): Environment-specific overrides
  • Safety checks: Prevent undefined variable errors
  • Multi-cloud compatibility: Works across all cloud providers

Features:

  • Security hardening (UFW firewall, Fail2Ban)
  • Docker environment optimized for VPS
  • NAS integration with CIFS mounting
  • Monitoring and notification system (ntfy.sh, healthcheck.io)
  • Automated backup scripts (Unifi, Golinks, dotfiles)
  • Multi-cloud portability

Key Scripts Deployed:

  • /usr/local/bin/unifi-backup.sh - Weekly Unifi controller backup
  • /usr/local/bin/backup-golinks.sh - Daily Tailscale Golinks backup
  • /usr/local/bin/internet-connectivity-check.sh - 5-minute connectivity monitoring
  • /usr/local/bin/monitor-exit-node.sh - 5-minute Tailscale exit node monitoring
  • /usr/local/bin/docker-maintenance.sh - Daily Docker updates and cleanup
  • /usr/local/bin/backup-dotfiles.sh - Weekly dotfiles backup to GitHub
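These scripts share a common pattern: timestamped logging to a dedicated directory. A minimal sketch of that pattern follows — the function name and the /tmp default are illustrative (the real scripts log under /var/log/custom-scripts/, which requires root to create):

```shell
#!/bin/sh
# Minimal sketch of the timestamped-logging pattern the deployed scripts share.
# LOG_DIR defaults to /tmp here so the sketch runs without root; the actual
# scripts use /var/log/custom-scripts/.
LOG_DIR="${LOG_DIR:-/tmp/custom-scripts}"

log() {
    # Derive the log file from the script name and append a timestamped line.
    LOG_FILE="$LOG_DIR/$(basename "$0" .sh).log"
    mkdir -p "$LOG_DIR"
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

log "maintenance run started"
log "maintenance run finished"
```

Keeping one log file per script makes the retention and audit trail described under Maintenance & Monitoring straightforward.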

3. Execution Commands⚓︎

Local Infrastructure Deployment⚓︎

Discovery:

docker run --rm -it \
  -v /volume2/ansible-infrastructure:/ansible \
  -w /ansible \
  cytopia/ansible:latest \
  sh -c "apk add --no-cache openssh-client && ansible-playbook -i inventory/xenlab-local.yml playbooks/discover-server.yml --ask-become-pass -v"

Dry Run:

docker run --rm -it \
  -v /volume2/ansible-infrastructure:/ansible \
  -w /ansible \
  cytopia/ansible:latest \
  sh -c "apk add --no-cache openssh-client && ansible-playbook -i inventory/xenlab-local.yml playbooks/local-infrastructure.yml --check --diff --ask-become-pass -v"

Production Deployment:

docker run --rm -it \
  -v /volume2/ansible-infrastructure:/ansible \
  -w /ansible \
  cytopia/ansible:latest \
  sh -c "apk add --no-cache openssh-client && ansible-playbook -i inventory/xenlab-local.yml playbooks/local-infrastructure.yml --ask-become-pass -v"

VPS Infrastructure Deployment⚓︎

Dry Run:

docker run --rm -it \
  -v /volume2/ansible-infrastructure:/ansible \
  -w /ansible \
  cytopia/ansible:latest \
  sh -c "apk add --no-cache openssh-client && ansible-playbook -i inventory/portable-vps.yml playbooks/vps-infrastructure.yml --check --diff --ask-become-pass -v"

Production Deployment:

docker run --rm -it \
  -v /volume2/ansible-infrastructure:/ansible \
  -w /ansible \
  cytopia/ansible:latest \
  sh -c "apk add --no-cache openssh-client && ansible-playbook -i inventory/portable-vps.yml playbooks/vps-infrastructure.yml --ask-become-pass -v"

Variable Management Strategy⚓︎

Layered Variable Architecture⚓︎

The infrastructure uses a role defaults + group_vars override strategy:

  1. Role Defaults (roles/*/defaults/main.yml):
     • Comprehensive defaults for all variables
     • Ensures roles are self-contained and functional
     • Provides sensible fallback values

  2. Group Variables (group_vars/*.yml):
     • Environment-specific overrides
     • Clean organization of configuration
     • Higher precedence than role defaults

  3. Template Safety Checks:
     • {% if variable is defined %} checks in templates
     • Graceful handling of undefined variables
     • Fallback configurations when needed

  4. Task Conditionals:
     • when: variable is defined on optional tasks
     • Skip features when variables unavailable
     • Prevent deployment failures
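In a template, the safety checks above look like the following fragment. This is a hypothetical example — the daemon.json file and key names are illustrative, not one of the repository's actual templates:

```jinja
{# Hypothetical templates/daemon.json.j2 fragment: render an optional value
   only when its variable exists, with a hard-coded fallback otherwise. #}
{
  "log-driver": "json-file",
{% if docker_daemon_config is defined and docker_daemon_config.log_max_size is defined %}
  "log-opts": { "max-size": "{{ docker_daemon_config.log_max_size }}" }
{% else %}
  "log-opts": { "max-size": "10m" }
{% endif %}
}
```

The `is defined` test is what prevents the Jinja2 undefined-variable errors noted in the Troubleshooting section.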

Example Variable Flow⚓︎

# roles/xenlab-local/defaults/main.yml (always available)
disk_space_threshold: 85
docker_compose_version: "2.39.0"

# group_vars/local_servers.yml (overrides defaults)
disk_space_threshold: 90  # Override default
# docker_compose_version uses default value

Making Updates and Changes⚓︎

When making infrastructure changes, follow this disciplined workflow to maintain consistency and avoid configuration drift:

1. Update Ansible First (Preferred Method)⚓︎

For new packages, scripts, or configuration changes:

# Edit the appropriate group variables
nano /volume2/ansible-infrastructure/group_vars/local_servers.yml
# or
nano /volume2/ansible-infrastructure/group_vars/vps_servers.yml

# Test with dry run
docker run --rm -it \
  -v /volume2/ansible-infrastructure:/ansible \
  -w /ansible \
  cytopia/ansible:latest \
  sh -c "apk add --no-cache openssh-client && ansible-playbook -i inventory/xenlab-local.yml playbooks/local-infrastructure.yml --check --diff --ask-become-pass -v"

# Apply if changes look correct
docker run --rm -it \
  -v /volume2/ansible-infrastructure:/ansible \
  -w /ansible \
  cytopia/ansible:latest \
  sh -c "apk add --no-cache openssh-client && ansible-playbook -i inventory/xenlab-local.yml playbooks/local-infrastructure.yml --ask-become-pass -v"

2. Audit with Discovery⚓︎

Run discovery periodically to identify configuration drift:

# Monthly audit to see what's changed
docker run --rm -it \
  -v /volume2/ansible-infrastructure:/ansible \
  -w /ansible \
  cytopia/ansible:latest \
  sh -c "apk add --no-cache openssh-client && ansible-playbook -i inventory/xenlab-local.yml playbooks/discover-server.yml --ask-become-pass -v"

Review the output for unexpected changes and decide whether to:

  • Add them to Ansible configurations (if they should be permanent)
  • Document them as acceptable exceptions
  • Remove them as unwanted drift

3. Handle Manual Changes⚓︎

When manual changes are needed:

For Testing/Troubleshooting:

  • Make the change manually
  • Document it as temporary
  • Remove it when done, or add it to Ansible if it should be permanent

For Emergency Fixes:

  • Make the immediate fix
  • Update Ansible configurations afterward
  • Run Ansible to ensure consistency across all servers

Configuration Drift Detection⚓︎

Create a simple script to monitor for significant changes:

#!/bin/bash
# /home/dave/scripts/drift-check.sh
# Compares the installed package count against the last recorded value.
# Note: state kept in /tmp is cleared on reboot; use a persistent path
# if false alerts after a restart are a concern.
CURRENT_PACKAGES=$(dpkg --get-selections | wc -l)
LAST_COUNT=$(cat /tmp/package_count 2>/dev/null || echo 0)

if [ "$CURRENT_PACKAGES" != "$LAST_COUNT" ]; then
    echo "Package count changed: $LAST_COUNT -> $CURRENT_PACKAGES"
    # Optional: send a notification here (e.g. ntfy.sh)
fi

echo "$CURRENT_PACKAGES" > /tmp/package_count
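To run the check on a schedule, a crontab entry along the following lines would work. The daily 07:00 time is an arbitrary choice; pairing the output with /var/log/cron-jobs/ is an assumption that matches the log locations described later:

```
# Illustrative crontab entry (not part of the managed configuration):
# run the drift check daily at 07:00, logging to the cron-jobs directory.
0 7 * * * /home/dave/scripts/drift-check.sh >> /var/log/cron-jobs/drift-check.log 2>&1
```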

Documentation Strategy⚓︎

Keep a simple log of manual changes:

# /volume2/ansible-infrastructure/MANUAL_CHANGES.md

## 2025-07-29
- Temporarily installed `strace` for debugging - removed after troubleshooting
- Added `htop` to essential_packages in local_servers.yml

## 2025-07-15  
- Manual Docker cleanup during disk space issue - added to docker cleanup script

Disaster Recovery (DR)⚓︎

Comprehensive Backup Strategy⚓︎

Local Infrastructure: The system includes robust backup scripts that capture:

backup_files=("/home" "/var/spool/cron" "/etc/apt" "/var/lib/docker/volumes")
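A backup script driven by that array can be sketched as follows. This is a hedged illustration, not the actual script — the destination, archive naming, and `backup_all` function are assumptions:

```shell
#!/bin/bash
# Hypothetical sketch of the backup loop: archive each path in backup_files
# into a dated tarball. BACKUP_DEST and the naming scheme are assumptions.
BACKUP_DEST="${BACKUP_DEST:-/volume2/backups}"
backup_files=("/home" "/var/spool/cron" "/etc/apt" "/var/lib/docker/volumes")

backup_all() {
    mkdir -p "$BACKUP_DEST"
    local stamp path name
    stamp=$(date +%Y%m%d)
    for path in "${backup_files[@]}"; do
        [ -e "$path" ] || continue          # skip paths absent on this host
        name=${path#/}; name=${name//\//_}  # /var/spool/cron -> var_spool_cron
        tar -czf "$BACKUP_DEST/${name}-${stamp}.tar.gz" -C / "${path#/}"
    done
}
```

Note that the article's real scripts stop containers before archiving /var/lib/docker/volumes for consistency (see Backup Security); this sketch omits that step.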

VPS Infrastructure: Automated backups include:

  • Unifi controller data
  • Tailscale Golinks configuration
  • User dotfiles and configurations
  • Docker volumes and application data

Coverage:

  • /home: All user data, including docker-compose.yml files
  • /var/spool/cron: All scheduled tasks
  • /etc/apt: Package manager configuration
  • /var/lib/docker/volumes: ALL Docker volumes (complete application data)

Recovery Process⚓︎

  1. Infrastructure Rebuild (Ansible): 5–10 minutes
     • System hardening and packages
     • System updates (for fresh deployments)
     • Docker environment setup
     • User and group management
     • Automation scripts and scheduling
     • Logging infrastructure

  2. Data Restoration (Backup): 10–20 minutes
     • Complete Docker volume restoration
     • User data and configurations
     • Application state preservation

  3. Service Startup (Docker Compose): 2–5 minutes
     • All containers automatically recreated
     • Networks and volumes reconnected
     • Applications fully operational

SSH Key Deployment for Recovery⚓︎

Note: ssh-copy-id is not available on Synology NAS. Use this alternative:

cat /volume2/ansible-infrastructure/ssh-keys/ansible_key.pub | \
  ssh dave@NEW_SERVER_IP \
  "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys"

Security Considerations⚓︎

SSH Hardening⚓︎

  • Key-based authentication only
  • Root login disabled
  • Password authentication disabled
  • Custom SSH port (configurable)

VPS-Specific Security⚓︎

  • UFW Firewall: Configured with essential ports only
  • Fail2Ban: SSH brute force protection
  • Network Security: Tailscale mesh networking
  • Secure Mounting: CIFS credentials properly secured

Privilege Escalation⚓︎

  • No passwordless sudo
  • Explicit password prompts for privilege escalation
  • Minimal privilege principle applied

Backup Security⚓︎

  • Automated backup with notification systems
  • Container shutdown during backup for consistency
  • Comprehensive logging and monitoring
  • Encrypted transport for VPS backups

Maintenance & Monitoring⚓︎

Log Locations⚓︎

Local Infrastructure:

  • Custom Scripts: /var/log/custom-scripts/
  • Cron Jobs: /var/log/cron-jobs/

VPS Infrastructure:

  • Monitoring Scripts: Integrated with ntfy.sh notifications
  • Healthcheck Integration: Real-time service verification
  • Backup Logs: Automated logging with retention

Automated Maintenance⚓︎

Local Infrastructure:

  • Disk space monitoring: Every 6 hours with alerts
  • System backups: Weekly with 30-day retention
  • Docker cleanup: Weekly resource optimization
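The 30-day retention above is typically enforced with find. A hedged sketch follows — BACKUP_DIR, the archive glob, and the `prune_backups` function are assumptions, not the actual cleanup code:

```shell
#!/bin/bash
# Hypothetical retention sketch: delete backup archives older than
# RETENTION_DAYS. The 30-day default mirrors the stated policy;
# BACKUP_DIR is an assumed destination.
BACKUP_DIR="${BACKUP_DIR:-/volume2/backups}"
RETENTION_DAYS="${RETENTION_DAYS:-30}"

prune_backups() {
    # -mtime +N matches files last modified more than N days ago.
    find "$BACKUP_DIR" -name '*.tar.gz' -type f -mtime "+$RETENTION_DAYS" -print -delete
}
```

Running this from the same cron job that creates the weekly backup keeps disk usage bounded without a separate cleanup schedule.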

VPS Infrastructure:

  • Internet connectivity: Every 5 minutes
  • Exit node monitoring: Every 5 minutes with healthcheck
  • Docker maintenance: Daily updates and cleanup
  • Automated backups: Unifi (weekly), Golinks (daily), dotfiles (weekly)

Troubleshooting⚓︎

Common Issues & Solutions⚓︎

Variable Loading Issues:

  • Previous Issue: 'docker_daemon_config' is undefined
  • Solution: Role defaults ensure variables are always available

Permission Errors During Deployment:

  • Symptom: usermod: Permission denied or /etc/passwd lock errors
  • Solution: The consolidated roles handle this automatically with retry logic

APT Lock Issues:

  • Symptom: Could not get lock /var/lib/apt/lists/lock
  • Solution: The roles wait for package managers and handle this automatically

Template Errors:

  • Previous Issue: Jinja2 template undefined variable errors
  • Solution: Template safety checks with proper variable existence validation

SSH Connection Issues:

  • Symptom: No such file or directory: b'ssh'
  • Solution: Ensure apk add --no-cache openssh-client is included in Docker commands

VPS-Specific Issues:

  • NAS Mount Failures: Check CIFS credentials and network connectivity
  • Notification Failures: Verify ntfy.sh tokens and connectivity
  • Healthcheck Failures: Confirm healthcheck.io endpoints are accessible

Validation Commands⚓︎

Test SSH connectivity:

ssh -i /volume2/ansible-infrastructure/ssh-keys/ansible_key dave@100.102.180.92
ssh -i /volume2/ansible-infrastructure/ssh-keys/ansible_key unifiadmin@VPS_IP

Ping All Hosts (run from inside the cytopia/ansible container, where ansible is available):

ansible -i inventory/xenlab-local.yml all -m ping
ansible -i inventory/portable-vps.yml all -m ping

Test Inventory (run from inside the cytopia/ansible container):

ansible-inventory -i inventory/xenlab-local.yml --list
ansible-inventory -i inventory/portable-vps.yml --list

Customization⚓︎

Modifying Variables⚓︎

Update group variables for environment-specific configuration:

Local Infrastructure (group_vars/local_servers.yml):

disk_space_threshold: 90  # Override default 85%
backup_retention_days: 60  # Override default 30 days

VPS Infrastructure (group_vars/vps_servers.yml):

disk_space_threshold: 90
memory_threshold: 85

Add New Hosts⚓︎

Local Servers (inventory/xenlab-local.yml):

all:
  children:
    local_servers:
      hosts:
        xenlab:
          ansible_host: 100.102.180.92
        xenlab-test:
          ansible_host: 10.1.1.22

VPS Servers (inventory/portable-vps.yml):

all:
  children:
    vps_servers:
      hosts:
        portable-vps:
          ansible_host: YOUR_VPS_IP
        backup-vps:
          ansible_host: BACKUP_VPS_IP

Summary & Benefits⚓︎

Operational Excellence⚓︎

  • Layered Architecture: Role defaults + group_vars ensure reliable deployments
  • Consolidated Roles: Single roles per deployment type for simplicity
  • Symmetric Structure: Consistent patterns across local and VPS deployments
  • Repeatable Deployments: Version-controlled, documented infrastructure
  • Fast Rebuilds: complete rebuild and data restore in under an hour
  • Robust Error Handling: Handles fresh Ubuntu systems automatically

Disaster Recovery⚓︎

  • Complete Automation: Infrastructure + data restoration
  • Multi-Environment Support: Local servers and cloud VPS
  • Comprehensive Backups: All Docker volumes and application data
  • Quick Recovery: Minimal downtime with automated processes

Security & Monitoring⚓︎

  • Security Hardening: SSH, firewall, and intrusion prevention
  • Real-time Monitoring: Connectivity, services, and health checks
  • Automated Notifications: ntfy.sh and healthcheck.io integration
  • Centralized Logging: Comprehensive audit trails

Scalability & Portability⚓︎

  • Multi-Cloud Support: Deploy VPS to any Ubuntu 24.04 provider
  • Consistent Management: Single Ansible configuration for all environments
  • Group-Based Variables: Flexible configuration management
  • Consolidated Roles: Simplified maintenance and updates

Variable Management Excellence⚓︎

  • Self-Contained Roles: Always work regardless of environment
  • Clean Organization: Group variables for environment-specific settings
  • Safety Checks: Template and task conditionals prevent failures
  • Professional Standards: Enterprise-grade variable precedence handling

Final Thoughts⚓︎

This Ansible-based IaC framework delivers a scalable, secure, and maintainable infrastructure management solution, with comprehensive variable handling and a symmetric structure across local and cloud deployments. The consolidated role architecture with extensive defaults keeps deployments reliable without sacrificing simplicity. With robust error handling, security hardening, comprehensive monitoring, automated backups, and multi-cloud portability, it provides production-ready infrastructure management from a Synology NAS control plane.