Xenlab Rebuild with Ansible⚓︎

🚨 Emergency Server Recovery Guide⚓︎

This guide assumes the Xenlab server has failed and needs to be rebuilt completely.

Prerequisites Checklist⚓︎

Before starting, ensure you have:

  • New Ubuntu 24.04 server deployed
  • Root or sudo access to new server
  • Synology NAS accessible
  • Network connectivity between NAS and new server
  • Access to latest server backup

Phase 1: Ubuntu Server Setup⚓︎

Get New Server IP Address⚓︎

# On the new server, get IP address
ip addr show
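
If you only need the primary IPv4 address, this shorter form also works on Ubuntu (hostname -I prints every assigned address, the first usually being the primary):

hostname -I | awk '{print $1}'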

Write down the IP: _________________

Create User Account⚓︎

# On new server as root
adduser dave
usermod -aG sudo dave
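
To confirm the account exists and is in the sudo group before moving on:

id dave   # the groups list should include 'sudo'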

Deploy SSH Key⚓︎

# ssh-copy-id is not available on Synology NAS, use this one-liner instead:
cat /volume2/ansible-infrastructure/ssh-keys/ansible_key.pub | ssh dave@NEW_SERVER_IP "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys"

Test SSH access:

ssh -i /volume2/ansible-infrastructure/ssh-keys/ansible_key dave@NEW_SERVER_IP
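
To confirm that key-only authentication works non-interactively, add BatchMode (it disables password prompts, so a failure here means the key was not installed correctly):

ssh -i /volume2/ansible-infrastructure/ssh-keys/ansible_key -o BatchMode=yes dave@NEW_SERVER_IP 'echo key auth OK'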

Phase 2: Update Ansible Configuration⚓︎

Update Server IP in Inventory⚓︎

# Edit this file on your NAS
nano /volume2/ansible-infrastructure/inventory/xenlab-local.yml

Change the IP address:

all:
  children:
    local_servers:
      hosts:
        xenlab:
          ansible_host: NEW_SERVER_IP_HERE  # ← Update this line
          ansible_user: dave
          ansible_ssh_private_key_file: ssh-keys/ansible_key
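
Before running anything against the server, you can sanity-check the edited inventory with ansible-inventory, using the same container pattern as Phase 3:

docker run --rm -it \
  -v /volume2/ansible-infrastructure:/ansible \
  -w /ansible \
  cytopia/ansible:latest \
  sh -c "ansible-inventory -i inventory/xenlab-local.yml --list"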

Phase 3: Deploy Infrastructure with Ansible⚓︎

Test Connectivity⚓︎

docker run --rm -it \
  -v /volume2/ansible-infrastructure:/ansible \
  -w /ansible \
  cytopia/ansible:latest \
  sh -c "apk add --no-cache openssh-client && ansible -i inventory/xenlab-local.yml all -m ping"

Expected output: xenlab | SUCCESS => {"ping": "pong"}

Optional: Run Discovery⚓︎

docker run --rm -it \
  -v /volume2/ansible-infrastructure:/ansible \
  -w /ansible \
  cytopia/ansible:latest \
  sh -c "apk add --no-cache openssh-client && ansible-playbook -i inventory/xenlab-local.yml playbooks/discover-server.yml --ask-become-pass -v"

Test Deployment First⚓︎

# Test deployment in check mode first
docker run --rm -it \
  -v /volume2/ansible-infrastructure:/ansible \
  -w /ansible \
  cytopia/ansible:latest \
  sh -c "apk add --no-cache openssh-client && ansible-playbook -i inventory/xenlab-local.yml playbooks/local-infrastructure.yml --check --diff --ask-become-pass -v"

Deploy Complete Infrastructure⚓︎

# Deploy infrastructure
docker run --rm -it \
  -v /volume2/ansible-infrastructure:/ansible \
  -w /ansible \
  cytopia/ansible:latest \
  sh -c "apk add --no-cache openssh-client && ansible-playbook -i inventory/xenlab-local.yml playbooks/local-infrastructure.yml --ask-become-pass -v"

When prompted for the BECOME password, enter dave's sudo password.

Note: The deployment uses a consolidated xenlab-local role whose comprehensive defaults prevent undefined-variable errors. Both local and VPS deployments have been tested and are working.

What Ansible Builds Automatically⚓︎

✅ Base System Configuration⚓︎

  • SSH hardening (key-only auth, disable root login)
  • Essential packages (curl, wget, git, htop, vim, tree, etc.)
  • System package updates and security hardening
  • User shell configuration
  • Service management
  • Robust deployment handling (apt locks, systemd timeouts, user/group management)

✅ Docker Environment⚓︎

  • Docker Compose v2.39.0+ installation (latest version automatically detected)
  • Docker daemon configuration (logging, storage driver; see the sketch after this list)
  • Docker networks and volumes
  • Docker Compose directory: /home/dave/.docker/compose/
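
The daemon configuration mentioned above typically lands in /etc/docker/daemon.json. A minimal sketch of what that file might contain (illustrative only; the real values come from the role defaults):

{
  "log-driver": "json-file",
  "log-opts": { "max-size": "10m", "max-file": "3" },
  "storage-driver": "overlay2"
}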

✅ Automation Scripts⚓︎

  • /home/dave/scripts/disk-space-check.sh
  • /home/dave/scripts/backup/system-backup.sh
  • /home/dave/scripts/maintenance/docker-cleanup.sh
  • Log directories: /var/log/custom-scripts/
  • Logrotate configuration

✅ Scheduled Tasks⚓︎

  • Disk space monitoring (every 6 hours)
  • System backups (Sunday 2:30 AM)
  • Docker cleanup (Monday 3:00 AM)
  • Log directory: /var/log/cron-jobs/
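
In crontab terms, these schedules correspond to entries along these lines (an illustrative sketch; the exact commands and log file names come from the role):

0 */6 * * * /home/dave/scripts/disk-space-check.sh >> /var/log/cron-jobs/disk-space.log 2>&1
30 2 * * 0  /home/dave/scripts/backup/system-backup.sh >> /var/log/cron-jobs/system-backup.log 2>&1
0 3 * * 1   /home/dave/scripts/maintenance/docker-cleanup.sh >> /var/log/cron-jobs/docker-cleanup.log 2>&1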

Phase 4: Restore Data⚓︎

Extract Backup⚓︎

# SSH to the new server
ssh -i /volume2/ansible-infrastructure/ssh-keys/ansible_key dave@NEW_SERVER_IP

# Navigate to root for full restore
cd /

# Extract latest backup (adjust path and filename as needed)
sudo tar xzf /path/to/backup/xenlab-YYYY-MM-DD.tgz
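
# To preview the archive before writing anything into /, list its contents first
sudo tar tzf /path/to/backup/xenlab-YYYY-MM-DD.tgz | head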

This restores:

  • All Docker volumes (/var/lib/docker/volumes/)
  • Docker compose configuration (/home/dave/.docker/compose/)
  • All user data (/home/)
  • Cron jobs (/var/spool/cron/)
  • Package configurations (/etc/apt/)

Phase 5: Start Services⚓︎

Start Docker Compose Services⚓︎

# Navigate to Docker Compose directory
cd /home/dave/.docker/compose

# Start all services
docker-compose up -d
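
# Confirm the stack came up cleanly
docker-compose ps
docker-compose logs --tail=50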

Verify Everything is Running⚓︎

# Check Docker containers
docker ps

# Check system services
systemctl status docker
systemctl status cron

# Check cron jobs
crontab -l

# Check custom scripts
ls -la /home/dave/scripts/

Complete Recovery Timeline⚓︎

  1. Ubuntu Install: 15-20 minutes
  2. User Setup: 2-3 minutes
  3. Ansible Deployment: 5-10 minutes
  4. Data Restore: 10-20 minutes (depending on backup size)
  5. Service Startup: 2-5 minutes

Common Issues & Solutions⚓︎

Variable Loading Issues⚓︎

Solution: Roles now have comprehensive defaults that ensure all variables are available, preventing undefined variable errors.

Permission Errors During Deployment⚓︎

Solution: The consolidated xenlab-local role handles this automatically with retry logic.

APT Lock Issues⚓︎

Solution: The consolidated role waits for package managers and handles this automatically.

Template Errors⚓︎

Solution: Templates now include safety checks for undefined variables.

Emergency Commands Reference⚓︎

Quick Server Info⚓︎

# System info
hostnamectl
df -h
free -h
docker --version
docker-compose --version

If Ansible Fails⚓︎

# Test SSH manually
ssh -i /volume2/ansible-infrastructure/ssh-keys/ansible_key dave@NEW_SERVER_IP

# Check SSH key permissions
ls -la /volume2/ansible-infrastructure/ssh-keys/

# Fix SSH key permissions if needed
chmod 600 /volume2/ansible-infrastructure/ssh-keys/ansible_key
chmod 644 /volume2/ansible-infrastructure/ssh-keys/ansible_key.pub
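
# If SSH works manually but Ansible still fails, re-run the ping with
# extra verbosity to surface connection details
docker run --rm -it \
  -v /volume2/ansible-infrastructure:/ansible \
  -w /ansible \
  cytopia/ansible:latest \
  sh -c "apk add --no-cache openssh-client && ansible -i inventory/xenlab-local.yml all -m ping -vvv"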

Dry Run Mode (Testing)⚓︎

# Test all changes without applying them
docker run --rm -it \
  -v /volume2/ansible-infrastructure:/ansible \
  -w /ansible \
  cytopia/ansible:latest \
  sh -c "apk add --no-cache openssh-client && ansible-playbook -i inventory/xenlab-local.yml playbooks/local-infrastructure.yml --check --diff --ask-become-pass -v"

Success Verification⚓︎

Server is fully restored when:

  • SSH access works with key authentication
  • Docker and Docker Compose are running
  • All containers from docker ps match original setup
  • All cron jobs are scheduled (crontab -l)
  • Custom scripts exist and are executable
  • Log directories exist with proper permissions
  • All applications are accessible

Backup Integration⚓︎

The existing backup script captures the following (see the sketch after this list):

  • /home - All user data including docker-compose.yml
  • /var/spool/cron - All cron jobs
  • /etc/apt - Package manager configuration
  • /var/lib/docker/volumes - ALL Docker volumes
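
For orientation, the core of that script reduces to a single tar invocation along these lines (an illustrative sketch, not the actual system-backup.sh):

sudo tar czf /mnt/Backup/Server-Backups/xenlab-$(date +%F).tgz \
  /home /var/spool/cron /etc/apt /var/lib/docker/volumes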

This backup + Ansible approach provides:

  • Infrastructure: Rebuilt by Ansible using consolidated xenlab-local role
  • Data: Restored from comprehensive backups
  • Complete Recovery: Both system and application state

Emergency Contacts & Resources⚓︎

Ansible Project Location: /volume2/ansible-infrastructure
SSH Keys Location: /volume2/ansible-infrastructure/ssh-keys
Docker Compose Location: /home/dave/.docker/compose
Backup Location: /mnt/Backup/Server-Backups

Recovery Architecture:

  • Ansible: Infrastructure foundation with defensive variable handling (comprehensive role defaults)
  • Backup Script: Complete data preservation
  • Docker Compose: Application orchestration

Post-Recovery Tasks⚓︎

  1. Update documentation with any changes made during recovery
  2. Test all services to ensure they're working properly
  3. Update monitoring with new server details if needed
  4. Verify backup script is working on restored server
  5. Schedule the next disaster recovery (DR) test for one year from now

Key Advantages of This Approach⚓︎

✅ Complete Automation⚓︎

  • Infrastructure rebuilt from consolidated role with comprehensive defaults
  • Data restored from comprehensive backups
  • Services automatically restarted

✅ Minimal Manual Steps⚓︎

  • Only Ubuntu install and user creation required
  • Everything else is automated

✅ Comprehensive Coverage⚓︎

  • System configuration, Docker environment, automation scripts
  • ALL Docker volumes preserved in backups
  • Scheduled tasks and logging infrastructure

✅ Fast Recovery⚓︎

  • 30-60 minute complete rebuild
  • Tested and documented process
  • Defensive variable handling (comprehensive role defaults) prevents undefined-variable deployment failures

Recovery completed successfully! 🎉