Engineering Zero-Downtime Deployments: The Hemiron Rollback System
A Performance Study of Traefik vs Nginx in Blue-Green Deployment Strategies
In modern web applications, minimizing downtime during deployments is crucial. As part of the Hemiron project at Hogeschool Leiden, I conducted a comprehensive performance comparison between Traefik and Nginx for implementing instant rollback features in blue-green deployment strategies.
This article compares Traefik and Nginx in the context of blue-green deployments, covering architecture details and key insights for implementing reliable rollback systems.
Understanding Blue-Green Deployments
In blue-green deployments, two identical environments (blue and green) exist simultaneously, with only one environment serving production traffic at any time. When a new version is deployed, it's first released to the inactive environment. Once verified, traffic is redirected to this environment, minimizing downtime and risk.
The key innovation was implementing a system where each deployment receives a unique subdomain (e.g., commit-sha.example.com), while the main domain (example.com) points to the current production environment. This setup allows for instant rollbacks by simply updating the DNS or proxy configuration to point to a previous, still-running version.
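To make the naming scheme concrete, here is a minimal sketch of deriving the per-deployment hostname from a commit SHA. The function name deployment_host, the BASE_DOMAIN variable, and the 8-character truncation are assumptions for illustration (GitLab CI exposes a similar short form as CI_COMMIT_SHORT_SHA); the project's real scheme may differ.

```shell
#!/bin/bash
# Hypothetical helper: deployment_host and BASE_DOMAIN are illustrative names.
BASE_DOMAIN="${BASE_DOMAIN:-example.com}"

deployment_host() {
  local sha="$1" short
  # Lowercase and keep the first 8 characters so the result is a short DNS label
  short=$(printf '%s' "$sha" | tr '[:upper:]' '[:lower:]' | cut -c1-8)
  printf '%s.%s\n' "$short" "$BASE_DOMAIN"
}

deployment_host "3f9c2ab7d1e04c55a8b6"   # prints 3f9c2ab7.example.com
```

With a wildcard DNS record for the base domain, every hostname produced this way resolves to the proxy, which then decides which container serves it.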
Comparison
To accurately compare Traefik and Nginx, I designed a controlled experiment using:
- VPS Environments: Tests were conducted on both low-spec (1 vCPU, 1 GB RAM) and high-spec (4 vCPU, 8 GB RAM) VPS instances
- Docker Containers: Each test environment used containerized applications and reverse proxies
- GitLab CI/CD Pipeline: Automated deployment and rollback processes to ensure consistency
- DNS Wildcard Configuration: Setup that allowed access to each deployment via unique subdomains
- Measurement Scripts: Custom scripts that measured rollback execution time with millisecond precision
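A measurement script of this kind can be reduced to a small timing wrapper. The sketch below is an assumption about the general shape, not the study's actual script: measure_ms is a hypothetical name, and it relies on GNU date for the %N (nanoseconds) format.

```shell
#!/bin/bash
# Hypothetical timing wrapper; measure_ms is an illustrative name.
# Requires GNU date, because %N (nanoseconds) is not available in BSD date.
measure_ms() {
  local start end status
  start=$(date +%s%N)
  # Run the given command; its output is discarded so only the timing is printed
  "$@" >/dev/null 2>&1
  status=$?
  end=$(date +%s%N)
  # Convert the nanosecond difference to whole milliseconds
  echo $(( (end - start) / 1000000 ))
  return "$status"
}

elapsed=$(measure_ms sleep 0.2)
echo "sleep took ${elapsed} ms"
```

Returning the wrapped command's exit status lets a pipeline distinguish a fast rollback from a fast failure.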
The test pipeline included these key steps:
- Prepare environment and export scripts to the server
- Build and deploy frontend and backend Docker containers
- Switch the root domain to point to the new deployment
- Initiate rollback and measure execution time
- Collect and analyze performance data
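For the final step, a few lines of awk are enough to summarize a series of rollback timings. This is only a sketch: summarize_ms is a hypothetical name, and the three input values are made-up numbers for demonstration, not measurements from the study.

```shell
#!/bin/bash
# Hypothetical helper: reads one duration in milliseconds per line from stdin
# and prints the sample count, minimum, maximum, and mean.
summarize_ms() {
  awk '
    { v = $1 + 0 }                      # force numeric interpretation
    NR == 1 { min = v; max = v }
    { sum += v; if (v < min) min = v; if (v > max) max = v }
    END { if (NR > 0) printf "n=%d min=%d max=%d mean=%.1f\n", NR, min, max, sum / NR }
  '
}

# Made-up sample values, for illustration only
printf '%s\n' 120 95 143 | summarize_ms   # prints n=3 min=95 max=143 mean=119.3
```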
Architecture Comparison
Traefik Implementation
Traefik is designed specifically for microservices architectures and offers container-aware routing through Docker labels. Its dynamic configuration makes it particularly easy to set up.
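# Example docker-compose.yml for Traefik
version: '3'
services:
  traefik:
    image: traefik:v2.5
    command:
      # Discover containers through the Docker provider
      - '--providers.docker=true'
      # Only route containers that opt in with traefik.enable=true
      - '--providers.docker.exposedByDefault=false'
      - '--entrypoints.web.address=:80'
    ports:
      - '80:80'
      # Published for HTTPS; this example defines no entrypoint for it yet
      - '443:443'
    volumes:
      # Traefik reads container labels through the Docker socket
      - /var/run/docker.sock:/var/run/docker.sock
  app:
    image: myapp:latest
    labels:
      - 'traefik.enable=true'
      - 'traefik.http.routers.app.rule=Host(`app.example.com`)'
      - 'traefik.http.services.app.loadbalancer.server.port=80'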
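To connect these labels with the per-deployment subdomains described earlier, the following sketch generates a label set for a given commit. The function traefik_labels and the "app-<sha>" router naming are hypothetical; only the label keys themselves mirror the compose example above.

```shell
#!/bin/bash
# Hypothetical helper: traefik_labels and the "app-<sha>" naming are
# illustrative assumptions, not the project's actual scheme.
traefik_labels() {
  local sha="$1" domain="$2" port="${3:-80}"
  local name="app-${sha}"
  printf 'traefik.enable=true\n'
  # Single quotes keep the backticks literal instead of running a command
  printf 'traefik.http.routers.%s.rule=Host(`%s.%s`)\n' "$name" "$sha" "$domain"
  printf 'traefik.http.services.%s.loadbalancer.server.port=%s\n' "$name" "$port"
}

traefik_labels 3f9c2ab7 example.com
```

Each printed line could be passed to a container as a label (for example through docker run --label), so that Traefik picks up the new route as soon as the container starts.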
Nginx Implementation
Nginx required a more manual configuration approach but offered greater performance and flexibility:
# Example Nginx configuration for blue-green deployment
upstream blue {
    server blue-app:80;
}
upstream green {
    server green-app:80;
}
server {
    listen 80;
    server_name example.com;
    # Production traffic goes to current active environment (green or blue)
    location / {
        proxy_pass http://green;
    }
}
# Individual version access
server {
    listen 80;
    server_name blue.example.com;
    location / {
        proxy_pass http://blue;
    }
}
server {
    listen 80;
    server_name green.example.com;
    location / {
        proxy_pass http://green;
    }
}
Key Implementation Differences
1. Configuration Approach
The most significant difference between Traefik and Nginx was in their configuration approaches:
Traefik:
- Uses Docker labels for dynamic configuration
- Automatically detects container changes
- No need for manual configuration file updates
- Built-in Let's Encrypt integration for SSL certificates
Nginx:
- Requires separate configuration files
- Configuration updates need explicit reloads
- More precise control over routing logic
- Manual SSL certificate management
2. Rollback Implementation
The rollback process differed significantly between the two solutions:
Traefik Rollback Process:
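#!/bin/bash
# Simplified rollback script for Traefik
set -euo pipefail
# Nanosecond timestamps; %N requires GNU date
START_TIME=$(date +%s%N)
# Update root domain to point to previous version:
# stop and remove only the current frontend ("down" in docker-compose v1
# takes no service argument and would tear down the whole project),
# then start the previous one
docker-compose -f docker-compose-traefik.yml rm -s -f -v app-frontend
docker-compose -f docker-compose-traefik.yml up -d app-frontend-previous
END_TIME=$(date +%s%N)
# Convert nanoseconds to milliseconds
ELAPSED_TIME=$(( (END_TIME - START_TIME) / 1000000 ))
echo "Traefik rollback completed in $ELAPSED_TIME ms"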
Nginx Rollback Process:
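#!/bin/bash
# Simplified rollback script for Nginx
set -euo pipefail
# Nanosecond timestamps; %N requires GNU date
START_TIME=$(date +%s%N)
# Update nginx configuration to route to previous version
# (assumes /etc/nginx/conf.d is bind-mounted into the nginx-proxy container)
cat > /etc/nginx/conf.d/app.conf << 'EOF'
upstream active {
    server app-previous:80;
}
EOF
# Validate the new configuration, then reload Nginx
docker exec nginx-proxy nginx -t
docker exec nginx-proxy nginx -s reload
END_TIME=$(date +%s%N)
# Convert nanoseconds to milliseconds
ELAPSED_TIME=$(( (END_TIME - START_TIME) / 1000000 ))
echo "Nginx rollback completed in $ELAPSED_TIME ms"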
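One refinement worth considering for the Nginx path is to write the new configuration to a temporary file and rename it into place, so that a reload can never read a half-written file. This is a swapped-in technique rather than part of the measured script; write_upstream is a hypothetical name, and the demo writes to a scratch directory instead of /etc/nginx/conf.d.

```shell
#!/bin/bash
# Hypothetical helper: write_upstream is an illustrative name, not from the project.
write_upstream() {
  local target="$1" conf="$2" tmp
  # Create the temp file next to the destination so mv is a same-filesystem rename
  tmp=$(mktemp "${conf}.XXXXXX") || return 1
  cat > "$tmp" <<EOF
upstream active {
    server ${target};
}
EOF
  # mktemp creates files with mode 0600; make the config world-readable
  chmod 644 "$tmp"
  mv -f "$tmp" "$conf"
}

# Example: write into a scratch directory rather than the real config path
demo_dir=$(mktemp -d)
write_upstream "app-previous:80" "$demo_dir/app.conf"
cat "$demo_dir/app.conf"
```

After the rename, the same nginx -s reload call as in the script above would pick up the new upstream.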
3. Performance Monitoring
Both implementations included health checks and performance monitoring:
- Each deployment version had a /health endpoint
- Automated health checks triggered rollbacks when needed
- All rollback events were logged with timing information
- Prometheus metrics were collected for long-term analysis
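A health-check-triggered rollback can be sketched as a small polling loop. Everything named here is an assumption for illustration: HEALTH_URL, check_health, wait_healthy, the retry count, and the delay are hypothetical, and rollback.sh stands in for one of the rollback scripts shown earlier.

```shell
#!/bin/bash
# Hypothetical watchdog sketch; names, retry count, and delay are assumptions.
HEALTH_URL="${HEALTH_URL:-http://localhost/health}"

check_health() {
  # -f: treat HTTP errors (>= 400) as failure; --max-time bounds each probe
  curl -fsS --max-time 2 "$HEALTH_URL" >/dev/null 2>&1
}

# Returns 0 as soon as one probe succeeds, 1 after all attempts have failed.
wait_healthy() {
  local attempts="$1" delay="$2" i
  for ((i = 1; i <= attempts; i++)); do
    if check_health; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Usage (rollback.sh is a placeholder for a rollback script):
#   wait_healthy 5 2 || ./rollback.sh
```

Retrying a few times before rolling back avoids reacting to a single slow response while a new container is still starting.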
Conclusion
The architectural approach outlined in this article—maintaining multiple deployment versions with unique subdomains and implementing instant rollbacks through proxy configuration updates—provides significant benefits regardless of which reverse proxy solution is chosen:
- Zero-downtime deployments across all releases
- Recovery times measured in milliseconds rather than minutes
- Improved deployment confidence leading to more frequent releases
- Enhanced user experience through elimination of service interruptions
If you're interested in learning more about this or have questions about implementing similar architectures, feel free to reach out through the contact details on my About page.