CI/CD: Continuous Integration & Continuous Deployment - Comprehensive Technical Reference
Enterprise-grade CI/CD architecture, tooling, and best practices for modern software delivery.
1. CI/CD Fundamentals & Evolution
What is CI/CD?
Continuous Integration (CI):
- Automatically build and test code changes on every commit
- Detect integration errors early (minutes, not days)
- Maintain code quality and consistency
- Reduce manual testing overhead
Continuous Deployment (CD):
- Automatically deploy validated changes to production
- Enable rapid, frequent releases (multiple per day)
- Reduce deployment risk through automation
- Maintain consistent infrastructure
CI/CD Pipeline:
Commit → Build → Test → Security Scan → Deploy → Monitor
├─ Commit: Git push / merge
├─ Build: compile and package
├─ Test: unit and integration tests
├─ Security Scan: SAST/DAST
├─ Deploy: staging, then production
└─ Monitor: metrics and alerts
Evolution Timeline
2000s: Manual deployments
- FTP uploads
- Manual testing
- 1-2 releases per year
- High risk, high effort
2010s: Continuous Integration
- Jenkins, Travis CI emerge
- Automated testing
- Version control integration
- 1-2 releases per month
2015+: Full CI/CD & DevOps
- Containerization (Docker)
- Kubernetes orchestration
- GitOps principles
- 10-100 releases per day
- Infrastructure as Code
2. CI/CD Platform Comparison
Enterprise Platforms
| Feature | Jenkins | GitLab CI | GitHub Actions | Azure Pipelines | GCP Cloud Build |
|---|---|---|---|---|---|
| Ease of Setup | Moderate | Easy | Very Easy | Easy | Easy |
| Hosting | Self-hosted | Cloud/Self | Cloud | Cloud | Cloud |
| Cost | Free | $12-99/user/mo | Free tier | Free tier | Pay-per-build |
| Kubernetes | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Container Registry | Plugin | Native | Native | Native | Native |
| Pipeline as Code | Groovy | YAML | YAML | YAML | YAML |
| Scalability | Excellent | Excellent | Good | Excellent | Excellent |
| Enterprise Security | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Community | Very Large | Large | Very Large | Large | Medium |
Platform Selection Matrix
Choose Jenkins if:
- On-premises required
- Full customization needed
- Complex legacy systems
- High compliance requirements
Choose GitLab CI if:
- All-in-one DevOps platform needed
- Self-hosted or cloud option desired
- GitOps workflow preferred
- Container-native architecture
Choose GitHub Actions if:
- GitHub already primary repository
- Minimal setup desired
- Open source projects
- Cost-conscious, free tier sufficient
Choose Azure Pipelines if:
- Microsoft stack (Azure, Teams, Office 365)
- Enterprise Windows/C# development
- MSDN subscription available
- Integrated with Azure infrastructure
Choose GCP Cloud Build if:
- Google Cloud Platform primary
- Kubernetes (GKE) primary platform
- Multi-cloud build needed
- Container registry (GCR/Artifact Registry)
3. CI/CD Pipeline Architecture
Stage Breakdown
Stage 1: Commit/Trigger
Git Commit → Webhook → Pipeline Triggered
└─ Branch strategy (main, develop, feature)
└─ PR approval gates
└─ Code review integration
Stage 2: Build
Checkout Code → Compile → Package → Artifact Storage
└─ Language: Java, Python, Go, Node.js, C#, etc.
└─ Dependency resolution
└─ Version tagging
└─ Artifact: JAR, Docker image, ZIP, etc.
Stage 3: Test
Unit Tests → Integration Tests → E2E Tests
├─ Minimum 80% code coverage
├─ Performance benchmarks
└─ Test data management
Stage 4: Security Scan
SAST → DAST → Dependency Scan → Container Scan
├─ Static analysis (SonarQube, Checkmarx)
├─ Dynamic analysis (OWASP ZAP)
├─ CVE vulnerability detection
└─ Software Bill of Materials (SBOM)
Stage 5: Artifact Registry
Docker Image → Registry Push
├─ Tagging: v1.2.3, latest, stable
├─ Registry: Docker Hub, ECR, GCR, ACR
└─ Immutable image reference
Stage 6: Deploy Staging
Deploy to QA → Smoke Tests → Approval Gate
├─ Infrastructure provisioning
├─ Configuration management
└─ Manual sign-off (optional)
Stage 7: Deploy Production
Production Deployment → Health Checks → Monitoring
├─ Blue-green / Canary / Rolling
├─ Automated rollback
└─ Production monitoring
Stage 8: Monitor
Logs → Metrics → Alerts → Incidents
├─ Application metrics (response time, error rate)
├─ Infrastructure metrics (CPU, memory, disk)
├─ Business metrics (user signups, conversion)
└─ Alert escalation
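A minimal sketch of these stages expressed as pipeline-as-code, using GitHub Actions syntax for illustration; the image name, registry, and deploy script are placeholders, and registry authentication is omitted for brevity (security scanning is covered separately in section 7):
# Illustrative workflow: build → test → image publish → staging deploy
name: ci-cd
on:
  push:
    branches: [main]
  pull_request:
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4              # Stage 1: triggered by commit / PR
      - uses: actions/setup-go@v5
        with:
          go-version: '1.21'
      - run: go build ./...                    # Stage 2: Build
      - run: go test ./... -cover              # Stage 3: Test
  publish-image:
    needs: build-and-test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push SHA-tagged image  # Stage 5: immutable artifact
        run: |
          docker build -t registry.example.com/myapp:${{ github.sha }} .
          docker push registry.example.com/myapp:${{ github.sha }}
  deploy-staging:
    needs: publish-image
    runs-on: ubuntu-latest
    environment: staging                       # Stage 6: approval gates attach to environments
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh staging       # placeholder deploy step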
4. Containerization with Docker
Docker Build Optimization
# Multi-stage build (optimized)
FROM golang:1.21 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o app
# Final stage (small image)
FROM alpine:3.18
RUN apk add --no-cache ca-certificates
COPY --from=builder /app/app .
EXPOSE 8080
CMD ["./app"]
# Final image: ~15MB (vs ~1.2GB for a single-stage build shipping the full Go SDK)
Docker Registry Strategies
Image Naming:
registry.example.com/namespace/service:tag
├─ registry: Docker Hub, ECR, GCR, ACR
├─ namespace: company, team, project
├─ service: app name
└─ tag: v1.2.3, latest, stable, 2024-01-15
Tagging Strategy:
- Semantic: v1.2.3 (major.minor.patch)
- Timestamp: 2024-01-15-14-30-45
- SHA: git-abc1234567890def
- Branch: main, develop, feature-x
- Quality: latest, stable, canary
Registry Security:
- Image scanning on push (CVE detection)
- Retention policies (delete old images)
- Access control (IAM, service accounts)
- Signed images (Docker Content Trust)
5. Deployment Strategies
Blue-Green Deployment
Blue (Current, v1.0) → Green (New, v2.0) → Switch traffic to Green
Benefits:
✓ Zero-downtime deployments
✓ Easy rollback (switch back to Blue)
✓ Test in production-like environment
✓ User acceptance testing on Green
Challenges:
✗ Requires 2x infrastructure
✗ Database migration complexity
✗ Cache invalidation
When to use:
- Large systems with high-availability requirements
- Frequent deployments (multiple per day)
- Infrequent database schema changes
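On Kubernetes, one common way to implement the traffic switch is a Service whose selector points at either the blue or the green Deployment; a minimal sketch, assuming two Deployments labeled with a "version" field (names and labels are illustrative):
# Blue-green switch sketch: flipping the Service selector moves 100% of traffic at once.
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: green   # change "blue" → "green" to cut over; revert to roll back
  ports:
    - port: 80
      targetPort: 8080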
Canary Deployment
Old v1.0 (95%) + New v2.0 (5%) → Monitor metrics → Gradual shift: 5% → 25% → 50% → 100%
Benefits:
✓ Gradual rollout reduces risk
✓ Real user feedback early
✓ Automatic rollback if errors detected
✓ Minimal infrastructure overhead
Challenges:
✗ Complex monitoring setup
✗ Session affinity needed
✗ Stateful services difficult
When to use:
- Risk-averse deployments
- Gradual feature rollouts
- A/B testing scenarios
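Where a service mesh is available, the gradual traffic shift can be declared as weighted routing; a sketch using an Istio VirtualService (host, subset names, and weights are illustrative, and the matching DestinationRule is not shown):
# Canary sketch: 95% of traffic to the stable subset, 5% to the canary.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp
  http:
    - route:
        - destination:
            host: myapp
            subset: stable
          weight: 95
        - destination:
            host: myapp
            subset: canary
          weight: 5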
Rolling Deployment
Instance 1 → Drain → Update → Bring Up
Instance 2 → Drain → Update → Bring Up
Instance 3 → Drain → Update → Bring Up
Benefits:
✓ No downtime
✓ Gradual resource recovery
✓ Load balancer handles traffic
Challenges:
✗ Multiple versions running
✗ Data migration complex
✗ Debugging harder
When to use:
- Kubernetes deployments
- Stateless services
- Frequent updates
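Kubernetes implements this strategy natively through the Deployment rolling-update settings; a sketch with illustrative replica count, surge values, and image tag:
# Rolling update sketch: replace pods one at a time while keeping full capacity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra pod during the update
      maxUnavailable: 0    # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:v1.2.3
          readinessProbe:        # traffic only shifts to pods that pass health checks
            httpGet:
              path: /healthz
              port: 8080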
6. GitOps & Infrastructure as Code
GitOps Principles
1. Git is Source of Truth
- All configuration in Git repo
- Git history = audit trail
- Rollback = git revert
2. Declarative Description
- YAML defines desired state
- System reconciles to match
- No imperative scripts
3. Continuous Synchronization
- Pull model (not push)
- Watch for drift
- Auto-correct or alert
4. Git Workflows
- PR approval before deploy
- Code review + CI/CD gates
- Traceability for compliance
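A pull-based reconciler such as Argo CD expresses these principles directly: the cluster watches Git and converges to the declared state. A sketch of an Application manifest (repo URL, path, and namespace are placeholders):
# GitOps sketch: Argo CD reconciles the cluster to match the Git repository.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/deploy-configs.git
    targetRevision: main
    path: apps/myapp/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true      # remove resources that were deleted from Git
      selfHeal: true   # auto-correct drift detected in the cluster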
Infrastructure as Code Tools
| Tool | Language | Best For | Learning Curve |
|---|---|---|---|
| Terraform | HCL | Multi-cloud (AWS, Azure, GCP) | Moderate |
| Ansible | YAML | Configuration management | Easy |
| CloudFormation | JSON/YAML | AWS-specific | Easy (AWS only) |
| ARM Templates | JSON | Azure-specific | Moderate |
| Helm | YAML | Kubernetes packages | Moderate |
| Kustomize | YAML | Kubernetes overlays | Easy |
7. Security in CI/CD
SAST (Static Application Security Testing)
Before code runs, scan for vulnerabilities:
Tools:
- SonarQube (universal)
- Checkmarx (comprehensive)
- GitHub CodeQL (GitHub-native)
- Snyk (dependencies)
- Fortify (enterprise)
Checks:
✓ SQL injection patterns
✓ XSS vulnerabilities
✓ Buffer overflow risks
✓ Hardcoded secrets
✓ Insecure crypto
✓ Code quality metrics
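As one way to wire SAST into the pipeline, GitHub's CodeQL can run as a dedicated workflow on every push and pull request; a sketch, with the language list as an illustrative assumption:
# SAST sketch: CodeQL static analysis wired into CI.
name: codeql
on: [push, pull_request]
jobs:
  analyze:
    runs-on: ubuntu-latest
    permissions:
      security-events: write   # required to upload analysis results
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
        with:
          languages: go        # adjust to the repository's languages
      - uses: github/codeql-action/analyze@v3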
DAST (Dynamic Application Security Testing)
Test running application:
Tools:
- OWASP ZAP (free)
- Burp Suite (commercial)
- Rapid7 InsightAppSec
- Qualys ASPM
Checks:
✓ Authentication bypass
✓ Injection attacks
✓ Broken access control
✓ API security
✓ Session management
✓ Encryption validation
Secrets Management
WRONG (Do NOT):
- Hardcoded in code
- Committed to Git
- Stored in config files
- Visible in logs
RIGHT:
- Vault (HashiCorp)
- AWS Secrets Manager
- Azure Key Vault
- Google Secret Manager
- Environment variables (at runtime)
CI/CD Integration:
1. Pipeline needs secret
2. Request from secrets manager
3. Secret injected at runtime
4. Secret NOT logged/stored
5. Automatic rotation
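In pipeline terms, the pattern might look like the following sketch, using GitHub Actions' encrypted secrets store; the secret name and deploy script are placeholders:
# Secrets sketch: the value is injected as an environment variable at runtime,
# never committed to the repository, and masked in runner logs.
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy with injected credentials
        env:
          DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}   # pulled from the secrets store, not from code
        run: ./scripts/deploy.sh                      # placeholder; reads DEPLOY_TOKEN from the environment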
Supply Chain Security
Container Image Security:
├─ Build provenance tracking
├─ Signed images (Cosign)
├─ SBOM (Software Bill of Materials)
├─ Vulnerability scanning
├─ Registry access control
└─ Policy enforcement
Dependency Security:
├─ Lock files (go.sum, package-lock.json)
├─ Version pinning
├─ Automated updates (Dependabot)
├─ License scanning
└─ CVE monitoring
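Automated dependency updates can be declared in-repo; a sketch of a Dependabot configuration, with ecosystems and schedule chosen for illustration:
# .github/dependabot.yml sketch: weekly update PRs for Go modules and Docker base images.
version: 2
updates:
  - package-ecosystem: gomod
    directory: "/"
    schedule:
      interval: weekly
  - package-ecosystem: docker
    directory: "/"
    schedule:
      interval: weekly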
8. Observability & Monitoring
Four Golden Signals
1. Latency
- Request response time
- P50, P95, P99 percentiles
- Example target (SLO): < 200 ms at the 99th percentile
2. Traffic
- Requests per second
- Concurrent users
- Data throughput
3. Errors
- Error rate (4xx, 5xx)
- Exception types
- Error budget tracking
4. Saturation
- CPU utilization
- Memory usage
- Disk I/O
- Network bandwidth
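The error and latency signals typically drive alerting; a sketch in Prometheus rule syntax, where the metric name and threshold are illustrative assumptions:
# Alerting sketch: page when the 5xx error rate exceeds 5% of traffic for 10 minutes.
groups:
  - name: golden-signals
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "5xx error rate above 5% for 10 minutes"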
Monitoring Stack
Application Code
↓
Metrics Collection (Prometheus/Datadog/New Relic)
↓
Metrics Store
↓
Visualization (Grafana/Kibana)
↓
Alerting (PagerDuty/Opsgenie)
↓
Incident Response
Key Dashboards
Deployment Dashboard:
- Deployment frequency
- Lead time for changes
- Mean time to recovery (MTTR)
- Change failure rate
Application Dashboard:
- Requests per second
- Error rate
- P99 latency
- Top slowest endpoints
Infrastructure Dashboard:
- CPU utilization (all instances)
- Memory usage
- Disk I/O
- Network throughput
9. Cost Optimization in CI/CD
Build Optimization
Caching Strategy:
├─ Dependency cache (faster builds)
├─ Container layer cache (faster images)
├─ Build artifact cache
└─ Expected savings: 70-80% reduction in build time (see the cache sketch below)
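A dependency cache is usually a few lines of pipeline config; a fragment sketched with GitHub Actions' cache action, keyed on the lock file (paths and key are illustrative):
# Caching sketch: restore Go module downloads between runs, keyed on go.sum.
steps:
  - uses: actions/checkout@v4
  - uses: actions/cache@v4
    with:
      path: ~/go/pkg/mod
      key: gomod-${{ hashFiles('**/go.sum') }}
      restore-keys: gomod-
  - run: go build ./...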
Parallel Execution:
├─ Run independent jobs in parallel
├─ Fan-out/fan-in patterns
└─ Expected speedup: 4-8x with 8 parallel jobs
Build Resource Sizing:
├─ Use smaller instances for lightweight builds
├─ Use spot instances for non-production
├─ Scale down when idle
└─ Expected savings: 30-50% on build infrastructure
Docker Image Optimization:
├─ Multi-stage builds (reduce image size)
├─ Alpine base images (5MB vs 200MB)
├─ Remove build tools from final image
└─ Expected savings: 90% smaller images
Pipeline Optimization
Remove Redundant Steps:
├─ Skip tests for docs-only changes
├─ Skip deploy for failed builds
└─ Fail fast (stop on the first failure)
Workflow Optimization:
├─ Keep the pre-merge path fast (run heavier suites on demand)
├─ Run expensive tests only on the main branch
└─ Run E2E tests only before production
Expected Results:
- 50% faster feedback to developers
- 70% reduction in wasted compute
- $500K-2M annual savings (enterprise scale)
10. Enterprise CI/CD Patterns
Multi-Environment Strategy
Dev → Staging → Production
├─ Dev: Personal development, minimal checks
├─ Staging: Full testing, security scanning
└─ Production: Manual approval, zero downtime
Configuration Management:
├─ Secrets: Dev, Staging, Prod (separate)
├─ Feature flags: Enable/disable in runtime
├─ Infrastructure: IaC with environment overlays
└─ Monitoring: Different alert thresholds per env
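Environment overlays keep per-environment differences small and reviewable; a Kustomize sketch, where the directory layout, patch file, and image tag are illustrative:
# overlays/production/kustomization.yaml sketch: reuse the base manifests,
# then override only what differs in production.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - path: replica-count.yaml        # e.g. raise replicas for production load
images:
  - name: registry.example.com/myapp
    newTag: v1.2.3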
Microservices CI/CD
Service A (Build → Test → Deploy) → Registry
Service B (Build → Test → Deploy) → Registry
Service C (Build → Test → Deploy) → Registry
↓
Orchestrator (Kubernetes)
↓
Multi-service deployment
Challenges:
✗ Service dependencies
✗ Database migrations
✗ Distributed tracing
✗ API versioning
Solutions:
✓ Contract testing (Consumer-Driven)
✓ Feature flags for compatibility
✓ Backward compatibility requirements
✓ Service mesh (Istio) for traffic management
High-Frequency Release Cycles
Traditional (Quarterly):
Jan → Apr → Jul → Oct (4 releases/year)
Agile (Sprint-based):
Every 2 weeks (26 releases/year)
Continuous Deployment:
Multiple times per day (hundreds of releases/year or more)
Requirements:
✓ Automated testing (80%+ coverage)
✓ Feature flags (control rollout)
✓ Monitoring (detect issues instantly)
✓ Rollback automation (revert in seconds)
✓ Small, focused changes (easier to debug)
11. Anti-Patterns to Avoid
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Manual Deployments | Slow, error-prone | Automation first |
| Untested Code in Prod | Frequent outages | Mandatory automated tests |
| Shared Deployment Credentials | Security risk | Service accounts + IAM |
| Monolithic Pipeline | Bottleneck, slow feedback | Parallel execution, modular |
| No Rollback Plan | Long MTTR | Automated rollback, blue-green |
| Secrets in Code | Data breach risk | Secrets manager integration |
| No Monitoring | Blind deployments | Mandatory observability |
| All-or-Nothing Deployments | High risk | Gradual rollouts (canary) |
12. CI/CD Metrics (DORA Metrics)
Key Performance Indicators
1. Deployment Frequency
Low: < 1/month (bottleneck)
Medium: 1-6/month (acceptable)
High: Daily (competitive)
Elite: Multiple daily (leading)
2. Lead Time for Changes
Low: > 6 months (major delays)
Medium: 1-6 months (acceptable)
High: < 1 month (good)
Elite: < 1 day (industry leading)
3. Mean Time to Recovery (MTTR)
Low: > 6 months (crisis mode)
Medium: 1-6 months (poor)
High: < 1 month (good)
Elite: < 1 hour (excellent)
4. Change Failure Rate
Low: > 50% (unreliable)
Medium: 15-50% (acceptable)
High: < 15% (good)
Elite: < 5% (excellent)
Illustrative Benchmarks
Fortune 500 Company:
- Deployment frequency: 1/month
- Lead time: 2 months
- MTTR: 3 days
- Failure rate: 20%
Fast-growing SaaS Startup:
- Deployment frequency: Daily
- Lead time: 1 week
- MTTR: 4 hours
- Failure rate: 8%
Tech Leader (Google, Amazon, Netflix):
- Deployment frequency: On demand (thousands of deploys per day across services)
- Lead time: Minutes
- MTTR: 15 minutes
- Failure rate: < 3%
13. Cloud Provider CI/CD Services
AWS CodePipeline
Source (CodeCommit/GitHub) →
Build (CodeBuild) →
Test (CodeBuild) →
Approval (Manual) →
Deploy (CodeDeploy/ECS/EKS)
Strengths:
✓ Tight AWS integration
✓ Cheap (pay per execution)
✓ Scales automatically
Weaknesses:
✗ Minimal UI
✗ Steeper learning curve
✗ Limited free tier
Azure Pipelines
YAML pipelines (code-as-config) →
Hosted agents or self-hosted →
Deploy to Azure/on-premises/multi-cloud →
Integrated with Azure DevOps
Strengths:
✓ Microsoft ecosystem integration
✓ Free tier (public projects; limited parallel jobs for private)
✓ MSDN integration
Weaknesses:
✗ Primarily Azure-focused
✗ Steeper learning curve for non-Microsoft
GCP Cloud Build
Trigger from Cloud Source Repos/GitHub →
Build in container (fast startup) →
Push to Artifact Registry/GCR →
Deploy to Cloud Run/GKE/App Engine →
Integrated with GCP services
Strengths:
✓ Container-native (fast)
✓ Seamless GCP integration
✓ Pay-per-minute pricing
Weaknesses:
✗ GCP-centric
✗ Less feature-rich than Jenkins/GitLab
14. Troubleshooting Common CI/CD Issues
Build Failures
Issue: Intermittent test failures ("flaky tests")
Causes: Race conditions, timing issues, external dependencies
Solutions:
- Isolate tests (no shared state)
- Mock external services
- Increase timeout thresholds
- Retry flaky tests
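While flaky tests are being isolated and fixed, a bounded retry at the job level is a common stop-gap; a GitLab CI sketch, with the job name and test command as illustrative assumptions:
# GitLab CI sketch: retry a known-flaky job up to twice, only on script failures.
integration-tests:
  stage: test
  script:
    - go test ./... -run Integration
  retry:
    max: 2
    when: script_failure
  timeout: 20 minutes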
Issue: Out of memory during builds
Causes: Large test suites, memory leaks, limited heap
Solutions:
- Increase runner memory
- Run tests in parallel (smaller batches)
- Profile memory usage
- Split tests across jobs
Deployment Issues
Issue: Deployment hangs
Causes: Waiting for resources, health checks timing out
Solutions:
- Check resource availability
- Increase timeout thresholds
- Review load balancer configuration
- Check application startup logs
Issue: Production downtime after deployment
Causes: Faulty update, insufficient testing, traffic surge
Solutions:
- Use blue-green deployment
- Automated rollback on failed health checks
- Canary deployment (risk reduction)
- Load testing before deployment
Performance Issues
Issue: Build takes too long (> 30 minutes)
Causes: Sequential execution, no caching, slow tests
Solutions:
- Enable parallel execution
- Implement caching
- Skip unnecessary steps
- Use larger or faster build runners (more CPU/memory)
Issue: Slow feedback loop
Causes: Serial pipeline stages, waiting for resources
Solutions:
- Parallel execution
- Smaller build jobs
- Fast feedback (fail fast)
- Skip heavy tests on every commit
15. CI/CD Maturity Model
Level 1: Manual (Baseline)
- Manual code merges
- Manual builds
- Manual testing
- Manual deployments
- No automation
- Deployment: 1-2x per quarter
- Incident response: Days
Level 2: Build Automation
- Automated builds on commit
- Unit testing automated
- Artifact versioning
- Deployments still manual
- Deployment frequency: 1-2x per month
- Incident response: Hours
Level 3: Test & Deploy Automation
- Full test suite automated (unit, integration, E2E)
- Automated security scanning
- Automated deployments to staging
- Manual production approval
- Deployment: Weekly
- Incident response: Minutes-hours
Level 4: Full CI/CD
- Everything automated
- Continuous deployment to production
- Feature flags for gradual rollout
- Automated rollback
- Deployment: Daily
- Incident response: Minutes
Level 5: AIOps / Continuous Verification
- ML-powered deployment decisions
- Automated incident resolution
- Self-healing infrastructure
- Predictive alerts
- Deployment: Multiple daily
- Incident response: Automatic
- Manual: Rare exceptions only
Document Version: 1.0
Last Updated: January 31, 2026
Audience: Infrastructure Engineers, DevOps Teams, Engineering Leaders