CI/CD: Continuous Integration & Continuous Deployment - Comprehensive Technical Reference
Enterprise-grade CI/CD architecture, tooling, and best practices for modern software delivery.
1. CI/CD Fundamentals & Evolution
What is CI/CD?
Continuous Integration (CI):
- Automatically build and test code changes on every commit
- Detect integration errors early (minutes, not days)
- Maintain code quality and consistency
- Reduce manual testing overhead
Continuous Deployment (CD):
- Automatically deploy validated changes to production
- Enable rapid, frequent releases (multiple per day)
- Reduce deployment risk through automation
- Maintain consistent infrastructure
CI/CD Pipeline:
Commit → Build → Test → Security Scan → Deploy → Monitor
├─ Commit: Git push / merge
├─ Build: compile and package
├─ Test: unit and integration tests
├─ Security Scan: SAST/DAST
├─ Deploy: staging, then production
└─ Monitor: metrics and alerts
Evolution Timeline
2000s: Manual deployments
- FTP uploads
- Manual testing
- 1-2 releases per year
- High risk, high effort
2010s: Continuous Integration
- Jenkins, Travis CI emerge
- Automated testing
- Version control integration
- 1-2 releases per month
2015+: Full CI/CD & DevOps
- Containerization (Docker)
- Kubernetes orchestration
- GitOps principles
- 10-100 releases per day
- Infrastructure as Code
2. CI/CD Platform Comparison
Enterprise Platforms
| Feature | Jenkins | GitLab CI | GitHub Actions | Azure Pipelines | GCP Cloud Build |
|---|---|---|---|---|---|
| Ease of Setup | Moderate | Easy | Very Easy | Easy | Easy |
| Hosting | Self-hosted | Cloud/Self | Cloud | Cloud | Cloud |
| Cost | Free | $12-99/user/mo | Free tier | Free tier | Pay-per-build |
| Kubernetes | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Container Registry | Plugin | Native | Native | Native | Native |
| Pipeline as Code | Groovy | YAML | YAML | YAML | YAML |
| Scalability | Excellent | Excellent | Good | Excellent | Excellent |
| Enterprise Security | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Community | Very Large | Large | Very Large | Large | Medium |
Platform Selection Matrix
Choose Jenkins if:
- On-premises required
- Full customization needed
- Complex legacy systems
- High compliance requirements
Choose GitLab CI if:
- All-in-one DevOps platform needed
- Self-hosted or cloud option desired
- GitOps workflow preferred
- Container-native architecture
Choose GitHub Actions if:
- GitHub already primary repository
- Minimal setup desired
- Open source projects
- Cost-conscious, free tier sufficient
Choose Azure Pipelines if:
- Microsoft stack (Azure, Teams, Office 365)
- Enterprise Windows/C# development
- MSDN subscription available
- Integrated with Azure infrastructure
Choose GCP Cloud Build if:
- Google Cloud Platform primary
- Kubernetes (GKE) primary platform
- Multi-cloud build needed
- Container registry (GCR/Artifact Registry)
3. CI/CD Pipeline Architecture
Stage Breakdown
Stage 1: Commit/Trigger
Git Commit → Webhook → Pipeline Triggered
└─ Branch strategy (main, develop, feature)
└─ PR approval gates
└─ Code review integration
Stage 2: Build
Checkout Code → Compile → Package → Artifact Storage
└─ Language: Java, Python, Go, Node.js, C#, etc.
└─ Dependency resolution
└─ Version tagging
└─ Artifact: JAR, Docker image, ZIP, etc.
Stage 3: Test
Unit Tests → Integration Tests → E2E Tests
├─ Minimum 80% code coverage
├─ Performance benchmarks
└─ Test data management
Stage 4: Security Scan
SAST → DAST → Dependency Scan → Container Scan
├─ Static analysis (SonarQube, Checkmarx)
├─ Dynamic analysis (OWASP ZAP)
├─ CVE vulnerability detection
└─ Software Bill of Materials (SBOM)
Stage 5: Artifact Registry
Docker Image → Registry Push
├─ Tagging: v1.2.3, latest, stable
├─ Registry: Docker Hub, ECR, GCR, ACR
└─ Immutable image reference
Stage 6: Deploy Staging
Deploy to QA → Smoke Tests → Approval Gate
├─ Infrastructure provisioning
├─ Configuration management
└─ Manual sign-off (optional)
Stage 7: Deploy Production
Production Deployment → Health Checks → Monitoring
├─ Blue-green / Canary / Rolling
├─ Automated rollback
└─ Production monitoring
Stage 8: Monitor
Logs → Metrics → Alerts → Incidents
├─ Application metrics (response time, error rate)
├─ Infrastructure metrics (CPU, memory, disk)
├─ Business metrics (user signups, conversion)
└─ Alert escalation
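A minimal sketch of these stages expressed as pipeline-as-code, using GitHub Actions syntax for illustration; the image name, registry, and deploy script are placeholders, and registry authentication is omitted for brevity (security scanning is covered separately in section 7):
# Illustrative workflow: build → test → image publish → staging deploy
name: ci-cd
on:
  push:
    branches: [main]
  pull_request:
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4              # Stage 1: triggered by commit / PR
      - uses: actions/setup-go@v5
        with:
          go-version: '1.21'
      - run: go build ./...                    # Stage 2: Build
      - run: go test ./... -cover              # Stage 3: Test
  publish-image:
    needs: build-and-test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push SHA-tagged image  # Stage 5: immutable artifact
        run: |
          docker build -t registry.example.com/myapp:${{ github.sha }} .
          docker push registry.example.com/myapp:${{ github.sha }}
  deploy-staging:
    needs: publish-image
    runs-on: ubuntu-latest
    environment: staging                       # Stage 6: approval gates attach to environments
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh staging       # placeholder deploy step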
4. Containerization with Docker
Docker Build Optimization
# Multi-stage build (optimized)
FROM golang:1.21 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o app
# Final stage (small image)
FROM alpine:3.18
RUN apk add --no-cache ca-certificates
COPY --from=builder /app/app .
EXPOSE 8080
CMD ["./app"]
# Final image: ~15MB (vs ~1.2GB for a single-stage build shipping the full Go SDK)
Docker Registry Strategies
Image Naming:
registry.example.com/namespace/service:tag
├─ registry: Docker Hub, ECR, GCR, ACR
├─ namespace: company, team, project
├─ service: app name
└─ tag: v1.2.3, latest, stable, 2024-01-15
Tagging Strategy:
- Semantic: v1.2.3 (major.minor.patch)
- Timestamp: 2024-01-15-14-30-45
- SHA: git-abc1234567890def
- Branch: main, develop, feature-x
- Quality: latest, stable, canary
Registry Security:
- Image scanning on push (CVE detection)
- Retention policies (delete old images)
- Access control (IAM, service accounts)
- Signed images (Docker Content Trust)
5. Deployment Strategies
Blue-Green Deployment
Blue (Current, v1.0) → Green (New, v2.0) → Switch traffic to Green
Benefits:
✓ Zero-downtime deployments
✓ Easy rollback (switch back to Blue)
✓ Test in production-like environment
✓ User acceptance testing on Green
Challenges:
✗ Requires 2x infrastructure
✗ Database migration complexity
✗ Cache invalidation
When to use:
- Large systems with high-availability requirements
- Frequent deployments (multiple per day)
- Infrequent database schema changes
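On Kubernetes, one common way to implement the traffic switch is a Service whose selector points at either the blue or the green Deployment; a minimal sketch, assuming two Deployments labeled with a "version" field (names and labels are illustrative):
# Blue-green switch sketch: flipping the Service selector moves 100% of traffic at once.
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: green   # change "blue" → "green" to cut over; revert to roll back
  ports:
    - port: 80
      targetPort: 8080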
Canary Deployment
Old v1.0 (95%) + New v2.0 (5%) → Monitor metrics → Gradual shift: 5% → 25% → 50% → 100%
Benefits:
✓ Gradual rollout reduces risk
✓ Real user feedback early
✓ Automatic rollback if errors detected
✓ Minimal infrastructure overhead
Challenges:
✗ Complex monitoring setup
✗ Session affinity needed
✗ Stateful services difficult
When to use:
- Risk-averse deployments
- Gradual feature rollouts
- A/B testing scenarios
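Where a service mesh is available, the gradual traffic shift can be declared as weighted routing; a sketch using an Istio VirtualService (host, subset names, and weights are illustrative, and the matching DestinationRule is not shown):
# Canary sketch: 95% of traffic to the stable subset, 5% to the canary.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp
  http:
    - route:
        - destination:
            host: myapp
            subset: stable
          weight: 95
        - destination:
            host: myapp
            subset: canary
          weight: 5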
Rolling Deployment
Instance 1 → Drain → Update → Bring Up
Instance 2 → Drain → Update → Bring Up
Instance 3 → Drain → Update → Bring Up
Benefits:
✓ No downtime
✓ Gradual resource recovery
✓ Load balancer handles traffic
Challenges:
✗ Multiple versions running
✗ Data migration complex
✗ Debugging harder
When to use:
- Kubernetes deployments
- Stateless services
- Frequent updates
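Kubernetes implements this strategy natively through the Deployment rolling-update settings; a sketch with illustrative replica count, surge values, and image tag:
# Rolling update sketch: replace pods one at a time while keeping full capacity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra pod during the update
      maxUnavailable: 0    # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:v1.2.3
          readinessProbe:        # traffic only shifts to pods that pass health checks
            httpGet:
              path: /healthz
              port: 8080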
6. GitOps & Infrastructure as Code
GitOps Principles
1. Git is Source of Truth
- All configuration in Git repo
- Git history = audit trail
- Rollback = git revert
2. Declarative Description
- YAML defines desired state
- System reconciles to match
- No imperative scripts
3. Continuous Synchronization
- Pull model (not push)
- Watch for drift
- Auto-correct or alert
4. Git Workflows
- PR approval before deploy
- Code review + CI/CD gates
- Traceability for compliance
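A pull-based reconciler such as Argo CD expresses these principles directly: the cluster watches Git and converges to the declared state. A sketch of an Application manifest (repo URL, path, and namespace are placeholders):
# GitOps sketch: Argo CD reconciles the cluster to match the Git repository.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/deploy-configs.git
    targetRevision: main
    path: apps/myapp/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true      # remove resources that were deleted from Git
      selfHeal: true   # auto-correct drift detected in the cluster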
Infrastructure as Code Tools
| Tool | Language | Best For | Learning Curve |
|---|---|---|---|
| Terraform | HCL | Multi-cloud (AWS, Azure, GCP) | Moderate |
| Ansible | YAML | Configuration management | Easy |
| CloudFormation | JSON/YAML | AWS-specific | Easy (AWS only) |
| ARM Templates | JSON | Azure-specific | Moderate |
| Helm | YAML | Kubernetes packages | Moderate |
| Kustomize | YAML | Kubernetes overlays | Easy |
7. Security in CI/CD
SAST (Static Application Security Testing)
Before code runs, scan for vulnerabilities:
Tools:
- SonarQube (universal)
- Checkmarx (comprehensive)
- GitHub CodeQL (GitHub-native)
- Snyk (dependencies)
- Fortify (enterprise)
Checks:
✓ SQL injection patterns
✓ XSS vulnerabilities
✓ Buffer overflow risks
✓ Hardcoded secrets
✓ Insecure crypto
✓ Code quality metrics
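As one way to wire SAST into the pipeline, GitHub's CodeQL can run as a dedicated workflow on every push and pull request; a sketch, with the language list as an illustrative assumption:
# SAST sketch: CodeQL static analysis wired into CI.
name: codeql
on: [push, pull_request]
jobs:
  analyze:
    runs-on: ubuntu-latest
    permissions:
      security-events: write   # required to upload analysis results
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
        with:
          languages: go        # adjust to the repository's languages
      - uses: github/codeql-action/analyze@v3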
DAST (Dynamic Application Security Testing)
Test running application:
Tools:
- OWASP ZAP (free)
- Burp Suite (commercial)
- Rapid7 InsightAppSec
- Qualys ASPM
Checks:
✓ Authentication bypass
✓ Injection attacks
✓ Broken access control
✓ API security
✓ Session management
✓ Encryption validation
Secrets Management
WRONG (Do NOT):
- Hardcoded in code
- Committed to Git
- Stored in config files
- Visible in logs
RIGHT:
- Vault (HashiCorp)
- AWS Secrets Manager
- Azure Key Vault
- Google Secret Manager
- Environment variables (at runtime)
CI/CD Integration:
1. Pipeline needs secret
2. Request from secrets manager
3. Secret injected at runtime
4. Secret NOT logged/stored
5. Automatic rotation
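In pipeline terms, the pattern might look like the following sketch, using GitHub Actions' encrypted secrets store; the secret name and deploy script are placeholders:
# Secrets sketch: the value is injected as an environment variable at runtime,
# never committed to the repository, and masked in runner logs.
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy with injected credentials
        env:
          DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}   # pulled from the secrets store, not from code
        run: ./scripts/deploy.sh                      # placeholder; reads DEPLOY_TOKEN from the environment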
Supply Chain Security
Container Image Security:
├─ Build provenance tracking
├─ Signed images (Cosign)
├─ SBOM (Software Bill of Materials)
├─ Vulnerability scanning
├─ Registry access control
└─ Policy enforcement
Dependency Security:
├─ Lock files (go.sum, package-lock.json)
├─ Version pinning
├─ Automated updates (Dependabot)
├─ License scanning
└─ CVE monitoring
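Automated dependency updates can be declared in-repo; a sketch of a Dependabot configuration, with ecosystems and schedule chosen for illustration:
# .github/dependabot.yml sketch: weekly update PRs for Go modules and Docker base images.
version: 2
updates:
  - package-ecosystem: gomod
    directory: "/"
    schedule:
      interval: weekly
  - package-ecosystem: docker
    directory: "/"
    schedule:
      interval: weekly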
8. Observability & Monitoring
Four Golden Signals
1. Latency
- Request response time
- P50, P95, P99 percentiles
- Example target (SLO): < 200 ms at the 99th percentile
2. Traffic
- Requests per second
- Concurrent users
- Data throughput
3. Errors
- Error rate (4xx, 5xx)
- Exception types
- Error budget tracking
4. Saturation
- CPU utilization
- Memory usage
- Disk I/O
- Network bandwidth
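The error and latency signals typically drive alerting; a sketch in Prometheus rule syntax, where the metric name and threshold are illustrative assumptions:
# Alerting sketch: page when the 5xx error rate exceeds 5% of traffic for 10 minutes.
groups:
  - name: golden-signals
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "5xx error rate above 5% for 10 minutes"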
Monitoring Stack
Application Code
↓
Metrics Collection (Prometheus/Datadog/New Relic)
↓
Metrics Store
↓
Visualization (Grafana/Kibana)
↓
Alerting (PagerDuty/Opsgenie)
↓
Incident Response
Key Dashboards
Deployment Dashboard:
- Deployment frequency
- Lead time for changes
- Mean time to recovery (MTTR)
- Change failure rate
Application Dashboard:
- Requests per second
- Error rate
- P99 latency
- Top slowest endpoints
Infrastructure Dashboard:
- CPU utilization (all instances)
- Memory usage
- Disk I/O
- Network throughput
9. Cost Optimization in CI/CD
Build Optimization
Caching Strategy:
├─ Dependency cache (faster builds)
├─ Container layer cache (faster images)
├─ Build artifact cache
└─ Expected savings: 70-80% reduction in build time (see the cache sketch below)
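A dependency cache is usually a few lines of pipeline config; a fragment sketched with GitHub Actions' cache action, keyed on the lock file (paths and key are illustrative):
# Caching sketch: restore Go module downloads between runs, keyed on go.sum.
steps:
  - uses: actions/checkout@v4
  - uses: actions/cache@v4
    with:
      path: ~/go/pkg/mod
      key: gomod-${{ hashFiles('**/go.sum') }}
      restore-keys: gomod-
  - run: go build ./...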
Parallel Execution:
├─ Run independent jobs in parallel
├─ Fan-out/fan-in patterns
└─ Expected speedup: 4-8x with 8 parallel jobs
Build Resource Sizing:
├─ Use smaller instances for lightweight builds
├─ Use spot instances for non-production
├─ Scale down when idle
└─ Expected savings: 30-50% on build infrastructure
Docker Image Optimization:
├─ Multi-stage builds (reduce image size)
├─ Alpine base images (5MB vs 200MB)
├─ Remove build tools from final image
└─ Expected savings: 90% smaller images
Pipeline Optimization
Remove Redundant Steps:
├─ Skip tests for docs-only changes
├─ Skip deploy for failed builds
└─ Fail fast (stop on the first failure)
Workflow Optimization:
├─ Keep the pre-merge path fast (run heavier suites on demand)
├─ Run expensive tests only on the main branch
└─ Run E2E tests only before production
Expected Results:
- 50% faster feedback to developers
- 70% reduction in wasted compute
- $500K-2M annual savings (enterprise scale)
10. Enterprise CI/CD Patterns
Multi-Environment Strategy
Dev → Staging → Production
├─ Dev: Personal development, minimal checks
├─ Staging: Full testing, security scanning
└─ Production: Manual approval, zero downtime
Configuration Management:
├─ Secrets: Dev, Staging, Prod (separate)
├─ Feature flags: Enable/disable in runtime
├─ Infrastructure: IaC with environment overlays
└─ Monitoring: Different alert thresholds per env
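Environment overlays keep per-environment differences small and reviewable; a Kustomize sketch, where the directory layout, patch file, and image tag are illustrative:
# overlays/production/kustomization.yaml sketch: reuse the base manifests,
# then override only what differs in production.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - path: replica-count.yaml        # e.g. raise replicas for production load
images:
  - name: registry.example.com/myapp
    newTag: v1.2.3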
Microservices CI/CD
Service A (Build → Test → Deploy) → Registry
Service B (Build → Test → Deploy) → Registry
Service C (Build → Test → Deploy) → Registry
↓
Orchestrator (Kubernetes)
↓
Multi-service deployment
Challenges:
✗ Service dependencies
✗ Database migrations
✗ Distributed tracing
✗ API versioning
Solutions:
✓ Contract testing (Consumer-Driven)
✓ Feature flags for compatibility
✓ Backward compatibility requirements
✓ Service mesh (Istio) for traffic management
High-Frequency Release Cycles
Traditional (Quarterly):
Jan → Apr → Jul → Oct (4 releases/year)
Agile (Sprint-based):
Every 2 weeks (26 releases/year)
Continuous Deployment:
Multiple times per day (hundreds of releases/year or more)
Requirements:
✓ Automated testing (80%+ coverage)
✓ Feature flags (control rollout)
✓ Monitoring (detect issues instantly)
✓ Rollback automation (revert in seconds)
✓ Small, focused changes (easier to debug)
11. Anti-Patterns to Avoid
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Manual Deployments | Slow, error-prone | Automation first |
| Untested Code in Prod | Frequent outages | Mandatory automated tests |
| Shared Deployment Credentials | Security risk | Service accounts + IAM |
| Monolithic Pipeline | Bottleneck, slow feedback | Parallel execution, modular |
| No Rollback Plan | Long MTTR | Automated rollback, blue-green |
| Secrets in Code | Data breach risk | Secrets manager integration |
| No Monitoring | Blind deployments | Mandatory observability |
| All-or-Nothing Deployments | High risk | Gradual rollouts (canary) |
12. CI/CD Metrics (DORA Metrics)
Key Performance Indicators
1. Deployment Frequency
Low: < 1/month (bottleneck)
Medium: 1-6/month (acceptable)
High: Daily (competitive)
Elite: Multiple daily (leading)
2. Lead Time for Changes
Low: > 6 months (major delays)
Medium: 1-6 months (acceptable)
High: < 1 month (good)
Elite: < 1 day (industry leading)
3. Mean Time to Recovery (MTTR)
Low: > 6 months (crisis mode)
Medium: 1-6 months (poor)
High: < 1 month (good)
Elite: < 1 hour (excellent)
4. Change Failure Rate
Low: > 50% (unreliable)
Medium: 15-50% (acceptable)
High: < 15% (good)
Elite: < 5% (excellent)
Illustrative Benchmarks
Fortune 500 Company:
- Deployment frequency: 1/month
- Lead time: 2 months
- MTTR: 3 days
- Failure rate: 20%
Fast-growing SaaS Startup:
- Deployment frequency: Daily
- Lead time: 1 week
- MTTR: 4 hours
- Failure rate: 8%
Tech Leader (Google, Amazon, Netflix):
- Deployment frequency: On demand (thousands of deploys per day across services)
- Lead time: Minutes
- MTTR: 15 minutes
- Failure rate: < 3%
13. Cloud Provider CI/CD Services
AWS CodePipeline
Source (CodeCommit/GitHub) →
Build (CodeBuild) →
Test (CodeBuild) →
Approval (Manual) →
Deploy (CodeDeploy/ECS/EKS)
Strengths:
✓ Tight AWS integration
✓ Cheap (pay per execution)
✓ Scales automatically
Weaknesses:
✗ Minimal UI
✗ Steeper learning curve
✗ Limited free tier
Azure Pipelines
YAML pipelines (code-as-config) →
Hosted agents or self-hosted →
Deploy to Azure/on-premises/multi-cloud →
Integrated with Azure DevOps
Strengths:
✓ Microsoft ecosystem integration
✓ Free tier (public projects; limited parallel jobs for private)
✓ MSDN integration
Weaknesses:
✗ Primarily Azure-focused
✗ Steeper learning curve for non-Microsoft
GCP Cloud Build
Trigger from Cloud Source Repos/GitHub →
Build in container (fast startup) →
Push to Artifact Registry/GCR →
Deploy to Cloud Run/GKE/App Engine →
Integrated with GCP services
Strengths:
✓ Container-native (fast)
✓ Seamless GCP integration
✓ Pay-per-minute pricing
Weaknesses:
✗ GCP-centric
✗ Less feature-rich than Jenkins/GitLab
14. Troubleshooting Common CI/CD Issues
Build Failures
Issue: Intermittent test failures ("flaky tests")
Causes: Race conditions, timing issues, external dependencies
Solutions:
- Isolate tests (no shared state)
- Mock external services
- Increase timeout thresholds
- Retry flaky tests
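While flaky tests are being isolated and fixed, a bounded retry at the job level is a common stop-gap; a GitLab CI sketch, with the job name and test command as illustrative assumptions:
# GitLab CI sketch: retry a known-flaky job up to twice, only on script failures.
integration-tests:
  stage: test
  script:
    - go test ./... -run Integration
  retry:
    max: 2
    when: script_failure
  timeout: 20 minutes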
Issue: Out of memory during builds
Causes: Large test suites, memory leaks, limited heap
Solutions:
- Increase runner memory
- Run tests in parallel (smaller batches)
- Profile memory usage
- Split tests across jobs
Deployment Issues
Issue: Deployment hangs
Causes: Waiting for resources, health checks timing out
Solutions:
- Check resource availability
- Increase timeout thresholds
- Review load balancer configuration
- Check application startup logs
Issue: Production downtime after deployment
Causes: Faulty update, insufficient testing, traffic surge
Solutions:
- Use blue-green deployment
- Automated rollback on failed health checks
- Canary deployment (risk reduction)
- Load testing before deployment
Performance Issues
Issue: Build takes too long (> 30 minutes)
Causes: Sequential execution, no caching, slow tests
Solutions:
- Enable parallel execution
- Implement caching
- Skip unnecessary steps
- Use larger or faster build runners (more CPU/memory)
Issue: Slow feedback loop
Causes: Serial pipeline stages, waiting for resources
Solutions:
- Parallel execution
- Smaller build jobs
- Fast feedback (fail fast)
- Skip heavy tests on every commit
15. CI/CD Maturity Model
Level 1: Manual (Baseline)
- Manual code merges
- Manual builds
- Manual testing
- Manual deployments
- No automation
- Deployment: 1-2x per quarter
- Incident response: Days
Level 2: Build Automation
- Automated builds on commit
- Unit testing automated
- Artifact versioning
- Deployments still manual
- Deployment frequency: 1-2x per month
- Incident response: Hours
Level 3: Test & Deploy Automation
- Full test suite automated (unit, integration, E2E)
- Automated security scanning
- Automated deployments to staging
- Manual production approval
- Deployment: Weekly
- Incident response: Minutes-hours
Level 4: Full CI/CD
- Everything automated
- Continuous deployment to production
- Feature flags for gradual rollout
- Automated rollback
- Deployment: Daily
- Incident response: Minutes
Level 5: AIOps / Continuous Verification
- ML-powered deployment decisions
- Automated incident resolution
- Self-healing infrastructure
- Predictive alerts
- Deployment: Multiple daily
- Incident response: Automatic
- Manual: Rare exceptions only
Document Version: 1.0
Last Updated: January 31, 2026
Audience: Infrastructure Engineers, DevOps Teams, Engineering Leaders