WORKSHOP
Last updated: 1/31/2026
CNPG Workshop: Hands-On PostgreSQL Clustering Lab
Overview
This workshop provides practical experience with CloudNativePG (CNPG) by building and operating a PostgreSQL cluster on Kubernetes.
- Duration: ~140 minutes (6 parts, 35 tasks)
- Prerequisites: Kubernetes cluster (minikube/kind acceptable), kubectl, helm, docker, and the cnpg kubectl plugin (installation covered at the end of Part 1)
- Outcome: Running HA PostgreSQL cluster with backups, failover, and monitoring
Part 1: Operator Installation & Setup (15 min)
Task 1.1: Verify Kubernetes Environment
# Check Kubernetes version (the --short flag was removed in recent kubectl releases)
kubectl version
# Expected output (versions will vary):
# Client Version: v1.28.0
# Server Version: v1.28.0
# Check available storage classes
kubectl get storageclass
# Expected output (at least one storage class; example from minikube):
# NAME                 PROVISIONER
# standard (default)   k8s.io/minikube-hostpath
Task 1.2: Create CNPG Namespace
# Create namespace for CNPG operator
kubectl create namespace cnpg-system
# Verify creation
kubectl get namespaces | grep cnpg
# Expected output:
# cnpg-system Active 5s
Task 1.3: Install CNPG Operator via Helm
# Add Helm repository
helm repo add cnpg https://cloudnative-pg.github.io/charts
helm repo update
# Install operator (--create-namespace is a harmless no-op here since the namespace already exists)
helm install cnpg cnpg/cloudnative-pg \
  --namespace cnpg-system \
  --create-namespace
# Verify operator deployment
kubectl get deployment -n cnpg-system
# Expected output:
# NAME READY UP-TO-DATE AVAILABLE AGE
# cnpg-cloudnative-pg 1/1 1 1 30s
Task 1.4: Verify Operator is Ready
# Check operator pod
kubectl get pods -n cnpg-system
# Expected output (STATUS should be Running; the pod name suffix will differ):
# NAME READY STATUS RESTARTS AGE
# cnpg-cloudnative-pg-7c8d4f5b89-abc123 1/1 Running 0 45s
# Check operator logs
kubectl logs -n cnpg-system deployment/cnpg-cloudnative-pg | head -20
# Expected: startup log entries (webhook setup, leader election, controllers starting)
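The Helm chart also installs the CNPG custom resource definitions. A quick way to confirm they are present (the exact list may vary by chart version):
# List CNPG CRDs
kubectl get crd | grep cnpg.io
# Expected to include, among others:
# backups.postgresql.cnpg.io
# clusters.postgresql.cnpg.io
# poolers.postgresql.cnpg.io
# scheduledbackups.postgresql.cnpg.io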
Task 1.5: Create Production Namespace
# Create namespace for PostgreSQL clusters
kubectl create namespace databases
# Verify
kubectl get namespaces | grep databases
# Expected:
# databases Active 3s
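The remaining parts rely on the cnpg kubectl plugin (kubectl cnpg status, kubectl cnpg psql, kubectl cnpg promote, kubectl cnpg backup). If it is not installed yet, one way to get it is via krew (a sketch assuming krew is already set up; see the CloudNativePG documentation for alternative install methods):
# Install the cnpg plugin via krew
kubectl krew install cnpg
# Verify the plugin responds
kubectl cnpg --help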
Part 2: Deploy PostgreSQL Cluster (30 min)
Task 2.1: Create Base Cluster Manifest
Create pg-cluster.yaml:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: demo-cluster
  namespace: databases
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16.2
  # HA Configuration
  primaryUpdateStrategy: unsupervised
  primaryUpdateMethod: switchover
  # Storage
  storage:
    size: 1Gi
    storageClass: standard
  walStorage:
    size: 1Gi
    storageClass: standard
  # Resources
  resources:
    requests:
      memory: "256Mi"
      cpu: "100m"
    limits:
      memory: "512Mi"
      cpu: "500m"
  # Monitoring (optional; creates a PodMonitor if the Prometheus Operator CRDs are installed)
  monitoring:
    enablePodMonitor: true
  # PostgreSQL Configuration
  postgresql:
    parameters:
      max_connections: "100"
      shared_buffers: "64MB"
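Before applying, the manifest can be checked against the CRD schema with a server-side dry run (requires the operator and CRDs installed in Part 1):
# Validate the manifest without creating anything
kubectl apply --dry-run=server -f pg-cluster.yaml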
Task 2.2: Apply Cluster Manifest
# Apply manifest
kubectl apply -f pg-cluster.yaml
# Verify cluster created
kubectl get cluster -n databases
# Expected output:
# NAME AGE INSTANCES READY STATUS
# demo-cluster 5s 3 0 Setting up cluster
Task 2.3: Monitor Cluster Creation
# Watch pods being created
kubectl get pods -n databases -w
# Expected progression:
# NAME READY STATUS RESTARTS AGE
# demo-cluster-1 0/1 Pending 0 5s
# demo-cluster-1 0/1 ContainerCreating 0 8s
# demo-cluster-1 1/1 Running 0 15s
# demo-cluster-2 0/1 Pending 0 20s
# demo-cluster-2 1/1 Running 0 30s
# demo-cluster-3 0/1 Pending 0 35s
# demo-cluster-3 1/1 Running 0 45s
# Press Ctrl+C to stop watching
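If the pods stall in Pending or ContainerCreating, recent namespace events usually explain why (scheduling, volume provisioning, image pulls):
# Show the most recent events in the namespace
kubectl get events -n databases --sort-by=.lastTimestamp | tail -15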
Task 2.4: Check Cluster Status
# Wait for all pods to be ready (1-2 minutes)
kubectl wait --for=condition=ready pod \
  -l cnpg.io/cluster=demo-cluster \
  -n databases \
  --timeout=300s
# Get detailed cluster status via the cnpg plugin
kubectl cnpg status demo-cluster -n databases
# Expected output:
# Cluster Summary
# Name: demo-cluster
# Namespace: databases
# System ID: xxxxxxxxxxxx
# PostgreSQL Version: 16.2
# Primary: demo-cluster-1
#
# Instances: 3
# Ready: 3
# Not Ready: 0
#
# Replicas:
# demo-cluster-2 (streaming|wal received)
# demo-cluster-3 (streaming|wal received)
Task 2.5: Verify Storage
# Check PVCs created
kubectl get pvc -n databases
# Expected output (6 PVCs: 3 for data, 3 for WAL):
# NAME STATUS VOLUME
# demo-cluster-1 Bound pvc-xxx
# demo-cluster-1-wal Bound pvc-xxx
# demo-cluster-2 Bound pvc-xxx
# demo-cluster-2-wal Bound pvc-xxx
# demo-cluster-3 Bound pvc-xxx
# demo-cluster-3-wal Bound pvc-xxx
Task 2.6: Verify Services
# Check services created
kubectl get svc -n databases
# Expected output (3 services):
# NAME              TYPE        CLUSTER-IP
# demo-cluster-rw   ClusterIP   10.x.x.x   (read-write, always points to the primary)
# demo-cluster-ro   ClusterIP   10.x.x.x   (read-only, replicas only)
# demo-cluster-r    ClusterIP   10.x.x.x   (any instance)
# Note: Prometheus metrics are exposed on port 9187 of each instance pod, not via a dedicated service
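To reach the primary from outside the cluster, you can port-forward the read-write service and use the application credentials CNPG generates at bootstrap (with the default bootstrap these live in a secret named demo-cluster-app with user and database app; names differ if you customize the bootstrap section):
# Forward the primary service to localhost
kubectl port-forward -n databases svc/demo-cluster-rw 5432:5432 &
# Retrieve the auto-generated application password
kubectl get secret demo-cluster-app -n databases \
  -o jsonpath='{.data.password}' | base64 -d; echo
# Connect with any PostgreSQL client, e.g.:
# psql -h localhost -U app -d app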
Part 3: Database Operations & Queries (30 min)
Task 3.1: Connect to Database
# Open PSQL shell
kubectl cnpg psql demo-cluster -n databases
# You should see the psql banner followed by the prompt:
# psql (16.2 ...)
# postgres=#
Task 3.2: Create Test Database and Schema
-- Create database
CREATE DATABASE testdb;
-- Connect to it
\c testdb
-- Create schema
CREATE SCHEMA test;
-- Create table
CREATE TABLE test.users (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(100) UNIQUE,
created_at TIMESTAMP DEFAULT NOW()
);
-- Create index
CREATE INDEX idx_users_email ON test.users(email);
-- Insert test data
INSERT INTO test.users (name, email) VALUES
('Alice', 'alice@example.com'),
('Bob', 'bob@example.com'),
('Charlie', 'charlie@example.com');
-- Verify data
SELECT * FROM test.users;
-- Expected output:
-- id | name | email | created_at
-- ----+---------+--------------------+----------------------------
-- 1 | Alice | alice@example.com | 2024-01-28 10:30:45.123456
-- 2 | Bob | bob@example.com | 2024-01-28 10:30:45.234567
-- 3 | Charlie | charlie@example.com| 2024-01-28 10:30:45.345678
Task 3.3: Check Replication
-- Check replication slots
SELECT * FROM pg_replication_slots;
-- Expected output (2 replicas):
-- slot_name | slot_type | restart_lsn
-- ---------------+-------------+-------------
-- cnpg_standby_1 | physical | 0/3000028
-- cnpg_standby_2 | physical | 0/3000028
-- Check active replicas
SELECT * FROM pg_stat_replication;
-- Expected output:
-- pid | usesysid | usename | application_name | state
-- -----+----------+---------+------------------+----------
-- 123 | 10 | cnpg | standby_1 | streaming
-- 124 | 10 | cnpg | standby_2 | streaming
-- Check replication lag from the primary (should be near zero, or NULL when idle)
SELECT application_name, state, replay_lag FROM pg_stat_replication;
-- Expected: replay_lag close to 00:00:00 for both replicas
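Another way to quantify lag from the primary is in WAL bytes rather than time, using the standard statistics views:
-- Bytes of WAL not yet replayed by each replica
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
-- Expected: 0 (or very small values) on an idle cluster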
Task 3.4: Verify Write Operations on Primary
# Exit the previous psql session
\q
# Reconnect (kubectl cnpg psql always lands on the primary)
kubectl cnpg psql demo-cluster -n databases
# Switch to the test database (the plugin connects to the postgres database by default)
psql> \c testdb
# Insert data
psql> INSERT INTO test.users (name, email) VALUES ('David', 'david@example.com');
# Query
psql> SELECT COUNT(*) FROM test.users;
-- Expected: 4
# Disconnect
psql> \q
Task 3.5: Verify Replicas Received Data
# The demo-cluster-ro service reaches the replicas, but it is only routable inside the cluster,
# so the simplest check is to exec into a replica pod directly
kubectl exec -it -n databases demo-cluster-2 -- psql -U postgres -d testdb
# Query the replica (connect to testdb directly; cross-database queries are not possible)
psql> SELECT COUNT(*) FROM test.users;
-- Expected: 4 (the replica has the data)
psql> SELECT * FROM test.users WHERE name = 'David';
-- Expected: the newly inserted row
psql> \q
Task 3.6: Check Primary ID
# Find the primary pod by its role label
kubectl get pods -n databases --show-labels | grep primary
# Expected: demo-cluster-1 carries the primary role label
# Or check cluster status
kubectl cnpg status demo-cluster -n databases | grep "Primary:"
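From a psql session (kubectl cnpg psql demo-cluster -n databases) you can also tell primary from replica with a standard catalog function:
-- Returns false on the primary, true on a replica
SELECT pg_is_in_recovery();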
Part 4: Failover & Recovery (30 min)
Task 4.1: Simulate Primary Failure
# Delete primary pod (simulates crash)
kubectl delete pod demo-cluster-1 -n databases
# Immediately check cluster status
kubectl cnpg status demo-cluster -n databases
# Expected progression:
# Initial: Primary demo-cluster-1 (deleting)
# After ~10s: no primary; the most up-to-date replica is being promoted
# After ~20s: a replica (e.g. demo-cluster-2) promoted to primary
# After ~30s: demo-cluster-1 recreated as a replica
Task 4.2: Monitor Failover Process
# Watch pod recreation
kubectl get pods -n databases -w
# Timeline:
# demo-cluster-1 0/1 Terminating
# (pod disappears; cluster briefly has no primary, ~10s)
# demo-cluster-2 promoted to primary (1/1 Running)
# demo-cluster-1 0/1 Pending (being recreated)
# demo-cluster-1 0/1 ContainerCreating
# demo-cluster-1 1/1 Running (rejoins as a replica)
Task 4.3: Verify Failover Completed
# Get cluster status
kubectl cnpg status demo-cluster -n databases
# Expected output (NEW primary):
# Primary: demo-cluster-2 (was demo-cluster-1)
# Replicas:
# demo-cluster-1 (streaming)
# demo-cluster-3 (streaming)
Task 4.4: Verify Data Integrity After Failover
# Connect and query
kubectl cnpg psql demo-cluster -n databases
# Switch to the test database
psql> \c testdb
# Check data is intact
psql> SELECT COUNT(*) FROM test.users;
-- Expected: 4 (all data preserved)
psql> SELECT * FROM test.users;
-- Expected: all rows, including David
# Insert new data to verify the new primary accepts writes
psql> INSERT INTO test.users (name, email) VALUES ('Eve', 'eve@example.com');
psql> SELECT COUNT(*) FROM test.users;
-- Expected: 5
psql> \q
Task 4.5: Manual Switchover (Planned)
# Current state: demo-cluster-2 is primary
# Goal: Move primary back to demo-cluster-1
# Get current cluster status
kubectl cnpg status demo-cluster -n databases
# Promote replica to primary (switchover)
kubectl cnpg promote demo-cluster demo-cluster-1 -n databases
# Watch switchover
kubectl get pods -n databases -w
# Expected:
# demo-cluster-2 steps down (loses the primary role label)
# demo-cluster-1 is promoted to primary
# demo-cluster-2 rejoins as a replica
Task 4.6: Verify Switchover Completed
# Get status
kubectl cnpg status demo-cluster -n databases
# Expected (original primary restored):
# Primary: demo-cluster-1
# Replicas:
# demo-cluster-2 (streaming)
# demo-cluster-3 (streaming)
# Verify data consistency
kubectl cnpg psql demo-cluster -n databases
psql> \c testdb
psql> SELECT COUNT(*) FROM test.users;
-- Expected: 5 (includes Eve)
psql> \q
Part 5: Backup & Recovery (20 min)
Task 5.1: Create Backup Storage (MinIO)
# Deploy MinIO as a simple pod to act as the backup object store
kubectl run minio \
  -n databases \
  --image=minio/minio:latest \
  -- server /data
# Wait for MinIO to start
sleep 10
# Expose MinIO inside the cluster so CNPG can reach it at http://minio:9000
kubectl expose pod minio -n databases --port=9000
# Optional: forward the service locally for inspection
kubectl port-forward -n databases svc/minio 9000:9000 &
Task 5.2: Configure Backup in Cluster
Copy pg-cluster.yaml to pg-cluster-backup.yaml, keep the existing spec, and add the backup section:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: demo-cluster
  namespace: databases
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16.2
  # ... existing config ...
  # Backup configuration (credentials are referenced from the minio-creds Secret created below)
  backup:
    barmanObjectStore:
      destinationPath: s3://demo-backup/
      endpointURL: http://minio:9000
      s3Credentials:
        accessKeyId:
          name: minio-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: minio-creds
          key: ACCESS_SECRET_KEY
      wal:
        compression: gzip
        maxParallel: 4
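CNPG expects the S3 credentials as references to keys in a Kubernetes Secret rather than literal values. Create one holding MinIO's default credentials before applying the manifest (minio-creds and the key names are choices made for this lab; adapt them as needed):
# Create the credentials secret referenced by s3Credentials
kubectl create secret generic minio-creds -n databases \
  --from-literal=ACCESS_KEY_ID=minioadmin \
  --from-literal=ACCESS_SECRET_KEY=minioadmin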
Task 5.3: Apply Backup Configuration
# Apply updated manifest
kubectl apply -f pg-cluster-backup.yaml
# Verify WAL archiving to the object store has started
kubectl logs -n databases demo-cluster-1 | grep -iE "backup|archiv"
# Expected: log entries showing WAL segments being archived to the object store
Task 5.4: Create Scheduled Backup
Create scheduled-backup.yaml:
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: demo-cluster-backup
  namespace: databases
spec:
  schedule: "0 0 1 * * *"  # Daily at 1 AM UTC (CNPG uses a six-field cron format that includes seconds)
  backupOwnerReference: cluster
  cluster:
    name: demo-cluster
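Once applied, the schedule and the backups it creates are visible as ordinary Kubernetes resources:
# Inspect the schedule and the Backup objects it produces
kubectl get scheduledbackup -n databases
kubectl get backup -n databases
# Each Backup reports its phase (e.g. running, completed, failed) in its status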
Task 5.5: Trigger Manual Backup
# Apply scheduled backup
kubectl apply -f scheduled-backup.yaml
# Trigger an immediate on-demand backup (for lab purposes) via the cnpg plugin,
# which creates a Backup resource for the cluster
kubectl cnpg backup demo-cluster -n databases
# Wait for the backup to complete (~30 seconds on this small cluster)
echo "Backup in progress..."
sleep 30
# Check the backup phase and the objects written to MinIO
kubectl get backup -n databases
kubectl logs -n databases minio | tail -20
Task 5.6: Point-in-Time Recovery (PITR)
# Get current timestamp
RECOVERY_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
echo "Recovery time: $RECOVERY_TIME"
# Create recovery manifest (the externalClusters entry tells CNPG where to find
# the base backups and WAL archive of the source cluster)
cat > pg-cluster-recovered.yaml << EOF
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: demo-cluster-recovered
  namespace: databases
spec:
  instances: 1
  imageName: ghcr.io/cloudnative-pg/postgresql:16.2
  storage:
    size: 1Gi
    storageClass: standard
  bootstrap:
    recovery:
      source: demo-cluster
      recoveryTarget:
        targetTime: "$RECOVERY_TIME"
  externalClusters:
    - name: demo-cluster
      barmanObjectStore:
        destinationPath: s3://demo-backup/
        endpointURL: http://minio:9000
        s3Credentials:
          accessKeyId:
            name: minio-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: minio-creds
            key: ACCESS_SECRET_KEY
EOF
# The recovery cluster is not applied here, to keep the lab simple
echo "Recovery manifest created: pg-cluster-recovered.yaml"
echo "In production, apply with: kubectl apply -f pg-cluster-recovered.yaml"
Part 6: Monitoring & Validation (15 min)
Task 6.1: Check Pod Metrics
# Get resource usage (requires metrics-server; on minikube: minikube addons enable metrics-server)
kubectl top pods -n databases
# Expected output:
# NAME CPU(cores) MEMORY(bytes)
# demo-cluster-1 50m 128Mi
# demo-cluster-2 45m 125Mi
# demo-cluster-3 48m 130Mi
Task 6.2: Check PVC Usage
# Check storage usage
kubectl get pvc -n databases -o wide
# Check capacity and access mode details
kubectl describe pvc demo-cluster-1 -n databases | grep -i "capacity\|access"
Task 6.3: Verify Replication Health
# Check cluster status
kubectl cnpg status demo-cluster -n databases
# Confirm all instances report Ready, both replicas are streaming,
# and the replication slots for both replicas are active
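The built-in exporter on each instance pod can also be checked directly (9187 is the default CNPG metrics port; metric names are prefixed with cnpg_):
# Forward the metrics port of one instance and scrape it
kubectl port-forward -n databases pod/demo-cluster-1 9187:9187 &
curl -s http://localhost:9187/metrics | grep '^cnpg_' | head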
Task 6.4: Test Recovery
# Connect to database
kubectl cnpg psql demo-cluster -n databases
psql> \c testdb
# Simulate accidental data loss (delete a row)
psql> DELETE FROM test.users WHERE id = 1;
psql> SELECT COUNT(*) FROM test.users;
-- Expected: 4 (one row deleted)
# In production you would restore from backup into a separate cluster
# (PITR to a point before the DELETE would recover the deleted row)
psql> \q
Task 6.5: Cleanup (Optional)
# Delete cluster (removes pods and PVCs)
kubectl delete cluster demo-cluster -n databases
# Verify deletion
kubectl get pods -n databases
# Delete namespace
kubectl delete namespace databases
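To remove the operator as well once the databases namespace is gone:
# Uninstall the CNPG operator and its namespace
helm uninstall cnpg -n cnpg-system
kubectl delete namespace cnpg-system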
Task 6.6: Validation Checklist
- Cluster deployed with 3 instances
- Primary and replicas identified
- Data replicated to all instances
- Failover worked (primary recreated)
- Switchover successful (primary role switched)
- Backup configured
- Can query both primary and replicas
- Replication slots active
- Monitoring metrics collected
- Services created correctly
Troubleshooting Common Issues
| Issue | Symptoms | Resolution |
|---|---|---|
| Pod stuck "Pending" | Stays pending > 5min | Check storage class: kubectl get storageclass |
| Replicas not joining | "Waiting for instances" | Check logs: kubectl logs <pod> -n databases |
| Slow failover | Primary down 2-3min before replica promotes | Check PVC attachment status |
| Cannot connect | "Connection refused" | Port forward or check service: kubectl get svc |
| WAL archiving failed | Backup not progressing | Verify MinIO credentials and connectivity |
Next Steps
- Deploy metrics collection (Prometheus + Grafana)
- Set up alerting (Alert on replication lag > 60s)
- Automate backups (Use ScheduledBackup CRD)
- Implement connection pooling (PgBouncer via the CNPG Pooler CRD; see the sketch after this list)
- Test DR procedures (Regular PITR drills)
- Document runbooks (Customized for your environment)
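As a starting point for the connection-pooling item above, a minimal Pooler manifest might look like this (field names follow the CNPG Pooler CRD; the instance count and PgBouncer parameters are illustrative choices):
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: demo-cluster-pooler-rw
  namespace: databases
spec:
  cluster:
    name: demo-cluster
  instances: 2
  type: rw
  pgbouncer:
    poolMode: session
    parameters:
      max_client_conn: "200"
Applying it creates PgBouncer pods and a demo-cluster-pooler-rw service that applications can use in place of demo-cluster-rw.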
Workshop Completion Time: ~140 minutes
Skills Gained: Cluster deployment, failover, backup/recovery, monitoring