Database Indexing Workshop

Overview

This workshop contains practical hands-on exercises for both MySQL and PostgreSQL. Complete the MySQL section first (60 min), then PostgreSQL section (60 min) for a full understanding of index optimization across both databases.

Total Time: ~120 minutes (60 min per database)

Part 1: MySQL Workshop

Practical Assessment

Duration: 60 minutes

Objective: Hands-on practice creating indexes, analyzing query performance with EXPLAIN, and comparing query efficiency before and after indexing.

Prerequisites

System Requirements

Docker installed on your laptop
MySQL 5.7 or later
Basic MySQL CLI knowledge

Install Dependency

Ensure Docker is running:

# Check Docker is running
docker --version

# If not installed, follow: https://docs.docker.com/get-docker/

MySQL Part 1: Set Up MySQL in Docker

Start MySQL Server

# Run MySQL 5.7 in Docker (change password as needed)
docker run -d \
  -e MYSQL_ROOT_PASSWORD=password123 \
  --name mysql57 \
  -p 3306:3306 \
  mysql:5.7

# Connect to MySQL
docker exec -it mysql57 mysql -u root -p

# When prompted, enter: password123

Manage MySQL Container

# Start existing container
docker start -a mysql57

# Stop container
docker stop mysql57

# Remove container (if you want to clean up)
docker rm mysql57

MySQL Part 2: Create Sample Database and Table

Once connected to MySQL (via docker exec -it mysql57 mysql -u root -p), run these commands:

2.1: Create Database

CREATE DATABASE workshop;

USE workshop;

2.2: Create Table Without Indexes

CREATE TABLE person_age (
  id INT(11) AUTO_INCREMENT,
  name CHAR(10),
  age INT(11),
  PRIMARY KEY (id)
);

2.3: Verify Table Structure

DESC person_age;

-- Expected Output:
-- +-------+----------+------+-----+---------+----------------+
-- | Field | Type     | Null | Key | Default | Extra          |
-- +-------+----------+------+-----+---------+----------------+
-- | id    | int(11)  | NO   | PRI | NULL    | auto_increment |
-- | name  | char(10) | YES  |     | NULL    |                |
-- | age   | int(11)  | YES  |     | NULL    |                |
-- +-------+----------+------+-----+---------+----------------+

MySQL Part 3: Insert Sample Data

Insert 7 sample records:

INSERT INTO person_age VALUES
(1, 'Bob', 70),
(2, 'David', 10),
(3, 'Adam', 32),
(4, 'Hank', 19),
(5, 'Aaron', 25),
(6, 'Danny', 99),
(7, 'Nick', 17);

-- Verify data
SELECT * FROM person_age;

-- Expected: 7 rows with ages ranging from 10 to 99

MySQL Part 4: Query Performance WITHOUT Index

4.1: Run Query Without Index

-- This query searches for all people with age > 20 (should return 4 rows)
SELECT * FROM person_age WHERE age > 20;

4.2: Analyze Query with EXPLAIN (Without Index)

EXPLAIN SELECT * FROM person_age WHERE age > 20;

-- Look for these fields in output:
-- - type: ALL (full table scan - BAD)
-- - key: NULL (no index used)
-- - rows: 7 (scans all 7 rows)
-- - filtered: 57.14 (only ~57% of rows match the condition)

Expected Output (reformatted for clarity):

+----+-------------+------------+------+---------------+------+---------+------+------+----------+
| id | select_type | table      | type | possible_keys | key  | rows    | Extra       |
+----+-------------+------------+------+---------------+------+---------+------+------+----------+
|  1 | SIMPLE      | person_age | ALL  | NULL          | NULL | 7       | Using where |
+----+-------------+------------+------+---------------+------+------+------+----------+

Analysis:

type: ALL = Full table scan (inefficient)
rows: 7 = Had to examine all 7 rows
filtered: 57.14 = Only ~57% of rows actually matched the condition

MySQL Part 5: Create Index on Age Column

5.1: Add Index

ALTER TABLE person_age ADD INDEX `age_index` (`age`);

-- Output:
-- Query OK, 0 rows affected (0.34 sec)
-- Records: 0  Duplicates: 0  Warnings: 0

5.2: Verify Index Exists

SHOW CREATE TABLE person_age;

-- Should show: KEY `age_index` (`age`)

MySQL Part 6: Query Performance WITH Index

6.1: Run Same Query With Index

SELECT * FROM person_age WHERE age > 20;

-- Result should be identical (4 rows):
-- Bob (70), Adam (32), Aaron (25), Danny (99)

6.2: Analyze Query with EXPLAIN (With Index)

EXPLAIN SELECT * FROM person_age WHERE age > 20;

-- Expected improvements:
-- - type: range (index range scan - GOOD)
-- - key: age_index (index being used)
-- - rows: 3-4 (only examines relevant rows)
-- - filtered: 100 (all examined rows matched)

Expected Output (reformatted for clarity):

+----+-------------+------------+-------+---------------+-----------+---------+------+------+-----------------------+
| id | select_type | table      | type  | possible_keys | key       | rows    | Extra                  |
+----+-------------+------------+-------+---------------+-----------+---------+------+------+-----------------------+
|  1 | SIMPLE      | person_age | range | age_index     | age_index | 4       | Using index condition  |
+----+-------------+------------+-------+---------------+-----------+---------+------+------+-----------------------+

Analysis:

type: range = Efficient range scan using index
rows: 4 = Only examined 4 rows (vs 7 before)
key: age_index = Index was used
Performance improvement = ~1.75x faster for this small dataset (larger for real tables)

6.3: Force Index Usage (Verify)

EXPLAIN SELECT * FROM person_age FORCE INDEX (age_index) WHERE age > 20;

-- Should show same results with age_index forced

MySQL Part 7: Performance Comparison Summary

Complete this table based on your EXPLAIN results:

Metric	WITHOUT Index	WITH Index	Improvement
Type	ALL	range	Full scan → Index scan
Key Used	NULL	age_index	No → Yes
Rows Examined	7	4	7 → 4 rows
Filtered (%)	57.14	100	Partial → All used
Speed	Baseline	~1.75x faster	✅ Significant

MySQL Part 8: Advanced - Composite Index

8.1: Create Second Table for Composite Index Practice

CREATE TABLE orders (
  id INT AUTO_INCREMENT,
  user_id INT,
  amount DECIMAL(10, 2),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id)
);

-- Insert sample orders
INSERT INTO orders (user_id, amount) VALUES
(1, 99.99),
(2, 149.99),
(1, 199.99),
(3, 79.99),
(1, 59.99);

8.2: Query Without Composite Index

-- Find all orders by user 1 created after a specific date
SELECT * FROM orders WHERE user_id = 1 AND created_at > '2024-01-01' ORDER BY created_at DESC;

-- Analyze
EXPLAIN SELECT * FROM orders WHERE user_id = 1 AND created_at > '2024-01-01' ORDER BY created_at DESC;

8.3: Create Composite Index

-- Composite index on both columns (user_id, created_at)
ALTER TABLE orders ADD INDEX `idx_user_created` (user_id, created_at DESC);

8.4: Query With Composite Index

-- Run same query again
EXPLAIN SELECT * FROM orders WHERE user_id = 1 AND created_at > '2024-01-01' ORDER BY created_at DESC;

-- Should now show:
-- - type: range (using index)
-- - key: idx_user_created
-- - rows: much fewer
-- - Extra: Using index condition (more efficient)

MySQL Part 9: Clean Up

Remove Container (Optional)

# Stop the container
docker stop mysql57

# Remove the container
docker rm mysql57

# Verify it's gone
docker ps -a

MySQL Validation Checklist

MySQL container started and accessible
workshop database created and accessible
person_age table created with 7 records
Query WITHOUT index: type = ALL, key = NULL, rows = 7
Index age_index created successfully
Query WITH index: type = range, key = age_index, rows ≤ 4
Observed performance improvement (1.75x+ faster)
Composite index created for orders table
Understood Leftmost Prefix rule

Part 2: PostgreSQL Workshop

Practical Assessment

Duration: 60 minutes

Objective: Hands-on practice with large datasets, concurrent index creation, and understanding PostgreSQL's advanced scan types (Bitmap, Index-Only, Sequential).

Prerequisites

System Requirements

Docker installed on your laptop
PostgreSQL 13 or later
Basic psql CLI knowledge

Install Dependency

Ensure Docker is running:

docker --version

PostgreSQL Part 1: Set Up PostgreSQL in Docker

Start PostgreSQL Server

# Run PostgreSQL 13 in Docker
docker run -d \
  -e POSTGRES_PASSWORD=postgres \
  --name postgres13 \
  -p 5432:5432 \
  postgres:13

# Connect to PostgreSQL
docker exec -it postgres13 psql -U postgres

# When prompted, enter: postgres

Manage PostgreSQL Container

# Start existing container
docker start -a postgres13

# Stop container
docker stop postgres13

# Remove container (if you want to clean up)
docker rm postgres13

PostgreSQL Part 2: Create Large Test Dataset

Once connected to PostgreSQL (via docker exec -it postgres13 psql -U postgres), run these commands:

2.1: Create Database

CREATE DATABASE workshop;

\c workshop  -- Connect to the new database

2.2: Create Table

CREATE TABLE person (
  name TEXT,
  salary INT,
  age INT
);

2.3: Generate 10 Million Rows (Takes ~2-3 minutes)

INSERT INTO person (name, salary, age)
SELECT 
  UPPER(SUBSTR(MD5(RANDOM()::TEXT), 1, 6)),
  FLOOR(RANDOM() * (1000 - 100 + 1)) + 100,
  FLOOR(RANDOM() * (60 - 20 + 1)) + 20
FROM GENERATE_SERIES(1, 10000000);

-- This will output: INSERT 0 10000000

2.4: Update Table Statistics

-- This is CRITICAL for query planner to make good decisions
ANALYZE person;

-- Check table size
SELECT
  ROUND(PG_TOTAL_RELATION_SIZE('person') / 1024 / 1024.0, 2) AS size_mb;

-- Expected: ~400-500 MB for 10M rows

PostgreSQL Part 3: Query Performance WITHOUT Index

3.1: Run Query Without Index (Full Scan)

-- Select people older than 50 (should return ~33% of rows = 3.3M)
SELECT COUNT(*) FROM person WHERE age > 50;

-- Now fetch actual data
SELECT * FROM person WHERE age > 50 LIMIT 10;

3.2: Analyze Query with EXPLAIN

EXPLAIN SELECT * FROM person WHERE age > 50;

-- Expected: Seq Scan (scans all 10M rows)
-- QUERY PLAN
-- ---------------------------------------------------------------
-- Seq Scan on person  (cost=0.00..180000.00 rows=3300000 width=48)
--   Filter: (age > 50)

Analysis:

Scans entire table (all 10M rows)
Returns ~3.3M rows (33% selectivity)
Sequential I/O is efficient for large result sets

PostgreSQL Part 4: Create Index and Observe Behavior

4.1: Create Index on Age

CREATE INDEX idx_age ON person (age);

-- Check index size
SELECT
  ROUND(PG_RELATION_SIZE('idx_age') / 1024 / 1024.0, 2) AS index_size_mb;

-- Expected: ~150-200 MB

4.2: Query with Different Selectivity Levels

Very Selective (1 row, 0.00001%):

EXPLAIN SELECT * FROM person WHERE age = 45;

-- Expected: Index Scan
-- Index Scan using idx_age on person  (cost=0.42..8.44 rows=1 width=48)
--   Index Cond: (age = 45)

Moderately Selective (3.3M rows, 33%):

EXPLAIN SELECT * FROM person WHERE age > 50;

-- Expected: Seq Scan (because 33% > 5% threshold)
-- PostgreSQL chooses sequential scan for large result sets

Medium Selectivity (660K rows, 6.6%):

EXPLAIN SELECT * FROM person WHERE age > 55;

-- Expected: Bitmap Scan
-- PostgreSQL chooses bitmap for moderate selectivity
-- QUERY PLAN
-- -------------------------------------------------------
-- Bitmap Heap Scan on person  (cost=...)(rows=660000 width=48)
--   Recheck Cond: (age > 55)
--   -> Bitmap Index Scan on idx_age  (cost=...)
--        Index Cond: (age > 55)

PostgreSQL Part 5: Index-Only Scan

5.1: Query Fetching Only Indexed Column

-- Fetch ONLY the indexed column (age)
EXPLAIN SELECT age FROM person WHERE age > 50;

-- Expected: Index-Only Scan (no heap access)
-- Index Only Scan using idx_age on person  (cost=...)(rows=3300000 width=4)
--   Index Cond: (age > 50)

-- Compare to fetching all columns (requires heap access)
EXPLAIN SELECT * FROM person WHERE age > 50;
-- This triggers Seq Scan (because 33% selectivity)

PostgreSQL Part 6: Concurrent Index Creation

6.1: Standard Index Creation (Blocks Writes)

-- This would lock the table (but we won't actually run it)
-- CREATE INDEX idx_salary ON person (salary);
-- During creation, INSERT/UPDATE/DELETE would BLOCK

-- Simulate with:
-- SELECT * FROM person FOR UPDATE;  -- Locks table

6.2: Concurrent Index Creation (Production Safe)

-- This allows concurrent writes (but takes longer)
CREATE INDEX CONCURRENTLY idx_salary ON person (salary);

-- Table remains fully usable during creation
-- Takes significantly longer (~2-3x) than standard creation
-- Uses more CPU/IO, but no downtime

6.3: Verify Index was Created

SELECT * FROM pg_indexes WHERE tablename = 'person';

-- Should show both idx_age and idx_salary

PostgreSQL Part 7: Composite Index Example

7.1: Create Composite Index

CREATE INDEX CONCURRENTLY idx_salary_age ON person (salary, age);

7.2: Queries Benefiting from Composite Index

-- Query using leftmost columns
EXPLAIN SELECT * FROM person WHERE salary > 500 AND age < 40;

-- Expected: Uses composite index idx_salary_age
-- Index Scan or Bitmap Scan using idx_salary_age on person
--   Index Cond: ((salary > 500) AND (age < 40))

PostgreSQL Part 8: Performance Comparison

Summary of Scan Types Observed

Query Type	Selectivity	Scan Type	Cost	Use Case
`age = 45`	0.00001%	Index Scan	Lowest	Precise lookup
`age > 55`	6.6%	Bitmap Scan	Medium	Moderate results
`age > 50`	33%	Seq Scan	Medium	Large result set
`age (only)`	Any	Index-Only	Lowest	Covering index

PostgreSQL Part 9: Clean Up

Remove Container (Optional)

# Stop the container
docker stop postgres13

# Remove the container
docker rm postgres13

# Verify it's gone
docker ps -a

Indexes dramatically speed up searches (10-100x)
Use EXPLAIN to validate index usage (type: ALL = bad)
Order composite indexes by cardinality
Use migration tools for production changes

PostgreSQL

Concurrent index creation prevents downtime
PostgreSQL chooses scan type based on selectivity
Index-Only scans are fastest (no heap access)
Bitmap scans balance index + sequential I/O
Sequential scans are efficient for large result sets (> 10%)

Both Databases

EXPLAIN is mandatory before assuming indexes help
Measure selectivity to understand index effectiveness
Test locally first before production changes
Monitor slow queries in production regularly
Iterate based on actual data patterns

Next Steps

Monitor: Enable logging in your production database
Analyze: Review slow query logs weekly
Optimize: Create indexes only for frequently slow queries
Maintain: Rebuild fragmented indexes regularly
Document: Record index creation dates and rationale

General Database Indexing Concepts

WORKSHOP