nipunap

MySQL Query Optimization Guide

Cardinality and Index Usage

Version: 2.0 MySQL Version: 8.0+ Last Updated: January 2026

Introduction
Key Concepts & Glossary
Understanding Cardinality
Index Fundamentals
Analyzing Query Execution Plans
Practical Example: Optimizing a Join Query
Best Practices
Common Pitfalls
Advanced Topics
Production Operations
Real-World Scenarios

Introduction

Query optimization in MySQL is critical for application performance. Two key concepts drive effective optimization:

Cardinality — The uniqueness of data in a column
Index Usage — How MySQL uses indexes to access data efficiently

This guide demonstrates how to analyze and optimize queries using these principles.

Key Concepts & Glossary

Understanding these fundamental terms will help you follow the guide more easily.

Core Terms

Term	Definition
Index	A data structure that helps MySQL find rows faster. Think of it like a book’s index—you can quickly find topics without reading every page.
Cardinality	The number of unique values in a column. High cardinality = many unique values (e.g., email addresses). Low cardinality = few unique values (e.g., true/false flags).
Selectivity	The percentage of unique values = cardinality ÷ total rows. Higher selectivity (closer to 100%) means better for indexing.
Composite Index	An index on multiple columns. Order matters! Like organizing books by genre, then author, then title.
Covering Index	An index that includes ALL columns needed by a query, avoiding lookups to the main table. Like having complete information in the book index without opening the book.
Query Optimizer	MySQL’s decision-making engine that chooses how to execute your query (which indexes to use, join order, etc.).
Buffer Pool	MySQL’s memory cache for data and indexes. Larger = faster queries because data stays in RAM instead of reading from disk.

EXPLAIN Output Terms

Column	What It Means	Interpretation
type	How MySQL accesses data	Best to worst: `const` → `eq_ref` → `ref` → `range` → `index` → `ALL`
key	Which index MySQL is using	`NULL` = no index used (problem!)
key_len	Bytes of index being used	Larger = more columns from composite index utilized
rows	Estimated rows to examine	Lower is better
Extra	Additional information	See details below

EXPLAIN Extra Field Values

Value	Meaning	Performance
`Using index`	Covering index (no table lookup)	Excellent ✓
`Using where`	Filtering after reading rows	Less efficient
`Using filesort`	Sorting requires extra work	May need optimization
`Using temporary`	Temporary table created	Expensive ×
`Using index condition`	Index Condition Pushdown active	Good ✓

Performance & System Terms

Term	Definition
ICP (Index Condition Pushdown)	Filtering happens at the storage engine level (faster) instead of the MySQL server level.
MRR (Multi-Range Read)	Optimization that sorts row lookups before reading from disk to reduce random I/O.
InnoDB	MySQL’s default storage engine. Handles transactions, row-level locking, and crash recovery.
DDL (Data Definition Language)	Schema changes: `CREATE INDEX`, `ALTER TABLE`, `DROP TABLE`
DML (Data Manipulation Language)	Data operations: `SELECT`, `INSERT`, `UPDATE`, `DELETE`
OLTP (Online Transaction Processing)	Systems with many small read/write operations (e.g., e-commerce, banking).
OLAP (Online Analytical Processing)	Systems focused on complex queries for reporting/analytics (e.g., data warehouses).
Replication Lag	Time delay before changes on primary database appear on replica databases.
Table Scan	Reading every row in a table (slow for large tables). Happens when no suitable index exists.

Understanding Cardinality

What is Cardinality?

Cardinality refers to the number of distinct values in a column relative to the total number of rows.

Type	Description	Examples
High Cardinality	Many unique values	`user_id`, `email`, `UUID`, `order_number`
Low Cardinality	Few unique values	`gender`, `status`, `boolean flags`, `country_code`

Why Cardinality Matters

MySQL’s query optimizer uses cardinality statistics to:

Choose which indexes to use — Higher cardinality = better index selectivity
Determine join order — Tables with selective filters are processed first
Estimate row counts — Affects cost calculations for query plans

Checking Cardinality

-- View cardinality for all indexes on a table
SHOW INDEX FROM table_name;

-- Calculate selectivity for a column
SELECT
    COUNT(DISTINCT column_name) AS cardinality,
    COUNT(*) AS total_rows,
    ROUND(COUNT(DISTINCT column_name) / COUNT(*) * 100, 2) AS selectivity_percent
FROM table_name;

Understanding Selectivity

Formula: Selectivity = Cardinality ÷ Total Rows

Selectivity	Index Effectiveness	Action
> 90%	Excellent	Ideal for indexing
50-90%	Good	Good candidate
10-50%	Moderate	Consider composite index
< 10%	Poor	Avoid standalone index

Rule of Thumb: Index columns with selectivity > 10%

Index Fundamentals

Types of Indexes

Index Type	Characteristics	Use Case
Primary Key	Unique, non-null, clustered	Table’s main identifier
Unique	Ensures uniqueness	Email, username, SKU
Regular (Non-unique)	Most common, allows duplicates	Foreign keys, search columns
Composite (Multi-column)	Multiple columns (order matters!)	Complex WHERE conditions
Full-Text	Text searching	Blog posts, descriptions
Spatial	Geographic data	Locations, maps, GIS

Index Effectiveness by Cardinality

Column Cardinality	Index Effectiveness	Example Columns	Action
Very High (>90%)	Excellent ⭐⭐⭐	`user_id`, `email`, `order_number`	Always index
High (50-90%)	Good ⭐⭐	`product_sku`, `customer_name`	Good candidate
Medium (10-50%)	Moderate ⭐	`category_id`, `city`	Use in composite index
Low (<10%)	Poor ⚠️	`gender`, `is_active`, `status`	Avoid standalone index

Key Principle: Index columns with selectivity > 10% (0.1)

Analyzing Query Execution Plans

Using EXPLAIN

-- Basic EXPLAIN
EXPLAIN SELECT columns FROM table WHERE conditions;

-- Extended EXPLAIN (MySQL 8.0+)
EXPLAIN FORMAT=TREE SELECT ...;

-- With actual execution metrics
EXPLAIN ANALYZE SELECT ...;

Key EXPLAIN Output Columns

Column	Description	Good Values	Bad Values
type	Join/access type	`const`, `eq_ref`, `ref`, `range`	`ALL`, `index`
possible_keys	Indexes considered	Multiple options	`NULL`
key	Index actually used	Any index name	`NULL`
key_len	Bytes of index used	Higher (more columns used)	Low (partial index use)
rows	Estimated rows to examine	Low numbers (< 1000)	High numbers (> 10000)
filtered	% of rows filtered by condition	High % (> 50%)	Low % (< 10%)
Extra	Additional information	`Using index`	`Using filesort`, `Using temporary`

EXPLAIN Access Types (Best → Worst)

Type	Description	Performance	When It Occurs
const	Single row match	⭐⭐⭐⭐⭐ Excellent	`WHERE id = 1` on PRIMARY KEY
eq_ref	One row per join	⭐⭐⭐⭐ Excellent	JOIN on PRIMARY KEY or UNIQUE
ref	Multiple matching rows	⭐⭐⭐ Good	Non-unique index lookup
range	Index range scan	⭐⭐ Moderate	`BETWEEN`, `>`, `<`, `IN`
index	Full index scan	⭐ Poor	Scanning entire index
ALL	Full table scan	⚠️ Very Poor	No suitable index (avoid!)

Warning: ALL type scans every row in the table. For tables with >10,000 rows, this is usually a problem.

Practical Example: Optimizing a Join Query

Sample Schema

-- Create sample tables
CREATE TABLE customers (
    customer_id INT PRIMARY KEY AUTO_INCREMENT,
    email VARCHAR(255) UNIQUE NOT NULL,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    country VARCHAR(50),
    registration_date DATE,
    status ENUM('active', 'inactive', 'suspended') DEFAULT 'active',
    INDEX idx_country (country),
    INDEX idx_status (status),
    INDEX idx_registration_date (registration_date)
) ENGINE=InnoDB;

CREATE TABLE orders (
    order_id INT PRIMARY KEY AUTO_INCREMENT,
    customer_id INT NOT NULL,
    order_date DATETIME NOT NULL,
    total_amount DECIMAL(10,2),
    payment_status ENUM('pending', 'paid', 'failed', 'refunded'),
    shipping_country VARCHAR(50),
    INDEX idx_customer_id (customer_id),
    INDEX idx_order_date (order_date),
    INDEX idx_payment_status (payment_status),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
) ENGINE=InnoDB;

CREATE TABLE order_items (
    item_id INT PRIMARY KEY AUTO_INCREMENT,
    order_id INT NOT NULL,
    product_id INT NOT NULL,
    quantity INT,
    unit_price DECIMAL(10,2),
    INDEX idx_order_id (order_id),
    INDEX idx_product_id (product_id),
    FOREIGN KEY (order_id) REFERENCES orders(order_id)
) ENGINE=InnoDB;

CREATE TABLE products (
    product_id INT PRIMARY KEY AUTO_INCREMENT,
    product_name VARCHAR(255),
    category VARCHAR(100),
    price DECIMAL(10,2),
    stock_quantity INT,
    INDEX idx_category (category)
) ENGINE=InnoDB;

Initial Query (Unoptimized)

SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    c.email,
    o.order_id,
    o.order_date,
    o.total_amount,
    p.product_name,
    oi.quantity,
    oi.unit_price
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE c.country = 'United States'
  AND c.status = 'active'
  AND o.order_date >= '2024-01-01'
  AND o.payment_status = 'paid'
  AND p.category = 'Electronics'
ORDER BY o.order_date DESC
LIMIT 100;

Step 1: Analyze Current Performance

EXPLAIN SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    c.email,
    o.order_id,
    o.order_date,
    o.total_amount,
    p.product_name,
    oi.quantity,
    oi.unit_price
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE c.country = 'United States'
  AND c.status = 'active'
  AND o.order_date >= '2024-01-01'
  AND o.payment_status = 'paid'
  AND p.category = 'Electronics'
ORDER BY o.order_date DESC
LIMIT 100;

Expected Problems:

Multiple WHERE conditions on low-cardinality columns (status, payment_status, category)
Potential full table scans
Inefficient index usage
ORDER BY might cause filesort

Step 2: Check Cardinality Statistics

-- Check cardinality for customers table
SELECT
    'country' as column_name,
    COUNT(DISTINCT country) AS distinct_values,
    COUNT(*) AS total_rows,
    ROUND(COUNT(DISTINCT country) / COUNT(*) * 100, 2) AS selectivity_percent
FROM customers
UNION ALL
SELECT
    'status',
    COUNT(DISTINCT status),
    COUNT(*),
    ROUND(COUNT(DISTINCT status) / COUNT(*) * 100, 2)
FROM customers;

-- Check cardinality for orders table
SELECT
    'payment_status' as column_name,
    COUNT(DISTINCT payment_status) AS distinct_values,
    COUNT(*) AS total_rows,
    ROUND(COUNT(DISTINCT payment_status) / COUNT(*) * 100, 2) AS selectivity_percent
FROM orders
UNION ALL
SELECT
    'order_date',
    COUNT(DISTINCT DATE(order_date)),
    COUNT(*),
    ROUND(COUNT(DISTINCT DATE(order_date)) / COUNT(*) * 100, 2)
FROM orders;

-- Check cardinality for products table
SELECT
    'category' as column_name,
    COUNT(DISTINCT category) AS distinct_values,
    COUNT(*) AS total_rows,
    ROUND(COUNT(DISTINCT category) / COUNT(*) * 100, 2) AS selectivity_percent
FROM products;

Example Results (from actual test with 10K customers, 50K orders):

Table	Column	Distinct Values	Total Rows	Selectivity
customers	country	5	10,000	0.05%
customers	status	3	10,000	0.03%
orders	payment_status	4	50,000	0.01%
orders	order_date	730	50,000	1.46%
products	category	5	15	33.33%

Key Observations:

order_date has the highest selectivity (1.46%), making it the best leading column for composite indexes
All status/category columns have very low selectivity (<0.1%)
For large production databases (millions of rows), these percentages would be even lower, making composite indexes even more critical

Step 3: Create Composite Indexes

Based on cardinality analysis, create composite indexes with high-to-low cardinality order:

-- Composite index for customers table
-- Put higher cardinality column first
DROP INDEX idx_country ON customers;
DROP INDEX idx_status ON customers;

CREATE INDEX idx_country_status_regdate ON customers(country, status, registration_date);

-- Composite index for orders table
-- order_date has higher cardinality than payment_status
DROP INDEX idx_order_date ON orders;
DROP INDEX idx_payment_status ON orders;

CREATE INDEX idx_customer_orderdate_payment ON orders(customer_id, order_date, payment_status);

-- Add covering index for better performance
CREATE INDEX idx_orderdate_payment_customer ON orders(order_date, payment_status, customer_id);

-- Products: category has low cardinality, but still useful with other filters
-- Keep simple index as is, or create composite if other filters exist

Step 4: Verify Improved Query Plan

EXPLAIN SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    c.email,
    o.order_id,
    o.order_date,
    o.total_amount,
    p.product_name,
    oi.quantity,
    oi.unit_price
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE c.country = 'United States'
  AND c.status = 'active'
  AND o.order_date >= '2024-01-01'
  AND o.payment_status = 'paid'
  AND p.category = 'Electronics'
ORDER BY o.order_date DESC
LIMIT 100;

Expected Improvements:

type: Should show ref or range instead of ALL
key: Should show composite indexes being used
rows: Should show significantly fewer rows examined
Extra: Should avoid “Using filesort” and “Using temporary”

Actual Test Results (BEFORE optimization):

Table: o (orders)
type: range
key: idx_order_date
key_len: 5 (only order_date used)
rows: 25,121
Extra: Using index condition; Using where

Actual Test Results (AFTER optimization):

Table: o (orders)
type: range
key: idx_orderdate_payment_customer
key_len: 7 (order_date + payment_status used)
rows: 24,450
Extra: Using index condition

✅ Key Improvements:

Index key_len increased from 5 → 7 bytes (both columns now used in index)
payment_status moved from “Using where” (post-filtering) to index condition (index-level filtering)
Slight reduction in estimated rows to scan
More efficient filtering at the index level

Step 5: Alternative Query Optimization

If the above query still has issues, consider rewriting:

-- Option 1: Filter early with subquery
SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    c.email,
    o.order_id,
    o.order_date,
    o.total_amount,
    p.product_name,
    oi.quantity,
    oi.unit_price
FROM (
    SELECT customer_id, first_name, last_name, email
    FROM customers
    WHERE country = 'United States' AND status = 'active'
) c
JOIN orders o ON c.customer_id = o.customer_id
    AND o.order_date >= '2024-01-01'
    AND o.payment_status = 'paid'
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
    AND p.category = 'Electronics'
ORDER BY o.order_date DESC
LIMIT 100;

-- Option 2: Start from the most selective table
SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    c.email,
    o.order_id,
    o.order_date,
    o.total_amount,
    p.product_name,
    oi.quantity,
    oi.unit_price
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
    AND c.country = 'United States'
    AND c.status = 'active'
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
    AND p.category = 'Electronics'
WHERE o.order_date >= '2024-01-01'
  AND o.payment_status = 'paid'
ORDER BY o.order_date DESC
LIMIT 100;

Step 6: Benchmark and Compare

-- Enable profiling
SET profiling = 1;

-- Run original query
SELECT /* original */ ... ;

-- Run optimized query
SELECT /* optimized */ ... ;

-- View results
SHOW PROFILES;

-- Detailed analysis of last query
SHOW PROFILE FOR QUERY 2;

Best Practices

1. Index Design Principles

✅ DO These Things

Practice	Reason
Index foreign key columns	Improves JOIN performance
Create composite indexes high→low cardinality	Optimizer uses most selective column first
Put selective columns first	Filters more rows earlier in execution
Use covering indexes for hot queries	Eliminates table lookups completely
Index columns in WHERE, JOIN, ORDER BY	Direct index usage for these operations
Update statistics regularly	Keeps optimizer decisions accurate

❌ DON’T Do These Things

Practice	Reason
Over-index tables (>15 indexes)	Each index adds write cost and maintenance overhead
Index low cardinality columns alone	Poor selectivity, full scans may be faster
Create redundant indexes	`(a,b)` already covers queries on `(a)`
Index tiny tables (<1000 rows)	Full scan is fast enough
Index columns with many NULLs	High NULL ratios reduce selectivity; prefer composite or specific queries

2. Composite Index Column Order

Golden Rule: High Cardinality → Low Cardinality

-- ✅ GOOD: email (high cardinality) before status (low cardinality)
CREATE INDEX idx_customer_filter ON customers(email, status);

-- ❌ BAD: status (low) before email (high)
CREATE INDEX idx_customer_filter_bad ON customers(status, email);

Leftmost Prefix Rule

A composite index on (a, b, c) can be used for:

Query Pattern	Index Used?
`WHERE a = ?`	✓ Yes
`WHERE a = ? AND b = ?`	✓ Yes
`WHERE a = ? AND b = ? AND c = ?`	✓ Yes (fully)
`WHERE b = ?`	× No (skips leftmost)
`WHERE c = ?`	× No (skips leftmost)
`WHERE b = ? AND c = ?`	× No (skips leftmost)

Remember: The first column in a composite index determines whether it can be used at all.

3. Monitoring Index Usage

-- Find unused indexes
SELECT
    t.TABLE_SCHEMA,
    t.TABLE_NAME,
    s.INDEX_NAME,
    s.CARDINALITY
FROM information_schema.STATISTICS s
LEFT JOIN performance_schema.table_io_waits_summary_by_index_usage i
    ON s.TABLE_SCHEMA = i.OBJECT_SCHEMA
    AND s.TABLE_NAME = i.OBJECT_NAME
    AND s.INDEX_NAME = i.INDEX_NAME
JOIN information_schema.TABLES t
    ON s.TABLE_SCHEMA = t.TABLE_SCHEMA
    AND s.TABLE_NAME = t.TABLE_NAME
WHERE s.INDEX_NAME != 'PRIMARY'
  AND i.INDEX_NAME IS NULL
  AND t.TABLE_SCHEMA NOT IN ('mysql', 'performance_schema', 'information_schema')
GROUP BY t.TABLE_SCHEMA, t.TABLE_NAME, s.INDEX_NAME;

-- Check index statistics
SELECT
    TABLE_NAME,
    INDEX_NAME,
    SEQ_IN_INDEX,
    COLUMN_NAME,
    CARDINALITY,
    SUB_PART,
    INDEX_TYPE
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'your_database'
  AND TABLE_NAME = 'your_table'
ORDER BY TABLE_NAME, INDEX_NAME, SEQ_IN_INDEX;

4. Query Optimization Checklist

Use this checklist for every slow query investigation:

Step	Check	What to Look For
□	Run `EXPLAIN`	Get execution plan details
□	Check `type` column	Avoid `ALL` (full table scan)
□	Verify `key` column	Should NOT be `NULL`
□	Examine `rows` column	Should be low (< 10% of table)
□	Review `Extra` field	Avoid `Using filesort`, `Using temporary`
□	Check column cardinality	Ensure indexed columns are selective
□	Match composite indexes	Order should match query filters
□	Consider covering indexes	Include all SELECT columns if possible
□	Test with production data	Use realistic row counts
□	Benchmark changes	Measure before/after performance

Common Pitfalls

Avoid these mistakes that prevent MySQL from using indexes effectively.

1. Function on Indexed Column

Problem: Applying functions to indexed columns prevents index usage.

❌ Avoid	✅ Use Instead
`WHERE YEAR(order_date) = 2024`	`WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'`
`WHERE UPPER(email) = 'USER@EMAIL.COM'`	`WHERE email = 'user@email.com'` (use case-insensitive collation)
`WHERE DATE(created_at) = '2024-01-15'`	`WHERE created_at >= '2024-01-15' AND created_at < '2024-01-16'`

Why: MySQL can’t use the index when you transform the column value.

2. Implicit Type Conversion

Problem: Comparing different data types forces MySQL to convert values, preventing index usage.

Column Type	❌ Wrong Type	✅ Correct Type
`INT`	`WHERE id = '123'`	`WHERE id = 123`
`VARCHAR`	`WHERE code = 123`	`WHERE code = '123'`
`DECIMAL`	`WHERE price = '19.99'`	`WHERE price = 19.99`

Why: Type conversion happens on every row, making indexes useless.

3. OR Conditions on Different Columns

Problem: OR across different columns prevents efficient index usage.

-- ❌ BAD: Can't efficiently use indexes on both columns
WHERE country = 'US' OR status = 'active'

-- ✅ GOOD: Use UNION to let MySQL use both indexes
SELECT * FROM customers WHERE country = 'US'
UNION
SELECT * FROM customers WHERE status = 'active'

Exception: OR on the same column works fine: WHERE status IN ('active', 'pending')

4. LIKE with Leading Wildcard

Problem: Leading wildcards prevent index range scans.

❌ Can’t Use Index	✅ Can Use Index
`WHERE email LIKE '%gmail.com'`	`WHERE email LIKE 'john%'`
`WHERE name LIKE '%smith%'`	`WHERE name LIKE 'Smith%'`

Alternative: For suffix searches, consider Full-Text Index or reverse the string.

5. Index Hints (Use When Optimizer is Wrong)

When to use: Rarely, only when MySQL consistently chooses a poor index.

-- Suggest an index (MySQL may ignore)
SELECT * FROM orders USE INDEX (idx_orderdate_payment_customer)
WHERE order_date >= '2024-01-01' AND payment_status = 'paid';

-- Force an index (MySQL must use it)
SELECT * FROM orders FORCE INDEX (idx_orderdate_payment_customer)
WHERE order_date >= '2024-01-01';

-- Ignore a specific index
SELECT * FROM orders IGNORE INDEX (idx_customer_id)
WHERE order_date >= '2024-01-01';

Warning: Hints are brittle and may become incorrect as data changes. Prefer updating statistics or adding better indexes.

Advanced Techniques

Using EXPLAIN ANALYZE (MySQL 8.0.18+)

EXPLAIN ANALYZE
SELECT ... FROM ... WHERE ...;

This shows:

Actual execution time (not just estimates)
Actual rows returned
Number of loops

Example from our test query:

-> Limit: 100 row(s)  (cost=18252 rows=100) (actual time=4.03..5.04 rows=100 loops=1)
    -> Index range scan on o using idx_orderdate_payment_customer
       (cost=4946 rows=24450) (actual time=3.99..4 rows=94 loops=1)

Key metrics:

cost: Optimizer’s estimated cost (18252)
rows: Estimated rows (24450) vs actual rows (94)
actual time: Real execution time in milliseconds (3.99ms to start, 4ms total)
The optimizer overestimated by ~260x, but index usage kept actual performance fast!

Optimizer Trace

SET optimizer_trace='enabled=on';

SELECT ... FROM ... WHERE ...;

SELECT * FROM information_schema.OPTIMIZER_TRACE\G

SET optimizer_trace='enabled=off';

This reveals MySQL’s decision-making process for query execution.

Advanced Topics

This section covers advanced optimization techniques and production considerations.

1. Index Overhead and Write Performance Impact

The Cost: Each index adds:

Additional write overhead (INSERT/UPDATE/DELETE get slower)
Extra disk space (varies by data type and index width)
More memory usage (buffer pool pressure)

-- Check how much space indexes use
SELECT
    TABLE_NAME,
    ROUND(DATA_LENGTH / 1024 / 1024, 2) AS 'Data (MB)',
    ROUND(INDEX_LENGTH / 1024 / 1024, 2) AS 'Index (MB)',
    ROUND(INDEX_LENGTH / DATA_LENGTH * 100, 2) AS 'Index Overhead %'
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
ORDER BY INDEX_LENGTH DESC;

Guidelines:

Transactional systems (OLTP): 3-7 indexes per table
Reporting systems (OLAP): 10-15+ indexes acceptable
More than 15 indexes: Reconsider your design

2. Covering Indexes (Index-Only Scans)

What it is: An index that contains ALL columns your query needs, so MySQL never has to look at the actual table.

-- Normal index (must look up customer_id and total_amount from table)
CREATE INDEX idx_orderdate_payment ON orders(order_date, payment_status);

-- Covering index (everything needed is in the index!)
CREATE INDEX idx_covering
ON orders(order_date, payment_status, customer_id, total_amount);

How to verify: EXPLAIN shows Extra: Using index

Trade-offs:

✅ Can be much faster for read-heavy queries (fewer table lookups)
❌ Larger index, slower writes
Use for: Frequently-run queries with few SELECT columns

3. Index Condition Pushdown (ICP)

What it is: MySQL 5.6+ optimization that filters data at the storage engine level (deeper in the database) instead of at the MySQL server level.

Why it matters: Fewer rows need to be passed up to the server = faster queries.

-- Example with composite index (order_date, payment_status)
EXPLAIN
SELECT * FROM orders
WHERE order_date >= '2024-01-01'
  AND payment_status = 'paid';  -- ICP filters this at storage engine level

-- Look for: Extra: Using index condition

Best for: Composite indexes with multiple WHERE conditions on indexed columns.

4. Histograms for Better Query Plans (MySQL 8.0+)

What it is: Tells MySQL about the distribution of your data. By default, MySQL assumes data is evenly spread out, which is often wrong.

Example Problem: If 90% of your orders are from 2024 but MySQL thinks they’re spread evenly across all years, it will choose a poor query plan.

Solution:

-- Create histogram (tells MySQL about data distribution)
ANALYZE TABLE orders UPDATE HISTOGRAM ON order_date WITH 256 BUCKETS;

-- Remove histogram
ANALYZE TABLE orders DROP HISTOGRAM ON order_date;

When to use:

Skewed data (not evenly distributed)
Date columns where recent data dominates
After bulk imports or major data changes

5. Invisible Indexes (MySQL 8.0+)

What it is: Make an index invisible to the optimizer without dropping it. This is like turning off an index temporarily to test if it’s actually needed.

-- Hide the index (MySQL won't use it)
ALTER TABLE orders ALTER INDEX idx_payment_status INVISIBLE;

-- Monitor performance for a few days
-- If no slowdown, it's safe to drop

-- Delete the index
DROP INDEX idx_payment_status ON orders;

-- Or make it visible again
ALTER TABLE orders ALTER INDEX idx_payment_status VISIBLE;

Use for: Testing if an index is actually helping before permanently removing it.

6. Index Statistics and Stale Cardinality

What it is: MySQL keeps statistics about your data to make good decisions. If these statistics get outdated, queries slow down.

-- Check how old your statistics are
SELECT
    TABLE_NAME,
    UPDATE_TIME,
    TIMESTAMPDIFF(DAY, UPDATE_TIME, NOW()) AS days_old
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
ORDER BY UPDATE_TIME;

-- Update statistics
ANALYZE TABLE orders;

-- Enable automatic updates when data changes
ALTER TABLE orders STATS_AUTO_RECALC=1;

Update statistics after:

Bulk data loads or deletes
When queries suddenly get slower
If cardinality looks wrong in SHOW INDEX

7. Index Fragmentation and Maintenance

What it is: Like a filing cabinet that gets messy over time with many updates/deletes, indexes get fragmented (wasted space between data).

-- Rebuild all indexes (use during maintenance window)
OPTIMIZE TABLE orders;

-- Or rebuild specific index without locking table
ALTER TABLE orders
  DROP INDEX idx_name,
  ADD INDEX idx_name(columns),
  ALGORITHM=INPLACE, LOCK=NONE;

Rebuild when:

Queries get slower over time
After deleting many rows (>30% of table)
During scheduled maintenance

8. Online DDL and Index Creation

What it is: Creating indexes without locking the table (users can still read/write during creation).

-- Production-safe way to create indexes (no blocking)
CREATE INDEX idx_new ON orders(column_name)
ALGORITHM=INPLACE, LOCK=NONE;

-- Check progress (MySQL 8.0+)
SELECT
    WORK_COMPLETED,
    WORK_ESTIMATED,
    ROUND(WORK_COMPLETED / WORK_ESTIMATED * 100, 2) AS 'Progress %'
FROM performance_schema.events_stages_current;

Important: Use ALGORITHM=INPLACE, LOCK=NONE when supported. If not supported, plan a controlled maintenance window or use online schema change tooling.

9. Descending Indexes (MySQL 8.0+)

What it is: Indexes can be created in descending order to match your query’s ORDER BY direction.

-- Query: ORDER BY order_date DESC, customer_id ASC
-- Create index matching the sort order
CREATE INDEX idx_order_desc_customer_asc
ON orders(order_date DESC, customer_id ASC);

Use for: Queries with mixed sort directions (some DESC, some ASC) or pagination showing newest items first.

10. Index Merge Optimization

What it is: MySQL can use multiple indexes for one query (usually with OR conditions).

-- Query with OR - MySQL may use both indexes
SELECT * FROM orders
WHERE order_date >= '2024-01-01'
   OR customer_id = 12345;

-- EXPLAIN shows: type: index_merge
-- Extra: Using union(idx_order_date,idx_customer_id)

Note: Usually helpful, but if the same OR query runs frequently, create a single composite index instead.

Production Operations

1. Slow Query Log Analysis

Enable and configure:

-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow-query.log';
SET GLOBAL long_query_time = 1;  -- Log queries > 1 second
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Check current settings
SHOW VARIABLES LIKE 'slow_query%';
SHOW VARIABLES LIKE 'long_query_time';

Analyze with pt-query-digest (Percona Toolkit):

pt-query-digest /var/log/mysql/slow-query.log \
  --limit 95%:20 \
  --output json > slow-query-analysis.json

# Or for human-readable output
pt-query-digest /var/log/mysql/slow-query.log \
  --limit 95%:20 \
  --order-by Query_time:sum

Key metrics to monitor:

Query_time: Total execution time
Lock_time: Time waiting for locks
Rows_examined vs Rows_sent: Efficiency ratio
QPS (Queries Per Second): Load pattern

2. Performance Schema Monitoring

Track index usage in production:

-- Enable Performance Schema (if not already)
-- In my.cnf: performance_schema = ON

-- Find most used indexes
SELECT
    OBJECT_SCHEMA,
    OBJECT_NAME,
    INDEX_NAME,
    COUNT_STAR AS 'Uses',
    COUNT_READ,
    COUNT_WRITE,
    COUNT_FETCH,
    COUNT_INSERT,
    COUNT_UPDATE,
    COUNT_DELETE
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE OBJECT_SCHEMA = DATABASE()
  AND INDEX_NAME IS NOT NULL
ORDER BY COUNT_STAR DESC;

-- Find unused indexes (candidates for removal)
SELECT
    t.TABLE_SCHEMA,
    t.TABLE_NAME,
    s.INDEX_NAME,
    s.CARDINALITY
FROM information_schema.STATISTICS s
JOIN information_schema.TABLES t USING (TABLE_SCHEMA, TABLE_NAME)
LEFT JOIN performance_schema.table_io_waits_summary_by_index_usage i
    ON s.TABLE_SCHEMA = i.OBJECT_SCHEMA
    AND s.TABLE_NAME = i.OBJECT_NAME
    AND s.INDEX_NAME = i.INDEX_NAME
WHERE s.TABLE_SCHEMA = DATABASE()
  AND s.INDEX_NAME != 'PRIMARY'
  AND (i.INDEX_NAME IS NULL OR i.COUNT_STAR = 0)
ORDER BY s.TABLE_NAME, s.INDEX_NAME;

-- Table scan operations (may need indexes)
SELECT
    OBJECT_SCHEMA,
    OBJECT_NAME,
    COUNT_READ,
    COUNT_WRITE,
    COUNT_FETCH,
    SUM_TIMER_WAIT / 1000000000000 AS 'Total Wait (sec)'
FROM performance_schema.table_io_waits_summary_by_table
WHERE OBJECT_SCHEMA = DATABASE()
ORDER BY SUM_TIMER_WAIT DESC;

-- Reset statistics for fresh analysis
TRUNCATE TABLE performance_schema.table_io_waits_summary_by_index_usage;

3. Buffer Pool and Index Cache

InnoDB Buffer Pool tuning:

-- Check buffer pool hit ratio (should be > 99%)
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';

-- Calculate hit ratio
SELECT
    CONCAT(
        ROUND(
            (1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)) * 100,
            2
        ),
        '%'
    ) AS 'Buffer Pool Hit Ratio'
FROM (
    SELECT
        VARIABLE_VALUE AS Innodb_buffer_pool_reads
    FROM performance_schema.global_status
    WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads'
) AS reads,
(
    SELECT
        VARIABLE_VALUE AS Innodb_buffer_pool_read_requests
    FROM performance_schema.global_status
    WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests'
) AS requests;

-- Check buffer pool usage
SELECT
    CONCAT(
        ROUND(
            (SELECT SUM(DATA_LENGTH + INDEX_LENGTH)
             FROM information_schema.TABLES
             WHERE ENGINE = 'InnoDB') / 1024 / 1024 / 1024,
            2
        ),
        ' GB'
    ) AS 'Total InnoDB Data',
    @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS 'Buffer Pool (GB)';

-- View what's in the buffer pool
SELECT
    TABLE_NAME,
    INDEX_NAME,
    COUNT(*) AS 'Pages in Buffer',
    ROUND(SUM(IF(IS_OLD = 'YES', 1, 0)) / COUNT(*) * 100, 2) AS 'Old Pages %'
FROM information_schema.INNODB_BUFFER_PAGE
WHERE TABLE_NAME IS NOT NULL
GROUP BY TABLE_NAME, INDEX_NAME
ORDER BY COUNT(*) DESC
LIMIT 20;

Optimal buffer pool size:

Rule of thumb: 70-80% of available RAM
Monitor swap usage (should be near zero)
Consider multiple buffer pool instances for high concurrency

4. Query Cache Considerations (Deprecated in MySQL 8.0)

For MySQL 5.7 and earlier:

-- Note: Query cache removed in MySQL 8.0
-- Use Redis/Memcached or application-level caching instead

-- Check query cache status (MySQL 5.7)
SHOW VARIABLES LIKE 'query_cache%';
SHOW STATUS LIKE 'Qcache%';

5. Index Creation Checklist for Production

Before creating an index:

Run EXPLAIN on target queries
Check current index count (avoid over-indexing)
Estimate index size: (avg_row_length * cardinality) * 1.2
Verify sufficient disk space (index + temp space)
Check replication lag tolerance
Schedule during maintenance window if table is large
Use ALGORITHM=INPLACE, LOCK=NONE when supported
Monitor buffer pool hit ratio during creation
Plan rollback strategy

After creating an index:

Run ANALYZE TABLE
Verify with EXPLAIN that index is used
Benchmark query performance
Monitor write performance impact
Check replication lag
Update documentation
Set alert if index is not used after 30 days

6. Disaster Recovery Considerations

Backup index definitions:

# Export table structures (includes indexes)
mysqldump -u root -p --no-data --databases your_db > schema_backup.sql

# Export specific table indexes
mysql -u root -p -e "SHOW CREATE TABLE your_table\G" your_db > table_indexes.sql

Document critical indexes:

-- Generate index documentation
SELECT
    TABLE_NAME,
    INDEX_NAME,
    GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX) AS Columns,
    INDEX_TYPE,
    CONCAT(
        'CREATE INDEX ', INDEX_NAME,
        ' ON ', TABLE_NAME,
        ' (', GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX), ')'
    ) AS DDL
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND INDEX_NAME != 'PRIMARY'
GROUP BY TABLE_NAME, INDEX_NAME;

11. Prefix Indexes for Large String Columns

What it is: Instead of indexing an entire long text column, index only the first N characters.

Problem: Indexing long VARCHAR/TEXT columns is expensive.

Solution: Index only the first N characters.

-- Full index (large and slow)
CREATE INDEX idx_email_full ON customers(email);

-- Prefix index (smaller and faster)
CREATE INDEX idx_email_prefix ON customers(email(20));

-- Find optimal prefix length
SELECT
    COUNT(DISTINCT LEFT(email, 5)) AS len_5,
    COUNT(DISTINCT LEFT(email, 10)) AS len_10,
    COUNT(DISTINCT LEFT(email, 15)) AS len_15,
    COUNT(DISTINCT LEFT(email, 20)) AS len_20,
    COUNT(DISTINCT email) AS full_length
FROM customers;

-- Use prefix that achieves ~95% of full cardinality

Limitations:

Cannot be used for ORDER BY or GROUP BY
Less effective for range queries
Prefix must be chosen carefully

12. Functional Indexes (MySQL 8.0.13+)

What it is: Create an index on a calculated expression or function result, not just a column value.

-- Index a function result
CREATE INDEX idx_year_month ON orders((YEAR(order_date)), (MONTH(order_date)));

-- Now this query uses the index
SELECT * FROM orders
WHERE YEAR(order_date) = 2024 AND MONTH(order_date) = 1;

-- Index JSON fields
ALTER TABLE products ADD COLUMN attrs JSON;
CREATE INDEX idx_json_price ON products((CAST(attrs->>'$.price' AS DECIMAL(10,2))));

-- Verify usage
EXPLAIN SELECT * FROM products
WHERE CAST(attrs->>'$.price' AS DECIMAL(10,2)) > 100;

13. Skip Scan Optimization (MySQL 8.0.13+)

What it is: MySQL 8.0.13+ can use a composite index even when you skip the first column in your WHERE clause.

-- Index: (country, city, status)
CREATE INDEX idx_location_status ON customers(country, city, status);

-- Old behavior: Cannot use index (skips first column)
SELECT * FROM customers WHERE city = 'New York' AND status = 'active';

-- MySQL 8.0.13+: Skip scan can use the index!
-- Set optimizer_switch='skip_scan=on'; (default in 8.0.13+)

-- Verify with EXPLAIN
EXPLAIN SELECT * FROM customers WHERE city = 'New York' AND status = 'active';
-- Look for: Extra: Using index for skip scan

When skip scan helps:

First column has low cardinality
Subsequent columns are selective
Query filters on later columns in composite index

14. Multi-Range Read (MRR)

What it is: MySQL sorts row IDs before reading data from disk, reducing random I/O and making disk access more efficient.

-- Enable MRR
SET optimizer_switch='mrr=on,mrr_cost_based=on';

-- MRR is beneficial for range queries
SELECT * FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31'
ORDER BY order_date;

-- EXPLAIN shows: Extra: Using MRR

-- Benefits:
-- - Reduces random I/O
-- - Better disk access patterns
-- - More efficient buffer pool usage

15. Index Hints and Optimizer Directives

What it is: Override MySQL’s optimizer decisions when it chooses the wrong index or query plan.

-- Force specific index
SELECT * FROM orders FORCE INDEX (idx_orderdate_payment_customer)
WHERE order_date >= '2024-01-01';

-- Ignore specific index
SELECT * FROM orders IGNORE INDEX (idx_customer_id)
WHERE customer_id > 1000;

-- Suggest multiple indexes (MySQL chooses best)
SELECT * FROM orders USE INDEX (idx_orderdate_payment_customer, idx_customer_id)
WHERE order_date >= '2024-01-01' AND customer_id > 1000;

-- Modern approach: Optimizer hints (MySQL 8.0+)
SELECT /*+ INDEX(orders idx_orderdate_payment_customer) */
    *
FROM orders
WHERE order_date >= '2024-01-01';

-- Join order hint
SELECT /*+ JOIN_ORDER(o, c, oi) */
    *
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id;

16. Partition Pruning for Massive Tables

What it is: Split huge tables into smaller physical pieces (partitions), so queries only scan relevant partitions instead of the entire table.

-- Partition by date range
CREATE TABLE orders_partitioned (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2)
)
PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p_2022 VALUES LESS THAN (2023),
    PARTITION p_2023 VALUES LESS THAN (2024),
    PARTITION p_2024 VALUES LESS THAN (2025),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Query automatically prunes partitions
EXPLAIN PARTITIONS
SELECT * FROM orders_partitioned
WHERE order_date >= '2024-01-01';

-- Shows: partitions: p_2024,p_future (others pruned!)

-- View partition information
SELECT
    PARTITION_NAME,
    TABLE_ROWS,
    AVG_ROW_LENGTH,
    DATA_LENGTH / 1024 / 1024 AS 'Data (MB)'
FROM information_schema.PARTITIONS
WHERE TABLE_NAME = 'orders_partitioned';

Partitioning benefits:

Queries scan only relevant partitions
Faster bulk deletes (DROP PARTITION)
Better maintenance (OPTIMIZE one partition at a time)
Improved concurrency (partition-level locking)

17. Cost Model Tuning (Use with Caution)

What it is: Adjust MySQL’s internal cost calculations for how “expensive” different operations are. This is advanced and risky - only change if you know what you’re doing!

-- View current cost configuration
SELECT * FROM mysql.server_cost;
SELECT * FROM mysql.engine_cost;

-- Adjust costs (use with extreme caution!)
-- Example: Make disk seeks appear more expensive
UPDATE mysql.server_cost
SET cost_value = 2.0
WHERE cost_name = 'disk_temptable_row_cost';

-- Flush to apply changes
FLUSH OPTIMIZER_COSTS;

-- Reset to defaults
UPDATE mysql.server_cost SET cost_value = DEFAULT;
FLUSH OPTIMIZER_COSTS;

Warning: Only tune after extensive testing. Incorrect values can severely degrade performance.

18. Index Statistics Sampling

What it is: Control how many data pages MySQL samples when calculating index statistics. More samples = more accurate but slower; fewer samples = less accurate but faster.

-- Global setting
SET GLOBAL innodb_stats_persistent_sample_pages = 50;  -- Default: 20

-- Per-table setting
ALTER TABLE orders STATS_SAMPLE_PAGES = 100;

-- More samples = better accuracy but slower ANALYZE
-- Fewer samples = faster ANALYZE but less accurate

-- For large tables with skewed data, increase sampling

Real-World Scenarios

Scenario 1: Dashboard Query Timing Out (30s → 50ms)

Problem:

-- Dashboard query takes 30+ seconds
SELECT
    DATE(order_date) AS day,
    COUNT(*) AS orders,
    SUM(total_amount) AS revenue
FROM orders
WHERE order_date >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY DATE(order_date)
ORDER BY day DESC;

Analysis:

Function on indexed column: DATE(order_date)
Prevents index usage
Full table scan

Solution:

-- 1. Create functional index (MySQL 8.0.13+)
CREATE INDEX idx_order_date_func ON orders((DATE(order_date)));

-- 2. Or rewrite query to avoid function
SELECT
    DATE(order_date) AS day,
    COUNT(*) AS orders,
    SUM(total_amount) AS revenue
FROM orders
WHERE order_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
  AND order_date < DATE_ADD(CURDATE(), INTERVAL 1 DAY)
GROUP BY DATE(order_date)
ORDER BY day DESC;

-- 3. For recurring queries, use materialized view/summary table
CREATE TABLE daily_order_summary (
    day DATE PRIMARY KEY,
    order_count INT,
    total_revenue DECIMAL(12,2),
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

-- Update via scheduled job or trigger
INSERT INTO daily_order_summary (day, order_count, total_revenue)
SELECT DATE(order_date), COUNT(*), SUM(total_amount)
FROM orders
WHERE DATE(order_date) = CURDATE()
GROUP BY DATE(order_date)
ON DUPLICATE KEY UPDATE
    order_count = VALUES(order_count),
    total_revenue = VALUES(total_revenue);

Result: 30s → 50ms

Scenario 2: High Write Latency After Adding Indexes

Problem:

INSERT/UPDATE operations suddenly slow after adding 5 new indexes
Write latency increased from 5ms to 50ms

Analysis:

-- Check index count
SELECT
    TABLE_NAME,
    COUNT(*) AS index_count,
    SUM(CARDINALITY) AS total_cardinality
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'orders'
GROUP BY TABLE_NAME;

-- Result: 12 indexes (too many!)

Solution:

-- 1. Identify unused indexes
SELECT
    s.INDEX_NAME,
    s.CARDINALITY,
    IFNULL(i.COUNT_STAR, 0) AS uses
FROM information_schema.STATISTICS s
LEFT JOIN performance_schema.table_io_waits_summary_by_index_usage i
    ON s.TABLE_SCHEMA = i.OBJECT_SCHEMA
    AND s.TABLE_NAME = i.OBJECT_NAME
    AND s.INDEX_NAME = i.INDEX_NAME
WHERE s.TABLE_SCHEMA = DATABASE()
  AND s.TABLE_NAME = 'orders'
  AND s.INDEX_NAME != 'PRIMARY'
GROUP BY s.INDEX_NAME
HAVING uses = 0;

-- 2. Make candidates invisible first (test impact)
ALTER TABLE orders ALTER INDEX idx_rarely_used INVISIBLE;

-- 3. Monitor for 7 days, then drop if no issues
DROP INDEX idx_rarely_used ON orders;

-- 4. Consolidate overlapping indexes
-- Instead of: idx_customer(customer_id) and idx_customer_date(customer_id, order_date)
-- Keep only: idx_customer_date (covers both use cases via leftmost prefix)

Result: Write latency back to 8ms (acceptable trade-off)

Scenario 3: Queries Getting Slower Over Time

Problem:

Queries that were fast 6 months ago now take 5x longer
No schema changes, data volume increased 3x

Root Cause: Stale statistics

Solution:

-- 1. Check statistics age
SELECT
    TABLE_NAME,
    UPDATE_TIME,
    TABLE_ROWS,
    AVG_ROW_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
ORDER BY UPDATE_TIME;

-- 2. Update statistics
ANALYZE TABLE orders, customers, order_items;

-- 3. Enable automatic statistics
ALTER TABLE orders STATS_AUTO_RECALC = 1;
ALTER TABLE orders STATS_SAMPLE_PAGES = 50;

-- 4. Create histograms for skewed columns
ANALYZE TABLE orders UPDATE HISTOGRAM ON order_date, payment_status WITH 256 BUCKETS;

-- 5. Schedule regular statistics updates
-- Add to cron: mysqlanalyze --auto-repair --optimize --all-databases

Result: Query time back to baseline, automated maintenance prevents regression

Scenario 4: Pagination Query Too Slow

Problem:

-- Page 1000 takes 10+ seconds
SELECT * FROM orders
ORDER BY order_date DESC
LIMIT 10000, 10;

Why it’s slow: MySQL must scan 10,010 rows to skip 10,000

Solution:

-- 1. Keyset pagination (best for large offsets)
SELECT * FROM orders
WHERE order_date < '2024-01-15 10:30:00'  -- Last value from previous page
ORDER BY order_date DESC
LIMIT 10;

-- 2. Deferred join (covering index + late row lookup)
SELECT o.*
FROM orders o
INNER JOIN (
    SELECT order_id
    FROM orders
    ORDER BY order_date DESC
    LIMIT 10000, 10
) AS page USING (order_id);

-- 3. Add limit on date range
SELECT * FROM orders
WHERE order_date >= '2024-01-01'  -- Reasonable range
ORDER BY order_date DESC
LIMIT 10000, 10;

Result: 10s → 100ms

Operational Checklist

These checklists help maintain optimal database performance over time.

Daily Tasks

Review slow query log (queries > 1 second)
Check replication lag
Monitor buffer pool hit ratio (should be > 99%)
Review error log for index-related warnings
Check Performance Schema for table scans

Weekly Tasks

Analyze top 20 slowest queries
Review unused indexes (make invisible if unused 30+ days)
Check index fragmentation on heavily-modified tables
Monitor write latency trends
Review query plan changes (optimizer regression)

Monthly Maintenance

Run ANALYZE TABLE on large/changing tables
Update histograms on skewed columns
Review index size growth (index:data ratio)
Evaluate new indexes from development
Check for redundant indexes
Document index strategy changes

Quarterly Review

Capacity planning: Index growth projections
Review buffer pool size (adjust if needed)
Evaluate partition strategy for largest tables
Benchmark critical queries against historical baseline
Update runbooks with new optimization patterns
Review and update monitoring alerts

Before Production Deployment

EXPLAIN all modified queries
Verify index usage with EXPLAIN ANALYZE
Test with production-scale data
Measure write performance impact
Use ALGORITHM=INPLACE, LOCK=NONE for DDL (when supported)
Plan rollback strategy
Update documentation

Emergency Response: Query Performance Degradation

Immediate: Check currently running queries

SELECT * FROM sys.session WHERE command = 'Query' AND time > 10;

Quick wins: Update statistics
```
ANALYZE TABLE affected_tables;
```

Force good plan: Use index hints temporarily

SELECT /*+ INDEX(table idx_name) */ ...

Root cause: Compare EXPLAIN output with baseline
- Check cardinality drift
- Look for missing statistics
- Review recent schema changes
Long-term fix: Create missing indexes, update histograms

Common Index Design Patterns

Pattern 1: Time-Series Data (Events, Logs)

-- For queries like: "Show me recent active events for user X"
CREATE INDEX idx_timeseries ON events(
    created_at DESC,     -- Recent first (DESC)
    event_type,          -- Filter column
    user_id              -- Join/filter column
);

Pattern 2: Multi-Tenant SaaS Applications

-- For queries filtering by tenant (customer isolation)
CREATE INDEX idx_tenant_composite ON records(
    tenant_id,           -- ALWAYS first column for data isolation
    created_at DESC,     -- Time ordering
    status               -- Status filter
);

-- Cover common filter combinations
CREATE INDEX idx_search_filters ON products(
    category_id,         -- Primary filter
    price,               -- Range filter
    stock_quantity,      -- Availability filter
    brand_id             -- Secondary filter
);

Pattern 4: Reporting/Dashboard Queries

-- Covering index for fast aggregations (SUM, COUNT, AVG)
CREATE INDEX idx_reporting_cover ON orders(
    order_date,          -- Time range
    status,              -- Filter
    total_amount,        -- What you're aggregating
    customer_id          -- GROUP BY column
);

Pattern 5: Soft Deletes (Logical Deletion)

-- Efficiently query only active (non-deleted) records
CREATE INDEX idx_active_records ON users(
    deleted_at,          -- NULL = active
    status,
    created_at
);
-- Note: MySQL doesn't support partial indexes, but this pattern still helps

Conclusion

Effective MySQL query optimization is built on these foundations.

Core Principles

#	Principle	Action
1	Measure first	Use `EXPLAIN` and slow query log before optimizing
2	High cardinality first	Order composite indexes by uniqueness (most unique → least unique)
3	Balance reads vs writes	Indexes can speed reads but add write overhead; measure impact
4	Keep statistics current	Run `ANALYZE TABLE` after significant data changes
5	Monitor in production	Use Performance Schema to track actual index usage

Quick Reference Guide

Category	Recommendation
Good indexes	Selectivity >10%, used in WHERE/JOIN/ORDER BY
Bad indexes	Cardinality <10% alone, rarely used, redundant
Production DDL	Use `ALGORITHM=INPLACE, LOCK=NONE` when supported
Buffer pool	Target >99% hit ratio
Index count	OLTP: 3-7 per table, OLAP: 10-15 per table
Statistics	Update weekly or after 10%+ data change

Final Thought

“Premature optimization is the root of all evil.” — Donald Knuth

Always profile with real data, test thoroughly, and monitor after deployment. The best index strategy:

Serves actual query patterns (not theoretical ones)
Has measurable impact on user experience
Balances read performance with write costs
Is maintainable long-term

Additional Resources

Official Documentation

Resource	URL
MySQL Optimization Guide	https://dev.mysql.com/doc/refman/8.0/en/optimization.html
Understanding EXPLAIN	https://dev.mysql.com/doc/refman/8.0/en/explain-output.html
Index Statistics	https://dev.mysql.com/doc/refman/8.0/en/index-statistics.html
Online DDL Operations	https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl.html
Optimizer Hints	https://dev.mysql.com/doc/refman/8.0/en/optimizer-hints.html

Essential Tools

Tool	Purpose
pt-query-digest	Analyze slow query logs (Percona Toolkit)
pt-online-schema-change	Zero-downtime schema changes for large tables
MySQLTuner	Automated configuration recommendations
Percona Monitoring (PMM)	Comprehensive database monitoring
sys schema	Built-in performance analysis views (MySQL 8.0+)

Recommended Books

“High Performance MySQL” by Baron Schwartz et al. (O’Reilly) The definitive guide to MySQL optimization
“MySQL 8 Query Performance Tuning” by Jesper Wisborg Krogh (Apress) Detailed query optimization techniques for MySQL 8.0+
“Effective MySQL: Optimizing SQL Statements” by Ronald Bradford Practical SQL optimization patterns

Community Resources

Resource	URL
MySQL Server Blog	https://mysqlserverteam.com/
Planet MySQL	https://planet.mysql.com/
Percona Blog	https://www.percona.com/blog/

Document Information


Version	2.0 (Accessible Edition)
Last Updated	January 2026
MySQL Version	8.0+ (tested on 8.0.41)
Target Audience	Database Administrators, Developers, DevOps Engineers, Performance Engineers
License	Educational Use

This guide is based on production experience and real-world testing. All examples have been validated against MySQL 8.0.41 with actual data (10K customers, 50K orders, 100K order items).

nipunap

MySQL Query Optimization Guide

Cardinality and Index Usage

Table of Contents

Introduction

Key Concepts & Glossary

Core Terms

EXPLAIN Output Terms

EXPLAIN Extra Field Values

Performance & System Terms

Understanding Cardinality

What is Cardinality?

Why Cardinality Matters

Checking Cardinality

Understanding Selectivity

Index Fundamentals

Types of Indexes

Index Effectiveness by Cardinality

Analyzing Query Execution Plans

Using EXPLAIN

Key EXPLAIN Output Columns

EXPLAIN Access Types (Best → Worst)

Practical Example: Optimizing a Join Query

Sample Schema

Initial Query (Unoptimized)

Step 1: Analyze Current Performance

Step 2: Check Cardinality Statistics

Step 3: Create Composite Indexes

Step 4: Verify Improved Query Plan

Step 5: Alternative Query Optimization

Step 6: Benchmark and Compare

Best Practices

1. Index Design Principles

✅ DO These Things

❌ DON’T Do These Things

2. Composite Index Column Order

Leftmost Prefix Rule

3. Monitoring Index Usage

4. Query Optimization Checklist

Common Pitfalls

1. Function on Indexed Column

2. Implicit Type Conversion

3. OR Conditions on Different Columns

4. LIKE with Leading Wildcard

5. Index Hints (Use When Optimizer is Wrong)

Advanced Techniques

Using EXPLAIN ANALYZE (MySQL 8.0.18+)

Optimizer Trace

Advanced Topics

1. Index Overhead and Write Performance Impact

2. Covering Indexes (Index-Only Scans)

3. Index Condition Pushdown (ICP)

4. Histograms for Better Query Plans (MySQL 8.0+)

5. Invisible Indexes (MySQL 8.0+)

6. Index Statistics and Stale Cardinality

7. Index Fragmentation and Maintenance

8. Online DDL and Index Creation

9. Descending Indexes (MySQL 8.0+)

10. Index Merge Optimization

Production Operations

1. Slow Query Log Analysis

2. Performance Schema Monitoring

3. Buffer Pool and Index Cache

4. Query Cache Considerations (Deprecated in MySQL 8.0)

5. Index Creation Checklist for Production

6. Disaster Recovery Considerations

11. Prefix Indexes for Large String Columns

12. Functional Indexes (MySQL 8.0.13+)

13. Skip Scan Optimization (MySQL 8.0.13+)

14. Multi-Range Read (MRR)

15. Index Hints and Optimizer Directives

16. Partition Pruning for Massive Tables

17. Cost Model Tuning (Use with Caution)

18. Index Statistics Sampling

Real-World Scenarios

Scenario 1: Dashboard Query Timing Out (30s → 50ms)

Scenario 2: High Write Latency After Adding Indexes

Scenario 3: Queries Getting Slower Over Time

Scenario 4: Pagination Query Too Slow

Operational Checklist