CONCEPT
Elasticsearch & OpenSearch: Comprehensive Concepts Guide
Purpose: Deep technical reference for understanding Elasticsearch and OpenSearch architectures, best practices, and operational patterns.
Table of Contents
- Introduction
- Core Architecture
- Elasticsearch vs OpenSearch
- Cluster Design
- Indexing & Data Organization
- Querying & Search
- Performance Optimization
- Cluster Management
- High Availability & Disaster Recovery
- Security & Compliance
- Monitoring & Observability
- Advanced Features
- Common Use Cases
- Troubleshooting
- Enterprise Patterns
1. Introduction
What is Elasticsearch?
Elasticsearch is a distributed, open-source search and analytics engine built on top of Apache Lucene. It provides:
- Full-text search: Powerful, fast search across large datasets
- Real-time analytics: Process and analyze data as it arrives
- Scalability: Horizontal scaling to petabytes of data
- Availability: High availability through distribution and replication
- Schema flexibility: Dynamic mapping of document structures
What is OpenSearch?
OpenSearch is a community-driven fork of Elasticsearch (created after Elastic's license change in 2021). Key characteristics:
- Open-source: Licensed under the Apache License 2.0 (no proprietary restrictions)
- Compatible: Mostly compatible with Elasticsearch APIs
- AWS-managed: Native integration with AWS OpenSearch Service
- Community-driven: Development driven by community contributions
Key Differences at a Glance
| Aspect | Elasticsearch | OpenSearch |
|---|---|---|
| License | Elastic License 2.0 / SSPL (source-available, 8.0+) | Apache License 2.0 (fully open) |
| Cost | Free basic tier; paid subscriptions for advanced features | Free to use |
| Support | Elastic (commercial) | AWS or community |
| API Compatibility | - | ~95% compatible with ES 7.10 |
| Best for | Enterprise (integrated tooling) | Cost-conscious or AWS users |
2. Core Architecture
2.1 Node Types
Elasticsearch/OpenSearch clusters consist of specialized node types:
Master-Eligible Nodes
- Manage cluster state and coordination
- Handle node membership, shard allocation
- Minimum 3 for production (quorum-based)
- Low compute/memory requirements
- Example config:
node.roles: [ master ]
Data Nodes
- Store actual index data
- Execute search and aggregation queries
- High compute and memory requirements
- Horizontally scalable
- Example config:
node.roles: [ data ]
Ingest Nodes
- Pre-process documents before indexing
- Execute ingest pipelines
- Optional (can be enabled on data nodes)
- Example config:
node.roles: [ ingest ]
Coordinating Nodes
- Route requests to appropriate nodes
- No shard allocation
- Used for load balancing large requests
- Example config:
node.roles: [] # No roles = coordinating only
2.2 Data Organization
Cluster: Collection of nodes working together
Cluster "production"
├── Node 1 (master, data)
├── Node 2 (data)
├── Node 3 (data)
└── Node 4 (ingest)
Index: Collection of documents with common characteristics
Index "logs-2024.01"
├── Shard 0 (Primary)
│ └── Replica 0
├── Shard 1 (Primary)
│ └── Replica 0
└── Shard 2 (Primary)
└── Replica 0
Shard: Unit of data distribution and parallelization
- Primary shards: Original data
- Replica shards: Copies for high availability
- Example: 3 primary shards × 1 replica each = 6 total shards per index (see the sketch below)
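As a minimal sketch of that layout (index name follows the diagram above; settings are illustrative), an index with three primaries and one replica each:
curl -X PUT "localhost:9200/logs-2024.01" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}'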
Document: JSON object being indexed
{
"_id": "1",
"_index": "logs-2024.01",
"_type": "_doc",
"@timestamp": "2024-01-31T10:30:00Z",
"message": "Application error occurred",
"level": "ERROR",
"service": "api-server"
}
2.3 Indexing Pipeline
Document → Ingest Pipeline → Analyzer → Lucene Index → Shards
              ↓                 ↓            ↓             ↓
          Enrichment       Tokenization   Indexing    Distribution
          (optional)       (stemming)     (scoring)   (replication)
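A minimal ingest-pipeline sketch (the pipeline name and field values are illustrative); the set processor enriches each document before it reaches the analyzer:
# Register the pipeline
curl -X PUT "localhost:9200/_ingest/pipeline/logs-enrich" -H 'Content-Type: application/json' -d'
{
  "description": "Tag documents with their environment before indexing",
  "processors": [
    { "set": { "field": "env", "value": "production" } }
  ]
}'
# Reference it at index time
curl -X POST "localhost:9200/logs-2024.01/_doc?pipeline=logs-enrich" -H 'Content-Type: application/json' -d'
{ "@timestamp": "2024-01-31T10:30:00Z", "level": "INFO", "message": "Started" }'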
3. Elasticsearch vs OpenSearch
3.1 Feature Comparison
| Feature | Elasticsearch 8.x | OpenSearch 2.x |
|---|---|---|
| Full-text search | ✅ | ✅ |
| Analytics | ✅ (Kibana) | ✅ (OpenSearch Dashboards) |
| Alerting | ✅ (Commercial) | ✅ (Built-in) |
| Machine Learning | ✅ (Commercial) | ✅ (Plugins) |
| Security (SAML, LDAP) | ✅ (Commercial) | ✅ (Built-in) |
| Custom plugins | Limited | More flexibility |
| AWS integration | Basic | Native |
3.2 Migration Path
Elasticsearch → OpenSearch:
ES 7.10 (stable) → OpenSearch 1.x (compatible) → OpenSearch 2.x (enhanced)
Compatibility levels:
- ES 7.10 → OpenSearch 1.x: ~99% compatible
- OpenSearch 1.x → OpenSearch 2.x: Full compatibility
- OpenSearch 2.x → ES 8.x: ~80% compatible (API differences)
4. Cluster Design
4.1 Cluster Topology Patterns
Small Cluster (Development/Testing - 3 nodes)
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Master │ │ Master │ │ Master │
│ Data │ │ Data │ │ Data │
│ Ingest │ │ Ingest │ │ Ingest │
└─────────┘ └─────────┘ └─────────┘
Medium Cluster (Production - 9+ nodes)
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Master │ │ Master │ │ Master │ │ Ingest │ │ Ingest │
│ Voting │ │ Voting │ │ Voting │ │ Pipeline │ │ Pipeline │
└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘
↓ ↓ ↓ ↓ ↓
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│   Data   │ │   Data   │ │   Data   │ │Coordinate│ │Coordinate│
│   Warm   │ │   Warm   │ │   Warm   │ │   Node   │ │   Node   │
└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘
Large Enterprise Cluster (Multiple zones)
Zone 1 Zone 2 Zone 3
┌────────┐ ┌────────┐ ┌────────┐
│Master │ │Master │ │Master │
│Data │ │Data │ │Data │
│Hot │ │Warm │ │Cold │
└────────┘ └────────┘ └────────┘
↓ ↓ ↓
(Recent) (Historical) (Archive)
4.2 Shard Planning
Key formula:
Total Shards = (Data Size / Target Shard Size) × Replication Factor
Example:
- Data size: 1TB
- Target shard size: 50GB (optimal: 20-50GB)
- Replication factor: 2 (1 primary + 1 replica)
- Primary shards: 1000GB / 50GB = 20
- Total shards: 20 × 2 = 40 shards
Shard sizing guidelines:
| Data Size | Recommended Shards | Shard Size |
|---|---|---|
| < 10GB | 1-2 | 5-10GB |
| 10-100GB | 3-5 | 20-30GB |
| 100GB-1TB | 5-15 | 30-50GB |
| 1TB-10TB | 15-30 | 40-50GB |
| > 10TB | 30+ | 40-50GB |
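To apply a shard plan consistently across new indices, the counts can be baked into a composable index template. A minimal sketch (template name and pattern are illustrative) matching the 1TB example above:
curl -X PUT "localhost:9200/_index_template/logs-template" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 20,
      "number_of_replicas": 1
    }
  }
}'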
5. Indexing & Data Organization
5.1 Index Lifecycle Management (ILM)
Automated policy for managing index lifecycle through 4 phases:
Hot Phase (Active indexing)
{
"hot": {
"min_age": "0d",
"actions": {
"rollover": {
"max_primary_shard_size": "50GB",
"max_age": "1d"
}
}
}
}
Warm Phase (Indexed but not frequently searched)
{
"warm": {
"min_age": "7d",
"actions": {
"set_priority": {
"priority": 25
},
"forcemerge": {
"max_num_segments": 1
}
}
}
}
Cold Phase (Occasional searches, lower performance acceptable; the searchable_snapshot action requires a registered snapshot repository, see Section 9.2 — the name below is illustrative)
{
"cold": {
"min_age": "30d",
"actions": {
"searchable_snapshot": {
"snapshot_repository": "my-repo"
}
}
}
}
Delete Phase (Remove old data)
{
"delete": {
"min_age": "90d",
"actions": {
"delete": {}
}
}
}
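The phase snippets above combine into a single policy document. A minimal sketch against the Elasticsearch ILM endpoint (OpenSearch exposes the equivalent feature as ISM under _plugins/_ism); the cold phase is omitted here because searchable snapshots need a registered repository:
curl -X PUT "localhost:9200/_ilm/policy/logs_policy" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0d",
        "actions": { "rollover": { "max_primary_shard_size": "50GB", "max_age": "1d" } }
      },
      "warm": {
        "min_age": "7d",
        "actions": { "set_priority": { "priority": 25 }, "forcemerge": { "max_num_segments": 1 } }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}'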
5.2 Index Mapping
Defines structure of documents within an index:
{
"mappings": {
"properties": {
"@timestamp": {
"type": "date",
"format": "epoch_millis"
},
"message": {
"type": "text",
"analyzer": "standard"
},
"level": {
"type": "keyword"
},
"service": {
"type": "keyword"
},
"response_time_ms": {
"type": "integer"
},
"tags": {
"type": "keyword"
}
}
}
}
Field types:
- text: Full-text searchable (analyzed)
- keyword: Exact matching (not analyzed)
- date: Date/time values
- integer, long, float, double: Numeric
- boolean: True/false
- object: Nested JSON
- nested: Array of objects
5.3 Data Ingestion Patterns
Bulk Indexing (Highest throughput)
curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "logs", "_id" : "1" } }
{ "timestamp": "2024-01-31T10:30:00Z", "level": "INFO", "message": "Started" }
{ "index" : { "_index" : "logs", "_id" : "2" } }
{ "timestamp": "2024-01-31T10:30:01Z", "level": "ERROR", "message": "Failed" }
'
Beats (Lightweight shippers)
- Filebeat: Logs and files
- Metricbeat: System metrics
- Heartbeat: Uptime monitoring
- Packetbeat: Network traffic
Logstash (Heavy processing)
- Complex transformations
- Multi-source aggregation
- Conditional routing
Kafka Integration (High-volume events)
Kafka → Logstash → Elasticsearch/OpenSearch
6. Querying & Search
6.1 Query DSL (Domain Specific Language)
Match Query (Full-text search)
{
"query": {
"match": {
"message": "application error"
}
}
}
Term Query (Exact match)
{
"query": {
"term": {
"level": "ERROR"
}
}
}
Bool Query (Complex combinations)
{
"query": {
"bool": {
"must": [
{ "match": { "message": "error" } }
],
"filter": [
{ "term": { "level": "ERROR" } },
{ "range": { "@timestamp": { "gte": "now-1h" } } }
],
"should": [
{ "term": { "service": "api-server" } }
],
"minimum_should_match": 1
}
}
}
Range Query (Numeric/date ranges)
{
"query": {
"range": {
"response_time_ms": {
"gte": 100,
"lte": 1000
}
}
}
}
6.2 Aggregations (Analytics)
Terms Aggregation (Count by value)
{
"aggs": {
"errors_by_service": {
"terms": {
"field": "service",
"size": 10
}
}
}
}
Date Histogram (Time-series data)
{
"aggs": {
"errors_over_time": {
"date_histogram": {
"field": "@timestamp",
"calendar_interval": "1h"
}
}
}
}
Metric Aggregations (Calculate values)
{
"aggs": {
"avg_response_time": {
"avg": {
"field": "response_time_ms"
}
},
"p99_response_time": {
"percentiles": {
"field": "response_time_ms",
"percents": [99]
}
}
}
}
6.3 Search Performance Tips
- Index optimization: Keep shards in the 20-50GB range for faster queries
- Caching: Filter-context clauses are cached; reuse the same filters where possible
- Filter before aggregating: Reduces the data each aggregation must process
- Date range queries: Always filter by time range when querying logs
- Avoid leading wildcards: Prefer term or match queries
- Use forcemerge on old indices: Merge segments on read-only indices to speed up searches (see the sketch below)
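A forcemerge sketch for that last tip (index name is illustrative; run it only on indices that no longer receive writes, since merging is I/O-heavy):
curl -X POST "localhost:9200/logs-2024.01/_forcemerge?max_num_segments=1"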
7. Performance Optimization
7.1 Tuning Parameters
Heap Size (Most important)
# Set Xms and Xmx to the same value (avoids heap resizing)
# Never exceed 50% of available RAM
# Stay below ~32GB so the JVM keeps compressed object pointers
-Xms31g
-Xmx31g
Thread Pools (query and write execution; sizes below assume 8 allocated processors)
thread_pool:
  search:
    size: 13        # default: (allocated_processors × 3 / 2) + 1
    queue_size: 1000
  write:            # replaces the separate index/bulk pools of older versions
    size: 8         # default: number of allocated processors
    queue_size: 10000
Network Optimization
# /etc/sysctl.conf: increase TCP buffer limits (tcp_* values are min, default, max)
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
7.2 Storage & I/O Optimization
Refresh Interval (how often newly indexed documents become searchable; default 1s)
{
"settings": {
"index.refresh_interval": "30s"
}
}
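The refresh interval is a dynamic setting, so it can also be changed on a live index; a minimal sketch (index name is illustrative):
curl -X PUT "localhost:9200/logs-2024.01/_settings" -H 'Content-Type: application/json' -d'
{ "index": { "refresh_interval": "30s" } }'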
Merge Policy (Combine segments)
{
"settings": {
"index.merge.policy.segments_per_tier": 10
}
}
Compression (Reduce storage)
index.codec: best_compression  # default codec uses LZ4; best_compression trades CPU for smaller indices
8. Cluster Management
8.1 Node Discovery & Bootstrap
Seed Hosts (Initial discovery)
discovery.seed_hosts:
- es-node-1:9300
- es-node-2:9300
- es-node-3:9300
Initial Master Nodes (Bootstrap cluster)
cluster.initial_master_nodes:
- es-node-1
- es-node-2
- es-node-3
8.2 Shard Allocation
Allocation awareness (Distribute across zones)
node.attr.zone: us-east-1a
cluster.routing.allocation.awareness.attributes: zone
Allocation filtering (Restrict shard placement)
{
"index.routing.allocation.require._name": "data-node-*"
}
8.3 Cluster Health Monitoring
# Check cluster health
curl -s localhost:9200/_cluster/health | jq .
# Response (status: green = healthy, yellow = unassigned replicas, red = unassigned primaries):
{
"cluster_name": "elasticsearch",
"status": "green",
"timed_out": false,
"number_of_nodes": 5,
"number_of_data_nodes": 3,
"active_primary_shards": 100,
"active_shards": 200,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 100.0
}
9. High Availability & Disaster Recovery
9.1 Replication Strategy
Primary & Replica Distribution:
Node 1: Shard-0P, Shard-1R
Node 2: Shard-1P, Shard-2R
Node 3: Shard-2P, Shard-0R
Replication Best Practices:
- Minimum 2 replicas for critical data
- 1 replica for non-critical data
- Cross-zone replication for site failover (replica counts can be adjusted at runtime; see the sketch below)
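Replica count is a dynamic setting, so it can be raised without reindexing; a minimal sketch (index name is illustrative), with _cat/shards to verify placement afterwards:
curl -X PUT "localhost:9200/logs-2024.01/_settings" -H 'Content-Type: application/json' -d'
{ "index": { "number_of_replicas": 2 } }'
# Verify where primaries and replicas landed
curl -s "localhost:9200/_cat/shards/logs-2024.01?v"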
9.2 Snapshot & Restore
Create Repository (S3 example)
curl -X PUT "localhost:9200/_snapshot/my-repo" -H 'Content-Type: application/json' -d'
{
"type": "s3",
"settings": {
"bucket": "my-elasticsearch-backups",
"region": "us-east-1",
"base_path": "backups"
}
}'
Create Snapshot
curl -X PUT "localhost:9200/_snapshot/my-repo/snapshot-2024-01-31"
# List snapshots
curl "localhost:9200/_snapshot/my-repo/_all"
# Restore snapshot
curl -X POST "localhost:9200/_snapshot/my-repo/snapshot-2024-01-31/_restore"
9.3 Cross-Cluster Replication (CCR)
For disaster recovery with geographically distributed clusters:
cluster.remote:
  leader:
    seeds:
      - remote-es-node:9300
    skip_unavailable: false
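Once the remote cluster is registered, individual indices can be followed. This sketch uses the Elasticsearch CCR follow API (a licensed feature; index names are illustrative); OpenSearch ships replication as a plugin with its own endpoints under _plugins/_replication:
curl -X PUT "localhost:9200/logs-follower/_ccr/follow" -H 'Content-Type: application/json' -d'
{
  "remote_cluster": "leader",
  "leader_index": "logs-2024.01"
}'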
10. Security & Compliance
10.1 Authentication & Authorization
LDAP Integration (OpenSearch/ES with security plugin)
# opensearch.yml: admin certificate DNs
plugins.security.authcz.admin_dn:
  - "cn=admin,dc=example,dc=com"

# config.yml of the security plugin: LDAP authentication domain
authc:
  ldap_auth_domain:
    description: "LDAP authenticator"
    http_enabled: true
    order: 1
    http_authenticator:
      type: basic
      challenge: false
    authentication_backend:
      type: ldap
      config:
        hosts:
          - ldap.example.com:389
        bind_dn: "cn=admin,dc=example,dc=com"
        password: "password"
        userbase: "ou=users,dc=example,dc=com"
        usersearch: "(uid={0})"
10.2 Encryption
Transport Layer (Node-to-node)
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: "certs/elastic-stack-ca.p12"
xpack.security.transport.ssl.keystore.password: "password"
REST Layer (Client-to-node)
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: "certs/elastic-stack-ca.p12"
xpack.security.http.ssl.keystore.password: "password"
10.3 Audit Logging
xpack.security.audit.enabled: true
xpack.security.audit.outputs: [logfile]
xpack.security.audit.logfile.events.include:
- access_denied
- access_granted
- authentication_failed
- authentication_success
- connection_denied
- tampered_request
11. Monitoring & Observability
11.1 Key Metrics to Monitor
| Metric | Threshold | Impact |
|---|---|---|
| Cluster Health | Should be green | Red = data loss risk |
| Heap Usage | < 85% | > 90% causes GC pauses |
| JVM GC Time | < 1% of total | > 2% indicates memory pressure |
| Disk Usage | < 85% | > 90% prevents shard assignment |
| Index Size | < 50GB per shard | Larger = slower queries |
| Query Latency | < 100ms p99 | > 500ms = poor UX |
| Indexing Rate | Monitor trend | Spikes indicate issues |
| Unassigned Shards | 0 | Indicates allocation problems |
11.2 Monitoring Tools
Built-in Monitoring (Elasticsearch/OpenSearch)
GET /_cluster/stats
GET /_nodes/stats
GET /_cat/indices
GET /_cat/nodes
External Monitoring
- Prometheus (e.g. via the community elasticsearch_exporter)
- ELK Stack (Elasticsearch + Logstash + Kibana)
- Splunk integration
- New Relic integration
- Datadog integration
12. Advanced Features
12.1 Machine Learning (Elasticsearch - Commercial)
- Anomaly detection
- Forecasting
- Outlier detection
- Advanced visualizations
12.2 Alerting
The example below uses Elasticsearch Watcher syntax; OpenSearch ships built-in alerting with a similar trigger/condition/action model.
{
"trigger": {
"schedule": {
"interval": "5m"
}
},
"input": {
"search": {
"indices": ["logs"],
"body": {
"query": {
"bool": {
"filter": {
"term": {
"level": "ERROR"
}
}
}
}
}
}
},
"condition": {
"script": {
"source": "ctx.payload.hits.total > 10"
}
},
"actions": {
"send_email": {
"email": {
"to": "ops@example.com",
"subject": "High error rate detected"
}
}
}
}
12.3 Canvas & Custom Dashboards
Create pixel-perfect dashboards with:
- Custom visualizations
- Real-time data
- Workpads (similar to PowerPoint slides)
- Shareable reports
13. Common Use Cases
13.1 Application Logging
Pattern: Filebeat → Elasticsearch → Kibana
Applications
↓
Logs (JSON)
↓
Filebeat
↓
Elasticsearch
↓
Kibana (visualization)
13.2 Infrastructure Monitoring
Pattern: Metricbeat → Elasticsearch
Servers/K8s
↓
System Metrics
↓
Metricbeat
↓
Elasticsearch (time-series data)
↓
Dashboards & Alerting
13.3 Security Information & Event Management (SIEM)
Pattern: Multi-source → Logstash → Elasticsearch
Firewalls, IDS, WAF, Endpoints
↓
Raw Events
↓
Logstash (parse, enrich, correlate)
↓
Elasticsearch/OpenSearch
↓
Threat Detection & Response
13.4 Full-Text Search
E-commerce example:
{
"query": {
"multi_match": {
"query": "wireless headphones",
"fields": ["title^2", "description", "tags"]
}
}
}
14. Troubleshooting
14.1 Common Issues
Red Cluster Status
- Cause: Missing primary shards
- Solution: Check node status, restore from backup, or force allocate
High Heap Usage
- Cause: Large queries, memory leaks
- Solution: Increase heap, optimize queries, add nodes
Slow Queries
- Cause: Oversized shards, queries spanning too many indices, complex aggregations
- Solution: Split oversized shards, narrow the index pattern and time range, simplify aggregations
Unassigned Shards
- Cause: Not enough nodes, allocation filters, disk space
- Solution: Add nodes, adjust filters, free disk space
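For unassigned shards in particular, the allocation explain API reports exactly why a shard cannot be placed; a minimal sketch (index name and shard number are illustrative):
curl -s "localhost:9200/_cluster/allocation/explain" -H 'Content-Type: application/json' -d'
{ "index": "logs-2024.01", "shard": 0, "primary": true }' | jq .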
14.2 Debugging Commands
# Check node status
curl -s localhost:9200/_nodes | jq '.nodes'
# Check shard allocation
curl -s localhost:9200/_cat/shards
# Check index settings
curl -s localhost:9200/logs/_settings
# Check per-node heap usage
curl -s localhost:9200/_nodes/stats/jvm | jq '.nodes[] | {name: .name, heap_percent: .jvm.mem.heap_used_percent}'
# Enable debug logging (reset by setting the logger back to null when done)
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
"transient": {
"logger.org.elasticsearch": "DEBUG"
}
}'
15. Enterprise Patterns
15.1 Multi-Cluster Architecture
Hub & Spoke:
Central Cluster (Analytics)
↑
├── Regional Cluster 1 → Cross-Cluster Search
├── Regional Cluster 2 → Cross-Cluster Search
└── Regional Cluster 3 → Cross-Cluster Search
Active-Active Replication:
Cluster A ←→ Cross-Cluster Replication ←→ Cluster B
(DR) (Bidirectional) (DR)
15.2 Tiered Storage Architecture
Hot-Warm-Cold Pattern:
Day 1-7: Hot nodes (SSD, high performance)
Day 8-30: Warm nodes (HDD, slower, lower cost)
Day 31+: Cold nodes (Archive, very slow, cheapest)
Typical cost reduction with tiered storage: 50-70%, depending on how quickly data ages out of the hot tier
15.3 Index Strategy for Large Scale
Time-series indices (Recommended for logs):
- Daily: logs-2024.01.31
- Weekly: logs-2024.w05
- Monthly: logs-2024.01
Rollover strategy: a new index is created automatically when:
- The index reaches a size limit (e.g. 50GB)
- The index reaches an age limit (e.g. 1 day)
- A manual trigger fires (see the sketch below)
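A manual trigger is just the rollover API pointed at a write alias; a minimal sketch (alias name is illustrative), where a new index is created only if a condition matches:
curl -X POST "localhost:9200/logs-write/_rollover" -H 'Content-Type: application/json' -d'
{
  "conditions": {
    "max_age": "1d",
    "max_primary_shard_size": "50gb"
  }
}'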
Conclusion
Elasticsearch/OpenSearch provides powerful, scalable search and analytics capabilities. Key takeaways:
- Architecture: Understand node types and shard placement
- Indexing: Use ILM for automated lifecycle management
- Querying: Master Query DSL for efficient searches
- Optimization: Focus on heap sizing and shard management
- HA/DR: Implement replication and snapshot strategies
- Security: Enable authentication, encryption, and auditing
- Monitoring: Proactive monitoring prevents issues
Document Version: 1.0
Last Updated: January 31, 2026
Contact: Database & Search Team