Best Practices and Tips
This chapter consolidates practical advice for running Kibana effectively in production. It covers performance, organization, naming conventions, team workflows, and common pitfalls.
Dashboard Design
Layout Principles
1. Most important information at the top
2. Summary → Detail (top to bottom)
3. Time-series charts in wide panels
4. Related metrics side by side
5. Controls and filters at the very top
Recommended layout:
┌─────────────────────────────────────────────────────────┐
│ [Filters] [Controls] [Time Range]                       │ Row 1
├────────┬────────┬────────┬──────────────────────────────┤
│ KPI 1  │ KPI 2  │ KPI 3  │ KPI 4                        │ Row 2
├────────┴────────┴────────┴──────────────────────────────┤
│ Time-series chart (full width)                          │ Row 3
├──────────────────────────┬──────────────────────────────┤
│ Breakdown chart          │ Breakdown chart              │ Row 4
├──────────────────────────┴──────────────────────────────┤
│ Detail table (full width)                               │ Row 5
└─────────────────────────────────────────────────────────┘
Panel Count
Keep dashboards focused:
✅ 8-15 panels: Fast loading, clear purpose
✅ 15-25 panels: Acceptable for broad views
❌ 30+ panels: Split into linked dashboards
Technique: Use drilldowns to link an overview dashboard
to detail dashboards instead of packing everything into one.
Color Consistency
Establish a color scheme and stick to it:
Status colors (across all dashboards):
Green: #00BFA5 Success, healthy, within SLA
Yellow: #FFB74D Warning, degraded, approaching limit
Red: #FF5252 Error, critical, SLA breach
Blue: #448AFF Informational, neutral
Gray: #9E9E9E Inactive, disabled, no data
Category colors (per domain):
Assign fixed colors to categories so they're recognizable:
- "Men's Clothing" always uses blue
- "Women's Clothing" always uses pink
- "Shoes" always uses brown
Titles and Labels
✅ "Daily Revenue ($)": clear metric, clear unit
✅ "Error Rate (%) - Last 24h": metric, unit, time context
✅ "Top 10 Endpoints by Latency": scope, dimension, metric
❌ "Chart 1": meaningless
❌ "taxful_total_price": raw field name
❌ "Data": too vague
Naming Conventions
Saved Objects
Use a consistent naming scheme for all saved objects:
Pattern: [Team/Domain] Object Type - Description
Dashboards:
[Ops] Overview - Production Health
[Ops] Detail - API Gateway
[Sales] Overview - Revenue Metrics
[Sales] Detail - Product Performance
Visualizations:
[Ops] Metric - Error Rate
[Ops] Line - Request Latency p95
[Sales] Bar - Top Products by Revenue
[Sales] Pie - Revenue by Category
Saved Searches:
[Ops] Search - 5xx Errors Last 24h
[Sales] Search - High Value Orders
Index Patterns / Data Views:
logs-prod-* (Production Logs)
logs-staging-* (Staging Logs)
metrics-prod-* (Production Metrics)
Tags
Apply tags consistently:
Tag categories:
Team: ops, dev, sales, marketing, security
Environment: production, staging, development
Type: monitoring, analytics, reporting, investigation
Status: active, archived, draft, template
Priority: critical, important, reference
Example:
Dashboard: "[Ops] Overview - Production Health"
Tags: [ops, production, monitoring, critical]
Spaces
Space naming:
production → Production monitoring and dashboards
development → Dev/test dashboards and experiments
marketing → Marketing analytics
security → Security operations
shared → Cross-team dashboards
Keep it simple: one space per team or function.
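Spaces can be created through the UI or the Spaces API (POST /api/spaces/space). A minimal request body might look like this; the id, name, and description are examples, not required values:

```json
{
  "id": "security",
  "name": "Security",
  "description": "Security operations",
  "disabledFeatures": []
}
```

Listing features in disabledFeatures hides apps the team does not need, which keeps the space's navigation focused.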
Query Performance
Index and Field Optimization
✅ Use keyword fields for filtering and aggregations
category.keyword: "Shoes"
❌ Use text fields for aggregations
category: "Shoes" (analyzed text field: not aggregatable without fielddata, and slow with it)
✅ Use date histogram with appropriate intervals
Auto interval or 1h for daily views, 1d for monthly
❌ Use tiny intervals on large time ranges
1-minute intervals over 1 year = too many buckets
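The bucket-count arithmetic is worth internalizing: the number of date histogram buckets is roughly the time range divided by the interval, and Elasticsearch rejects aggregations that exceed search.max_buckets (65,536 by default). A quick sanity check in plain Python:

```python
from datetime import timedelta

def bucket_count(time_range: timedelta, interval: timedelta) -> int:
    """Approximate number of date_histogram buckets for a range/interval pair."""
    return int(time_range / interval)

# 1-minute buckets over a year blows far past the 65,536 default limit
assert bucket_count(timedelta(days=365), timedelta(minutes=1)) == 525_600

# 1-hour buckets over a day is cheap
assert bucket_count(timedelta(days=1), timedelta(hours=1)) == 24
```

Run this mentally before choosing a fixed interval: if the result is in the tens of thousands, widen the interval or narrow the range.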
Query Patterns
Fast queries:
✅ Term filter on keyword field
✅ Range filter on numeric/date field
✅ Bool filter combining few conditions
✅ Date histogram with auto interval
Slow queries:
❌ Leading wildcard (*something)
❌ Regex on large text fields
❌ High-cardinality terms aggregation (top 10000)
❌ Nested aggregations 4+ levels deep
❌ Scripts in aggregations
Time Range Strategy
Real-time dashboards: Last 15-30 minutes
Operational monitoring: Last 1-4 hours
Daily review: Last 24 hours / Today
Weekly reports: Last 7 days
Monthly analysis: Last 30 days
Historical research: Custom range (as narrow as possible)
Tip: Always set a time range. Querying "all time" on
production indices is the #1 cause of slow dashboards.
Caching
Kibana and Elasticsearch cache query results. Maximize cache hits:
✅ Use filters (cacheable) over query_string (not always cached)
✅ Round time ranges to hour/day boundaries
✅ Use consistent queries across dashboard panels
✅ Keep the shard request cache enabled in Elasticsearch:
index.requests.cache.enable: true (index setting, default)
Elasticsearch request cache works best when:
- Shard data doesn't change (rolled-over indices)
- Same query is repeated
- Time range uses "now" rounding: now/h, now/d
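The effect of now/h-style rounding is easy to illustrate in plain Python (this is an analogy, not Kibana code): two dashboard loads minutes apart map to the same rounded window, so Elasticsearch can serve the second from its request cache instead of recomputing.

```python
from datetime import datetime

def floor_to_hour(ts: datetime) -> datetime:
    """Round a timestamp down to the hour, like Elasticsearch's now/h."""
    return ts.replace(minute=0, second=0, microsecond=0)

# Two requests 45 minutes apart produce identical cacheable windows
a = floor_to_hour(datetime(2024, 1, 15, 10, 7, 30))
b = floor_to_hour(datetime(2024, 1, 15, 10, 52, 3))
assert a == b == datetime(2024, 1, 15, 10, 0)
```

Without rounding, every request carries a unique "now" and every query is a cache miss.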
Data View Strategy
Naming and Organization
Convention: source-environment-*
Examples:
filebeat-prod-* All production Filebeat logs
metricbeat-prod-* All production Metricbeat metrics
apm-prod-* All production APM data
custom-orders-* Custom order data (all environments)
Avoid:
* (everything) Too broad, slow, confusing field list
logs-* Ambiguous: prod? dev? staging?
Field Formatting
Set up field formatting once in the data view so every dashboard benefits:
Standard formatting:
response → Color (green 2xx, yellow 3xx, red 4xx/5xx)
bytes → Bytes (auto KB/MB/GB)
response_time → Duration (ms)
price → Currency ($0,0.00)
url → URL (clickable)
percentage → Percent (0.00%)
ip_address → String (no special formatting)
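The response-code coloring above is just a range lookup; sketched in Python (the hex values match the status colors defined earlier in this chapter, and the function name is illustrative):

```python
def response_color(status_code: int) -> str:
    """Map an HTTP status code to the chapter's status color palette."""
    if 200 <= status_code < 300:
        return "#00BFA5"  # green: success
    if 300 <= status_code < 400:
        return "#FFB74D"  # yellow: redirects treated as warnings
    return "#FF5252"      # red: 4xx/5xx errors

assert response_color(200) == "#00BFA5"
assert response_color(301) == "#FFB74D"
assert response_color(503) == "#FF5252"
```

In Kibana you express the same ranges in the data view's field formatter UI rather than in code; defining them once there keeps every dashboard consistent.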
Runtime Fields
Use runtime fields for common calculations so every dashboard has access:
Useful runtime fields:
hour_of_day → Extract hour from @timestamp
day_of_week → Extract day name
response_class → "2xx", "3xx", "4xx", "5xx" from response code
sla_status → "within_sla" / "breach" from response_time
environment → Extract from index name or hostname
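Runtime fields are written in Painless, but the logic is usually a one-liner. The response_class rule, sketched in Python so the transformation is explicit:

```python
def response_class(code: int) -> str:
    """Bucket an HTTP response code into its class, e.g. 404 -> '4xx'."""
    # Painless equivalent (roughly):
    #   emit((int)(doc['response'].value / 100) + "xx")
    return f"{code // 100}xx"

assert response_class(200) == "2xx"
assert response_class(404) == "4xx"
assert response_class(503) == "5xx"
```

Because runtime fields are computed at query time, they cost nothing at ingest and can be added or fixed without reindexing; the trade-off is a little extra work per query.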
Alerting Best Practices
Alert Design
✅ Alert on symptoms, not causes
"Error rate > 5%" (symptom)
NOT "Pod restarted" (cause - may be normal)
✅ Include runbook links in alert messages
"Error rate high. Runbook: https://wiki.example.com/runbooks/high-error-rate"
✅ Set appropriate thresholds (not too sensitive)
Test thresholds against historical data before enabling
✅ Use tiered severity
Warning: error_rate > 2% → Slack #monitoring
Critical: error_rate > 10% → PagerDuty + Slack #incidents
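The tiered-severity routing above amounts to a threshold ladder. A minimal sketch in Python (thresholds and channel names are the examples from this section, not fixed values):

```python
def route_alert(error_rate: float) -> tuple[str, list[str]]:
    """Tiered severity: warnings go to chat, critical also pages on-call."""
    if error_rate > 0.10:
        return ("critical", ["pagerduty", "#incidents"])
    if error_rate > 0.02:
        return ("warning", ["#monitoring"])
    return ("ok", [])

assert route_alert(0.12) == ("critical", ["pagerduty", "#incidents"])
assert route_alert(0.03) == ("warning", ["#monitoring"])
assert route_alert(0.01) == ("ok", [])
```

In Kibana itself you implement this as two rules (or one rule with multiple action groups), each with its own threshold and connector.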
❌ Alert on everything
Alert fatigue = ignored alerts = missed incidents
Alert Noise Reduction
Techniques:
1. Aggregate: Group by service instead of per-instance
2. Throttle: Send at most once per hour while condition persists
3. Delay: Require condition true for 5+ minutes before alerting
4. Exclude: Filter out known false positives (health checks, test data)
5. Schedule: Suppress during maintenance windows
6. Snooze: Temporarily silence during known issues
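Techniques 2 and 3 (throttle and delay) combine naturally: only fire once the condition has held for a sustained period, then suppress repeats for a while. A sketch of that state machine, with illustrative defaults matching the numbers above:

```python
from datetime import datetime, timedelta

class AlertGate:
    """Fire only after a condition holds for `delay`, then at most once
    per `throttle` window while it persists."""

    def __init__(self, delay=timedelta(minutes=5), throttle=timedelta(hours=1)):
        self.delay, self.throttle = delay, throttle
        self.condition_since = None
        self.last_fired = None

    def check(self, condition: bool, now: datetime) -> bool:
        if not condition:
            self.condition_since = None  # reset the sustained-duration clock
            return False
        self.condition_since = self.condition_since or now
        if now - self.condition_since < self.delay:
            return False  # not sustained long enough yet
        if self.last_fired and now - self.last_fired < self.throttle:
            return False  # throttled: already notified recently
        self.last_fired = now
        return True

gate = AlertGate()
t0 = datetime(2024, 1, 15, 10, 0)
assert not gate.check(True, t0)                          # just started
assert gate.check(True, t0 + timedelta(minutes=6))       # sustained -> fire
assert not gate.check(True, t0 + timedelta(minutes=30))  # throttled
```

Kibana's rule settings express the same ideas declaratively ("check every", "notify every", condition windows); the sketch just makes the interaction between the two timers concrete.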
Recovery Actions
Always configure recovery (resolved) actions:
Alert fired:
"🔴 Error rate 12% on payment-service (threshold: 5%)"
Alert recovered:
"✅ Error rate 1.2% on payment-service - back to normal
Duration: 23 minutes"
Recovery actions help close the loop and prevent confusion
about whether an issue is still ongoing.
Security Best Practices
Access Control
Principles:
1. Least privilege: Users get minimum access needed
2. Role-based: Define roles per function, assign to users
3. Space isolation: Separate environments and teams
4. Audit: Enable audit logging in production
Common roles:
viewer → Read-only dashboards (stakeholders)
analyst → Read data + create visualizations (analysts)
editor → Full dashboard management (dashboard authors)
alert_manager → Manage alerts (on-call engineers)
admin → Full Kibana management (admins only)
Sensitive Data
✅ Use field-level security to hide PII fields
✅ Use document-level security for multi-tenant data
✅ Never display raw credit card or SSN fields
✅ Mask email addresses in shared dashboards
✅ Use runtime fields to create masked versions of sensitive data
Example runtime field:
masked_email: "jo***@example.com" (from "john@example.com")
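The masking logic behind that runtime field is simple string surgery; in Python (the real field would implement the same rule in Painless):

```python
def mask_email(email: str) -> str:
    """Keep the first two characters of the local part, mask the rest."""
    local, _, domain = email.partition("@")
    return f"{local[:2]}***@{domain}"

assert mask_email("john@example.com") == "jo***@example.com"
```

Pair the masked field with field-level security so the raw email field is hidden from the roles that only need the masked version.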
Production Hardening
# kibana.yml production settings
server.ssl.enabled: true
server.ssl.certificate: /path/to/cert.pem
server.ssl.key: /path/to/key.pem
xpack.security.enabled: true
xpack.encryptedSavedObjects.encryptionKey: "min-32-char-key"
xpack.reporting.encryptionKey: "min-32-char-key"
xpack.security.session.idleTimeout: "1h"
xpack.security.session.lifespan: "8h"
# Disable telemetry in production
telemetry.enabled: false
# CSP headers
csp.strict: true
csp.warnLegacyBrowsers: false
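One way to generate the encryption keys (they must be at least 32 characters) is with openssl; Kibana also ships a bin/kibana-encryption-keys generate helper for the same purpose. This sketch just prints the two settings for you to paste into kibana.yml:

```shell
# 16 random bytes -> 32 hex characters, one key per setting
SAVED_OBJECTS_KEY=$(openssl rand -hex 16)
REPORTING_KEY=$(openssl rand -hex 16)

echo "xpack.encryptedSavedObjects.encryptionKey: \"$SAVED_OBJECTS_KEY\""
echo "xpack.reporting.encryptionKey: \"$REPORTING_KEY\""
```

Store the generated keys in your secrets manager: if you lose them, encrypted saved objects (alert credentials, connector secrets) become unreadable after a restart or migration.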
Operational Workflow
Dashboard Lifecycle
1. DRAFT
Create in development space
Use sample or dev data
Iterate on layout and queries
2. REVIEW
Share with stakeholders
Gather feedback
Test with production-like data
3. DEPLOY
Move to production space
Connect to production data views
Set up access controls
4. MAINTAIN
Monitor performance
Update as data schema changes
Archive when no longer needed
Version control via export/import
Version Control for Dashboards
Export dashboards as NDJSON and store in git:
# Export all dashboards
curl -X POST "localhost:5601/api/saved_objects/_export" \
-H "kbn-xsrf: true" \
-H "Content-Type: application/json" \
-d '{
"type": ["dashboard", "visualization", "search", "lens"],
"includeReferencesDeep": true
}' > kibana-dashboards.ndjson
# Commit to git
git add kibana-dashboards.ndjson
git commit -m "Export Kibana dashboards - Jan 2024"
# Import (restore or deploy to another instance)
curl -X POST "localhost:5601/api/saved_objects/_import?overwrite=true" \
-H "kbn-xsrf: true" \
--form file=@kibana-dashboards.ndjson
Backup Strategy
What to back up:
✅ Saved objects (dashboards, visualizations, data views)
✅ kibana.yml configuration
✅ Connector configurations (API keys, webhook URLs)
✅ ML job configurations
✅ Alert rules
How:
- Saved objects: API export (ndjson) → git
- Config files: Standard config management (Ansible, etc.)
- Elasticsearch snapshots (includes .kibana index)
Schedule:
- After any significant dashboard changes
- Weekly automated export
- Before Kibana version upgrades
Performance Tuning
Kibana Server
# kibana.yml performance settings
# Increase Node.js memory for large dashboards
# Set via environment variable:
# NODE_OPTIONS="--max-old-space-size=4096"
# Request timeout (increase for slow queries)
elasticsearch.requestTimeout: 60000
# Shard timeout
elasticsearch.shardTimeout: 30000
# Max payload size (for large imports)
server.maxPayload: 10485760  # bytes (10 MB)
Elasticsearch Query Optimization
For Kibana-specific optimization:
1. Use index lifecycle management (ILM) to move old data to cheaper tiers
Hot → Warm → Cold → Delete
2. Create summary indices for dashboard-heavy queries
Daily/hourly rollups of frequently aggregated metrics
3. Optimize mappings
- Disable _source on metrics indices if not needed for Discover
- Use keyword instead of text for fields only used in aggs
- Set doc_values: false for fields never aggregated
4. Tune shard count
- Target 10-50GB per shard
- Avoid too many small shards (overhead per shard)
- Avoid too few large shards (slow queries)
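The shard-count rule of thumb is simple division; a sketch of the arithmetic, assuming the 10-50GB target above:

```python
import math

def shard_count(index_size_gb: float, target_gb: float = 30.0) -> int:
    """Primary shards needed to keep each shard near the target size."""
    return max(1, math.ceil(index_size_gb / target_gb))

assert shard_count(500) == 17   # ~29 GB per shard
assert shard_count(5) == 1      # don't split tiny indices
```

For time-series data you usually apply this per rollover index via ILM's max_primary_shard_size condition rather than computing it by hand.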
Browser Performance
For users experiencing slow Kibana:
✅ Use Chrome or Firefox (latest versions)
✅ Close unused dashboard tabs
✅ Clear browser cache if behavior is unexpected
✅ Disable browser extensions that may interfere
✅ Use wired connection for large data sets
Dashboard-specific:
✅ Limit auto-refresh frequency (10s minimum)
✅ Set reasonable time ranges
✅ Use "Apply" button for controls (not auto-apply)
✅ Reduce panel count per dashboard
Monitoring Kibana Itself
Stack Monitoring
Enable monitoring to track Kibana's health:
# kibana.yml
monitoring.ui.enabled: true
Navigate to Stack Management → Stack Monitoring:
Kibana Instance Health:
┌──────────────────────────────────────────────┐
│ Requests: 45/s                               │
│ Response time: 120ms (avg), 450ms (p95)      │
│ Memory: 1.2GB / 4GB (30%)                    │
│ Status: Green                                │
│ Connected to: 3 Elasticsearch nodes          │
│ Uptime: 14 days                              │
└──────────────────────────────────────────────┘
Key Metrics to Watch
| Metric | Warning | Critical |
|---|---|---|
| Response time (p95) | > 2s | > 5s |
| Memory usage | > 70% | > 90% |
| Request rate | Varies | Sudden spike/drop |
| Elasticsearch connectivity | Intermittent | Lost |
| Status | Yellow | Red |
Health Check Endpoint
# Quick health check
curl -s "localhost:5601/api/status" | jq '.status.overall.level'
# "available" = healthy
# Detailed status
curl -s "localhost:5601/api/status" | jq '.status'
Upgrade Strategy
Before Upgrading
1. Read release notes for breaking changes
2. Export all saved objects (backup)
3. Test upgrade in staging/dev first
4. Check plugin compatibility
5. Verify Elasticsearch compatibility matrix
6. Plan rollback procedure
Kibana-Elasticsearch Compatibility
Rule: Kibana must run the same major.minor version as Elasticsearch (patch versions may differ)
✅ Kibana 8.11.x + Elasticsearch 8.11.x
✅ Kibana 8.11.0 + Elasticsearch 8.11.3 (patch mismatch OK)
❌ Kibana 8.11.x + Elasticsearch 8.10.x (minor mismatch)
❌ Kibana 8.x + Elasticsearch 7.x (major mismatch)
Upgrade order: Elasticsearch first, then Kibana
Post-Upgrade Checklist
✅ Verify Kibana starts and connects to Elasticsearch
✅ Check saved objects migration (Stack Management → Upgrade Assistant)
✅ Test critical dashboards render correctly
✅ Verify alerts are firing
✅ Check ML jobs are running
✅ Test user authentication
✅ Review deprecated features and plan migration
Common Pitfalls
1. Querying Without Time Bounds
Problem: Dashboard queries scan entire index (years of data)
Impact: Slow queries, high memory, potential timeout
Fix: Always set a time range appropriate to the use case
2. Text Fields in Aggregations
Problem: Using "category" (text) instead of "category.keyword"
Impact: "Field is not aggregatable" error or unexpected results
Fix: Always use .keyword suffix for exact match and aggregations
3. Too Many Unique Values
Problem: Terms aggregation on high-cardinality field (e.g., user_id with millions)
Impact: Extremely slow, high memory, inaccurate
Fix: Use Top N (limit to 10-20), or use filters for specific values
4. Ignoring Error Messages
Problem: Visualization shows "No results" but user ignores it
Impact: Decisions made on missing data
Fix: Investigate: check time range, filters, data view, field names
5. Single Kibana Instance for Everything
Problem: One Kibana instance serves dev, staging, and production
Impact: Resource contention, security risk, messy organization
Fix: Separate instances per environment, or at minimum use Spaces
6. Not Using Saved Objects API for Migrations
Problem: Manually recreating dashboards in new environment
Impact: Time-consuming, error-prone, inconsistent
Fix: Export/import via API, store in version control
7. Alert Fatigue
Problem: Too many low-value alerts firing constantly
Impact: Team ignores alerts, real issues missed
Fix:
- Review and remove noisy alerts quarterly
- Require each alert to have a clear action (what should the recipient do?)
- Use tiered severity
- Set proper thresholds based on historical data
Quick Reference
Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| / | Focus search bar |
| Ctrl/Cmd + K | Command palette |
| Ctrl/Cmd + / | Toggle navigation |
| Ctrl/Cmd + S | Save current object |
| Escape | Close modal |
| Ctrl/Cmd + Z | Undo (in Lens editor) |
Useful API Endpoints
# System status
GET /api/status
# Saved objects
POST /api/saved_objects/_export
POST /api/saved_objects/_import
GET /api/saved_objects/_find?type=dashboard
# Data views
GET /api/data_views
POST /api/data_views/data_view
# Alerting
GET /api/alerting/rules/_find
POST /api/alerting/rule
# Spaces
GET /api/spaces/space
POST /api/spaces/space
Configuration Files
Kibana: /etc/kibana/kibana.yml
Elasticsearch: /etc/elasticsearch/elasticsearch.yml
Filebeat: /etc/filebeat/filebeat.yml
Metricbeat: /etc/metricbeat/metricbeat.yml
APM Server: /etc/apm-server/apm-server.yml
Docker volumes:
kibana: /usr/share/kibana/config/kibana.yml
elasticsearch: /usr/share/elasticsearch/config/elasticsearch.yml
Summary
In this chapter, you learned:
- ✅ Dashboard design principles for clarity and performance
- ✅ Naming conventions for saved objects, tags, and spaces
- ✅ Query performance optimization techniques
- ✅ Alerting best practices to avoid alert fatigue
- ✅ Security hardening for production deployments
- ✅ Operational workflows: lifecycle, versioning, backups
- ✅ Performance tuning for Kibana server and Elasticsearch
- ✅ Monitoring Kibana itself and upgrade strategies
- ✅ Common pitfalls and how to avoid them
This concludes the Kibana tutorial. You now have the knowledge to build effective dashboards, write efficient queries, set up monitoring and alerting, secure your deployment, and maintain it in production. The official Elastic documentation is an excellent resource for continued learning and reference.