Chapter 11: Monitoring and Observability
Overview
Monitoring WSO2 in production requires visibility into logs, metrics, API analytics, and distributed traces. This chapter covers built-in tools, integration with external observability stacks, and common troubleshooting patterns.
Logging
Log Files
| Log File | Location | Contents |
|---|---|---|
| wso2carbon.log | repository/logs/ | Main server log |
| http_access.log | repository/logs/ | HTTP access log (gateway) |
| audit.log | repository/logs/ | Security audit events |
| gc.log | repository/logs/ | JVM garbage collection |
| correlation.log | repository/logs/ | Request correlation |
| wso2-apigw-errors.log | repository/logs/ | Gateway-specific errors |
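Unbounded log growth (especially with wire or DEBUG logging enabled) is a common operational hazard. As a minimal sketch, a hypothetical helper that reports the size of each log file in a directory, using only the standard library:

```python
import os

def log_sizes(log_dir):
    """Return {filename: size_in_bytes} for every .log file in log_dir."""
    sizes = {}
    for entry in os.scandir(log_dir):
        if entry.is_file() and entry.name.endswith(".log"):
            sizes[entry.name] = entry.stat().st_size
    return sizes
```

Run against `repository/logs/` (for example from a cron job) to spot a runaway log before it fills the disk.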
Log Configuration
deployment.toml:
# Root logger
[logging]
level = "INFO"
# Per-package log levels
[[logging.loggers]]
name = "org.apache.synapse"
level = "WARN"
[[logging.loggers]]
name = "org.wso2.carbon.apimgt"
level = "INFO"
[[logging.loggers]]
name = "org.apache.synapse.transport.http.wire"
level = "OFF" # Enable DEBUG for wire logs (verbose)
# Audit log
[[logging.loggers]]
name = "AUDIT_LOG"
level = "INFO"
appender = "AUDIT_LOGFILE"
Wire Logs (Debug HTTP Traffic)
Enable wire logs temporarily and only for debugging; they record every byte on the wire and are far too verbose to leave enabled in production.
# Enable wire logs
[[logging.loggers]]
name = "org.apache.synapse.transport.http.wire"
level = "DEBUG"
Wire log output:
[2026-02-11 10:30:15] DEBUG - wire >> "POST /api/employees HTTP/1.1[\r][\n]"
[2026-02-11 10:30:15] DEBUG - wire >> "Content-Type: application/json[\r][\n]"
[2026-02-11 10:30:15] DEBUG - wire >> "Authorization: Bearer eyJ4NX...[\r][\n]"
[2026-02-11 10:30:15] DEBUG - wire >> "{"name":"John Doe","email":"john@example.com"}"
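Reading raw wire logs is tedious because each HTTP message is split across many fragments. A small sketch that reassembles them, assuming the exact line format shown above (`wire >> "..."` for inbound, `wire << "..."` for outbound, with literal `[\r][\n]` markers):

```python
import re

# Matches the direction marker and the quoted payload fragment.
WIRE = re.compile(r'DEBUG - wire (>>|<<) "(.*)"$')

def reassemble(lines):
    """Group wire-log fragments by direction, stripping [\\r][\\n] markers."""
    messages = {">>": [], "<<": []}
    for line in lines:
        m = WIRE.search(line)
        if m:
            direction, payload = m.groups()
            messages[direction].append(payload.replace("[\\r][\\n]", ""))
    return messages
```

Feeding it the four sample lines above yields the request line, headers, and body as clean strings.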
Correlation Logging
Track a request across all components with a correlation ID.
# Enable correlation logs
[monitoring.correlation]
enable = true
log_all_methods = true
Usage:
# Set correlation ID in request
curl -H "activityid: abc-123-def" https://gateway:8243/api/data
# Find in logs
grep "abc-123-def" repository/logs/correlation.log
Correlation log output:
abc-123-def|HTTP-Listener|2026-02-11 10:30:15|0|api/data|GET|InSequence|Start
abc-123-def|HTTP-Sender|2026-02-11 10:30:15|45|backend:8080/data|GET|Call|End
abc-123-def|HTTP-Listener|2026-02-11 10:30:15|48|api/data|GET|OutSequence|End
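The pipe-delimited correlation format lends itself to scripted analysis. A sketch that extracts the per-stage timings for one activity ID, assuming the field layout shown above (ID, component, timestamp, elapsed milliseconds, path, method, stage, state):

```python
def stage_times(lines, activity_id):
    """Return (component, stage, elapsed_ms) tuples for one correlation ID."""
    rows = []
    for line in lines:
        fields = line.strip().split("|")
        if fields and fields[0] == activity_id and len(fields) >= 7:
            rows.append((fields[1], fields[6], int(fields[3])))
    return rows
```

Diffing the elapsed values between consecutive stages shows where the time went (here, 45 ms of 48 ms total in the backend call).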
HTTP Access Logging
[transport.http.access_log]
enable = true
format = "combined"
# Format: %h %l %u %t "%r" %s %b "%{Referer}i" "%{User-Agent}i" %D
# Or custom format
# format = "%h %t %r %s %D %{X-Correlation-ID}i"
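Access logs in the combined-plus-`%D` format above can be mined for slow requests. A sketch of such a filter; note that the unit of `%D` differs by server (microseconds in Apache httpd, milliseconds in Tomcat-style valves), so the threshold below is an assumption you should calibrate against your own logs:

```python
import re

# combined log format with a trailing elapsed-time field (%D)
ACCESS = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"[^"]*" "[^"]*" (?P<elapsed>\d+)$'
)

def slow_requests(lines, threshold=2000):
    """Return (request, status, elapsed) for entries slower than threshold."""
    slow = []
    for line in lines:
        m = ACCESS.match(line)
        if m and int(m.group("elapsed")) > threshold:
            slow.append((m.group("request"), int(m.group("status")),
                         int(m.group("elapsed"))))
    return slow
```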
Metrics
JMX Monitoring
WSO2 exposes JMX MBeans for JVM and application metrics.
# Enable JMX
[monitoring.jmx]
rmi_hostname = "localhost"
rmi_port = 9999
Connect with JConsole or VisualVM:
jconsole localhost:9999
# Or
jvisualvm
Key MBeans:
- java.lang:type=Memory: Heap usage
- java.lang:type=Threading: Thread counts
- java.lang:type=GarbageCollector: GC stats
- org.apache.synapse:type=Transport: HTTP transport metrics
- org.wso2.carbon:type=ServerAdmin: Server status
Prometheus Integration
Export metrics in Prometheus format.
# Enable Prometheus metrics
[monitoring.prometheus]
enable = true
port = 9201
Scrape config (prometheus.yml):
scrape_configs:
  - job_name: 'wso2-apim-gateway'
    metrics_path: /metrics
    scheme: https
    tls_config:
      insecure_skip_verify: true
    static_configs:
      - targets:
          - 'gw1.example.com:9201'
          - 'gw2.example.com:9201'
        labels:
          component: gateway
  - job_name: 'wso2-mi'
    metrics_path: /metric-service/metrics
    static_configs:
      - targets:
          - 'mi1.example.com:9201'
        labels:
          component: micro-integrator
Key Prometheus Metrics:
| Metric | Description |
|---|---|
| wso2_api_request_count_total | Total API requests |
| wso2_api_error_count_total | Total API errors |
| wso2_api_response_time_seconds | Response time histogram |
| jvm_memory_bytes_used | JVM heap usage |
| jvm_threads_current | Active threads |
| http_requests_total | HTTP transport requests |
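For ad hoc checks without a Prometheus server, you can scrape the `/metrics` endpoint and parse the text exposition format directly. A minimal sketch (it handles simple `name value` lines and skips comments; metrics with timestamps or spaces inside label values would need a real client library):

```python
def parse_metrics(text):
    """Parse Prometheus text exposition into {metric: value} floats."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples
```

Dividing the error counter by the request counter gives a point-in-time error ratio, analogous to the PromQL rate ratio used later in the alerting rules.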
Grafana Dashboards
API Gateway Dashboard:
{
  "panels": [
    {
      "title": "Request Rate",
      "type": "graph",
      "targets": [
        {"expr": "rate(wso2_api_request_count_total[5m])"}
      ]
    },
    {
      "title": "Error Rate",
      "type": "graph",
      "targets": [
        {"expr": "rate(wso2_api_error_count_total[5m]) / rate(wso2_api_request_count_total[5m]) * 100"}
      ]
    },
    {
      "title": "P95 Response Time",
      "type": "graph",
      "targets": [
        {"expr": "histogram_quantile(0.95, rate(wso2_api_response_time_seconds_bucket[5m]))"}
      ]
    },
    {
      "title": "JVM Heap Usage",
      "type": "gauge",
      "targets": [
        {"expr": "jvm_memory_bytes_used{area='heap'} / jvm_memory_bytes_max{area='heap'} * 100"}
      ]
    }
  ]
}
API Analytics
Built-in Analytics
WSO2 API Manager provides a built-in analytics dashboard.
Enable Analytics:
[apim.analytics]
enable = true
[apim.analytics.properties]
"publisher.reporter.class" = "org.wso2.am.analytics.publisher.sample.reporter.AnalyticsMetricReporter"
Metrics Available:
- API usage by time period
- Top APIs by request count
- Response time distributions
- Error breakdown by type
- Geographic distribution of requests
- Top applications and subscribers
ELK Stack Integration
Send logs and analytics to Elasticsearch for centralized analysis.
Filebeat Configuration (filebeat.yml):
filebeat.inputs:
  - type: log
    paths:
      - /opt/wso2am/repository/logs/wso2carbon.log
    fields:
      log_type: server
    multiline:
      pattern: '^\d{4}-\d{2}-\d{2}'
      negate: true
      match: after
  - type: log
    paths:
      - /opt/wso2am/repository/logs/http_access*.log
    fields:
      log_type: access
  - type: log
    paths:
      - /opt/wso2am/repository/logs/audit.log
    fields:
      log_type: audit
  - type: log
    paths:
      - /opt/wso2am/repository/logs/correlation.log
    fields:
      log_type: correlation

output.elasticsearch:
  hosts: ["http://elasticsearch:9200"]
  index: "wso2-%{[fields.log_type]}-%{+yyyy.MM.dd}"
Logstash Filter (for structured parsing):
filter {
  if [fields][log_type] == "access" {
    grok {
      match => {
        "message" => "%{IPORHOST:client_ip} - - \[%{HTTPDATE:timestamp}\] \"%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}\" %{NUMBER:status} %{NUMBER:bytes} %{NUMBER:response_time}"
      }
    }
    mutate {
      convert => {
        "status" => "integer"
        "bytes" => "integer"
        "response_time" => "integer"
      }
    }
  }
}
Distributed Tracing
OpenTelemetry Integration
# Enable OpenTelemetry tracing
[opentelemetry]
enable = true
exporter_type = "otlp"
[opentelemetry.remote]
host = "otel-collector.example.com"
port = 4317
Jaeger Integration
[opentelemetry]
enable = true
exporter_type = "jaeger"
[opentelemetry.remote]
host = "jaeger.example.com"
port = 14250
Trace flow through WSO2:
Client → Gateway (Span 1)
→ Key Validation (Span 2)
→ Backend Call (Span 3)
→ Response Processing (Span 4)
← Response to Client
Health Monitoring
Health Check Endpoints
API Manager:
# Gateway health
curl -k https://localhost:8243/services/Version
# Carbon health
curl -k https://localhost:9443/carbon/admin/login.jsp -o /dev/null -w "%{http_code}"
Micro Integrator:
# Liveness (server is running)
curl http://localhost:9164/liveness
# Response: {"status": "active"}
# Readiness (ready to accept requests)
curl http://localhost:9164/readiness
# Response: {"status": "ready"}
# List deployed services
curl http://localhost:9164/management/apis
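When wiring these endpoints into a load balancer or Kubernetes probe script, the check should validate the response body, not just connectivity. A hypothetical probe helper sketch, assuming the `{"status": "active"}` / `{"status": "ready"}` payloads shown above:

```python
import json

def probe_ok(status_code, body):
    """Treat HTTP 200 plus an 'active' or 'ready' status field as healthy."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return payload.get("status") in ("active", "ready")
```

A wrapper that fetches the URL with a short timeout and exits nonzero on `probe_ok(...) == False` slots directly into an `exec`-style Kubernetes probe.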
Custom Health Check (MI)
<api name="HealthAPI" context="/health" xmlns="http://ws.apache.org/ns/synapse">
    <resource methods="GET" uri-template="/">
        <inSequence>
            <!-- Check backend connectivity -->
            <call>
                <endpoint>
                    <http method="get" uri-template="http://backend:8080/ping">
                        <timeout>
                            <duration>5000</duration>
                        </timeout>
                    </http>
                </endpoint>
            </call>
            <filter source="$axis2:HTTP_SC" regex="200">
                <then>
                    <payloadFactory media-type="json">
                        <format>
                            {"status": "healthy", "backend": "UP", "timestamp": "$1"}
                        </format>
                        <args>
                            <arg expression="get-property('SYSTEM_TIME')"/>
                        </args>
                    </payloadFactory>
                </then>
                <else>
                    <payloadFactory media-type="json">
                        <format>
                            {"status": "degraded", "backend": "DOWN", "timestamp": "$1"}
                        </format>
                        <args>
                            <arg expression="get-property('SYSTEM_TIME')"/>
                        </args>
                    </payloadFactory>
                    <property name="HTTP_SC" value="503" scope="axis2"/>
                </else>
            </filter>
            <respond/>
        </inSequence>
    </resource>
</api>
Alerting
Prometheus Alerting Rules
groups:
  - name: wso2-alerts
    rules:
      - alert: HighErrorRate
        expr: rate(wso2_api_error_count_total[5m]) / rate(wso2_api_request_count_total[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "API error rate above 5%"
      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(wso2_api_response_time_seconds_bucket[5m])) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P95 response time above 2 seconds"
      - alert: HighMemoryUsage
        expr: jvm_memory_bytes_used{area="heap"} / jvm_memory_bytes_max{area="heap"} > 0.85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "JVM heap usage above 85%"
      - alert: GatewayDown
        expr: up{job="wso2-apim-gateway"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Gateway instance is down"
Troubleshooting
Common Issues and Solutions
| Symptom | Likely Cause | Action |
|---|---|---|
| 401 Unauthorized | Expired/invalid token | Check token expiry, regenerate |
| 403 Forbidden | Missing scopes | Verify scope assignment |
| 500 Internal Error | Backend failure | Check backend logs, fault sequence |
| 503 Service Unavailable | Backend down | Check endpoint connectivity |
| Slow responses | Resource exhaustion | Check JVM heap, thread count, DB pool |
| OOM crash | Insufficient heap | Increase -Xmx, check for memory leaks |
| Connection timeout | Network or firewall | Verify connectivity, increase timeout |
Diagnostic Commands
# Check server status
curl -k https://localhost:9443/services/Version
# Thread dump (find deadlocks, stuck threads)
kill -3 <PID>
# Or
jstack <PID> > thread_dump.txt
# Heap dump (analyze memory)
jmap -dump:format=b,file=heap.hprof <PID>
# Check open file descriptors
ls /proc/<PID>/fd | wc -l
# Check active connections
ss -tnp | grep <PID> | wc -l
# Monitor GC in real-time
jstat -gcutil <PID> 1000
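`jstat -gcutil` output is columnar (on recent JDKs the columns are S0, S1, E, O, M, CCS, YGC, YGCT, FGC, FGCT, GCT; verify against your JDK's jstat documentation). A sketch that zips the header with a sample row and flags old-generation pressure, the usual precursor to an OOM:

```python
def parse_gcutil(header_line, sample_line):
    """Zip a jstat -gcutil header row with one sample row into {col: float}."""
    cols = header_line.split()
    vals = [float(v) for v in sample_line.split()]
    return dict(zip(cols, vals))

def heap_pressure(stats, old_gen_threshold=85.0):
    """Flag high old-generation occupancy ('O' is a percentage)."""
    return stats.get("O", 0.0) > old_gen_threshold
```

Sustained high `O` across samples (rather than a single spike that a full GC reclaims) is what actually warrants action.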
Analyzing Slow APIs
# 1. Enable correlation logging
# 2. Send request with correlation ID
curl -H "activityid: debug-001" https://gateway:8243/api/slow-endpoint
# 3. Analyze correlation log for time spent in each stage
grep "debug-001" repository/logs/correlation.log
# Output shows time per step:
# debug-001|HTTP-Listener|...|0ms|InSequence|Start
# debug-001|HTTP-Sender|...|1250ms|Backend|Call
# debug-001|HTTP-Listener|...|1255ms|OutSequence|End
# → Backend call took 1250ms; investigate backend
Log Level Changes at Runtime
# MI CLI: change log level without restart
mi log-level update org.apache.synapse DEBUG
# Revert
mi log-level update org.apache.synapse INFO
Key Takeaways
- Correlation logging traces requests end-to-end across components
- Wire logs are essential for debugging but too verbose for production
- Prometheus + Grafana provides real-time metrics and dashboards
- ELK stack centralizes logs for search and analysis
- OpenTelemetry/Jaeger adds distributed tracing across services
- Health check endpoints enable load balancer and Kubernetes probes
- Thread dumps and heap dumps diagnose JVM-level issues
- Runtime log level changes avoid restarts during incident investigation
Next Steps
Continue to Chapter 12: Advanced Topics to learn about advanced integration patterns, streaming, and microservices.