Chapter 11: Monitoring and Observability
Overview
Running WSO2 in production means seeing what it does. That requires logs, metrics, API analytics, and distributed traces, plus the discipline to read them when something breaks.
Logging
Log Files
| Log File | Location | Contents |
|---|---|---|
| wso2carbon.log | repository/logs/ | Main server log |
| http_access.log | repository/logs/ | HTTP access log (gateway) |
| audit.log | repository/logs/ | Security audit events |
| gc.log | repository/logs/ | JVM garbage collection |
| correlation.log | repository/logs/ | Request correlation |
| wso2-apigw-errors.log | repository/logs/ | Gateway-specific errors |
Log Configuration
deployment.toml:
# Root logger
[logging]
level = "INFO"
# Per-package log levels
[[logging.loggers]]
name = "org.apache.synapse"
level = "WARN"
[[logging.loggers]]
name = "org.wso2.carbon.apimgt"
level = "INFO"
[[logging.loggers]]
name = "org.apache.synapse.transport.http.wire"
level = "OFF" # Set to "DEBUG" to enable wire logs (verbose)
# Audit log
[[logging.loggers]]
name = "AUDIT_LOG"
level = "INFO"
appender = "AUDIT_LOGFILE"
Wire Logs (Debug HTTP Traffic)
Enable wire logs only temporarily while debugging. They are extremely verbose and record full headers and payloads, including Authorization tokens, so do not leave them on in production.
# Enable wire logs
[[logging.loggers]]
name = "org.apache.synapse.transport.http.wire"
level = "DEBUG"
Wire log output:
[2026-02-11 10:30:15] DEBUG - wire >> "POST /api/employees HTTP/1.1[\r][\n]"
[2026-02-11 10:30:15] DEBUG - wire >> "Content-Type: application/json[\r][\n]"
[2026-02-11 10:30:15] DEBUG - wire >> "Authorization: Bearer eyJ4NX...[\r][\n]"
[2026-02-11 10:30:15] DEBUG - wire >> "{"name":"John Doe","email":"john@example.com"}"
Correlation Logging
Track a request across all components with a correlation ID.
# Enable correlation logs
[monitoring.correlation]
enable = true
log_all_methods = true
Usage:
# Set correlation ID in request
curl -H "activityid: abc-123-def" https://gateway:8243/api/data
# Find in logs
grep "abc-123-def" repository/logs/correlation.log
Correlation log output:
abc-123-def|HTTP-Listener|2026-02-11 10:30:15|0|api/data|GET|InSequence|Start
abc-123-def|HTTP-Sender|2026-02-11 10:30:15|45|backend:8080/data|GET|Call|End
abc-123-def|HTTP-Listener|2026-02-11 10:30:15|48|api/data|GET|OutSequence|End
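To see where a request spent its time, the pipe-delimited entries can be split into fields and the elapsed values compared. The following is a minimal sketch that assumes the eight-field layout of the sample output above (ID, component, timestamp, elapsed milliseconds, target, method, stage, state). The field names and the helper functions are illustrative, and a real correlation.log may carry a different set of columns.

```python
from dataclasses import dataclass


@dataclass
class CorrelationEntry:
    # Field names are assumptions based on the sample output, not an official schema.
    correlation_id: str
    component: str
    timestamp: str
    elapsed_ms: int
    target: str
    method: str
    stage: str
    state: str


def parse_line(line: str) -> CorrelationEntry:
    """Split one pipe-delimited correlation log line into its eight fields."""
    parts = line.rstrip("\n").split("|")
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}: {line!r}")
    cid, component, ts, elapsed, target, method, stage, state = parts
    return CorrelationEntry(cid, component, ts, int(elapsed), target, method, stage, state)


def stage_deltas(entries):
    """Return (component, stage, delta_ms) tuples, where delta_ms is the
    time added since the previous entry of the same request."""
    ordered = sorted(entries, key=lambda e: e.elapsed_ms)
    deltas = []
    previous = 0
    for entry in ordered:
        deltas.append((entry.component, entry.stage, entry.elapsed_ms - previous))
        previous = entry.elapsed_ms
    return deltas


SAMPLE = """\
abc-123-def|HTTP-Listener|2026-02-11 10:30:15|0|api/data|GET|InSequence|Start
abc-123-def|HTTP-Sender|2026-02-11 10:30:15|45|backend:8080/data|GET|Call|End
abc-123-def|HTTP-Listener|2026-02-11 10:30:15|48|api/data|GET|OutSequence|End
"""

entries = [parse_line(line) for line in SAMPLE.splitlines() if line.strip()]
for component, stage, delta in stage_deltas(entries):
    print(f"{component:15} {stage:12} +{delta} ms")
```

For the sample request, the largest increment belongs to the HTTP-Sender entry, which points at the backend call rather than gateway processing.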
HTTP Access Logging
[transport.http.access_log]
enable = true
format = "combined"
# Format: %h %l %u %t "%r" %s %b "%{Referer}i" "%{User-Agent}i" %D
# Or custom format
# format = "%h %t %r %s %D %{X-Correlation-ID}i"
Metrics
JMX Monitoring
WSO2 exposes JMX MBeans for JVM and application metrics.
# Enable JMX
[monitoring.jmx]
rmi_hostname = "localhost"
rmi_port = 9999
Connect with JConsole or VisualVM:
jconsole localhost:9999
# Or
jvisualvm
Key MBeans:
- java.lang:type=Memory: Heap usage
- java.lang:type=Threading: Thread counts
- java.lang:type=GarbageCollector: GC stats
- org.apache.synapse:type=Transport: HTTP transport metrics
- org.wso2.carbon:type=ServerAdmin: Server status
Prometheus Integration
Export metrics in Prometheus format.
# Enable Prometheus metrics
[monitoring.prometheus]
enable = true
port = 9201
Scrape config (prometheus.yml):
scrape_configs:
- job_name: 'wso2-apim-gateway'
metrics_path: /metrics
scheme: https
tls_config:
insecure_skip_verify: true
static_configs:
- targets:
- 'gw1.example.com:9201'
- 'gw2.example.com:9201'
labels:
component: gateway
- job_name: 'wso2-mi'
metrics_path: /metric-service/metrics
static_configs:
- targets:
- 'mi1.example.com:9201'
labels:
component: micro-integrator
Key Prometheus Metrics:
| Metric | Description |
|---|---|
| wso2_api_request_count_total | Total API requests |
| wso2_api_error_count_total | Total API errors |
| wso2_api_response_time_seconds | Response time histogram |
| jvm_memory_bytes_used | JVM heap usage |
| jvm_threads_current | Active threads |
| http_requests_total | HTTP transport requests |
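When a scrape target looks wrong, it can help to read the raw metrics endpoint directly. The sketch below parses the Prometheus text exposition format (one `name{labels} value` sample per line, with `#` comment lines) using only the standard library. The sample payload, including the `api` and `version` label names and all values, is made up for illustration, and the parser ignores edge cases such as escaped quotes inside label values.

```python
import re

# One sample line: metric name, optional {labels}, value, optional timestamp.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+-?\d+)?$'
)
LABEL_PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, value) for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # skip anything this simplified parser cannot read
        labels = dict(LABEL_PAIR.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Hypothetical scrape output; label names and values are assumptions.
PAYLOAD = """\
# HELP wso2_api_request_count_total Total API requests
# TYPE wso2_api_request_count_total counter
wso2_api_request_count_total{api="employees",version="1.0"} 1200
wso2_api_error_count_total{api="employees",version="1.0"} 18
jvm_threads_current 142
"""

samples = parse_metrics(PAYLOAD)
by_name = {name: value for name, _labels, value in samples}
error_ratio = by_name["wso2_api_error_count_total"] / by_name["wso2_api_request_count_total"]
print(f"lifetime error ratio: {error_ratio:.3f}")
```

The ratio printed here is over the lifetime of the counters. Dashboards and alerts normally apply `rate()` first so that the ratio reflects a recent window.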
Grafana Dashboards
API Gateway Dashboard:
{
"panels": [
{
"title": "Request Rate",
"type": "graph",
"targets": [
{"expr": "rate(wso2_api_request_count_total[5m])"}
]
},
{
"title": "Error Rate",
"type": "graph",
"targets": [
{"expr": "rate(wso2_api_error_count_total[5m]) / rate(wso2_api_request_count_total[5m]) * 100"}
]
},
{
"title": "P95 Response Time",
"type": "graph",
"targets": [
{"expr": "histogram_quantile(0.95, rate(wso2_api_response_time_seconds_bucket[5m]))"}
]
},
{
"title": "JVM Heap Usage",
"type": "gauge",
"targets": [
{"expr": "jvm_memory_bytes_used{area='heap'} / jvm_memory_bytes_max{area='heap'} * 100"}
]
}
]
}
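The P95 panel relies on `histogram_quantile`, which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The sketch below reproduces that estimate in plain Python to show why the result depends on bucket boundaries. The bucket bounds and counts are invented for illustration, and the function is a simplified model, not Prometheus's own implementation.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    The list must include a final (math.inf, total_count) bucket, mirroring
    the le="+Inf" bucket of a Prometheus histogram.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets, the last one being +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower, prev_count = (0.0, 0.0) if i == 0 else buckets[i - 1]
    if count == prev_count:
        return upper
    # Linear interpolation inside the bucket.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)


# Hypothetical response-time buckets in seconds (cumulative counts).
BUCKETS = [(0.1, 50), (0.5, 80), (1.0, 90), (2.0, 98), (math.inf, 100)]

p95 = histogram_quantile(0.95, BUCKETS)
print(f"estimated P95: {p95:.3f} s")
```

With these buckets the 95th-percentile rank falls between the 1.0 s and 2.0 s bounds, so the estimate is interpolated within that range. Coarse buckets around the latency of interest therefore produce coarse percentiles.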
API Analytics
Built-in Analytics
WSO2 API Manager provides a built-in analytics dashboard.
Enable Analytics:
[apim.analytics]
enable = true
[apim.analytics.properties]
"publisher.reporter.class" = "org.wso2.am.analytics.publisher.sample.reporter.AnalyticsMetricReporter"
Metrics Available:
- API usage by time period
- Top APIs by request count
- Response time distributions
- Error breakdown by type
- Geographic distribution of requests
- Top applications and subscribers
ELK Stack Integration
Send logs and analytics to Elasticsearch for centralized analysis.
Filebeat Configuration (filebeat.yml):
filebeat.inputs:
- type: log
paths:
- /opt/wso2am/repository/logs/wso2carbon.log
fields:
log_type: server
multiline:
pattern: '^\d{4}-\d{2}-\d{2}'
negate: true
match: after
- type: log
paths:
- /opt/wso2am/repository/logs/http_access*.log
fields:
log_type: access
- type: log
paths:
- /opt/wso2am/repository/logs/audit.log
fields:
log_type: audit
- type: log
paths:
- /opt/wso2am/repository/logs/correlation.log
fields:
log_type: correlation
output.elasticsearch:
hosts: ["http://elasticsearch:9200"]
index: "wso2-%{[fields.log_type]}-%{+yyyy.MM.dd}"
Logstash Filter (for structured parsing):
filter {
if [fields][log_type] == "access" {
grok {
match => {
"message" => "%{IPORHOST:client_ip} - - \[%{HTTPDATE:timestamp}\] \"%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}\" %{NUMBER:status} %{NUMBER:bytes} %{NUMBER:response_time}"
}
}
mutate {
convert => {
"status" => "integer"
"bytes" => "integer"
"response_time" => "integer"
}
}
}
}
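Before deploying a grok pattern, it is often quicker to test the same idea against a few sample lines locally. The sketch below is a rough Python equivalent of the filter above, with simplified character classes in place of the grok primitives. The sample line is invented, and it assumes the access log ends with status, bytes, and response time; a format that also writes Referer and User-Agent fields would need a longer pattern.

```python
import re

# Approximates: IPORHOST - - [HTTPDATE] "WORD URIPATHPARAM HTTP/NUMBER" NUMBER NUMBER NUMBER
ACCESS_LINE = re.compile(
    r'(?P<client_ip>\S+) - - \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<request>\S+) HTTP/(?P<http_version>[\d.]+)" '
    r'(?P<status>\d+) (?P<bytes>\d+) (?P<response_time>\d+)'
)

NUMERIC_FIELDS = ("status", "bytes", "response_time")


def parse_access_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = ACCESS_LINE.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Same effect as the mutate/convert step: numeric fields become integers.
    for field in NUMERIC_FIELDS:
        event[field] = int(event[field])
    return event


# Hypothetical access log line used only to exercise the pattern.
SAMPLE = '10.0.0.5 - - [11/Feb/2026:10:30:15 +0000] "GET /api/data HTTP/1.1" 200 512 48'

event = parse_access_line(SAMPLE)
print(event)
```

Lines that return `None` here are the ones Logstash would tag with `_grokparsefailure`, so a quick pass over a real log file shows how much of it the pattern covers.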
Distributed Tracing
OpenTelemetry Integration
# Enable OpenTelemetry tracing
[opentelemetry]
enable = true
exporter_type = "otlp"
[opentelemetry.remote]
host = "otel-collector.example.com"
port = 4317
Jaeger Integration
[opentelemetry]
enable = true
exporter_type = "jaeger"
[opentelemetry.remote]
host = "jaeger.example.com"
port = 14250
Trace flow through WSO2:
Client → Gateway (Span 1)
→ Key Validation (Span 2)
→ Backend Call (Span 3)
→ Response Processing (Span 4)
← Response to Client
Health Monitoring
Health Check Endpoints
API Manager:
# Gateway health
curl -k https://localhost:8243/services/Version
# Carbon health
curl -k https://localhost:9443/carbon/admin/login.jsp -o /dev/null -w "%{http_code}"
Micro Integrator:
# Liveness (server is running)
curl http://localhost:9164/liveness
# Response: {"status": "active"}
# Readiness (ready to accept requests)
curl http://localhost:9164/readiness
# Response: {"status": "ready"}
# List deployed services
curl http://localhost:9164/management/apis
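The liveness and readiness calls can be wrapped in a small probe script for use outside Kubernetes, for example from a cron job or a deployment pipeline. This is a minimal sketch using only the standard library. It assumes the port and the response bodies shown above; the function names and the two-second timeout are arbitrary choices.

```python
import json
import urllib.request


def fetch_status(url, timeout=2.0):
    """Return (http_status, body_text) for a GET request."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


def is_healthy(http_status, body, expected):
    """True only for HTTP 200 with a JSON body whose "status" equals expected."""
    if http_status != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == expected


def check(base_url="http://localhost:9164"):
    """Probe both endpoints; connection errors and timeouts count as unhealthy."""
    results = {}
    for path, expected in (("/liveness", "active"), ("/readiness", "ready")):
        try:
            status, body = fetch_status(base_url + path)
            results[path] = is_healthy(status, body, expected)
        except OSError:  # covers URLError, HTTPError, and socket timeouts
            results[path] = False
    return results


# Usage against a running Micro Integrator:
#   print(check())
```

Keeping `is_healthy` separate from the network call makes the decision logic easy to test without a running server.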
Custom Health Check (MI)
<api name="HealthAPI" context="/health" xmlns="http://ws.apache.org/ns/synapse">
<resource methods="GET" uri-template="/">
<inSequence>
<!-- Check backend connectivity -->
<call>
<endpoint>
<http method="get" uri-template="http://backend:8080/ping">
<timeout>
<duration>5000</duration>
</timeout>
</http>
</endpoint>
</call>
<filter source="$axis2:HTTP_SC" regex="200">
<then>
<payloadFactory media-type="json">
<format>
{"status": "healthy", "backend": "UP", "timestamp": "$1"}
</format>
<args>
<arg expression="get-property('SYSTEM_TIME')"/>
</args>
</payloadFactory>
</then>
<else>
<payloadFactory media-type="json">
<format>
{"status": "degraded", "backend": "DOWN", "timestamp": "$1"}
</format>
<args>
<arg expression="get-property('SYSTEM_TIME')"/>
</args>
</payloadFactory>
<property name="HTTP_SC" value="503" scope="axis2"/>
</else>
</filter>
<respond/>
</inSequence>
</resource>
</api>
Alerting
Prometheus Alerting Rules
groups:
- name: wso2-alerts
rules:
- alert: HighErrorRate
expr: rate(wso2_api_error_count_total[5m]) / rate(wso2_api_request_count_total[5m]) > 0.05
for: 5m
labels:
severity: critical
annotations:
summary: "API error rate above 5%"
- alert: HighResponseTime
expr: histogram_quantile(0.95, rate(wso2_api_response_time_seconds_bucket[5m])) > 2
for: 5m
labels:
severity: warning
annotations:
summary: "P95 response time above 2 seconds"
- alert: HighMemoryUsage
expr: jvm_memory_bytes_used{area="heap"} / jvm_memory_bytes_max{area="heap"} > 0.85
for: 10m
labels:
severity: warning
annotations:
summary: "JVM heap usage above 85%"
- alert: GatewayDown
expr: up{job="wso2-apim-gateway"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Gateway instance is down"
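The `for` clause in each rule means the expression must stay true across consecutive evaluations for that long before the alert fires; until then the alert is pending, and a single false evaluation resets the timer. The simulation below illustrates that behaviour for the HighErrorRate rule. The evaluation interval and the error-ratio series are invented, and the function is a simplified model of the rule lifecycle.

```python
def alert_states(samples, threshold, for_seconds):
    """Return the alert state at each evaluation.

    samples: list of (timestamp_seconds, value) ordered by time.
    States: "inactive" (expr false), "pending" (true, but not yet for
    the full duration), "firing" (true for at least for_seconds).
    """
    states = []
    active_since = None
    for timestamp, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = timestamp  # condition just became true
            held = timestamp - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
    return states


# Hypothetical error ratios, evaluated once per minute.
SERIES = [
    (0, 0.01), (60, 0.08), (120, 0.09), (180, 0.02), (240, 0.07),
    (300, 0.07), (360, 0.08), (420, 0.09), (480, 0.10), (540, 0.06),
]

states = alert_states(SERIES, threshold=0.05, for_seconds=300)
for (timestamp, value), state in zip(SERIES, states):
    print(f"t={timestamp:>3}s ratio={value:.2f} {state}")
```

In this series the short spike at 60 to 120 seconds never fires because the dip at 180 seconds resets the timer. Only the sustained run starting at 240 seconds reaches the five-minute hold.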
Troubleshooting
Common Issues and Solutions
| Symptom | Likely Cause | Action |
|---|---|---|
| 401 Unauthorized | Expired/invalid token | Check token expiry, regenerate |
| 403 Forbidden | Missing scopes | Verify scope assignment |
| 500 Internal Error | Backend failure | Check backend logs, fault sequence |
| 503 Service Unavailable | Backend down | Check endpoint connectivity |
| Slow responses | Resource exhaustion | Check JVM heap, thread count, DB pool |
| OOM crash | Insufficient heap | Increase -Xmx, check for memory leaks |
| Connection timeout | Network or firewall | Verify connectivity, increase timeout |
Diagnostic Commands
# Check server status
curl -k https://localhost:9443/services/Version
# Thread dump (find deadlocks, stuck threads)
kill -3 <PID>
# Or
jstack <PID> > thread_dump.txt
# Heap dump (analyze memory)
jmap -dump:format=b,file=heap.hprof <PID>
# Check open file descriptors
ls /proc/<PID>/fd | wc -l
# Check active connections
ss -tnp | grep <PID> | wc -l
# Monitor GC in real-time
jstat -gcutil <PID> 1000
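A thread dump is easier to read after a first pass that counts threads per state, since a large number of BLOCKED threads usually narrows the search. The sketch below tallies the `java.lang.Thread.State` lines that jstack writes for each thread. The dump excerpt is shortened and uses made-up thread names.

```python
import re
from collections import Counter

# jstack prints one "java.lang.Thread.State: <STATE>" line per Java thread.
STATE_LINE = re.compile(r"java\.lang\.Thread\.State: ([A-Z_]+)")


def count_thread_states(dump_text):
    """Return a Counter mapping thread state to number of threads."""
    return Counter(STATE_LINE.findall(dump_text))


# Shortened, illustrative excerpt; thread names are hypothetical.
DUMP = """\
"worker-1" #31 prio=5 os_prio=0 tid=0x01 nid=0x11 waiting on condition
   java.lang.Thread.State: WAITING (parking)

"worker-2" #32 prio=5 os_prio=0 tid=0x02 nid=0x12 runnable
   java.lang.Thread.State: RUNNABLE

"worker-3" #33 prio=5 os_prio=0 tid=0x03 nid=0x13 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)

"worker-4" #34 prio=5 os_prio=0 tid=0x04 nid=0x14 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)

"timer-1" #35 daemon prio=5 os_prio=0 tid=0x05 nid=0x15 sleeping
   java.lang.Thread.State: TIMED_WAITING (sleeping)
"""

counts = count_thread_states(DUMP)
for state, number in counts.most_common():
    print(f"{state:14} {number}")
```

To run it against a real dump, read the file produced by `jstack <PID> > thread_dump.txt` and pass its contents to `count_thread_states`.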
Analyzing Slow APIs
# 1. Enable correlation logging
# 2. Send request with correlation ID
curl -H "activityid: debug-001" https://gateway:8243/api/slow-endpoint
# 3. Analyze correlation log for time spent in each stage
grep "debug-001" repository/logs/correlation.log
# Output shows time per step:
# debug-001|HTTP-Listener|...|0ms|InSequence|Start
# debug-001|HTTP-Sender|...|1250ms|Backend|Call
# debug-001|HTTP-Listener|...|1255ms|OutSequence|End
# → Backend call took 1250ms; investigate backend
Log Level Changes at Runtime
# MI CLI: change log level without restart
mi log-level update org.apache.synapse DEBUG
# Revert
mi log-level update org.apache.synapse INFO
Key Takeaways
- Correlation logging traces requests end-to-end across components
- Wire logs are essential for debugging but too verbose for production
- Prometheus + Grafana provides real-time metrics and dashboards
- ELK stack centralizes logs for search and analysis
- OpenTelemetry/Jaeger adds distributed tracing across services
- Health check endpoints enable load balancer and Kubernetes probes
- Thread dumps and heap dumps diagnose JVM-level issues
- Runtime log level changes avoid restarts during incident investigation
Next Steps
Continue to Chapter 12: Advanced Topics to learn about advanced integration patterns, streaming, and microservices.