Add Prometheus HTTP service discovery endpoint for easy discovery of all workers in Docker image (#19336)
Add Prometheus [HTTP service discovery](https://prometheus.io/docs/prometheus/latest/http_sd/)
endpoint for easy discovery of all workers in Docker image.
Follow-up to https://github.com/element-hq/synapse/pull/19324
This came out of wanting to [run a load
test](https://github.com/element-hq/synapse-rust-apps/pull/397) against
the Complement Docker image of Synapse and see metrics from the
homeserver.
`GET http://<synapse_container>:9469/metrics/service_discovery`
```json5
[
{
"targets": [ "<host>", ... ],
"labels": {
"<labelname>": "<labelvalue>", ...
}
},
...
]
```
The metrics from each worker can also be accessed via
`http://<synapse_container>:9469/metrics/worker/<worker_name>` which is
what the service discovery response points to behind the scenes. This
way, you only need to expose a single port (9469) to access all metrics.
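
To illustrate how the response maps onto per-worker URLs, here is a minimal sketch that expands service discovery entries into the URLs a scraper would end up requesting. It assumes Prometheus uses the `__metrics_path__` label as the scrape path when the discovery response sets it. The `scrape_urls` helper and its parameters are hypothetical names for illustration, not part of Synapse or Prometheus.

```python
from urllib.parse import urlunsplit


def scrape_urls(sd_entries, scheme="http", default_path="/metrics"):
    """Expand HTTP service discovery entries into scrape URLs.

    Hypothetical helper: each entry contributes one URL per target, using the
    `__metrics_path__` label when present and `default_path` otherwise.
    """
    urls = []
    for entry in sd_entries:
        labels = entry.get("labels", {})
        path = labels.get("__metrics_path__", default_path)
        for target in entry["targets"]:
            urls.append(urlunsplit((scheme, target, path, "", "")))
    return urls


# Two entries copied from the shape of the service discovery response.
sample_entries = [
    {
        "targets": ["localhost:9469"],
        "labels": {
            "job": "event_persister",
            "index": "1",
            "__metrics_path__": "/metrics/worker/event_persister1",
        },
    },
    {
        "targets": ["localhost:9469"],
        "labels": {
            "job": "main",
            "index": "1",
            "__metrics_path__": "/metrics/worker/main",
        },
    },
]

for url in scrape_urls(sample_entries):
    print(url)
```

Every URL shares the same host and port, which is why exposing port 9469 alone is enough.
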
<details>
<summary>Real HTTP service discovery response</summary>
```json5
[
{
"targets": [
"localhost:9469"
],
"labels": {
"job": "event_persister",
"index": "1",
"__metrics_path__": "/metrics/worker/event_persister1"
}
},
{
"targets": [
"localhost:9469"
],
"labels": {
"job": "event_persister",
"index": "2",
"__metrics_path__": "/metrics/worker/event_persister2"
}
},
{
"targets": [
"localhost:9469"
],
"labels": {
"job": "background_worker",
"index": "1",
"__metrics_path__": "/metrics/worker/background_worker1"
}
},
{
"targets": [
"localhost:9469"
],
"labels": {
"job": "event_creator",
"index": "1",
"__metrics_path__": "/metrics/worker/event_creator1"
}
},
{
"targets": [
"localhost:9469"
],
"labels": {
"job": "user_dir",
"index": "1",
"__metrics_path__": "/metrics/worker/user_dir1"
}
},
{
"targets": [
"localhost:9469"
],
"labels": {
"job": "media_repository",
"index": "1",
"__metrics_path__": "/metrics/worker/media_repository1"
}
},
{
"targets": [
"localhost:9469"
],
"labels": {
"job": "federation_inbound",
"index": "1",
"__metrics_path__": "/metrics/worker/federation_inbound1"
}
},
{
"targets": [
"localhost:9469"
],
"labels": {
"job": "federation_reader",
"index": "1",
"__metrics_path__": "/metrics/worker/federation_reader1"
}
},
{
"targets": [
"localhost:9469"
],
"labels": {
"job": "federation_sender",
"index": "1",
"__metrics_path__": "/metrics/worker/federation_sender1"
}
},
{
"targets": [
"localhost:9469"
],
"labels": {
"job": "synchrotron",
"index": "1",
"__metrics_path__": "/metrics/worker/synchrotron1"
}
},
{
"targets": [
"localhost:9469"
],
"labels": {
"job": "client_reader",
"index": "1",
"__metrics_path__": "/metrics/worker/client_reader1"
}
},
{
"targets": [
"localhost:9469"
],
"labels": {
"job": "appservice",
"index": "1",
"__metrics_path__": "/metrics/worker/appservice1"
}
},
{
"targets": [
"localhost:9469"
],
"labels": {
"job": "pusher",
"index": "1",
"__metrics_path__": "/metrics/worker/pusher1"
}
},
{
"targets": [
"localhost:9469"
],
"labels": {
"job": "device_lists",
"index": "1",
"__metrics_path__": "/metrics/worker/device_lists1"
}
},
{
"targets": [
"localhost:9469"
],
"labels": {
"job": "device_lists",
"index": "2",
"__metrics_path__": "/metrics/worker/device_lists2"
}
},
{
"targets": [
"localhost:9469"
],
"labels": {
"job": "stream_writers",
"index": "1",
"__metrics_path__": "/metrics/worker/stream_writers1"
}
},
{
"targets": [
"localhost:9469"
],
"labels": {
"job": "main",
"index": "1",
"__metrics_path__": "/metrics/worker/main"
}
}
]
```
</details>
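
The response above follows a regular pattern: one entry per worker instance, with `job` set to the worker type, `index` set to the instance number, and `__metrics_path__` set to `/metrics/worker/<type><index>`, plus a final entry for the main process. The sketch below reproduces that pattern from a mapping of worker type to instance count. It is not Synapse's actual implementation; `build_sd_entries`, `worker_counts`, and the default `target` are assumptions made for illustration.

```python
import json


def build_sd_entries(worker_counts, target="localhost:9469"):
    """Build HTTP service discovery entries in the shape shown above.

    Hypothetical helper: `worker_counts` maps a worker type to how many
    instances of it are running.
    """
    entries = []
    for worker_type, count in worker_counts.items():
        # Instances are numbered from 1, matching names like `event_persister1`.
        for index in range(1, count + 1):
            entries.append(
                {
                    "targets": [target],
                    "labels": {
                        "job": worker_type,
                        "index": str(index),
                        "__metrics_path__": f"/metrics/worker/{worker_type}{index}",
                    },
                }
            )
    # The main process appears last and has no numeric suffix in its path.
    entries.append(
        {
            "targets": [target],
            "labels": {
                "job": "main",
                "index": "1",
                "__metrics_path__": "/metrics/worker/main",
            },
        }
    )
    return entries


print(json.dumps(build_sd_entries({"event_persister": 2, "pusher": 1}), indent=2))
```

Label values are emitted as strings (`"1"`, not `1`) because the HTTP service discovery format expects string label values.
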
And here is how it ends up as targets in Prometheus
(http://localhost:9090/targets):
(image)
### Testing strategy
1. Make sure your firewall allows the Docker containers to communicate
with the host (`host.docker.internal`) so they can access exposed ports of
other Docker containers. We want to allow Prometheus to access the
Synapse container and Grafana to access the Prometheus container.
- `sudo ufw allow in on docker0 comment "Allow traffic from the default
Docker network to the host machine (host.docker.internal)"`
- `sudo ufw allow in on br-+ comment "(from Matrix Complement testing)
Allow traffic from custom Docker networks to the host machine
(host.docker.internal)"`
- [Complement firewall
docs](ee6acd9154/README.md (potential-conflict-with-firewall-software))
1. Build the Docker image for Synapse: `docker build -t
matrixdotorg/synapse -f docker/Dockerfile . && docker build -t
matrixdotorg/synapse-workers -f docker/Dockerfile-workers .`
([docs](7a24fafbc3/docker/README-testing.md (building-and-running-the-images-manually)))
1. Start Synapse:
```
docker run -d --name synapse \
--mount type=volume,src=synapse-data,dst=/data \
-e SYNAPSE_SERVER_NAME=my.docker.synapse.server \
-e SYNAPSE_REPORT_STATS=no \
-e SYNAPSE_ENABLE_METRICS=1 \
-p 8008:8008 \
-p 9469:9469 \
matrixdotorg/synapse-workers:latest
```
- Also try with workers:
```
docker run -d --name synapse \
--mount type=volume,src=synapse-data,dst=/data \
-e SYNAPSE_SERVER_NAME=my.docker.synapse.server \
-e SYNAPSE_REPORT_STATS=no \
-e SYNAPSE_ENABLE_METRICS=1 \
-e SYNAPSE_WORKER_TYPES="\
event_persister:2, \
background_worker, \
event_creator, \
user_dir, \
media_repository, \
federation_inbound, \
federation_reader, \
federation_sender, \
synchrotron, \
client_reader, \
appservice, \
pusher, \
device_lists:2, \
stream_writers=account_data+presence+receipts+to_device+typing" \
-p 8008:8008 \
-p 9469:9469 \
matrixdotorg/synapse-workers:latest
```
1. You should be able to see the Prometheus service discovery endpoint at
http://localhost:9469/metrics/service_discovery
1. Create a Prometheus config (`prometheus.yml`)
```yaml
global:
  scrape_interval: 15s
  scrape_timeout: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: synapse
    scrape_interval: 15s
    metrics_path: /_synapse/metrics
    scheme: http
    # We set `honor_labels` so that each service can set their own `job` label
    #
    # > honor_labels controls how Prometheus handles conflicts between labels that are
    # > already present in scraped data and labels that Prometheus would attach
    # > server-side ("job" and "instance" labels, manually configured target
    # > labels, and labels generated by service discovery implementations).
    # >
    # > *-- https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config*
    honor_labels: true
    # Use HTTP service discovery
    #
    # Reference:
    #  - https://prometheus.io/docs/prometheus/latest/http_sd/
    #  - https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config
    http_sd_configs:
      - url: 'http://localhost:9469/metrics/service_discovery'
```
1. Start Prometheus (update the volume bind mount to the config you just
saved somewhere):
```
docker run \
--detach \
--name=prometheus \
--add-host host.docker.internal:host-gateway \
-p 9090:9090 \
-v ~/Documents/code/random/prometheus-config/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus
```
1. Make sure you're seeing some data in Prometheus. On
http://localhost:9090/query, search for `synapse_build_info`
1. Start [Grafana](https://hub.docker.com/r/grafana/grafana)
```
docker run -d --name=grafana --add-host host.docker.internal:host-gateway -p 3000:3000 grafana/grafana
```
1. Visit the Grafana dashboard, http://localhost:3000/ (Credentials:
`admin`/`admin`)
1. **Connections** -> **Data Sources** -> **Add data source** ->
**Prometheus**
- Prometheus server URL: `http://host.docker.internal:9090`
1. Import the Synapse dashboard:
https://github.com/element-hq/synapse/blob/develop/contrib/grafana/synapse.json
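
As a complement to checking the targets page by hand, the sketch below lists any active Prometheus targets that are not healthy. It assumes Prometheus's `/api/v1/targets` API, whose response carries `data.activeTargets` entries with `labels`, `scrapeUrl`, `health`, and `lastError` fields. `fetch_json` and `unhealthy_targets` are hypothetical helper names, and the sample payload is hand-written for illustration rather than captured output.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=5.0):
    """Fetch a URL and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def unhealthy_targets(targets_payload):
    """Return (job, scrape_url, last_error) for each active target not "up"."""
    active = targets_payload["data"]["activeTargets"]
    return [
        (t["labels"].get("job", ""), t["scrapeUrl"], t.get("lastError", ""))
        for t in active
        if t.get("health") != "up"
    ]


# Hand-written sample in the shape of a /api/v1/targets response.
sample_payload = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "main", "index": "1"},
                "scrapeUrl": "http://localhost:9469/metrics/worker/main",
                "health": "up",
                "lastError": "",
            },
            {
                "labels": {"job": "pusher", "index": "1"},
                "scrapeUrl": "http://localhost:9469/metrics/worker/pusher1",
                "health": "down",
                "lastError": "connection refused",
            },
        ]
    },
}

for job, url, error in unhealthy_targets(sample_payload):
    print(f"{job}: {url} ({error})")
```

Against the setup from the steps above, `unhealthy_targets(fetch_json("http://localhost:9090/api/v1/targets"))` should return an empty list once every worker has been scraped successfully.
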