- Update `synapse_xxx` (server-level) metrics to use
`server_name="$server_name",` instead of `instance="$instance"`
- Add `synapse_server_name_info` metric to map Synapse `server_name`s to
the `instance`s they're hosted on.
- For process level metrics, update to use `xxx * on (instance, job,
index) group_left(server_name)
synapse_server_name_info{server_name="$server_name"}`
All of the changes here are backwards compatible with whatever people
were doing before with their Prometheus/Grafana dashboards.
Previously, the recommendation was to use the `instance` label to group
everything under the same server (803e4b4d88/docs/metrics-howto.md (L93-L147))
But the `instance` label actually has a special meaning and we're
actually abusing it by using it that way:
> `instance`: The `<host>:<port>` part of the target's URL that was
scraped.
>
> *--
https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series*
Since https://github.com/element-hq/synapse/issues/18592 (Synapse
`v1.139.0`), we now have the `server_name` label to use instead.
---
Additionally, the assumption that a single process is serving a single
server is no longer true with [Synapse Pro for small
hosts](https://docs.element.io/latest/element-server-suite-pro/synapse-pro-for-small-hosts/overview/).
Part of https://github.com/element-hq/synapse-small-hosts/issues/106
### Motivating use case
Although this change also benefits [Synapse Pro for small
hosts](https://docs.element.io/latest/element-server-suite-pro/synapse-pro-for-small-hosts/overview/)
(https://github.com/element-hq/synapse-small-hosts/issues/106), this is
actually spawning from adding Prometheus metrics to our workerized
Docker image (https://github.com/element-hq/synapse/pull/19324,
https://github.com/element-hq/synapse/pull/19336) with a more correct
label setup (without `instance`) and wanting the dashboard to be better.
### Testing strategy
1. Make sure your firewall allows the Docker containers to communicate
to the host (`host.docker.internal`) so they can access exposed ports of
other Docker containers. We want to allow Synapse to access the
Prometheus container and Grafana to access to the Prometheus container.
- `sudo ufw allow in on docker0 comment "Allow traffic from the default
Docker network to the host machine (host.docker.internal)"`
- `sudo ufw allow in on br-+ comment "(from Matrix Complement testing)
Allow traffic from custom Docker networks to the host machine
(host.docker.internal)"`
- [Complement firewall
docs](ee6acd9154/README.md (potential-conflict-with-firewall-software))
1. Build the Docker image for Synapse: `docker build -t
matrixdotorg/synapse -f docker/Dockerfile .`
([docs](7a24fafbc3/docker/README-testing.md (building-and-running-the-images-manually)))
1. Generate config for Synapse:
```
docker run -it --rm \
--mount type=volume,src=synapse-data,dst=/data \
-e SYNAPSE_SERVER_NAME=my.docker.synapse.server \
-e SYNAPSE_REPORT_STATS=yes \
-e SYNAPSE_ENABLE_METRICS=1 \
matrixdotorg/synapse:latest generate
```
1. Start Synapse:
```
docker run -d --name synapse \
--mount type=volume,src=synapse-data,dst=/data \
-p 8008:8008 \
-p 19090:19090 \
matrixdotorg/synapse:latest
```
1. You should be able to see metrics from Synapse at
http://localhost:19090/_synapse/metrics
1. Create a Prometheus config (`prometheus.yml`)
```yaml
global:
scrape_interval: 15s
scrape_timeout: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: prometheus
scrape_interval: 15s
metrics_path: /_synapse/metrics
scheme: http
static_configs:
- targets:
# This should point to the Synapse metrics listener (we're using
`host.docker.internal` because this is from within the Prometheus
container)
- host.docker.internal:19090
```
1. Start Prometheus (update the volume bind mount to the config you just
saved somewhere):
```
docker run \
--detach \
--name=prometheus \
--add-host host.docker.internal:host-gateway \
-p 9090:9090 \
-v
~/Documents/code/random/prometheus-config/prometheus.yml:/etc/prometheus/prometheus.yml
\
prom/prometheus
```
1. Make sure you're seeing some data in Prometheus. On
http://localhost:9090/query, search for `synapse_build_info`
1. Start [Grafana](https://hub.docker.com/r/grafana/grafana)
```
docker run -d --name=grafana --add-host
host.docker.internal:host-gateway -p 3000:3000 grafana/grafana
```
1. Visit the Grafana dashboard, http://localhost:3000/ (Credentials:
`admin`/`admin`)
1. **Connections** -> **Data Sources** -> **Add data source** ->
**Prometheus**
- Prometheus server URL: `http://host.docker.internal:9090`
1. Import the Synapse dashboard: `contrib/grafana/synapse.json`
To test workers, you can use the testing strategy from
https://github.com/element-hq/synapse/pull/19336 (assumes both changes
from this PR and the other PR are combined)
4.2 KiB
How to monitor Synapse metrics using Prometheus
-
Install Prometheus:
Follow instructions at http://prometheus.io/docs/introduction/install/
-
Enable Synapse metrics:
In
homeserver.yaml, make sureenable_metricsis set toTrue. -
Enable the
/_synapse/metricsSynapse endpoint that Prometheus uses to collect data:There are two methods of enabling the metrics endpoint in Synapse.
The first serves the metrics as a part of the usual web server and can be enabled by adding the
metricsresource to the existing listener as such as in this example:listeners: - port: 8008 tls: false type: http x_forwarded: true bind_addresses: ['::1', '127.0.0.1'] resources: # added "metrics" in this line - names: [client, federation, metrics] compress: falseThis provides a simple way of adding metrics to your Synapse installation, and serves under
/_synapse/metrics. If you do not wish your metrics be publicly exposed, you will need to either filter it out at your load balancer, or use the second method.The second method runs the metrics server on a different port, in a different thread to Synapse. This can make it more resilient to heavy load meaning metrics cannot be retrieved, and can be exposed to just internal networks easier. The served metrics are available over HTTP only, and will be available at
/_synapse/metrics.Add a new listener to homeserver.yaml as in this example:
listeners: - port: 8008 tls: false type: http x_forwarded: true bind_addresses: ['::1', '127.0.0.1'] resources: - names: [client, federation] compress: false # beginning of the new metrics listener - port: 9000 type: metrics bind_addresses: ['::1', '127.0.0.1'] -
Restart Synapse.
-
Add a Prometheus target for Synapse.
It needs to set the
metrics_pathto a non-default value (underscrape_configs):- job_name: "synapse" scrape_interval: 15s metrics_path: "/_synapse/metrics" static_configs: - targets: ["my.server.here:port"]where
my.server.hereis the IP address of Synapse, andportis the listener port configured with themetricsresource.If your prometheus is older than 1.5.2, you will need to replace
static_configsin the above withtarget_groups. -
Restart Prometheus.
-
Consider using the grafana dashboard and required recording rules
Monitoring workers
To monitor a Synapse installation using workers, every worker needs to be monitored independently, in addition to the main homeserver process. This is because workers don't send their metrics to the main homeserver process, but expose them directly (if they are configured to do so).
To allow collecting metrics from a worker, you need to add a
metrics listener to its configuration, by adding the following
under worker_listeners:
- type: metrics
bind_address: ''
port: 9101
The bind_address and port parameters should be set so that
the resulting listener can be reached by prometheus, and they
don't clash with an existing worker.
With this example, the worker's metrics would then be available
on http://127.0.0.1:9101.
Example Prometheus target for Synapse with workers:
- job_name: "synapse"
scrape_interval: 15s
metrics_path: "/_synapse/metrics"
static_configs:
- targets: ["my.server.here:port"]
labels:
job: "master"
index: 1
- targets: ["my.workerserver.here:port"]
labels:
job: "generic_worker"
index: 1
- targets: ["my.workerserver.here:port"]
labels:
job: "generic_worker"
index: 2
- targets: ["my.workerserver.here:port"]
labels:
job: "media_repository"
index: 1
Labels (job, index) can be defined as anything.
The labels are used to group graphs in grafana.