Eric Eastwood 58f59ffbcb Refactor Grafana dashboard to use server_name label (#19337)

- Update `synapse_xxx` (server-level) metrics to use
`server_name="$server_name",` instead of `instance="$instance"`
- Add `synapse_server_name_info` metric to map Synapse `server_name`s to
the `instance`s they're hosted on.
- For process level metrics, update to use `xxx * on (instance, job,
index) group_left(server_name)
synapse_server_name_info{server_name="$server_name"}`
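
To make the join in the last bullet concrete, here is a minimal Python sketch that emulates what `* on (instance, job, index) group_left(server_name)` does to in-memory samples. It is an illustration, not part of the dashboard: the sample hosts and values are hypothetical, and it assumes `synapse_server_name_info` has the value `1`, as is conventional for `_info` metrics.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Sample = Tuple[Labels, float]

# Labels the two sides are matched on, as in `on (instance, job, index)`.
MATCH_LABELS = ("instance", "job", "index")


def join_with_server_name(
    process_samples: List[Sample],
    info_samples: List[Sample],
    server_name: str,
) -> List[Sample]:
    """Emulate `xxx * on (...) group_left(server_name) info{server_name=...}`."""
    # Right-hand side: keep only info series for the requested server,
    # keyed by the match labels (a missing label counts as empty).
    info_by_key = {}
    for labels, value in info_samples:
        if labels.get("server_name") == server_name:
            key = tuple(labels.get(name, "") for name in MATCH_LABELS)
            info_by_key[key] = value

    joined = []
    for labels, value in process_samples:
        key = tuple(labels.get(name, "") for name in MATCH_LABELS)
        if key not in info_by_key:
            continue  # No matching info series, so the sample is dropped.
        out = dict(labels)
        out["server_name"] = server_name  # Label copied over by group_left.
        joined.append((out, value * info_by_key[key]))
    return joined


# Hypothetical process-level samples from two different hosts.
process = [
    ({"instance": "host-a:9000", "job": "master", "index": "1"}, 120.0),
    ({"instance": "host-b:9000", "job": "master", "index": "1"}, 75.0),
]
# Hypothetical info series mapping each process to the server it hosts.
info = [
    ({"instance": "host-a:9000", "job": "master", "index": "1",
      "server_name": "example.org"}, 1.0),
    ({"instance": "host-b:9000", "job": "master", "index": "1",
      "server_name": "other.example"}, 1.0),
]

result = join_with_server_name(process, info, "example.org")
print(result)
```

Because the info metric is assumed to be `1`, the multiplication leaves the process-level value unchanged and only filters and relabels the series.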

All of the changes here are backwards compatible with whatever people
were doing before with their Prometheus/Grafana dashboards.

Previously, the recommendation was to use the `instance` label to group
everything under the same server (803e4b4d88/docs/metrics-howto.md (L93-L147))

But the `instance` label has a special meaning, and we're
abusing it by using it that way:

> `instance`: The `<host>:<port>` part of the target's URL that was
scraped.
>
> *--
https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series*

Since https://github.com/element-hq/synapse/issues/18592 (Synapse
`v1.139.0`), we now have the `server_name` label to use instead.


---

Additionally, the assumption that a single process is serving a single
server is no longer true with [Synapse Pro for small
hosts](https://docs.element.io/latest/element-server-suite-pro/synapse-pro-for-small-hosts/overview/).

Part of https://github.com/element-hq/synapse-small-hosts/issues/106

### Motivating use case

Although this change also benefits [Synapse Pro for small
hosts](https://docs.element.io/latest/element-server-suite-pro/synapse-pro-for-small-hosts/overview/)
(https://github.com/element-hq/synapse-small-hosts/issues/106), it
actually stems from adding Prometheus metrics to our workerized
Docker image (https://github.com/element-hq/synapse/pull/19324,
https://github.com/element-hq/synapse/pull/19336) with a more correct
label setup (without `instance`) and wanting the dashboard to be better.



### Testing strategy

1. Make sure your firewall allows the Docker containers to communicate
with the host (`host.docker.internal`) so they can access exposed ports of
other Docker containers. We want to allow Prometheus to access the
Synapse container and Grafana to access the Prometheus container.
- `sudo ufw allow in on docker0 comment "Allow traffic from the default
Docker network to the host machine (host.docker.internal)"`
- `sudo ufw allow in on br-+ comment "(from Matrix Complement testing)
Allow traffic from custom Docker networks to the host machine
(host.docker.internal)"`
- [Complement firewall
docs](ee6acd9154/README.md (potential-conflict-with-firewall-software))
1. Build the Docker image for Synapse: `docker build -t
matrixdotorg/synapse -f docker/Dockerfile .`
([docs](7a24fafbc3/docker/README-testing.md (building-and-running-the-images-manually)))
 1. Generate config for Synapse:
    ```
    docker run -it --rm \
        --mount type=volume,src=synapse-data,dst=/data \
        -e SYNAPSE_SERVER_NAME=my.docker.synapse.server \
        -e SYNAPSE_REPORT_STATS=yes \
        -e SYNAPSE_ENABLE_METRICS=1 \
        matrixdotorg/synapse:latest generate
    ```
 1. Start Synapse:
     ```
    docker run -d --name synapse \
        --mount type=volume,src=synapse-data,dst=/data \
        -p 8008:8008 \
        -p 19090:19090 \
        matrixdotorg/synapse:latest
    ```
1. You should be able to see metrics from Synapse at
http://localhost:19090/_synapse/metrics
 1. Create a Prometheus config (`prometheus.yml`)
    ```yaml
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
      evaluation_interval: 15s
    
    scrape_configs:
      - job_name: prometheus
        scrape_interval: 15s
        metrics_path: /_synapse/metrics
        scheme: http
        static_configs:
          - targets:
              # This should point to the Synapse metrics listener (we're using
              # `host.docker.internal` because this is from within the Prometheus
              # container)
              - host.docker.internal:19090
    ```
1. Start Prometheus (update the volume bind mount to the config you just
saved somewhere):
    ```
    docker run \
        --detach \
        --name=prometheus \
        --add-host host.docker.internal:host-gateway \
        -p 9090:9090 \
        -v ~/Documents/code/random/prometheus-config/prometheus.yml:/etc/prometheus/prometheus.yml \
        prom/prometheus
    ```
1. Make sure you're seeing some data in Prometheus. On
http://localhost:9090/query, search for `synapse_build_info`
 1. Start [Grafana](https://hub.docker.com/r/grafana/grafana)
    ```
    docker run -d --name=grafana --add-host host.docker.internal:host-gateway -p 3000:3000 grafana/grafana
    ```
1. Visit the Grafana dashboard, http://localhost:3000/ (Credentials:
`admin`/`admin`)
1. **Connections** -> **Data Sources** -> **Add data source** ->
**Prometheus**
     - Prometheus server URL: `http://host.docker.internal:9090`
 1. Import the Synapse dashboard: `contrib/grafana/synapse.json`
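
The check in the "You should be able to see metrics" step can also be scripted. The following is a rough sketch, assuming the metrics listener is reachable at the URL used in the steps above. The helper names and the sample text (including its label) are made up for illustration. The parser only extracts metric names from the Prometheus text exposition format and ignores everything else.

```python
import urllib.request
from typing import Set

# URL from the testing steps above; adjust if the port mapping differs.
METRICS_URL = "http://localhost:19090/_synapse/metrics"


def metric_names(exposition_text: str) -> Set[str]:
    """Collect metric names from Prometheus text exposition output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and HELP/TYPE comments.
        # The name ends at the first "{" (labels) or at the first whitespace.
        head = line.split("{", 1)[0].split()
        if head:
            names.add(head[0])
    return names


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Download the raw metrics text (needs the Synapse container running)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Made-up sample so the parser can be exercised without a running server.
SAMPLE = """# HELP synapse_build_info Build information
# TYPE synapse_build_info gauge
synapse_build_info{version="hypothetical"} 1.0
process_cpu_seconds_total 12.5
"""

names = metric_names(SAMPLE)
print(sorted(names))
```

With the containers running, `"synapse_build_info" in metric_names(fetch_metrics())` would be the equivalent of the manual check.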

To test workers, you can use the testing strategy from
https://github.com/element-hq/synapse/pull/19336 (assumes both changes
from this PR and the other PR are combined)

2026-01-14 17:57:42 -06:00

4.2 KiB

# How to monitor Synapse metrics using Prometheus

1. Install Prometheus:

   Follow instructions at http://prometheus.io/docs/introduction/install/

1. Enable Synapse metrics:

   In `homeserver.yaml`, make sure `enable_metrics` is set to `True`.

1. Enable the `/_synapse/metrics` Synapse endpoint that Prometheus uses to collect data:

There are two methods of enabling the metrics endpoint in Synapse.

The first serves the metrics as part of the usual web server and can be enabled by adding the `metrics` resource to an existing listener, as in this example:
```
listeners:
  - port: 8008
    tls: false
    type: http
    x_forwarded: true
    bind_addresses: ['::1', '127.0.0.1']

    resources:
      # added "metrics" in this line
      - names: [client, federation, metrics]
        compress: false
```
This provides a simple way of adding metrics to your Synapse installation, and serves them under `/_synapse/metrics`. If you do not want your metrics to be publicly exposed, you will need to either filter them out at your load balancer or use the second method.

The second method runs the metrics server on a different port, in a different thread to Synapse. This can make it more resilient under heavy load, which could otherwise prevent metrics from being retrieved, and makes it easier to expose the endpoint to internal networks only. The served metrics are available over HTTP only, and will be available at `/_synapse/metrics`.

Add a new listener to `homeserver.yaml` as in this example:
```
listeners:
  - port: 8008
    tls: false
    type: http
    x_forwarded: true
    bind_addresses: ['::1', '127.0.0.1']

    resources:
      - names: [client, federation]
        compress: false

  # beginning of the new metrics listener
  - port: 9000
    type: metrics
    bind_addresses: ['::1', '127.0.0.1']
```
1. Restart Synapse.

1. Add a Prometheus target for Synapse.

   The target needs to set `metrics_path` to a non-default value (under `scrape_configs`):
```
  - job_name: "synapse"
    scrape_interval: 15s
    metrics_path: "/_synapse/metrics"
    static_configs:
      - targets: ["my.server.here:port"]
```
where `my.server.here` is the IP address of Synapse, and `port` is the listener port configured with the `metrics` resource.

If your Prometheus is older than 1.5.2, you will need to replace `static_configs` in the above with `target_groups`.

1. Restart Prometheus.

1. Consider using the Grafana dashboard and required recording rules.
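
After Prometheus has been restarted, one way to confirm that the Synapse target is being scraped is to ask the Prometheus HTTP API for a Synapse metric. The sketch below assumes Prometheus is reachable at `http://localhost:9090` (adjust to your setup) and uses `synapse_build_info` as the probe metric. The helper names are hypothetical and the sample payload is made up to show the response shape.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Assumption: Prometheus listens here; change to match your installation.
PROMETHEUS_URL = "http://localhost:9090"


def instant_query_url(base_url: str, expr: str) -> str:
    """Build the URL for an instant query against /api/v1/query."""
    query = urllib.parse.urlencode({"query": expr})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def series_labels(payload: dict) -> List[Dict[str, str]]:
    """Return the label sets of all series in an instant query response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return [item["metric"] for item in payload["data"]["result"]]


def query_prometheus(expr: str, base_url: str = PROMETHEUS_URL,
                     timeout: float = 5.0) -> List[Dict[str, str]]:
    """Run an instant query (needs a running Prometheus)."""
    url = instant_query_url(base_url, expr)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return series_labels(json.load(response))


# Made-up response in the documented shape, for offline illustration.
sample_payload = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "synapse_build_info", "job": "synapse"},
             "value": [1700000000, "1"]},
        ],
    },
}

url = instant_query_url(PROMETHEUS_URL, "synapse_build_info")
labels = series_labels(sample_payload)
print(url)
print(labels)
```

An empty list from `query_prometheus("synapse_build_info")` would suggest the target is not being scraped, for example because `metrics_path` or the target address is wrong.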

## Monitoring workers

To monitor a Synapse installation using workers, every worker needs to be monitored independently, in addition to the main homeserver process. This is because workers don't send their metrics to the main homeserver process, but expose them directly (if they are configured to do so).

To allow collecting metrics from a worker, you need to add a metrics listener to its configuration, by adding the following under `worker_listeners`:

  - type: metrics
    bind_address: ''
    port: 9101

The `bind_address` and `port` parameters should be set so that the resulting listener can be reached by Prometheus and does not clash with the listener of an existing worker. With this example, the worker's metrics would then be available on http://127.0.0.1:9101.

Example Prometheus target for Synapse with workers:

  - job_name: "synapse"
    scrape_interval: 15s
    metrics_path: "/_synapse/metrics"
    static_configs:
      - targets: ["my.server.here:port"]
        labels:
          job: "master"
          index: 1
      - targets: ["my.workerserver.here:port"]
        labels:
          job: "generic_worker"
          index: 1
      - targets: ["my.workerserver.here:port"]
        labels:
          job: "generic_worker"
          index: 2
      - targets: ["my.workerserver.here:port"]
        labels:
          job: "media_repository"
          index: 1

The label values (`job`, `index`) can be set to anything. The labels are used to group graphs in Grafana.
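
For installations with many workers, the `static_configs` entries can be generated rather than written by hand. This is a minimal sketch and not a Synapse or Prometheus tool: the helper names and host names are hypothetical, and it assumes `index` should simply count up from 1 within each `job`, as in the example above.

```python
from typing import Dict, List, Tuple


def worker_static_configs(processes: List[Tuple[str, str]]) -> List[dict]:
    """Turn (job, target) pairs into static_configs entries.

    The index label counts up from 1 separately for each job name.
    """
    counters: Dict[str, int] = {}
    configs = []
    for job, target in processes:
        counters[job] = counters.get(job, 0) + 1
        configs.append({
            "targets": [target],
            "labels": {"job": job, "index": counters[job]},
        })
    return configs


def render_yaml(configs: List[dict], indent: int = 6) -> str:
    """Render the entries in the same layout as the example above."""
    pad = " " * indent
    lines = []
    for config in configs:
        targets = ", ".join(f'"{target}"' for target in config["targets"])
        job = config["labels"]["job"]
        index = config["labels"]["index"]
        lines.append(f"{pad}- targets: [{targets}]")
        lines.append(f"{pad}  labels:")
        lines.append(f'{pad}    job: "{job}"')
        lines.append(f"{pad}    index: {index}")
    return "\n".join(lines)


# Hypothetical hosts and ports, for illustration only.
processes = [
    ("master", "synapse.internal:9000"),
    ("generic_worker", "synapse.internal:9101"),
    ("generic_worker", "synapse.internal:9102"),
    ("media_repository", "synapse.internal:9103"),
]

configs = worker_static_configs(processes)
print(render_yaml(configs))
```

The printed lines can be pasted under `static_configs:` in the `synapse` job.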
