33 Commits

Author SHA1 Message Date
Eric Eastwood 826a7dd29a Update "Event Send Time Quantiles" graph to only use dots for the event persistence rate (#19399)
This is the same thing we already do in the [`matrix.org`
dashboard](https://grafana.matrix.org/d/000000012/synapse). Although
the purple dots aren't new (they were introduced in
https://github.com/matrix-org/synapse/pull/10001), you can see that
this was the intention in https://github.com/element-hq/synapse/pull/18510.
I think this was just how our contrib dashboard looked at the time, and
perhaps a Grafana version mismatch is why it didn't translate.
2026-01-22 14:07:22 -06:00
Eric Eastwood d6b45a7c8c Update and align Grafana dashboard to use regex matching for job=~"$job" (#19400)
We're already using `job=~"$job"` in the majority of the other panels.
This is just aligning the stragglers.

### Background

For a Grafana variable, when the "All" value is selected, Grafana
expands the variable into a regex. By default, this is just a giant
list of all of the possible values OR'd together. It's possible to
define a "custom all value", as we've done for `index` with `.*`, and
it feels like we should do the same here in a follow-up PR.

Input:
```
job="$job"
```

Before (using **exact** match) -> resulted in matching nothing:

```
job="(appservice|background_worker|client_reader|device_lists|event_creator|event_persister|federation_inbound|federation_reader|federation_sender|media_repository|pusher|stream_writers|synapse|synchrotron|user_dir)""
```

After (using **regex** match) -> matches all jobs as expected:

```
job=~"(appservice|background_worker|client_reader|device_lists|event_creator|event_persister|federation_inbound|federation_reader|federation_sender|media_repository|pusher|stream_writers|synapse|synchrotron|user_dir)""
```
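
For reference, with a "custom all value" of `.*` (the follow-up floated above), selecting "All" would instead collapse the selector to a simple wildcard:

```
job=~".*"
```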
2026-01-22 11:18:49 -06:00
Eric Eastwood 87d93b1ae6 Latest changes from importing/exporting from Grafana 12.3.1 (#19381)
These are automatic changes from importing/exporting from Grafana
12.3.1.

In order to verify that I'm not sneaking in any changes, you can follow
these steps to get the same output.

Reproduction instructions:

 1. Start [Grafana](https://hub.docker.com/r/grafana/grafana):
    ```
    docker run -d --name=grafana --add-host host.docker.internal:host-gateway -p 3000:3000 grafana/grafana
    ```
 1. Visit the Grafana dashboard at http://localhost:3000/ (credentials: `admin`/`admin`)
 1. Import the Synapse dashboard: `contrib/grafana/synapse.json`
 1. Export the Synapse dashboard: on the dashboard page -> **Export** -> **Export as code** -> using the **Classic** model -> check **Export for sharing externally** -> Copy
 1. Paste into `contrib/grafana/synapse.json`
 1. `git status`/`git diff` to check whether there is any diff

Sanity-checked the dashboard by importing it on
https://grafana.matrix.org/ (Grafana 10.4.1, according to
https://grafana.matrix.org/api/health). The process-level metrics won't
work there because https://github.com/element-hq/synapse/pull/19337 only
just merged and isn't on `matrix.org` yet. More generally, this
dashboard works for me locally with the
[load-tests](https://github.com/element-hq/synapse-rust-apps/pull/397)
I've been doing.


### Motivation

There are a few fixes I want to make to the Grafana dashboard, and it
sucks having to manually translate everything back over because we have
different formatting.

Hopefully, after this bulk change, future exports will contain exactly
the changes we intend to make.
2026-01-16 11:36:49 -06:00
Eric Eastwood 58f59ffbcb Refactor Grafana dashboard to use server_name label (#19337)
- Update `synapse_xxx` (server-level) metrics to use `server_name="$server_name"` instead of `instance="$instance"`
- Add a `synapse_server_name_info` metric to map Synapse `server_name`s to the `instance`s they're hosted on.
- For process-level metrics, update to use `xxx * on (instance, job, index) group_left(server_name) synapse_server_name_info{server_name="$server_name"}` (see the sketch below)
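
A concrete sketch of that join pattern (the base metric `process_cpu_seconds_total` and the `[1m]` range here are illustrative assumptions, not lifted from the dashboard):

```
rate(process_cpu_seconds_total{job=~"$job",index=~"$index"}[1m])
  * on (instance, job, index) group_left (server_name)
    synapse_server_name_info{server_name="$server_name"}
```

Multiplying by the info-style metric (conventionally valued `1`) leaves the rate unchanged, while `group_left` copies the `server_name` label onto the result.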

All of the changes here are backwards compatible with whatever people
were doing before with their Prometheus/Grafana dashboards.

Previously, the recommendation was to use the `instance` label to group
everything under the same server (https://github.com/element-hq/synapse/blob/803e4b4d884b2de4b9e20dc47ffb59a983b8a4b5/docs/metrics-howto.md#L93-L147).

But the `instance` label has a special meaning in Prometheus, and we
were abusing it by using it that way:

> `instance`: The `<host>:<port>` part of the target's URL that was scraped.
>
> *-- https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series*

Since https://github.com/element-hq/synapse/issues/18592 (Synapse
`v1.139.0`), we now have the `server_name` label to use instead.


---

Additionally, the assumption that a single process is serving a single
server is no longer true with [Synapse Pro for small
hosts](https://docs.element.io/latest/element-server-suite-pro/synapse-pro-for-small-hosts/overview/).

Part of https://github.com/element-hq/synapse-small-hosts/issues/106

### Motivating use case

Although this change also benefits [Synapse Pro for small
hosts](https://docs.element.io/latest/element-server-suite-pro/synapse-pro-for-small-hosts/overview/)
(https://github.com/element-hq/synapse-small-hosts/issues/106), it
actually grew out of adding Prometheus metrics to our workerized
Docker image (https://github.com/element-hq/synapse/pull/19324,
https://github.com/element-hq/synapse/pull/19336) with a more correct
label setup (without `instance`), and wanting the dashboard to be better.



### Testing strategy

 1. Make sure your firewall allows the Docker containers to communicate with the host (`host.docker.internal`) so they can access exposed ports of other Docker containers. We want to allow Synapse to access the Prometheus container and Grafana to access the Prometheus container.
    - `sudo ufw allow in on docker0 comment "Allow traffic from the default Docker network to the host machine (host.docker.internal)"`
    - `sudo ufw allow in on br-+ comment "(from Matrix Complement testing) Allow traffic from custom Docker networks to the host machine (host.docker.internal)"`
    - [Complement firewall docs](https://github.com/matrix-org/complement/blob/ee6acd9154bbae2d0071a9cb39593c0a5e37268b/README.md#potential-conflict-with-firewall-software)
 1. Build the Docker image for Synapse: `docker build -t matrixdotorg/synapse -f docker/Dockerfile .` ([docs](https://github.com/element-hq/synapse/blob/7a24fafbc376b9bffeb3277b1ad4aa950720c96c/docker/README-testing.md#building-and-running-the-images-manually))
 1. Generate config for Synapse:
    ```
    docker run -it --rm \
        --mount type=volume,src=synapse-data,dst=/data \
        -e SYNAPSE_SERVER_NAME=my.docker.synapse.server \
        -e SYNAPSE_REPORT_STATS=yes \
        -e SYNAPSE_ENABLE_METRICS=1 \
        matrixdotorg/synapse:latest generate
    ```
 1. Start Synapse:
    ```
    docker run -d --name synapse \
        --mount type=volume,src=synapse-data,dst=/data \
        -p 8008:8008 \
        -p 19090:19090 \
        matrixdotorg/synapse:latest
    ```
 1. You should now be able to see metrics from Synapse at http://localhost:19090/_synapse/metrics
 1. Create a Prometheus config (`prometheus.yml`):
    ```yaml
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
      evaluation_interval: 15s

    scrape_configs:
      - job_name: prometheus
        scrape_interval: 15s
        metrics_path: /_synapse/metrics
        scheme: http
        static_configs:
          - targets:
              # This should point to the Synapse metrics listener (we're using
              # `host.docker.internal` because this is from within the Prometheus
              # container)
              - host.docker.internal:19090
    ```
 1. Start Prometheus (update the volume bind mount to point at the config you just saved):
    ```
    docker run \
        --detach \
        --name=prometheus \
        --add-host host.docker.internal:host-gateway \
        -p 9090:9090 \
        -v ~/Documents/code/random/prometheus-config/prometheus.yml:/etc/prometheus/prometheus.yml \
        prom/prometheus
    ```
 1. Make sure you're seeing some data in Prometheus. On http://localhost:9090/query, search for `synapse_build_info`
 1. Start [Grafana](https://hub.docker.com/r/grafana/grafana):
    ```
    docker run -d --name=grafana --add-host host.docker.internal:host-gateway -p 3000:3000 grafana/grafana
    ```
 1. Visit the Grafana dashboard at http://localhost:3000/ (credentials: `admin`/`admin`)
 1. **Connections** -> **Data Sources** -> **Add data source** -> **Prometheus**
    - Prometheus server URL: `http://host.docker.internal:9090`
 1. Import the Synapse dashboard: `contrib/grafana/synapse.json`
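
If everything is wired up, a query like the following (a sketch; `my.docker.synapse.server` is the value from the generate step above) should come back with the `server_name` label attached:

```
synapse_build_info{server_name="my.docker.synapse.server"}
```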

To test workers, you can use the testing strategy from
https://github.com/element-hq/synapse/pull/19336 (assumes both changes
from this PR and the other PR are combined)
2026-01-14 17:57:42 -06:00
Erik Johnston 3ba3c7fe7d Reduce cardinality of metrics on event persister (#19133)
This reduces the size of the metrics output by ~80%. Responding with
the metrics takes a significant amount of time.
2025-11-12 13:41:58 +00:00
Eric Eastwood f13a136396 Refactor Gauge metrics to be homeserver-scoped (#18725)
Bulk refactor `Gauge` metrics to be homeserver-scoped. We also add lints
to make sure that new `Gauge` metrics don't sneak in without using the
`server_name` label (`SERVER_NAME_LABEL`).

Part of https://github.com/element-hq/synapse/issues/18592



### Testing strategy

 1. Add the `metrics` listener in your `homeserver.yaml`
    ```yaml
    listeners:
      # This is just showing how to configure metrics either way
      #
      # `http` `metrics` resource
      - port: 9322
        type: http
        bind_addresses: ['127.0.0.1']
        resources:
          - names: [metrics]
            compress: false
      # `metrics` listener
      - port: 9323
        type: metrics
        bind_addresses: ['127.0.0.1']
    ```
 1. Start the homeserver: `poetry run synapse_homeserver --config-path homeserver.yaml`
 1. Fetch `http://localhost:9322/_synapse/metrics` and/or `http://localhost:9323/metrics`
 1. Observe that the response includes the refactored `Gauge` metrics with the `server_name` label
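
For instance, a matching line in the response might look like this (the metric name and server name are hypothetical placeholders; any of the refactored gauges should carry the label):

```
synapse_example_gauge{server_name="my.synapse.server"} 1.0
```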

2025-07-29 10:37:59 -05:00
Eric Eastwood e80bc4b062 Distinguish all vs local events being persisted in the "Event Send Time Quantiles" graph (#18510)
(Applies to the Grafana graphs)

As discovered by @devonh, we use `synapse_storage_events_persisted_events_total` (which tracks *all* persisted events) for the "Events" rate in the "Event Send Time Quantiles" graph. This is pretty misleading as I would expect it to be the rate of events being sent given the graph title, "Event Send Time Quantiles".

Since the event persistence queues are shared for local and remote events from federation and will block local events being sent, I think it does still make sense to have the event persist rate. I've updated the graph to include the rate of "Local events being persisted" and the rate of "All events being persisted". I think this properly disambiguates and clarifies what the graph is trying to show.
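
A minimal sketch of the two rates: the first expression is the all-events rate using the counter named above; the second, for local events only, assumes the per-origin counter `synapse_storage_events_persisted_events_sep_total` and its `origin_type` label (not named in this message).

```
rate(synapse_storage_events_persisted_events_total[1m])

sum(rate(synapse_storage_events_persisted_events_sep_total{origin_type="local"}[1m]))
```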
2025-06-05 15:30:28 -05:00
Erik Johnston 0455c40085 Update book location 2023-12-13 16:15:22 +00:00
Michael Sasser 3df70aa800 Replace all Prometheus datasource UIDs of the Grafana Dashboard with the variable ${DS_PROMETHEUS} and remove __inputs (#16471) 2023-10-23 19:50:50 +01:00
Will Lewis 835174180b Fixed grafana deploy annotations in the dashboard config, so it shows for those not managing matrix.org (#15957)
Removed the 'matrix.org' hardcoded instance setting

Originally introduced in #15674

Co-authored-by: wrjlewis <will.lewis@askattest.com>
2023-07-20 12:33:06 +00:00
Eric Eastwood 0b5f64ff09 Add Synapse version deploy annotations to Grafana dashboard (#15674)
Fix https://github.com/matrix-org/synapse/issues/15662

This manifests as purple lines that show up on all time series panels;
you can hover over them to see which version was deployed.

Also added a new "Deployed Synapse versions over time" panel, where the
color block changes with each version, and mixed this color block into
the "Up" time series panel.

To get the Grafana dashboard JSON to copy here: use the **Share** icon at the top -> **Export** -> check the **Export for sharing externally** option -> **View JSON** or **Save to file**
2023-05-31 14:35:49 -05:00
reivilibre 51e7255fbb Fix the *MAU Limits* section of the Grafana dashboard relying on a specific job name for the workers of a Synapse deployment. (#14644) 2022-12-13 14:19:43 +00:00
reivilibre a6514792b2 Update forgotten references to legacy metrics in the included Grafana dashboard. (#14477)
Fixes https://github.com/matrix-org/synapse/issues/14465
2022-11-22 10:51:01 +00:00
reivilibre b455c2a5ec Update Grafana dashboard to not use legacy metric names. (#13714) 2022-09-06 12:21:21 +01:00
reivilibre f48f4dd59e Update the Grafana dashboard that is included with Synapse in the contrib directory. (#13697)
* Add missing graph to contrib

* Update with minor but plausible changes, including positioning changes

* Newsfile

Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
2022-09-01 16:27:06 +01:00
Richard van der Hoff 434fd82d5f Update grafana dashboard 2022-08-13 21:50:20 +01:00
Sheogorath 77258b6725 docs(contrib): Add link to documentation in dashboard (#12602) 2022-05-09 10:08:31 +00:00
David Robertson a10988983a Break down cache expiry reasons in grafana (#10880)
A follow-up to #10829
2021-09-23 14:45:32 +01:00
Brendan Abolivier d1f43b731c Update the Synapse Grafana dashboard (#10570) 2021-08-16 12:57:09 +02:00
Dirk Klimpel e938f69697 Fix some links in docs and contrib (#10370) 2021-07-13 11:55:48 +01:00
Erik Johnston 9c76d0561b Update the contrib grafana dashboard (#10001) 2021-05-19 11:47:16 +01:00
Michael Kaye f49c2093b5 Cross-link documentation to the prometheus recording rules. (#8667) 2020-10-27 15:29:50 -04:00
Richard van der Hoff fa361c8f65 Update grafana dashboard 2020-07-13 14:48:21 +01:00
Richard van der Hoff 816589b09a update grafana dashboard 2020-06-02 12:44:36 +01:00
Richard van der Hoff 782b811789 update grafana dashboard 2020-03-19 10:45:40 +00:00
Matthew Hodgson cc7ab0d84a rst->md 2020-03-01 21:21:36 +00:00
Richard van der Hoff abd334d27b Add extremities graphs to grafana dashboard 2019-06-25 11:51:32 +01:00
Richard van der Hoff dd1c722a39 format json for grafana dashboard 2019-06-25 08:59:19 +01:00
Richard van der Hoff 6b0ddf8ee5 update grafana dashboard 2019-04-13 13:10:46 +01:00
Richard van der Hoff 0649306fde Update grafana dashboard 2018-09-25 13:29:28 +01:00
Richard van der Hoff 9b92720d88 fix event lag graph 2018-08-07 12:27:34 +01:00
Paul Tötterman 9c14c2b561 Add some documentation for using the dashboard 2018-07-31 12:48:37 +03:00
Richard van der Hoff 6aab397ada synapse grafana dashboard 2018-07-31 09:45:58 +01:00