1
0
Commit Graph

212 Commits

Author SHA1 Message Date
Eric Eastwood
826a7dd29a Update "Event Send Time Quantiles" graph to only use dots for the event persistence rate (#19399)
This is the same thing we already do in the [`matrix.org`
dashboard](https://grafana.matrix.org/d/000000012/synapse) and although
the purple dots aren't new (introduced in
https://github.com/matrix-org/synapse/pull/10001), you can see that was
the intention in https://github.com/element-hq/synapse/pull/18510. I
think this was just how our contrib dashboard looked at the time and
perhaps was a Grafana version mismatch thing which is why it didn't
translate.
2026-01-22 14:07:22 -06:00
Eric Eastwood
d6b45a7c8c Update and align Grafana dashboard to use regex matching for job=~"$job" (#19400)
We're already using `job=~"$job"` in the majority of the other panels.
This is just aligning the stragglers.

### Background

For a variable in Grafana, when the "All" value is selected, it
translates the variable into a wildcard regex. By default, this is just
a giant list of all of the possible values or'd together. It's possible
to define a "custom all value" like we've done for `index` as `.*` and
feels like we should also do this in a follow-up PR.

Input:
```
job="$job"
```

Before (using **exact** match) -> resulted in matching nothing:

```
job="(appservice|background_worker|client_reader|device_lists|event_creator|event_persister|federation_inbound|federation_reader|federation_sender|media_repository|pusher|stream_writers|synapse|synchrotron|user_dir)""
```

After (using **regex** match) -> matches all jobs as expected:

```
job=~"(appservice|background_worker|client_reader|device_lists|event_creator|event_persister|federation_inbound|federation_reader|federation_sender|media_repository|pusher|stream_writers|synapse|synchrotron|user_dir)""
```
2026-01-22 11:18:49 -06:00
Eric Eastwood
87d93b1ae6 Latest changes from importing/exporting from Grafana 12.3.1 (#19381)
These are automatic changes from importing/exporting from Grafana
12.3.1.

In order to verify that I'm not sneaking in any changes, you can follow
these steps to get the same output.

Reproduction instructions:

 1. Start [Grafana](https://hub.docker.com/r/grafana/grafana)
    ```
docker run -d --name=grafana --add-host
host.docker.internal:host-gateway -p 3000:3000 grafana/grafana
    ```
1. Visit the Grafana dashboard, http://localhost:3000/ (Credentials:
`admin`/`admin`)
 1. Import the Synapse dashboard: `contrib/grafana/synapse.json`
1. Export the Synapse dashboard. On the dashboard page -> **Export** ->
**Export as code** -> Using the **Classic** model -> Check **Export for
sharing externally** -> Copy
 1. Paste into `contrib/grafana/synapse.json`
 1. `git status`/`git diff` to check if there is any diff

Sanity checked the dashboard itself by importing the dashboard on
https://grafana.matrix.org/ (Grafana 10.4.1 according to
https://grafana.matrix.org/api/health). The process-level metrics won't
work because https://github.com/element-hq/synapse/pull/19337 just
merged and isn't on `matrix.org` yet. Also just generally, this
dashboard works for me locally with the
[load-tests](https://github.com/element-hq/synapse-rust-apps/pull/397)
I've been doing.


### Motivation

There are few fixes I want to make to the Grafana dashboard and it sucks
having to manually translate everything back over because we have
different formatting.

Hopefully after this bulk change, future exports will have exactly what
we want to change.
2026-01-16 11:36:49 -06:00
Eric Eastwood
58f59ffbcb Refactor Grafana dashboard to use server_name label (#19337)
- Update `synapse_xxx` (server-level) metrics to use
`server_name="$server_name",` instead of `instance="$instance"`
- Add `synapse_server_name_info` metric to map Synapse `server_name`s to
the `instance`s they're hosted on.
- For process level metrics, update to use `xxx * on (instance, job,
index) group_left(server_name)
synapse_server_name_info{server_name="$server_name"}`

All of the changes here are backwards compatible with whatever people
were doing before with their Prometheus/Grafana dashboards.

Previously, the recommendation was to use the `instance` label to group
everything under the same server (803e4b4d88/docs/metrics-howto.md (L93-L147))

But the `instance` label actually has a special meaning and we're
actually abusing it by using it that way:

> `instance`: The `<host>:<port>` part of the target's URL that was
scraped.
>
> *--
https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series*

Since https://github.com/element-hq/synapse/issues/18592 (Synapse
`v1.139.0`), we now have the `server_name` label to use instead.


---

Additionally, the assumption that a single process is serving a single
server is no longer true with [Synapse Pro for small
hosts](https://docs.element.io/latest/element-server-suite-pro/synapse-pro-for-small-hosts/overview/).

Part of https://github.com/element-hq/synapse-small-hosts/issues/106

### Motivating use case

Although this change also benefits [Synapse Pro for small
hosts](https://docs.element.io/latest/element-server-suite-pro/synapse-pro-for-small-hosts/overview/)
(https://github.com/element-hq/synapse-small-hosts/issues/106), this is
actually spawning from adding Prometheus metrics to our workerized
Docker image (https://github.com/element-hq/synapse/pull/19324,
https://github.com/element-hq/synapse/pull/19336) with a more correct
label setup (without `instance`) and wanting the dashboard to be better.



### Testing strategy

1. Make sure your firewall allows the Docker containers to communicate
to the host (`host.docker.internal`) so they can access exposed ports of
other Docker containers. We want to allow Synapse to access the
Prometheus container and Grafana to access to the Prometheus container.
- `sudo ufw allow in on docker0 comment "Allow traffic from the default
Docker network to the host machine (host.docker.internal)"`
- `sudo ufw allow in on br-+ comment "(from Matrix Complement testing)
Allow traffic from custom Docker networks to the host machine
(host.docker.internal)"`
- [Complement firewall
docs](ee6acd9154/README.md (potential-conflict-with-firewall-software))
1. Build the Docker image for Synapse: `docker build -t
matrixdotorg/synapse -f docker/Dockerfile .`
([docs](7a24fafbc3/docker/README-testing.md (building-and-running-the-images-manually)))
 1. Generate config for Synapse:
    ```
    docker run -it --rm \
        --mount type=volume,src=synapse-data,dst=/data \
        -e SYNAPSE_SERVER_NAME=my.docker.synapse.server \
        -e SYNAPSE_REPORT_STATS=yes \
        -e SYNAPSE_ENABLE_METRICS=1 \
        matrixdotorg/synapse:latest generate
    ```
 1. Start Synapse:
     ```
    docker run -d --name synapse \
        --mount type=volume,src=synapse-data,dst=/data \
        -p 8008:8008 \
        -p 19090:19090 \
        matrixdotorg/synapse:latest
    ```
1. You should be able to see metrics from Synapse at
http://localhost:19090/_synapse/metrics
 1. Create a Prometheus config (`prometheus.yml`)
    ```yaml
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
      evaluation_interval: 15s
    
    scrape_configs:
      - job_name: prometheus
        scrape_interval: 15s
        metrics_path: /_synapse/metrics
        scheme: http
        static_configs:
          - targets:
# This should point to the Synapse metrics listener (we're using
`host.docker.internal` because this is from within the Prometheus
container)
              - host.docker.internal:19090
    ```
1. Start Prometheus (update the volume bind mount to the config you just
saved somewhere):
    ```
    docker run \
        --detach \
        --name=prometheus \
        --add-host host.docker.internal:host-gateway \
        -p 9090:9090 \
-v
~/Documents/code/random/prometheus-config/prometheus.yml:/etc/prometheus/prometheus.yml
\
        prom/prometheus
    ```
1. Make sure you're seeing some data in Prometheus. On
http://localhost:9090/query, search for `synapse_build_info`
 1. Start [Grafana](https://hub.docker.com/r/grafana/grafana)
    ```
docker run -d --name=grafana --add-host
host.docker.internal:host-gateway -p 3000:3000 grafana/grafana
    ```
1. Visit the Grafana dashboard, http://localhost:3000/ (Credentials:
`admin`/`admin`)
1. **Connections** -> **Data Sources** -> **Add data source** ->
**Prometheus**
     - Prometheus server URL: `http://host.docker.internal:9090`
 1. Import the Synapse dashboard: `contrib/grafana/synapse.json`

To test workers, you can use the testing strategy from
https://github.com/element-hq/synapse/pull/19336 (assumes both changes
from this PR and the other PR are combined)
2026-01-14 17:57:42 -06:00
Erik Johnston
3ba3c7fe7d Reduce cardinality of metrics on event persister (#19133)
This reduces the size of metrics by ~80%. Responding with the metrics
takes significant amounts of time.
2025-11-12 13:41:58 +00:00
Andrew Ferrazzutti
fcac7e0282 Write union types as X | Y where possible (#19111)
aka PEP 604, added in Python 3.10
2025-11-06 14:02:33 -06:00
Andrew Ferrazzutti
fc244bb592 Use type hinting generics in standard collections (#19046)
aka PEP 585, added in Python 3.9

 - https://peps.python.org/pep-0585/
 - https://docs.astral.sh/ruff/rules/non-pep585-annotation/
2025-10-22 16:48:19 -05:00
Eric Eastwood
f13a136396 Refactor Gauge metrics to be homeserver-scoped (#18725)
Bulk refactor `Gauge` metrics to be homeserver-scoped. We also add lints
to make sure that new `Gauge` metrics don't sneak in without using the
`server_name` label (`SERVER_NAME_LABEL`).

Part of https://github.com/element-hq/synapse/issues/18592



### Testing strategy

 1. Add the `metrics` listener in your `homeserver.yaml`
    ```yaml
    listeners:
      # This is just showing how to configure metrics either way
      #
      # `http` `metrics` resource
      - port: 9322
        type: http
        bind_addresses: ['127.0.0.1']
        resources:
          - names: [metrics]
            compress: false
      # `metrics` listener
      - port: 9323
        type: metrics
        bind_addresses: ['127.0.0.1']
    ```
1. Start the homeserver: `poetry run synapse_homeserver --config-path
homeserver.yaml`
1. Fetch `http://localhost:9322/_synapse/metrics` and/or
`http://localhost:9323/metrics`
1. Observe response includes the TODO metrics with the `server_name`
label

### Pull Request Checklist

<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->

* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
  - Use markdown where necessary, mostly for `code blocks`.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct (run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
2025-07-29 10:37:59 -05:00
Andrew Morgan
291880012f Stop sending or processing the origin field in PDUs (#18418)
Co-authored-by: Quentin Gliech <quenting@element.io>
Co-authored-by: Eric Eastwood <erice@element.io>
2025-07-01 12:04:23 +01:00
Eric Eastwood
e80bc4b062 Distinguish all vs local events being persisted in the "Event Send Time Quantiles" graph (#18510)
(Applies to the Grafana graphs)

As discovered by @devonh, we use `synapse_storage_events_persisted_events_total` (which tracks *all* persisted events) for the "Events" rate in the "Event Send Time Quantiles" graph. This is pretty misleading as I would expect it to be the rate of events being sent given the graph title, "Event Send Time Quantiles".

Since the event persistence queues are shared for local and remote events from federation and will block local events being sent, I think it does still make sense to have the event persist rate. I've updated the graph to include the rate of "Local events being persisted" and the rate of "All events being persisted". I think this properly disambiguates and clarifies what the graph is trying to show.
2025-06-05 15:30:28 -05:00
Max Kratz
90a6bd01c2 Contrib: Docker: updates PostgreSQL version in docker-compose.yml (#18089)
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2025-01-21 18:54:31 +00:00
Colin Watson
d69c00b5a1 Stop using twisted.internet.defer.returnValue (#18020)
`defer.returnValue` was only needed in Python 2; in Python 3, a simple
`return` is fine.

`twisted.internet.defer.returnValue` is deprecated as of Twisted 24.7.0.

Most uses of `returnValue` in synapse were removed a while back; this
cleans up some remaining bits.
2024-12-20 10:57:59 +00:00
Matthew Hodgson
4c67d20af7 link to element-docker-demo from contrib/docker* (#17953) 2024-11-22 12:35:03 +00:00
V02460
2fc43e4219 Remove the deprecated cgi module (#17741)
Removes all uses of the `cgi` module from Synapse. It was deprecated in
Python version 3.11 and removed in version 3.13 ([“dead
battery”](https://docs.python.org/3.13/whatsnew/3.13.html#pep-594-remove-dead-batteries-from-the-standard-library)).

### Pull Request Checklist

<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->

* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
  - Use markdown where necessary, mostly for `code blocks`.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))

---------

Co-authored-by: Quentin Gliech <quenting@element.io>
2024-09-25 11:15:34 +02:00
Quentin Gliech
7d52ce7d4b Format files with Ruff (#17643)
I thought ruff check would also format, but it doesn't.

This runs ruff format in CI and dev scripts. The first commit is just a
run of `ruff format .` in the root directory.
2024-09-02 12:39:04 +01:00
Erik Johnston
23740eaa3d Correctly mention previous copyright (#16820)
During the migration the automated script to update the copyright
headers accidentally got rid of some of the existing copyright lines.
Reinstate them.
2024-01-23 11:26:48 +00:00
Erik Johnston
0455c40085 Update book location 2023-12-13 16:15:22 +00:00
Erik Johnston
8613f7693e More renaming 2023-12-13 15:41:11 +00:00
Patrick Cloke
8e1e62c9e0 Update license headers 2023-11-21 15:29:58 -05:00
Michael Sasser
3df70aa800 Replace all Prometheus datasource UIDs of the Grafana Dashboard with the variable ${DS_PROMETHEUS} and remove __inputs (#16471) 2023-10-23 19:50:50 +01:00
Patrick Cloke
aa483cb4c9 Update ruff config (#16283)
Enable additional checks & clean-up unneeded configuration.
2023-09-08 11:24:36 -04:00
Patrick Cloke
ad3f43be9a Run pyupgrade for python 3.7 & 3.8. (#16110) 2023-08-15 08:11:20 -04:00
Will Lewis
835174180b Fixed grafana deploy annotations in the dashboard config, so it shows for those not managing matrix.org (#15957)
Removed the 'matrix.org' hardcorded instance setting

Originally introduced in #15674

Co-authored-by: wrjlewis <will.lewis@askattest.com>
2023-07-20 12:33:06 +00:00
Eric Eastwood
887fa4b66b Switch from matrix:// to matrix-federation:// scheme for internal Synapse routing of outbound federation traffic (#15806)
`matrix://` is a registered specced scheme nowadays and doesn't make sense for
our internal to Synapse use case anymore. ([discussion]
(https://github.com/matrix-org/synapse/pull/15773#discussion_r1227598679))
2023-06-20 10:05:31 +01:00
Eric Eastwood
0b5f64ff09 Add Synapse version deploy annotations to Grafana dashboard (#15674)
Fix https://github.com/matrix-org/synapse/issues/15662

This manifests as purple lines that show up on all time series panels
that you can hover and see what version was deployed.

Also added a new "Deployed Synapse versions over time" panel
where the color block changes with each version. And mixed this
color block into the "Up" time series panel.

To get the Grafana dashboard JSON to copy here: use the **Share** icon at the top -> **Export** -> check the **Export for sharing externally** option -> **View JSON** or **Save to file**
2023-05-31 14:35:49 -05:00
Roel ter Maat
2611433b70 Add redis SSL configuration options (#15312)
* Add SSL options to redis config

* fix lint issues

* Add documentation and changelog file

* add missing . at the end of the changelog

* Move client context factory to new file

* Rename ssl to tls and fix typo

* fix lint issues

* Added when redis attributes were added
2023-05-11 13:02:51 +01:00
David Robertson
39795b3a4e Make it easier to use DataGrip w/ Synapse's schema (#14982)
Also tweak the schema dump script:

- add a note explaining myself how to use it
-Explicitly call `poetry run`, because not everyone uses direnv :(
2023-02-15 13:51:37 +00:00
999lakhisidhu
27a3a72a50 Support for selecting the Redis logical database. (#15034)
Note that this is only used for key-value store (cached values)
and not for the pub/sub replication used by Synapse.
2023-02-15 07:39:31 -05:00
David Robertson
1958f9de45 lnav config for synpase logs (#14953) 2023-02-01 12:36:04 +00:00
villepeh
d344bc8b6e Include x_forwarded in workers example configs (#14667) 2023-01-13 14:06:58 +00:00
reivilibre
51e7255fbb Fix the *MAU Limits* section of the Grafana dashboard relying on a specific job name for the workers of a Synapse deployment. (#14644) 2022-12-13 14:19:43 +00:00
reivilibre
a6514792b2 Update forgotten references to legacy metrics in the included Grafana dashboard. (#14477)
Fixes https://github.com/matrix-org/synapse/issues/14465
2022-11-22 10:51:01 +00:00
Dirk Klimpel
1eb8dcf4c9 Remove not needed replication listener in docker compose example (#14107) 2022-10-17 12:00:09 +01:00
reivilibre
ac7e5683d6 Add comments to the Prometheus recording rules to make it clear which set of rules you need for Grafana or Prometheus Console. (#13876) 2022-09-23 11:46:45 +01:00
villepeh
269eddad6f Add worker_main_http_uri to the contrib bash script (#13772)
* Add worker_main_http_uri, replace >> with >

Co-authored-by: Dirk Klimpel <5740567+dklimpel@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
2022-09-21 15:58:46 +01:00
reivilibre
5261d2e2e8 Remove unused Prometheus recording rules from synapse-v2.rules and add comments describing where the rest are used. (#13756) 2022-09-08 17:50:15 +00:00
reivilibre
526f84bc2e Fix Prometheus recording rules to not use legacy metric names. (#13718) 2022-09-08 15:01:42 +01:00
reivilibre
b455c2a5ec Update Grafana dashboard to not use legacy metric names. (#13714) 2022-09-06 12:21:21 +01:00
reivilibre
f48f4dd59e Update the Grafana dashboard that is included with Synapse in the contrib directory. (#13697)
* Add missing graph to contrib

* Update with minor but plausible changes, including positioning changes

* Newsfile

Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>

Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
2022-09-01 16:27:06 +01:00
Richard van der Hoff
434fd82d5f Update grafana dashboard 2022-08-13 21:50:20 +01:00
villepeh
84c5e6b1fd Bash script for creating multiple stream writers (#13271)
Add another bash script to the contrib directory. It creates multiple stream writers and also prints out the example configuration for homeserver.yaml.

Signed-off-by: Ville Petteri Huh.
2022-07-19 12:37:20 +00:00
villepeh
bc8eefc1e1 Add a sample bash script to docs for creating multiple worker files (#13032)
Signed-off-by: Ville Petteri Huh.
2022-07-11 18:33:53 +01:00
Patrick Cloke
84cd0fe4e2 Fix-up the contrib/graph scripts. (#13013)
* Clarifies comments and documentation.
* Adds type-hints.
* Fixes Python 3 compatibility (and runs pyupgrade).
* Updates for changes in Synapse internals.
2022-06-10 08:30:14 -04:00
James
c316fe8d4a Docker Compose Worker Documentation and Examples (#12737) 2022-06-08 10:26:42 +01:00
Jacek Kuśnierz
88193f2125 Remove direct refeferences to PyNaCl (use signedjson instead). (#12902) 2022-06-01 07:32:35 -04:00
David Robertson
119938792b Remove unused contrib/experiments/cursesio.py (#12910) 2022-05-30 10:47:54 +01:00
David Robertson
80bd614dac Remove contrib/experiments/test_messaging.py (#12911) 2022-05-30 10:47:47 +01:00
David Robertson
563ef172ae Remove contrib/jitsimeetbridge (#12909) 2022-05-30 10:47:40 +01:00
David Robertson
72df42078b Remove contrib/scripts/kick_users.py (#12908) 2022-05-30 10:47:25 +01:00
Sheogorath
77258b6725 docs(contrib): Add link to documentation in dashboard (#12602) 2022-05-09 10:08:31 +00:00