1
0
Commit Graph

20 Commits

Author SHA1 Message Date
Eric Eastwood
a1e9abc7df Add Prometheus HTTP service discovery endpoint for easy discovery of all workers in Docker image (#19336)
Add Prometheus [HTTP service discovery](https://prometheus.io/docs/prometheus/latest/http_sd/)
endpoint for easy discovery of all workers in Docker image.

Follow-up to https://github.com/element-hq/synapse/pull/19324

Spawning from wanting to [run a load
test](https://github.com/element-hq/synapse-rust-apps/pull/397) against
the Complement Docker image of Synapse and see metrics from the
homeserver.


`GET http://<synapse_container>:9469/metrics/service_discovery`
```json5
[
  {
    "targets": [ "<host>", ... ],
    "labels": {
      "<labelname>": "<labelvalue>", ...
    }
  },
  ...
]
```

The metrics from each worker can also be accessed via
`http://<synapse_container>:9469/metrics/worker/<worker_name>` which is
what the service discovery response points to behind the scenes. This
way, you only need to expose a single port (9469) to access all metrics.

<details>
<summary>Real HTTP service discovery response</summary>

```json5
[
    {
        "targets": [
            "localhost:9469"
        ],
        "labels": {
            "job": "event_persister",
            "index": "1",
            "__metrics_path__": "/metrics/worker/event_persister1"
        }
    },
    {
        "targets": [
            "localhost:9469"
        ],
        "labels": {
            "job": "event_persister",
            "index": "2",
            "__metrics_path__": "/metrics/worker/event_persister2"
        }
    },
    {
        "targets": [
            "localhost:9469"
        ],
        "labels": {
            "job": "background_worker",
            "index": "1",
            "__metrics_path__": "/metrics/worker/background_worker1"
        }
    },
    {
        "targets": [
            "localhost:9469"
        ],
        "labels": {
            "job": "event_creator",
            "index": "1",
            "__metrics_path__": "/metrics/worker/event_creator1"
        }
    },
    {
        "targets": [
            "localhost:9469"
        ],
        "labels": {
            "job": "user_dir",
            "index": "1",
            "__metrics_path__": "/metrics/worker/user_dir1"
        }
    },
    {
        "targets": [
            "localhost:9469"
        ],
        "labels": {
            "job": "media_repository",
            "index": "1",
            "__metrics_path__": "/metrics/worker/media_repository1"
        }
    },
    {
        "targets": [
            "localhost:9469"
        ],
        "labels": {
            "job": "federation_inbound",
            "index": "1",
            "__metrics_path__": "/metrics/worker/federation_inbound1"
        }
    },
    {
        "targets": [
            "localhost:9469"
        ],
        "labels": {
            "job": "federation_reader",
            "index": "1",
            "__metrics_path__": "/metrics/worker/federation_reader1"
        }
    },
    {
        "targets": [
            "localhost:9469"
        ],
        "labels": {
            "job": "federation_sender",
            "index": "1",
            "__metrics_path__": "/metrics/worker/federation_sender1"
        }
    },
    {
        "targets": [
            "localhost:9469"
        ],
        "labels": {
            "job": "synchrotron",
            "index": "1",
            "__metrics_path__": "/metrics/worker/synchrotron1"
        }
    },
    {
        "targets": [
            "localhost:9469"
        ],
        "labels": {
            "job": "client_reader",
            "index": "1",
            "__metrics_path__": "/metrics/worker/client_reader1"
        }
    },
    {
        "targets": [
            "localhost:9469"
        ],
        "labels": {
            "job": "appservice",
            "index": "1",
            "__metrics_path__": "/metrics/worker/appservice1"
        }
    },
    {
        "targets": [
            "localhost:9469"
        ],
        "labels": {
            "job": "pusher",
            "index": "1",
            "__metrics_path__": "/metrics/worker/pusher1"
        }
    },
    {
        "targets": [
            "localhost:9469"
        ],
        "labels": {
            "job": "device_lists",
            "index": "1",
            "__metrics_path__": "/metrics/worker/device_lists1"
        }
    },
    {
        "targets": [
            "localhost:9469"
        ],
        "labels": {
            "job": "device_lists",
            "index": "2",
            "__metrics_path__": "/metrics/worker/device_lists2"
        }
    },
    {
        "targets": [
            "localhost:9469"
        ],
        "labels": {
            "job": "stream_writers",
            "index": "1",
            "__metrics_path__": "/metrics/worker/stream_writers1"
        }
    },
    {
        "targets": [
            "localhost:9469"
        ],
        "labels": {
            "job": "main",
            "index": "1",
            "__metrics_path__": "/metrics/worker/main"
        }
    }
]
```

</details>


And how it ends up as targets in Prometheus
(http://localhost:9090/targets):

(image)


### Testing strategy

1. Make sure your firewall allows the Docker containers to communicate
to the host (`host.docker.internal`) so they can access exposed ports of
other Docker containers. We want to allow Synapse to access the
Prometheus container and Grafana to access to the Prometheus container.
- `sudo ufw allow in on docker0 comment "Allow traffic from the default
Docker network to the host machine (host.docker.internal)"`
- `sudo ufw allow in on br-+ comment "(from Matrix Complement testing)
Allow traffic from custom Docker networks to the host machine
(host.docker.internal)"`
- [Complement firewall
docs](ee6acd9154/README.md (potential-conflict-with-firewall-software))
1. Build the Docker image for Synapse: `docker build -t
matrixdotorg/synapse -f docker/Dockerfile . && docker build -t
matrixdotorg/synapse-workers -f docker/Dockerfile-workers .`
([docs](7a24fafbc3/docker/README-testing.md (building-and-running-the-images-manually)))
 1. Start Synapse:
     ```
    docker run -d --name synapse \
        --mount type=volume,src=synapse-data,dst=/data \
        -e SYNAPSE_SERVER_NAME=my.docker.synapse.server \
        -e SYNAPSE_REPORT_STATS=no \
        -e SYNAPSE_ENABLE_METRICS=1 \
        -p 8008:8008 \
        -p 9469:9469 \
        matrixdotorg/synapse-workers:latest
    ```
    - Also try with workers:
       ```
      docker run -d --name synapse \
          --mount type=volume,src=synapse-data,dst=/data \
          -e SYNAPSE_SERVER_NAME=my.docker.synapse.server \
          -e SYNAPSE_REPORT_STATS=no \
          -e SYNAPSE_ENABLE_METRICS=1 \
          -e SYNAPSE_WORKER_TYPES="\
              event_persister:2, \
              background_worker, \
              event_creator, \
              user_dir, \
              media_repository, \
              federation_inbound, \
              federation_reader, \
              federation_sender, \
              synchrotron, \
              client_reader, \
              appservice, \
              pusher, \
              device_lists:2, \
stream_writers=account_data+presence+receipts+to_device+typing" \
          -p 8008:8008 \
          -p 9469:9469 \
          matrixdotorg/synapse-workers:latest
      ```
1. You should be able to see Prometheus service discovery endpoint at
http://localhost:9469/metrics/service_discovery
 1. Create a Prometheus config (`prometheus.yml`)
    ```yaml
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
      evaluation_interval: 15s
    
    scrape_configs:
      - job_name: synapse
        scrape_interval: 15s
        metrics_path: /_synapse/metrics
        scheme: http
# We set `honor_labels` so that each service can set their own `job`
label
        #
# > honor_labels controls how Prometheus handles conflicts between
labels that are
# > already present in scraped data and labels that Prometheus would
attach
# > server-side ("job" and "instance" labels, manually configured target
# > labels, and labels generated by service discovery implementations).
        # >
# > *--
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config*
        honor_labels: true
        # Use HTTP service discovery
        #
        # Reference:
        #  - https://prometheus.io/docs/prometheus/latest/http_sd/
# -
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config
        http_sd_configs:
          - url: 'http://localhost:9469/metrics/service_discovery'
    ```
1. Start Prometheus (update the volume bind mount to the config you just
saved somewhere):
    ```
    docker run \
        --detach \
        --name=prometheus \
        --add-host host.docker.internal:host-gateway \
        -p 9090:9090 \
-v
~/Documents/code/random/prometheus-config/prometheus.yml:/etc/prometheus/prometheus.yml
\
        prom/prometheus
    ```
1. Make sure you're seeing some data in Prometheus. On
http://localhost:9090/query, search for `synapse_build_info`
 1. Start [Grafana](https://hub.docker.com/r/grafana/grafana)
    ```
docker run -d --name=grafana --add-host
host.docker.internal:host-gateway -p 3000:3000 grafana/grafana
    ```
1. Visit the Grafana dashboard, http://localhost:3000/ (Credentials:
`admin`/`admin`)
1. **Connections** -> **Data Sources** -> **Add data source** ->
**Prometheus**
     - Prometheus server URL: `http://host.docker.internal:9090`
1. Import the Synapse dashboard:
https://github.com/element-hq/synapse/blob/develop/contrib/grafana/synapse.json
2026-01-14 18:02:55 -06:00
Ben Banfield-Zanin
67f22a200d Update Docker images to use Debian trixie (13) and thus Python 3.13 (#19064) 2025-10-20 16:49:17 +01:00
Andrew Ferrazzutti
5ab05e7b95 docker: use shebangs to invoke generated scripts (#18295)
When generating scripts from templates, don't add a leading newline so
that their shebangs may be handled correctly.

### Pull Request Checklist

<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->

* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
  - Use markdown where necessary, mostly for `code blocks`.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))

---------

Co-authored-by: Quentin Gliech <quenting@element.io>
2025-04-30 14:26:08 +00:00
Andrew Ferrazzutti
4097ada89f Optimize Dockerfile-workers (#18292)
- Use a `uv:python` image for the first build layer, to reduce the
number of intermediate images required, as the
main Dockerfile uses that image already
- Use a cache mount for `apt` commands
- Skip a pointless install of `redis-server`, since the redis Docker
image is copied from instead
- Move some RUN steps out of the final image layer & into the build
layer

Depends on https://github.com/element-hq/synapse/pull/18275

### Pull Request Checklist

<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->

* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
  - Use markdown where necessary, mostly for `code blocks`.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
2025-04-30 15:54:30 +02:00
Andrew Ferrazzutti
90f346183a Use uv pip to install supervisor in workers image (#18275) 2025-04-01 12:32:56 +00:00
Andrew Ferrazzutti
92a29dcffc Docker: Use an ARG for debian version more often (#18272) 2025-03-25 13:57:55 +00:00
Erik Johnston
3cf1a3aa17 Use bookwork as docker base image (#16324) 2023-09-15 13:14:10 +01:00
dependabot[bot]
36a5bcae2c Bump library/redis from 6-bullseye to 7-bullseye in /docker (#15712)
Bumps library/redis from 6-bullseye to 7-bullseye.

---
updated-dependencies:
- dependency-name: library/redis
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-05 10:31:54 +01:00
Jason Little
874378c052 Docker fully qualified image names (#15689)
* Fully qualified docker image names for the main Dockerfile and Complement related.

* Fully qualified docker image names for Dockerfiles associated with building Debian release artifacts.

This one is harder and is separate from the other commit in case it wasn't correct or was unwanted. I decided to
do the expansion on the docker images in the Dockerfile itself, instead of the various source places that build
which distribution that is selected, as it would have been more invasive with the scripts breaking up the string
for tagging and such. This one is untested.

* Changelog

* Update docker/Dockerfile-workers

* Update docker/complement/Dockerfile

---------

Co-authored-by: reivilibre <olivier@librepush.net>
2023-05-31 15:13:31 +00:00
reivilibre
be3a8a85e3 Add --editable flag to complement.sh which uses an editable install of Synapse for faster turn-around times whilst developing iteratively. (#14548)
Co-authored-by: Mathieu Velten <mathieuv@matrix.org>
2022-12-07 15:45:31 +00:00
Richard van der Hoff
51436c8dd5 Complement test image: capture logs from nginx (#14063)
Have nginx send its logs to stderr/out, so that we can debug
https://github.com/matrix-org/synapse/issues/13334.
2022-10-05 17:37:34 +01:00
Richard van der Hoff
166fafdf8d synapse-workers docker: copy nginx and redis in from base images (#13447)
Part of my continuing quest to make the docker images build quicker: copy nginx and redis in from base docker images, rather than apt installing each time.
2022-08-04 12:59:27 +01:00
Brendan Abolivier
10e4093839 Call out buildkit is required when building test docker images (#13338)
Co-authored-by: David Robertson <davidr@element.io>
2022-07-21 14:29:58 +02:00
reivilibre
538044ac01 Collapse Docker build commands in Complement CI runs to make the logs easier to read. (#13058) 2022-06-15 14:42:27 +00:00
reivilibre
67f51c84f8 Merge the Complement testing Docker images into a single, multi-purpose image. (#12881)
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
2022-06-08 09:57:05 +00:00
reivilibre
d743b25c8f Use supervisord to supervise Postgres and Caddy in the Complement image. (#12480)
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
2022-04-27 14:39:41 +01:00
Richard van der Hoff
aaaff98202 Dockerfile-workers: reduce the amount we install (#12464)
This is an attempt to reduce the rebuild time. In short, we reduce the amount
of stuff that the dockerfile installs, so as to give a faster startup.
2022-04-14 15:36:49 +01:00
Richard van der Hoff
3cdf5a1386 Fix up healthcheck generation for workers docker image (#12405)
This wasn't quite generating the right thing.
2022-04-11 13:38:58 +00:00
Michael Kaye
e2c300e7e4 Create healthcheck script for synapse-workers container (#11429)
The intent is to iterate through all the worker ports and only
report healthy when all are healthy, starting with the main process.
2021-11-26 14:05:20 +00:00
Andrew Morgan
7e460ec2a5 Add a dockerfile for running a set of Synapse worker processes (#9162)
This PR adds a Dockerfile and some supporting files to the `docker/` directory. The Dockerfile's intention is to spin up a container with:

* A Synapse main process.
* Any desired worker processes, defined by a `SYNAPSE_WORKERS` environment variable supplied at runtime.
* A redis for worker communication.
* A nginx for routing traffic.
* A supervisord to start all worker processes and monitor them if any go down.

Note that **this is not currently intended to be used in production**. If you'd like to use Synapse workers with Docker, instead make use of the official image, with one worker per container. The purpose of this dockerfile is currently to allow testing Synapse in worker mode with the [Complement](https://github.com/matrix-org/complement/) test suite.

`configure_workers_and_start.py` is where most of the magic happens in this PR. It reads from environment variables (documented in the file) and creates all necessary config files for the processes. It is the entrypoint of the Dockerfile, and thus is run any time the docker container is spun up, recreating all config files in case you want to use a different set of workers. One can specify which workers they'd like to use by setting the `SYNAPSE_WORKERS` environment variable (as a comma-separated list of arbitrary worker names) or by setting it to `*` for all worker processes. We will be using the latter in CI.

Huge thanks to @MatMaul for helping get this all working 🎉 This PR is paired with its equivalent on the Complement side: https://github.com/matrix-org/complement/pull/62.

Note, for the purpose of testing this PR before it's merged: You'll need to (re)build the base Synapse docker image for everything to work (`matrixdotorg/synapse:latest`). Then build the worker-based docker image on top (`matrixdotorg/synapse:workers`).
2021-04-14 13:54:49 +01:00