Fix remote images being chosen over the local ones we just built with
Complement in CI (any Docker environment using the `containerd` image
store). This problem means that Complement jobs in CI don't actually
test against the code from the PR (since 2026-02-10).
This PR approaches the problem the same way that @AndrewFerr proposed in
https://github.com/element-hq/synapse/pull/18210. This is better than
the alternative listed below as we can just make our code compatible
with whatever image store is being used.
### Problem
Spawning from
https://github.com/element-hq/synapse/pull/19460#discussion_r2818760635
where we found that our Complement jobs in CI don't actually test
against the code from the PR at the moment.
This is caused by a change in Docker Engine 29.0.0:
> `containerd` image store is now the default for **fresh installs**.
This doesn't apply to daemons configured with `userns-remap` (see
[moby#47377](https://github.com/moby/moby/issues/47377)).
>
> *-- 29.0.0 (2025-11-10),
https://docs.docker.com/engine/release-notes/29/#2900*
And our `ubuntu-latest` GitHub runner (`Current runner version:
'2.331.0'`)
[points](https://github.com/actions/runner-images/blob/ubuntu24/20260209.23/images/ubuntu/Ubuntu2404-Readme.md)
to using Docker client/server `29.1.5` 🎯
This Docker version bump happened on
416418df15
(2026-02-10) (`28.0.4` -> `29.1.5`). Specific PR:
https://github.com/actions/runner-images/pull/13633
---
I found this because I reviewed and remembered
https://github.com/element-hq/synapse/pull/18210 was a thing that
@AndrewFerr ran into. And then running `dockers system prune` also
revealed the problematic `containerd` in CI. Checking the Docker
changelogs, I found the new default culprit and then could trace down
where the GitHub runners made the dependency update.
---------
Co-authored-by: Andrew Ferrazzutti <andrewf@element.io>
Part of: MSC4354 whose experimental feature tracking issue is
https://github.com/element-hq/synapse/issues/19409
Follows: #19340 (a necessary bugfix for `/event/` to set this metadata)
Partially supersedes: #18968
This PR implements the first batch of work to support MSC4354 Sticky
Events.
Sticky events are events that have been configured with a finite
'stickiness' duration,
capped to 1 hour per current MSC draft.
Whilst an event is sticky, we provide stronger delivery guarantees for
the event, both to
our clients and to remote homeservers, essentially making it reliable
delivery as long as we
have a functional connection to the client/server and until the
stickiness expires.
This PR merely supports creating sticky events and receiving the sticky
TTL metadata in clients.
It is not suitable for trialling sticky events since none of the other
semantics are implemented.
Contains a temporary SQLite workaround due to a bug in our supported
version enforcement: https://github.com/element-hq/synapse/issues/19452
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
Co-authored-by: Eric Eastwood <erice@element.io>
For reference, this PR used to include this whole `shared_config` block in the diff.
But https://github.com/element-hq/synapse/pull/19324 was merged first which introduced parts of it already.
Here is what this code used to look like: 566670c363/docker/configure_workers_and_start.py (L865-L868)
---
Original context for why it was changed this way:
https://github.com/matrix-org/synapse/pull/14921#discussion_r1126257933
Previously, this code made me question two things:
1. Do we actually use `worker_config["shared_extra_conf"]` in the
templates?
- At first glance, I couldn't see why we're updating `shared_extra_conf`
here. It's not used in the `worker.yaml.j2` template so all of this
seemed a bit pointless.
- Turns out, updating `shared_extra_conf` itself is pointless and it's
being done as a convenient place to mix the objects to get things right
in `shared_config` (confusing).
1. Does it actually do anything?
- Because `shared_config` starts out as an empty object, my first glance
made me think we we're just updating with an empty object and then just
re-assigning. But because we're in a loop, we actually accumulate the
`shared_extra_conf` from each worker.
I'm not sure whether I'm capturing my confusion well enough here but
basically, this made me spend time trying to figure out what/why we're
doing things this way and we can use a more clear pattern to accomplish
the same thing.
---
This change is spawning from looking at the
`docker/configure_workers_and_start.py` script in order to add a metrics
listener ([upcoming
PR](https://github.com/element-hq/synapse/pull/19324)).
Spawning from wanting to [run a load
test](https://github.com/element-hq/synapse-rust-apps/pull/397) against
the Complement Docker image of Synapse and see metrics from the
homeserver.
### Why not just provide your own homeserver config?
Probably possible but it gets tricky when you try to use the workers
variant of the Docker image (`docker/Dockerfile-workers`). The way to
workaround it would probably be to `yq` edit everything in a script and
change `/data/homeserver.yaml` and `/conf/workers/*.yaml` to add the
`metrics` listener. And then modify `/conf/workers/shared.yaml` to add
`enable_metrics: true`. Doesn't spark much joy.
If Synapse is under test (`SYNAPSE_LOG_TESTING` is set), we don't care
about seeing the "Applying schema" log lines at the INFO level every
time we run the tests (it's 100 lines of bulk for each homeserver).
```
synapse_main | 2025-08-29 22:34:03,453 - synapse.storage.prepare_database - 433 - INFO - main - Applying schema deltas for v73
synapse_main | 2025-08-29 22:34:03,454 - synapse.storage.prepare_database - 541 - INFO - main - Applying schema 73/01event_failed_pull_attempts.sql
synapse_main | 2025-08-29 22:34:03,463 - synapse.storage.prepare_database - 541 - INFO - main - Applying schema 73/02add_pusher_enabled.sql
synapse_main | 2025-08-29 22:34:03,473 - synapse.storage.prepare_database - 541 - INFO - main - Applying schema 73/02room_id_indexes_for_purging.sql
synapse_main | 2025-08-29 22:34:03,482 - synapse.storage.prepare_database - 541 - INFO - main - Applying schema 73/03pusher_device_id.sql
synapse_main | 2025-08-29 22:34:03,492 - synapse.storage.prepare_database - 541 - INFO - main - Applying schema 73/03users_approved_column.sql
synapse_main | 2025-08-29 22:34:03,502 - synapse.storage.prepare_database - 541 - INFO - main - Applying schema 73/04partial_join_details.sql
synapse_main | 2025-08-29 22:34:03,513 - synapse.storage.prepare_database - 541 - INFO - main - Applying schema 73/04pending_device_list_updates.sql
...
```
The Synapse logs are visible when a Complement test fails or you use
`COMPLEMENT_ALWAYS_PRINT_SERVER_LOGS=1`. This is spawning from a
Complement test with three homeservers and wanting less log noise to
scroll through.
Default values will be 1 room per minute, with a burst count of 10.
It's hard to imagine most users will be affected by this default rate,
but it's intentionally non-invasive in case of bots or other users that
need to create rooms at a large rate.
Server admins might want to down-tune this on their deployments.
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
Best reviewed commit by commit.
With the new dedicated MAS API
(https://github.com/element-hq/synapse/pull/18520), it's possible that
deactivation starts off the main process, which was not possible because
of a few calls.
I basically looked at everything that the deactivation handler was
doing, reviewed whether it could run on workers or not, and find a
workaround when possible
---------
Co-authored-by: Eric Eastwood <erice@element.io>
The main goal of this PR is to handle device list changes onto multiple
writers, off the main process, so that we can have logins happening
whilst Synapse is rolling-restarting.
This is quite an intrusive change, so I would advise to review this
commit by commit; I tried to keep the history as clean as possible.
There are a few things to consider:
- the `device_list_key` in stream tokens becomes a
`MultiWriterStreamToken`, which has a few implications in sync and on
the storage layer
- we had a split between `DeviceHandler` and `DeviceWorkerHandler` for
master vs. worker process. I've kept this split, but making it rather
writer vs. non-writer worker, using method overrides for doing
replication calls when needed
- there are a few operations that need to happen on a single worker at a
time. Instead of using cross-worker locks, for now I made them run on
the first writer on the list
---------
Co-authored-by: Eric Eastwood <erice@element.io>
Spawning from using this code elsewhere and not knowing why it's there.
Based on this article and @reivilibre's experience mentioning
`PYTHONUNBUFFERED=1`,
> #### programming languages where the default “print” statement buffers
>
> Also, here are a few programming language where the default print
statement will buffer output when writing to a pipe, and some ways to
disable buffering if you want:
>
> - Python (disable with `python -u`, or `PYTHONUNBUFFERED=1`, or
`sys.stdout.reconfigure(line_buffering=False)`, or `print(x,
flush=True)`)
>
> _--
https://jvns.ca/blog/2024/11/29/why-pipes-get-stuck-buffering/#programming-languages-where-the-default-print-statement-buffers_
- Use a `uv:python` image for the first build layer, to reduce the
number of intermediate images required, as the
main Dockerfile uses that image already
- Use a cache mount for `apt` commands
- Skip a pointless install of `redis-server`, since the redis Docker
image is copied from instead
- Move some RUN steps out of the final image layer & into the build
layer
Depends on https://github.com/element-hq/synapse/pull/18275
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
- Explicitly use `mawk` instead of `awk`, since an extension of the
former is used
- Use `fflush` to reduce interleaving the output of different processes
& streams
- Move the `mawk` command to a shell function, instead of writing it
twice
- Look up the `SUPERVISOR_PROCESS_NAME` environment variable in `mawk`,
instead of reading it in the shell & using complex quoting to pass it to
`mawk`
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
---------
Co-authored-by: Quentin Gliech <quenting@element.io>
This is a split off #18033
This uses a few tricks to speed up the building of docker images:
- This switches to use `uv pip install` instead of `pip install`. This
saves a bunch of time, especially when cross-compiling
- I then looked at what packages were not using binary wheels: I
upgraded MarkupSafe to have binaries for py3.12, and got back to Python
3.12 because hiredis didn't have builds for py3.13 with the version we
were using
- The generation of the requirements.txt is arch-agnostic, so I've
switched this one to run on the build architecture, so that both arch
can share it
- The download of runtime depdendencies can be done on the build
architecture through manual `apt-get download` plus `dpkg --extract`
- We were using -slim images, but still installed a bunch of -dev
dependencies. Turns out, all the dev dependencies were already installed
in the non-slim image, which saves a bunch of time as well
Adds new environment variables that can be used with the Docker image
(`SYNAPSE_HTTP_PROXY`/`SYNAPSE_HTTPS_PROXY`/`SYNAPSE_NO_PROXY`)
Useful for things like the [Secure Border
Gateway](https://element.io/server-suite/secure-border-gateways)
### Why is this necessary?
You can already configure the `HTTP_PROXY`/`HTTPS_PROXY` environment
variables to proxy outbound requests but setting this globally in the
Docker image affects all processes which isn't always desirable or
workable in the case where the proxy is running in the Docker image
itself (because the Debian packages will fail to download because the
proxy isn't up and running yet) . Adding Synapse specific environment
variables
(`SYNAPSE_HTTP_PROXY`/`SYNAPSE_HTTPS_PROXY`/`SYNAPSE_NO_PROXY`) makes
things much more targetable.
Be able to test `/login/sso/redirect` in Complement
Spawning from
https://github.com/element-hq/sbg/pull/421#discussion_r1854926218 where
we have a proxy that intercepts responses to
`/_matrix/client/v3/login/sso/redirect(/{idpId})` in order to upgrade
them to use OAuth 2.0 Pushed Authorization Requests (PAR). We have some
Complement tests in that codebase that go over this flow and these
changes are required [in order for the URL's to line
up](d648c8ce3f/synapse/rest/client/login.py (L652-L673)).