Commit Graph

9059 Commits

Author SHA1 Message Date
Erik Johnston
db975ea10d Expire sliding sync connections (#19211)
We add some logic to expire sliding sync connections if they get old or
if there is too much pending data to return.

The values of the constants are picked fairly arbitrarily; they are
currently:
1. More than 100 rooms with pending events if the connection hasn't been
used in over an hour
2. The connection hasn't been used for over a week
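
The two conditions above could be expressed as a single predicate. The following is only a sketch: the constant names, the `ConnectionState` class, and the `should_expire` function are hypothetical and are not taken from the Synapse codebase; only the thresholds come from the commit message.

```python
from dataclasses import dataclass

# Thresholds from the commit message; the names are hypothetical.
MAX_ROOMS_WITH_PENDING_EVENTS = 100
IDLE_MS_BEFORE_PENDING_CHECK = 60 * 60 * 1000  # one hour
MAX_IDLE_MS = 7 * 24 * 60 * 60 * 1000  # one week


@dataclass
class ConnectionState:
    """Hypothetical summary of a sliding sync connection."""

    last_used_ms: int
    rooms_with_pending_events: int


def should_expire(conn: ConnectionState, now_ms: int) -> bool:
    """Return True if the connection matches either expiry condition."""
    idle_ms = now_ms - conn.last_used_ms
    # Condition 2: unused for over a week.
    if idle_ms > MAX_IDLE_MS:
        return True
    # Condition 1: unused for over an hour and too much pending data.
    return (
        idle_ms > IDLE_MS_BEFORE_PENDING_CHECK
        and conn.rooms_with_pending_events > MAX_ROOMS_WITH_PENDING_EVENTS
    )
```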

Reviewable commit-by-commit

---------

Co-authored-by: Eric Eastwood <erice@element.io>
2025-11-25 10:20:47 +00:00
Eric Eastwood
54c93a1372 Export SYNAPSE_SUPPORTED_COMPLEMENT_TEST_PACKAGES from scripts-dev/complement.sh (#19208)
This is useful as someone downstream can source the
`scripts-dev/complement.sh` script and run the same set of tests as
Synapse:

```bash
# Grab the test packages supported by Synapse.
#
# --fast: Skip rebuilding the docker images,
# --build-only: Will only build Docker images but because we also used `--fast`, it won't do anything.
# `>/dev/null` to redirect stdout to `/dev/null` to get rid of the `echo` logs from the script.
test_packages=$(source "${SYNAPSE_DIR}/scripts-dev/complement.sh" --fast --build-only >/dev/null && echo "$SYNAPSE_SUPPORTED_COMPLEMENT_TEST_PACKAGES")
echo "$test_packages"
```

This is spawning from wanting to run the same set of Complement tests in
the https://github.com/element-hq/synapse-rust-apps project.
2025-11-21 19:01:43 -06:00
Eric Eastwood
e39fba61a7 Refactor scripts-dev/complement.sh logic to avoid exit (#19209)
This is useful so that the script can be sourced by other scripts
without exiting the calling subshell (composable).

This is split out from https://github.com/element-hq/synapse/pull/19208
to make easy-to-understand PRs and build up to where we want to go.
2025-11-21 10:51:19 -06:00
Devon Hudson
f5bf02eff6 1.143.0rc1 2025-11-18 13:20:59 -07:00
Devon Hudson
bc42899008 Allow subpaths in MAS endpoints (#19186)
Fixes #19184

### Pull Request Checklist

<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->

* [X] Pull request is based on the develop branch
* [X] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
  - Use markdown where necessary, mostly for `code blocks`.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [X] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct (run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
2025-11-18 18:45:33 +00:00
Devon Hudson
322481cd2d Run background updates on all databases (#19181)
Fixes #18322 

This PR changes Synapse startup to run background updates against all
databases instead of just the "main" database.
This follows [what the admin api
does](https://github.com/element-hq/synapse/blob/develop/synapse/rest/admin/background_updates.py#L71-L77).

See the above linked issue for further details of why this is
beneficial.

2025-11-17 15:32:21 +00:00
Eric Eastwood
edc0de9fa0 Fix bad deferred logcontext handling (#19180)
These aren't problems I've personally run into, but I went around the
codebase looking for every Deferred `.callback`, `.errback`, and
`.cancel` call and wrapped each one with `PreserveLoggingContext()`.

Spawning from wanting to solve
https://github.com/element-hq/synapse/issues/19165 but unconfirmed
whether this has any effect.

To explain the fix, see the [*Deferred
callbacks*](3b59ac3b69/docs/log_contexts.md (deferred-callbacks))
section of our logcontext docs for more info (specifically using
solution 2).
2025-11-14 11:21:15 -06:00
Andrew Morgan
8da8d4b4f5 Remove explicit python 3.8/9 skips (#19177)
Co-authored-by: Devon Hudson <devon.dmytro@gmail.com>
2025-11-14 11:38:39 +00:00
Eric Eastwood
408a05ebbc Fix potential lost logcontext when PerDestinationQueue.shutdown(...) (#19178)
Spawning from looking at the logs in
https://github.com/element-hq/synapse/issues/19165#issuecomment-3527452941
which mention the `federation_transaction_transmission_loop`. I don't
think it's the source of the lost logcontext that person in the issue is
experiencing because this only applies when you try to `shutdown` the
homeserver.

Problem code introduced in
https://github.com/element-hq/synapse/pull/18828

To explain the fix, see the [*Deferred
callbacks*](3b59ac3b69/docs/log_contexts.md (deferred-callbacks))
section of our logcontext docs for more info (specifically using
solution 2).
2025-11-13 15:17:15 -06:00
Devon Hudson
5d545d1626 Remove support for PostgreSQL 13 (#19170)
This PR removes support for PostgreSQL 13 as it is deprecated
(tomorrow).
Uses https://github.com/element-hq/synapse/pull/18034 as a reference of
where to look, and also found a few other places that needed updating.
I didn't see anywhere in Complement that needs updating.
There is a companion Sytest PR deprecating psql13 over there:
https://github.com/matrix-org/sytest/pull/1418

2025-11-13 19:38:59 +00:00
Andrew Ferrazzutti
9e23cded8f MSC4140: Remove auth from delayed event management endpoints (#19152)
As per recent proposals in MSC4140, remove authentication for
restarting/cancelling/sending a delayed event, and give each of those
actions its own endpoint. (The original consolidated endpoint is still
supported for backwards compatibility.)


---------

Co-authored-by: Half-Shot <will@half-shot.uk>
2025-11-13 18:56:17 +00:00
Eric Eastwood
4494cc0694 Point out which event caused the exception when checking MSC4293 redactions (#19169)
Spawning from looking at the stack trace in
https://github.com/element-hq/synapse/issues/19128 which has no useful
information on how to dig in deeper.
2025-11-13 12:08:22 -06:00
Eric Eastwood
47d24bd234 Add debug logs to track Clock callbacks (#19173)
Spawning from wanting to find the source of a `Clock.call_later()`
callback, https://github.com/element-hq/synapse/issues/19165
2025-11-13 12:07:23 -06:00
Eric Eastwood
b9dda0ff22 Restore printing sentinel for log_record.request (#19172)
This was unintentionally changed in
https://github.com/element-hq/synapse/pull/19068.

There is no real bug here. Without this PR, we just printed an empty
string for the `sentinel` logcontext whereas the prior art behavior was
to print `sentinel` which this PR restores.

Found while staring at the logs in
https://github.com/element-hq/synapse/issues/19165


### Reproduction strategy

1. Configure Synapse with
[logging](df802882bb/docs/sample_log_config.yaml)
1. Start Synapse: `poetry run synapse_homeserver --config-path
homeserver.yaml`
1. Notice the `asyncio - 64 - DEBUG - - Using selector: EpollSelector`
log line (notice empty string `- -`)
1. With this PR, the log line will be `asyncio - 64 - DEBUG - sentinel -
Using selector: EpollSelector` (notice `sentinel`)
2025-11-13 09:57:56 -06:00
reivilibre
938c97416d Add a shortcut return when there are no events to purge. (#19093)
Fixes: #13417

---------

Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
2025-11-13 14:26:37 +00:00
Jason Volk
e67ba69f20 Provide same servers list in s2s alias results as c2s. (#18970)
Signed-off-by: Jason Volk <jason@zemos.net>
Co-authored-by: dasha_uwu <dasha@linuxping.win>
2025-11-13 11:12:03 +00:00
Erik Johnston
df802882bb Further reduce cardinality of metrics on event persister (#19168)
Follow on from #19133 to only track a subset of event types.
2025-11-12 16:40:38 +00:00
Andrew Ferrazzutti
97cc05d1d8 Bump lower bounds of unit test exclusive dependencies for Python 3.10 support (#19167)
Co-authored-by: Andrew Morgan <andrew@amorgan.xyz>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2025-11-12 16:37:14 +00:00
Erik Johnston
3ba3c7fe7d Reduce cardinality of metrics on event persister (#19133)
This reduces the size of metrics by ~80%. Responding with the metrics
takes significant amounts of time.
2025-11-12 13:41:58 +00:00
Andrew Morgan
9722e05479 Update pyproject.toml to be compatible with other standard Python packaging tools (#19137) 2025-11-12 12:37:42 +00:00
Andrew Morgan
2c91896070 Run trial tests on Python 3.14 in PRs (#19135) 2025-11-12 12:02:50 +00:00
Eric Eastwood
8fa7d4a5a3 Ignore Python language refactors (.git-blame-ignore-revs) (#19150)
Ignore Python language refactors (`.git-blame-ignore-revs`)

 - https://github.com/element-hq/synapse/pull/19046
 - https://github.com/element-hq/synapse/pull/19111

2025-11-10 22:34:30 +00:00
V02460
dc7f01f334 register_new_matrix_user: Support multiple config files (#18784)
Co-authored-by: Andrew Morgan <andrew@amorgan.xyz>
2025-11-10 16:52:57 +00:00
reivilibre
a50923b6bf Improve documentation around streams, particularly ID generators and adding new streams. (#18943)
This arises mostly from my recent experience adding a stream for Thread
Subscriptions
and trying to help others add their own streams.

---------

Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
2025-11-10 13:07:22 +00:00
Andrew Ferrazzutti
8580ab60c9 Add delayed_events table to boolean column port (#19155)
The `delayed_events` table has a boolean column that should be handled
by the SQLite->PostgreSQL migration script.
2025-11-10 12:17:42 +00:00
Andrew Ferrazzutti
fcac7e0282 Write union types as X | Y where possible (#19111)
aka PEP 604, added in Python 3.10
2025-11-06 14:02:33 -06:00
Erik Johnston
6790312831 Fixup logcontexts after replication PR. (#19146)
Fixes logcontext leaks introduced in #19138.
2025-11-05 15:38:14 +00:00
Erik Johnston
d3ffd04f66 Fix spelling (#19145)
Fixes up #19138
2025-11-05 14:00:59 +00:00
Erik Johnston
4906771da1 Faster redis replication handling (#19138)
Spawning a background process comes with a bunch of overhead, so let's
try to reduce the number of background processes we need to spawn when
handling inbound fed.

Currently, we seem to be doing roughly one per command. Instead, let's
keep the background process alive for a bit waiting for a new command to
come in.
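
The general idea can be sketched as a single long-lived consumer that keeps draining a queue and only exits after it has been idle for a while. This is a stand-in written with `asyncio` for brevity, not Synapse's Twisted-based implementation; `drain_commands`, `handle`, and `idle_timeout` are hypothetical names.

```python
import asyncio
from typing import Callable


async def drain_commands(
    queue: "asyncio.Queue[str]",
    handle: Callable[[str], None],
    idle_timeout: float = 0.1,
) -> None:
    """Handle queued commands, staying alive until the queue has been idle.

    One task serves many commands, rather than one task per command.
    """
    while True:
        try:
            command = await asyncio.wait_for(queue.get(), timeout=idle_timeout)
        except asyncio.TimeoutError:
            # Nothing arrived within the idle window, so stop; the caller
            # would start a new consumer when the next command comes in.
            return
        handle(command)
```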
2025-11-05 13:42:04 +00:00
Andrew Morgan
2fd8d88b42 1.142.0rc3 2025-11-04 17:39:28 +00:00
Andrew Morgan
0cbb2a15e0 Don't build free-threaded wheels (#19140)
Fixes https://github.com/element-hq/synapse/issues/19139.
2025-11-04 17:38:25 +00:00
Andrew Morgan
5d71034f81 1.142.0rc2 2025-11-04 16:21:50 +00:00
Andrew Morgan
4bbde142dc Skip building Python 3.9 wheels with cibuildwheel (#19119) 2025-11-04 16:20:01 +00:00
Andrew Morgan
2760d15348 1.142.0rc1 2025-11-04 13:34:46 +00:00
Erik Johnston
5408101d21 Speed up pruning of ratelimiter (#19129)
I noticed this in some profiling. Basically, we prune the ratelimiters
by copying and iterating over every entry every 60 seconds. Instead,
let's use a wheel timer to track when we should potentially prune a
given key, and then we a) check fewer keys, and b) can run more
frequently. Hopefully this should mean we don't have a large pause
every time we prune a ratelimiter with lots of keys.

Also fixes a bug where we didn't prune entries that were added via
`record_action` and never subsequently updated. This affected the media
and joins-per-room ratelimiter.
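
A rough sketch of the wheel-timer idea follows. It is not Synapse's implementation: `SimpleWheelTimer` and `PrunableLimiter` are hypothetical classes, and the bucket size and expiry rule are assumptions chosen for illustration. Each key is filed under the time slot in which it might become prunable, so a prune pass only inspects the keys in slots that are due instead of copying and scanning every entry.

```python
from collections import defaultdict


class SimpleWheelTimer:
    """Buckets keys by the time slot in which they should next be checked."""

    def __init__(self, bucket_size_ms: int = 5000) -> None:
        self.bucket_size_ms = bucket_size_ms
        self._buckets: dict[int, set[str]] = defaultdict(set)

    def insert(self, key: str, due_ms: int) -> None:
        self._buckets[due_ms // self.bucket_size_ms].add(key)

    def fetch(self, now_ms: int) -> set[str]:
        """Remove and return keys filed at or before the current slot."""
        current = now_ms // self.bucket_size_ms
        due: set[str] = set()
        for slot in [s for s in self._buckets if s <= current]:
            due |= self._buckets.pop(slot)
        return due


class PrunableLimiter:
    """Hypothetical limiter that tracks the last action time per key."""

    def __init__(self, window_ms: int) -> None:
        self.window_ms = window_ms
        self.last_action_ms: dict[str, int] = {}
        self.timer = SimpleWheelTimer()

    def record_action(self, key: str, now_ms: int) -> None:
        self.last_action_ms[key] = now_ms
        # Every recorded action schedules a future prune check for the key.
        self.timer.insert(key, now_ms + self.window_ms)

    def prune(self, now_ms: int) -> None:
        # Only keys whose slot is due are examined.
        for key in self.timer.fetch(now_ms):
            last = self.last_action_ms.get(key)
            if last is None:
                continue
            expiry = last + self.window_ms
            if expiry <= now_ms:
                del self.last_action_ms[key]
            else:
                # The key was used again since it was filed; check it later.
                self.timer.insert(key, expiry)
```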
2025-11-04 12:44:57 +00:00
Andrew Morgan
08f570f5f5 Fix "There is no current event loop in thread" error in tests (#19134) 2025-11-04 12:32:49 +00:00
Eric Eastwood
db00925ae7 Redirect stdout/stderr to logs after initialization (#19131)
This regressed in https://github.com/element-hq/synapse/pull/19121. I
moved things in https://github.com/element-hq/synapse/pull/19121 because
I thought that it made sense to redirect anything printed to
`stdout`/`stderr` to the logs as early as possible. But we actually want
to log any immediately apparent problems during initialization to
`stderr` in the terminal so that they are obvious and visible to the
operator.

Now, I've moved `redirect_stdio_to_logs()` back to where it was
previously along with some proper comment context for why we have it
there.
2025-11-03 16:16:23 -06:00
Eric Eastwood
891acfd502 Move oidc.load_metadata() startup into _base.start() (#19056)
Slightly related to ["clean-tenant
provisioning"](https://github.com/element-hq/synapse-small-hosts/issues/221)
as making startup cleaner makes it clearer how to handle clean
provisioning.
2025-11-03 15:23:22 -06:00
Eric Eastwood
e02a6f5e5d Fix lost logcontext on HomeServer.shutdown() (#19108)
Same fix as https://github.com/element-hq/synapse/pull/19090

Spawning from working on clean tenant deprovisioning in the Synapse Pro
for small hosts project
(https://github.com/element-hq/synapse-small-hosts/pull/204).
2025-11-03 14:07:10 -06:00
Eric Eastwood
a7107458c6 Refactor app entrypoints (avoid exit(1) in our composable functions) (#19121)
- Move `register_start` (calls `os._exit(1)`) out of `setup` (our
composable function)
- We want to avoid `exit(...)` because we use these composable functions
in Synapse Pro for small hosts where we have multiple Synapse instances
running in the same process. We don't want a problem from one homeserver
tenant causing the entire Python process to exit and affect all of the
other homeserver tenants.
     - Continuation of https://github.com/element-hq/synapse/pull/19116
- Align our app entrypoints: `homeserver` (main), `generic_worker`
(worker), and `admin_cmd`

### Background

As part of Element's plan to support a light form of vhosting (virtual
host) (multiple instances of Synapse in the same Python process) (cf.
Synapse Pro for small hosts), we're currently diving into the details
and implications of running multiple instances of Synapse in the same
Python process.

"Clean tenant provisioning" tracked internally by
https://github.com/element-hq/synapse-small-hosts/issues/48
2025-11-03 12:04:43 -06:00
Eric Eastwood
e00a411837 Move exception handling up the stack (avoid exit(1) in our composable functions) (#19116)
Move exception handling up the stack (avoid `exit(1)` in our composable
functions)

Relevant to Synapse Pro for small hosts as we don't want to exit the
entire Python process and affect all homeserver tenants.


### Background

As part of Element's plan to support a light form of vhosting (virtual
host) (multiple instances of Synapse in the same Python process) (cf.
Synapse Pro for small hosts), we're currently diving into the details
and implications of running multiple instances of Synapse in the same
Python process.

"Clean tenant provisioning" tracked internally by
https://github.com/element-hq/synapse-small-hosts/issues/48
2025-11-03 11:18:56 -06:00
Andrew Morgan
69bab78b44 Python 3.14 support (#19055)
Co-authored-by: Eric Eastwood <erice@element.io>
2025-11-03 11:53:59 +00:00
Eric Eastwood
41a2762e58 Be mindful of other logging context filters in 3rd-party code (#19068)
Be mindful that Synapse can be run alongside other code in the same
Python process. We shouldn't overwrite fields on a given log record unless
we know it's relevant to Synapse.

(no clobber)
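
A simplified sketch of a "no clobber" filter using the standard `logging` module is shown below. It is an assumption-laden illustration rather than Synapse's filter: `NoClobberContextFilter` is a hypothetical class, and the real change decides relevance to Synapse rather than just checking whether the attribute already exists.

```python
import logging


class NoClobberContextFilter(logging.Filter):
    """Hypothetical filter: sets `request` only when the record lacks one."""

    def __init__(self, request_id: str) -> None:
        super().__init__()
        self._request_id = request_id

    def filter(self, record: logging.LogRecord) -> bool:
        # Leave values alone if another filter in the process already set them.
        if not hasattr(record, "request"):
            record.request = self._request_id
        return True
```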


### Background

As part of Element's plan to support a light form of vhosting (virtual
host) (multiple instances of Synapse in the same Python process), we're
currently diving into the details and implications of running multiple
instances of Synapse in the same Python process.

"Per-tenant logging" tracked internally by
https://github.com/element-hq/synapse-small-hosts/issues/48
2025-10-31 10:12:05 -05:00
Erik Johnston
3ccc5184e0 Fix schema lint script to understand CREATE TABLE IF NOT EXISTS (#19020)
The schema lint tries to make sure we don't add or remove indices in
schema files (rather than as background updates), *unless* the table was
created in the same schema file.

The regex to pull out the `CREATE TABLE` SQL incorrectly didn't
recognise `IF NOT EXISTS`.

There is a test delta file that shows that we accept different types of
`CREATE TABLE` and `CREATE INDEX` statements, as well as an index
creation that doesn't have a matching create table (to show that we do
still catch it). The test delta should be removed before merge.
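
For illustration only, a pattern that accepts the optional `IF NOT EXISTS` clause might look like the following. The regex and the `created_tables` helper are hypothetical and are not the ones used in Synapse's schema lint script.

```python
import re

# Hypothetical pattern: the IF NOT EXISTS clause is optional, and the table
# name may be wrapped in double quotes.
CREATE_TABLE_RE = re.compile(
    r'CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?"?(\w+)"?',
    re.IGNORECASE,
)


def created_tables(sql: str) -> set[str]:
    """Return the names of all tables created in the given SQL text."""
    return {match.group(1) for match in CREATE_TABLE_RE.finditer(sql)}
```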
2025-10-31 13:16:47 +00:00
V02460
07e7980572 Fix Rust’s confusing lifetime lint (#19118)
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2025-10-31 12:09:13 +00:00
V02460
3595ff921f Pydantic v2 (#19071)
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Andrew Morgan <andrew@amorgan.xyz>
2025-10-31 09:22:22 +00:00
Andrew Morgan
300c5558ab Update check_dependencies to support markers (#19110) 2025-10-30 21:33:29 +00:00
Eric Eastwood
c0b9437ab6 Fix lost logcontext when using timeout_deferred(...) (#19090)
Fix lost logcontext when using `timeout_deferred(...)` and things
actually time out.

Fix https://github.com/element-hq/synapse/issues/19087 (our HTTP client
times out requests using `timeout_deferred(...)`)
Fix https://github.com/element-hq/synapse/issues/19066 (`/sync` uses
`notifier.wait_for_events()` which uses `timeout_deferred(...)` under
the hood)


### When/why did these lost logcontext warnings start happening?

```
synapse.logging.context - 107 - WARNING - sentinel - Expected logging context call_later but found POST-2453

synapse.logging.context - 107 - WARNING - sentinel - Expected logging context call_later was lost
```

In https://github.com/element-hq/synapse/pull/18828, we switched
`timeout_deferred(...)` from using `reactor.callLater(...)` to
[`clock.call_later(...)`](3b59ac3b69/synapse/util/clock.py (L224-L313))
under the hood. This meant it started dealing with logcontexts but our
`time_it_out()` callback didn't follow our [Synapse logcontext
rules](3b59ac3b69/docs/log_contexts.md).
2025-10-30 11:49:15 -05:00
Eric Eastwood
f0aae62f85 Cheaper logcontext debug logs (random_string_insecure_fast(...)) (#19094)
Follow-up to https://github.com/element-hq/synapse/pull/18966

During the weekly Backend team meeting, it was mentioned that
`random_string(...)` was taking a significant amount of CPU on
`matrix.org`. This makes sense as it relies on
[`secrets.choice(...)`](https://docs.python.org/3/library/secrets.html#secrets.choice),
a cryptographically secure function that is inherently computationally
expensive. And since https://github.com/element-hq/synapse/pull/18966,
we're calling `random_string(...)` as part of a bunch of logcontext
utilities.

Since we don't need cryptographically secure random strings for our
debug logs, this PR is introducing a new `random_string_insecure_fast(...)`
function that uses
[`random.choice(...)`](https://docs.python.org/3/library/random.html#random.choice)
which uses pseudo-random numbers that are "both fast and threadsafe".
2025-10-30 11:47:53 -05:00
Andrew Morgan
349599143e Move reading of multipart response into try body (#19062) 2025-10-30 15:22:52 +00:00