9108 Commits

Author SHA1 Message Date
Eric Eastwood
d27ff161f5 Add debug logs wherever we change current logcontext (#18966)
Add debug logs wherever we change current logcontext (`LoggingContext`).
I've had to make this same set of changes over and over as I've been
debugging things so it seems useful enough to include by default.

Instead of tracing things at the `set_current_context(...)` level, I've
added the debug logging on all of the utilities that utilize
`set_current_context(...)`. It's much easier to reason about the
logcontext changing because of `PreserveLoggingContext` than because of
an opaque `set_current_context(...)` call.
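
For illustration, here is a minimal standalone sketch of the idea (a `ContextVar` stand-in, not Synapse's actual `LoggingContext` machinery): the debug log lives in the context-manager utility rather than in the low-level setter, so every log line names the utility that caused the switch.

```python
import contextvars
import logging

logger = logging.getLogger(__name__)

# Hypothetical stand-in for Synapse's "current logcontext".
current_context: contextvars.ContextVar[str] = contextvars.ContextVar(
    "current_context", default="sentinel"
)


class PreserveContextSketch:
    """Swap to the sentinel context and restore the previous one on exit,
    emitting a DEBUG log for both transitions."""

    def __enter__(self) -> None:
        old = current_context.get()
        self._token = current_context.set("sentinel")
        logger.debug("PreserveContextSketch: %s -> sentinel", old)

    def __exit__(self, *exc_info: object) -> None:
        current_context.reset(self._token)
        logger.debug("PreserveContextSketch: sentinel -> %s", current_context.get())
```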
2025-10-02 11:51:17 -05:00
Eric Eastwood
06a84f4fe0 Revert "Switch to OpenTracing's ContextVarsScopeManager (#18849)" (#19007)
Revert https://github.com/element-hq/synapse/pull/18849

Go back to our custom `LogContextScopeManager` after trying
OpenTracing's `ContextVarsScopeManager`.

Fix https://github.com/element-hq/synapse/issues/19004

### Why revert?

For reference, with the normal reactor, `ContextVarsScopeManager` worked
just as well as our custom `LogContextScopeManager` as far as I can tell
(and even better in some cases). But since Twisted appears to not fully
support `ContextVar`s, it doesn't work as expected in all cases.
Compounding things, `ContextVarsScopeManager` was causing errors with
the experimental `SYNAPSE_ASYNC_IO_REACTOR` option.

Since we're not getting the full benefit that we originally desired, we
might as well revert and figure out alternatives for extending the
logcontext lifetimes to support the use case we were trying to unlock
(c.f. https://github.com/element-hq/synapse/pull/18804).

See
https://github.com/element-hq/synapse/issues/19004#issuecomment-3358052171
for more info.


### Does this require backporting and patch releases?

No. Since `ContextVarsScopeManager` operates just as well with the
normal reactor and was only causing actual errors with the experimental
`SYNAPSE_ASYNC_IO_REACTOR` option, I don't think this requires us to
backport and make patch releases at all.



### Maintain cross-links between main trace and background process work

In order to maintain the functionality introduced in https://github.com/element-hq/synapse/pull/18932 (cross-links between the background process trace and the currently active trace), we also needed a small change.

Previously, when we were using `ContextVarsScopeManager`, it tracked the tracing scope across the logcontext changes without issue. Now that we're using our own custom `LogContextScopeManager` again, we need to capture the active span from the logcontext before we reset to the sentinel context because of the `PreserveLoggingContext()` below.
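
A hedged sketch of that idea (illustrative only, using the plain `opentracing` package API rather than Synapse's helpers, which assumes a tracer has been registered as the global tracer): grab the caller's active span before any context reset, then start the background span with a FOLLOWS_FROM reference to it.

```python
import opentracing


def start_linked_background_span(operation_name: str) -> opentracing.Scope:
    tracer = opentracing.global_tracer()
    # Capture the caller's span *before* the logcontext/scope gets reset.
    calling_span = tracer.active_span
    references = (
        [opentracing.follows_from(calling_span.context)] if calling_span else None
    )
    # Start a separate span for the background work that merely *references*
    # the calling trace instead of nesting inside it.
    return tracer.start_active_span(
        operation_name,
        references=references,
        ignore_active_span=True,
    )
```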

Added some tests to ensure we maintain the `run_as_background_process(...)` tracing behavior regardless of the tracing scope manager we use.
2025-10-02 11:27:26 -05:00
Eric Eastwood
1c093509ce Switch task scheduler from raw logcontext manipulation (set_current_context) to utils (PreserveLoggingContext) (#18990)
Prefer the utils over raw logcontext manipulation.

Spawning from adding some logcontext debug logs in
https://github.com/element-hq/synapse/pull/18966 and since we're not
logging at the `set_current_context(...)` level (see reasoning there),
this removes some usage of `set_current_context(...)`.
2025-10-02 10:22:25 -05:00
Devon Hudson
396de6544a Cleanly shutdown SynapseHomeServer object (#18828)
This PR aims to allow for a clean shutdown of the `SynapseHomeServer`
object so that it can be fully deleted and cleaned up by garbage
collection without shutting down the entire Python process.

Fix https://github.com/element-hq/synapse-small-hosts/issues/50


---------

Co-authored-by: Eric Eastwood <erice@element.io>
2025-10-01 02:42:09 +00:00
Sebastian Spaeth
d1c96ee0f2 Fix rc_room_creation and rc_reports docs - remove per_user typo (#18998) 2025-09-30 15:17:11 -05:00
Eric Eastwood
5adb08f3c9 Remove MockClock() (#18992)
Spawning from adding some logcontext debug logs in
https://github.com/element-hq/synapse/pull/18966 and since we're not
logging at the `set_current_context(...)` level (see reasoning there),
this removes some usage of `set_current_context(...)`.

Specifically, `MockClock.call_later(...)` doesn't handle logcontexts
correctly. It uses the calling logcontext as the callback context
(wrong, as the logcontext could finish before the callback finishes) and
it doesn't reset back to the sentinel context before handing back to the
reactor. It has been like this since it was [introduced 10+ years
ago](38da9884e7).
Instead of fixing the implementation, which would just be a copy of our
normal `Clock`, we can simply remove `MockClock`.
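
For illustration, a standalone sketch (a `ContextVar` stand-in, not Synapse's actual `Clock`) of the behaviour `MockClock` was missing: run the delayed callback in its own context and reset before control goes back to the reactor.

```python
import contextvars
from typing import Callable

from twisted.internet import reactor

# Hypothetical stand-in for Synapse's "current logcontext".
current_context: contextvars.ContextVar[str] = contextvars.ContextVar(
    "current_context", default="sentinel"
)


def call_later_sketch(delay: float, name: str, callback: Callable[[], None]) -> None:
    def wrapped() -> None:
        # Enter a fresh context for the callback rather than borrowing the
        # caller's (which may have finished by the time we run).
        token = current_context.set(name)
        try:
            callback()
        finally:
            # Reset (back to the sentinel the reactor runs in) before handing
            # control back to the reactor.
            current_context.reset(token)

    reactor.callLater(delay, wrapped)
```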
2025-09-30 11:27:29 -05:00
Andrew Morgan
ad8dcc2119 Remove internal ReplicationUploadKeysForUserRestServlet (#18988) 2025-09-30 11:12:14 +01:00
Eric Eastwood
5143f93dc9 Fix server_name in logging context for multiple Synapse instances in one process (#18868)
### Background

As part of Element's plan to support a light form of vhosting (virtual
hosting), i.e. running multiple instances of Synapse in the same Python
process, we're currently diving into the details and implications of
doing so.

"Per-tenant logging" tracked internally by
https://github.com/element-hq/synapse-small-hosts/issues/48

### Prior art

Previously, we exposed `server_name` by providing a static logging
`MetadataFilter` that injected the values:


205d9e4fc4/synapse/config/logger.py (L216)

While this can work fine for the normal case of one Synapse instance per
Python process, it configures things globally and isn't compatible with
starting multiple Synapse instances, because each subsequent tenant will
overwrite the previous one.


### What does this PR do?

We remove the `MetadataFilter` and instead track the `server_name` in
the `LoggingContext`, exposing it with our existing
[`LoggingContextFilter`](205d9e4fc4/synapse/logging/context.py (L584-L622))
that we already use to expose information about the `request`.

This means that the `server_name` value follows wherever we log as
expected even when we have multiple Synapse instances running in the
same process.
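
A rough standalone sketch of the pattern (stdlib `logging` plus a `ContextVar` as a stand-in for Synapse's `LoggingContext`, not the actual `LoggingContextFilter`): the filter reads the per-tenant value at log time, so each instance's logs carry their own `server_name`.

```python
import contextvars
import logging

# Hypothetical stand-in for the server_name tracked on the logging context.
server_name_var: contextvars.ContextVar[str] = contextvars.ContextVar(
    "server_name", default="unknown_server_from_sentinel_context"
)


class ServerNameFilter(logging.Filter):
    """Inject the current server_name into every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.server_name = server_name_var.get()
        return True


# Usage sketch:
handler = logging.StreamHandler()
handler.addFilter(ServerNameFilter())
handler.setFormatter(logging.Formatter("%(asctime)s - %(server_name)s - %(message)s"))
logging.getLogger().addHandler(handler)
```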


### A note on logcontext

Anywhere Synapse mistakenly uses the `sentinel` logcontext to log
something, we won't know which server sent the log. We've been fixing up
`sentinel` logcontext usage as tracked by
https://github.com/element-hq/synapse/issues/18905.

Any further `sentinel` logcontext usage we find in the future can be
fixed piecemeal as normal.


d2a966f922/docs/log_contexts.md (L71-L81)


### Testing strategy

1. Adjust your logging config to include `%(server_name)s` in the format
    ```yaml
    formatters:
        precise:
            format: '%(asctime)s - %(server_name)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s - %(message)s'
    ```
1. Start Synapse: `poetry run synapse_homeserver --config-path
homeserver.yaml`
1. Make some requests (`curl
http://localhost:8008/_matrix/client/versions`, etc)
1. Open the homeserver logs and notice the `server_name` in the logs as
expected. `unknown_server_from_sentinel_context` is expected for the
`sentinel` logcontext (things outside of Synapse).
2025-09-26 17:10:48 -05:00
Eric Eastwood
2f2b854ac1 Fix logcontext handling in timeout_deferred tests (#18974)
Related to https://github.com/element-hq/synapse/issues/18905

These fixes were split off from
https://github.com/element-hq/synapse/pull/18828 where @devonh was
seeing some test failures because `timeout_deferred(...)` is being
updated to use `Clock` utilities instead of raw `reactor` methods. This
test was failing in that branch/PR until we made this new version that
handles the logcontexts properly.

While the previous version of this test does pass on `develop`, it was
relying on what appear to be completely wrong assertions, assumptions,
and bad patterns to make that happen (see the diff comments below).

---

Test originally introduced in https://github.com/matrix-org/synapse/pull/4407
2025-09-26 11:10:02 -05:00
Andrew Morgan
8f61bdb470 Note optional Element Commercial License in SPDX specifiers (#18973) 2025-09-26 12:43:07 +01:00
Andrew Morgan
7c32988f6b Update URLs in dockerfile metadata (#18971) 2025-09-26 12:40:50 +01:00
Hammy Havoc
688f635b59 Updated providers.json to use X instead of Twitter following rebrand and schema change (#18767) 2025-09-26 11:06:50 +01:00
Eric Eastwood
04721c85e6 Disconnect background process work from request trace (#18932)
Before https://github.com/element-hq/synapse/pull/18849, we were using
our own custom `LogContextScopeManager` which tied the tracing scope to
the `LoggingContext`. Since we created a new
`BackgroundProcessLoggingContext` any time we called
`run_as_background_process(...)`, the trace for the background work was
separate from the trace that kicked off the work, as expected (e.g. the
request trace is separate from the background process we kicked off to
fetch more messages from the federation).

Since we've now switched to the `ContextVarsScopeManager` (in
https://github.com/element-hq/synapse/pull/18849), the tracing scope now
crosses the `LoggingContext` boundaries (and thread boundaries) without
a problem. This means we end up with request traces that include all of
the background work that we've kicked off bloating the trace and making
it hard to understand what's going on.

This PR separates the traces again, restoring how things were before.
Additionally, things are even better now since I added cross-link
references between the traces so it's easy to jump between them.

Follow-up to https://github.com/element-hq/synapse/pull/18849

---

In the 'before' trace, you can see that the trace is blown up by the
background process (`bgproc.qwer`).

In the 'after' trace, we now only have a little cross-link marker span
(`start_bgproc.qwer`) to jump to the background process trace.

Before | After
---  | ---
<some image> | <some image>



### Testing strategy

1. Run a Jaeger instance
(https://www.jaegertracing.io/docs/1.6/getting-started/)
    ```shell
    $ docker run -d --name jaeger \
      -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
      -p 5775:5775/udp \
      -p 6831:6831/udp \
      -p 6832:6832/udp \
      -p 5778:5778 \
      -p 16686:16686 \
      -p 14268:14268 \
      -p 9411:9411 \
      jaegertracing/all-in-one:1.59.0
    ```
 1. Configure Synapse to use tracing:
     `homeserver.yaml`
     ```yaml
    ## Tracing ##
    opentracing:
      enabled: true
      jaeger_config:
        sampler:
          type: const
          param: 1
        logging: false
    ```
1. Make sure the optional `opentracing` dependency is installed: `poetry
install --extras all`
1. In the `VersionsRestServlet`, modify it to kick off a dummy
background process (easy to test this way)
    ```python
    from synapse.metrics.background_process_metrics import run_as_background_process

    async def _qwer() -> None:
        await self.clock.sleep(1)

    run_as_background_process("qwer", "test_server", _qwer)
    ```
1. Run Synapse: `poetry run synapse_homeserver --config-path
homeserver.yaml`
1. Fire off a versions request: `curl
http://localhost:8008/_matrix/client/versions`
 1. Visit http://localhost:16686/search to view the traces
     - Select the correct service
     - Look for the  `VersionsRestServlet` operation
     - Press 'Find Traces' button
     - Select the relevant trace
     - Notice how the trace isn't bloated
     - Look for the `start_bgproc.qwer` span cross-linking to the background process
     - Jump to the other trace using the cross-link reference -> `bgproc.qwer`
2025-09-25 21:45:18 -05:00
Travis Ralston
d2a966f922 Use signature support from policy servers when available (#18934)
Opening on Kegan's behalf


[MSC4284](https://github.com/matrix-org/matrix-spec-proposals/pull/4284)
has already been opened accordingly.

---------

Co-authored-by: Kegan Dougal <7190048+kegsay@users.noreply.github.com>
Co-authored-by: Eric Eastwood <erice@element.io>
2025-09-25 19:30:24 +00:00
Hugh Nimmo-Smith
fd8fa97b6a Document and fix room_config param when user_may_create_room callback is invoked for a room upgrade (#18721)
Co-authored-by: Eric Eastwood <erice@element.io>
2025-09-24 21:42:19 +00:00
Eric Eastwood
5266e423e2 Explain how Deferred callbacks interact with logcontexts (#18914)
Spawning from
https://github.com/matrix-org/synapse/pull/12588#discussion_r865843321

> It turns out `Deferred.cancel()` is a lot like
`Deferred.callback()`/`errback()` in that it will trash the logging
context:
> it can resume a coroutine, which will restore its own logging context,
then run:
> 
>  - until it blocks, setting the sentinel context
>  - or until it terminates, setting the context it was started with
> 
> So we need to wrap it in `with PreserveLoggingContext():`, like we do
with `.callback()`:
> 
> ```python
> with PreserveLoggingContext():
>     self.render_deferred.cancel()
> ```
>
> *-- @squahtx,
https://github.com/matrix-org/synapse/pull/12588#discussion_r865843321*
2025-09-24 16:20:42 -05:00
Eric Eastwood
0458f691b6 Fix run_coroutine_in_background(...) incorrectly handling logcontext (#18964)
Regressed in
https://github.com/element-hq/synapse/pull/18900#discussion_r2331554278
(see conversation there for more context)


### How is this a regression?

> To give this an update with more hindsight; this logic *was* redundant
with the early return and it is safe to remove this complexity

> 
> It seems like this actually has to do with completed vs incomplete
deferreds...
> 
> To explain how things previously worked *without* the early-return
shortcut:
> 
> With the normal case of **incomplete awaitable**, we store the
`calling_context` and the `f` function is called and runs until it
yields to the reactor. Because `f` follows the logcontext rules, it sets
the `sentinel` logcontext. Then in `run_in_background(...)`, we restore
the `calling_context`, store the current `ctx` (which is `sentinel`) and
return. When the deferred completes, we restore `ctx` (which is
`sentinel`) before yielding to the reactor again (all good)
> 
> With the other case where we see a **completed awaitable**, we store
the `calling_context` and the `f` function is called and runs to
completion (no logcontext change). *This is where the shortcut would
kick in but I'm going to continue explaining as if we commented out the
shortcut.* -- Then in `run_in_background(...)`, we restore the
`calling_context`, store the current `ctx` (which is same as the
`calling_context`). Because the deferred is already completed, our extra
callback is called immediately and we restore `ctx` (which is same as
the `calling_context`). Since we never yield to the reactor, the
`calling_context` is perfect as that's what we want again (all good)
> 
> ---
> 
> But this also means that our early-return shortcut is no longer just
an optimization and is *necessary* to act correctly in the **completed
awaitable** case as we want to return with the `calling_context` and not
reset to the `sentinel` context. I've updated the comment in
https://github.com/element-hq/synapse/pull/18964 to explain the
necessity as it's currently just described as an optimization.
> 
> But because we made the same change to
`run_coroutine_in_background(...)` which didn't have the same
early-return shortcut, we regressed the correct behavior. This is
being fixed in https://github.com/element-hq/synapse/pull/18964
>
>
> *-- @MadLittleMods,
https://github.com/element-hq/synapse/pull/18900#discussion_r2373582917*
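
To make the completed-vs-incomplete distinction above concrete, here is a self-contained sketch (a `ContextVar` stand-in for the logcontext, not Synapse's actual `run_in_background`) of why the early return is load-bearing rather than a mere optimization:

```python
import contextvars

from twisted.internet import defer

# Hypothetical stand-in for Synapse's "current logcontext".
current_context: contextvars.ContextVar[str] = contextvars.ContextVar(
    "current_context", default="sentinel"
)


def run_in_background_sketch(f) -> defer.Deferred:
    calling_context = current_context.get()
    d = defer.maybeDeferred(f)

    if d.called and not d.paused:
        # Completed awaitable: `f` never yielded to the reactor, so the
        # calling context is still current and is exactly what we want.
        # Without this early return we would wrongly reset to the sentinel.
        return d

    # Incomplete awaitable: `f` yielded to the reactor and (per the logcontext
    # rules) left the sentinel context active. Restore the calling context for
    # our caller, and put the old context back when the Deferred completes.
    ctx = current_context.get()
    current_context.set(calling_context)

    def _restore(result):
        current_context.set(ctx)
        return result

    d.addBoth(_restore)
    return d
```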

### How did we find this problem?

Spawning from @wrjlewis
[seeing](https://matrix.to/#/!SGNQGPGUwtcPBUotTL:matrix.org/$h3TxxPVlqC6BTL07dbrsz6PmaUoZxLiXnSTEY-QYDtA?via=jki.re&via=matrix.org&via=element.io)
`Starting metrics collection 'typing.get_new_events' from sentinel
context: metrics will be lost` in the logs:

<details>
<summary>More logs</summary>

```
synapse.http.request_metrics - 222 - ERROR - sentinel - Trying to stop RequestMetrics in the sentinel context.
2025-09-23 14:43:19,712 - synapse.util.metrics - 212 - WARNING - sentinel - Starting metrics collection 'typing.get_new_events' from sentinel context: metrics will be lost
2025-09-23 14:43:19,713 - synapse.rest.client.sync - 851 - INFO - sentinel - Client has disconnected; not serializing response.
2025-09-23 14:43:19,713 - synapse.http.server - 825 - WARNING - sentinel - Not sending response to request <XForwardedForRequest at 0x7f23e8111ed0 method='POST' uri='/_matrix/client/unstable/org.matrix.simplified_msc3575/sync?pos=281963%2Fs929324_147053_10_2652457_147960_2013_25554_4709564_0_164_2&timeout=30000' clientproto='HTTP/1.1' site='8008'>, already dis
connected.
2025-09-23 14:43:19,713 - synapse.access.http.8008 - 515 - INFO - sentinel - 92.40.194.87 - 8008 - {@me:wi11.co.uk} Processed request: 30.005sec/-8.041sec (0.001sec, 0.000sec) (0.000sec/0.002sec/2) 0B 200! "POST /_matrix/client/unstable/org.matrix.simplified_msc3575/
```

</details>

From the logs there, we can see things relating to
`typing.get_new_events` and
`/_matrix/client/unstable/org.matrix.simplified_msc3575/sync`, which led
me to try out Sliding Sync with the typing extension enabled and
allowed me to reproduce the problem locally. Sliding Sync is a unique
scenario as it's the only place we use `gather_optional_coroutines(...)`
-> `run_coroutine_in_background(...)` (introduced in
https://github.com/element-hq/synapse/pull/17884), which is why it
exhibits this behavior.


### Testing strategy

1. Configure Synapse to enable
[MSC4186](https://github.com/matrix-org/matrix-spec-proposals/pull/4186):
Simplified Sliding Sync, which is gated under the
[MSC3575](https://github.com/matrix-org/matrix-spec-proposals/pull/3575) flag:
    ```yaml
    experimental_features:
      msc3575_enabled: true
    ```
1. Start synapse: `poetry run synapse_homeserver --config-path
homeserver.yaml`
 1. Make a Sliding Sync request with one of the extensions enabled
    ```http
    POST http://localhost:8008/_matrix/client/unstable/org.matrix.simplified_msc3575/sync
    {
      "lists": {},
      "room_subscriptions": {
        "!FlgJYGQKAIvAscfBhq:my.synapse.linux.server": {
          "required_state": [],
          "timeline_limit": 1
        }
      },
      "extensions": {
        "typing": {
          "enabled": true
        }
      }
    }
    ```
1. Open your homeserver logs and notice warnings about `Starting ...
from sentinel context: metrics will be lost`
2025-09-24 15:24:47 +00:00
Eric Eastwood
25fa555395 Fix no active span when trying to log tracing error on startup (#18959)
Fix `no active span when trying to log` tracing error on startup.

Example error:
```log
synapse.logging.opentracing - 427 - ERROR - wake_destinations_needing_catchup-0 - There was no active span when trying to log. Did you forget to start one or did a context slip?
Stack (most recent call last):
  File "/usr/lib/python3.13/threading.py", line 1014, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.13/threading.py", line 1043, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.13/threading.py", line 994, in run
    self._target(*self._args, **self._kwargs)
  File "python3.13/site-packages/twisted/_threads/_threadworker.py", line 75, in work
    task()
  File "python3.13/site-packages/twisted/_threads/_team.py", line 192, in doWork
    task()
  File "python3.13/site-packages/twisted/python/threadpool.py", line 269, in inContext
    result = inContext.theWork()  # type: ignore[attr-defined]
  File "python3.13/site-packages/twisted/python/threadpool.py", line 285, in <lambda>
    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
  File "python3.13/site-packages/twisted/python/context.py", line 117, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "python3.13/site-packages/twisted/python/context.py", line 82, in callWithContext
    return func(*args, **kw)
  File "python3.13/site-packages/twisted/enterprise/adbapi.py", line 282, in _runWithConnection
    result = func(conn, *args, **kw)
  File "synapse/synapse/storage/database.py", line 1094, in inner_func
    return func(db_conn, *args, **kwargs)
  File "synapse/synapse/storage/database.py", line 822, in new_transaction
    opentracing.log_kv({"message": "commit"})
  File "synapse/synapse/logging/opentracing.py", line 427, in ensure_active_span_inner_2
    logger.error(
```


### Why did this happen before?

This previously occurred because we called `init_tracer(...)` after the
reactor started up in `_base.start()`. But we actually attempt some
database transactions earlier than that, which try to do some tracing,
because of the `oidc = hs.get_oidc_handler()` line.

Notice `oidc = hs.get_oidc_handler()` happened before `_base.start(hs)`:


5be7679dd9/synapse/app/homeserver.py (L397-L408)


With this PR, I've updated things to call `init_tracer(...)` earlier on,
alongside where we call `setup_logging(...)`.
2025-09-24 10:12:08 -05:00
Andrew Morgan
7708801d56 Fix triage_labelled GHA workflow (#18913) 2025-09-24 14:17:14 +01:00
PizZaKatZe
e766f325af fix: Compute user last seen timestamp from last seen devices (#18948)
## Fix last seen timestamp in `/_synapse/admin/v2/users` response

Fixes #18955

The last seen timestamps contained in `/_synapse/admin/v2/users`
responses were computed as follows:

```sql
                [...]
                LEFT JOIN (
                    SELECT user_id, MAX(last_seen) AS last_seen_ts
                    FROM user_ips GROUP BY user_id
                ) ls ON u.name = ls.user_id
                [...]
```

4367fb2d07/synapse/storage/databases/main/__init__.py (L302C1-L305C44)

This leads to empty timestamps (as in: user was never seen) if users are
inactive for longer than
[`user_ips_max_age`](https://element-hq.github.io/synapse/latest/usage/configuration/config_documentation.html#user_ips_max_age).

The fix is quite trivial: Use the `devices` table, as this one also
contains last seen timestamps but is *not* periodically purged.

We are using this for automatic user account deletion (via
[synadm](https://codeberg.org/synadm/synadm)) and the patched code works
as intended, whereas the unpatched version wants to delete users during
long vacations. 🫣
2025-09-24 11:59:11 +01:00
Tulir Asokan
512b3f50cf Update MSC4326 error code (#18947) 2025-09-24 11:57:24 +01:00
Shay
35c9cbb09d Add an Admin API to query a piece of local or cached remote media by ID (#18911) 2025-09-23 16:25:56 -05:00
Andrew Morgan
0447496549 Merge branch 'release-v1.139' into develop 2025-09-23 16:05:53 +01:00
Andrew Morgan
9ed0d36fe2 Bump batch size from 50 to 1000 for _get_e2e_cross_signing_signatures_for_devices query (#18939)
Co-authored-by: Eric Eastwood <erice@element.io>
2025-09-23 15:47:29 +01:00
Andrew Morgan
b10f3f5959 1.139.0rc2 2025-09-23 15:31:49 +01:00
Andrew Morgan
fd29e3219c Drop support for Ubuntu 24.10 'Oracular Oriole', add support for Ubuntu 25.04 'Plucky Puffin' (#18962) 2025-09-23 15:28:40 +01:00
Andrew Morgan
daf33e4954 1.139.0rc1 2025-09-23 13:28:34 +01:00
Andrew Morgan
ddc7627b22 Fix performance regression related to delayed events processing (#18926) 2025-09-23 09:47:30 +01:00
Eric Eastwood
5be7679dd9 Split loading config vs homeserver setup (#18933)
This allows us to get access to `server_name` so we can use it when
creating the `LoggingContext("main")` in the future (a prerequisite for
https://github.com/element-hq/synapse/pull/18868).

This also gives us more flexibility to parse the config however we want
and set up a Synapse homeserver, like what we do in [Synapse Pro for
Small Hosts](https://github.com/element-hq/synapse-small-hosts).

Split out from https://github.com/element-hq/synapse/pull/18868
2025-09-22 14:53:02 -05:00
Eric Eastwood
e7d98d3429 Remove sentinel logcontext in Clock utilities (looping_call, looping_call_now, call_later) (#18907)
Part of https://github.com/element-hq/synapse/issues/18905

Lints for ensuring we use `Clock.call_later` instead of
`reactor.callLater`, etc are coming in
https://github.com/element-hq/synapse/pull/18944

### Testing strategy

 1. Configure Synapse to log at the `DEBUG` level
1. Start Synapse: `poetry run synapse_homeserver --config-path
homeserver.yaml`
1. Wait 10 seconds for the [database profiling
loop](9cc4001778/synapse/storage/database.py (L711))
to execute
1. Notice the logcontext being used for the `Total database time` log
line

Before (`sentinel`):

```
2025-09-10 16:36:58,651 - synapse.storage.TIME - 707 - DEBUG - sentinel - Total database time: 0.646% {room_forgetter_stream_pos(2): 0.131%, reap_monthly_active_users(1): 0.083%, get_device_change_last_converted_pos(1): 0.078%}
```

After (`looping_call`):

```
2025-09-10 16:36:58,651 - synapse.storage.TIME - 707 - DEBUG - looping_call - Total database time: 0.646% {room_forgetter_stream_pos(2): 0.131%, reap_monthly_active_users(1): 0.083%, get_device_change_last_converted_pos(1): 0.078%}
```
2025-09-22 14:51:13 -05:00
Eric Eastwood
d05f44a1c6 Introduce Clock.add_system_event_trigger(...) to include logcontext by default (#18945)
Introduce `Clock.add_system_event_trigger(...)` to wrap system event
callback code in a logcontext, ensuring we can identify which server
generated the logs.

Background:

>  Ideally, nothing from the Synapse homeserver would be logged against the `sentinel` 
>  logcontext as we want to know which server the logs came from. In practice, this is not 
>  always the case yet especially outside of request handling. 
>   
>  Global things outside of Synapse (e.g. Twisted reactor code) should run in the 
>  `sentinel` logcontext. It's only when it calls into application code that a logcontext 
>  gets activated. This means the reactor should be started in the `sentinel` logcontext, 
>  and any time an awaitable yields control back to the reactor, it should reset the 
>  logcontext to be the `sentinel` logcontext. This is important to avoid leaking the 
>  current logcontext to the reactor (which would then get picked up and associated with 
>  the next thing the reactor does). 
>
> *-- `docs/log_contexts.md`

Also adds a lint to prefer `Clock.add_system_event_trigger(...)` over
`reactor.addSystemEventTrigger(...)`

Part of https://github.com/element-hq/synapse/issues/18905
2025-09-22 11:47:22 -05:00
Eric Eastwood
8d5d87fb0a Fix run_as_background_process not being awaited properly, causing LoggingContext problems (#18938)
Basically, this searches for any instance of
`run_as_background_process(...)` and makes sure we wrap the deferred in
`make_deferred_yieldable(...)` if we try to `await` the result, so that
it follows the [Synapse logcontext
rules](https://github.com/element-hq/synapse/blob/develop/docs/log_contexts.md).
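
A hedged sketch of the pattern being applied (an illustrative call site mirroring the usage shown elsewhere in this log, not any specific location that was fixed; `_some_task`, `"some_task"`, and `"test_server"` are made-up names):

```python
from synapse.logging.context import make_deferred_yieldable
from synapse.metrics.background_process_metrics import run_as_background_process


async def _some_task() -> None:
    """Hypothetical background work."""


async def kick_off_and_wait() -> None:
    # run_as_background_process(...) returns a Deferred that does not follow
    # the logcontext rules by itself; wrapping it in make_deferred_yieldable(...)
    # makes it safe to await from logcontext-following code.
    await make_deferred_yieldable(
        run_as_background_process("some_task", "test_server", _some_task)
    )
```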

Part of https://github.com/element-hq/synapse/issues/18905
2025-09-22 11:02:08 -05:00
Eric Eastwood
9a88d25f8e Fix run_in_background not being awaited properly, causing LoggingContext problems (#18937)
Basically, this searches for any instance of `run_in_background(...)`
and makes sure we wrap the deferred in `make_deferred_yieldable(...)` if
we try to `await` the result, so that it follows the [Synapse logcontext
rules](https://github.com/element-hq/synapse/blob/develop/docs/log_contexts.md).

Turns out, we only have this problem in some tests (phew).

Part of https://github.com/element-hq/synapse/issues/18905
2025-09-22 10:55:45 -05:00
Eric Eastwood
5a9ca1e3d9 Introduce Clock.call_when_running(...) to include logcontext by default (#18944)
Introduce `Clock.call_when_running(...)` to wrap startup code in a
logcontext, ensuring we can identify which server generated the logs.

Background:

>  Ideally, nothing from the Synapse homeserver would be logged against the `sentinel` 
>  logcontext as we want to know which server the logs came from. In practice, this is not 
>  always the case yet especially outside of request handling. 
>   
>  Global things outside of Synapse (e.g. Twisted reactor code) should run in the 
>  `sentinel` logcontext. It's only when it calls into application code that a logcontext 
>  gets activated. This means the reactor should be started in the `sentinel` logcontext, 
>  and any time an awaitable yields control back to the reactor, it should reset the 
>  logcontext to be the `sentinel` logcontext. This is important to avoid leaking the 
>  current logcontext to the reactor (which would then get picked up and associated with 
>  the next thing the reactor does). 
>
> *-- `docs/log_contexts.md`

Also adds a lint to prefer `Clock.call_when_running(...)` over
`reactor.callWhenRunning(...)`

Part of https://github.com/element-hq/synapse/issues/18905
2025-09-22 10:27:59 -05:00
SpiritCroc
83aca3f097 Implement MSC4169: backwards-compatible redaction sending for rooms < v11 using the /send endpoint (#18898)
Implement
[MSC4169](https://github.com/matrix-org/matrix-spec-proposals/pull/4169)

While there is a dedicated API endpoint for redactions, being able to
send redactions using the normal send endpoint is useful when using
[MSC4140](https://github.com/matrix-org/matrix-spec-proposals/pull/4140)
for sending delayed redactions to replicate expiring messages. Currently
this only works on rooms >= v11 but fails with an internal server
error on older room versions when setting the `redacts` field in the
content, since older room versions require that field to live outside of
`content`. We can address this by copying it over if necessary.
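
A hedged, standalone sketch of that copy-over (an illustrative helper, not the actual Synapse change; the function and flag names are hypothetical):

```python
from typing import Any, Dict


def copy_redacts_for_old_room_versions(
    event_dict: Dict[str, Any], room_has_updated_redaction_rules: bool
) -> Dict[str, Any]:
    """For room versions older than v11, `redacts` lives at the top level of
    the event rather than inside `content`, so copy it over if the client
    only supplied it in `content`."""
    content = event_dict.get("content", {})
    if not room_has_updated_redaction_rules and "redacts" in content:
        event_dict.setdefault("redacts", content["redacts"])
    return event_dict
```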

Relevant spec at
https://spec.matrix.org/v1.8/rooms/v11/#moving-the-redacts-property-of-mroomredaction-events-to-a-content-property

---------

Co-authored-by: Tulir Asokan <tulir@maunium.net>
2025-09-22 14:50:52 +01:00
Tulir Asokan
d80f515622 Update MSC4190 support (#18946) 2025-09-22 14:45:05 +01:00
Max Kratz
4367fb2d07 OIDC doc: adds missing jwt_config values to authentik example (#18931)
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2025-09-18 15:05:41 +01:00
Andrew Morgan
b596faa4ec Cache _get_e2e_cross_signing_signatures_for_devices (#18899) 2025-09-18 12:06:08 +01:00
Eric Eastwood
6f9fab1089 Fix open redirect in legacy SSO flow (idp) (#18909)
- Validate the `idp` parameter to only accept the ones that are known in
the config file
- URL-encode the `idp` parameter for safety's sake (this is the main
fix; see the sketch below)
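
A minimal sketch of both points, assuming a generic redirect-building helper (the function name and URL path here are illustrative, not Synapse's actual SSO code):

```python
from typing import Collection
from urllib.parse import quote


def build_idp_redirect(idp: str, known_idp_ids: Collection[str], base_url: str) -> str:
    # Only accept identity providers that are declared in the config.
    if idp not in known_idp_ids:
        raise ValueError(f"Unknown identity provider: {idp!r}")
    # URL-encode the value so it cannot smuggle extra path/query components
    # into the redirect URL.
    return f"{base_url}/pick_idp?idp={quote(idp, safe='')}"
```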

Fix https://github.com/matrix-org/internal-config/issues/1651 (internal
link)

Regressed in https://github.com/element-hq/synapse/pull/17972
2025-09-17 13:54:47 -05:00
Eric Eastwood
84d64251dc Remove sentinel logcontext where we log in setup, start and exit (#18870)
Remove `sentinel` logcontext where we log in `setup`, `start`, and exit.

Instead of having one giant PR that removes all the places we use the
`sentinel` logcontext, I've decided to tackle this piecemeal. This PR
covers the code paths exercised if you just start up Synapse and exit
it, with no requests or activity going on in between.

Part of https://github.com/element-hq/synapse/issues/18905 (Remove
`sentinel` logcontext where we log in Synapse)

Prerequisite for https://github.com/element-hq/synapse/pull/18868.
Logging with the `sentinel` logcontext means we won't know which server
the log came from.



### Why


9cc4001778/docs/log_contexts.md (L71-L81)

(docs updated in https://github.com/element-hq/synapse/pull/18900)


### Testing strategy

1. Run Synapse normally and with `daemonize: true`: `poetry run
synapse_homeserver --config-path homeserver.yaml`
 1. Execute some requests
 1. Shutdown the server
 1. Look for any bad log entries in your homeserver logs:
    - `Expected logging context sentinel but found main`
    - `Expected logging context main was lost`
    - `Expected previous context`
    - `utime went backwards!`/`stime went backwards!`
    - `Called stop on logcontext POST-0 without recording a start rusage`
 1. Look for any logs coming from the `sentinel` context


With these changes, you should only see the following logs (not from
Synapse) using the `sentinel` context if you start up Synapse and exit:

`homeserver.log`
```
2025-09-10 14:45:39,924 - asyncio - 64 - DEBUG - sentinel - Using selector: EpollSelector

2025-09-10 14:45:40,562 - twisted - 281 - INFO - sentinel - Received SIGINT, shutting down.

2025-09-10 14:45:40,562 - twisted - 281 - INFO - sentinel - (TCP Port 9322 Closed)
2025-09-10 14:45:40,563 - twisted - 281 - INFO - sentinel - (TCP Port 8008 Closed)
2025-09-10 14:45:40,563 - twisted - 281 - INFO - sentinel - (TCP Port 9093 Closed)
2025-09-10 14:45:40,564 - twisted - 281 - INFO - sentinel - Main loop terminated.
```
2025-09-16 17:15:08 -05:00
Eric Eastwood
769d30a247 Clarify Python dependency constraints (#18856)
Clarify Python dependency constraints

Spawning from
https://github.com/element-hq/synapse/pull/18852#issuecomment-3212003675
as I don't actually know the exact rule of thumb. It's unclear to me
what we care about exactly. Our [deprecation
policy](https://element-hq.github.io/synapse/latest/deprecation_policy.html)
mentions Debian oldstable support, at least for the version of SQLite.
But then we only refer to Debian stable for the Twisted dependency.
2025-09-15 09:45:41 -05:00
Eric Eastwood
7ecfe8b1a8 Better explain which context the task is run in when using run_in_background(...) or run_as_background_process(...) (#18906)
Follow-up to https://github.com/element-hq/synapse/pull/18900
2025-09-12 09:29:35 -05:00
Hugh Nimmo-Smith
e1036ffa48 Add get_media_upload_limits_for_user and on_media_upload_limit_exceeded callbacks to module API (#18848)
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2025-09-12 12:26:19 +01:00
Andrew Morgan
8c98cf7e55 Remove usage of deprecated pkg_resources interface (#18910) 2025-09-12 10:57:04 +01:00
Kegan Dougal
ec64c3e88d Ensure we /send PDUs which pass canonical JSON checks (#18641)

Fixes https://github.com/element-hq/synapse/issues/18554

Looks like this was missed when it was
[implemented](2277df2a1e).


---------

Co-authored-by: reivilibre <oliverw@element.io>
2025-09-12 08:54:20 +00:00
reivilibre
ada3a3b2b3 Add experimental support for MSC4308: Thread Subscriptions extension to Sliding Sync when MSC4306 and MSC4186 are enabled. (#18695)
Closes: #18436

Implements:
https://github.com/matrix-org/matrix-spec-proposals/pull/4308

Follows: #18674

Adds an extension to Sliding Sync and a companion
endpoint needed for backpaginating missed thread subscription changes,
as described in MSC4308

---------

Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2025-09-11 14:45:04 +01:00
Eric Eastwood
9cc4001778 Better explain logcontext in run_in_background(...) and run_as_background_process(...) (#18900)
Also adds a section in the docs explaining the `sentinel` logcontext.

Spawning from https://github.com/element-hq/synapse/pull/18870


### Testing strategy

1. Run Synapse normally and with `daemonize: true`: `poetry run
synapse_homeserver --config-path homeserver.yaml`
 1. Execute some requests
 1. Shutdown the server
 1. Look for any bad log entries in your homeserver logs:
    - `Expected logging context sentinel but found main`
    - `Expected logging context main was lost`
    - `Expected previous context`
    - `utime went backwards!`/`stime went backwards!`
    - `Called stop on logcontext POST-0 without recording a start rusage`
    - `Background process re-entered without a proc`

Twisted trial tests:

 1. Run full Twisted trial test suite.
1. Check the logs for `Test starting with non-sentinel logging context ...`
2025-09-10 10:22:53 -05:00
Eric Eastwood
ca655e4020 Start background tasks after we fork the process (daemonize) (#18886)
Spawning from https://github.com/element-hq/synapse/pull/18871

[This change](6ce2f3e59d)
was originally used to fix CPU time going backwards when we `daemonize`.

While we don't seem to run into this problem on `develop`, I still
think this is a good change to make. We don't need background tasks
running on a process that will soon be forcefully exited and where the
reactor isn't even running yet. We now kick off the background tasks
(`run_as_background_process`) after we have forked the process and
started the reactor.

Also, as a simple note, we don't need background tasks running in both halves of a fork.
2025-09-09 10:10:34 -05:00
reivilibre
6fe8137a4a Configure Synapse to run MSC4306: Thread Subscriptions Complement tests. (#18819)
Pairs with: https://github.com/matrix-org/complement/pull/795

Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
2025-09-09 11:40:10 +01:00
David Baker
d48e69ad4c Fix prefixed support for MSC4133 (#18875)
This fixes two bugs that affect the availability of MSC4133 until the
next spec release.

1. The servlet didn't recognise the unstable endpoint even when the
homeserver advertised it
 2. The HS didn't advertise support for the stable prefixed version

This would only have been a problem until the next spec release, but
it's nice to have it work before then.
2025-09-09 09:53:08 +01:00