synapse

Author	SHA1	Message	Date
Andrew Morgan	d8e6fcced7	1.141.0rc2	2025-10-28 10:20:39 +00:00
Andrew Morgan	db9a61c30f	Fix `bcrypt` errors preventing users from being able to log in (#19101 )	2025-10-28 10:16:02 +00:00
Andrew Morgan	1271e896b5	1.141.0rc1	2025-10-21 11:12:59 +01:00
Andrew Morgan	418c9f3fe5	Prevent `bcrypt` from raising a `ValueError` and log (#19078 )	2025-10-21 10:52:28 +01:00
Eric Eastwood	eac862629f	Revert "Move `start_doing_background_updates()` to `SynapseHomeServer.start_background_tasks()` (#19036 )" (#19059 ) ### Why See https://github.com/element-hq/synapse/pull/19036#discussion_r2427070612 Revert while I figure out the tests in https://github.com/element-hq/synapse/pull/19057	2025-10-20 10:55:41 -05:00
Ben Banfield-Zanin	67f22a200d	Update Docker images to use Debian trixie (13) and thus Python 3.13 (#19064 )	2025-10-20 16:49:17 +01:00
Andrew Morgan	a4f9274107	Fix indentation of sighup handler calling code (#19060 )	2025-10-14 15:10:48 +01:00
Tulir Asokan	ec7554b768	Stabilize support for MSC4326: Device masquerading for appservices (#19033 ) Note: the code references MSC3202, which is what MSC4326 was split off from. Only MSC4326 was accepted, MSC3202 wasn't yet.	2025-10-13 11:13:07 -05:00
Eric Eastwood	d2c582ef3c	Move unique snowflake homeserver background tasks to `start_background_tasks` (#19037 ) (the standard pattern for this kind of thing)	2025-10-13 10:19:09 -05:00
Eric Eastwood	2d07bd7fd2	Update TODO list of conflicting areas where we encounter metrics being clobbered (`ApplicationService`) (#19040 ) These errors are harmless and are a long-standing issue that is just now being logged, see https://github.com/element-hq/synapse/issues/19042 ``` 2025-10-10 15:30:00,026 - synapse.util.metrics - 330 - ERROR - notify_interested_services-0 - Metric named cache_lru_cache__matches_user_in_member_list_example.com already registered for server example.com 2025-10-10 16:30:00.167 2025-10-10 15:30:00,026 - synapse.util.metrics - 330 - ERROR - notify_interested_services-0 - Metric named cache_lru_cache_is_interested_in_room_example.com already registered for server example.com 2025-10-10 16:30:00.167 2025-10-10 15:30:00,025 - synapse.util.metrics - 330 - ERROR - notify_interested_services-0 - Metric named cache_lru_cache_is_interested_in_event_example.com already registered for server example.com 2025-10-10 16:29:15.560 2025-10-10 15:29:15,449 - synapse.util.metrics - 330 - ERROR - notify_interested_services_ephemeral-0 - Metric named cache_lru_cache__matches_user_in_member_list_example.com already registered for server example.com 2025-10-10 16:29:15.560 2025-10-10 15:29:15,449 - synapse.util.metrics - 330 - ERROR - notify_interested_services_ephemeral-0 - Metric named cache_lru_cache_is_interested_in_room_example.com already registered for server example.com ```	2025-10-13 10:15:47 -05:00
Andrew Morgan	a7303c5311	Fix deprecated token field in release script (#19039 )	2025-10-13 14:31:09 +01:00
Tulir Asokan	690b3a4fcc	Allow using MSC4190 features without opt-in (#19031 )	2025-10-13 13:07:11 +00:00
Eric Eastwood	d399d7649a	Move `start_doing_background_updates()` to `SynapseHomeServer.start_background_tasks()` (#19036 ) (more sane standard location for this sort of thing) The one difference here is that previously, `start_doing_background_updates()` only ran on the main Synapse instance. But since it now lives in `start_background_tasks()`, it will run on the worker that is supposed to `run_background_tasks`. Doesn't seem like a problem though.	2025-10-10 14:30:38 -05:00
Andrew Morgan	c0d6998dea	1.140.0rc1	2025-10-10 11:24:27 +01:00
Eric Eastwood	47fb4b43ca	Introduce `RootConfig.validate_config()` which can be subclassed in `HomeServerConfig` to do cross-config class validation (#19027 ) This means we can move the open registration config validation from `setup()` to `HomeServerConfig.validate_config()` (much more sane). Spawning from looking at this area of code in https://github.com/element-hq/synapse/pull/19015	2025-10-09 14:56:22 -05:00
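A minimal sketch of the hook pattern described in the commit above: a root config class exposes a no-op `validate_config()` and a subclass overrides it to check settings that span several config sections. The class names, section keys, and the specific rule are hypothetical stand-ins, not Synapse's actual classes or options.

```python
class ConfigError(Exception):
    """Raised when the combined configuration is invalid."""


class RootConfigSketch:
    """Hypothetical stand-in for a root config that holds several sections."""

    def __init__(self, sections: dict) -> None:
        self.sections = sections

    def validate_config(self) -> None:
        """Cross-section validation hook; the base class checks nothing."""


class HomeServerConfigSketch(RootConfigSketch):
    def validate_config(self) -> None:
        # Hypothetical rule: open registration needs some form of verification.
        registration = self.sections.get("registration", {})
        captcha = self.sections.get("captcha", {})
        if registration.get("enabled") and not captcha.get("enabled"):
            raise ConfigError("open registration requires some form of verification")


def load_config(sections: dict) -> HomeServerConfigSketch:
    # Validation runs once, right after every section has been loaded,
    # instead of being buried in later setup code.
    config = HomeServerConfigSketch(sections)
    config.validate_config()
    return config
```

Keeping the check on the config object means a bad combination is rejected before any homeserver setup starts.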
Eric Eastwood	715cc5ee37	Split homeserver creation and setup (#19015 ) ### Background As part of Element's plan to support a light form of vhosting (virtual host) (multiple instances of Synapse in the same Python process), we're currently diving into the details and implications of running multiple instances of Synapse in the same Python process. "Clean tenant provisioning" tracked internally by https://github.com/element-hq/synapse-small-hosts/issues/221 ### Partial startup problem In the context of Synapse Pro for Small Hosts, since the Twisted reactor is already running (from the `multi_synapse` shard process itself), when provisioning a homeserver tenant, the `reactor.callWhenRunning(...)` callbacks will be invoked immediately. This includes the Synapse's [`start`](`0615b64bb4/synapse/app/homeserver.py (L418-L429)`) callback which sets up everything (including listeners, background tasks, etc). If we encounter an error at this point, we are partially setup but the exception will [bubble back to us](`8be122186b/multi_synapse/app/shard.py (L114-L121)`) without us having a handle to the homeserver yet so we can't call `hs.shutdown()` and clean everything up. ### What does this PR do? Structures Synapse so we split creating the homeserver instance from setting everything up. This way we have access to `hs` if anything goes wrong during setup and can subsequently `hs.shutdown()` to clean everything up.	2025-10-09 13:12:10 -05:00
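The split described in the commit above can be sketched as follows, assuming a hypothetical `FakeHomeServer` class (not Synapse's real API): creating the instance first means the caller still holds a handle when setup fails part-way through and can call `shutdown()` on it.

```python
class FakeHomeServer:
    """Hypothetical stand-in for a homeserver; not Synapse's real class."""

    def __init__(self, server_name: str) -> None:
        self.server_name = server_name
        self.resources: list = []
        self.is_shut_down = False

    def setup(self, fail: bool = False) -> None:
        # Resources are acquired step by step; a later step may fail.
        self.resources.append("listeners")
        if fail:
            raise RuntimeError("setup failed part-way through")
        self.resources.append("background_tasks")

    def shutdown(self) -> None:
        # Release whatever was acquired so far.
        self.resources.clear()
        self.is_shut_down = True


def provision(server_name: str, fail: bool = False) -> FakeHomeServer:
    # Step 1: create the instance so that we hold a handle to it.
    hs = FakeHomeServer(server_name)
    try:
        # Step 2: set everything up as a separate step.
        hs.setup(fail=fail)
    except Exception:
        # We still have `hs`, so the partial setup can be cleaned up.
        hs.shutdown()
        raise
    return hs
```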
Andrew Morgan	d440cfc9e2	Allow any release script command to accept `--gh-token` (#19035 )	2025-10-09 17:15:54 +01:00
fkwp	18f07fdc4c	Add MatrixRTC backend/services discovery endpoint (#18967 ) Co-authored-by: Andrew Morgan <andrew@amorgan.xyz>	2025-10-09 17:15:47 +01:00
Andrew Morgan	e3344dc0c3	Expose `defer_to_threadpool` in the module API (#19032 )	2025-10-09 15:15:13 +01:00
Andrew Morgan	bcbbccca23	Swap `macos-13` with `macos-15-intel` GHA runner in CI (#19025 )	2025-10-08 12:58:42 +01:00
Shay	8f01eb8ee0	Add an Admin API to fetch an event by ID (#18963 ) Adds an endpoint to allow server admins to fetch an event regardless of their membership in the originating room.	2025-10-08 11:38:15 +01:00
Eric Eastwood	631eed91f1	Fix bad merge with `start_background_tasks` (#19013 ) This was originally removed in https://github.com/element-hq/synapse/pull/18886 but it looks like it snuck back in https://github.com/element-hq/synapse/pull/18828 during a [bad merge](`4cd3d9172e`). Noticed while looking at Synapse setup and startup (just by happenstance). I don't think this has adverse effects on Synapse actually working and `start_background_tasks()` can be called multiple times. ### Is there a good way to audit all of these merges? As I would like to see the conflicts for each merge. This works but it's still hard to notice anything is wrong: ``` git log --remerge-diff <commit-sha> ``` > shows the difference from mechanical merge result and the result that is actually recorded in a merge commit via https://stackoverflow.com/questions/15277708/how-do-you-see-show-a-git-merge-conflict-resolution-that-was-done-given-a-mer/71181334#71181334 The following is better. Specify the revision range from the commit right before the merge to the merge itself. You can even specify which file to look at to make it more obvious with the hindsight we have now. ``` git log --remerge-diff <merge-commit-sha>~1..<merge-commit-sha> -- synapse/server.py ``` Example: ``` git log --remerge-diff 4cd3d9172ed7b87e509746851a376c861a27820e~1..4cd3d9172ed7b87e509746851a376c861a27820e -- synapse/server.py ```	2025-10-07 13:29:22 -05:00
Eric Eastwood	7b8831310f	No need to have `version_string` as an argument since it's always the same (#19012 ) Assuming, we're happy with https://github.com/element-hq/synapse/pull/19011, this PR makes sense.	2025-10-07 13:27:24 -05:00
Eric Eastwood	ca27938257	Align Synapse version string to use `SYNAPSE_VERSION` (#19011 ) See https://github.com/matrix-org/synapse/pull/12973 where we previously used `version_string="Synapse/" + get_distribution_version_string("matrix-synapse")` everywhere; and then updated to use `version_string=f"Synapse/{SYNAPSE_VERSION}"` for every other place except `synapse/app/homeserver.py` (why?!?!?!). This seems more like a typo than something on purpose especially without any context in the comments or PR. The whole point of that PR was trying to solve the missing git info in version strings. For reference, here is what both variables look like for me locally on the latest `develop`: - `SYNAPSE_VERSION`: `1.139.0 (b=develop,1d2ddbc76e,dirty)` - `VERSION`: `1.139.0` Only reason we may want to do this is to hide the branch name (some sensitive name that exposes a security fix, etc). But we don't hide anything: `https://matrix.org/_matrix/federation/v1/version` ```json { "server": { "name": "Synapse", "version": "1.139.0rc3 (b=matrix-org-hotfixes-priv,f538ed5ac3)" } } ``` On `matrix.org`, the `Server` response header is masked as `cloudflare` which would otherwise show `1.139.0rc3` for everything from the main process. --- This is spawning from looking at the way we setup and start Synapse for homeserver tenant provisioning in the Synapse Pro for Small Hosts project (https://github.com/element-hq/synapse-small-hosts/issues/221)	2025-10-07 10:44:56 -05:00
Andrew Morgan	2443760d0d	Update `KeyUploadServlet` to handle case where client sends `device_keys: null` (#19023 )	2025-10-07 16:23:55 +01:00
Till	42bbff8294	Validate the body of requests to `/keys/upload` (#17097 ) Co-authored-by: Andrew Morgan <andrew@amorgan.xyz> Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com> Co-authored-by: Eric Eastwood <erice@element.io>	2025-10-07 11:27:53 +01:00
Andrew Morgan	5465c68553	Remove unstable prefixes for MSC2732: Olm fallback keys (#18996 ) Co-authored-by: Eric Eastwood <erice@element.io>	2025-10-07 11:15:35 +01:00
Francesco Stefanini	1d2ddbc76e	Fix bug where ephemeral events were not filtered by room ID (#19002 ) Co-authored-by: Andrew Morgan <andrew@amorgan.xyz>	2025-10-03 13:19:57 +01:00
Eric Eastwood	70c044db8e	Remove deprecated `LoggingContext.set_current_context`/`LoggingContext.current_context` methods (#18989 ) These were added for backwards compatibility (and essentially deprecated) in https://github.com/matrix-org/synapse/pull/7408 (2020-05-04) because [`synapse-s3-storage-provider`](https://github.com/matrix-org/synapse-s3-storage-provider) previously relied on them -- but `synapse-s3-storage-provider` has since been [updated](https://github.com/matrix-org/synapse-s3-storage-provider/pull/36) to no longer use them.	2025-10-02 13:21:37 -05:00
Eric Eastwood	6835e7be0d	Wrap the Rust HTTP client with `make_deferred_yieldable` (#18903 ) Wrap the Rust HTTP client with `make_deferred_yieldable` so downstream usage doesn't need to use `PreserveLoggingContext()` or `make_deferred_yieldable`. > it seems like we should have some wrapper around it that uses [`make_deferred_yieldable(...)`](`40edb10a98/docs/log_contexts.md (where-you-create-a-new-awaitable-make-it-follow-the-rules)`) to make things right so we don't have to do this in the downstream code. > > -- @MadLittleMods, https://github.com/element-hq/synapse/pull/18357#discussion_r2294941827 Spawning from wanting to [remove `PreserveLoggingContext()` from the codebase](https://github.com/element-hq/synapse/pull/18870) and thinking that we [shouldn't have to pollute all downstream usage with `PreserveLoggingContext()` or `make_deferred_yieldable`](https://github.com/element-hq/synapse/pull/18357#discussion_r2294941827) Part of https://github.com/element-hq/synapse/issues/18905 (Remove `sentinel` logcontext where we log in Synapse)	2025-10-02 13:00:50 -05:00
Eric Eastwood	d27ff161f5	Add debug logs wherever we change current logcontext (#18966 ) Add debug logs wherever we change current logcontext (`LoggingContext`). I've had to make this same set of changes over and over as I've been debugging things so it seems useful enough to include by default. Instead of tracing things at the `set_current_context(...)` level, I've added the debug logging on all of the utilities that utilize `set_current_context(...)`. It's much easier to reason about the log context changing because of `PreserveLoggingContext` changing things than an opaque `set_current_context(...)` call.	2025-10-02 11:51:17 -05:00
Eric Eastwood	06a84f4fe0	Revert "Switch to OpenTracing's `ContextVarsScopeManager` (#18849 )" (#19007 ) Revert https://github.com/element-hq/synapse/pull/18849 Go back to our custom `LogContextScopeManager` after trying OpenTracing's `ContextVarsScopeManager`. Fix https://github.com/element-hq/synapse/issues/19004 ### Why revert? For reference, with the normal reactor, `ContextVarsScopeManager` worked just as good as our custom `LogContextScopeManager` as far as I can tell (and even better in some cases). But since Twisted appears to not fully support `ContextVar`'s, it doesn't work as expected in all cases. Compounding things, `ContextVarsScopeManager` was causing errors with the experimental `SYNAPSE_ASYNC_IO_REACTOR` option. Since we're not getting the full benefit that we originally desired, we might as well revert and figure out alternatives for extending the logcontext lifetimes to support the use case we were trying to unlock (c.f. https://github.com/element-hq/synapse/pull/18804). See https://github.com/element-hq/synapse/issues/19004#issuecomment-3358052171 for more info. ### Does this require backporting and patch releases? No. Since `ContextVarsScopeManager` operates just as good with the normal reactor and was only causing actual errors with the experimental `SYNAPSE_ASYNC_IO_REACTOR` option, I don't think this requires us to backport and make patch releases at all. ### Maintain cross-links between main trace and background process work In order to maintain the functionality introduced in https://github.com/element-hq/synapse/pull/18932 (cross-links between the background process trace and currently active trace), we also needed a small change. Previously, when we were using `ContextVarsScopeManager`, it tracked the tracing scope across the logcontext changes without issue. 
Now that we're using our own custom `LogContextScopeManager` again, we need to capture the active span from the logcontext before we reset to the sentinel context because of the `PreserveLoggingContext()` below. Added some tests to ensure we maintain the `run_as_background` tracing behavior regardless of the tracing scope manager we use.	2025-10-02 11:27:26 -05:00
Eric Eastwood	1c093509ce	Switch task scheduler from raw logcontext manipulation (`set_current_context`) to utils (`PreserveLoggingContext`) (#18990 ) Prefer the utils over raw logcontext manipulation. Spawning from adding some logcontext debug logs in https://github.com/element-hq/synapse/pull/18966 and since we're not logging at the `set_current_context(...)` level (see reasoning there), this removes some usage of `set_current_context(...)`.	2025-10-02 10:22:25 -05:00
Devon Hudson	396de6544a	Cleanly shutdown SynapseHomeServer object (#18828 ) This PR aims to allow for a clean shutdown of the `SynapseHomeServer` object so that it can be fully deleted and cleaned up by garbage collection without shutting down the entire python process. Fix https://github.com/element-hq/synapse-small-hosts/issues/50 --------- Co-authored-by: Eric Eastwood <erice@element.io>	2025-10-01 02:42:09 +00:00
Sebastian Spaeth	d1c96ee0f2	Fix `rc_room_creation` and `rc_reports` docs - remove `per_user` typo (#18998 )	2025-09-30 15:17:11 -05:00
Eric Eastwood	5adb08f3c9	Remove `MockClock()` (#18992 ) Spawning from adding some logcontext debug logs in https://github.com/element-hq/synapse/pull/18966 and since we're not logging at the `set_current_context(...)` level (see reasoning there), this removes some usage of `set_current_context(...)`. Specifically, `MockClock.call_later(...)` doesn't handle logcontexts correctly. It uses the calling logcontext as the callback context (wrong, as the logcontext could finish before the callback finishes) and it didn't reset back to the sentinel context before handing back to the reactor. It was like this since it was [introduced 10+ years ago](`38da9884e7`). Instead of fixing the implementation which would just be a copy of our normal `Clock`, we can just remove `MockClock`	2025-09-30 11:27:29 -05:00
Andrew Morgan	ad8dcc2119	Remove internal `ReplicationUploadKeysForUserRestServlet` (#18988 )	2025-09-30 11:12:14 +01:00
Eric Eastwood	5143f93dc9	Fix `server_name` in logging context for multiple Synapse instances in one process (#18868 ) ### Background As part of Element's plan to support a light form of vhosting (virtual host) (multiple instances of Synapse in the same Python process), we're currently diving into the details and implications of running multiple instances of Synapse in the same Python process. "Per-tenant logging" tracked internally by https://github.com/element-hq/synapse-small-hosts/issues/48 ### Prior art Previously, we exposed `server_name` by providing a static logging `MetadataFilter` that injected the values: `205d9e4fc4/synapse/config/logger.py (L216)` While this can work fine for the normal case of one Synapse instance per Python process, this configures things globally and isn't compatible when we try to start multiple Synapse instances because each subsequent tenant will overwrite the previous tenant. ### What does this PR do? We remove the `MetadataFilter` and replace it by tracking the `server_name` in the `LoggingContext` and expose it with our existing [`LoggingContextFilter`](`205d9e4fc4/synapse/logging/context.py (L584-L622)`) that we already use to expose information about the `request`. This means that the `server_name` value follows wherever we log as expected even when we have multiple Synapse instances running in the same process. ### A note on logcontext Anywhere, Synapse mistakenly uses the `sentinel` logcontext to log something, we won't know which server sent the log. We've been fixing up `sentinel` logcontext usage as tracked by https://github.com/element-hq/synapse/issues/18905 Any further `sentinel` logcontext usage we find in the future can be fixed piecemeal as normal. `d2a966f922/docs/log_contexts.md (L71-L81)` ### Testing strategy 1. 
Adjust your logging config to include `%(server_name)s` in the format ```yaml formatters: precise: format: '%(asctime)s - %(server_name)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s - %(message)s' ``` 1. Start Synapse: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Make some requests (`curl http://localhost:8008/_matrix/client/versions`, etc) 1. Open the homeserver logs and notice the `server_name` in the logs as expected. `unknown_server_from_sentinel_context` is expected for the `sentinel` logcontext (things outside of Synapse).	2025-09-26 17:10:48 -05:00
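A minimal sketch of the idea in the commit above, assuming a plain `contextvars.ContextVar` as a hypothetical stand-in for Synapse's `LoggingContext`: a logging filter reads the server name from the current context at log time, so two tenants in one process no longer overwrite each other's value. The filter and helper names here are illustrative, not Synapse's real ones.

```python
import contextvars
import io
import logging

# Hypothetical stand-in for per-context state; the default mirrors the
# value the commit message says to expect for the sentinel logcontext.
current_server_name = contextvars.ContextVar(
    "current_server_name", default="unknown_server_from_sentinel_context"
)


class ServerNameFilter(logging.Filter):
    """Attach the server name from the current context to each record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.server_name = current_server_name.get()
        return True


def build_logger(stream: io.StringIO) -> logging.Logger:
    logger = logging.getLogger("tenant_demo")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(server_name)s - %(message)s"))
    handler.addFilter(ServerNameFilter())
    logger.addHandler(handler)
    return logger


def log_as(logger: logging.Logger, server_name: str, message: str) -> None:
    # Set the server name only for the duration of this call.
    token = current_server_name.set(server_name)
    try:
        logger.info(message)
    finally:
        current_server_name.reset(token)


stream = io.StringIO()
logger = build_logger(stream)
log_as(logger, "one.example", "hello")
log_as(logger, "two.example", "hello")
logger.info("outside any tenant")
```

A static filter configured once per process would report whichever tenant was configured last; reading the value at log time avoids that.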
Eric Eastwood	2f2b854ac1	Fix logcontext handling in `timeout_deferred` tests (#18974 ) Related to https://github.com/element-hq/synapse/issues/18905 These fixes were split off from https://github.com/element-hq/synapse/pull/18828 where @devonh was seeing some test failures because `timeout_deferred(...)` is being updated to use `Clock` utilities instead of raw `reactor` methods. This test was failing in that branch/PR until we made this new version that handles the logcontexts properly. While the previous version of this test does pass on `develop`, it was using what appear to be completely wrong assertions, assumptions, and bad patterns to make it happen (see diff comments below) --- Test originally introduced in https://github.com/matrix-org/synapse/pull/4407	2025-09-26 11:10:02 -05:00
Andrew Morgan	8f61bdb470	Note optional Element Commercial License in SPDX specifiers (#18973 )	2025-09-26 12:43:07 +01:00
Andrew Morgan	7c32988f6b	Update URLs in dockerfile metadata (#18971 )	2025-09-26 12:40:50 +01:00
Hammy Havoc	688f635b59	Updated providers.json to use X instead of Twitter following rebrand and schema change (#18767 )	2025-09-26 11:06:50 +01:00
Eric Eastwood	04721c85e6	Disconnect background process work from request trace (#18932 ) Before https://github.com/element-hq/synapse/pull/18849, we were using our own custom `LogContextScopeManager` which tied the tracing scope to the `LoggingContext`. Since we created a new `BackgroundProcessLoggingContext` any time we `run_as_background_process(...)`, the trace for the background work was separate from the trace that kicked off the work as expected (e.g. request trace is separate from the background process we kicked off to fetch more messages from the federation). Since we've now switched to the `ContextVarsScopeManager` (in https://github.com/element-hq/synapse/pull/18849), the tracing scope now crosses the `LoggingContext` boundaries (and thread boundaries) without a problem. This means we end up with request traces that include all of the background work that we've kicked off, bloating the trace and making it hard to understand what's going on. This PR separates the traces again to how things were before. Additionally, things are even better now since I added some cross-link references between the traces to easily be able to jump between them. Follow-up to https://github.com/element-hq/synapse/pull/18849 --- In the before, you can see that the trace is blown up by the background process (`bgproc.qwer`). In the after, we now only have a little cross-link marker span (`start_bgproc.qwer`) to jump to the background process trace. Before \| After --- \| --- <some image> \| <some image> ### Testing strategy 1. Run a Jaeger instance (https://www.jaegertracing.io/docs/1.6/getting-started/) ```shell $ docker run -d --name jaeger \ -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \ -p 5775:5775/udp \ -p 6831:6831/udp \ -p 6832:6832/udp \ -p 5778:5778 \ -p 16686:16686 \ -p 14268:14268 \ -p 9411:9411 \ jaegertracing/all-in-one:1.59.0 ``` 1. 
Configure Synapse to use tracing: `homeserver.yaml` ```yaml ## Tracing ## opentracing: enabled: true jaeger_config: sampler: type: const param: 1 logging: false ``` 1. Make sure the optional `opentracing` dependency is installed: `poetry install --extras all` 1. In the `VersionsRestServlet`, modify it to kick off a dummy background process (easy to test this way) ```python from synapse.metrics.background_process_metrics import run_as_background_process async def _qwer() -> None: await self.clock.sleep(1) run_as_background_process("qwer", "test_server", _qwer) ``` 1. Run Synapse: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Fire off a versions request: `curl http://localhost:8008/_matrix/client/versions` 1. Visit http://localhost:16686/search to view the traces - Select the correct service - Look for the `VersionsRestServlet` operation - Press 'Find Traces' button - Select the relevant trace - Notice how the trace isn't bloated - Look for the `start_bgproc.qwer` span cross-linking to the background process - Jump to the other trace using the cross-link reference -> `bgproc.qwer`	2025-09-25 21:45:18 -05:00
Travis Ralston	d2a966f922	Use signature support from policy servers when available (#18934 ) Opening on Kegan's behalf [MSC4284](https://github.com/matrix-org/matrix-spec-proposals/pull/4284) has already been opened accordingly. --------- Co-authored-by: Kegan Dougal <7190048+kegsay@users.noreply.github.com> Co-authored-by: Eric Eastwood <erice@element.io>	2025-09-25 19:30:24 +00:00
Hugh Nimmo-Smith	fd8fa97b6a	Document and fix room_config param when user_may_create_room callback is invoked for a room upgrade (#18721 ) Co-authored-by: Eric Eastwood <erice@element.io>	2025-09-24 21:42:19 +00:00
Eric Eastwood	5266e423e2	Explain how Deferred callbacks interact with logcontexts (#18914 ) Spawning from https://github.com/matrix-org/synapse/pull/12588#discussion_r865843321 > It turns out `Deferred.cancel()` is a lot like `Deferred.callback()`/`errback()` in that it will trash the logging context: > it can resume a coroutine, which will restore its own logging context, then run: > > - until it blocks, setting the sentinel context > - or until it terminates, setting the context it was started with > > So we need to wrap it in `with PreserveLoggingContext():`, like we do with `.callback()`: > > ```python > with PreserveLoggingContext(): > self.render_deferred.cancel() > ``` > > -- @squahtx, https://github.com/matrix-org/synapse/pull/12588#discussion_r865843321	2025-09-24 16:20:42 -05:00
Eric Eastwood	0458f691b6	Fix `run_coroutine_in_background(...)` incorrectly handling logcontext (#18964 ) Regressed in https://github.com/element-hq/synapse/pull/18900#discussion_r2331554278 (see conversation there for more context) ### How is this a regression? > To give this an update with more hindsight; this logic was redundant with the early return and it is safe to remove this complexity ✅ > > It seems like this actually has to do with completed vs incomplete deferreds... > > To explain how things previously worked without the early-return shortcut: > > With the normal case of incomplete awaitable, we store the `calling_context` and the `f` function is called and runs until it yields to the reactor. Because `f` follows the logcontext rules, it sets the `sentinel` logcontext. Then in `run_in_background(...)`, we restore the `calling_context`, store the current `ctx` (which is `sentinel`) and return. When the deferred completes, we restore `ctx` (which is `sentinel`) before yielding to the reactor again (all good ✅) > > With the other case where we see a completed awaitable, we store the `calling_context` and the `f` function is called and runs to completion (no logcontext change). This is where the shortcut would kick in but I'm going to continue explaining as if we commented out the shortcut. -- Then in `run_in_background(...)`, we restore the `calling_context`, store the current `ctx` (which is same as the `calling_context`). Because the deferred is already completed, our extra callback is called immediately and we restore `ctx` (which is same as the `calling_context`). Since we never yield to the reactor, the `calling_context` is perfect as that's what we want again (all good ✅) > > --- > > But this also means that our early-return shortcut is no longer just an optimization and is necessary to act correctly in the completed awaitable case as we want to return with the `calling_context` and not reset to the `sentinel` context. 
I've updated the comment in https://github.com/element-hq/synapse/pull/18964 to explain the necessity as it's currently just described as an optimization. > > But because we made the same change to `run_coroutine_in_background(...)` which didn't have the same early-return shortcut, we regressed the correct behavior ❌ . This is being fixed in https://github.com/element-hq/synapse/pull/18964 > > > -- @MadLittleMods, https://github.com/element-hq/synapse/pull/18900#discussion_r2373582917 ### How did we find this problem? Spawning from @wrjlewis [seeing](https://matrix.to/#/!SGNQGPGUwtcPBUotTL:matrix.org/$h3TxxPVlqC6BTL07dbrsz6PmaUoZxLiXnSTEY-QYDtA?via=jki.re&via=matrix.org&via=element.io) `Starting metrics collection 'typing.get_new_events' from sentinel context: metrics will be lost` in the logs: <details> <summary>More logs</summary> ``` synapse.http.request_metrics - 222 - ERROR - sentinel - Trying to stop RequestMetrics in the sentinel context. 2025-09-23 14:43:19,712 - synapse.util.metrics - 212 - WARNING - sentinel - Starting metrics collection 'typing.get_new_events' from sentinel context: metrics will be lost 2025-09-23 14:43:19,713 - synapse.rest.client.sync - 851 - INFO - sentinel - Client has disconnected; not serializing response. 2025-09-23 14:43:19,713 - synapse.http.server - 825 - WARNING - sentinel - Not sending response to request <XForwardedForRequest at 0x7f23e8111ed0 method='POST' uri='/_matrix/client/unstable/org.matrix.simplified_msc3575/sync?pos=281963%2Fs929324_147053_10_2652457_147960_2013_25554_4709564_0_164_2&timeout=30000' clientproto='HTTP/1.1' site='8008'>, already dis connected. 2025-09-23 14:43:19,713 - synapse.access.http.8008 - 515 - INFO - sentinel - 92.40.194.87 - 8008 - {@me:wi11.co.uk} Processed request: 30.005sec/-8.041sec (0.001sec, 0.000sec) (0.000sec/0.002sec/2) 0B 200! 
"POST /_matrix/client/unstable/org.matrix.simplified_msc3575/ ``` </details> From the logs there, we can see things relating to `typing.get_new_events` and `/_matrix/client/unstable/org.matrix.simplified_msc3575/sync` which led me to trying out Sliding Sync with the typing extension enabled and allowed me to reproduce the problem locally. Sliding Sync is a unique scenario as it's the only place we use `gather_optional_coroutines(...)` -> `run_coroutine_in_background(...)` (introduced in https://github.com/element-hq/synapse/pull/17884) to exhibit this behavior. ### Testing strategy 1. Configure Synapse to enable [MSC4186](https://github.com/matrix-org/matrix-spec-proposals/pull/4186): Simplified Sliding Sync which is actually under [MSC3575](https://github.com/matrix-org/matrix-spec-proposals/pull/3575) ```yaml experimental_features: msc3575_enabled: true ``` 1. Start synapse: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Make a Sliding Sync request with one of the extensions enabled ```http POST http://localhost:8008/_matrix/client/unstable/org.matrix.simplified_msc3575/sync { "lists": {}, "room_subscriptions": { "!FlgJYGQKAIvAscfBhq:my.synapse.linux.server": { "required_state": [], "timeline_limit": 1 } }, "extensions": { "typing": { "enabled": true } } } ``` 1. Open your homeserver logs and notice warnings about `Starting ... from sentinel context: metrics will be lost`	2025-09-24 15:24:47 +00:00
Eric Eastwood	25fa555395	Fix `no active span when trying to log` tracing error on startup (#18959 ) Fix `no active span when trying to log` tracing error on startup. Example error: ```log synapse.logging.opentracing - 427 - ERROR - wake_destinations_needing_catchup-0 - There was no active span when trying to log. Did you forget to start one or did a context slip? Stack (most recent call last): File "/usr/lib/python3.13/threading.py", line 1014, in _bootstrap self._bootstrap_inner() File "/usr/lib/python3.13/threading.py", line 1043, in _bootstrap_inner self.run() File "/usr/lib/python3.13/threading.py", line 994, in run self._target(*self._args, **self._kwargs) File "python3.13/site-packages/twisted/_threads/_threadworker.py", line 75, in work task() File "python3.13/site-packages/twisted/_threads/_team.py", line 192, in doWork task() File "python3.13/site-packages/twisted/python/threadpool.py", line 269, in inContext result = inContext.theWork() # type: ignore[attr-defined] File "python3.13/site-packages/twisted/python/threadpool.py", line 285, in <lambda> inContext.theWork = lambda: context.call( # type: ignore[attr-defined] File "python3.13/site-packages/twisted/python/context.py", line 117, in callWithContext return self.currentContext().callWithContext(ctx, func, *args, **kw) File "python3.13/site-packages/twisted/python/context.py", line 82, in callWithContext return func(*args, **kw) File "python3.13/site-packages/twisted/enterprise/adbapi.py", line 282, in _runWithConnection result = func(conn, *args, **kw) File "synapse/synapse/storage/database.py", line 1094, in inner_func return func(db_conn, *args, **kwargs) File "synapse/synapse/storage/database.py", line 822, in new_transaction opentracing.log_kv({"message": "commit"}) File "synapse/synapse/logging/opentracing.py", line 427, in ensure_active_span_inner_2 logger.error( ``` ### Why did this happen before? 
This previously occurred because we called `init_tracer(...)` after the reactor started up in `_base.start()`. But we actually attempt some database transactions earlier than that which try to do some tracing because of that `oidc = hs.get_oidc_handler()` line. Notice `oidc = hs.get_oidc_handler()` happened before `_base.start(hs)`: `5be7679dd9/synapse/app/homeserver.py (L397-L408)` With this PR, I've updated things to `init_tracer(...)` earlier on alongside where we `setup_logging(...)`.	2025-09-24 10:12:08 -05:00
Andrew Morgan	7708801d56	Fix `triage_labelled` GHA workflow (#18913 )	2025-09-24 14:17:14 +01:00
PizZaKatZe	e766f325af	fix: Compute user last seen timestamp from last seen devices (#18948 ) ## Fix last seen timestamp in `/_synapse/admin/v2/users` response Fixes #18955 The last seen timestamps contained in `/_synapse/admin/v2/users` responses were computed as follows: ```sql [...] LEFT JOIN ( SELECT user_id, MAX(last_seen) AS last_seen_ts FROM user_ips GROUP BY user_id ) ls ON u.name = ls.user_id [...] ``` `4367fb2d07/synapse/storage/databases/main/__init__.py (L302C1-L305C44)` This leads to empty timestamps (as in: user was never seen) if users are inactive for longer than [`user_ips_max_age`](https://element-hq.github.io/synapse/latest/usage/configuration/config_documentation.html#user_ips_max_age). The fix is quite trivial: Use the `devices` table, as this one also contains last seen timestamps but is not periodically purged. We are using this for automatic user account deletion (via [synadm](https://codeberg.org/synadm/synadm)) and the patched code works as intended, whereas the unpatched version wants to delete users during long vacations. 🫣	2025-09-24 11:59:11 +01:00
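The effect of the fix above can be sketched with an in-memory SQLite database. The table layouts and sample rows are simplified assumptions for illustration, not Synapse's real schema: once a user's `user_ips` rows have been purged, the join against `user_ips` yields no timestamp, while the same join against `devices` still does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, assumed schemas (not Synapse's real table definitions).
    CREATE TABLE users (name TEXT PRIMARY KEY);
    CREATE TABLE user_ips (user_id TEXT, last_seen INTEGER);
    CREATE TABLE devices (user_id TEXT, device_id TEXT, last_seen INTEGER);

    INSERT INTO users VALUES ('@alice:example.com'), ('@bob:example.com');

    -- Alice was active recently, so both tables have rows for her.
    INSERT INTO user_ips VALUES ('@alice:example.com', 2000);
    INSERT INTO devices VALUES ('@alice:example.com', 'A1', 2000);
    INSERT INTO devices VALUES ('@alice:example.com', 'A2', 1500);

    -- Bob's user_ips rows were purged; only his device row remains.
    INSERT INTO devices VALUES ('@bob:example.com', 'B1', 1000);
    """
)


def last_seen_by_user(conn: sqlite3.Connection, table: str) -> dict:
    # The table name comes from a fixed allow-list, never from user input.
    if table not in ("user_ips", "devices"):
        raise ValueError(f"unexpected table: {table}")
    rows = conn.execute(
        f"""
        SELECT u.name, ls.last_seen_ts
        FROM users AS u
        LEFT JOIN (
            SELECT user_id, MAX(last_seen) AS last_seen_ts
            FROM {table}
            GROUP BY user_id
        ) AS ls ON u.name = ls.user_id
        ORDER BY u.name
        """
    ).fetchall()
    return dict(rows)


from_user_ips = last_seen_by_user(conn, "user_ips")
from_devices = last_seen_by_user(conn, "devices")
```

Here `from_user_ips` has no timestamp for Bob, whereas `from_devices` keeps the newest timestamp across each user's devices.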


8988 Commits