Fixes https://github.com/element-hq/synapse/issues/19175
This PR moves tracking of which lazy-loaded memberships we've sent for each
room out of the required state table. This prevents that table from
growing continuously, which massively helps performance, since we pull out
all matching rows for the connection whenever we receive a request.
The new table is only read when we have data to send for a room, so we
end up reading far fewer rows from the DB. The trade-off is that we now
read from that table once per room we have events to return in, rather
than once at the start of the request.
For an explanation of how the new table works, see the
[comment](https://github.com/element-hq/synapse/blob/erikj/sss_better_membership_storage2/synapse/storage/schema/main/delta/93/02_sliding_sync_members.sql#L15-L38)
on the table schema.
The table is designed so that we can later prune old entries if we wish,
but that is not implemented in this PR.
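As a rough illustration of the read pattern (the table and column names here are hypothetical; see the linked schema comment for the real design):

```python
def get_sent_memberships(txn, connection_key: int, room_id: str) -> set:
    # Look up only the memberships we've already sent for this one room on
    # this connection, instead of loading every row for the connection.
    txn.execute(
        """
        SELECT user_id FROM sliding_sync_sent_memberships
        WHERE connection_key = ? AND room_id = ?
        """,
        (connection_key, room_id),
    )
    return {row[0] for row in txn.fetchall()}
```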
Reviewable commit-by-commit.
---------
Co-authored-by: Eric Eastwood <erice@element.io>
We add some logic to expire sliding sync connections if they get old or
if there is too much pending data to return.
The values of the constants are picked fairly arbitrarily; currently they
are (see the sketch below):
1. The connection has more than 100 rooms with pending events and hasn't
been used in over an hour
2. The connection hasn't been used for over a week
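A sketch of how the two rules combine (the constant names and helper are assumptions for illustration, not the actual code):

```python
MAX_PENDING_ROOMS = 100
ONE_HOUR_MS = 60 * 60 * 1000
ONE_WEEK_MS = 7 * 24 * ONE_HOUR_MS

def is_connection_expired(now_ms: int, last_used_ms: int, pending_rooms: int) -> bool:
    idle_ms = now_ms - last_used_ms
    # Rule 1: too much pending data and idle for over an hour.
    if pending_rooms > MAX_PENDING_ROOMS and idle_ms > ONE_HOUR_MS:
        return True
    # Rule 2: idle for over a week, regardless of pending data.
    return idle_ms > ONE_WEEK_MS
```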
Reviewable commit-by-commit.
---------
Co-authored-by: Eric Eastwood <erice@element.io>
Introduce `Clock.call_when_running(...)` to wrap startup code in a
logcontext, ensuring we can identify which server generated the logs.
Background:
> Ideally, nothing from the Synapse homeserver would be logged against the `sentinel`
> logcontext as we want to know which server the logs came from. In practice, this is not
> always the case yet especially outside of request handling.
>
> Global things outside of Synapse (e.g. Twisted reactor code) should run in the
> `sentinel` logcontext. It's only when it calls into application code that a logcontext
> gets activated. This means the reactor should be started in the `sentinel` logcontext,
> and any time an awaitable yields control back to the reactor, it should reset the
> logcontext to be the `sentinel` logcontext. This is important to avoid leaking the
> current logcontext to the reactor (which would then get picked up and associated with
> the next thing the reactor does).
>
> *-- `docs/log_contexts.md`*
Also adds a lint to prefer `Clock.call_when_running(...)` over
`reactor.callWhenRunning(...)`.
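A minimal sketch of the idea, not the actual implementation: the wrapper activates a logcontext before invoking the startup callback, so the callback's logs are attributed to this server rather than the `sentinel` context.

```python
from synapse.logging.context import LoggingContext

class Clock:
    def __init__(self, reactor):
        self._reactor = reactor

    def call_when_running(self, func, *args, **kwargs):
        def wrapper() -> None:
            # Activate a logcontext so logs from startup code aren't
            # attributed to the `sentinel` context.
            with LoggingContext("call_when_running"):
                func(*args, **kwargs)

        self._reactor.callWhenRunning(wrapper)
```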
Part of https://github.com/element-hq/synapse/issues/18905
This was correctly handled for the "fallback" case where the background
updates hadn't finished.
---------
Co-authored-by: Eric Eastwood <erice@element.io>
This is particularly a problem in a state-reset scenario, where a user's
membership can change without a corresponding event.
This PR targets the scenario where a state reset causes room membership
to change. Previously, the cache would just hold onto stale data; now we
properly bust the cache in this scenario.
We have a few tests for these scenarios, which you can see are now fixed
because we can remove the `FIXME`s where we previously busted the cache
manually in the tests themselves.
This is a general Synapse fix, so by its nature it also helps out Sliding
Sync.
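Roughly, the invalidation looks like this (the surrounding method is illustrative, not the exact code):

```python
from synapse.api.constants import EventTypes

def _invalidate_caches_for_state_deltas(self, state_deltas) -> None:
    for delta in state_deltas:
        if delta.event_type == EventTypes.Member and delta.state_key:
            # A membership row may have changed without a new event (state
            # reset), so bust the per-user membership cache directly from
            # the current-state delta rather than relying on a new event.
            self.get_rooms_for_user.invalidate((delta.state_key,))
```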
Fix https://github.com/element-hq/synapse/issues/17368
Prerequisite for https://github.com/element-hq/synapse/issues/17929
---
Match when we are busting `_curr_state_delta_stream_cache`
Reset the `forgotten` status in `sliding_sync_membership_snapshots` when
membership changes (like rejoining a room).
Fix https://github.com/element-hq/synapse/issues/17781
### What was the problem before?
Previously, if someone used `/forget` on one of their rooms, it would
update `sliding_sync_membership_snapshots` as expected, but when they
rejoined the room (or had any membership change), the upsert didn't
overwrite and reset the `forgotten` status, so the room remained
`forgotten` and invisible down the Sliding Sync endpoint.
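An illustrative upsert showing the shape of the fix (simplified; the exact column list is an assumption): the important part is resetting `forgotten` in the `ON CONFLICT` clause so that any membership change clears it.

```python
def upsert_membership_snapshot(txn, room_id: str, user_id: str, membership: str) -> None:
    txn.execute(
        """
        INSERT INTO sliding_sync_membership_snapshots
            (room_id, user_id, membership, forgotten)
        VALUES (?, ?, ?, 0)
        ON CONFLICT (room_id, user_id)
        DO UPDATE SET
            membership = EXCLUDED.membership,
            -- The fix: a membership change must also reset the flag.
            forgotten = 0
        """,
        (room_id, user_id, membership),
    )
```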
Performance optimization: we can avoid fetching rooms that the user has
left themselves (which could be a significant number), and instead only
add back rooms that the user has `newly_left` (left within the token
range of an incremental sync). In most cases it's a lot faster to fetch
fewer rooms than to fetch them all and throw them away. Since a user only
leaves a room (or is state-reset out of one) once in a blue moon, we can
avoid a lot of work.
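A sketch of the shape of the optimization: `get_rooms_for_user` is the existing membership cache, while `get_newly_left_rooms` is a hypothetical helper standing in for the token-range lookup.

```python
async def get_relevant_rooms(store, user_id, from_token, to_token):
    # Fetch only current memberships (cheap and cached) instead of every
    # room the user has ever been in.
    room_ids = await store.get_rooms_for_user(user_id)
    # Add back rooms the user left within the token range (rare).
    newly_left = await get_newly_left_rooms(user_id, from_token, to_token)
    return set(room_ids) | set(newly_left)
```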
Based on @erikjohnston's branch, erikj/ss_perf
---------
Co-authored-by: Erik Johnston <erik@matrix.org>
No need to sort if the range is large enough to cover all of the rooms
in the list. Previously, we would only apply this optimization if the
range was exactly the size of the list.
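An illustrative check, simplified, with inclusive `start`/`end` as in Sliding Sync ranges:

```python
def rooms_for_range(room_list, start: int, end: int, sort_key):
    # If the range covers the entire list, every room is returned anyway,
    # so skip the (potentially expensive) sort. Previously this shortcut
    # only fired when the range size matched the list size exactly.
    if start == 0 and end + 1 >= len(room_list):
        return list(room_list)
    return sorted(room_list, key=sort_key)[start : end + 1]
```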
Follow-up to https://github.com/element-hq/synapse/pull/17672
Move the filter tests to the REST layer in order to test both the new
path (with the Sliding Sync tables) and the fallback path that Sliding
Sync can use.
This also turned up a bug in the new path, which wasn't being tested;
it's fixed in this PR as well. We now take `has_known_state` into account
when filtering.
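A hedged sketch of the idea behind the fix (the function shape and semantics here are illustrative): rooms whose state we don't know can't be evaluated against state-based filters, so they need explicit handling rather than being matched as if their state were empty.

```python
def matches_encryption_filter(room, want_encrypted: bool) -> bool:
    if not room.has_known_state:
        # We can't tell whether the room matches, so don't claim it does.
        return False
    return room.is_encrypted == want_encrypted
```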
Spawning from
https://github.com/element-hq/synapse/pull/17662#discussion_r1755574791.
This should have been done when we started using the new sliding sync
tables in https://github.com/element-hq/synapse/pull/17630.
We need to bust the `get_sliding_sync_rooms_for_user` cache when the
room's encryption is updated, as well as when any other field used in the
query changes (see the sketch after the list below).
Follow-up to https://github.com/element-hq/synapse/pull/17630
- Bust cache for membership change (cross-reference
`get_rooms_for_user`)
- Bust cache for room `encryption` (cross-reference
`get_room_encryption`)
- Bust cache for `forgotten` (cross-reference
`did_forget`/`get_forgotten_rooms_for_user`)
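An illustrative invalidation for the membership case (the method name is hypothetical; the encryption and forgotten cases are analogous, and the real code also propagates invalidations to workers):

```python
def _bust_sliding_sync_rooms_for_user(self, user_id: str) -> None:
    # Mirrors the existing invalidations for `get_rooms_for_user`,
    # `get_room_encryption`, and `get_forgotten_rooms_for_user`: any change
    # to a field used by the query must bust this cache too.
    self.get_sliding_sync_rooms_for_user.invalidate((user_id,))
```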
I thought `ruff check` would also format, but it doesn't. This runs
`ruff format` in CI and the dev scripts. The first commit is just a run
of `ruff format .` in the root directory.