Add changelog

Use parse_duration for newly introduced options
Replace EventContext fields prev_group and delta_ids with field state_group_deltas (#15233 )
2023-06-14 00:50:53 +02:00 · 2023-06-13 23:55:15 +02:00 · 2023-06-13 13:22:06 -07:00 · 2023-06-13 19:51:47 +02:00 · 2023-06-13 12:31:08 -05:00 · 2023-06-13 18:07:55 +02:00
51 changed files with 370 additions and 213 deletions
@@ -1,88 +1,3 @@
-Synapse 1.86.0 (2023-06-20)
-===========================
-
-No significant changes since 1.86.0rc2.
-
-
-Synapse 1.86.0rc2 (2023-06-14)
-==============================
-
-Bugfixes
--------
-
- Fix an error when having workers of different versions running. ([\#15774](https://github.com/matrix-org/synapse/issues/15774))
-
-
-Synapse 1.86.0rc1 (2023-06-13)
-==============================
-
-This version was tagged but never released.
-
-Features
--------
-
- Stable support for [MSC3882](https://github.com/matrix-org/matrix-spec-proposals/pull/3882) to allow an existing device/session to generate a login token for use on a new device/session. ([\#15388](https://github.com/matrix-org/synapse/issues/15388))
- Support resolving a room's [canonical alias](https://spec.matrix.org/v1.7/client-server-api/#mroomcanonical_alias) via the module API. ([\#15450](https://github.com/matrix-org/synapse/issues/15450))
- Enable support for [MSC3952](https://github.com/matrix-org/matrix-spec-proposals/pull/3952): intentional mentions. ([\#15520](https://github.com/matrix-org/synapse/issues/15520))
- Experimental [MSC3861](https://github.com/matrix-org/matrix-spec-proposals/pull/3861) support: delegate auth to an OIDC provider. ([\#15582](https://github.com/matrix-org/synapse/issues/15582))
- Add Synapse version deploy annotations to Grafana dashboard which enables easy correlation between behavior changes witnessed in a graph to a certain Synapse version and nail down regressions. ([\#15674](https://github.com/matrix-org/synapse/issues/15674))
- Add a catch-all * to the supported relation types when redacting an event and its related events. This is an update to [MSC3912](https://github.com/matrix-org/matrix-spec-proposals/pull/3861) implementation. ([\#15705](https://github.com/matrix-org/synapse/issues/15705))
- Speed up `/messages` by backfilling in the background when there are no backward extremities where we are directly paginating. ([\#15710](https://github.com/matrix-org/synapse/issues/15710))
- Expose a metric reporting the database background update status. ([\#15740](https://github.com/matrix-org/synapse/issues/15740))
-
-
-Bugfixes
--------
-
- Correctly clear caches when we delete a room. ([\#15609](https://github.com/matrix-org/synapse/issues/15609))
- Check permissions for enabling encryption earlier during room creation to avoid creating broken rooms. ([\#15695](https://github.com/matrix-org/synapse/issues/15695))
-
-
-Improved Documentation
----------------------
-
- Simplify query to find participating servers in a room. ([\#15732](https://github.com/matrix-org/synapse/issues/15732))
-
-
-Internal Changes
----------------
-
- Log when events are (maybe unexpectedly) filtered out of responses in tests. ([\#14213](https://github.com/matrix-org/synapse/issues/14213))
- Read from column `full_user_id` rather than `user_id` of tables `profiles` and `user_filters`. ([\#15649](https://github.com/matrix-org/synapse/issues/15649))
- Add support for tracing functions which return `Awaitable`s. ([\#15650](https://github.com/matrix-org/synapse/issues/15650))
- Cache requests for user's devices over federation. ([\#15675](https://github.com/matrix-org/synapse/issues/15675))
- Add fully qualified docker image names to Dockerfiles. ([\#15689](https://github.com/matrix-org/synapse/issues/15689))
- Remove some unused code. ([\#15690](https://github.com/matrix-org/synapse/issues/15690))
- Improve type hints. ([\#15694](https://github.com/matrix-org/synapse/issues/15694), [\#15697](https://github.com/matrix-org/synapse/issues/15697))
- Update docstring and traces on `maybe_backfill()` functions. ([\#15709](https://github.com/matrix-org/synapse/issues/15709))
- Add context for when/why to use the `long_retries` option when sending Federation requests. ([\#15721](https://github.com/matrix-org/synapse/issues/15721))
- Removed some unused fields. ([\#15723](https://github.com/matrix-org/synapse/issues/15723))
- Update federation error to more plainly explain we can only authorize our own membership events. ([\#15725](https://github.com/matrix-org/synapse/issues/15725))
- Prevent the `latest_deps` and `twisted_trunk` daily GitHub Actions workflows from running on forks of the codebase. ([\#15726](https://github.com/matrix-org/synapse/issues/15726))
- Improve performance of user directory search. ([\#15729](https://github.com/matrix-org/synapse/issues/15729))
- Remove redundant table join with `room_memberships` when doing a `is_host_joined()`/`is_host_invited()` call (`membership` is already part of the `current_state_events`). ([\#15731](https://github.com/matrix-org/synapse/issues/15731))
- Remove superfluous `room_memberships` join from background update. ([\#15733](https://github.com/matrix-org/synapse/issues/15733))
- Speed up typechecking CI. ([\#15752](https://github.com/matrix-org/synapse/issues/15752))
- Bump minimum supported Rust version to 1.60.0. ([\#15768](https://github.com/matrix-org/synapse/issues/15768))
-
-### Updates to locked dependencies
-
-* Bump importlib-metadata from 6.1.0 to 6.6.0. ([\#15711](https://github.com/matrix-org/synapse/issues/15711))
-* Bump library/redis from 6-bullseye to 7-bullseye in /docker. ([\#15712](https://github.com/matrix-org/synapse/issues/15712))
-* Bump log from 0.4.18 to 0.4.19. ([\#15761](https://github.com/matrix-org/synapse/issues/15761))
-* Bump phonenumbers from 8.13.11 to 8.13.13. ([\#15763](https://github.com/matrix-org/synapse/issues/15763))
-* Bump pyasn1 from 0.4.8 to 0.5.0. ([\#15713](https://github.com/matrix-org/synapse/issues/15713))
-* Bump pydantic from 1.10.8 to 1.10.9. ([\#15762](https://github.com/matrix-org/synapse/issues/15762))
-* Bump pyo3-log from 0.8.1 to 0.8.2. ([\#15759](https://github.com/matrix-org/synapse/issues/15759))
-* Bump pyopenssl from 23.1.1 to 23.2.0. ([\#15765](https://github.com/matrix-org/synapse/issues/15765))
-* Bump regex from 1.7.3 to 1.8.4. ([\#15769](https://github.com/matrix-org/synapse/issues/15769))
-* Bump sentry-sdk from 1.22.1 to 1.25.0. ([\#15714](https://github.com/matrix-org/synapse/issues/15714))
-* Bump sentry-sdk from 1.25.0 to 1.25.1. ([\#15764](https://github.com/matrix-org/synapse/issues/15764))
-* Bump serde from 1.0.163 to 1.0.164. ([\#15760](https://github.com/matrix-org/synapse/issues/15760))
-* Bump types-jsonschema from 4.17.0.7 to 4.17.0.8. ([\#15716](https://github.com/matrix-org/synapse/issues/15716))
-* Bump types-pyopenssl from 23.1.0.2 to 23.2.0.0. ([\#15766](https://github.com/matrix-org/synapse/issues/15766))
-* Bump types-requests from 2.31.0.0 to 2.31.0.1. ([\#15715](https://github.com/matrix-org/synapse/issues/15715))
-
 Synapse 1.85.2 (2023-06-08)
 ===========================

@@ -0,0 +1 @@
+Allow for the configuration of max request retries and min/max retry delays in the matrix federation client.
@@ -0,0 +1 @@
+Log when events are (maybe unexpectedly) filtered out of responses in tests.
@@ -0,0 +1 @@
+Replace `EventContext` fields `prev_group` and `delta_ids` with field `state_group_deltas`.
@@ -0,0 +1 @@
+Stable support for [MSC3882](https://github.com/matrix-org/matrix-spec-proposals/pull/3882) to allow an existing device/session to generate a login token for use on a new device/session.
@@ -0,0 +1 @@
+Support resolving a room's [canonical alias](https://spec.matrix.org/v1.7/client-server-api/#mroomcanonical_alias) via the module API.
@@ -0,0 +1 @@
+Enable support for [MSC3952](https://github.com/matrix-org/matrix-spec-proposals/pull/3952): intentional mentions.
@@ -0,0 +1 @@
+Experimental [MSC3861](https://github.com/matrix-org/matrix-spec-proposals/pull/3861) support: delegate auth to an OIDC provider.
@@ -0,0 +1 @@
+Correctly clear caches when we delete a room.
@@ -0,0 +1 @@
+Read from column `full_user_id` rather than `user_id` of tables `profiles` and `user_filters`.
@@ -0,0 +1 @@
+Add support for tracing functions which return `Awaitable`s.
@@ -0,0 +1 @@
+Add Syanpse version deploy annotations to Grafana dashboard which enables easy correlation between behavior changes witnessed in a graph to a certain Synapse version and nail down regressions.
@@ -0,0 +1 @@
+Cache requests for user's devices over federation.
@@ -0,0 +1 @@
+Add fully qualified docker image names to Dockerfiles.
@@ -0,0 +1 @@
+Remove some unused code.
@@ -0,0 +1 @@
+Improve type hints.
@@ -0,0 +1 @@
+Check permissions for enabling encryption earlier during room creation to avoid creating broken rooms.
@@ -0,0 +1 @@
+Improve type hints.
@@ -0,0 +1 @@
+Add a catch-all * to the supported relation types when redacting an event and its related events. This is an update to [MSC3912](https://github.com/matrix-org/matrix-spec-proposals/pull/3861) implementation.
@@ -0,0 +1 @@
+Update docstring and traces on `maybe_backfill()` functions.
@@ -0,0 +1 @@
+Speed up `/messages` by backfilling in the background when there are no backward extremities where we are directly paginating.
@@ -0,0 +1 @@
+Add context for when/why to use the `long_retries` option when sending Federation requests.
@@ -0,0 +1 @@
+Removed some unused fields.
@@ -0,0 +1 @@
+Update federation error to more plainly explain we can only authorize our own membership events.
@@ -0,0 +1 @@
+Prevent the `latest_deps` and `twisted_trunk` daily GitHub Actions workflows from running on forks of the codebase.
@@ -0,0 +1 @@
+Improve performance of user directory search.
@@ -0,0 +1 @@
+Remove redundant table join with `room_memberships` when doing a `is_host_joined()`/`is_host_invited()` call (`membership` is already part of the `current_state_events`).
@@ -0,0 +1 @@
+Simplify query to find participating servers in a room.
@@ -0,0 +1 @@
+Remove superfluous `room_memberships` join from background update.
@@ -0,0 +1 @@
+Improve `/messages` response time by avoiding backfill when we already have messages to return.
@@ -0,0 +1 @@
+Expose a metric reporting the database background update status.
@@ -0,0 +1 @@
+Speed up typechecking CI.
@@ -0,0 +1 @@
+Fix requesting multiple keys at once over federation, related to [MSC3983](https://github.com/matrix-org/matrix-spec-proposals/pull/3983).
@@ -0,0 +1 @@
+Bump minimum supported Rust version to 1.60.0.
@@ -0,0 +1 @@
+Fix requesting multiple keys at once over federation, related to [MSC3983](https://github.com/matrix-org/matrix-spec-proposals/pull/3983).
@@ -0,0 +1 @@
+Use parse_duration for federation client timeout and retry options.
@@ -1,21 +1,3 @@
-matrix-synapse-py3 (1.86.0) stable; urgency=medium
-
-  * New Synapse release 1.86.0.
-
- -- Synapse Packaging team <packages@matrix.org>  Tue, 20 Jun 2023 17:22:46 +0200
-
-matrix-synapse-py3 (1.86.0~rc2) stable; urgency=medium
-
-  * New Synapse release 1.86.0rc2.
-
- -- Synapse Packaging team <packages@matrix.org>  Wed, 14 Jun 2023 12:16:27 +0200
-
-matrix-synapse-py3 (1.86.0~rc1) stable; urgency=medium
-
-  * New Synapse release 1.86.0rc1.
-
- -- Synapse Packaging team <packages@matrix.org>  Tue, 13 Jun 2023 14:30:45 +0200
-
 matrix-synapse-py3 (1.85.2) stable; urgency=medium

  * New Synapse release 1.85.2.
@@ -1196,6 +1196,32 @@ Example configuration:
 allow_device_name_lookup_over_federation: true
 ```
 ---
+### `federation`
+
+The federation section defines some sub-options related to federation.
+
+The following options are related to configuring timeout and retry logic for one request,
+independently of the others.
+Short retry algorithm is used when something or someone will wait for the request to have an
+answer, while long retry is used for requests that happen in the background,
+like sending a federation transaction.
+
+* `client_timeout`: timeout for the federation requests in seconds. Default to 60s.
+* `max_short_retry_delay`: maximum delay to be used for the short retry algo in seconds. Default to 2s.
+* `max_long_retry_delay`: maximum delay to be used for the short retry algo in seconds. Default to 60s.
+* `max_short_retries`: maximum number of retries for the short retry algo. Default to 3 attempts.
+* `max_long_retries`: maximum number of retries for the long retry algo. Default to 10 attempts.
+
+Example configuration:
+```yaml
+federation:
+  client_timeout: 180
+  max_short_retry_delay: 7
+  max_long_retry_delay: 100
+  max_short_retries: 5
+  max_long_retries: 20
+```
+---
 ## Caching

 Options related to caching.
@@ -89,7 +89,7 @@ manifest-path = "rust/Cargo.toml"

 [tool.poetry]
 name = "matrix-synapse"
-version = "1.86.0"
+version = "1.85.2"
 description = "Homeserver for the Matrix decentralised comms protocol"
 authors = ["Matrix.org Team and Contributors <packages@matrix.org>"]
 license = "Apache-2.0"
@@ -22,6 +22,8 @@ class FederationConfig(Config):
    section = "federation"

    def read_config(self, config: JsonDict, **kwargs: Any) -> None:
+        federation_config = config.setdefault("federation", {})
+
        # FIXME: federation_domain_whitelist needs sytests
        self.federation_domain_whitelist: Optional[dict] = None
        federation_domain_whitelist = config.get("federation_domain_whitelist", None)
@@ -49,5 +51,19 @@ class FederationConfig(Config):
            "allow_device_name_lookup_over_federation", False
        )

+        # Allow for the configuration of timeout, max request retries
+        # and min/max retry delays in the matrix federation client.
+        self.client_timeout = Config.parse_duration(
+            federation_config.get("client_timeout", "60s")
+        )
+        self.max_long_retry_delay = Config.parse_duration(
+            federation_config.get("max_long_retry_delay", "60s")
+        )
+        self.max_short_retry_delay = Config.parse_duration(
+            federation_config.get("max_short_retry_delay", "2s")
+        )
+        self.max_long_retries = federation_config.get("max_long_retries", 10)
+        self.max_short_retries = federation_config.get("max_short_retries", 3)
+

 _METRICS_FOR_DOMAINS_SCHEMA = {"type": "array", "items": {"type": "string"}}
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 from abc import ABC, abstractmethod
-from typing import TYPE_CHECKING, List, Optional, Tuple
+from typing import TYPE_CHECKING, Dict, List, Optional, Tuple

 import attr
 from immutabledict import immutabledict
@@ -107,33 +107,32 @@ class EventContext(UnpersistedEventContextBase):
        state_delta_due_to_event: If `state_group` and `state_group_before_event` are not None
            then this is the delta of the state between the two groups.

-        prev_group: If it is known, ``state_group``'s prev_group. Note that this being
-            None does not necessarily mean that ``state_group`` does not have
-            a prev_group!
+        state_group_deltas: If not empty, this is a dict collecting a mapping of the state
+            difference between state groups.

-            If the event is a state event, this is normally the same as
-            ``state_group_before_event``.
+            The keys are a tuple of two integers: the initial group and final state group.
+            The corresponding value is a state map representing the state delta between
+            these state groups.

-            If ``state_group`` is None (ie, the event is an outlier), ``prev_group``
-            will always also be ``None``.
+            The dictionary is expected to have at most two entries with state groups of:

-            Note that this *not* (necessarily) the state group associated with
-            ``_prev_state_ids``.
+            1. The state group before the event and after the event.
+            2. The state group preceding the state group before the event and the
+               state group before the event.

-        delta_ids: If ``prev_group`` is not None, the state delta between ``prev_group``
-            and ``state_group``.
+            This information is collected and stored as part of an optimization for persisting
+            events.

        partial_state: if True, we may be storing this event with a temporary,
            incomplete state.
    """

    _storage: "StorageControllers"
+    state_group_deltas: Dict[Tuple[int, int], StateMap[str]]
    rejected: Optional[str] = None
    _state_group: Optional[int] = None
    state_group_before_event: Optional[int] = None
    _state_delta_due_to_event: Optional[StateMap[str]] = None
-    prev_group: Optional[int] = None
-    delta_ids: Optional[StateMap[str]] = None
    app_service: Optional[ApplicationService] = None

    partial_state: bool = False
@@ -145,16 +144,14 @@ class EventContext(UnpersistedEventContextBase):
        state_group_before_event: Optional[int],
        state_delta_due_to_event: Optional[StateMap[str]],
        partial_state: bool,
-        prev_group: Optional[int] = None,
-        delta_ids: Optional[StateMap[str]] = None,
+        state_group_deltas: Dict[Tuple[int, int], StateMap[str]],
    ) -> "EventContext":
        return EventContext(
            storage=storage,
            state_group=state_group,
            state_group_before_event=state_group_before_event,
            state_delta_due_to_event=state_delta_due_to_event,
-            prev_group=prev_group,
-            delta_ids=delta_ids,
+            state_group_deltas=state_group_deltas,
            partial_state=partial_state,
        )

@@ -163,7 +160,7 @@ class EventContext(UnpersistedEventContextBase):
        storage: "StorageControllers",
    ) -> "EventContext":
        """Return an EventContext instance suitable for persisting an outlier event"""
-        return EventContext(storage=storage)
+        return EventContext(storage=storage, state_group_deltas={})

    async def persist(self, event: EventBase) -> "EventContext":
        return self
@@ -183,13 +180,15 @@ class EventContext(UnpersistedEventContextBase):
            "state_group": self._state_group,
            "state_group_before_event": self.state_group_before_event,
            "rejected": self.rejected,
-            "prev_group": self.prev_group,
+            "state_group_deltas": _encode_state_group_delta(self.state_group_deltas),
            "state_delta_due_to_event": _encode_state_dict(
                self._state_delta_due_to_event
            ),
-            "delta_ids": _encode_state_dict(self.delta_ids),
            "app_service_id": self.app_service.id if self.app_service else None,
            "partial_state": self.partial_state,
+            # add dummy delta_ids and prev_group for backwards compatibility
+            "delta_ids": None,
+            "prev_group": None,
        }

    @staticmethod
@@ -204,17 +203,24 @@ class EventContext(UnpersistedEventContextBase):
        Returns:
            The event context.
        """
+        # workaround for backwards/forwards compatibility: if the input doesn't have a value
+        # for "state_group_deltas" just assign an empty dict
+        state_group_deltas = input.get("state_group_deltas", None)
+        if state_group_deltas:
+            state_group_deltas = _decode_state_group_delta(state_group_deltas)
+        else:
+            state_group_deltas = {}
+
        context = EventContext(
            # We use the state_group and prev_state_id stuff to pull the
            # current_state_ids out of the DB and construct prev_state_ids.
            storage=storage,
            state_group=input["state_group"],
            state_group_before_event=input["state_group_before_event"],
-            prev_group=input["prev_group"],
+            state_group_deltas=state_group_deltas,
            state_delta_due_to_event=_decode_state_dict(
                input["state_delta_due_to_event"]
            ),
-            delta_ids=_decode_state_dict(input["delta_ids"]),
            rejected=input["rejected"],
            partial_state=input.get("partial_state", False),
        )
@@ -349,7 +355,7 @@ class UnpersistedEventContext(UnpersistedEventContextBase):
    _storage: "StorageControllers"
    state_group_before_event: Optional[int]
    state_group_after_event: Optional[int]
-    state_delta_due_to_event: Optional[dict]
+    state_delta_due_to_event: Optional[StateMap[str]]
    prev_group_for_state_group_before_event: Optional[int]
    delta_ids_to_state_group_before_event: Optional[StateMap[str]]
    partial_state: bool
@@ -380,26 +386,16 @@ class UnpersistedEventContext(UnpersistedEventContextBase):

        events_and_persisted_context = []
        for event, unpersisted_context in amended_events_and_context:
-            if event.is_state():
-                context = EventContext(
-                    storage=unpersisted_context._storage,
-                    state_group=unpersisted_context.state_group_after_event,
-                    state_group_before_event=unpersisted_context.state_group_before_event,
-                    state_delta_due_to_event=unpersisted_context.state_delta_due_to_event,
-                    partial_state=unpersisted_context.partial_state,
-                    prev_group=unpersisted_context.state_group_before_event,
-                    delta_ids=unpersisted_context.state_delta_due_to_event,
-                )
-            else:
-                context = EventContext(
-                    storage=unpersisted_context._storage,
-                    state_group=unpersisted_context.state_group_after_event,
-                    state_group_before_event=unpersisted_context.state_group_before_event,
-                    state_delta_due_to_event=unpersisted_context.state_delta_due_to_event,
-                    partial_state=unpersisted_context.partial_state,
-                    prev_group=unpersisted_context.prev_group_for_state_group_before_event,
-                    delta_ids=unpersisted_context.delta_ids_to_state_group_before_event,
-                )
+            state_group_deltas = unpersisted_context._build_state_group_deltas()
+
+            context = EventContext(
+                storage=unpersisted_context._storage,
+                state_group=unpersisted_context.state_group_after_event,
+                state_group_before_event=unpersisted_context.state_group_before_event,
+                state_delta_due_to_event=unpersisted_context.state_delta_due_to_event,
+                partial_state=unpersisted_context.partial_state,
+                state_group_deltas=state_group_deltas,
+            )
            events_and_persisted_context.append((event, context))
        return events_and_persisted_context

@@ -452,11 +448,11 @@ class UnpersistedEventContext(UnpersistedEventContextBase):

        # if the event isn't a state event the state group doesn't change
        if not self.state_delta_due_to_event:
-            state_group_after_event = self.state_group_before_event
+            self.state_group_after_event = self.state_group_before_event

        # otherwise if it is a state event we need to get a state group for it
        else:
-            state_group_after_event = await self._storage.state.store_state_group(
+            self.state_group_after_event = await self._storage.state.store_state_group(
                event.event_id,
                event.room_id,
                prev_group=self.state_group_before_event,
@@ -464,16 +460,81 @@ class UnpersistedEventContext(UnpersistedEventContextBase):
                current_state_ids=None,
            )

+        state_group_deltas = self._build_state_group_deltas()
+
        return EventContext.with_state(
            storage=self._storage,
-            state_group=state_group_after_event,
+            state_group=self.state_group_after_event,
            state_group_before_event=self.state_group_before_event,
            state_delta_due_to_event=self.state_delta_due_to_event,
+            state_group_deltas=state_group_deltas,
            partial_state=self.partial_state,
-            prev_group=self.state_group_before_event,
-            delta_ids=self.state_delta_due_to_event,
        )

+    def _build_state_group_deltas(self) -> Dict[Tuple[int, int], StateMap]:
+        """
+        Collect deltas between the state groups associated with this context
+        """
+        state_group_deltas = {}
+
+        # if we know the state group before the event and after the event, add them and the
+        # state delta between them to state_group_deltas
+        if self.state_group_before_event and self.state_group_after_event:
+            # if we have the state groups we should have the delta
+            assert self.state_delta_due_to_event is not None
+            state_group_deltas[
+                (
+                    self.state_group_before_event,
+                    self.state_group_after_event,
+                )
+            ] = self.state_delta_due_to_event
+
+        # the state group before the event may also have a state group which precedes it, if
+        # we have that and the state group before the event, add them and the state
+        # delta between them to state_group_deltas
+        if (
+            self.prev_group_for_state_group_before_event
+            and self.state_group_before_event
+        ):
+            # if we have both state groups we should have the delta between them
+            assert self.delta_ids_to_state_group_before_event is not None
+            state_group_deltas[
+                (
+                    self.prev_group_for_state_group_before_event,
+                    self.state_group_before_event,
+                )
+            ] = self.delta_ids_to_state_group_before_event
+
+        return state_group_deltas
+
+
+def _encode_state_group_delta(
+    state_group_delta: Dict[Tuple[int, int], StateMap[str]]
+) -> List[Tuple[int, int, Optional[List[Tuple[str, str, str]]]]]:
+    if not state_group_delta:
+        return []
+
+    state_group_delta_encoded = []
+    for key, value in state_group_delta.items():
+        state_group_delta_encoded.append((key[0], key[1], _encode_state_dict(value)))
+
+    return state_group_delta_encoded
+
+
+def _decode_state_group_delta(
+    input: List[Tuple[int, int, List[Tuple[str, str, str]]]]
+) -> Dict[Tuple[int, int], StateMap[str]]:
+    if not input:
+        return {}
+
+    state_group_deltas = {}
+    for state_group_1, state_group_2, state_dict in input:
+        state_map = _decode_state_dict(state_dict)
+        assert state_map is not None
+        state_group_deltas[(state_group_1, state_group_2)] = state_map
+
+    return state_group_deltas
+

 def _encode_state_dict(
    state_dict: Optional[StateMap[str]],
@@ -260,7 +260,9 @@ class FederationClient(FederationBase):
        use_unstable = False
        for user_id, one_time_keys in query.items():
            for device_id, algorithms in one_time_keys.items():
-                if any(count > 1 for count in algorithms.values()):
+                # If more than one algorithm is requested, attempt to use the unstable
+                # endpoint.
+                if sum(algorithms.values()) > 1:
                    use_unstable = True
                if algorithms:
                    # For the stable query, choose only the first algorithm.
@@ -296,6 +298,7 @@ class FederationClient(FederationBase):
        else:
            logger.debug("Skipping unstable claim client keys API")

+        # TODO Potentially attempt multiple queries and combine the results?
        return await self.transport_layer.claim_client_keys(
            user, destination, content, timeout
        )
@@ -1016,7 +1016,9 @@ class FederationServer(FederationBase):
            for user_id, device_keys in result.items():
                for device_id, keys in device_keys.items():
                    for key_id, key in keys.items():
-                        json_result.setdefault(user_id, {})[device_id] = {key_id: key}
+                        json_result.setdefault(user_id, {}).setdefault(device_id, {})[
+                            key_id
+                        ] = key

        logger.info(
            "Claimed one-time-keys: %s",
@@ -40,6 +40,11 @@ if TYPE_CHECKING:

 logger = logging.getLogger(__name__)

+# How many single event gaps we tolerate returning in a `/messages` response before we
+# backfill and try to fill in the history. This is an arbitrarily picked number so feel
+# free to tune it in the future.
+BACKFILL_BECAUSE_TOO_MANY_GAPS_THRESHOLD = 3
+

@attr.s(slots=True, auto_attribs=True)
 class PurgeStatus:
@@ -486,35 +491,35 @@ class PaginationHandler:
                        room_id, room_token.stream
                    )

-                if not use_admin_priviledge and membership == Membership.LEAVE:
-                    # If they have left the room then clamp the token to be before
-                    # they left the room, to save the effort of loading from the
-                    # database.
+            # If they have left the room then clamp the token to be before
+            # they left the room, to save the effort of loading from the
+            # database.
+            if (
+                pagin_config.direction == Direction.BACKWARDS
+                and not use_admin_priviledge
+                and membership == Membership.LEAVE
+            ):
+                # This is only None if the room is world_readable, in which case
+                # "Membership.JOIN" would have been returned and we should never hit
+                # this branch.
+                assert member_event_id

-                    # This is only None if the room is world_readable, in which
-                    # case "JOIN" would have been returned.
-                    assert member_event_id
-
-                    leave_token = await self.store.get_topological_token_for_event(
-                        member_event_id
-                    )
-                    assert leave_token.topological is not None
-
-                    if leave_token.topological < curr_topo:
-                        from_token = from_token.copy_and_replace(
-                            StreamKeyType.ROOM, leave_token
-                        )
-
-                await self.hs.get_federation_handler().maybe_backfill(
-                    room_id,
-                    curr_topo,
-                    limit=pagin_config.limit,
+                leave_token = await self.store.get_topological_token_for_event(
+                    member_event_id
                )
+                assert leave_token.topological is not None
+
+                if leave_token.topological < curr_topo:
+                    from_token = from_token.copy_and_replace(
+                        StreamKeyType.ROOM, leave_token
+                    )

            to_room_key = None
            if pagin_config.to_token:
                to_room_key = pagin_config.to_token.room_key

+            # Initially fetch the events from the database. With any luck, we can return
+            # these without blocking on backfill (handled below).
            events, next_key = await self.store.paginate_room_events(
                room_id=room_id,
                from_key=from_token.room_key,
@@ -524,6 +529,94 @@ class PaginationHandler:
                event_filter=event_filter,
            )

+            if pagin_config.direction == Direction.BACKWARDS:
+                # We use a `Set` because there can be multiple events at a given depth
+                # and we only care about looking at the unique continum of depths to
+                # find gaps.
+                event_depths: Set[int] = {event.depth for event in events}
+                sorted_event_depths = sorted(event_depths)
+
+                # Inspect the depths of the returned events to see if there are any gaps
+                found_big_gap = False
+                number_of_gaps = 0
+                previous_event_depth = (
+                    sorted_event_depths[0] if len(sorted_event_depths) > 0 else 0
+                )
+                for event_depth in sorted_event_depths:
+                    # We don't expect a negative depth but we'll just deal with it in
+                    # any case by taking the absolute value to get the true gap between
+                    # any two integers.
+                    depth_gap = abs(event_depth - previous_event_depth)
+                    # A `depth_gap` of 1 is a normal continuous chain to the next event
+                    # (1 <-- 2 <-- 3) so anything larger indicates a missing event (it's
+                    # also possible there is no event at a given depth but we can't ever
+                    # know that for sure)
+                    if depth_gap > 1:
+                        number_of_gaps += 1
+
+                    # We only tolerate a small number single-event long gaps in the
+                    # returned events because those are most likely just events we've
+                    # failed to pull in the past. Anything longer than that is probably
+                    # a sign that we're missing a decent chunk of history and we should
+                    # try to backfill it.
+                    #
+                    # XXX: It's possible we could tolerate longer gaps if we checked
+                    # that a given events `prev_events` is one that has failed pull
+                    # attempts and we could just treat it like a dead branch of history
+                    # for now or at least something that we don't need the block the
+                    # client on to try pulling.
+                    #
+                    # XXX: If we had something like MSC3871 to indicate gaps in the
+                    # timeline to the client, we could also get away with any sized gap
+                    # and just have the client refetch the holes as they see fit.
+                    if depth_gap > 2:
+                        found_big_gap = True
+                        break
+                    previous_event_depth = event_depth
+
+                # Backfill in the foreground if we found a big gap, have too many holes,
+                # or we don't have enough events to fill the limit that the client asked
+                # for.
+                missing_too_many_events = (
+                    number_of_gaps > BACKFILL_BECAUSE_TOO_MANY_GAPS_THRESHOLD
+                )
+                not_enough_events_to_fill_response = len(events) < pagin_config.limit
+                if (
+                    found_big_gap
+                    or missing_too_many_events
+                    or not_enough_events_to_fill_response
+                ):
+                    did_backfill = (
+                        await self.hs.get_federation_handler().maybe_backfill(
+                            room_id,
+                            curr_topo,
+                            limit=pagin_config.limit,
+                        )
+                    )
+
+                    # If we did backfill something, refetch the events from the database to
+                    # catch anything new that might have been added since we last fetched.
+                    if did_backfill:
+                        events, next_key = await self.store.paginate_room_events(
+                            room_id=room_id,
+                            from_key=from_token.room_key,
+                            to_key=to_room_key,
+                            direction=pagin_config.direction,
+                            limit=pagin_config.limit,
+                            event_filter=event_filter,
+                        )
+                else:
+                    # Otherwise, we can backfill in the background for eventual
+                    # consistency's sake but we don't need to block the client waiting
+                    # for a costly federation call and processing.
+                    run_as_background_process(
+                        "maybe_backfill_in_the_background",
+                        self.hs.get_federation_handler().maybe_backfill,
+                        room_id,
+                        curr_topo,
+                        limit=pagin_config.limit,
+                    )
+
            next_token = from_token.copy_and_replace(StreamKeyType.ROOM, next_key)

        # if no events are returned from pagination, that implies
@@ -95,8 +95,6 @@ incoming_responses_counter = Counter(
 )


-MAX_LONG_RETRIES = 10
-MAX_SHORT_RETRIES = 3
 MAXINT = sys.maxsize


@@ -406,7 +404,12 @@ class MatrixFederationHttpClient:
        self.clock = hs.get_clock()
        self._store = hs.get_datastores().main
        self.version_string_bytes = hs.version_string.encode("ascii")
-        self.default_timeout = 60
+        self.default_timeout = hs.config.federation.client_timeout / 1000
+
+        self.max_long_retry_delay = hs.config.federation.max_long_retry_delay / 1000
+        self.max_short_retry_delay = hs.config.federation.max_short_retry_delay / 1000
+        self.max_long_retries = hs.config.federation.max_long_retries
+        self.max_short_retries = hs.config.federation.max_short_retries

        self._cooperator = Cooperator(scheduler=_make_scheduler(self.reactor))

@@ -535,10 +538,10 @@ class MatrixFederationHttpClient:
            logger.exception(f"Invalid destination: {request.destination}.")
            raise FederationDeniedError(request.destination)

-        if timeout:
-            _sec_timeout = timeout / 1000
-        else:
-            _sec_timeout = self.default_timeout
+        if timeout is None:
+            timeout = int(self.default_timeout)
+
+        _sec_timeout = timeout / 1000

        if (
            self.hs.config.federation.federation_domain_whitelist is not None
@@ -583,9 +586,9 @@ class MatrixFederationHttpClient:
            # XXX: Would be much nicer to retry only at the transaction-layer
            # (once we have reliable transactions in place)
            if long_retries:
-                retries_left = MAX_LONG_RETRIES
+                retries_left = self.max_long_retries
            else:
-                retries_left = MAX_SHORT_RETRIES
+                retries_left = self.max_short_retries

            url_bytes = request.uri
            url_str = url_bytes.decode("ascii")
@@ -730,12 +733,12 @@ class MatrixFederationHttpClient:

                    if retries_left and not timeout:
                        if long_retries:
-                            delay = 4 ** (MAX_LONG_RETRIES + 1 - retries_left)
-                            delay = min(delay, 60)
+                            delay = 4 ** (self.max_long_retries + 1 - retries_left)
+                            delay = min(delay, self.max_long_retry_delay)
                            delay *= random.uniform(0.8, 1.4)
                        else:
-                            delay = 0.5 * 2 ** (MAX_SHORT_RETRIES - retries_left)
-                            delay = min(delay, 2)
+                            delay = 0.5 * 2 ** (self.max_short_retries - retries_left)
+                            delay = min(delay, self.max_short_retry_delay)
                            delay *= random.uniform(0.8, 1.4)

                        logger.debug(
@@ -943,10 +946,9 @@ class MatrixFederationHttpClient:
            timeout=timeout,
        )

-        if timeout is not None:
-            _sec_timeout = timeout / 1000
-        else:
-            _sec_timeout = self.default_timeout
+        if timeout is None:
+            timeout = int(self.default_timeout)
+        _sec_timeout = timeout / 1000

        if parser is None:
            parser = cast(ByteParser[T], JsonParser())
@@ -1132,10 +1134,9 @@ class MatrixFederationHttpClient:
            timeout=timeout,
        )

-        if timeout is not None:
-            _sec_timeout = timeout / 1000
-        else:
-            _sec_timeout = self.default_timeout
+        if timeout is None:
+            timeout = int(self.default_timeout)
+        _sec_timeout = timeout / 1000

        if parser is None:
            parser = cast(ByteParser[T], JsonParser())
@@ -1208,10 +1209,9 @@ class MatrixFederationHttpClient:
            ignore_backoff=ignore_backoff,
        )

-        if timeout is not None:
-            _sec_timeout = timeout / 1000
-        else:
-            _sec_timeout = self.default_timeout
+        if timeout is None:
+            timeout = int(self.default_timeout)
+        _sec_timeout = timeout / 1000

        body = await _handle_response(
            self.reactor, _sec_timeout, request, response, start_ms, parser=JsonParser()
@@ -839,9 +839,8 @@ class EventsPersistenceStorageController:
                        "group" % (ev.event_id,)
                    )
                continue
-
-            if ctx.prev_group:
-                state_group_deltas[(ctx.prev_group, ctx.state_group)] = ctx.delta_ids
+            if ctx.state_group_deltas:
+                state_group_deltas.update(ctx.state_group_deltas)

        # We need to map the event_ids to their state groups. First, let's
        # check if the event is one we're persisting, in which case we can
@@ -177,7 +177,7 @@ class Requester:
            user=UserID.from_string(input["user_id"]),
            access_token_id=input["access_token_id"],
            is_guest=input["is_guest"],
-            scope=set(input.get("scope", [])),
+            scope=set(input["scope"]),
            shadow_banned=input["shadow_banned"],
            device_id=input["device_id"],
            app_service=appservice,
@@ -101,8 +101,7 @@ class TestEventContext(unittest.HomeserverTestCase):
        self.assertEqual(
            context.state_group_before_event, d_context.state_group_before_event
        )
-        self.assertEqual(context.prev_group, d_context.prev_group)
-        self.assertEqual(context.delta_ids, d_context.delta_ids)
+        self.assertEqual(context.state_group_deltas, d_context.state_group_deltas)
        self.assertEqual(context.app_service, d_context.app_service)

        self.assertEqual(
@@ -40,7 +40,7 @@ from synapse.server import HomeServer
 from synapse.util import Clock

 from tests.server import FakeTransport
-from tests.unittest import HomeserverTestCase
+from tests.unittest import HomeserverTestCase, override_config


 def check_logcontext(context: LoggingContextOrSentinel) -> None:
@@ -640,3 +640,21 @@ class FederationClientTests(HomeserverTestCase):
            self.cl.build_auth_headers(
                b"", b"GET", b"https://example.com", destination_is=b""
            )
+
+    @override_config(
+        {
+            "federation": {
+                "client_timeout": "180s",
+                "max_long_retry_delay": "100s",
+                "max_short_retry_delay": "7s",
+                "max_long_retries": 20,
+                "max_short_retries": 5,
+            }
+        }
+    )
+    def test_configurable_retry_and_delay_values(self) -> None:
+        self.assertEqual(self.cl.default_timeout, 180)
+        self.assertEqual(self.cl.max_long_retry_delay, 100)
+        self.assertEqual(self.cl.max_short_retry_delay, 7)
+        self.assertEqual(self.cl.max_long_retries, 20)
+        self.assertEqual(self.cl.max_short_retries, 5)
@@ -401,7 +401,10 @@ class EventChainStoreTestCase(HomeserverTestCase):
            assert persist_events_store is not None
            persist_events_store._store_event_txn(
                txn,
-                [(e, EventContext(self.hs.get_storage_controllers())) for e in events],
+                [
+                    (e, EventContext(self.hs.get_storage_controllers(), {}))
+                    for e in events
+                ],
            )

            # Actually call the function that calculates the auth chain stuff.
@@ -555,10 +555,15 @@ class StateTestCase(unittest.TestCase):
            (e.event_id for e in old_state + [event]), current_state_ids.values()
        )

-        self.assertIsNotNone(context.state_group_before_event)
+        assert context.state_group_before_event is not None
+        assert context.state_group is not None
+        self.assertEqual(
+            context.state_group_deltas.get(
+                (context.state_group_before_event, context.state_group)
+            ),
+            {(event.type, event.state_key): event.event_id},
+        )
        self.assertNotEqual(context.state_group_before_event, context.state_group)
-        self.assertEqual(context.state_group_before_event, context.prev_group)
-        self.assertEqual({("state", ""): event.event_id}, context.delta_ids)

    @defer.inlineCallbacks
    def test_trivial_annotate_message(
Author	SHA1	Message	Date
Mathieu Velten	cec52ea9b0	Add changelog	2023-06-14 00:50:53 +02:00
Mathieu Velten	0cb8502bbb	Use parse_duration for newly introduced options	2023-06-13 23:55:15 +02:00
Shay	553f2f53e7	Replace `EventContext` fields `prev_group` and `delta_ids` with field `state_group_deltas` (#15233 )	2023-06-13 13:22:06 -07:00
Mathieu Velten	59ec4a0dc1	Fix MSC3983 support: only one OTK per device was returned through federation (#15770 )	2023-06-13 19:51:47 +02:00
Eric Eastwood	0757d59ec4	Avoid backfill when we already have messages to return (#15737 ) We now only block the client to backfill when we see a large gap in the events (more than 2 events missing in a row according to `depth`), more than 3 single-event holes, or not enough messages to fill the response. Otherwise, we return the messages directly to the client and backfill in the background for eventual consistency sake. Fix https://github.com/matrix-org/synapse/issues/15696	2023-06-13 12:31:08 -05:00
Patrick Cloke	df945e0d7c	Fix MSC3983 support: Use the unstable /keys/claim federation endpoint if multiple keys are requested (#15755 )	2023-06-13 18:07:55 +02:00
				`@@ -0,0 +1 @@`
				`Allow for the configuration of max request retries and min/max retry delays in the matrix federation client.`
				`@@ -0,0 +1 @@`
				`Log when events are (maybe unexpectedly) filtered out of responses in tests.`
				`@@ -0,0 +1 @@`
				Replace `EventContext` fields `prev_group` and `delta_ids` with field `state_group_deltas`.
				`@@ -0,0 +1 @@`
				`Stable support for [MSC3882](https://github.com/matrix-org/matrix-spec-proposals/pull/3882) to allow an existing device/session to generate a login token for use on a new device/session.`
				`@@ -0,0 +1 @@`
				`Support resolving a room's [canonical alias](https://spec.matrix.org/v1.7/client-server-api/#mroomcanonical_alias) via the module API.`
				`@@ -0,0 +1 @@`
				`Enable support for [MSC3952](https://github.com/matrix-org/matrix-spec-proposals/pull/3952): intentional mentions.`
				`@@ -0,0 +1 @@`
				`Experimental [MSC3861](https://github.com/matrix-org/matrix-spec-proposals/pull/3861) support: delegate auth to an OIDC provider.`
				`@@ -0,0 +1 @@`
				`Correctly clear caches when we delete a room.`
				`@@ -0,0 +1 @@`
				Read from column `full_user_id` rather than `user_id` of tables `profiles` and `user_filters`.