Pass instance name through to rdata
fixup! Thread through instance name to replication client
2020-03-25 14:55:02 +00:00 · 2020-03-25 14:05:53 +00:00 · 2020-03-25 11:41:38 +00:00 · 2020-03-25 11:34:56 +00:00 · 2020-03-25 11:34:43 +00:00 · 2020-03-25 11:34:10 +00:00
79 changed files with 2138 additions and 1725 deletions
--- a/changelog.d/6988.doc
+++ b/changelog.d/6988.doc
@@ -0,0 +1 @@
+Improve the documentation for database configuration.
--- a/changelog.d/7009.feature
+++ b/changelog.d/7009.feature
@@ -0,0 +1 @@
+Set `Referrer-Policy` header to `no-referrer` on media downloads.
--- a/changelog.d/7010.misc
+++ b/changelog.d/7010.misc
@@ -0,0 +1 @@
+Change device list streams to have one row per ID.
--- a/changelog.d/7011.misc
+++ b/changelog.d/7011.misc
@@ -0,0 +1 @@
+Remove concept of a non-limited stream.
--- a/changelog.d/7024.misc
+++ b/changelog.d/7024.misc
@@ -0,0 +1 @@
+Move catchup of replication streams logic to worker.
--- a/changelog.d/7089.bugfix
+++ b/changelog.d/7089.bugfix
@@ -0,0 +1 @@
+Fix a bug in the federation API which could cause occasional "Failed to get PDU" errors.
--- a/changelog.d/7110.misc
+++ b/changelog.d/7110.misc
@@ -0,0 +1 @@
+Convert some of synapse.rest.media to async/await.
--- a/changelog.d/7115.misc
+++ b/changelog.d/7115.misc
@@ -0,0 +1 @@
+De-duplicate / remove unused REST code for login and auth.
--- a/changelog.d/7116.misc
+++ b/changelog.d/7116.misc
@@ -0,0 +1 @@
+Convert `*StreamRow` classes to inner classes.
--- a/changelog.d/7117.bugfix
+++ b/changelog.d/7117.bugfix
@@ -0,0 +1 @@
+Fix a bug which meant that groups updates were not correctly replicated between workers.
--- a/docs/postgres.md
+++ b/docs/postgres.md
@@ -72,8 +72,7 @@ underneath the database, or if a different version of the locale is used on any
 replicas.

 The safest way to fix the issue is to take a dump and recreate the database with
-the correct `COLLATE` and `CTYPE` parameters (as per
-[docs/postgres.md](docs/postgres.md)). It is also possible to change the
+the correct `COLLATE` and `CTYPE` parameters (as shown above). It is also possible to change the
 parameters on a live database and run a `REINDEX` on the entire database,
 however extreme care must be taken to avoid database corruption.

@@ -105,19 +104,41 @@ of free memory the database host has available.
 When you are ready to start using PostgreSQL, edit the `database`
 section in your config file to match the following lines:

-    database:
-        name: psycopg2
-        args:
-            user: <user>
-            password: <pass>
-            database: <db>
-            host: <host>
-            cp_min: 5
-            cp_max: 10
+```yaml
+database:
+  name: psycopg2
+  args:
+    user: <user>
+    password: <pass>
+    database: <db>
+    host: <host>
+    cp_min: 5
+    cp_max: 10
+```

 All key, values in `args` are passed to the `psycopg2.connect(..)`
 function, except keys beginning with `cp_`, which are consumed by the
-twisted adbapi connection pool.
+twisted adbapi connection pool. See the [libpq
+documentation](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS)
+for a list of options which can be passed.
+
+You should consider tuning the `args.keepalives_*` options if there is any danger of
+the connection between your homeserver and database dropping, otherwise Synapse
+may block for an extended period while it waits for a response from the
+database server. Example values might be:
+
+```yaml
+# seconds of inactivity after which TCP should send a keepalive message to the server
+keepalives_idle: 10
+
+# the number of seconds after which a TCP keepalive message that is not
+# acknowledged by the server should be retransmitted
+keepalives_interval: 10
+
+# the number of TCP keepalives that can be lost before the client's connection
+# to the server is considered dead
+keepalives_count: 3
+```

 ## Porting from SQLite

--- a/docs/sample_config.yaml
+++ b/docs/sample_config.yaml
@@ -578,13 +578,46 @@ acme:

 ## Database ##

+# The 'database' setting defines the database that synapse uses to store all of
+# its data.
+#
+# 'name' gives the database engine to use: either 'sqlite3' (for SQLite) or
+# 'psycopg2' (for PostgreSQL).
+#
+# 'args' gives options which are passed through to the database engine,
+# except for options starting 'cp_', which are used to configure the Twisted
+# connection pool. For a reference to valid arguments, see:
+#   * for sqlite: https://docs.python.org/3/library/sqlite3.html#sqlite3.connect
+#   * for postgres: https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS
+#   * for the connection pool: https://twistedmatrix.com/documents/current/api/twisted.enterprise.adbapi.ConnectionPool.html#__init__
+#
+#
+# Example SQLite configuration:
+#
+#database:
+#  name: sqlite3
+#  args:
+#    database: /path/to/homeserver.db
+#
+#
+# Example Postgres configuration:
+#
+#database:
+#  name: psycopg2
+#  args:
+#    user: synapse
+#    password: secretpassword
+#    database: synapse
+#    host: localhost
+#    cp_min: 5
+#    cp_max: 10
+#
+# For more information on using Synapse with Postgres, see `docs/postgres.md`.
+#
 database:
-  # The database engine name
-  name: "sqlite3"
-  # Arguments to pass to the engine
+  name: sqlite3
  args:
-    # Path to the database
-    database: "DATADIR/homeserver.db"
+    database: DATADIR/homeserver.db

 # Number of events to cache in memory.
 #
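Illustrative note, not part of the patch: the comment block above says that options in `args` are passed to the database engine, except those starting with `cp_`, which configure the Twisted connection pool. A minimal sketch of that split follows. The helper name `split_database_args` and the example values are hypothetical, and this is not Synapse's or Twisted's actual code.

```python
from typing import Any, Dict, Tuple


def split_database_args(args: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate connection-pool options (prefixed 'cp_') from engine options.

    Hypothetical helper: it only illustrates the documented naming rule.
    """
    pool_args = {k: v for k, v in args.items() if k.startswith("cp_")}
    engine_args = {k: v for k, v in args.items() if not k.startswith("cp_")}
    return engine_args, pool_args


# Example values modelled on the sample Postgres configuration.
example_args = {
    "user": "synapse",
    "database": "synapse",
    "host": "localhost",
    "keepalives_idle": 10,
    "cp_min": 5,
    "cp_max": 10,
}

engine_args, pool_args = split_database_args(example_args)
```

Here `pool_args` holds only `cp_min` and `cp_max`, while libpq options such as `keepalives_idle` stay in `engine_args`.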
--- a/docs/tcp_replication.md
+++ b/docs/tcp_replication.md
@@ -14,16 +14,16 @@ example flow would be (where '>' indicates master to worker and
 '<' worker to master flows):

    > SERVER example.com
-    < REPLICATE events 53
+    < REPLICATE
+    > POSITION events 53
    > RDATA events 54 ["$foo1:bar.com", ...]
    > RDATA events 55 ["$foo4:bar.com", ...]

-The example shows the server accepting a new connection and sending its
-identity with the `SERVER` command, followed by the client asking to
-subscribe to the `events` stream from the token `53`. The server then
-periodically sends `RDATA` commands which have the format
-`RDATA <stream_name> <token> <row>`, where the format of `<row>` is
-defined by the individual streams.
+The example shows the server accepting a new connection and sending its identity
+with the `SERVER` command, followed by the client asking the server to respond with the
+position of all streams. The server then periodically sends `RDATA` commands
+which have the format `RDATA <stream_name> <token> <row>`, where the format of
+`<row>` is defined by the individual streams.

 Error reporting happens by either the client or server sending an ERROR
 command, and usually the connection will be closed.
@@ -32,9 +32,6 @@ Since the protocol is a simple line based, its possible to manually
 connect to the server using a tool like netcat. A few things should be
 noted when manually using the protocol:

--   When subscribing to a stream using `REPLICATE`, the special token
-    `NOW` can be used to get all future updates. The special stream name
-    `ALL` can be used with `NOW` to subscribe to all available streams.
 -   The federation stream is only available if federation sending has
    been disabled on the main process.
 -   The server will only time connections out that have sent a `PING`
@@ -91,9 +88,7 @@ The client:
 -   Sends a `NAME` command, allowing the server to associate a human
    friendly name with the connection. This is optional.
 -   Sends a `PING` as above
--   For each stream the client wishes to subscribe to it sends a
-    `REPLICATE` with the `stream_name` and token it wants to subscribe
-    from.
+-   Sends a `REPLICATE` to get the current position of all streams.
 -   On receipt of a `SERVER` command, checks that the server name
    matches the expected server name.

@@ -140,9 +135,7 @@ the wire:
    > PING 1490197665618
    < NAME synapse.app.appservice
    < PING 1490197665618
-    < REPLICATE events 1
-    < REPLICATE backfill 1
-    < REPLICATE caches 1
+    < REPLICATE
    > POSITION events 1
    > POSITION backfill 1
    > POSITION caches 1
@@ -181,9 +174,9 @@ client (C):

 #### POSITION (S)

-   The position of the stream has been updated. Sent to the client
-    after all missing updates for a stream have been sent to the client
-    and they're now up to date.
+   On receipt of a POSITION command clients should check if they have missed any
+   updates, and if so then fetch them out of band. Sent in response to a
+   REPLICATE command (but can happen at any time).

 #### ERROR (S, C)

@@ -199,25 +192,16 @@ client (C):

 #### REPLICATE (C)

-Asks the server to replicate a given stream. The syntax is:
-
-```
-    REPLICATE <stream_name> <token>
-```
-
-Where `<token>` may be either:
- * a numeric stream_id to stream updates since (exclusive)
- * `NOW` to stream all subsequent updates.
-
-The `<stream_name>` is the name of a replication stream to subscribe
-to (see [here](../synapse/replication/tcp/streams/_base.py) for a list
-of streams). It can also be `ALL` to subscribe to all known streams,
-in which case the `<token>` must be set to `NOW`.
+Asks the server for the current position of all streams.

 #### USER_SYNC (C)

   A user has started or stopped syncing

+#### CLEAR_USER_SYNC (C)
+
+   The server should clear all associated user sync data from the worker.
+
 #### FEDERATION_ACK (C)

   Acknowledge receipt of some federation data
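Illustrative note, not part of the patch: the documentation above describes a line-based protocol in which `RDATA` has the form `RDATA <stream_name> <token> <row>`, `POSITION` carries a stream name and token, and `REPLICATE` now takes no arguments. The sketch below parses such lines under those assumptions. `ReplicationLine` and `parse_line` are hypothetical names, and this is not Synapse's own command parser.

```python
from typing import NamedTuple, Optional


class ReplicationLine(NamedTuple):
    command: str
    stream_name: Optional[str]
    token: Optional[str]
    row: Optional[str]


def parse_line(line: str) -> ReplicationLine:
    """Split one protocol line into its command and arguments."""
    command, _, rest = line.strip().partition(" ")
    if command == "RDATA":
        # The row is stream-defined and may contain spaces, so split at most twice.
        stream_name, token, row = rest.split(" ", 2)
        return ReplicationLine(command, stream_name, token, row)
    if command == "POSITION":
        stream_name, token = rest.split(" ", 1)
        return ReplicationLine(command, stream_name, token, None)
    # Other commands (SERVER, REPLICATE, PING, ...) keep their raw argument, if any.
    return ReplicationLine(command, None, None, rest or None)


rdata = parse_line('RDATA events 54 ["$foo1:bar.com", "x"]')
position = parse_line("POSITION events 53")
replicate = parse_line("REPLICATE")
```

A client could compare the token from a `POSITION` line with the last token it processed for that stream to decide whether it needs to fetch missed updates out of band. That comparison is an assumption about how the check might be done, not something the patch specifies.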
--- a/mypy.ini
+++ b/mypy.ini
@@ -75,3 +75,6 @@ ignore_missing_imports = True

 [mypy-jwt.*]
 ignore_missing_imports = True
+
+[mypy-txredisapi]
+ignore_missing_imports = True
--- a/synapse/app/generic_worker.py
+++ b/synapse/app/generic_worker.py
@@ -45,6 +45,7 @@ from synapse.http.site import SynapseSite
 from synapse.logging.context import LoggingContext, run_in_background
 from synapse.metrics import METRICS_PREFIX, MetricsResource, RegistryProxy
 from synapse.metrics.background_process_metrics import run_as_background_process
+from synapse.replication.http import REPLICATION_PREFIX, ReplicationRestResource
 from synapse.replication.slave.storage._base import BaseSlavedStore, __func__
 from synapse.replication.slave.storage.account_data import SlavedAccountDataStore
 from synapse.replication.slave.storage.appservice import SlavedApplicationServiceStore
@@ -64,13 +65,25 @@ from synapse.replication.slave.storage.receipts import SlavedReceiptsStore
 from synapse.replication.slave.storage.registration import SlavedRegistrationStore
 from synapse.replication.slave.storage.room import RoomStore
 from synapse.replication.slave.storage.transactions import SlavedTransactionStore
-from synapse.replication.tcp.client import ReplicationClientHandler
-from synapse.replication.tcp.streams._base import (
+from synapse.replication.tcp.client import ReplicationClientFactory
+from synapse.replication.tcp.commands import ClearUserSyncsCommand
+from synapse.replication.tcp.handler import ReplicationDataHandler
+from synapse.replication.tcp.streams import (
+    AccountDataStream,
    DeviceListsStream,
+    GroupServerStream,
+    PresenceStream,
+    PushersStream,
+    PushRulesStream,
    ReceiptsStream,
+    TagAccountDataStream,
    ToDeviceStream,
 )
-from synapse.replication.tcp.streams.events import EventsStreamEventRow, EventsStreamRow
+from synapse.replication.tcp.streams.events import (
+    EventsStream,
+    EventsStreamEventRow,
+    EventsStreamRow,
+)
 from synapse.rest.admin import register_servlets_for_media_repo
 from synapse.rest.client.v1 import events
 from synapse.rest.client.v1.initial_sync import InitialSyncRestServlet
@@ -93,6 +106,7 @@ from synapse.rest.client.v1.room import (
    RoomSendEventRestServlet,
    RoomStateEventRestServlet,
    RoomStateRestServlet,
+    RoomTypingRestServlet,
 )
 from synapse.rest.client.v1.voip import VoipRestServlet
 from synapse.rest.client.v2_alpha import groups, sync, user_directory
@@ -113,7 +127,6 @@ from synapse.types import ReadReceipt
 from synapse.util.async_helpers import Linearizer
 from synapse.util.httpresourcetree import create_resource_tree
 from synapse.util.manhole import manhole
-from synapse.util.stringutils import random_string
 from synapse.util.versionstring import get_version_string

 logger = logging.getLogger("synapse.app.generic_worker")
@@ -222,6 +235,7 @@ class GenericWorkerPresence(object):
        self.user_to_num_current_syncs = {}
        self.clock = hs.get_clock()
        self.notifier = hs.get_notifier()
+        self.instance_id = hs.get_instance_id()

        active_presence = self.store.take_presence_startup_info()
        self.user_to_current_state = {state.user_id: state for state in active_presence}
@@ -234,13 +248,24 @@ class GenericWorkerPresence(object):
            self.send_stop_syncing, UPDATE_SYNCING_USERS_MS
        )

-        self.process_id = random_string(16)
-        logger.info("Presence process_id is %r", self.process_id)
+        hs.get_reactor().addSystemEventTrigger(
+            "before",
+            "shutdown",
+            run_as_background_process,
+            "generic_presence.on_shutdown",
+            self._on_shutdown,
+        )
+
+    def _on_shutdown(self):
+        if self.hs.config.use_presence:
+            self.hs.get_tcp_replication().send_command(
+                ClearUserSyncsCommand(self.instance_id)
+            )

    def send_user_sync(self, user_id, is_syncing, last_sync_ms):
        if self.hs.config.use_presence:
            self.hs.get_tcp_replication().send_user_sync(
-                user_id, is_syncing, last_sync_ms
+                self.instance_id, user_id, is_syncing, last_sync_ms
            )

    def mark_as_coming_online(self, user_id):
@@ -357,40 +382,6 @@ class GenericWorkerPresence(object):
            return set()


-class GenericWorkerTyping(object):
-    def __init__(self, hs):
-        self._latest_room_serial = 0
-        self._reset()
-
-    def _reset(self):
-        """
-        Reset the typing handler's data caches.
-        """
-        # map room IDs to serial numbers
-        self._room_serials = {}
-        # map room IDs to sets of users currently typing
-        self._room_typing = {}
-
-    def stream_positions(self):
-        # We must update this typing token from the response of the previous
-        # sync. In particular, the stream id may "reset" back to zero/a low
-        # value which we *must* use for the next replication request.
-        return {"typing": self._latest_room_serial}
-
-    def process_replication_rows(self, token, rows):
-        if self._latest_room_serial > token:
-            # The master has gone backwards. To prevent inconsistent data, just
-            # clear everything.
-            self._reset()
-
-        # Set the latest serial token to whatever the server gave us.
-        self._latest_room_serial = token
-
-        for row in rows:
-            self._room_serials[row.room_id] = token
-            self._room_typing[row.room_id] = row.user_ids
-
-
 class GenericWorkerSlavedStore(
    # FIXME(#3714): We need to add UserDirectoryStore as we write directly
    # rather than going via the correct worker.
@@ -475,6 +466,7 @@ class GenericWorkerServer(HomeServer):
                    ProfileDisplaynameRestServlet(self).register(resource)
                    ProfileRestServlet(self).register(resource)
                    KeyUploadServlet(self).register(resource)
+                    RoomTypingRestServlet(self).register(resource)

                    sync.register_servlets(self, resource)
                    events.register_servlets(self, resource)
@@ -530,6 +522,9 @@ class GenericWorkerServer(HomeServer):
                if name in ["keys", "federation"]:
                    resources[SERVER_KEY_V2_PREFIX] = KeyApiV2Resource(self)

+                if name == "replication":
+                    resources[REPLICATION_PREFIX] = ReplicationRestResource(self)
+
        root_resource = create_resource_tree(resources, NoResource())

        _base.listen_tcp(
@@ -572,27 +567,35 @@ class GenericWorkerServer(HomeServer):
            else:
                logger.warning("Unrecognized listener type: %s", listener["type"])

-        self.get_tcp_replication().start_replication(self)
+        if self.config.redis.redis_enabled:
+            from synapse.replication.tcp.redis import RedisFactory
+
+            logger.info("Connecting to redis.")
+            factory = RedisFactory(self)
+            self.get_reactor().connectTCP(
+                self.config.redis.redis_host, self.config.redis.redis_port, factory
+            )
+        else:
+            factory = ReplicationClientFactory(self, self.config.worker_name)
+            host = self.config.worker_replication_host
+            port = self.config.worker_replication_port
+            self.get_reactor().connectTCP(host, port, factory)

    def remove_pusher(self, app_id, push_key, user_id):
        self.get_tcp_replication().send_remove_pusher(app_id, push_key, user_id)

-    def build_tcp_replication(self):
-        return GenericWorkerReplicationHandler(self)
-
    def build_presence_handler(self):
        return GenericWorkerPresence(self)

-    def build_typing_handler(self):
-        return GenericWorkerTyping(self)
+    def build_replication_data_handler(self):
+        return GenericWorkerReplicationHandler(self)


-class GenericWorkerReplicationHandler(ReplicationClientHandler):
+class GenericWorkerReplicationHandler(ReplicationDataHandler):
    def __init__(self, hs):
-        super(GenericWorkerReplicationHandler, self).__init__(hs.get_datastore())
+        super().__init__(hs)

        self.store = hs.get_datastore()
-        self.typing_handler = hs.get_typing_handler()
        # NB this is a SynchrotronPresence, not a normal PresenceHandler
        self.presence_handler = hs.get_presence_handler()
        self.notifier = hs.get_notifier()
@@ -601,32 +604,31 @@ class GenericWorkerReplicationHandler(ReplicationClientHandler):
        self.pusher_pool = hs.get_pusherpool()

        if hs.config.send_federation:
-            self.send_handler = FederationSenderHandler(hs, self)
+            self.send_handler = FederationSenderHandler(hs)
        else:
            self.send_handler = None

-    async def on_rdata(self, stream_name, token, rows):
-        await super(GenericWorkerReplicationHandler, self).on_rdata(
-            stream_name, token, rows
+    async def on_rdata(self, stream_name, instance_name, token, rows):
+        await super().on_rdata(stream_name, instance_name, token, rows)
+        run_in_background(
+            self.process_and_notify, stream_name, instance_name, token, rows
        )
-        run_in_background(self.process_and_notify, stream_name, token, rows)

    def get_streams_to_replicate(self):
-        args = super(GenericWorkerReplicationHandler, self).get_streams_to_replicate()
-        args.update(self.typing_handler.stream_positions())
+        args = super().get_streams_to_replicate()
+
        if self.send_handler:
            args.update(self.send_handler.stream_positions())
        return args

-    def get_currently_syncing_users(self):
-        return self.presence_handler.get_currently_syncing_users()
-
-    async def process_and_notify(self, stream_name, token, rows):
+    async def process_and_notify(self, stream_name, instance_name, token, rows):
        try:
            if self.send_handler:
-                self.send_handler.process_replication_rows(stream_name, token, rows)
+                self.send_handler.process_replication_rows(
+                    stream_name, instance_name, token, rows
+                )

-            if stream_name == "events":
+            if stream_name == EventsStream.NAME:
                # We shouldn't get multiple rows per token for events stream, so
                # we don't need to optimise this for multiple rows.
                for row in rows:
@@ -649,43 +651,39 @@ class GenericWorkerReplicationHandler(ReplicationClientHandler):
                    )

                await self.pusher_pool.on_new_notifications(token, token)
-            elif stream_name == "push_rules":
+            elif stream_name == PushRulesStream.NAME:
                self.notifier.on_new_event(
                    "push_rules_key", token, users=[row.user_id for row in rows]
                )
-            elif stream_name in ("account_data", "tag_account_data"):
+            elif stream_name in (AccountDataStream.NAME, TagAccountDataStream.NAME):
                self.notifier.on_new_event(
                    "account_data_key", token, users=[row.user_id for row in rows]
                )
-            elif stream_name == "receipts":
+            elif stream_name == ReceiptsStream.NAME:
                self.notifier.on_new_event(
                    "receipt_key", token, rooms=[row.room_id for row in rows]
                )
                await self.pusher_pool.on_new_receipts(
                    token, token, {row.room_id for row in rows}
                )
-            elif stream_name == "typing":
-                self.typing_handler.process_replication_rows(token, rows)
-                self.notifier.on_new_event(
-                    "typing_key", token, rooms=[row.room_id for row in rows]
-                )
-            elif stream_name == "to_device":
+            elif stream_name == ToDeviceStream.NAME:
                entities = [row.entity for row in rows if row.entity.startswith("@")]
                if entities:
                    self.notifier.on_new_event("to_device_key", token, users=entities)
-            elif stream_name == "device_lists":
+            elif stream_name == DeviceListsStream.NAME:
                all_room_ids = set()
                for row in rows:
-                    room_ids = await self.store.get_rooms_for_user(row.user_id)
-                    all_room_ids.update(room_ids)
+                    if row.entity.startswith("@"):
+                        room_ids = await self.store.get_rooms_for_user(row.entity)
+                        all_room_ids.update(room_ids)
                self.notifier.on_new_event("device_list_key", token, rooms=all_room_ids)
-            elif stream_name == "presence":
+            elif stream_name == PresenceStream.NAME:
                await self.presence_handler.process_replication_rows(token, rows)
-            elif stream_name == "receipts":
+            elif stream_name == GroupServerStream.NAME:
                self.notifier.on_new_event(
                    "groups_key", token, users=[row.user_id for row in rows]
                )
-            elif stream_name == "pushers":
+            elif stream_name == PushersStream.NAME:
                for row in rows:
                    if row.deleted:
                        self.stop_pusher(row.user_id, row.app_id, row.pushkey)
@@ -728,13 +726,14 @@ class FederationSenderHandler(object):
    to the federation sender.
    """

-    def __init__(self, hs: GenericWorkerServer, replication_client):
+    def __init__(self, hs: GenericWorkerServer):
+        self.hs = hs
        self.store = hs.get_datastore()
        self._is_mine_id = hs.is_mine_id
        self.federation_sender = hs.get_federation_sender()
-        self.replication_client = replication_client
+        # self.replication_client = hs.get_tcp_replication()

-        self.federation_position = self.store.federation_out_pos_startup
+        self.federation_position = {"master": self.store.federation_out_pos_startup}
        self._fed_position_linearizer = Linearizer(name="_fed_position_linearizer")

        self._last_ack = self.federation_position
@@ -755,12 +754,12 @@ class FederationSenderHandler(object):
    def stream_positions(self):
        return {"federation": self.federation_position}

-    def process_replication_rows(self, stream_name, token, rows):
+    def process_replication_rows(self, stream_name, instance_name, token, rows):
        # The federation stream contains things that we want to send out, e.g.
        # presence, typing, etc.
        if stream_name == "federation":
            send_queue.process_rows_for_federation(self.federation_sender, rows)
-            run_in_background(self.update_token, token)
+            run_in_background(self.update_token, instance_name, token)

        # We also need to poke the federation sender when new events happen
        elif stream_name == "events":
@@ -774,7 +773,10 @@ class FederationSenderHandler(object):

        # ... as well as device updates and messages
        elif stream_name == DeviceListsStream.NAME:
-            hosts = {row.destination for row in rows}
+            # The entities are either user IDs (starting with '@') whose devices
+            # have changed, or remote servers that we need to tell about
+            # changes.
+            hosts = {row.entity for row in rows if not row.entity.startswith("@")}
            for host in hosts:
                self.federation_sender.send_device_messages(host)

@@ -789,7 +791,7 @@ class FederationSenderHandler(object):
    async def _on_new_receipts(self, rows):
        """
        Args:
-            rows (iterable[synapse.replication.tcp.streams.ReceiptsStreamRow]):
+            rows (Iterable[synapse.replication.tcp.streams.ReceiptsStream.ReceiptsStreamRow]):
                new receipts to be processed
        """
        for receipt in rows:
@@ -805,9 +807,12 @@ class FederationSenderHandler(object):
            )
            await self.federation_sender.send_read_receipt(receipt_info)

-    async def update_token(self, token):
+    async def update_token(self, instance_name, token):
        try:
-            self.federation_position = token
+            self.federation_position[instance_name] = token
+            return
+
+            # FIXME

            # We linearize here to ensure we don't have races updating the token
            with (await self._fed_position_linearizer.queue(None)):
@@ -818,7 +823,7 @@ class FederationSenderHandler(object):

                    # We ACK this token over replication so that the master can drop
                    # its in memory queues
-                    self.replication_client.send_federation_ack(
+                    self.hs.get_tcp_replication().send_federation_ack(
                        self.federation_position
                    )
                    self._last_ack = self.federation_position
@@ -900,6 +905,10 @@ def start(config_options):
        # Force the pushers to start since they will be disabled in the main config
        config.send_federation = True

+    config.server.handle_typing = False
+    if config.worker_app == "synapse.app.client_reader":
+        config.server.handle_typing = True
+
    synapse.events.USE_FROZEN_DICTS = config.use_frozen_dicts

    ss = GenericWorkerServer(
@@ -915,6 +924,8 @@ def start(config_options):
        "before", "startup", _base.start, ss, config.worker_listeners
    )

+    ss.get_replication_streamer()
+
    _base.start_worker_reactor("synapse-generic-worker", config)


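Illustrative note, not part of the patch: with device list streams changed to one row per entity, the handlers above treat an entity starting with `@` as a user ID whose devices changed and anything else as a remote server to notify. A minimal standalone sketch of that classification follows. The function name `split_device_list_entities` and the sample entities are hypothetical.

```python
from typing import Iterable, Set, Tuple


def split_device_list_entities(entities: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (user_ids, hosts) from device list stream entities.

    Entities beginning with '@' are user IDs; all others are treated as
    remote server names, mirroring the checks in the patch.
    """
    user_ids: Set[str] = set()
    hosts: Set[str] = set()
    for entity in entities:
        if entity.startswith("@"):
            user_ids.add(entity)
        else:
            hosts.add(entity)
    return user_ids, hosts


users, hosts = split_device_list_entities(
    ["@alice:example.com", "remote.example.org", "@bob:example.com"]
)
```

In the patch, the worker's notifier path uses only the user IDs, while `FederationSenderHandler` uses only the hosts.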
--- a/synapse/app/homeserver.py
+++ b/synapse/app/homeserver.py
@@ -263,6 +263,15 @@ class SynapseHomeServer(HomeServer):
    def start_listening(self, listeners):
        config = self.get_config()

+        if config.redis_enabled:
+            from synapse.replication.tcp.redis import RedisFactory
+
+            logger.info("Connecting to redis.")
+            factory = RedisFactory(self)
+            self.get_reactor().connectTCP(
+                self.config.redis.redis_host, self.config.redis.redis_port, factory
+            )
+
        for listener in listeners:
            if listener["type"] == "http":
                self._listening_services.extend(self._listener_http(config, listener))
@@ -282,6 +291,7 @@ class SynapseHomeServer(HomeServer):
                )
                for s in services:
                    reactor.addSystemEventTrigger("before", "shutdown", s.stopListening)
+
            elif listener["type"] == "metrics":
                if not self.get_config().enable_metrics:
                    logger.warning(
--- a/synapse/config/_base.py
+++ b/synapse/config/_base.py
@@ -294,7 +294,6 @@ class RootConfig(object):
        report_stats=None,
        open_private_ports=False,
        listeners=None,
-        database_conf=None,
        tls_certificate_path=None,
        tls_private_key_path=None,
        acme_domain=None,
@@ -367,7 +366,6 @@ class RootConfig(object):
                report_stats=report_stats,
                open_private_ports=open_private_ports,
                listeners=listeners,
-                database_conf=database_conf,
                tls_certificate_path=tls_certificate_path,
                tls_private_key_path=tls_private_key_path,
                acme_domain=acme_domain,
--- a/synapse/config/database.py
+++ b/synapse/config/database.py
@@ -1,5 +1,6 @@
 # -*- coding: utf-8 -*-
 # Copyright 2014-2016 OpenMarket Ltd
+# Copyright 2020 The Matrix.org Foundation C.I.C.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -14,14 +15,60 @@
 # limitations under the License.
 import logging
 import os
-from textwrap import indent
-
-import yaml

 from synapse.config._base import Config, ConfigError

 logger = logging.getLogger(__name__)

+DEFAULT_CONFIG = """\
+## Database ##
+
+# The 'database' setting defines the database that synapse uses to store all of
+# its data.
+#
+# 'name' gives the database engine to use: either 'sqlite3' (for SQLite) or
+# 'psycopg2' (for PostgreSQL).
+#
+# 'args' gives options which are passed through to the database engine,
+# except for options starting 'cp_', which are used to configure the Twisted
+# connection pool. For a reference to valid arguments, see:
+#   * for sqlite: https://docs.python.org/3/library/sqlite3.html#sqlite3.connect
+#   * for postgres: https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS
+#   * for the connection pool: https://twistedmatrix.com/documents/current/api/twisted.enterprise.adbapi.ConnectionPool.html#__init__
+#
+#
+# Example SQLite configuration:
+#
+#database:
+#  name: sqlite3
+#  args:
+#    database: /path/to/homeserver.db
+#
+#
+# Example Postgres configuration:
+#
+#database:
+#  name: psycopg2
+#  args:
+#    user: synapse
+#    password: secretpassword
+#    database: synapse
+#    host: localhost
+#    cp_min: 5
+#    cp_max: 10
+#
+# For more information on using Synapse with Postgres, see `docs/postgres.md`.
+#
+database:
+  name: sqlite3
+  args:
+    database: %(database_path)s
+
+# Number of events to cache in memory.
+#
+#event_cache_size: 10K
+"""
+

 class DatabaseConnectionConfig:
    """Contains the connection config for a particular database.
@@ -36,10 +83,12 @@ class DatabaseConnectionConfig:
    """

    def __init__(self, name: str, db_config: dict):
-        if db_config["name"] not in ("sqlite3", "psycopg2"):
-            raise ConfigError("Unsupported database type %r" % (db_config["name"],))
+        db_engine = db_config.get("name", "sqlite3")

-        if db_config["name"] == "sqlite3":
+        if db_engine not in ("sqlite3", "psycopg2"):
+            raise ConfigError("Unsupported database type %r" % (db_engine,))
+
+        if db_engine == "sqlite3":
            db_config.setdefault("args", {}).update(
                {"cp_min": 1, "cp_max": 1, "check_same_thread": False}
            )
@@ -97,34 +146,10 @@ class DatabaseConfig(Config):

            self.set_databasepath(config.get("database_path"))

-    def generate_config_section(self, data_dir_path, database_conf, **kwargs):
-        if not database_conf:
-            database_path = os.path.join(data_dir_path, "homeserver.db")
-            database_conf = (
-                """# The database engine name
-          name: "sqlite3"
-          # Arguments to pass to the engine
-          args:
-            # Path to the database
-            database: "%(database_path)s"
-            """
-                % locals()
-            )
-        else:
-            database_conf = indent(yaml.dump(database_conf), " " * 10).lstrip()
-
-        return (
-            """\
-        ## Database ##
-
-        database:
-          %(database_conf)s
-        # Number of events to cache in memory.
-        #
-        #event_cache_size: 10K
-        """
-            % locals()
-        )
+    def generate_config_section(self, data_dir_path, **kwargs):
+        return DEFAULT_CONFIG % {
+            "database_path": os.path.join(data_dir_path, "homeserver.db")
+        }

    def read_arguments(self, args):
        self.set_databasepath(args.database_path)
--- a/synapse/config/homeserver.py
+++ b/synapse/config/homeserver.py
@@ -31,6 +31,7 @@ from .password import PasswordConfig
 from .password_auth_providers import PasswordAuthProviderConfig
 from .push import PushConfig
 from .ratelimiting import RatelimitConfig
+from .redis import RedisConfig
 from .registration import RegistrationConfig
 from .repository import ContentRepositoryConfig
 from .room_directory import RoomDirectoryConfig
@@ -82,4 +83,5 @@ class HomeServerConfig(RootConfig):
        RoomDirectoryConfig,
        ThirdPartyRulesConfig,
        TracerConfig,
+        RedisConfig,
    ]
--- a/synapse/config/redis.py
+++ b/synapse/config/redis.py
@@ -0,0 +1,47 @@
+# -*- coding: utf-8 -*-
+# Copyright 2020 The Matrix.org Foundation C.I.C.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from synapse.config._base import Config, ConfigError
+
+try:
+    import txredisapi
+except ImportError:
+    txredisapi = None
+
+
+MISSING_REDIS = """Missing 'txredisapi' library. This is required for redis support.
+
+    Install by running:
+        pip install txredisapi
+"""
+
+
+class RedisConfig(Config):
+    section = "redis"
+
+    def read_config(self, config, **kwargs):
+        redis_config = config.get("redis", {})
+        self.redis_enabled = redis_config.get("enabled", False)
+
+        if not self.redis_enabled:
+            return
+
+        if txredisapi is None:
+            raise ConfigError(MISSING_REDIS)
+
+        self.redis_host = redis_config.get("host", "localhost")
+        self.redis_port = redis_config.get("port", 6379)
+        self.redis_dbid = redis_config.get("dbid")
+        self.redis_password = redis_config.get("password")
--- a/synapse/config/server.py
+++ b/synapse/config/server.py
@@ -83,6 +83,8 @@ class ServerConfig(Config):
        # "disable" federation
        self.send_federation = config.get("send_federation", True)

+        self.handle_typing = config.get("handle_typing", True)
+
        # Whether to enable user presence.
        self.use_presence = config.get("use_presence", True)

--- a/synapse/config/workers.py
+++ b/synapse/config/workers.py
@@ -48,6 +48,8 @@ class WorkerConfig(Config):

        self.worker_main_http_uri = config.get("worker_main_http_uri", None)

+        self.instance_http_map = config.get("instance_http_map", {})
+
        # This option is really only here to support `--manhole` command line
        # argument.
        manhole = config.get("worker_manhole")
--- a/synapse/federation/federation_base.py
+++ b/synapse/federation/federation_base.py
@@ -25,11 +25,7 @@ from twisted.python.failure import Failure

 from synapse.api.constants import MAX_DEPTH, EventTypes, Membership
 from synapse.api.errors import Codes, SynapseError
-from synapse.api.room_versions import (
-    KNOWN_ROOM_VERSIONS,
-    EventFormatVersions,
-    RoomVersion,
-)
+from synapse.api.room_versions import EventFormatVersions, RoomVersion
 from synapse.crypto.event_signing import check_event_content_hash
 from synapse.crypto.keyring import Keyring
 from synapse.events import EventBase, make_event_from_dict
@@ -55,13 +51,15 @@ class FederationBase(object):
        self.store = hs.get_datastore()
        self._clock = hs.get_clock()

-    def _check_sigs_and_hash(self, room_version: str, pdu: EventBase) -> Deferred:
+    def _check_sigs_and_hash(
+        self, room_version: RoomVersion, pdu: EventBase
+    ) -> Deferred:
        return make_deferred_yieldable(
            self._check_sigs_and_hashes(room_version, [pdu])[0]
        )

    def _check_sigs_and_hashes(
-        self, room_version: str, pdus: List[EventBase]
+        self, room_version: RoomVersion, pdus: List[EventBase]
    ) -> List[Deferred]:
        """Checks that each of the received events is correctly signed by the
        sending server.
@@ -146,7 +144,7 @@ class PduToCheckSig(


 def _check_sigs_on_pdus(
-    keyring: Keyring, room_version: str, pdus: Iterable[EventBase]
+    keyring: Keyring, room_version: RoomVersion, pdus: Iterable[EventBase]
 ) -> List[Deferred]:
    """Check that the given events are correctly signed

@@ -191,10 +189,6 @@ def _check_sigs_on_pdus(
        for p in pdus
    ]

-    v = KNOWN_ROOM_VERSIONS.get(room_version)
-    if not v:
-        raise RuntimeError("Unrecognized room version %s" % (room_version,))
-
    # First we check that the sender event is signed by the sender's domain
    # (except if its a 3pid invite, in which case it may be sent by any server)
    pdus_to_check_sender = [p for p in pdus_to_check if not _is_invite_via_3pid(p.pdu)]
@@ -204,7 +198,7 @@ def _check_sigs_on_pdus(
            (
                p.sender_domain,
                p.redacted_pdu_json,
-                p.pdu.origin_server_ts if v.enforce_key_validity else 0,
+                p.pdu.origin_server_ts if room_version.enforce_key_validity else 0,
                p.pdu.event_id,
            )
            for p in pdus_to_check_sender
@@ -227,7 +221,7 @@ def _check_sigs_on_pdus(
    # event id's domain (normally only the case for joins/leaves), and add additional
    # checks. Only do this if the room version has a concept of event ID domain
    # (ie, the room version uses old-style non-hash event IDs).
-    if v.event_format == EventFormatVersions.V1:
+    if room_version.event_format == EventFormatVersions.V1:
        pdus_to_check_event_id = [
            p
            for p in pdus_to_check
@@ -239,7 +233,7 @@ def _check_sigs_on_pdus(
                (
                    get_domain_from_id(p.pdu.event_id),
                    p.redacted_pdu_json,
-                    p.pdu.origin_server_ts if v.enforce_key_validity else 0,
+                    p.pdu.origin_server_ts if room_version.enforce_key_validity else 0,
                    p.pdu.event_id,
                )
                for p in pdus_to_check_event_id
--- a/synapse/federation/federation_client.py
+++ b/synapse/federation/federation_client.py
@@ -220,8 +220,7 @@ class FederationClient(FederationBase):
        # FIXME: We should handle signature failures more gracefully.
        pdus[:] = await make_deferred_yieldable(
            defer.gatherResults(
-                self._check_sigs_and_hashes(room_version.identifier, pdus),
-                consumeErrors=True,
+                self._check_sigs_and_hashes(room_version, pdus), consumeErrors=True,
            ).addErrback(unwrapFirstError)
        )

@@ -291,9 +290,7 @@ class FederationClient(FederationBase):
                    pdu = pdu_list[0]

                    # Check signatures are correct.
-                    signed_pdu = await self._check_sigs_and_hash(
-                        room_version.identifier, pdu
-                    )
+                    signed_pdu = await self._check_sigs_and_hash(room_version, pdu)

                    break

@@ -350,7 +347,7 @@ class FederationClient(FederationBase):
        self,
        origin: str,
        pdus: List[EventBase],
-        room_version: str,
+        room_version: RoomVersion,
        outlier: bool = False,
        include_none: bool = False,
    ) -> List[EventBase]:
@@ -396,7 +393,7 @@ class FederationClient(FederationBase):
                        self.get_pdu(
                            destinations=[pdu.origin],
                            event_id=pdu.event_id,
-                            room_version=room_version,  # type: ignore
+                            room_version=room_version,
                            outlier=outlier,
                            timeout=10000,
                        )
@@ -434,7 +431,7 @@ class FederationClient(FederationBase):
        ]

        signed_auth = await self._check_sigs_and_hash_and_fetch(
-            destination, auth_chain, outlier=True, room_version=room_version.identifier
+            destination, auth_chain, outlier=True, room_version=room_version
        )

        signed_auth.sort(key=lambda e: e.depth)
@@ -661,7 +658,7 @@ class FederationClient(FederationBase):
                destination,
                list(pdus.values()),
                outlier=True,
-                room_version=room_version.identifier,
+                room_version=room_version,
            )

            valid_pdus_map = {p.event_id: p for p in valid_pdus}
@@ -756,7 +753,7 @@ class FederationClient(FederationBase):
        pdu = event_from_pdu_json(pdu_dict, room_version)

        # Check signatures are correct.
-        pdu = await self._check_sigs_and_hash(room_version.identifier, pdu)
+        pdu = await self._check_sigs_and_hash(room_version, pdu)

        # FIXME: We should handle signature failures more gracefully.

@@ -948,7 +945,7 @@ class FederationClient(FederationBase):
            ]

            signed_events = await self._check_sigs_and_hash_and_fetch(
-                destination, events, outlier=False, room_version=room_version.identifier
+                destination, events, outlier=False, room_version=room_version
            )
        except HttpResponseException as e:
            if not e.code == 400:
--- a/synapse/federation/federation_server.py
+++ b/synapse/federation/federation_server.py
@@ -409,7 +409,7 @@ class FederationServer(FederationBase):
        pdu = event_from_pdu_json(content, room_version)
        origin_host, _ = parse_server_name(origin)
        await self.check_server_matches_acl(origin_host, pdu.room_id)
-        pdu = await self._check_sigs_and_hash(room_version.identifier, pdu)
+        pdu = await self._check_sigs_and_hash(room_version, pdu)
        ret_pdu = await self.handler.on_invite_request(origin, pdu, room_version)
        time_now = self._clock.time_msec()
        return {"event": ret_pdu.get_pdu_json(time_now)}
@@ -425,7 +425,7 @@ class FederationServer(FederationBase):

        logger.debug("on_send_join_request: pdu sigs: %s", pdu.signatures)

-        pdu = await self._check_sigs_and_hash(room_version.identifier, pdu)
+        pdu = await self._check_sigs_and_hash(room_version, pdu)

        res_pdus = await self.handler.on_send_join_request(origin, pdu)
        time_now = self._clock.time_msec()
@@ -455,7 +455,7 @@ class FederationServer(FederationBase):

        logger.debug("on_send_leave_request: pdu sigs: %s", pdu.signatures)

-        pdu = await self._check_sigs_and_hash(room_version.identifier, pdu)
+        pdu = await self._check_sigs_and_hash(room_version, pdu)

        await self.handler.on_send_leave_request(origin, pdu)
        return {}
@@ -611,7 +611,7 @@ class FederationServer(FederationBase):
                logger.info("Accepting join PDU %s from %s", pdu.event_id, origin)

        # We've already checked that we know the room version by this point
-        room_version = await self.store.get_room_version_id(pdu.room_id)
+        room_version = await self.store.get_room_version(pdu.room_id)

        # Check signature.
        try:
@@ -819,7 +819,16 @@ class ReplicationFederationHandlerRegistry(FederationHandlerRegistry):
                edu_type, origin, content
            )

-        return await self._send_edu(edu_type=edu_type, origin=origin, content=content)
+        if edu_type == "m.typing":
+            instance_name = "synapse.app.client_reader"
+        else:
+            instance_name = "master"
+        return await self._send_edu(
+            instance_name=instance_name,
+            edu_type=edu_type,
+            origin=origin,
+            content=content,
+        )

    async def on_query(self, query_type, args):
        """Overrides FederationHandlerRegistry
--- a/synapse/federation/send_queue.py
+++ b/synapse/federation/send_queue.py
@@ -477,7 +477,7 @@ def process_rows_for_federation(transaction_queue, rows):

    Args:
        transaction_queue (FederationSender)
-        rows (list(synapse.replication.tcp.streams.FederationStreamRow))
+        rows (list(synapse.replication.tcp.streams.federation.FederationStream.FederationStreamRow))
    """

    # The federation stream contains a bunch of different types of
--- a/synapse/federation/sender/init.py
+++ b/synapse/federation/sender/init.py
@@ -499,4 +499,13 @@ class FederationSender(object):
        self._get_per_destination_queue(destination).attempt_new_transaction()

    def get_current_token(self) -> int:
+        # Dummy implementation for the case where the federation sender isn't
+        # offloaded to a worker.
        return 0
+
+    async def get_replication_rows(
+        self, from_token, to_token, limit, federation_ack=None
+    ):
+        # Dummy implementation for the case where the federation sender isn't
+        # offloaded to a worker.
+        return []
--- a/synapse/handlers/presence.py
+++ b/synapse/handlers/presence.py
@@ -747,7 +747,7 @@ class PresenceHandler(object):

        return False

-    async def get_all_presence_updates(self, last_id, current_id):
+    async def get_all_presence_updates(self, last_id, current_id, limit):
        """
        Gets a list of presence update rows from between the given stream ids.
        Each row has:
@@ -762,7 +762,7 @@ class PresenceHandler(object):
        """
        # TODO(markjh): replicate the unpersisted changes.
        # This could use the in-memory stores for recent changes.
-        rows = await self.store.get_all_presence_updates(last_id, current_id)
+        rows = await self.store.get_all_presence_updates(last_id, current_id, limit)
        return rows

    def notify_new_event(self):
--- a/synapse/handlers/typing.py
+++ b/synapse/handlers/typing.py
@@ -15,11 +15,13 @@

 import logging
 from collections import namedtuple
+from typing import List

 from twisted.internet import defer

 from synapse.api.errors import AuthError, SynapseError
 from synapse.logging.context import run_in_background
+from synapse.replication.tcp.streams import TypingStream
 from synapse.types import UserID, get_domain_from_id
 from synapse.util.caches.stream_change_cache import StreamChangeCache
 from synapse.util.metrics import Measure
@@ -257,7 +259,13 @@ class TypingHandler(object):
            "typing_key", self._latest_room_serial, rooms=[member.room_id]
        )

-    async def get_all_typing_updates(self, last_id, current_id):
+    async def get_all_typing_updates(
+        self, last_id: int, current_id: int, limit: int
+    ) -> List[dict]:
+        """Get up to `limit` typing updates between the given tokens, earliest
+        updates first.
+        """
+
        if last_id == current_id:
            return []

@@ -275,12 +283,60 @@ class TypingHandler(object):
                typing = self._room_typing[room_id]
                rows.append((serial, room_id, list(typing)))
        rows.sort()
-        return rows
+        return rows[:limit]

    def get_current_token(self):
        return self._latest_room_serial


+class TypingSlaveHandler(object):
+    def __init__(self, hs):
+        self.notifier = hs.get_notifier()
+
+        self._latest_room_serial = 0
+        self._reset()
+
+    def _reset(self):
+        """
+        Reset the typing handler's data caches.
+        """
+        # map room IDs to serial numbers
+        self._room_serials = {}
+        # map room IDs to sets of users currently typing
+        self._room_typing = {}
+
+    def stream_positions(self):
+        # We must update this typing token from the response of the previous
+        # sync. In particular, the stream id may "reset" back to zero/a low
+        # value which we *must* use for the next replication request.
+        return {
+            "typing": {"synapse.app.client_reader": self._latest_room_serial}
+        }  # FIXME
+
+    def process_replication_rows(self, stream_name, token, rows):
+        if stream_name != TypingStream.NAME:
+            return
+
+        if self._latest_room_serial > token:
+            # The master has gone backwards. To prevent inconsistent data, just
+            # clear everything.
+            self._reset()
+
+        # Set the latest serial token to whatever the server gave us.
+        self._latest_room_serial = token
+
+        for row in rows:
+            self._room_serials[row.room_id] = token
+            self._room_typing[row.room_id] = row.user_ids
+
+        self.notifier.on_new_event(
+            "typing_key", token, rooms=[row.room_id for row in rows]
+        )
+
+    def get_current_token(self) -> int:
+        return self._latest_room_serial
+
+
 class TypingNotificationEventSource(object):
    def __init__(self, hs):
        self.hs = hs
--- a/synapse/push/httppusher.py
+++ b/synapse/push/httppusher.py
@@ -375,7 +375,6 @@ class HttpPusher(object):
        if not notification_dict:
            return []
        try:
-            logger.info("SENDING PUSH EVENT to %s: %s", self.url, notification_dict)
            resp = yield self.http_client.post_json_get_json(
                self.url, notification_dict
            )
--- a/synapse/python_dependencies.py
+++ b/synapse/python_dependencies.py
@@ -98,6 +98,7 @@ CONDITIONAL_REQUIREMENTS = {
    "sentry": ["sentry-sdk>=0.7.2"],
    "opentracing": ["jaeger-client>=4.0.0", "opentracing>=2.2.0"],
    "jwt": ["pyjwt>=1.6.4"],
+    "redis": ["txredisapi>=1.4.7"],
 }

 ALL_OPTIONAL_REQUIREMENTS = set()  # type: Set[str]
--- a/synapse/replication/http/init.py
+++ b/synapse/replication/http/init.py
@@ -21,6 +21,7 @@ from synapse.replication.http import (
    membership,
    register,
    send_event,
+    streams,
 )

 REPLICATION_PREFIX = "/_synapse/replication"
@@ -32,9 +33,12 @@ class ReplicationRestResource(JsonResource):
        self.register_servlets(hs)

    def register_servlets(self, hs):
-        send_event.register_servlets(hs, self)
-        membership.register_servlets(hs, self)
+        if hs.config.worker_app is None:
+            send_event.register_servlets(hs, self)
+            membership.register_servlets(hs, self)
+            login.register_servlets(hs, self)
+            register.register_servlets(hs, self)
+            devices.register_servlets(hs, self)
+
+        streams.register_servlets(hs, self)
        federation.register_servlets(hs, self)
-        login.register_servlets(hs, self)
-        register.register_servlets(hs, self)
-        devices.register_servlets(hs, self)
--- a/synapse/replication/http/_base.py
+++ b/synapse/replication/http/_base.py
@@ -128,14 +128,25 @@ class ReplicationEndpoint(object):
        Returns a callable that accepts the same parameters as `_serialize_payload`.
        """
        clock = hs.get_clock()
-        host = hs.config.worker_replication_host
-        port = hs.config.worker_replication_http_port
+        master_host = hs.config.worker_replication_host
+        master_port = hs.config.worker_replication_http_port
+
+        instance_http_map = hs.config.instance_http_map

        client = hs.get_simple_http_client()

        @trace(opname="outgoing_replication_request")
        @defer.inlineCallbacks
-        def send_request(**kwargs):
+        def send_request(instance_name="master", **kwargs):
+            if instance_name == "master":
+                host = master_host
+                port = master_port
+            elif instance_name in instance_http_map:
+                host = instance_http_map[instance_name]["host"]
+                port = instance_http_map[instance_name]["port"]
+            else:
+                raise Exception("Unknown instance %r" % (instance_name,))
+
            data = yield cls._serialize_payload(**kwargs)

            url_args = [
--- a/synapse/replication/http/federation.py
+++ b/synapse/replication/http/federation.py
@@ -277,8 +277,10 @@ class ReplicationStoreRoomOnInviteRestServlet(ReplicationEndpoint):


 def register_servlets(hs, http_server):
-    ReplicationFederationSendEventsRestServlet(hs).register(http_server)
+    if hs.config.worker_app is None:
+        ReplicationFederationSendEventsRestServlet(hs).register(http_server)
+        ReplicationGetQueryRestServlet(hs).register(http_server)
+        ReplicationCleanRoomRestServlet(hs).register(http_server)
+        ReplicationStoreRoomOnInviteRestServlet(hs).register(http_server)
+
    ReplicationFederationSendEduRestServlet(hs).register(http_server)
-    ReplicationGetQueryRestServlet(hs).register(http_server)
-    ReplicationCleanRoomRestServlet(hs).register(http_server)
-    ReplicationStoreRoomOnInviteRestServlet(hs).register(http_server)
--- a/synapse/replication/http/streams.py
+++ b/synapse/replication/http/streams.py
@@ -0,0 +1,80 @@
+# -*- coding: utf-8 -*-
+# Copyright 2020 The Matrix.org Foundation C.I.C.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import logging
+
+from synapse.api.errors import SynapseError
+from synapse.http.servlet import parse_integer
+from synapse.replication.http._base import ReplicationEndpoint
+
+logger = logging.getLogger(__name__)
+
+
+class ReplicationGetStreamUpdates(ReplicationEndpoint):
+    """Fetches stream updates from a server. Used for streams not persisted to
+    the database, e.g. typing notifications.
+
+    The API looks like:
+
+        GET /_synapse/replication/get_repl_stream_updates/events?from_token=0&upto_token=10&limit=100
+
+        200 OK
+
+        {
+            "updates": [ ... ],
+            "upto_token": 10,
+            "limited": false
+        }
+
+    """
+
+    NAME = "get_repl_stream_updates"
+    PATH_ARGS = ("stream_name",)
+    METHOD = "GET"
+
+    def __init__(self, hs):
+        super().__init__(hs)
+
+        # We pull the streams from the replication streamer (if we try to make
+        # them ourselves we end up in an import loop).
+        self.streams = hs.get_replication_streamer().get_streams()
+
+        self.instance_name = hs.config.worker_name or "master"
+
+    @staticmethod
+    def _serialize_payload(stream_name, from_token, upto_token, limit):
+        return {"from_token": from_token, "upto_token": upto_token, "limit": limit}
+
+    async def _handle_request(self, request, stream_name):
+        stream = self.streams.get(stream_name)
+        if stream is None:
+            raise SynapseError(400, "Unknown stream")
+
+        from_token = parse_integer(request, "from_token", required=True)
+        upto_token = parse_integer(request, "upto_token", required=True)
+        limit = parse_integer(request, "limit", required=True)
+
+        updates, upto_token, limited = await stream.get_updates_since(
+            self.instance_name, from_token, upto_token, limit
+        )
+
+        return (
+            200,
+            {"updates": updates, "upto_token": upto_token, "limited": limited},
+        )
+
+
+def register_servlets(hs, http_server):
+    ReplicationGetStreamUpdates(hs).register(http_server)
--- a/synapse/replication/slave/storage/_base.py
+++ b/synapse/replication/slave/storage/_base.py
@@ -18,8 +18,10 @@ from typing import Dict, Optional

 import six

-from synapse.storage._base import SQLBaseStore
-from synapse.storage.data_stores.main.cache import CURRENT_STATE_CACHE_NAME
+from synapse.storage.data_stores.main.cache import (
+    CURRENT_STATE_CACHE_NAME,
+    CacheInvalidationWorkerStore,
+)
 from synapse.storage.database import Database
 from synapse.storage.engines import PostgresEngine

@@ -35,7 +37,7 @@ def __func__(inp):
        return inp.__func__


-class BaseSlavedStore(SQLBaseStore):
+class BaseSlavedStore(CacheInvalidationWorkerStore):
    def __init__(self, database: Database, db_conn, hs):
        super(BaseSlavedStore, self).__init__(database, db_conn, hs)
        if isinstance(self.database_engine, PostgresEngine):
@@ -57,9 +59,15 @@ class BaseSlavedStore(SQLBaseStore):
        """
        pos = {}
        if self._cache_id_gen:
-            pos["caches"] = self._cache_id_gen.get_current_token()
+            pos["caches"] = {"master": self._cache_id_gen.get_current_token()}
        return pos

+    def get_cache_stream_token(self):
+        if self._cache_id_gen:
+            return self._cache_id_gen.get_current_token()
+        else:
+            return 0
+
    def process_replication_rows(self, stream_name, token, rows):
        if stream_name == "caches":
            if self._cache_id_gen:
--- a/synapse/replication/slave/storage/account_data.py
+++ b/synapse/replication/slave/storage/account_data.py
@@ -35,9 +35,7 @@ class SlavedAccountDataStore(TagsWorkerStore, AccountDataWorkerStore, BaseSlaved
    def stream_positions(self):
        result = super(SlavedAccountDataStore, self).stream_positions()
        position = self._account_data_id_gen.get_current_token()
-        result["user_account_data"] = position
-        result["room_account_data"] = position
-        result["tag_account_data"] = position
+        result["account_data"] = {"master": position}
        return result

    def process_replication_rows(self, stream_name, token, rows):
--- a/synapse/replication/slave/storage/deviceinbox.py
+++ b/synapse/replication/slave/storage/deviceinbox.py
@@ -45,7 +45,7 @@ class SlavedDeviceInboxStore(DeviceInboxWorkerStore, BaseSlavedStore):

    def stream_positions(self):
        result = super(SlavedDeviceInboxStore, self).stream_positions()
-        result["to_device"] = self._device_inbox_id_gen.get_current_token()
+        result["to_device"] = {"master": self._device_inbox_id_gen.get_current_token()}
        return result

    def process_replication_rows(self, stream_name, token, rows):
--- a/synapse/replication/slave/storage/devices.py
+++ b/synapse/replication/slave/storage/devices.py
@@ -29,7 +29,13 @@ class SlavedDeviceStore(EndToEndKeyWorkerStore, DeviceWorkerStore, BaseSlavedSto
        self.hs = hs

        self._device_list_id_gen = SlavedIdTracker(
-            db_conn, "device_lists_stream", "stream_id"
+            db_conn,
+            "device_lists_stream",
+            "stream_id",
+            extra_tables=[
+                ("user_signature_stream", "stream_id"),
+                ("device_lists_outbound_pokes", "stream_id"),
+            ],
        )
        device_list_max = self._device_list_id_gen.get_current_token()
        self._device_list_stream_cache = StreamChangeCache(
@@ -48,30 +54,34 @@ class SlavedDeviceStore(EndToEndKeyWorkerStore, DeviceWorkerStore, BaseSlavedSto
        # device list stream, so set them both to the device list ID
        # generator's current token.
        current_token = self._device_list_id_gen.get_current_token()
-        result[DeviceListsStream.NAME] = current_token
-        result[UserSignatureStream.NAME] = current_token
+        result[DeviceListsStream.NAME] = {"master": current_token}
+        result[UserSignatureStream.NAME] = {"master": current_token}
        return result

    def process_replication_rows(self, stream_name, token, rows):
        if stream_name == DeviceListsStream.NAME:
            self._device_list_id_gen.advance(token)
-            for row in rows:
-                self._invalidate_caches_for_devices(token, row.user_id, row.destination)
+            self._invalidate_caches_for_devices(token, rows)
        elif stream_name == UserSignatureStream.NAME:
+            self._device_list_id_gen.advance(token)
            for row in rows:
                self._user_signature_stream_cache.entity_has_changed(row.user_id, token)
        return super(SlavedDeviceStore, self).process_replication_rows(
            stream_name, token, rows
        )

-    def _invalidate_caches_for_devices(self, token, user_id, destination):
-        self._device_list_stream_cache.entity_has_changed(user_id, token)
+    def _invalidate_caches_for_devices(self, token, rows):
+        for row in rows:
+            # The entities are either user IDs (starting with '@') whose devices
+            # have changed, or remote servers that we need to tell about
+            # changes.
+            if row.entity.startswith("@"):
+                self._device_list_stream_cache.entity_has_changed(row.entity, token)
+                self.get_cached_devices_for_user.invalidate((row.entity,))
+                self._get_cached_user_device.invalidate_many((row.entity,))
+                self.get_device_list_last_stream_id_for_remote.invalidate((row.entity,))

-        if destination:
-            self._device_list_federation_stream_cache.entity_has_changed(
-                destination, token
-            )
-
-        self.get_cached_devices_for_user.invalidate((user_id,))
-        self._get_cached_user_device.invalidate_many((user_id,))
-        self.get_device_list_last_stream_id_for_remote.invalidate((user_id,))
+            else:
+                self._device_list_federation_stream_cache.entity_has_changed(
+                    row.entity, token
+                )
--- a/synapse/replication/slave/storage/events.py
+++ b/synapse/replication/slave/storage/events.py
@@ -95,8 +95,8 @@ class SlavedEventStore(

    def stream_positions(self):
        result = super(SlavedEventStore, self).stream_positions()
-        result["events"] = self._stream_id_gen.get_current_token()
-        result["backfill"] = -self._backfill_id_gen.get_current_token()
+        result["events"] = {"master": self._stream_id_gen.get_current_token()}
+        result["backfill"] = {"master": -self._backfill_id_gen.get_current_token()}
        return result

    def process_replication_rows(self, stream_name, token, rows):
--- a/synapse/replication/slave/storage/groups.py
+++ b/synapse/replication/slave/storage/groups.py
@@ -39,7 +39,7 @@ class SlavedGroupServerStore(GroupServerWorkerStore, BaseSlavedStore):

    def stream_positions(self):
        result = super(SlavedGroupServerStore, self).stream_positions()
-        result["groups"] = self._group_updates_id_gen.get_current_token()
+        result["groups"] = {"master": self._group_updates_id_gen.get_current_token()}
        return result

    def process_replication_rows(self, stream_name, token, rows):
--- a/synapse/replication/slave/storage/presence.py
+++ b/synapse/replication/slave/storage/presence.py
@@ -46,7 +46,7 @@ class SlavedPresenceStore(BaseSlavedStore):

        if self.hs.config.use_presence:
            position = self._presence_id_gen.get_current_token()
-            result["presence"] = position
+            result["presence"] = {"master": position}

        return result

--- a/synapse/replication/slave/storage/push_rule.py
+++ b/synapse/replication/slave/storage/push_rule.py
@@ -39,7 +39,9 @@ class SlavedPushRuleStore(SlavedEventStore, PushRulesWorkerStore):

    def stream_positions(self):
        result = super(SlavedPushRuleStore, self).stream_positions()
-        result["push_rules"] = self._push_rules_stream_id_gen.get_current_token()
+        result["push_rules"] = {
+            "master": self._push_rules_stream_id_gen.get_current_token()
+        }
        return result

    def process_replication_rows(self, stream_name, token, rows):
--- a/synapse/replication/slave/storage/pushers.py
+++ b/synapse/replication/slave/storage/pushers.py
@@ -30,9 +30,12 @@ class SlavedPusherStore(PusherWorkerStore, BaseSlavedStore):

    def stream_positions(self):
        result = super(SlavedPusherStore, self).stream_positions()
-        result["pushers"] = self._pushers_id_gen.get_current_token()
+        result["pushers"] = {"master": self._pushers_id_gen.get_current_token()}
        return result

+    def get_pushers_stream_token(self):
+        return self._pushers_id_gen.get_current_token()
+
    def process_replication_rows(self, stream_name, token, rows):
        if stream_name == "pushers":
            self._pushers_id_gen.advance(token)
--- a/synapse/replication/slave/storage/receipts.py
+++ b/synapse/replication/slave/storage/receipts.py
@@ -44,7 +44,7 @@ class SlavedReceiptsStore(ReceiptsWorkerStore, BaseSlavedStore):

    def stream_positions(self):
        result = super(SlavedReceiptsStore, self).stream_positions()
-        result["receipts"] = self._receipts_id_gen.get_current_token()
+        result["receipts"] = {"master": self._receipts_id_gen.get_current_token()}
        return result

    def invalidate_caches_for_receipt(self, room_id, receipt_type, user_id):
--- a/synapse/replication/slave/storage/room.py
+++ b/synapse/replication/slave/storage/room.py
@@ -32,7 +32,9 @@ class RoomStore(RoomWorkerStore, BaseSlavedStore):

    def stream_positions(self):
        result = super(RoomStore, self).stream_positions()
-        result["public_rooms"] = self._public_room_id_gen.get_current_token()
+        result["public_rooms"] = {
+            "master": self._public_room_id_gen.get_current_token()
+        }
        return result

    def process_replication_rows(self, stream_name, token, rows):
--- a/synapse/replication/tcp/client.py
+++ b/synapse/replication/tcp/client.py
@@ -16,26 +16,10 @@
 """

 import logging
-from typing import Dict, List, Optional

-from twisted.internet import defer
 from twisted.internet.protocol import ReconnectingClientFactory

-from synapse.replication.slave.storage._base import BaseSlavedStore
-from synapse.replication.tcp.protocol import (
-    AbstractReplicationClientHandler,
-    ClientReplicationStreamProtocol,
-)
-
-from .commands import (
-    Command,
-    FederationAckCommand,
-    InvalidateCacheCommand,
-    RemoteServerUpCommand,
-    RemovePusherCommand,
-    UserIpCommand,
-    UserSyncCommand,
-)
+from synapse.replication.tcp.protocol import ClientReplicationStreamProtocol

 logger = logging.getLogger(__name__)

@@ -51,10 +35,11 @@ class ReplicationClientFactory(ReconnectingClientFactory):
    initialDelay = 0.1
    maxDelay = 1  # Try at least once every N seconds

-    def __init__(self, hs, client_name, handler: AbstractReplicationClientHandler):
+    def __init__(self, hs, client_name):
        self.client_name = client_name
-        self.handler = handler
+        self.handler = hs.get_tcp_replication()
        self.server_name = hs.config.server_name
+        self.hs = hs
        self._clock = hs.get_clock()  # As self.clock is defined in super class

        hs.get_reactor().addSystemEventTrigger("before", "shutdown", self.stopTrying)
@@ -65,7 +50,7 @@ class ReplicationClientFactory(ReconnectingClientFactory):
    def buildProtocol(self, addr):
        logger.info("Connected to replication: %r", addr)
        return ClientReplicationStreamProtocol(
-            self.client_name, self.server_name, self._clock, self.handler
+            self.hs, self.client_name, self.server_name, self._clock, self.handler,
        )

    def clientConnectionLost(self, connector, reason):
@@ -75,170 +60,3 @@ class ReplicationClientFactory(ReconnectingClientFactory):
    def clientConnectionFailed(self, connector, reason):
        logger.error("Failed to connect to replication: %r", reason)
        ReconnectingClientFactory.clientConnectionFailed(self, connector, reason)
-
-
-class ReplicationClientHandler(AbstractReplicationClientHandler):
-    """A base handler that can be passed to the ReplicationClientFactory.
-
-    By default proxies incoming replication data to the SlaveStore.
-    """
-
-    def __init__(self, store: BaseSlavedStore):
-        self.store = store
-
-        # The current connection. None if we are currently (re)connecting
-        self.connection = None
-
-        # Any pending commands to be sent once a new connection has been
-        # established
-        self.pending_commands = []  # type: List[Command]
-
-        # Map from string -> deferred, to wake up when receiveing a SYNC with
-        # the given string.
-        # Used for tests.
-        self.awaiting_syncs = {}  # type: Dict[str, defer.Deferred]
-
-        # The factory used to create connections.
-        self.factory = None  # type: Optional[ReplicationClientFactory]
-
-    def start_replication(self, hs):
-        """Helper method to start a replication connection to the remote server
-        using TCP.
-        """
-        client_name = hs.config.worker_name
-        self.factory = ReplicationClientFactory(hs, client_name, self)
-        host = hs.config.worker_replication_host
-        port = hs.config.worker_replication_port
-        hs.get_reactor().connectTCP(host, port, self.factory)
-
-    async def on_rdata(self, stream_name, token, rows):
-        """Called to handle a batch of replication data with a given stream token.
-
-        By default this just pokes the slave store. Can be overridden in subclasses to
-        handle more.
-
-        Args:
-            stream_name (str): name of the replication stream for this batch of rows
-            token (int): stream token for this batch of rows
-            rows (list): a list of Stream.ROW_TYPE objects as returned by
-                Stream.parse_row.
-        """
-        logger.debug("Received rdata %s -> %s", stream_name, token)
-        self.store.process_replication_rows(stream_name, token, rows)
-
-    async def on_position(self, stream_name, token):
-        """Called when we get new position data. By default this just pokes
-        the slave store.
-
-        Can be overriden in subclasses to handle more.
-        """
-        self.store.process_replication_rows(stream_name, token, [])
-
-    def on_sync(self, data):
-        """When we received a SYNC we wake up any deferreds that were waiting
-        for the sync with the given data.
-
-        Used by tests.
-        """
-        d = self.awaiting_syncs.pop(data, None)
-        if d:
-            d.callback(data)
-
-    def on_remote_server_up(self, server: str):
-        """Called when get a new REMOTE_SERVER_UP command."""
-
-    def get_streams_to_replicate(self) -> Dict[str, int]:
-        """Called when a new connection has been established and we need to
-        subscribe to streams.
-
-        Returns:
-            map from stream name to the most recent update we have for
-            that stream (ie, the point we want to start replicating from)
-        """
-        args = self.store.stream_positions()
-        user_account_data = args.pop("user_account_data", None)
-        room_account_data = args.pop("room_account_data", None)
-        if user_account_data:
-            args["account_data"] = user_account_data
-        elif room_account_data:
-            args["account_data"] = room_account_data
-
-        return args
-
-    def get_currently_syncing_users(self):
-        """Get the list of currently syncing users (if any). This is called
-        when a connection has been established and we need to send the
-        currently syncing users. (Overriden by the synchrotron's only)
-        """
-        return []
-
-    def send_command(self, cmd):
-        """Send a command to master (when we get establish a connection if we
-        don't have one already.)
-        """
-        if self.connection:
-            self.connection.send_command(cmd)
-        else:
-            logger.warning("Queuing command as not connected: %r", cmd.NAME)
-            self.pending_commands.append(cmd)
-
-    def send_federation_ack(self, token):
-        """Ack data for the federation stream. This allows the master to drop
-        data stored purely in memory.
-        """
-        self.send_command(FederationAckCommand(token))
-
-    def send_user_sync(self, user_id, is_syncing, last_sync_ms):
-        """Poke the master that a user has started/stopped syncing.
-        """
-        self.send_command(UserSyncCommand(user_id, is_syncing, last_sync_ms))
-
-    def send_remove_pusher(self, app_id, push_key, user_id):
-        """Poke the master to remove a pusher for a user
-        """
-        cmd = RemovePusherCommand(app_id, push_key, user_id)
-        self.send_command(cmd)
-
-    def send_invalidate_cache(self, cache_func, keys):
-        """Poke the master to invalidate a cache.
-        """
-        cmd = InvalidateCacheCommand(cache_func.__name__, keys)
-        self.send_command(cmd)
-
-    def send_user_ip(self, user_id, access_token, ip, user_agent, device_id, last_seen):
-        """Tell the master that the user made a request.
-        """
-        cmd = UserIpCommand(user_id, access_token, ip, user_agent, device_id, last_seen)
-        self.send_command(cmd)
-
-    def send_remote_server_up(self, server: str):
-        self.send_command(RemoteServerUpCommand(server))
-
-    def await_sync(self, data):
-        """Returns a deferred that is resolved when we receive a SYNC command
-        with given data.
-
-        [Not currently] used by tests.
-        """
-        return self.awaiting_syncs.setdefault(data, defer.Deferred())
-
-    def update_connection(self, connection):
-        """Called when a connection has been established (or lost with None).
-        """
-        self.connection = connection
-        if connection:
-            for cmd in self.pending_commands:
-                connection.send_command(cmd)
-            self.pending_commands = []
-
-    def finished_connecting(self):
-        """Called when we have successfully subscribed and caught up to all
-        streams we're interested in.
-        """
-        logger.info("Finished connecting to server")
-
-        # We don't reset the delay any earlier as otherwise if there is a
-        # problem during start up we'll end up tight looping connecting to the
-        # server.
-        if self.factory:
-            self.factory.resetDelay()
--- a/synapse/replication/tcp/commands.py
+++ b/synapse/replication/tcp/commands.py
@@ -86,7 +86,7 @@ class RdataCommand(Command):

    Format::

-        RDATA <stream_name> <token> <row_json>
+        RDATA <stream_name> <instance_name> <token> <row_json>

    The `<token>` may either be a numeric stream id OR "batch". The latter case
    is used to support sending multiple updates with the same stream ID. This
@@ -107,22 +107,27 @@ class RdataCommand(Command):

    NAME = "RDATA"

-    def __init__(self, stream_name, token, row):
+    def __init__(self, stream_name, instance_name, token, row):
        self.stream_name = stream_name
+        self.instance_name = instance_name
        self.token = token
        self.row = row

    @classmethod
    def from_line(cls, line):
-        stream_name, token, row_json = line.split(" ", 2)
+        stream_name, instance_name, token, row_json = line.split(" ", 3)
        return cls(
-            stream_name, None if token == "batch" else int(token), json.loads(row_json)
+            stream_name,
+            instance_name,
+            None if token == "batch" else int(token),
+            json.loads(row_json),
        )

    def to_line(self):
        return " ".join(
            (
                self.stream_name,
+                self.instance_name,
                str(self.token) if self.token is not None else "batch",
                _json_encoder.encode(self.row),
            )
@@ -136,23 +141,24 @@ class PositionCommand(Command):
    """Sent by the server to tell the client the stream postition without
    needing to send an RDATA.

-    Sent to the client after all missing updates for a stream have been sent
-    to the client and they're now up to date.
+    On receipt of a POSITION command clients should check if they have missed
+    any updates, and if so then fetch them out of band.
    """

    NAME = "POSITION"

-    def __init__(self, stream_name, token):
+    def __init__(self, stream_name, instance_name, token):
        self.stream_name = stream_name
+        self.instance_name = instance_name
        self.token = token

    @classmethod
    def from_line(cls, line):
-        stream_name, token = line.split(" ", 1)
-        return cls(stream_name, int(token))
+        stream_name, instance_name, token = line.split(" ", 2)
+        return cls(stream_name, instance_name, int(token))

    def to_line(self):
-        return " ".join((self.stream_name, str(self.token)))
+        return " ".join((self.stream_name, self.instance_name, str(self.token)))


 class ErrorCommand(Command):
@@ -179,42 +185,24 @@ class NameCommand(Command):


 class ReplicateCommand(Command):
-    """Sent by the client to subscribe to the stream.
+    """Sent by the client to subscribe to streams.

    Format::

-        REPLICATE <stream_name> <token>
-
-    Where <token> may be either:
-        * a numeric stream_id to stream updates from
-        * "NOW" to stream all subsequent updates.
-
-    The <stream_name> can be "ALL" to subscribe to all known streams, in which
-    case the <token> must be set to "NOW", i.e.::
-
-        REPLICATE ALL NOW
+        REPLICATE
    """

    NAME = "REPLICATE"

-    def __init__(self, stream_name, token):
-        self.stream_name = stream_name
-        self.token = token
+    def __init__(self):
+        pass

    @classmethod
    def from_line(cls, line):
-        stream_name, token = line.split(" ", 1)
-        if token in ("NOW", "now"):
-            token = "NOW"
-        else:
-            token = int(token)
-        return cls(stream_name, token)
+        return cls()

    def to_line(self):
-        return " ".join((self.stream_name, str(self.token)))
-
-    def get_logcontext_id(self):
-        return "REPLICATE-" + self.stream_name
+        return ""


 class UserSyncCommand(Command):
@@ -225,30 +213,32 @@ class UserSyncCommand(Command):

    Format::

-        USER_SYNC <user_id> <state> <last_sync_ms>
+        USER_SYNC <instance_id> <user_id> <state> <last_sync_ms>

    Where <state> is either "start" or "stop"
    """

    NAME = "USER_SYNC"

-    def __init__(self, user_id, is_syncing, last_sync_ms):
+    def __init__(self, instance_id, user_id, is_syncing, last_sync_ms):
+        self.instance_id = instance_id
        self.user_id = user_id
        self.is_syncing = is_syncing
        self.last_sync_ms = last_sync_ms

    @classmethod
    def from_line(cls, line):
-        user_id, state, last_sync_ms = line.split(" ", 2)
+        instance_id, user_id, state, last_sync_ms = line.split(" ", 3)

        if state not in ("start", "end"):
            raise Exception("Invalid USER_SYNC state %r" % (state,))

-        return cls(user_id, state == "start", int(last_sync_ms))
+        return cls(instance_id, user_id, state == "start", int(last_sync_ms))

    def to_line(self):
        return " ".join(
            (
+                self.instance_id,
                self.user_id,
                "start" if self.is_syncing else "end",
                str(self.last_sync_ms),
@@ -256,6 +246,30 @@ class UserSyncCommand(Command):
        )


+class ClearUserSyncsCommand(Command):
+    """Sent by the client to inform the server that it should drop all
+    information about syncing users sent by the client.
+
+    Mainly used when client is about to shut down.
+
+    Format::
+
+        CLEAR_USER_SYNC <instance_id>
+    """
+
+    NAME = "CLEAR_USER_SYNC"
+
+    def __init__(self, instance_id):
+        self.instance_id = instance_id
+
+    @classmethod
+    def from_line(cls, line):
+        return cls(line)
+
+    def to_line(self):
+        return self.instance_id
+
+
 class FederationAckCommand(Command):
    """Sent by the client when it has processed up to a given point in the
    federation stream. This allows the master to drop in-memory caches of the
@@ -416,6 +430,7 @@ _COMMANDS = (
    InvalidateCacheCommand,
    UserIpCommand,
    RemoteServerUpCommand,
+    ClearUserSyncsCommand,
 )  # type: Tuple[Type[Command], ...]

 # Map of command name to command type.
@@ -438,6 +453,7 @@ VALID_CLIENT_COMMANDS = (
    ReplicateCommand.NAME,
    PingCommand.NAME,
    UserSyncCommand.NAME,
+    ClearUserSyncsCommand.NAME,
    FederationAckCommand.NAME,
    RemovePusherCommand.NAME,
    InvalidateCacheCommand.NAME,
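The RDATA wire-format change above can be checked in isolation. The following is a minimal sketch, not the code from this patch: `RdataLine`, `parse_rdata` and `format_rdata` are hypothetical stand-ins for `RdataCommand`, and `json.dumps` is assumed in place of the module's own `_json_encoder`. It shows why the split limit moves from 2 to 3 once `<instance_name>` is added: the JSON payload may itself contain spaces and has to stay in one piece.

```python
import json
from typing import Any, NamedTuple, Optional


class RdataLine(NamedTuple):
    """Hypothetical stand-in for RdataCommand, for illustration only."""

    stream_name: str
    instance_name: str
    token: Optional[int]  # None means "batch": more rows follow for this token
    row: Any


def parse_rdata(line: str) -> RdataLine:
    # Split into at most four fields so spaces inside the JSON are preserved.
    stream_name, instance_name, token, row_json = line.split(" ", 3)
    return RdataLine(
        stream_name,
        instance_name,
        None if token == "batch" else int(token),
        json.loads(row_json),
    )


def format_rdata(cmd: RdataLine) -> str:
    # Inverse of parse_rdata; a None token is written as the literal "batch".
    return " ".join(
        (
            cmd.stream_name,
            cmd.instance_name,
            str(cmd.token) if cmd.token is not None else "batch",
            json.dumps(cmd.row),
        )
    )


parsed = parse_rdata('events master 53 ["ev", {"room id": "!a:b"}]')
batched = parse_rdata("events master batch [1, 2]")
```

A line written in the old three-field format would be misparsed by this function, so both ends of a connection presumably need to be on the new format at the same time.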
--- a/synapse/replication/tcp/handler.py
+++ b/synapse/replication/tcp/handler.py
@@ -0,0 +1,399 @@
+# -*- coding: utf-8 -*-
+# Copyright 2017 Vector Creations Ltd
+# Copyright 2020 The Matrix.org Foundation C.I.C.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""A replication client for use by synapse workers.
+"""
+
+import logging
+from typing import Any, Callable, Dict, List
+
+from prometheus_client import Counter
+
+from synapse.metrics import LaterGauge
+from synapse.replication.tcp.commands import (
+    ClearUserSyncsCommand,
+    Command,
+    FederationAckCommand,
+    InvalidateCacheCommand,
+    PositionCommand,
+    RdataCommand,
+    RemoteServerUpCommand,
+    RemovePusherCommand,
+    ReplicateCommand,
+    UserIpCommand,
+    UserSyncCommand,
+)
+from synapse.replication.tcp.streams import STREAMS_MAP, Stream
+
+logger = logging.getLogger(__name__)
+
+
+user_sync_counter = Counter("synapse_replication_tcp_resource_user_sync", "")
+federation_ack_counter = Counter("synapse_replication_tcp_resource_federation_ack", "")
+remove_pusher_counter = Counter("synapse_replication_tcp_resource_remove_pusher", "")
+invalidate_cache_counter = Counter(
+    "synapse_replication_tcp_resource_invalidate_cache", ""
+)
+user_ip_cache_counter = Counter("synapse_replication_tcp_resource_user_ip_cache", "")
+
+
+class ReplicationClientHandler:
+    """Handles incoming commands from replication.
+
+    Proxies data to `HomeServer.get_replication_data_handler()`.
+    """
+
+    def __init__(self, hs):
+        self.replication_data_handler = hs.get_replication_data_handler()
+        self.store = hs.get_datastore()
+        self.notifier = hs.get_notifier()
+        self.clock = hs.get_clock()
+        self.presence_handler = hs.get_presence_handler()
+        self.instance_id = hs.get_instance_id()
+
+        self.instance_name = hs.config.worker.worker_name or "master"
+
+        self.connections = []  # type: List[Any]
+
+        self.streams = {
+            stream.NAME: stream(hs) for stream in STREAMS_MAP.values()
+        }  # type: Dict[str, Stream]
+
+        LaterGauge(
+            "synapse_replication_tcp_resource_total_connections",
+            "",
+            [],
+            lambda: len(self.connections),
+        )
+
+        LaterGauge(
+            "synapse_replication_tcp_resource_connections_per_stream",
+            "",
+            ["stream_name"],
+            lambda: {
+                (stream_name,): len(
+                    [
+                        conn
+                        for conn in self.connections
+                        if stream_name in conn.replication_streams
+                    ]
+                )
+                for stream_name in self.streams
+            },
+        )
+
+        # Map of stream to batched updates. See RdataCommand for info on how
+        # batching works.
+        self.pending_batches = {}  # type: Dict[str, List[Any]]
+
+        self.is_master = hs.config.worker_app is None
+
+        self.federation_sender = None
+        if self.is_master and not hs.config.send_federation:
+            self.federation_sender = hs.get_federation_sender()
+
+        self._server_notices_sender = None
+        if self.is_master:
+            self._server_notices_sender = hs.get_server_notices_sender()
+            self.notifier.add_remote_server_up_callback(self.send_remote_server_up)
+
+    def new_connection(self, connection):
+        self.connections.append(connection)
+
+    def lost_connection(self, connection):
+        try:
+            self.connections.remove(connection)
+        except ValueError:
+            pass
+
+    def connected(self) -> bool:
+        """Do we have any replication connections open?
+
+        Used to no-op if nothing is connected.
+        """
+        return bool(self.connections)
+
+    async def on_REPLICATE(self, cmd: ReplicateCommand):
+        # We only want to announce positions by the writer of the streams.
+        # Currently this is just the master process.
+        if not self.is_master:
+            return
+
+        if not self.connections:
+            raise Exception("Not connected")
+
+        for stream_name, stream in self.streams.items():
+            current_token = stream.current_token()
+            self.send_command(
+                PositionCommand(stream_name, self.instance_name, current_token)
+            )
+
+    async def on_USER_SYNC(self, cmd: UserSyncCommand):
+        user_sync_counter.inc()
+
+        if self.is_master:
+            await self.presence_handler.update_external_syncs_row(
+                cmd.instance_id, cmd.user_id, cmd.is_syncing, cmd.last_sync_ms
+            )
+
+    async def on_CLEAR_USER_SYNC(self, cmd: ClearUserSyncsCommand):
+        if self.is_master:
+            await self.presence_handler.update_external_syncs_clear(cmd.instance_id)
+
+    async def on_FEDERATION_ACK(self, cmd: FederationAckCommand):
+        federation_ack_counter.inc()
+
+        if self.federation_sender:
+            self.federation_sender.federation_ack(cmd.token)
+
+    async def on_REMOVE_PUSHER(self, cmd: RemovePusherCommand):
+        remove_pusher_counter.inc()
+
+        if self.is_master:
+            await self.store.delete_pusher_by_app_id_pushkey_user_id(
+                app_id=cmd.app_id, pushkey=cmd.push_key, user_id=cmd.user_id
+            )
+
+            self.notifier.on_new_replication_data()
+
+    async def on_INVALIDATE_CACHE(self, cmd: InvalidateCacheCommand):
+        invalidate_cache_counter.inc()
+
+        if self.is_master:
+            # We invalidate the cache locally, but then also stream that to other
+            # workers.
+            await self.store.invalidate_cache_and_stream(
+                cmd.cache_func, tuple(cmd.keys)
+            )
+
+    async def on_USER_IP(self, cmd: UserIpCommand):
+        user_ip_cache_counter.inc()
+
+        if self.is_master:
+            await self.store.insert_client_ip(
+                cmd.user_id,
+                cmd.access_token,
+                cmd.ip,
+                cmd.user_agent,
+                cmd.device_id,
+                cmd.last_seen,
+            )
+
+        if self._server_notices_sender:
+            await self._server_notices_sender.on_user_ip(cmd.user_id)
+
+    async def on_RDATA(self, cmd: RdataCommand):
+        stream_name = cmd.stream_name
+
+        try:
+            row = STREAMS_MAP[stream_name].parse_row(cmd.row)
+        except Exception:
+            logger.exception("[%s] Failed to parse RDATA: %r", stream_name, cmd.row)
+            raise
+
+        if cmd.token is None:
+            # I.e. this is part of a batch of updates for this stream. Batch
+            # until we get an update for the stream with a non None token
+            self.pending_batches.setdefault(stream_name, []).append(row)
+        else:
+            # Check if this is the last of a batch of updates
+            rows = self.pending_batches.pop(stream_name, [])
+            rows.append(row)
+            await self.on_rdata(stream_name, cmd.instance_name, cmd.token, rows)
+
+    async def on_rdata(
+        self, stream_name: str, instance_name: str, token: int, rows: list
+    ):
+        """Called to handle a batch of replication data with a given stream token.
+
+        Args:
+            stream_name: name of the replication stream for this batch of rows
+            instance_name: name of the instance that sent this batch of rows
+            token: stream token for this batch of rows
+            rows: a list of Stream.ROW_TYPE objects as returned by Stream.parse_row.
+        """
+        logger.info("Received rdata %s %s -> %s", stream_name, instance_name, token)
+        await self.replication_data_handler.on_rdata(
+            stream_name, instance_name, token, rows
+        )
+
+    async def on_POSITION(self, cmd: PositionCommand):
+        stream = self.streams.get(cmd.stream_name)
+        if not stream:
+            logger.error("Got POSITION for unknown stream: %s", cmd.stream_name)
+            return
+
+        # Find where we previously streamed up to.
+        current_tokens = self.replication_data_handler.get_streams_to_replicate().get(
+            cmd.stream_name
+        )
+        if current_tokens is None:
+            logger.debug(
+                "Got POSITION for stream we're not subscribed to: %s", cmd.stream_name
+            )
+            return
+
+        current_token = current_tokens.get(cmd.instance_name, 0)
+
+        # Fetch all updates between then and now.
+        limited = cmd.token != current_token
+        while limited:
+            updates, current_token, limited = await stream.get_updates_since(
+                cmd.instance_name, current_token, cmd.token
+            )
+            if updates:
+                await self.on_rdata(
+                    cmd.stream_name,
+                    cmd.instance_name,
+                    current_token,
+                    [stream.parse_row(update[1]) for update in updates],
+                )
+
+        # We've now caught up to position sent to us, notify handler.
+        await self.replication_data_handler.on_position(cmd.stream_name, cmd.token)
+
+        # Handle any RDATA that came in while we were catching up.
+        rows = self.pending_batches.pop(cmd.stream_name, [])
+        if rows:
+            await self.on_rdata(
+                cmd.stream_name, cmd.instance_name, rows[-1].token, rows
+            )
+
+    async def on_REMOTE_SERVER_UP(self, cmd: RemoteServerUpCommand):
+        """Called when we get a new REMOTE_SERVER_UP command."""
+        if self.is_master:
+            self.notifier.notify_remote_server_up(cmd.data)
+
+    def get_currently_syncing_users(self):
+        """Get the list of currently syncing users (if any). This is called
+        when a connection has been established and we need to send the
+        currently syncing users.
+        """
+        return self.presence_handler.get_currently_syncing_users()
+
+    def send_command(self, cmd: Command):
+        """Send a command to every currently open replication connection.
+        Nothing is queued: with no connections open the command is dropped.
+        """
+        for conn in self.connections:
+            conn.send_command(cmd)
+
+    def send_federation_ack(self, token: int):
+        """Ack data for the federation stream. This allows the master to drop
+        data stored purely in memory.
+        """
+        self.send_command(FederationAckCommand(token))
+
+    def send_user_sync(
+        self, instance_id: str, user_id: str, is_syncing: bool, last_sync_ms: int
+    ):
+        """Poke the master that a user has started/stopped syncing.
+        """
+        self.send_command(
+            UserSyncCommand(instance_id, user_id, is_syncing, last_sync_ms)
+        )
+
+    def send_remove_pusher(self, app_id: str, push_key: str, user_id: str):
+        """Poke the master to remove a pusher for a user
+        """
+        cmd = RemovePusherCommand(app_id, push_key, user_id)
+        self.send_command(cmd)
+
+    def send_invalidate_cache(self, cache_func: Callable, keys: tuple):
+        """Poke the master to invalidate a cache.
+        """
+        cmd = InvalidateCacheCommand(cache_func.__name__, keys)
+        self.send_command(cmd)
+
+    def send_user_ip(
+        self,
+        user_id: str,
+        access_token: str,
+        ip: str,
+        user_agent: str,
+        device_id: str,
+        last_seen: int,
+    ):
+        """Tell the master that the user made a request.
+        """
+        cmd = UserIpCommand(user_id, access_token, ip, user_agent, device_id, last_seen)
+        self.send_command(cmd)
+
+    def send_remote_server_up(self, server: str):
+        self.send_command(RemoteServerUpCommand(server))
+
+    def stream_update(self, stream_name: str, token: str, data: Any):
+        """Called when a new update is available to stream to clients.
+
+        We need to check if the client is interested in the stream or not
+        """
+        self.send_command(RdataCommand(stream_name, self.instance_name, token, data))
+
+
+class ReplicationDataHandler:
+    """Default replication data handler: pokes the slaved store and typing handler.
+    """
+
+    def __init__(self, hs):
+        self.store = hs.get_datastore()
+        self.typing_handler = hs.get_typing_handler()
+
+        self.slaved_store = hs.config.worker_app is not None
+        self.slaved_typing = not hs.config.server.handle_typing
+
+    async def on_rdata(
+        self, stream_name: str, instance_name: str, token: int, rows: list
+    ):
+        """Called to handle a batch of replication data with a given stream token.
+
+        By default this just pokes the slave store. Can be overridden in subclasses to
+        handle more.
+
+        Args:
+            stream_name (str): name of the replication stream for this batch of rows
+            token (int): stream token for this batch of rows
+            rows (list): a list of Stream.ROW_TYPE objects as returned by
+                Stream.parse_row.
+        """
+        if self.slaved_store:
+            self.store.process_replication_rows(stream_name, token, rows)
+
+        if self.slaved_typing:
+            self.typing_handler.process_replication_rows(stream_name, token, rows)
+
+    def get_streams_to_replicate(self) -> Dict[str, Dict[str, int]]:
+        """Called when a new connection has been established and we need to
+        subscribe to streams.
+
+        Returns:
+            map from stream name to a map from instance name to the most recent
+            update we have for that stream (ie, the point to replicate from)
+        """
+        args = {}  # type: Dict[str, Dict[str, int]]
+
+        if self.slaved_store:
+            args = self.store.stream_positions()
+
+        if self.slaved_typing:
+            args.update(self.typing_handler.stream_positions())
+
+        return args
+
+    async def on_position(self, stream_name: str, token: int):
+        if self.slaved_store:
+            self.store.process_replication_rows(stream_name, token, [])
+
+        if self.slaved_typing:
+            self.typing_handler.process_replication_rows(stream_name, token, [])
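The batching rule used by `on_RDATA` in the new handler.py can be sketched on its own. This is an illustrative reduction under stated assumptions, not the handler itself: `BatchCollector` and its `add` method are hypothetical names, and rows are plain values rather than parsed `Stream.ROW_TYPE` objects. Rows that arrive with no token are held back per stream, and the first row with a real token flushes everything held for that stream under that token.

```python
from typing import Any, Dict, List, Optional, Tuple


class BatchCollector:
    """Hypothetical helper mirroring the pending_batches logic in on_RDATA."""

    def __init__(self) -> None:
        # Map of stream name to rows waiting for a row with a real token.
        self.pending: Dict[str, List[Any]] = {}

    def add(
        self, stream_name: str, token: Optional[int], row: Any
    ) -> Optional[Tuple[int, List[Any]]]:
        if token is None:
            # Part of a batch: hold the row until a token arrives.
            self.pending.setdefault(stream_name, []).append(row)
            return None

        # A real token closes the batch (or stands alone if nothing is held).
        rows = self.pending.pop(stream_name, [])
        rows.append(row)
        return token, rows


collector = BatchCollector()
first = collector.add("events", None, "a")
second = collector.add("events", None, "b")
flushed = collector.add("events", 7, "c")
single = collector.add("events", 8, "d")
```

Keying the pending rows by stream name alone matches the patch as written; whether they should also be keyed by instance name once more than one writer exists is left open here.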
--- a/synapse/replication/tcp/protocol.py
+++ b/synapse/replication/tcp/protocol.py
@@ -35,9 +35,7 @@ indicate which side is sending, these are *not* included on the wire::
    > PING 1490197665618
    < NAME synapse.app.appservice
    < PING 1490197665618
-    < REPLICATE events 1
-    < REPLICATE backfill 1
-    < REPLICATE caches 1
+    < REPLICATE
    > POSITION events 1
    > POSITION backfill 1
    > POSITION caches 1
@@ -48,45 +46,40 @@ indicate which side is sending, these are *not* included on the wire::
    > ERROR server stopping
    * connection closed by server *
 """
-import abc
 import fcntl
 import logging
 import struct
 from collections import defaultdict
-from typing import Any, DefaultDict, Dict, List, Set, Tuple
+from typing import Any, DefaultDict, Dict, List, Set

-from six import iteritems, iterkeys
+from six import iteritems

 from prometheus_client import Counter

-from twisted.internet import defer
 from twisted.protocols.basic import LineOnlyReceiver
 from twisted.python.failure import Failure

-from synapse.logging.context import make_deferred_yieldable, run_in_background
 from synapse.metrics import LaterGauge
 from synapse.metrics.background_process_metrics import run_as_background_process
 from synapse.replication.tcp.commands import (
    COMMAND_MAP,
-    VALID_CLIENT_COMMANDS,
-    VALID_SERVER_COMMANDS,
    Command,
    ErrorCommand,
    NameCommand,
    PingCommand,
-    PositionCommand,
-    RdataCommand,
    RemoteServerUpCommand,
    ReplicateCommand,
    ServerCommand,
-    SyncCommand,
-    UserSyncCommand,
 )
-from synapse.replication.tcp.streams import STREAMS_MAP
-from synapse.types import Collection
+from synapse.replication.tcp.streams import STREAMS_MAP, Stream
 from synapse.util import Clock
 from synapse.util.stringutils import random_string

+MYPY = False
+if MYPY:
+    from synapse.server import HomeServer
+
+
 connection_close_counter = Counter(
    "synapse_replication_tcp_protocol_close_reason", "", ["reason_type"]
 )
@@ -127,16 +120,11 @@ class BaseReplicationStreamProtocol(LineOnlyReceiver):

    delimiter = b"\n"

-    # Valid commands we expect to receive
-    VALID_INBOUND_COMMANDS = []  # type: Collection[str]
-
-    # Valid commands we can send
-    VALID_OUTBOUND_COMMANDS = []  # type: Collection[str]
-
    max_line_buffer = 10000

-    def __init__(self, clock):
+    def __init__(self, clock, handler):
        self.clock = clock
+        self.handler = handler

        self.last_received_command = self.clock.time_msec()
        self.last_sent_command = 0
@@ -176,6 +164,8 @@ class BaseReplicationStreamProtocol(LineOnlyReceiver):
        # can time us out.
        self.send_command(PingCommand(self.clock.time_msec()))

+        self.handler.new_connection(self)
+
    def send_ping(self):
        """Periodically sends a ping and checks if we should close the connection
        due to the other side timing out.
@@ -213,11 +203,6 @@ class BaseReplicationStreamProtocol(LineOnlyReceiver):
        line = line.decode("utf-8")
        cmd_name, rest_of_line = line.split(" ", 1)

-        if cmd_name not in self.VALID_INBOUND_COMMANDS:
-            logger.error("[%s] invalid command %s", self.id(), cmd_name)
-            self.send_error("invalid command: %s", cmd_name)
-            return
-
        self.last_received_command = self.clock.time_msec()

        self.inbound_commands_counter[cmd_name] = (
@@ -249,8 +234,23 @@ class BaseReplicationStreamProtocol(LineOnlyReceiver):
        Args:
            cmd: received command
        """
-        handler = getattr(self, "on_%s" % (cmd.NAME,))
-        await handler(cmd)
+        handled = False
+
+        # First call any command handlers on this instance. These are for
+        # TCP-specific handling.
+        cmd_func = getattr(self, "on_%s" % (cmd.NAME,), None)
+        if cmd_func:
+            await cmd_func(cmd)
+            handled = True
+
+        # Then call out to the handler.
+        cmd_func = getattr(self.handler, "on_%s" % (cmd.NAME,), None)
+        if cmd_func:
+            await cmd_func(cmd)
+            handled = True
+
+        if not handled:
+            logger.warning("Unhandled command: %r", cmd)

    def close(self):
        logger.warning("[%s] Closing connection", self.id())
@@ -258,6 +258,9 @@ class BaseReplicationStreamProtocol(LineOnlyReceiver):
        self.transport.loseConnection()
        self.on_connection_closed()

+    def send_remote_server_up(self, server: str):
+        self.send_command(RemoteServerUpCommand(server))
+
    def send_error(self, error_string, *args):
        """Send an error to remote and close the connection.
        """
@@ -379,6 +382,8 @@ class BaseReplicationStreamProtocol(LineOnlyReceiver):
        self.state = ConnectionStates.CLOSED
        self.pending_commands = []

+        self.handler.lost_connection(self)
+
        if self.transport:
            self.transport.unregisterProducer()

@@ -402,346 +407,66 @@ class BaseReplicationStreamProtocol(LineOnlyReceiver):


 class ServerReplicationStreamProtocol(BaseReplicationStreamProtocol):
-    VALID_INBOUND_COMMANDS = VALID_CLIENT_COMMANDS
-    VALID_OUTBOUND_COMMANDS = VALID_SERVER_COMMANDS
-
-    def __init__(self, server_name, clock, streamer):
-        BaseReplicationStreamProtocol.__init__(self, clock)  # Old style class
+    def __init__(self, hs, server_name, clock, handler):
+        BaseReplicationStreamProtocol.__init__(self, clock, handler)  # Old style class

        self.server_name = server_name
-        self.streamer = streamer
-
-        # The streams the client has subscribed to and is up to date with
-        self.replication_streams = set()  # type: Set[str]
-
-        # The streams the client is currently subscribing to.
-        self.connecting_streams = set()  # type:  Set[str]
-
-        # Map from stream name to list of updates to send once we've finished
-        # subscribing the client to the stream.
-        self.pending_rdata = {}  # type: Dict[str, List[Tuple[int, Any]]]

    def connectionMade(self):
        self.send_command(ServerCommand(self.server_name))
        BaseReplicationStreamProtocol.connectionMade(self)
-        self.streamer.new_connection(self)

    async def on_NAME(self, cmd):
        logger.info("[%s] Renamed to %r", self.id(), cmd.data)
        self.name = cmd.data

-    async def on_USER_SYNC(self, cmd):
-        await self.streamer.on_user_sync(
-            self.conn_id, cmd.user_id, cmd.is_syncing, cmd.last_sync_ms
-        )
-
-    async def on_REPLICATE(self, cmd):
-        stream_name = cmd.stream_name
-        token = cmd.token
-
-        if stream_name == "ALL":
-            # Subscribe to all streams we're publishing to.
-            deferreds = [
-                run_in_background(self.subscribe_to_stream, stream, token)
-                for stream in iterkeys(self.streamer.streams_by_name)
-            ]
-
-            await make_deferred_yieldable(
-                defer.gatherResults(deferreds, consumeErrors=True)
-            )
-        else:
-            await self.subscribe_to_stream(stream_name, token)
-
-    async def on_FEDERATION_ACK(self, cmd):
-        self.streamer.federation_ack(cmd.token)
-
-    async def on_REMOVE_PUSHER(self, cmd):
-        await self.streamer.on_remove_pusher(cmd.app_id, cmd.push_key, cmd.user_id)
-
-    async def on_INVALIDATE_CACHE(self, cmd):
-        await self.streamer.on_invalidate_cache(cmd.cache_func, cmd.keys)
-
-    async def on_REMOTE_SERVER_UP(self, cmd: RemoteServerUpCommand):
-        self.streamer.on_remote_server_up(cmd.data)
-
-    async def on_USER_IP(self, cmd):
-        self.streamer.on_user_ip(
-            cmd.user_id,
-            cmd.access_token,
-            cmd.ip,
-            cmd.user_agent,
-            cmd.device_id,
-            cmd.last_seen,
-        )
-
-    async def subscribe_to_stream(self, stream_name, token):
-        """Subscribe the remote to a stream.
-
-        This invloves checking if they've missed anything and sending those
-        updates down if they have. During that time new updates for the stream
-        are queued and sent once we've sent down any missed updates.
-        """
-        self.replication_streams.discard(stream_name)
-        self.connecting_streams.add(stream_name)
-
-        try:
-            # Get missing updates
-            updates, current_token = await self.streamer.get_stream_updates(
-                stream_name, token
-            )
-
-            # Send all the missing updates
-            for update in updates:
-                token, row = update[0], update[1]
-                self.send_command(RdataCommand(stream_name, token, row))
-
-            # We send a POSITION command to ensure that they have an up to
-            # date token (especially useful if we didn't send any updates
-            # above)
-            self.send_command(PositionCommand(stream_name, current_token))
-
-            # Now we can send any updates that came in while we were subscribing
-            pending_rdata = self.pending_rdata.pop(stream_name, [])
-            updates = []
-            for token, update in pending_rdata:
-                # If the token is null, it is part of a batch update. Batches
-                # are multiple updates that share a single token. To denote
-                # this, the token is set to None for all tokens in the batch
-                # except for the last. If we find a None token, we keep looking
-                # through tokens until we find one that is not None and then
-                # process all previous updates in the batch as if they had the
-                # final token.
-                if token is None:
-                    # Store this update as part of a batch
-                    updates.append(update)
-                    continue
-
-                if token <= current_token:
-                    # This update or batch of updates is older than
-                    # current_token, dismiss it
-                    updates = []
-                    continue
-
-                updates.append(update)
-
-                # Send all updates that are part of this batch with the
-                # found token
-                for update in updates:
-                    self.send_command(RdataCommand(stream_name, token, update))
-
-                # Clear stored updates
-                updates = []
-
-            # They're now fully subscribed
-            self.replication_streams.add(stream_name)
-        except Exception as e:
-            logger.exception("[%s] Failed to handle REPLICATE command", self.id())
-            self.send_error("failed to handle replicate: %r", e)
-        finally:
-            self.connecting_streams.discard(stream_name)
-
-    def stream_update(self, stream_name, token, data):
-        """Called when a new update is available to stream to clients.
-
-        We need to check if the client is interested in the stream or not
-        """
-        if stream_name in self.replication_streams:
-            # The client is subscribed to the stream
-            self.send_command(RdataCommand(stream_name, token, data))
-        elif stream_name in self.connecting_streams:
-            # The client is being subscribed to the stream
-            logger.debug("[%s] Queuing RDATA %r %r", self.id(), stream_name, token)
-            self.pending_rdata.setdefault(stream_name, []).append((token, data))
-        else:
-            # The client isn't subscribed
-            logger.debug("[%s] Dropping RDATA %r %r", self.id(), stream_name, token)
-
-    def send_sync(self, data):
-        self.send_command(SyncCommand(data))
-
-    def send_remote_server_up(self, server: str):
-        self.send_command(RemoteServerUpCommand(server))
-
-    def on_connection_closed(self):
-        BaseReplicationStreamProtocol.on_connection_closed(self)
-        self.streamer.lost_connection(self)
-
-
-class AbstractReplicationClientHandler(metaclass=abc.ABCMeta):
-    """
-    The interface for the handler that should be passed to
-    ClientReplicationStreamProtocol
-    """
-
-    @abc.abstractmethod
-    async def on_rdata(self, stream_name, token, rows):
-        """Called to handle a batch of replication data with a given stream token.
-
-        Args:
-            stream_name (str): name of the replication stream for this batch of rows
-            token (int): stream token for this batch of rows
-            rows (list): a list of Stream.ROW_TYPE objects as returned by
-                Stream.parse_row.
-        """
-        raise NotImplementedError()
-
-    @abc.abstractmethod
-    async def on_position(self, stream_name, token):
-        """Called when we get new position data."""
-        raise NotImplementedError()
-
-    @abc.abstractmethod
-    def on_sync(self, data):
-        """Called when get a new SYNC command."""
-        raise NotImplementedError()
-
-    @abc.abstractmethod
-    async def on_remote_server_up(self, server: str):
-        """Called when get a new REMOTE_SERVER_UP command."""
-        raise NotImplementedError()
-
-    @abc.abstractmethod
-    def get_streams_to_replicate(self):
-        """Called when a new connection has been established and we need to
-        subscribe to streams.
-
-        Returns:
-            map from stream name to the most recent update we have for
-            that stream (ie, the point we want to start replicating from)
-        """
-        raise NotImplementedError()
-
-    @abc.abstractmethod
-    def get_currently_syncing_users(self):
-        """Get the list of currently syncing users (if any). This is called
-        when a connection has been established and we need to send the
-        currently syncing users."""
-        raise NotImplementedError()
-
-    @abc.abstractmethod
-    def update_connection(self, connection):
-        """Called when a connection has been established (or lost with None).
-        """
-        raise NotImplementedError()
-
-    @abc.abstractmethod
-    def finished_connecting(self):
-        """Called when we have successfully subscribed and caught up to all
-        streams we're interested in.
-        """
-        raise NotImplementedError()
-

 class ClientReplicationStreamProtocol(BaseReplicationStreamProtocol):
-    VALID_INBOUND_COMMANDS = VALID_SERVER_COMMANDS
-    VALID_OUTBOUND_COMMANDS = VALID_CLIENT_COMMANDS
-
    def __init__(
        self,
+        hs: "HomeServer",
        client_name: str,
        server_name: str,
        clock: Clock,
-        handler: AbstractReplicationClientHandler,
+        handler,
    ):
-        BaseReplicationStreamProtocol.__init__(self, clock)
+        BaseReplicationStreamProtocol.__init__(self, clock, handler)
+
+        self.instance_id = hs.get_instance_id()

        self.client_name = client_name
        self.server_name = server_name
-        self.handler = handler
+
+        self.streams = {
+            stream.NAME: stream(hs) for stream in STREAMS_MAP.values()
+        }  # type: Dict[str, Stream]

        # Set of stream names that have been subscribe to, but haven't yet
        # caught up with. This is used to track when the client has been fully
        # connected to the remote.
-        self.streams_connecting = set()  # type: Set[str]
+        self.streams_connecting = set(STREAMS_MAP)  # type: Set[str]

        # Map of stream to batched updates. See RdataCommand for info on how
        # batching works.
-        self.pending_batches = {}  # type: Dict[str, Any]
+        self.pending_batches = {}  # type: Dict[str, List[Any]]

    def connectionMade(self):
-        self.send_command(NameCommand(self.client_name))
        BaseReplicationStreamProtocol.connectionMade(self)

-        # Once we've connected subscribe to the necessary streams
-        for stream_name, token in iteritems(self.handler.get_streams_to_replicate()):
-            self.replicate(stream_name, token)
-
-        # Tell the server if we have any users currently syncing (should only
-        # happen on synchrotrons)
-        currently_syncing = self.handler.get_currently_syncing_users()
-        now = self.clock.time_msec()
-        for user_id in currently_syncing:
-            self.send_command(UserSyncCommand(user_id, True, now))
-
-        # We've now finished connecting to so inform the client handler
-        self.handler.update_connection(self)
-
-        # This will happen if we don't actually subscribe to any streams
-        if not self.streams_connecting:
-            self.handler.finished_connecting()
+        self.send_command(NameCommand(self.client_name))
+        self.replicate()

    async def on_SERVER(self, cmd):
        if cmd.data != self.server_name:
            logger.error("[%s] Connected to wrong remote: %r", self.id(), cmd.data)
            self.send_error("Wrong remote")

-    async def on_RDATA(self, cmd):
-        stream_name = cmd.stream_name
-        inbound_rdata_count.labels(stream_name).inc()
-
-        try:
-            row = STREAMS_MAP[stream_name].parse_row(cmd.row)
-        except Exception:
-            logger.exception(
-                "[%s] Failed to parse RDATA: %r %r", self.id(), stream_name, cmd.row
-            )
-            raise
-
-        if cmd.token is None:
-            # I.e. this is part of a batch of updates for this stream. Batch
-            # until we get an update for the stream with a non None token
-            self.pending_batches.setdefault(stream_name, []).append(row)
-        else:
-            # Check if this is the last of a batch of updates
-            rows = self.pending_batches.pop(stream_name, [])
-            rows.append(row)
-            await self.handler.on_rdata(stream_name, cmd.token, rows)
-
-    async def on_POSITION(self, cmd):
-        # When we get a `POSITION` command it means we've finished getting
-        # missing updates for the given stream, and are now up to date.
-        self.streams_connecting.discard(cmd.stream_name)
-        if not self.streams_connecting:
-            self.handler.finished_connecting()
-
-        await self.handler.on_position(cmd.stream_name, cmd.token)
-
-    async def on_SYNC(self, cmd):
-        self.handler.on_sync(cmd.data)
-
-    async def on_REMOTE_SERVER_UP(self, cmd: RemoteServerUpCommand):
-        self.handler.on_remote_server_up(cmd.data)
-
-    def replicate(self, stream_name, token):
+    def replicate(self):
        """Send the subscription request to the server
        """
-        if stream_name not in STREAMS_MAP:
-            raise Exception("Invalid stream name %r" % (stream_name,))
+        logger.info("[%s] Subscribing to replication streams", self.id())

-        logger.info(
-            "[%s] Subscribing to replication stream: %r from %r",
-            self.id(),
-            stream_name,
-            token,
-        )
-
-        self.streams_connecting.add(stream_name)
-
-        self.send_command(ReplicateCommand(stream_name, token))
-
-    def on_connection_closed(self):
-        BaseReplicationStreamProtocol.on_connection_closed(self)
-        self.handler.update_connection(None)
+        self.send_command(ReplicateCommand())


 # The following simply registers metrics for the replication connections
--- a/synapse/replication/tcp/redis.py
+++ b/synapse/replication/tcp/redis.py
@@ -0,0 +1,158 @@
+# -*- coding: utf-8 -*-
+# Copyright 2020 The Matrix.org Foundation C.I.C.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import logging
+
+import txredisapi
+
+from synapse.logging.context import PreserveLoggingContext
+from synapse.metrics.background_process_metrics import run_as_background_process
+from synapse.replication.tcp.commands import (
+    COMMAND_MAP,
+    Command,
+    RdataCommand,
+    ReplicateCommand,
+)
+from synapse.util.stringutils import random_string
+
+logger = logging.getLogger(__name__)
+
+
+class RedisSubscriber(txredisapi.SubscriberProtocol):
+    """Connection to redis subscribed to replication stream.
+    """
+
+    def connectionMade(self):
+        logger.info("Connected to redis instance")
+        self.subscribe(self.stream_name)
+        self.send_command(ReplicateCommand())
+
+        self.handler.new_connection(self)
+
+    def messageReceived(self, pattern: str, channel: str, message: str):
+        """Received a message from redis.
+        """
+
+        if message.strip() == "":
+            # Ignore blank lines
+            return
+
+        line = message
+        cmd_name, rest_of_line = line.split(" ", 1)
+
+        cmd_cls = COMMAND_MAP[cmd_name]
+        try:
+            cmd = cmd_cls.from_line(rest_of_line)
+        except Exception as e:
+            logger.exception(
+                "[%s] failed to parse line %r: %r", self.id(), cmd_name, rest_of_line
+            )
+            self.send_error(
+                "failed to parse line for %r: %r (%r)" % (cmd_name, e, rest_of_line)
+            )
+            return
+
+        # Now let's try to call the on_<CMD_NAME> function
+        run_as_background_process(
+            "replication-" + cmd.get_logcontext_id(), self.handle_command, cmd
+        )
+
+    async def handle_command(self, cmd: Command):
+        """Handle a command we have received over the replication stream.
+
+        By default delegates to on_<COMMAND>, which should return an awaitable.
+
+        Args:
+            cmd: received command
+        """
+        handled = False
+
+        # First call any command handlers on this instance. These are for
+        # redis-specific handling.
+        cmd_func = getattr(self, "on_%s" % (cmd.NAME,), None)
+        if cmd_func:
+            await cmd_func(cmd)
+            handled = True
+
+        # Then call out to the handler.
+        cmd_func = getattr(self.handler, "on_%s" % (cmd.NAME,), None)
+        if cmd_func:
+            await cmd_func(cmd)
+            handled = True
+
+        if not handled:
+            logger.warning("Unhandled command: %r", cmd)
+
+    def connectionLost(self, reason):
+        logger.info("Lost connection to redis instance")
+        self.handler.lost_connection(self)
+
+    def send_command(self, cmd):
+        """Publish a command to the redis replication channel.
+
+        Args:
+            cmd (Command)
+        """
+        string = "%s %s" % (cmd.NAME, cmd.to_line())
+        if "\n" in string:
+            raise Exception("Unexpected newline in command: %r" % (string,))
+
+        encoded_string = string.encode("utf-8")
+
+        async def _send():
+            with PreserveLoggingContext():
+                await self.redis_connection.publish(self.stream_name, encoded_string)
+
+        run_as_background_process("send-cmd", _send)
+
+    def stream_update(self, stream_name, token, data):
+        """Called when a new update is available to stream to clients.
+
+        The update is published to redis unconditionally; no per-client check is done.
+        """
+        self.send_command(RdataCommand(stream_name, "master", token, data))
+
+
+class RedisFactory(txredisapi.SubscriberFactory):
+
+    maxDelay = 5
+    continueTrying = True
+    protocol = RedisSubscriber
+
+    def __init__(self, hs):
+        super(RedisFactory, self).__init__()
+
+        self.password = hs.config.redis.redis_password
+
+        self.handler = hs.get_tcp_replication()
+        self.stream_name = hs.hostname
+
+        self.redis_connection = txredisapi.lazyConnection(
+            host=hs.config.redis_host,
+            port=hs.config.redis_port,
+            dbid=hs.config.redis_dbid,
+            password=hs.config.redis.redis_password,
+            reconnect=True,
+        )
+
+        self.conn_id = random_string(5)
+
+    def buildProtocol(self, addr):
+        p = super(RedisFactory, self).buildProtocol(addr)
+        p.handler = self.handler
+        p.redis_connection = self.redis_connection
+        p.conn_id = self.conn_id
+        p.stream_name = self.stream_name
+        return p
--- a/synapse/replication/tcp/resource.py
+++ b/synapse/replication/tcp/resource.py
@@ -17,32 +17,21 @@

 import logging
 import random
-from typing import Any, List
-
-from six import itervalues
+from typing import Dict, List

 from prometheus_client import Counter

 from twisted.internet.protocol import Factory

-from synapse.metrics import LaterGauge
 from synapse.metrics.background_process_metrics import run_as_background_process
-from synapse.util.metrics import Measure, measure_func
-
-from .protocol import ServerReplicationStreamProtocol
-from .streams import STREAMS_MAP
-from .streams.federation import FederationStream
+from synapse.replication.tcp.protocol import ServerReplicationStreamProtocol
+from synapse.replication.tcp.streams import STREAMS_MAP, Stream, TypingStream
+from synapse.replication.tcp.streams.federation import FederationStream
+from synapse.util.metrics import Measure

 stream_updates_counter = Counter(
    "synapse_replication_tcp_resource_stream_updates", "", ["stream_name"]
 )
-user_sync_counter = Counter("synapse_replication_tcp_resource_user_sync", "")
-federation_ack_counter = Counter("synapse_replication_tcp_resource_federation_ack", "")
-remove_pusher_counter = Counter("synapse_replication_tcp_resource_remove_pusher", "")
-invalidate_cache_counter = Counter(
-    "synapse_replication_tcp_resource_invalidate_cache", ""
-)
-user_ip_cache_counter = Counter("synapse_replication_tcp_resource_user_ip_cache", "")

 logger = logging.getLogger(__name__)

@@ -52,13 +41,18 @@ class ReplicationStreamProtocolFactory(Factory):
    """

    def __init__(self, hs):
-        self.streamer = ReplicationStreamer(hs)
+        self.handler = hs.get_tcp_replication()
        self.clock = hs.get_clock()
        self.server_name = hs.config.server_name
+        self.hs = hs
+
+        # Ensure the replication streamer is started if we register a
+        # replication server endpoint.
+        hs.get_replication_streamer()

    def buildProtocol(self, addr):
        return ServerReplicationStreamProtocol(
-            self.server_name, self.clock, self.streamer
+            self.hs, self.server_name, self.clock, self.handler
        )


@@ -71,67 +65,43 @@ class ReplicationStreamer(object):

    def __init__(self, hs):
        self.store = hs.get_datastore()
-        self.presence_handler = hs.get_presence_handler()
        self.clock = hs.get_clock()
        self.notifier = hs.get_notifier()
-        self._server_notices_sender = hs.get_server_notices_sender()

        self._replication_torture_level = hs.config.replication_torture_level

-        # Current connections.
-        self.connections = []  # type: List[ServerReplicationStreamProtocol]
+        # Work out list of streams that this instance is the source of.
+        self.streams = []  # type: List[Stream]
+        if hs.config.worker_app is None:
+            for stream in STREAMS_MAP.values():
+                if stream == FederationStream:
+                    continue

-        LaterGauge(
-            "synapse_replication_tcp_resource_total_connections",
-            "",
-            [],
-            lambda: len(self.connections),
-        )
+                if stream == TypingStream:
+                    continue

-        # List of streams that clients can subscribe to.
-        # We only support federation stream if federation sending hase been
-        # disabled on the master.
-        self.streams = [
-            stream(hs)
-            for stream in itervalues(STREAMS_MAP)
-            if stream != FederationStream or not hs.config.send_federation
-        ]
+                self.streams.append(stream(hs))
+
+        if hs.config.server.handle_typing:
+            self.streams.append(TypingStream(hs))
+
+        # We always add the federation stream
+        self.streams.append(FederationStream(hs))

        self.streams_by_name = {stream.NAME: stream for stream in self.streams}

-        LaterGauge(
-            "synapse_replication_tcp_resource_connections_per_stream",
-            "",
-            ["stream_name"],
-            lambda: {
-                (stream_name,): len(
-                    [
-                        conn
-                        for conn in self.connections
-                        if stream_name in conn.replication_streams
-                    ]
-                )
-                for stream_name in self.streams_by_name
-            },
-        )
-
-        self.federation_sender = None
-        if not hs.config.send_federation:
-            self.federation_sender = hs.get_federation_sender()
-
        self.notifier.add_replication_callback(self.on_notifier_poke)
-        self.notifier.add_remote_server_up_callback(self.send_remote_server_up)

        # Keeps track of whether we are currently checking for updates
        self.is_looping = False
        self.pending_updates = False

-        hs.get_reactor().addSystemEventTrigger("before", "shutdown", self.on_shutdown)
+        self.client = hs.get_tcp_replication()

-    def on_shutdown(self):
-        # close all connections on shutdown
-        for conn in self.connections:
-            conn.send_error("server shutting down")
+    def get_streams(self) -> Dict[str, Stream]:
+        """Get a map from stream name to stream instance.
+        """
+        return self.streams_by_name

    def on_notifier_poke(self):
        """Checks if there is actually any new data and sends it to the
@@ -140,7 +110,7 @@ class ReplicationStreamer(object):
        This should get called each time new data is available, even if it
        is currently being executed, so that nothing gets missed
        """
-        if not self.connections:
+        if not self.client.connected():
            # Don't bother if nothing is listening. We still need to advance
            # the stream tokens otherwise they'll fall beihind forever
            for stream in self.streams:
@@ -166,11 +136,6 @@ class ReplicationStreamer(object):
                self.pending_updates = False

                with Measure(self.clock, "repl.stream.get_updates"):
-                    # First we tell the streams that they should update their
-                    # current tokens.
-                    for stream in self.streams:
-                        stream.advance_current_token()
-
                    all_streams = self.streams

                    if self._replication_torture_level is not None:
@@ -180,7 +145,7 @@ class ReplicationStreamer(object):
                        random.shuffle(all_streams)

                    for stream in all_streams:
-                        if stream.last_token == stream.upto_token:
+                        if stream.last_token == stream.current_token():
                            continue

                        if self._replication_torture_level:
@@ -192,18 +157,17 @@ class ReplicationStreamer(object):
                            "Getting stream: %s: %s -> %s",
                            stream.NAME,
                            stream.last_token,
-                            stream.upto_token,
+                            stream.current_token(),
                        )
                        try:
-                            updates, current_token = await stream.get_updates()
+                            updates, current_token, limited = await stream.get_updates()
+                            self.pending_updates |= limited
                        except Exception:
                            logger.info("Failed to handle stream %s", stream.NAME)
                            raise

                        logger.debug(
-                            "Sending %d updates to %d connections",
-                            len(updates),
-                            len(self.connections),
+                            "Sending %d updates", len(updates),
                        )

                        if updates:
@@ -219,116 +183,17 @@ class ReplicationStreamer(object):
                        # token. See RdataCommand for more details.
                        batched_updates = _batch_updates(updates)

-                        for conn in self.connections:
-                            for token, row in batched_updates:
-                                try:
-                                    conn.stream_update(stream.NAME, token, row)
-                                except Exception:
-                                    logger.exception("Failed to replicate")
+                        for token, row in batched_updates:
+                            try:
+                                self.client.stream_update(stream.NAME, token, row)
+                            except Exception:
+                                logger.exception("Failed to replicate")

            logger.debug("No more pending updates, breaking poke loop")
        finally:
            self.pending_updates = False
            self.is_looping = False

-    @measure_func("repl.get_stream_updates")
-    async def get_stream_updates(self, stream_name, token):
-        """For a given stream get all updates since token. This is called when
-        a client first subscribes to a stream.
-        """
-        stream = self.streams_by_name.get(stream_name, None)
-        if not stream:
-            raise Exception("unknown stream %s", stream_name)
-
-        return await stream.get_updates_since(token)
-
-    @measure_func("repl.federation_ack")
-    def federation_ack(self, token):
-        """We've received an ack for federation stream from a client.
-        """
-        federation_ack_counter.inc()
-        if self.federation_sender:
-            self.federation_sender.federation_ack(token)
-
-    @measure_func("repl.on_user_sync")
-    async def on_user_sync(self, conn_id, user_id, is_syncing, last_sync_ms):
-        """A client has started/stopped syncing on a worker.
-        """
-        user_sync_counter.inc()
-        await self.presence_handler.update_external_syncs_row(
-            conn_id, user_id, is_syncing, last_sync_ms
-        )
-
-    @measure_func("repl.on_remove_pusher")
-    async def on_remove_pusher(self, app_id, push_key, user_id):
-        """A client has asked us to remove a pusher
-        """
-        remove_pusher_counter.inc()
-        await self.store.delete_pusher_by_app_id_pushkey_user_id(
-            app_id=app_id, pushkey=push_key, user_id=user_id
-        )
-
-        self.notifier.on_new_replication_data()
-
-    @measure_func("repl.on_invalidate_cache")
-    async def on_invalidate_cache(self, cache_func: str, keys: List[Any]):
-        """The client has asked us to invalidate a cache
-        """
-        invalidate_cache_counter.inc()
-
-        # We invalidate the cache locally, but then also stream that to other
-        # workers.
-        await self.store.invalidate_cache_and_stream(cache_func, tuple(keys))
-
-    @measure_func("repl.on_user_ip")
-    async def on_user_ip(
-        self, user_id, access_token, ip, user_agent, device_id, last_seen
-    ):
-        """The client saw a user request
-        """
-        user_ip_cache_counter.inc()
-        await self.store.insert_client_ip(
-            user_id, access_token, ip, user_agent, device_id, last_seen
-        )
-        await self._server_notices_sender.on_user_ip(user_id)
-
-    @measure_func("repl.on_remote_server_up")
-    def on_remote_server_up(self, server: str):
-        self.notifier.notify_remote_server_up(server)
-
-    def send_remote_server_up(self, server: str):
-        for conn in self.connections:
-            conn.send_remote_server_up(server)
-
-    def send_sync_to_all_connections(self, data):
-        """Sends a SYNC command to all clients.
-
-        Used in tests.
-        """
-        for conn in self.connections:
-            conn.send_sync(data)
-
-    def new_connection(self, connection):
-        """A new client connection has been established
-        """
-        self.connections.append(connection)
-
-    def lost_connection(self, connection):
-        """A client connection has been lost
-        """
-        try:
-            self.connections.remove(connection)
-        except ValueError:
-            pass
-
-        # We need to tell the presence handler that the connection has been
-        # lost so that it can handle any ongoing syncs on that connection.
-        run_as_background_process(
-            "update_external_syncs_clear",
-            self.presence_handler.update_external_syncs_clear,
-            connection.conn_id,
-        )
-

 def _batch_updates(updates):
    """Takes a list of updates of form [(token, row)] and sets the token to
--- a/synapse/replication/tcp/streams/__init__.py
+++ b/synapse/replication/tcp/streams/__init__.py
@@ -25,26 +25,66 @@ Each stream is defined by the following information:
    update_function:    The function that returns a list of updates between two tokens
 """

-from . import _base, events, federation
+from typing import Dict, Type
+
+from synapse.replication.tcp.streams._base import (
+    AccountDataStream,
+    BackfillStream,
+    CachesStream,
+    DeviceListsStream,
+    GroupServerStream,
+    PresenceStream,
+    PublicRoomsStream,
+    PushersStream,
+    PushRulesStream,
+    ReceiptsStream,
+    Stream,
+    TagAccountDataStream,
+    ToDeviceStream,
+    TypingStream,
+    UserSignatureStream,
+)
+from synapse.replication.tcp.streams.events import EventsStream
+from synapse.replication.tcp.streams.federation import FederationStream

 STREAMS_MAP = {
    stream.NAME: stream
    for stream in (
-        events.EventsStream,
-        _base.BackfillStream,
-        _base.PresenceStream,
-        _base.TypingStream,
-        _base.ReceiptsStream,
-        _base.PushRulesStream,
-        _base.PushersStream,
-        _base.CachesStream,
-        _base.PublicRoomsStream,
-        _base.DeviceListsStream,
-        _base.ToDeviceStream,
-        federation.FederationStream,
-        _base.TagAccountDataStream,
-        _base.AccountDataStream,
-        _base.GroupServerStream,
-        _base.UserSignatureStream,
+        EventsStream,
+        BackfillStream,
+        PresenceStream,
+        TypingStream,
+        ReceiptsStream,
+        PushRulesStream,
+        PushersStream,
+        CachesStream,
+        PublicRoomsStream,
+        DeviceListsStream,
+        ToDeviceStream,
+        FederationStream,
+        TagAccountDataStream,
+        AccountDataStream,
+        GroupServerStream,
+        UserSignatureStream,
    )
-}
+}  # type: Dict[str, Type[Stream]]
+
+
+__all__ = [
+    "STREAMS_MAP",
+    "Stream",
+    "BackfillStream",
+    "PresenceStream",
+    "TypingStream",
+    "ReceiptsStream",
+    "PushRulesStream",
+    "PushersStream",
+    "CachesStream",
+    "PublicRoomsStream",
+    "DeviceListsStream",
+    "ToDeviceStream",
+    "TagAccountDataStream",
+    "AccountDataStream",
+    "GroupServerStream",
+    "UserSignatureStream",
+]
--- a/synapse/replication/tcp/streams/_base.py
+++ b/synapse/replication/tcp/streams/_base.py
@@ -14,114 +14,40 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-import itertools
 import logging
 from collections import namedtuple
-from typing import Any, List, Optional
+from typing import Any, Awaitable, Callable, List, Optional, Tuple

 import attr

+from synapse.replication.http.streams import ReplicationGetStreamUpdates
+from synapse.types import JsonDict
+
 logger = logging.getLogger(__name__)


 MAX_EVENTS_BEHIND = 500000

-BackfillStreamRow = namedtuple(
-    "BackfillStreamRow",
-    (
-        "event_id",  # str
-        "room_id",  # str
-        "type",  # str
-        "state_key",  # str, optional
-        "redacts",  # str, optional
-        "relates_to",  # str, optional
-    ),
-)
-PresenceStreamRow = namedtuple(
-    "PresenceStreamRow",
-    (
-        "user_id",  # str
-        "state",  # str
-        "last_active_ts",  # int
-        "last_federation_update_ts",  # int
-        "last_user_sync_ts",  # int
-        "status_msg",  # str
-        "currently_active",  # bool
-    ),
-)
-TypingStreamRow = namedtuple(
-    "TypingStreamRow", ("room_id", "user_ids")  # str  # list(str)
-)
-ReceiptsStreamRow = namedtuple(
-    "ReceiptsStreamRow",
-    (
-        "room_id",  # str
-        "receipt_type",  # str
-        "user_id",  # str
-        "event_id",  # str
-        "data",  # dict
-    ),
-)
-PushRulesStreamRow = namedtuple("PushRulesStreamRow", ("user_id",))  # str
-PushersStreamRow = namedtuple(
-    "PushersStreamRow",
-    ("user_id", "app_id", "pushkey", "deleted"),  # str  # str  # str  # bool
-)

+# Some type aliases to make things a bit easier.

-@attr.s
-class CachesStreamRow:
-    """Stream to inform workers they should invalidate their cache.
+# A stream position token
+Token = int

-    Attributes:
-        cache_func: Name of the cached function.
-        keys: The entry in the cache to invalidate. If None then will
-            invalidate all.
-        invalidation_ts: Timestamp of when the invalidation took place.
-    """
-
-    cache_func = attr.ib(type=str)
-    keys = attr.ib(type=Optional[List[Any]])
-    invalidation_ts = attr.ib(type=int)
-
-
-PublicRoomsStreamRow = namedtuple(
-    "PublicRoomsStreamRow",
-    (
-        "room_id",  # str
-        "visibility",  # str
-        "appservice_id",  # str, optional
-        "network_id",  # str, optional
-    ),
-)
-DeviceListsStreamRow = namedtuple(
-    "DeviceListsStreamRow", ("user_id", "destination")  # str  # str
-)
-ToDeviceStreamRow = namedtuple("ToDeviceStreamRow", ("entity",))  # str
-TagAccountDataStreamRow = namedtuple(
-    "TagAccountDataStreamRow", ("user_id", "room_id", "data")  # str  # str  # dict
-)
-AccountDataStreamRow = namedtuple(
-    "AccountDataStream", ("user_id", "room_id", "data_type")  # str  # str  # str
-)
-GroupsStreamRow = namedtuple(
-    "GroupsStreamRow",
-    ("group_id", "user_id", "type", "content"),  # str  # str  # str  # dict
-)
-UserSignatureStreamRow = namedtuple("UserSignatureStreamRow", ("user_id"))  # str
+# A pair of position in stream and args used to create an instance of `ROW_TYPE`.
+StreamRow = Tuple[Token, tuple]


 class Stream(object):
    """Base class for the streams.

    Provides a `get_updates()` function that returns new updates since the last
-    time it was called up until the point `advance_current_token` was called.
+    time it was called.
    """

    NAME = None  # type: str  # The name of the stream
    # The type of the row. Used by the default impl of parse_row.
    ROW_TYPE = None  # type: Any
-    _LIMITED = True  # Whether the update function takes a limit

    @classmethod
    def parse_row(cls, row):
@@ -139,80 +65,58 @@ class Stream(object):
        return cls.ROW_TYPE(*row)

    def __init__(self, hs):
+
        # The token from which we last asked for updates
        self.last_token = self.current_token()

-        # The token that we will get updates up to
-        self.upto_token = self.current_token()
-
-    def advance_current_token(self):
-        """Updates `upto_token` to "now", which updates up until which point
-        get_updates[_since] will fetch rows till.
-        """
-        self.upto_token = self.current_token()
+        self.local_instance_name = hs.config.worker_name or "master"

    def discard_updates_and_advance(self):
        """Called when the stream should advance but the updates would be discarded,
        e.g. when there are no currently connected workers.
        """
-        self.upto_token = self.current_token()
-        self.last_token = self.upto_token
+        self.last_token = self.current_token()

-    async def get_updates(self):
+    async def get_updates(self) -> Tuple[List[Tuple[Token, JsonDict]], Token, bool]:
        """Gets all updates since the last time this function was called (or
-        since the stream was constructed if it hadn't been called before),
-        until the `upto_token`
+        since the stream was constructed if it hadn't been called before).

        Returns:
-            Deferred[Tuple[List[Tuple[int, Any]], int]:
-                Resolves to a pair ``(updates, current_token)``, where ``updates`` is a
-                list of ``(token, row)`` entries. ``row`` will be json-serialised and
-                sent over the replication steam.
+            A triplet `(updates, new_last_token, limited)`, where `updates` is
+            a list of `(token, row)` entries, `new_last_token` is the new
+            position in stream, and `limited` is whether there are more updates
+            to fetch.
        """
-        updates, current_token = await self.get_updates_since(self.last_token)
+        current_token = self.current_token()
+        updates, current_token, limited = await self.get_updates_since(
+            self.local_instance_name, self.last_token, current_token
+        )
        self.last_token = current_token

-        return updates, current_token
+        return updates, current_token, limited

-    async def get_updates_since(self, from_token):
+    async def get_updates_since(
+        self, instance_name: str, from_token: Token, upto_token: Token, limit: int = 100
+    ) -> Tuple[List[Tuple[Token, JsonDict]], Token, bool]:
        """Like get_updates except allows specifying from when we should
        stream updates

        Returns:
-            Deferred[Tuple[List[Tuple[int, Any]], int]:
-                Resolves to a pair ``(updates, current_token)``, where ``updates`` is a
-                list of ``(token, row)`` entries. ``row`` will be json-serialised and
-                sent over the replication steam.
+            A triplet `(updates, new_last_token, limited)`, where `updates` is
+            a list of `(token, row)` entries, `new_last_token` is the new
+            position in stream, and `limited` is whether there are more updates
+            to fetch.
        """
-        if from_token in ("NOW", "now"):
-            return [], self.upto_token
-
-        current_token = self.upto_token

        from_token = int(from_token)

-        if from_token == current_token:
-            return [], current_token
+        if from_token == upto_token:
+            return [], upto_token, False

-        logger.info("get_updates_since: %s", self.__class__)
-        if self._LIMITED:
-            rows = await self.update_function(
-                from_token, current_token, limit=MAX_EVENTS_BEHIND + 1
-            )
-
-            # never turn more than MAX_EVENTS_BEHIND + 1 into updates.
-            rows = itertools.islice(rows, MAX_EVENTS_BEHIND + 1)
-        else:
-            rows = await self.update_function(from_token, current_token)
-
-        updates = [(row[0], row[1:]) for row in rows]
-
-        # check we didn't get more rows than the limit.
-        # doing it like this allows the update_function to be a generator.
-        if self._LIMITED and len(updates) >= MAX_EVENTS_BEHIND:
-            raise Exception("stream %s has fallen behind" % (self.NAME))
-
-        return updates, current_token
+        updates, upto_token, limited = await self.update_function(
+            instance_name, from_token, upto_token, limit=limit,
+        )
+        return updates, upto_token, limited

    def current_token(self):
        """Gets the current token of the underlying streams. Should be provided
@@ -223,9 +127,8 @@ class Stream(object):
        """
        raise NotImplementedError()

-    def update_function(self, from_token, current_token, limit=None):
-        """Get updates between from_token and to_token. If Stream._LIMITED is
-        True then limit is provided, otherwise it's not.
+    def update_function(self, instance_name, from_token, upto_token, limit):
+        """Get updates between from_token and upto_token.

        Returns:
            Deferred(list(tuple)): the first entry in the tuple is the token for
@@ -235,52 +138,145 @@ class Stream(object):
        raise NotImplementedError()


+def db_query_to_update_function(
+    query_function: Callable[[Token, Token, int], Awaitable[List[tuple]]]
+) -> Callable[[str, Token, Token, int], Awaitable[Tuple[List[StreamRow], Token, bool]]]:
+    """Wraps a db query function which returns a list of rows to make it
+    suitable for use as an `update_function` for the Stream class
+    """
+
+    async def update_function(instance_name, from_token, upto_token, limit):
+        rows = await query_function(from_token, upto_token, limit)
+        updates = [(row[0], row[1:]) for row in rows]
+        limited = False
+        if len(updates) == limit:
+            upto_token = rows[-1][0]
+            limited = True
+
+        return updates, upto_token, limited
+
+    return update_function
+
+
+def make_http_update_function(
+    hs, stream_name: str
+) -> Callable[[str, Token, Token, int], Awaitable[Tuple[List[StreamRow], Token, bool]]]:
+    """Makes a suitable function for use as an `update_function` that queries
+    the master process for updates.
+    """
+
+    client = ReplicationGetStreamUpdates.make_client(hs)
+
+    async def update_function(
+        instance_name: str, from_token: int, upto_token: int, limit: int
+    ) -> Tuple[List[Tuple[int, tuple]], int, bool]:
+        return await client(
+            instance_name=instance_name,
+            stream_name=stream_name,
+            from_token=from_token,
+            upto_token=upto_token,
+            limit=limit,
+        )
+
+    return update_function
+
+
 class BackfillStream(Stream):
    """We fetched some old events and either we had never seen that event before
    or it went from being an outlier to not.
    """

+    BackfillStreamRow = namedtuple(
+        "BackfillStreamRow",
+        (
+            "event_id",  # str
+            "room_id",  # str
+            "type",  # str
+            "state_key",  # str, optional
+            "redacts",  # str, optional
+            "relates_to",  # str, optional
+        ),
+    )
+
    NAME = "backfill"
    ROW_TYPE = BackfillStreamRow

    def __init__(self, hs):
        store = hs.get_datastore()
        self.current_token = store.get_current_backfill_token  # type: ignore
-        self.update_function = store.get_all_new_backfill_event_rows  # type: ignore
+        self.update_function = db_query_to_update_function(store.get_all_new_backfill_event_rows)  # type: ignore

        super(BackfillStream, self).__init__(hs)


 class PresenceStream(Stream):
+    PresenceStreamRow = namedtuple(
+        "PresenceStreamRow",
+        (
+            "user_id",  # str
+            "state",  # str
+            "last_active_ts",  # int
+            "last_federation_update_ts",  # int
+            "last_user_sync_ts",  # int
+            "status_msg",  # str
+            "currently_active",  # bool
+        ),
+    )
+
    NAME = "presence"
-    _LIMITED = False
    ROW_TYPE = PresenceStreamRow

    def __init__(self, hs):
        store = hs.get_datastore()
        presence_handler = hs.get_presence_handler()

+        self._is_worker = hs.config.worker_app is not None
+
        self.current_token = store.get_current_presence_token  # type: ignore
-        self.update_function = presence_handler.get_all_presence_updates  # type: ignore
+
+        if hs.config.worker_app is None:
+            self.update_function = db_query_to_update_function(presence_handler.get_all_presence_updates)  # type: ignore
+        else:
+            # Query master process
+            self.update_function = make_http_update_function(hs, self.NAME)  # type: ignore

        super(PresenceStream, self).__init__(hs)


 class TypingStream(Stream):
+    TypingStreamRow = namedtuple(
+        "TypingStreamRow", ("room_id", "user_ids")  # str  # list(str)
+    )
+
    NAME = "typing"
-    _LIMITED = False
    ROW_TYPE = TypingStreamRow

    def __init__(self, hs):
        typing_handler = hs.get_typing_handler()

        self.current_token = typing_handler.get_current_token  # type: ignore
-        self.update_function = typing_handler.get_all_typing_updates  # type: ignore
+
+        if hs.config.handle_typing:
+            self.update_function = db_query_to_update_function(typing_handler.get_all_typing_updates)  # type: ignore
+        else:
+            # Query master process
+            self.update_function = make_http_update_function(hs, self.NAME)  # type: ignore

        super(TypingStream, self).__init__(hs)


 class ReceiptsStream(Stream):
+    ReceiptsStreamRow = namedtuple(
+        "ReceiptsStreamRow",
+        (
+            "room_id",  # str
+            "receipt_type",  # str
+            "user_id",  # str
+            "event_id",  # str
+            "data",  # dict
+        ),
+    )
+
    NAME = "receipts"
    ROW_TYPE = ReceiptsStreamRow

@@ -288,7 +284,7 @@ class ReceiptsStream(Stream):
        store = hs.get_datastore()

        self.current_token = store.get_max_receipt_stream_id  # type: ignore
-        self.update_function = store.get_all_updated_receipts  # type: ignore
+        self.update_function = db_query_to_update_function(store.get_all_updated_receipts)  # type: ignore

        super(ReceiptsStream, self).__init__(hs)

@@ -297,6 +293,8 @@ class PushRulesStream(Stream):
    """A user has changed their push rules
    """

+    PushRulesStreamRow = namedtuple("PushRulesStreamRow", ("user_id",))  # str
+
    NAME = "push_rules"
    ROW_TYPE = PushRulesStreamRow

@@ -310,13 +308,24 @@ class PushRulesStream(Stream):

    async def update_function(self, from_token, to_token, limit):
        rows = await self.store.get_all_push_rule_updates(from_token, to_token, limit)
-        return [(row[0], row[2]) for row in rows]
+
+        limited = False
+        if len(rows) == limit:
+            to_token = rows[-1][0]
+            limited = True
+
+        return [(row[0], (row[2],)) for row in rows], to_token, limited


 class PushersStream(Stream):
    """A user has added/changed/removed a pusher
    """

+    PushersStreamRow = namedtuple(
+        "PushersStreamRow",
+        ("user_id", "app_id", "pushkey", "deleted"),  # str  # str  # str  # bool
+    )
+
    NAME = "pushers"
    ROW_TYPE = PushersStreamRow

@@ -324,7 +333,7 @@ class PushersStream(Stream):
        store = hs.get_datastore()

        self.current_token = store.get_pushers_stream_token  # type: ignore
-        self.update_function = store.get_all_updated_pushers_rows  # type: ignore
+        self.update_function = db_query_to_update_function(store.get_all_updated_pushers_rows)  # type: ignore

        super(PushersStream, self).__init__(hs)

@@ -334,6 +343,21 @@ class CachesStream(Stream):
    the cache on the workers
    """

+    @attr.s
+    class CachesStreamRow:
+        """Stream to inform workers they should invalidate their cache.
+
+        Attributes:
+            cache_func: Name of the cached function.
+            keys: The entry in the cache to invalidate. If None then will
+                invalidate all.
+            invalidation_ts: Timestamp of when the invalidation took place.
+        """
+
+        cache_func = attr.ib(type=str)
+        keys = attr.ib(type=Optional[List[Any]])
+        invalidation_ts = attr.ib(type=int)
+
    NAME = "caches"
    ROW_TYPE = CachesStreamRow

@@ -341,7 +365,7 @@ class CachesStream(Stream):
        store = hs.get_datastore()

        self.current_token = store.get_cache_stream_token  # type: ignore
-        self.update_function = store.get_all_updated_caches  # type: ignore
+        self.update_function = db_query_to_update_function(store.get_all_updated_caches)  # type: ignore

        super(CachesStream, self).__init__(hs)

@@ -350,6 +374,16 @@ class PublicRoomsStream(Stream):
    """The public rooms list changed
    """

+    PublicRoomsStreamRow = namedtuple(
+        "PublicRoomsStreamRow",
+        (
+            "room_id",  # str
+            "visibility",  # str
+            "appservice_id",  # str, optional
+            "network_id",  # str, optional
+        ),
+    )
+
    NAME = "public_rooms"
    ROW_TYPE = PublicRoomsStreamRow

@@ -357,24 +391,28 @@ class PublicRoomsStream(Stream):
        store = hs.get_datastore()

        self.current_token = store.get_current_public_room_stream_id  # type: ignore
-        self.update_function = store.get_all_new_public_rooms  # type: ignore
+        self.update_function = db_query_to_update_function(store.get_all_new_public_rooms)  # type: ignore

        super(PublicRoomsStream, self).__init__(hs)


 class DeviceListsStream(Stream):
-    """Someone added/changed/removed a device
+    """Either a user has updated their devices or a remote server needs to be
+    told about a device update.
    """

+    @attr.s
+    class DeviceListsStreamRow:
+        entity = attr.ib(type=str)
+
    NAME = "device_lists"
-    _LIMITED = False
    ROW_TYPE = DeviceListsStreamRow

    def __init__(self, hs):
        store = hs.get_datastore()

        self.current_token = store.get_device_stream_token  # type: ignore
-        self.update_function = store.get_all_device_list_changes_for_remotes  # type: ignore
+        self.update_function = db_query_to_update_function(store.get_all_device_list_changes_for_remotes)  # type: ignore

        super(DeviceListsStream, self).__init__(hs)

@@ -383,6 +421,8 @@ class ToDeviceStream(Stream):
    """New to_device messages for a client
    """

+    ToDeviceStreamRow = namedtuple("ToDeviceStreamRow", ("entity",))  # str
+
    NAME = "to_device"
    ROW_TYPE = ToDeviceStreamRow

@@ -390,7 +430,7 @@ class ToDeviceStream(Stream):
        store = hs.get_datastore()

        self.current_token = store.get_to_device_stream_token  # type: ignore
-        self.update_function = store.get_all_new_device_messages  # type: ignore
+        self.update_function = db_query_to_update_function(store.get_all_new_device_messages)  # type: ignore

        super(ToDeviceStream, self).__init__(hs)

@@ -399,6 +439,10 @@ class TagAccountDataStream(Stream):
    """Someone added/removed a tag for a room
    """

+    TagAccountDataStreamRow = namedtuple(
+        "TagAccountDataStreamRow", ("user_id", "room_id", "data")  # str  # str  # dict
+    )
+
    NAME = "tag_account_data"
    ROW_TYPE = TagAccountDataStreamRow

@@ -406,7 +450,7 @@ class TagAccountDataStream(Stream):
        store = hs.get_datastore()

        self.current_token = store.get_max_account_data_stream_id  # type: ignore
-        self.update_function = store.get_all_updated_tags  # type: ignore
+        self.update_function = db_query_to_update_function(store.get_all_updated_tags)  # type: ignore

        super(TagAccountDataStream, self).__init__(hs)

@@ -415,6 +459,10 @@ class AccountDataStream(Stream):
    """Global or per room account data was changed
    """

+    AccountDataStreamRow = namedtuple(
+        "AccountDataStreamRow", ("user_id", "room_id", "data_type")  # str  # str  # str
+    )
+
    NAME = "account_data"
    ROW_TYPE = AccountDataStreamRow

@@ -422,10 +470,11 @@ class AccountDataStream(Stream):
        self.store = hs.get_datastore()

        self.current_token = self.store.get_max_account_data_stream_id  # type: ignore
+        self.update_function = db_query_to_update_function(self._update_function)  # type: ignore

        super(AccountDataStream, self).__init__(hs)

-    async def update_function(self, from_token, to_token, limit):
+    async def _update_function(self, from_token, to_token, limit):
        global_results, room_results = await self.store.get_all_updated_account_data(
            from_token, from_token, to_token, limit
        )
@@ -440,6 +489,11 @@ class AccountDataStream(Stream):


 class GroupServerStream(Stream):
+    GroupsStreamRow = namedtuple(
+        "GroupsStreamRow",
+        ("group_id", "user_id", "type", "content"),  # str  # str  # str  # dict
+    )
+
    NAME = "groups"
    ROW_TYPE = GroupsStreamRow

@@ -447,7 +501,7 @@ class GroupServerStream(Stream):
        store = hs.get_datastore()

        self.current_token = store.get_group_stream_token  # type: ignore
-        self.update_function = store.get_all_groups_changes  # type: ignore
+        self.update_function = db_query_to_update_function(store.get_all_groups_changes)  # type: ignore

        super(GroupServerStream, self).__init__(hs)

@@ -456,14 +510,15 @@ class UserSignatureStream(Stream):
    """A user has signed their own device with their user-signing key
    """

+    UserSignatureStreamRow = namedtuple("UserSignatureStreamRow", ("user_id",))  # str
+
    NAME = "user_signature"
-    _LIMITED = False
    ROW_TYPE = UserSignatureStreamRow

    def __init__(self, hs):
        store = hs.get_datastore()

        self.current_token = store.get_device_stream_token  # type: ignore
-        self.update_function = store.get_all_user_signature_changes_for_remotes  # type: ignore
+        self.update_function = db_query_to_update_function(store.get_all_user_signature_changes_for_remotes)  # type: ignore

        super(UserSignatureStream, self).__init__(hs)
--- a/synapse/replication/tcp/streams/events.py
+++ b/synapse/replication/tcp/streams/events.py
@@ -19,7 +19,7 @@ from typing import Tuple, Type

 import attr

-from ._base import Stream
+from ._base import Stream, db_query_to_update_function


 """Handling of the 'events' replication stream
@@ -117,10 +117,11 @@ class EventsStream(Stream):
    def __init__(self, hs):
        self._store = hs.get_datastore()
        self.current_token = self._store.get_current_events_token  # type: ignore
+        self.update_function = db_query_to_update_function(self._update_function)  # type: ignore

        super(EventsStream, self).__init__(hs)

-    async def update_function(self, from_token, current_token, limit=None):
+    async def _update_function(self, from_token, current_token, limit=None):
        event_rows = await self._store.get_all_new_forward_event_rows(
            from_token, current_token, limit
        )
--- a/synapse/replication/tcp/streams/federation.py
+++ b/synapse/replication/tcp/streams/federation.py
@@ -15,15 +15,7 @@
 # limitations under the License.
 from collections import namedtuple

-from ._base import Stream
-
-FederationStreamRow = namedtuple(
-    "FederationStreamRow",
-    (
-        "type",  # str, the type of data as defined in the BaseFederationRows
-        "data",  # dict, serialization of a federation.send_queue.BaseFederationRow
-    ),
-)
+from synapse.replication.tcp.streams._base import Stream, db_query_to_update_function


 class FederationStream(Stream):
@@ -31,13 +23,24 @@ class FederationStream(Stream):
    sending disabled.
    """

+    FederationStreamRow = namedtuple(
+        "FederationStreamRow",
+        (
+            "type",  # str, the type of data as defined in the BaseFederationRows
+            "data",  # dict, serialization of a federation.send_queue.BaseFederationRow
+        ),
+    )
+
    NAME = "federation"
    ROW_TYPE = FederationStreamRow
+    _QUERY_MASTER = True

    def __init__(self, hs):
+        # Not all synapse instances will have a federation sender instance,
+        # whether that's a `FederationSender` or a `FederationRemoteSendQueue`,
+        # so we stub the stream out when that is the case.
        federation_sender = hs.get_federation_sender()
-
        self.current_token = federation_sender.get_current_token  # type: ignore
-        self.update_function = federation_sender.get_replication_rows  # type: ignore
+        self.update_function = db_query_to_update_function(federation_sender.get_replication_rows)  # type: ignore

        super(FederationStream, self).__init__(hs)
--- a/synapse/rest/client/v1/login.py
+++ b/synapse/rest/client/v1/login.py
@@ -28,7 +28,6 @@ from synapse.http.servlet import (
    parse_json_object_from_request,
    parse_string,
 )
-from synapse.push.mailer import load_jinja2_templates
 from synapse.rest.client.v2_alpha._base import client_patterns
 from synapse.rest.well_known import WellKnownBuilder
 from synapse.types import UserID, map_username_to_mxid_localpart
@@ -548,13 +547,6 @@ class SSOAuthHandler(object):
        self._registration_handler = hs.get_registration_handler()
        self._macaroon_gen = hs.get_macaroon_generator()

-        # Load the redirect page HTML template
-        self._template = load_jinja2_templates(
-            hs.config.sso_redirect_confirm_template_dir, ["sso_redirect_confirm.html"],
-        )[0]
-
-        self._server_name = hs.config.server_name
-
        # cast to tuple for use with str.startswith
        self._whitelisted_sso_clients = tuple(hs.config.sso_client_whitelist)

--- a/synapse/rest/client/v1/room.py
+++ b/synapse/rest/client/v1/room.py
@@ -816,7 +816,7 @@ class RoomTypingRestServlet(RestServlet):

        content = parse_json_object_from_request(request)

-        await self.presence_handler.bump_presence_active_time(requester.user)
+        # await self.presence_handler.bump_presence_active_time(requester.user)

        # Limit timeout to stop people from setting silly typing timeouts.
        timeout = min(content.get("timeout", 30000), 120000)
--- a/synapse/rest/client/v2_alpha/auth.py
+++ b/synapse/rest/client/v2_alpha/auth.py
@@ -142,14 +142,6 @@ class AuthRestServlet(RestServlet):
                % (CLIENT_API_PREFIX, LoginType.RECAPTCHA),
                "sitekey": self.hs.config.recaptcha_public_key,
            }
-            html_bytes = html.encode("utf8")
-            request.setResponseCode(200)
-            request.setHeader(b"Content-Type", b"text/html; charset=utf-8")
-            request.setHeader(b"Content-Length", b"%d" % (len(html_bytes),))
-
-            request.write(html_bytes)
-            finish_request(request)
-            return None
        elif stagetype == LoginType.TERMS:
            html = TERMS_TEMPLATE % {
                "session": session,
@@ -158,17 +150,19 @@ class AuthRestServlet(RestServlet):
                "myurl": "%s/r0/auth/%s/fallback/web"
                % (CLIENT_API_PREFIX, LoginType.TERMS),
            }
-            html_bytes = html.encode("utf8")
-            request.setResponseCode(200)
-            request.setHeader(b"Content-Type", b"text/html; charset=utf-8")
-            request.setHeader(b"Content-Length", b"%d" % (len(html_bytes),))
-
-            request.write(html_bytes)
-            finish_request(request)
-            return None
        else:
            raise SynapseError(404, "Unknown auth stage type")

+        # Render the HTML and return.
+        html_bytes = html.encode("utf8")
+        request.setResponseCode(200)
+        request.setHeader(b"Content-Type", b"text/html; charset=utf-8")
+        request.setHeader(b"Content-Length", b"%d" % (len(html_bytes),))
+
+        request.write(html_bytes)
+        finish_request(request)
+        return None
+
    async def on_POST(self, request, stagetype):

        session = parse_string(request, "session")
@@ -196,15 +190,6 @@ class AuthRestServlet(RestServlet):
                    % (CLIENT_API_PREFIX, LoginType.RECAPTCHA),
                    "sitekey": self.hs.config.recaptcha_public_key,
                }
-            html_bytes = html.encode("utf8")
-            request.setResponseCode(200)
-            request.setHeader(b"Content-Type", b"text/html; charset=utf-8")
-            request.setHeader(b"Content-Length", b"%d" % (len(html_bytes),))
-
-            request.write(html_bytes)
-            finish_request(request)
-
-            return None
        elif stagetype == LoginType.TERMS:
            authdict = {"session": session}

@@ -225,17 +210,19 @@ class AuthRestServlet(RestServlet):
                    "myurl": "%s/r0/auth/%s/fallback/web"
                    % (CLIENT_API_PREFIX, LoginType.TERMS),
                }
-            html_bytes = html.encode("utf8")
-            request.setResponseCode(200)
-            request.setHeader(b"Content-Type", b"text/html; charset=utf-8")
-            request.setHeader(b"Content-Length", b"%d" % (len(html_bytes),))
-
-            request.write(html_bytes)
-            finish_request(request)
-            return None
        else:
            raise SynapseError(404, "Unknown auth stage type")

+        # Render the HTML and return.
+        html_bytes = html.encode("utf8")
+        request.setResponseCode(200)
+        request.setHeader(b"Content-Type", b"text/html; charset=utf-8")
+        request.setHeader(b"Content-Length", b"%d" % (len(html_bytes),))
+
+        request.write(html_bytes)
+        finish_request(request)
+        return None
+
    def on_OPTIONS(self, _):
        return 200, {}

--- a/synapse/rest/media/v1/download_resource.py
+++ b/synapse/rest/media/v1/download_resource.py
@@ -50,6 +50,9 @@ class DownloadResource(DirectServeResource):
            b" media-src 'self';"
            b" object-src 'self';",
        )
+        request.setHeader(
+            b"Referrer-Policy", b"no-referrer",
+        )
        server_name, media_id, name = parse_media_id(request)
        if server_name == self.server_name:
            await self.media_repo.get_local_media(request, media_id, name)
--- a/synapse/rest/media/v1/media_repository.py
+++ b/synapse/rest/media/v1/media_repository.py
@@ -24,7 +24,6 @@ from six import iteritems

 import twisted.internet.error
 import twisted.web.http
-from twisted.internet import defer
 from twisted.web.resource import Resource

 from synapse.api.errors import (
@@ -114,15 +113,14 @@ class MediaRepository(object):
            "update_recently_accessed_media", self._update_recently_accessed
        )

-    @defer.inlineCallbacks
-    def _update_recently_accessed(self):
+    async def _update_recently_accessed(self):
        remote_media = self.recently_accessed_remotes
        self.recently_accessed_remotes = set()

        local_media = self.recently_accessed_locals
        self.recently_accessed_locals = set()

-        yield self.store.update_cached_last_access_time(
+        await self.store.update_cached_last_access_time(
            local_media, remote_media, self.clock.time_msec()
        )

@@ -138,8 +136,7 @@ class MediaRepository(object):
        else:
            self.recently_accessed_locals.add(media_id)

-    @defer.inlineCallbacks
-    def create_content(
+    async def create_content(
        self, media_type, upload_name, content, content_length, auth_user
    ):
        """Store uploaded content for a local user and return the mxc URL
@@ -158,11 +155,11 @@ class MediaRepository(object):

        file_info = FileInfo(server_name=None, file_id=media_id)

-        fname = yield self.media_storage.store_file(content, file_info)
+        fname = await self.media_storage.store_file(content, file_info)

        logger.info("Stored local media in file %r", fname)

-        yield self.store.store_local_media(
+        await self.store.store_local_media(
            media_id=media_id,
            media_type=media_type,
            time_now_ms=self.clock.time_msec(),
@@ -171,12 +168,11 @@ class MediaRepository(object):
            user_id=auth_user,
        )

-        yield self._generate_thumbnails(None, media_id, media_id, media_type)
+        await self._generate_thumbnails(None, media_id, media_id, media_type)

        return "mxc://%s/%s" % (self.server_name, media_id)

-    @defer.inlineCallbacks
-    def get_local_media(self, request, media_id, name):
+    async def get_local_media(self, request, media_id, name):
        """Responds to reqests for local media, if exists, or returns 404.

        Args:
@@ -190,7 +186,7 @@ class MediaRepository(object):
            Deferred: Resolves once a response has successfully been written
                to request
        """
-        media_info = yield self.store.get_local_media(media_id)
+        media_info = await self.store.get_local_media(media_id)
        if not media_info or media_info["quarantined_by"]:
            respond_404(request)
            return
@@ -204,13 +200,12 @@ class MediaRepository(object):

        file_info = FileInfo(None, media_id, url_cache=url_cache)

-        responder = yield self.media_storage.fetch_media(file_info)
-        yield respond_with_responder(
+        responder = await self.media_storage.fetch_media(file_info)
+        await respond_with_responder(
            request, responder, media_type, media_length, upload_name
        )

-    @defer.inlineCallbacks
-    def get_remote_media(self, request, server_name, media_id, name):
+    async def get_remote_media(self, request, server_name, media_id, name):
        """Respond to requests for remote media.

        Args:
@@ -236,8 +231,8 @@ class MediaRepository(object):
        # We linearize here to ensure that we don't try and download remote
        # media multiple times concurrently
        key = (server_name, media_id)
-        with (yield self.remote_media_linearizer.queue(key)):
-            responder, media_info = yield self._get_remote_media_impl(
+        with (await self.remote_media_linearizer.queue(key)):
+            responder, media_info = await self._get_remote_media_impl(
                server_name, media_id
            )

@@ -246,14 +241,13 @@ class MediaRepository(object):
            media_type = media_info["media_type"]
            media_length = media_info["media_length"]
            upload_name = name if name else media_info["upload_name"]
-            yield respond_with_responder(
+            await respond_with_responder(
                request, responder, media_type, media_length, upload_name
            )
        else:
            respond_404(request)

-    @defer.inlineCallbacks
-    def get_remote_media_info(self, server_name, media_id):
+    async def get_remote_media_info(self, server_name, media_id):
        """Gets the media info associated with the remote file, downloading
        if necessary.

@@ -274,8 +268,8 @@ class MediaRepository(object):
        # We linearize here to ensure that we don't try and download remote
        # media multiple times concurrently
        key = (server_name, media_id)
-        with (yield self.remote_media_linearizer.queue(key)):
-            responder, media_info = yield self._get_remote_media_impl(
+        with (await self.remote_media_linearizer.queue(key)):
+            responder, media_info = await self._get_remote_media_impl(
                server_name, media_id
            )

@@ -286,8 +280,7 @@ class MediaRepository(object):

        return media_info

-    @defer.inlineCallbacks
-    def _get_remote_media_impl(self, server_name, media_id):
+    async def _get_remote_media_impl(self, server_name, media_id):
        """Looks for media in local cache, if not there then attempt to
        download from remote server.

@@ -299,7 +292,7 @@ class MediaRepository(object):
        Returns:
            Deferred[(Responder, media_info)]
        """
-        media_info = yield self.store.get_cached_remote_media(server_name, media_id)
+        media_info = await self.store.get_cached_remote_media(server_name, media_id)

        # file_id is the ID we use to track the file locally. If we've already
        # seen the file then reuse the existing ID, otherwise genereate a new
@@ -317,19 +310,18 @@ class MediaRepository(object):
                logger.info("Media is quarantined")
                raise NotFoundError()

-            responder = yield self.media_storage.fetch_media(file_info)
+            responder = await self.media_storage.fetch_media(file_info)
            if responder:
                return responder, media_info

        # Failed to find the file anywhere, lets download it.

-        media_info = yield self._download_remote_file(server_name, media_id, file_id)
+        media_info = await self._download_remote_file(server_name, media_id, file_id)

-        responder = yield self.media_storage.fetch_media(file_info)
+        responder = await self.media_storage.fetch_media(file_info)
        return responder, media_info

-    @defer.inlineCallbacks
-    def _download_remote_file(self, server_name, media_id, file_id):
+    async def _download_remote_file(self, server_name, media_id, file_id):
        """Attempt to download the remote file from the given server name,
        using the given file_id as the local id.

@@ -351,7 +343,7 @@ class MediaRepository(object):
                ("/_matrix/media/v1/download", server_name, media_id)
            )
            try:
-                length, headers = yield self.client.get_file(
+                length, headers = await self.client.get_file(
                    server_name,
                    request_path,
                    output_stream=f,
@@ -397,7 +389,7 @@ class MediaRepository(object):
                )
                raise SynapseError(502, "Failed to fetch remote media")

-            yield finish()
+            await finish()

        media_type = headers[b"Content-Type"][0].decode("ascii")
        upload_name = get_filename_from_headers(headers)
@@ -405,7 +397,7 @@ class MediaRepository(object):

        logger.info("Stored remote media in file %r", fname)

-        yield self.store.store_cached_remote_media(
+        await self.store.store_cached_remote_media(
            origin=server_name,
            media_id=media_id,
            media_type=media_type,
@@ -423,7 +415,7 @@ class MediaRepository(object):
            "filesystem_id": file_id,
        }

-        yield self._generate_thumbnails(server_name, media_id, file_id, media_type)
+        await self._generate_thumbnails(server_name, media_id, file_id, media_type)

        return media_info

@@ -458,16 +450,15 @@ class MediaRepository(object):

        return t_byte_source

-    @defer.inlineCallbacks
-    def generate_local_exact_thumbnail(
+    async def generate_local_exact_thumbnail(
        self, media_id, t_width, t_height, t_method, t_type, url_cache
    ):
-        input_path = yield self.media_storage.ensure_media_is_in_local_cache(
+        input_path = await self.media_storage.ensure_media_is_in_local_cache(
            FileInfo(None, media_id, url_cache=url_cache)
        )

        thumbnailer = Thumbnailer(input_path)
-        t_byte_source = yield defer_to_thread(
+        t_byte_source = await defer_to_thread(
            self.hs.get_reactor(),
            self._generate_thumbnail,
            thumbnailer,
@@ -490,7 +481,7 @@ class MediaRepository(object):
                    thumbnail_type=t_type,
                )

-                output_path = yield self.media_storage.store_file(
+                output_path = await self.media_storage.store_file(
                    t_byte_source, file_info
                )
            finally:
@@ -500,22 +491,21 @@ class MediaRepository(object):

            t_len = os.path.getsize(output_path)

-            yield self.store.store_local_thumbnail(
+            await self.store.store_local_thumbnail(
                media_id, t_width, t_height, t_type, t_method, t_len
            )

            return output_path

-    @defer.inlineCallbacks
-    def generate_remote_exact_thumbnail(
+    async def generate_remote_exact_thumbnail(
        self, server_name, file_id, media_id, t_width, t_height, t_method, t_type
    ):
-        input_path = yield self.media_storage.ensure_media_is_in_local_cache(
+        input_path = await self.media_storage.ensure_media_is_in_local_cache(
            FileInfo(server_name, file_id, url_cache=False)
        )

        thumbnailer = Thumbnailer(input_path)
-        t_byte_source = yield defer_to_thread(
+        t_byte_source = await defer_to_thread(
            self.hs.get_reactor(),
            self._generate_thumbnail,
            thumbnailer,
@@ -537,7 +527,7 @@ class MediaRepository(object):
                    thumbnail_type=t_type,
                )

-                output_path = yield self.media_storage.store_file(
+                output_path = await self.media_storage.store_file(
                    t_byte_source, file_info
                )
            finally:
@@ -547,7 +537,7 @@ class MediaRepository(object):

            t_len = os.path.getsize(output_path)

-            yield self.store.store_remote_media_thumbnail(
+            await self.store.store_remote_media_thumbnail(
                server_name,
                media_id,
                file_id,
@@ -560,8 +550,7 @@ class MediaRepository(object):

            return output_path

-    @defer.inlineCallbacks
-    def _generate_thumbnails(
+    async def _generate_thumbnails(
        self, server_name, media_id, file_id, media_type, url_cache=False
    ):
        """Generate and store thumbnails for an image.
@@ -582,7 +571,7 @@ class MediaRepository(object):
        if not requirements:
            return

-        input_path = yield self.media_storage.ensure_media_is_in_local_cache(
+        input_path = await self.media_storage.ensure_media_is_in_local_cache(
            FileInfo(server_name, file_id, url_cache=url_cache)
        )

@@ -600,7 +589,7 @@ class MediaRepository(object):
            return

        if thumbnailer.transpose_method is not None:
-            m_width, m_height = yield defer_to_thread(
+            m_width, m_height = await defer_to_thread(
                self.hs.get_reactor(), thumbnailer.transpose
            )

@@ -620,11 +609,11 @@ class MediaRepository(object):
        for (t_width, t_height, t_type), t_method in iteritems(thumbnails):
            # Generate the thumbnail
            if t_method == "crop":
-                t_byte_source = yield defer_to_thread(
+                t_byte_source = await defer_to_thread(
                    self.hs.get_reactor(), thumbnailer.crop, t_width, t_height, t_type
                )
            elif t_method == "scale":
-                t_byte_source = yield defer_to_thread(
+                t_byte_source = await defer_to_thread(
                    self.hs.get_reactor(), thumbnailer.scale, t_width, t_height, t_type
                )
            else:
@@ -646,7 +635,7 @@ class MediaRepository(object):
                    url_cache=url_cache,
                )

-                output_path = yield self.media_storage.store_file(
+                output_path = await self.media_storage.store_file(
                    t_byte_source, file_info
                )
            finally:
@@ -656,7 +645,7 @@ class MediaRepository(object):

            # Write to database
            if server_name:
-                yield self.store.store_remote_media_thumbnail(
+                await self.store.store_remote_media_thumbnail(
                    server_name,
                    media_id,
                    file_id,
@@ -667,15 +656,14 @@ class MediaRepository(object):
                    t_len,
                )
            else:
-                yield self.store.store_local_thumbnail(
+                await self.store.store_local_thumbnail(
                    media_id, t_width, t_height, t_type, t_method, t_len
                )

        return {"width": m_width, "height": m_height}

-    @defer.inlineCallbacks
-    def delete_old_remote_media(self, before_ts):
-        old_media = yield self.store.get_remote_media_before(before_ts)
+    async def delete_old_remote_media(self, before_ts):
+        old_media = await self.store.get_remote_media_before(before_ts)

        deleted = 0

@@ -689,7 +677,7 @@ class MediaRepository(object):

            # TODO: Should we delete from the backup store

-            with (yield self.remote_media_linearizer.queue(key)):
+            with (await self.remote_media_linearizer.queue(key)):
                full_path = self.filepaths.remote_media_filepath(origin, file_id)
                try:
                    os.remove(full_path)
@@ -705,7 +693,7 @@ class MediaRepository(object):
                )
                shutil.rmtree(thumbnail_dir, ignore_errors=True)

-                yield self.store.delete_remote_media(origin, media_id)
+                await self.store.delete_remote_media(origin, media_id)
                deleted += 1

        return {"deleted": deleted}
--- a/synapse/rest/media/v1/preview_url_resource.py
+++ b/synapse/rest/media/v1/preview_url_resource.py
@@ -165,8 +165,7 @@ class PreviewUrlResource(DirectServeResource):
        og = await make_deferred_yieldable(defer.maybeDeferred(observable.observe))
        respond_with_json_bytes(request, 200, og, send_cors=True)

-    @defer.inlineCallbacks
-    def _do_preview(self, url, user, ts):
+    async def _do_preview(self, url, user, ts):
        """Check the db, and download the URL and build a preview

        Args:
@@ -179,7 +178,7 @@ class PreviewUrlResource(DirectServeResource):
        """
        # check the URL cache in the DB (which will also provide us with
        # historical previews, if we have any)
-        cache_result = yield self.store.get_url_cache(url, ts)
+        cache_result = await self.store.get_url_cache(url, ts)
        if (
            cache_result
            and cache_result["expires_ts"] > ts
@@ -192,13 +191,13 @@ class PreviewUrlResource(DirectServeResource):
                og = og.encode("utf8")
            return og

-        media_info = yield self._download_url(url, user)
+        media_info = await self._download_url(url, user)

        logger.debug("got media_info of '%s'", media_info)

        if _is_media(media_info["media_type"]):
            file_id = media_info["filesystem_id"]
-            dims = yield self.media_repo._generate_thumbnails(
+            dims = await self.media_repo._generate_thumbnails(
                None, file_id, file_id, media_info["media_type"], url_cache=True
            )

@@ -248,14 +247,14 @@ class PreviewUrlResource(DirectServeResource):
            # request itself and benefit from the same caching etc.  But for now we
            # just rely on the caching on the master request to speed things up.
            if "og:image" in og and og["og:image"]:
-                image_info = yield self._download_url(
+                image_info = await self._download_url(
                    _rebase_url(og["og:image"], media_info["uri"]), user
                )

                if _is_media(image_info["media_type"]):
                    # TODO: make sure we don't choke on white-on-transparent images
                    file_id = image_info["filesystem_id"]
-                    dims = yield self.media_repo._generate_thumbnails(
+                    dims = await self.media_repo._generate_thumbnails(
                        None, file_id, file_id, image_info["media_type"], url_cache=True
                    )
                    if dims:
@@ -293,7 +292,7 @@ class PreviewUrlResource(DirectServeResource):
        jsonog = json.dumps(og)

        # store OG in history-aware DB cache
-        yield self.store.store_url_cache(
+        await self.store.store_url_cache(
            url,
            media_info["response_code"],
            media_info["etag"],
@@ -305,8 +304,7 @@ class PreviewUrlResource(DirectServeResource):

        return jsonog.encode("utf8")

-    @defer.inlineCallbacks
-    def _download_url(self, url, user):
+    async def _download_url(self, url, user):
        # TODO: we should probably honour robots.txt... except in practice
        # we're most likely being explicitly triggered by a human rather than a
        # bot, so are we really a robot?
@@ -318,7 +316,7 @@ class PreviewUrlResource(DirectServeResource):
        with self.media_storage.store_into_file(file_info) as (f, fname, finish):
            try:
                logger.debug("Trying to get url '%s'", url)
-                length, headers, uri, code = yield self.client.get_file(
+                length, headers, uri, code = await self.client.get_file(
                    url, output_stream=f, max_size=self.max_spider_size
                )
            except SynapseError:
@@ -345,7 +343,7 @@ class PreviewUrlResource(DirectServeResource):
                    % (traceback.format_exception_only(sys.exc_info()[0], e),),
                    Codes.UNKNOWN,
                )
-            yield finish()
+            await finish()

        try:
            if b"Content-Type" in headers:
@@ -356,7 +354,7 @@ class PreviewUrlResource(DirectServeResource):

            download_name = get_filename_from_headers(headers)

-            yield self.store.store_local_media(
+            await self.store.store_local_media(
                media_id=file_id,
                media_type=media_type,
                time_now_ms=self.clock.time_msec(),
@@ -393,8 +391,7 @@ class PreviewUrlResource(DirectServeResource):
            "expire_url_cache_data", self._expire_url_cache_data
        )

-    @defer.inlineCallbacks
-    def _expire_url_cache_data(self):
+    async def _expire_url_cache_data(self):
        """Clean up expired url cache content, media and thumbnails.
        """
        # TODO: Delete from backup media store
@@ -403,12 +400,12 @@ class PreviewUrlResource(DirectServeResource):

        logger.info("Running url preview cache expiry")

-        if not (yield self.store.db.updates.has_completed_background_updates()):
+        if not (await self.store.db.updates.has_completed_background_updates()):
            logger.info("Still running DB updates; skipping expiry")
            return

        # First we delete expired url cache entries
-        media_ids = yield self.store.get_expired_url_cache(now)
+        media_ids = await self.store.get_expired_url_cache(now)

        removed_media = []
        for media_id in media_ids:
@@ -430,7 +427,7 @@ class PreviewUrlResource(DirectServeResource):
            except Exception:
                pass

-        yield self.store.delete_url_cache(removed_media)
+        await self.store.delete_url_cache(removed_media)

        if removed_media:
            logger.info("Deleted %d entries from url cache", len(removed_media))
@@ -440,7 +437,7 @@ class PreviewUrlResource(DirectServeResource):
        # may have a room open with a preview url thing open).
        # So we wait a couple of days before deleting, just in case.
        expire_before = now - 2 * 24 * 60 * 60 * 1000
-        media_ids = yield self.store.get_url_cache_media_before(expire_before)
+        media_ids = await self.store.get_url_cache_media_before(expire_before)

        removed_media = []
        for media_id in media_ids:
@@ -478,7 +475,7 @@ class PreviewUrlResource(DirectServeResource):
            except Exception:
                pass

-        yield self.store.delete_url_cache_media(removed_media)
+        await self.store.delete_url_cache_media(removed_media)

        logger.info("Deleted %d media from url cache", len(removed_media))

--- a/synapse/rest/media/v1/thumbnail_resource.py
+++ b/synapse/rest/media/v1/thumbnail_resource.py
@@ -16,8 +16,6 @@

 import logging

-from twisted.internet import defer
-
 from synapse.http.server import (
    DirectServeResource,
    set_cors_headers,
@@ -79,11 +77,10 @@ class ThumbnailResource(DirectServeResource):
                )
            self.media_repo.mark_recently_accessed(server_name, media_id)

-    @defer.inlineCallbacks
-    def _respond_local_thumbnail(
+    async def _respond_local_thumbnail(
        self, request, media_id, width, height, method, m_type
    ):
-        media_info = yield self.store.get_local_media(media_id)
+        media_info = await self.store.get_local_media(media_id)

        if not media_info:
            respond_404(request)
@@ -93,7 +90,7 @@ class ThumbnailResource(DirectServeResource):
            respond_404(request)
            return

-        thumbnail_infos = yield self.store.get_local_media_thumbnails(media_id)
+        thumbnail_infos = await self.store.get_local_media_thumbnails(media_id)

        if thumbnail_infos:
            thumbnail_info = self._select_thumbnail(
@@ -114,14 +111,13 @@ class ThumbnailResource(DirectServeResource):
            t_type = file_info.thumbnail_type
            t_length = thumbnail_info["thumbnail_length"]

-            responder = yield self.media_storage.fetch_media(file_info)
-            yield respond_with_responder(request, responder, t_type, t_length)
+            responder = await self.media_storage.fetch_media(file_info)
+            await respond_with_responder(request, responder, t_type, t_length)
        else:
            logger.info("Couldn't find any generated thumbnails")
            respond_404(request)

-    @defer.inlineCallbacks
-    def _select_or_generate_local_thumbnail(
+    async def _select_or_generate_local_thumbnail(
        self,
        request,
        media_id,
@@ -130,7 +126,7 @@ class ThumbnailResource(DirectServeResource):
        desired_method,
        desired_type,
    ):
-        media_info = yield self.store.get_local_media(media_id)
+        media_info = await self.store.get_local_media(media_id)

        if not media_info:
            respond_404(request)
@@ -140,7 +136,7 @@ class ThumbnailResource(DirectServeResource):
            respond_404(request)
            return

-        thumbnail_infos = yield self.store.get_local_media_thumbnails(media_id)
+        thumbnail_infos = await self.store.get_local_media_thumbnails(media_id)
        for info in thumbnail_infos:
            t_w = info["thumbnail_width"] == desired_width
            t_h = info["thumbnail_height"] == desired_height
@@ -162,15 +158,15 @@ class ThumbnailResource(DirectServeResource):
                t_type = file_info.thumbnail_type
                t_length = info["thumbnail_length"]

-                responder = yield self.media_storage.fetch_media(file_info)
+                responder = await self.media_storage.fetch_media(file_info)
                if responder:
-                    yield respond_with_responder(request, responder, t_type, t_length)
+                    await respond_with_responder(request, responder, t_type, t_length)
                    return

        logger.debug("We don't have a thumbnail of that size. Generating")

        # Okay, so we generate one.
-        file_path = yield self.media_repo.generate_local_exact_thumbnail(
+        file_path = await self.media_repo.generate_local_exact_thumbnail(
            media_id,
            desired_width,
            desired_height,
@@ -180,13 +176,12 @@ class ThumbnailResource(DirectServeResource):
        )

        if file_path:
-            yield respond_with_file(request, desired_type, file_path)
+            await respond_with_file(request, desired_type, file_path)
        else:
            logger.warning("Failed to generate thumbnail")
            respond_404(request)

-    @defer.inlineCallbacks
-    def _select_or_generate_remote_thumbnail(
+    async def _select_or_generate_remote_thumbnail(
        self,
        request,
        server_name,
@@ -196,9 +191,9 @@ class ThumbnailResource(DirectServeResource):
        desired_method,
        desired_type,
    ):
-        media_info = yield self.media_repo.get_remote_media_info(server_name, media_id)
+        media_info = await self.media_repo.get_remote_media_info(server_name, media_id)

-        thumbnail_infos = yield self.store.get_remote_media_thumbnails(
+        thumbnail_infos = await self.store.get_remote_media_thumbnails(
            server_name, media_id
        )

@@ -224,15 +219,15 @@ class ThumbnailResource(DirectServeResource):
                t_type = file_info.thumbnail_type
                t_length = info["thumbnail_length"]

-                responder = yield self.media_storage.fetch_media(file_info)
+                responder = await self.media_storage.fetch_media(file_info)
                if responder:
-                    yield respond_with_responder(request, responder, t_type, t_length)
+                    await respond_with_responder(request, responder, t_type, t_length)
                    return

        logger.debug("We don't have a thumbnail of that size. Generating")

        # Okay, so we generate one.
-        file_path = yield self.media_repo.generate_remote_exact_thumbnail(
+        file_path = await self.media_repo.generate_remote_exact_thumbnail(
            server_name,
            file_id,
            media_id,
@@ -243,21 +238,20 @@ class ThumbnailResource(DirectServeResource):
        )

        if file_path:
-            yield respond_with_file(request, desired_type, file_path)
+            await respond_with_file(request, desired_type, file_path)
        else:
            logger.warning("Failed to generate thumbnail")
            respond_404(request)

-    @defer.inlineCallbacks
-    def _respond_remote_thumbnail(
+    async def _respond_remote_thumbnail(
        self, request, server_name, media_id, width, height, method, m_type
    ):
        # TODO: Don't download the whole remote file
        # We should proxy the thumbnail from the remote server instead of
        # downloading the remote file and generating our own thumbnails.
-        media_info = yield self.media_repo.get_remote_media_info(server_name, media_id)
+        media_info = await self.media_repo.get_remote_media_info(server_name, media_id)

-        thumbnail_infos = yield self.store.get_remote_media_thumbnails(
+        thumbnail_infos = await self.store.get_remote_media_thumbnails(
            server_name, media_id
        )

@@ -278,8 +272,8 @@ class ThumbnailResource(DirectServeResource):
            t_type = file_info.thumbnail_type
            t_length = thumbnail_info["thumbnail_length"]

-            responder = yield self.media_storage.fetch_media(file_info)
-            yield respond_with_responder(request, responder, t_type, t_length)
+            responder = await self.media_storage.fetch_media(file_info)
+            await respond_with_responder(request, responder, t_type, t_length)
        else:
            logger.info("Failed to find any generated thumbnails")
            respond_404(request)
--- a/synapse/server.py
+++ b/synapse/server.py
@@ -78,13 +78,18 @@ from synapse.handlers.room_member_worker import RoomMemberWorkerHandler
 from synapse.handlers.set_password import SetPasswordHandler
 from synapse.handlers.stats import StatsHandler
 from synapse.handlers.sync import SyncHandler
-from synapse.handlers.typing import TypingHandler
+from synapse.handlers.typing import TypingHandler, TypingSlaveHandler
 from synapse.handlers.user_directory import UserDirectoryHandler
 from synapse.http.client import InsecureInterceptableContextFactory, SimpleHttpClient
 from synapse.http.matrixfederationclient import MatrixFederationHttpClient
 from synapse.notifier import Notifier
 from synapse.push.action_generator import ActionGenerator
 from synapse.push.pusherpool import PusherPool
+from synapse.replication.tcp.handler import (
+    ReplicationClientHandler,
+    ReplicationDataHandler,
+)
+from synapse.replication.tcp.resource import ReplicationStreamer
 from synapse.rest.media.v1.media_repository import (
    MediaRepository,
    MediaRepositoryResource,
@@ -100,6 +105,7 @@ from synapse.storage import DataStores, Storage
 from synapse.streams.events import EventSources
 from synapse.util import Clock
 from synapse.util.distributor import Distributor
+from synapse.util.stringutils import random_string

 logger = logging.getLogger(__name__)

@@ -199,6 +205,8 @@ class HomeServer(object):
        "saml_handler",
        "event_client_serializer",
        "storage",
+        "replication_streamer",
+        "replication_data_handler",
    ]

    REQUIRED_ON_MASTER_STARTUP = ["user_directory_handler", "stats_handler"]
@@ -224,6 +232,8 @@ class HomeServer(object):
        self._listening_services = []
        self.start_time = None

+        self.instance_id = random_string(5)
+
        self.clock = Clock(reactor)
        self.distributor = Distributor()
        self.ratelimiter = Ratelimiter()
@@ -236,6 +246,11 @@ class HomeServer(object):
        for depname in kwargs:
            setattr(self, depname, kwargs[depname])

+    def get_instance_id(self):
+        """A unique ID for this synapse process instance.
+        """
+        return self.instance_id
+
    def setup(self):
        logger.info("Setting up.")
        self.start_time = int(self.get_clock().time())
@@ -339,7 +354,10 @@ class HomeServer(object):
        return PresenceHandler(self)

    def build_typing_handler(self):
-        return TypingHandler(self)
+        if self.config.handle_typing:
+            return TypingHandler(self)
+        else:
+            return TypingSlaveHandler(self)

    def build_sync_handler(self):
        return SyncHandler(self)
@@ -439,10 +457,8 @@ class HomeServer(object):
    def build_federation_sender(self):
        if self.should_send_federation():
            return FederationSender(self)
-        elif not self.config.worker_app:
-            return FederationRemoteSendQueue(self)
        else:
-            raise Exception("Workers cannot send federation traffic")
+            return FederationRemoteSendQueue(self)

    def build_receipts_handler(self):
        return ReceiptsHandler(self)
@@ -451,7 +467,7 @@ class HomeServer(object):
        return ReadMarkerHandler(self)

    def build_tcp_replication(self):
-        raise NotImplementedError()
+        return ReplicationClientHandler(self)

    def build_action_generator(self):
        return ActionGenerator(self)
@@ -536,6 +552,12 @@ class HomeServer(object):
    def build_storage(self) -> Storage:
        return Storage(self, self.datastores)

+    def build_replication_streamer(self) -> ReplicationStreamer:
+        return ReplicationStreamer(self)
+
+    def build_replication_data_handler(self):
+        return ReplicationDataHandler(self)
+
    def remove_pusher(self, app_id, push_key, user_id):
        return self.get_pusherpool().remove_pusher(app_id, push_key, user_id)

--- a/synapse/server.pyi
+++ b/synapse/server.pyi
@@ -106,7 +106,7 @@ class HomeServer(object):
        pass
    def get_tcp_replication(
        self,
-    ) -> synapse.replication.tcp.client.ReplicationClientHandler:
+    ) -> synapse.replication.tcp.handler.ReplicationClientHandler:
        pass
    def get_federation_registry(
        self,
@@ -114,3 +114,5 @@ class HomeServer(object):
        pass
    def is_mine_id(self, domain_id: str) -> bool:
        pass
+    def get_instance_id(self) -> str:
+        pass
--- a/synapse/storage/data_stores/main/__init__.py
+++ b/synapse/storage/data_stores/main/__init__.py
@@ -144,7 +144,10 @@ class DataStore(
            db_conn,
            "device_lists_stream",
            "stream_id",
-            extra_tables=[("user_signature_stream", "stream_id")],
+            extra_tables=[
+                ("user_signature_stream", "stream_id"),
+                ("device_lists_outbound_pokes", "stream_id"),
+            ],
        )
        self._cross_signing_id_gen = StreamIdGenerator(
            db_conn, "e2e_cross_signing_keys", "stream_id"
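Review note: the hunk above adds `device_lists_outbound_pokes` to the tables the device list ID generator looks at, and the `devices.py` changes further down give every inserted row its own stream ID instead of sharing one per call. The following is a minimal sketch of that allocation scheme, not the Synapse implementation. `allocate_stream_ids` is a hypothetical stand-in for `get_next_mult` (which in the diff is used as a context manager), and the host/device nesting order in `outbound_poke_rows` is an assumption, because the relevant lines are outside the hunk.

```python
from itertools import product


def allocate_stream_ids(current, n):
    """Hypothetical stand-in for get_next_mult: n consecutive IDs after `current`."""
    return list(range(current + 1, current + 1 + n))


def device_stream_rows(user_id, device_ids, stream_ids):
    """One device_lists_stream row per device, each with its own stream ID."""
    return [
        {"stream_id": stream_id, "user_id": user_id, "device_id": device_id}
        for stream_id, device_id in zip(stream_ids, device_ids)
    ]


def outbound_poke_rows(user_id, device_ids, hosts, stream_ids):
    """One outbound poke row per (host, device) pair, consuming IDs in order.

    The iteration order over hosts and devices is assumed for illustration.
    """
    next_stream_id = iter(stream_ids)
    return [
        {
            "destination": host,
            "stream_id": next(next_stream_id),
            "user_id": user_id,
            "device_id": device_id,
        }
        for host, device_id in product(hosts, device_ids)
    ]


if __name__ == "__main__":
    devices = ["D1", "D2"]
    hosts = ["a.example", "b.example"]
    ids = allocate_stream_ids(10, len(devices))
    print(device_stream_rows("@u:x", devices, ids))
    poke_ids = allocate_stream_ids(ids[-1], len(hosts) * len(devices))
    print(outbound_poke_rows("@u:x", devices, hosts, poke_ids))
```

Because both tables now draw from the same ID sequence, the generator presumably has to consider the maximum `stream_id` across all of them on startup, which would explain the extra `extra_tables` entry.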
--- a/synapse/storage/data_stores/main/cache.py
+++ b/synapse/storage/data_stores/main/cache.py
@@ -32,7 +32,29 @@ logger = logging.getLogger(__name__)
 CURRENT_STATE_CACHE_NAME = "cs_cache_fake"


-class CacheInvalidationStore(SQLBaseStore):
+class CacheInvalidationWorkerStore(SQLBaseStore):
+    def get_all_updated_caches(self, last_id, current_id, limit):
+        if last_id == current_id:
+            return defer.succeed([])
+
+        def get_all_updated_caches_txn(txn):
+            # We purposefully don't bound by the current token, as we want to
+            # send across cache invalidations as quickly as possible. Cache
+            # invalidations are idempotent, so duplicates are fine.
+            sql = (
+                "SELECT stream_id, cache_func, keys, invalidation_ts"
+                " FROM cache_invalidation_stream"
+                " WHERE stream_id > ? ORDER BY stream_id ASC LIMIT ?"
+            )
+            txn.execute(sql, (last_id, limit))
+            return txn.fetchall()
+
+        return self.db.runInteraction(
+            "get_all_updated_caches", get_all_updated_caches_txn
+        )
+
+
+class CacheInvalidationStore(CacheInvalidationWorkerStore):
    async def invalidate_cache_and_stream(self, cache_name: str, keys: Tuple[Any, ...]):
        """Invalidates the cache and adds it to the cache stream so slaves
        will know to invalidate their caches.
@@ -145,26 +167,6 @@ class CacheInvalidationStore(SQLBaseStore):
                },
            )

-    def get_all_updated_caches(self, last_id, current_id, limit):
-        if last_id == current_id:
-            return defer.succeed([])
-
-        def get_all_updated_caches_txn(txn):
-            # We purposefully don't bound by the current token, as we want to
-            # send across cache invalidations as quickly as possible. Cache
-            # invalidations are idempotent, so duplicates are fine.
-            sql = (
-                "SELECT stream_id, cache_func, keys, invalidation_ts"
-                " FROM cache_invalidation_stream"
-                " WHERE stream_id > ? ORDER BY stream_id ASC LIMIT ?"
-            )
-            txn.execute(sql, (last_id, limit))
-            return txn.fetchall()
-
-        return self.db.runInteraction(
-            "get_all_updated_caches", get_all_updated_caches_txn
-        )
-
    def get_cache_stream_token(self):
        if self._cache_id_gen:
            return self._cache_id_gen.get_current_token()
--- a/synapse/storage/data_stores/main/deviceinbox.py
+++ b/synapse/storage/data_stores/main/deviceinbox.py
@@ -207,6 +207,50 @@ class DeviceInboxWorkerStore(SQLBaseStore):
            "delete_device_msgs_for_remote", delete_messages_for_remote_destination_txn
        )

+    def get_all_new_device_messages(self, last_pos, current_pos, limit):
+        """
+        Args:
+            last_pos(int):
+            current_pos(int):
+            limit(int):
+        Returns:
+            A deferred list of rows from the device inbox
+        """
+        if last_pos == current_pos:
+            return defer.succeed([])
+
+        def get_all_new_device_messages_txn(txn):
+            # We limit like this as we might have multiple rows per stream_id, and
+            # we want to make sure we always get all entries for any stream_id
+            # we return.
+            upper_pos = min(current_pos, last_pos + limit)
+            sql = (
+                "SELECT max(stream_id), user_id"
+                " FROM device_inbox"
+                " WHERE ? < stream_id AND stream_id <= ?"
+                " GROUP BY user_id"
+            )
+            txn.execute(sql, (last_pos, upper_pos))
+            rows = txn.fetchall()
+
+            sql = (
+                "SELECT max(stream_id), destination"
+                " FROM device_federation_outbox"
+                " WHERE ? < stream_id AND stream_id <= ?"
+                " GROUP BY destination"
+            )
+            txn.execute(sql, (last_pos, upper_pos))
+            rows.extend(txn)
+
+            # Order by ascending stream ordering
+            rows.sort()
+
+            return rows
+
+        return self.db.runInteraction(
+            "get_all_new_device_messages", get_all_new_device_messages_txn
+        )
+

 class DeviceInboxBackgroundUpdateStore(SQLBaseStore):
    DEVICE_INBOX_STREAM_ID = "device_inbox_stream_drop"
@@ -411,47 +455,3 @@ class DeviceInboxStore(DeviceInboxWorkerStore, DeviceInboxBackgroundUpdateStore)
                rows.append((user_id, device_id, stream_id, message_json))

        txn.executemany(sql, rows)
-
-    def get_all_new_device_messages(self, last_pos, current_pos, limit):
-        """
-        Args:
-            last_pos(int):
-            current_pos(int):
-            limit(int):
-        Returns:
-            A deferred list of rows from the device inbox
-        """
-        if last_pos == current_pos:
-            return defer.succeed([])
-
-        def get_all_new_device_messages_txn(txn):
-            # We limit like this as we might have multiple rows per stream_id, and
-            # we want to make sure we always get all entries for any stream_id
-            # we return.
-            upper_pos = min(current_pos, last_pos + limit)
-            sql = (
-                "SELECT max(stream_id), user_id"
-                " FROM device_inbox"
-                " WHERE ? < stream_id AND stream_id <= ?"
-                " GROUP BY user_id"
-            )
-            txn.execute(sql, (last_pos, upper_pos))
-            rows = txn.fetchall()
-
-            sql = (
-                "SELECT max(stream_id), destination"
-                " FROM device_federation_outbox"
-                " WHERE ? < stream_id AND stream_id <= ?"
-                " GROUP BY destination"
-            )
-            txn.execute(sql, (last_pos, upper_pos))
-            rows.extend(txn)
-
-            # Order by ascending stream ordering
-            rows.sort()
-
-            return rows
-
-        return self.db.runInteraction(
-            "get_all_new_device_messages", get_all_new_device_messages_txn
-        )
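Review note: `get_all_new_device_messages` (moved unchanged into `DeviceInboxWorkerStore` above) bounds the query by a stream ID window, `min(current_pos, last_pos + limit)`, rather than by a row `LIMIT`, so every row belonging to a returned `stream_id` is included. The sketch below reproduces only the `device_inbox` half of that logic against an in-memory SQLite database. The two-column table is a simplified assumption, and `make_demo_db` is a hypothetical helper for illustration.

```python
import sqlite3


def make_demo_db():
    """Hypothetical helper: a cut-down device_inbox table with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE device_inbox (stream_id INTEGER, user_id TEXT)")
    conn.executemany(
        "INSERT INTO device_inbox VALUES (?, ?)",
        [(1, "@a:x"), (2, "@a:x"), (2, "@b:x"), (3, "@c:x"), (5, "@a:x")],
    )
    return conn


def get_new_device_messages(conn, last_pos, current_pos, limit):
    """Return (max_stream_id, user_id) rows for stream IDs in (last_pos, upper_pos]."""
    if last_pos == current_pos:
        return []

    # Limit by stream ID range, not row count: several rows can share a
    # stream_id, and a row LIMIT could cut such a group in half.
    upper_pos = min(current_pos, last_pos + limit)
    rows = conn.execute(
        "SELECT max(stream_id), user_id"
        " FROM device_inbox"
        " WHERE ? < stream_id AND stream_id <= ?"
        " GROUP BY user_id",
        (last_pos, upper_pos),
    ).fetchall()

    # Order by ascending stream ID, as the original does.
    rows.sort()
    return rows


if __name__ == "__main__":
    conn = make_demo_db()
    print(get_new_device_messages(conn, 0, 5, 2))
    print(get_new_device_messages(conn, 2, 5, 2))
```

A caller that receives fewer positions than it asked for would then resume from the window's upper bound on the next call.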
--- a/synapse/storage/data_stores/main/devices.py
+++ b/synapse/storage/data_stores/main/devices.py
@@ -15,6 +15,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import logging
+from typing import List, Tuple

 from six import iteritems

@@ -31,7 +32,7 @@ from synapse.logging.opentracing import (
 )
 from synapse.metrics.background_process_metrics import run_as_background_process
 from synapse.storage._base import SQLBaseStore, db_to_json, make_in_list_sql_clause
-from synapse.storage.database import Database
+from synapse.storage.database import Database, LoggingTransaction
 from synapse.types import Collection, get_verify_key_from_cross_signing_key
 from synapse.util.caches.descriptors import (
    Cache,
@@ -112,23 +113,13 @@ class DeviceWorkerStore(SQLBaseStore):
        if not has_changed:
            return now_stream_id, []

-        # We retrieve n+1 devices from the list of outbound pokes where n is
-        # our outbound device update limit. We then check if the very last
-        # device has the same stream_id as the second-to-last device. If so,
-        # then we ignore all devices with that stream_id and only send the
-        # devices with a lower stream_id.
-        #
-        # If when culling the list we end up with no devices afterwards, we
-        # consider the device update to be too large, and simply skip the
-        # stream_id; the rationale being that such a large device list update
-        # is likely an error.
        updates = yield self.db.runInteraction(
            "get_device_updates_by_remote",
            self._get_device_updates_by_remote_txn,
            destination,
            from_stream_id,
            now_stream_id,
-            limit + 1,
+            limit,
        )

        # Return an empty list if there are no updates
@@ -166,14 +157,6 @@ class DeviceWorkerStore(SQLBaseStore):
                    "device_id": verify_key.version,
                }

-        # if we have exceeded the limit, we need to exclude any results with the
-        # same stream_id as the last row.
-        if len(updates) > limit:
-            stream_id_cutoff = updates[-1][2]
-            now_stream_id = stream_id_cutoff - 1
-        else:
-            stream_id_cutoff = None
-
        # Perform the equivalent of a GROUP BY
        #
        # Iterate through the updates list and copy non-duplicate
@@ -192,10 +175,6 @@ class DeviceWorkerStore(SQLBaseStore):
        query_map = {}
        cross_signing_keys_by_user = {}
        for user_id, device_id, update_stream_id, update_context in updates:
-            if stream_id_cutoff is not None and update_stream_id >= stream_id_cutoff:
-                # Stop processing updates
-                break
-
            if (
                user_id in master_key_by_user
                and device_id == master_key_by_user[user_id]["device_id"]
@@ -218,17 +197,6 @@ class DeviceWorkerStore(SQLBaseStore):
                if update_stream_id > previous_update_stream_id:
                    query_map[key] = (update_stream_id, update_context)

-        # If we didn't find any updates with a stream_id lower than the cutoff, it
-        # means that there are more than limit updates all of which have the same
-        # steam_id.
-
-        # That should only happen if a client is spamming the server with new
-        # devices, in which case E2E isn't going to work well anyway. We'll just
-        # skip that stream_id and return an empty list, and continue with the next
-        # stream_id next time.
-        if not query_map and not cross_signing_keys_by_user:
-            return stream_id_cutoff, []
-
        results = yield self._get_device_update_edus_by_remote(
            destination, from_stream_id, query_map
        )
@@ -607,22 +575,33 @@ class DeviceWorkerStore(SQLBaseStore):
        else:
            return set()

-    def get_all_device_list_changes_for_remotes(self, from_key, to_key):
-        """Return a list of `(stream_id, user_id, destination)` which is the
-        combined list of changes to devices, and which destinations need to be
-        poked. `destination` may be None if no destinations need to be poked.
+    async def get_all_device_list_changes_for_remotes(
+        self, from_key: int, to_key: int, limit: int,
+    ) -> List[Tuple[int, str]]:
+        """Return a list of `(stream_id, entity)` which is the combined list of
+        changes to devices and which destinations need to be poked. Entity is
+        either a user ID (starting with '@') or a remote destination.
        """
-        # We do a group by here as there can be a large number of duplicate
-        # entries, since we throw away device IDs.
+
+        # This query Does The Right Thing where it'll correctly apply the
+        # bounds to the inner queries.
        sql = """
-            SELECT MAX(stream_id) AS stream_id, user_id, destination
-            FROM device_lists_stream
-            LEFT JOIN device_lists_outbound_pokes USING (stream_id, user_id, device_id)
+            SELECT stream_id, entity FROM (
+                SELECT stream_id, user_id AS entity FROM device_lists_stream
+                UNION ALL
+                SELECT stream_id, destination AS entity FROM device_lists_outbound_pokes
+            ) AS e
            WHERE ? < stream_id AND stream_id <= ?
-            GROUP BY user_id, destination
+            LIMIT ?
        """
-        return self.db.execute(
-            "get_all_device_list_changes_for_remotes", None, sql, from_key, to_key
+
+        return await self.db.execute(
+            "get_all_device_list_changes_for_remotes",
+            None,
+            sql,
+            from_key,
+            to_key,
+            limit,
        )

    @cached(max_entries=10000)
@@ -1017,29 +996,49 @@ class DeviceStore(DeviceWorkerStore, DeviceBackgroundUpdateStore):
        """Persist that a user's devices have been updated, and which hosts
        (if any) should be poked.
        """
-        with self._device_list_id_gen.get_next() as stream_id:
+        if not device_ids:
+            return
+
+        with self._device_list_id_gen.get_next_mult(len(device_ids)) as stream_ids:
            yield self.db.runInteraction(
-                "add_device_change_to_streams",
-                self._add_device_change_txn,
+                "add_device_change_to_stream",
+                self._add_device_change_to_stream_txn,
+                user_id,
+                device_ids,
+                stream_ids,
+            )
+
+        if not hosts:
+            return stream_ids[-1]
+
+        context = get_active_span_text_map()
+        with self._device_list_id_gen.get_next_mult(
+            len(hosts) * len(device_ids)
+        ) as stream_ids:
+            yield self.db.runInteraction(
+                "add_device_outbound_poke_to_stream",
+                self._add_device_outbound_poke_to_stream_txn,
                user_id,
                device_ids,
                hosts,
-                stream_id,
+                stream_ids,
+                context,
            )
-        return stream_id

-    def _add_device_change_txn(self, txn, user_id, device_ids, hosts, stream_id):
-        now = self._clock.time_msec()
+        return stream_ids[-1]

+    def _add_device_change_to_stream_txn(
+        self,
+        txn: LoggingTransaction,
+        user_id: str,
+        device_ids: Collection[str],
+        stream_ids: List[int],
+    ):
        txn.call_after(
-            self._device_list_stream_cache.entity_has_changed, user_id, stream_id
+            self._device_list_stream_cache.entity_has_changed, user_id, stream_ids[-1],
        )
-        for host in hosts:
-            txn.call_after(
-                self._device_list_federation_stream_cache.entity_has_changed,
-                host,
-                stream_id,
-            )
+
+        min_stream_id = stream_ids[0]

        # Delete older entries in the table, as we really only care about
        # when the latest change happened.
@@ -1048,7 +1047,7 @@ class DeviceStore(DeviceWorkerStore, DeviceBackgroundUpdateStore):
            DELETE FROM device_lists_stream
            WHERE user_id = ? AND device_id = ? AND stream_id < ?
            """,
-            [(user_id, device_id, stream_id) for device_id in device_ids],
+            [(user_id, device_id, min_stream_id) for device_id in device_ids],
        )

        self.db.simple_insert_many_txn(
@@ -1056,11 +1055,22 @@ class DeviceStore(DeviceWorkerStore, DeviceBackgroundUpdateStore):
            table="device_lists_stream",
            values=[
                {"stream_id": stream_id, "user_id": user_id, "device_id": device_id}
-                for device_id in device_ids
+                for stream_id, device_id in zip(stream_ids, device_ids)
            ],
        )

-        context = get_active_span_text_map()
+    def _add_device_outbound_poke_to_stream_txn(
+        self, txn, user_id, device_ids, hosts, stream_ids, context,
+    ):
+        for host in hosts:
+            txn.call_after(
+                self._device_list_federation_stream_cache.entity_has_changed,
+                host,
+                stream_ids[-1],
+            )
+
+        now = self._clock.time_msec()
+        next_stream_id = iter(stream_ids)

        self.db.simple_insert_many_txn(
            txn,
@@ -1068,7 +1078,7 @@ class DeviceStore(DeviceWorkerStore, DeviceBackgroundUpdateStore):
            values=[
                {
                    "destination": destination,
-                    "stream_id": stream_id,
+                    "stream_id": next(next_stream_id),
                    "user_id": user_id,
                    "device_id": device_id,
                    "sent": False,
--- a/synapse/storage/data_stores/main/end_to_end_keys.py
+++ b/synapse/storage/data_stores/main/end_to_end_keys.py
@@ -537,7 +537,7 @@ class EndToEndKeyWorkerStore(SQLBaseStore):

        return result

-    def get_all_user_signature_changes_for_remotes(self, from_key, to_key):
+    def get_all_user_signature_changes_for_remotes(self, from_key, to_key, limit):
        """Return a list of changes from the user signature stream to notify remotes.
        Note that the user signature stream represents when a user signs their
        device with their user-signing key, which is not published to other
@@ -552,13 +552,19 @@ class EndToEndKeyWorkerStore(SQLBaseStore):
            Deferred[list[(int,str)]] a list of `(stream_id, user_id)`
        """
        sql = """
-            SELECT MAX(stream_id) AS stream_id, from_user_id AS user_id
+            SELECT stream_id, from_user_id AS user_id
            FROM user_signature_stream
            WHERE ? < stream_id AND stream_id <= ?
-            GROUP BY user_id
+            ORDER BY stream_id ASC
+            LIMIT ?
        """
        return self.db.execute(
-            "get_all_user_signature_changes_for_remotes", None, sql, from_key, to_key
+            "get_all_user_signature_changes_for_remotes",
+            None,
+            sql,
+            from_key,
+            to_key,
+            limit,
        )


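Review note: the rewritten queries in `devices.py` and `end_to_end_keys.py` drop the `GROUP BY` in favour of returning individual rows up to a `LIMIT`. The sketch below runs the `UNION ALL` form of the device list query against an in-memory SQLite database to show its shape. The two-column tables are simplified assumptions, `make_demo_db` and `changes_between` are hypothetical names, and the `ORDER BY stream_id ASC` is an addition made here (mirroring the user signature query) so that the rows kept by `LIMIT` are deterministic; the query in the diff has no `ORDER BY`.

```python
import sqlite3

# The ORDER BY line is an assumption added for this sketch.
QUERY = """
    SELECT stream_id, entity FROM (
        SELECT stream_id, user_id AS entity FROM device_lists_stream
        UNION ALL
        SELECT stream_id, destination AS entity FROM device_lists_outbound_pokes
    ) AS e
    WHERE ? < stream_id AND stream_id <= ?
    ORDER BY stream_id ASC
    LIMIT ?
"""


def make_demo_db():
    """Hypothetical helper: cut-down versions of the two tables with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE device_lists_stream (stream_id INTEGER, user_id TEXT)")
    conn.execute(
        "CREATE TABLE device_lists_outbound_pokes"
        " (stream_id INTEGER, destination TEXT)"
    )
    conn.executemany(
        "INSERT INTO device_lists_stream VALUES (?, ?)",
        [(1, "@a:x"), (2, "@b:x")],
    )
    conn.executemany(
        "INSERT INTO device_lists_outbound_pokes VALUES (?, ?)",
        [(3, "remote.example"), (4, "other.example")],
    )
    return conn


def changes_between(conn, from_key, to_key, limit):
    """Return up to `limit` (stream_id, entity) rows in (from_key, to_key]."""
    return conn.execute(QUERY, (from_key, to_key, limit)).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    print(changes_between(conn, 0, 4, 3))
```

Entities starting with `@` are user IDs and everything else is a remote destination, which matches the docstring in the diff.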
--- a/synapse/storage/data_stores/main/events.py
+++ b/synapse/storage/data_stores/main/events.py
@@ -1267,104 +1267,6 @@ class EventsStore(
        ret = yield self.db.runInteraction("count_daily_active_rooms", _count)
        return ret

-    def get_current_backfill_token(self):
-        """The current minimum token that backfilled events have reached"""
-        return -self._backfill_id_gen.get_current_token()
-
-    def get_current_events_token(self):
-        """The current maximum token that events have reached"""
-        return self._stream_id_gen.get_current_token()
-
-    def get_all_new_forward_event_rows(self, last_id, current_id, limit):
-        if last_id == current_id:
-            return defer.succeed([])
-
-        def get_all_new_forward_event_rows(txn):
-            sql = (
-                "SELECT e.stream_ordering, e.event_id, e.room_id, e.type,"
-                " state_key, redacts, relates_to_id"
-                " FROM events AS e"
-                " LEFT JOIN redactions USING (event_id)"
-                " LEFT JOIN state_events USING (event_id)"
-                " LEFT JOIN event_relations USING (event_id)"
-                " WHERE ? < stream_ordering AND stream_ordering <= ?"
-                " ORDER BY stream_ordering ASC"
-                " LIMIT ?"
-            )
-            txn.execute(sql, (last_id, current_id, limit))
-            new_event_updates = txn.fetchall()
-
-            if len(new_event_updates) == limit:
-                upper_bound = new_event_updates[-1][0]
-            else:
-                upper_bound = current_id
-
-            sql = (
-                "SELECT event_stream_ordering, e.event_id, e.room_id, e.type,"
-                " state_key, redacts, relates_to_id"
-                " FROM events AS e"
-                " INNER JOIN ex_outlier_stream USING (event_id)"
-                " LEFT JOIN redactions USING (event_id)"
-                " LEFT JOIN state_events USING (event_id)"
-                " LEFT JOIN event_relations USING (event_id)"
-                " WHERE ? < event_stream_ordering"
-                " AND event_stream_ordering <= ?"
-                " ORDER BY event_stream_ordering DESC"
-            )
-            txn.execute(sql, (last_id, upper_bound))
-            new_event_updates.extend(txn)
-
-            return new_event_updates
-
-        return self.db.runInteraction(
-            "get_all_new_forward_event_rows", get_all_new_forward_event_rows
-        )
-
-    def get_all_new_backfill_event_rows(self, last_id, current_id, limit):
-        if last_id == current_id:
-            return defer.succeed([])
-
-        def get_all_new_backfill_event_rows(txn):
-            sql = (
-                "SELECT -e.stream_ordering, e.event_id, e.room_id, e.type,"
-                " state_key, redacts, relates_to_id"
-                " FROM events AS e"
-                " LEFT JOIN redactions USING (event_id)"
-                " LEFT JOIN state_events USING (event_id)"
-                " LEFT JOIN event_relations USING (event_id)"
-                " WHERE ? > stream_ordering AND stream_ordering >= ?"
-                " ORDER BY stream_ordering ASC"
-                " LIMIT ?"
-            )
-            txn.execute(sql, (-last_id, -current_id, limit))
-            new_event_updates = txn.fetchall()
-
-            if len(new_event_updates) == limit:
-                upper_bound = new_event_updates[-1][0]
-            else:
-                upper_bound = current_id
-
-            sql = (
-                "SELECT -event_stream_ordering, e.event_id, e.room_id, e.type,"
-                " state_key, redacts, relates_to_id"
-                " FROM events AS e"
-                " INNER JOIN ex_outlier_stream USING (event_id)"
-                " LEFT JOIN redactions USING (event_id)"
-                " LEFT JOIN state_events USING (event_id)"
-                " LEFT JOIN event_relations USING (event_id)"
-                " WHERE ? > event_stream_ordering"
-                " AND event_stream_ordering >= ?"
-                " ORDER BY event_stream_ordering DESC"
-            )
-            txn.execute(sql, (-last_id, -upper_bound))
-            new_event_updates.extend(txn.fetchall())
-
-            return new_event_updates
-
-        return self.db.runInteraction(
-            "get_all_new_backfill_event_rows", get_all_new_backfill_event_rows
-        )
-
    @cached(num_args=5, max_entries=10)
    def get_all_new_events(
        self,
@@ -1850,22 +1752,6 @@ class EventsStore(

        return (int(res["topological_ordering"]), int(res["stream_ordering"]))

-    def get_all_updated_current_state_deltas(self, from_token, to_token, limit):
-        def get_all_updated_current_state_deltas_txn(txn):
-            sql = """
-                SELECT stream_id, room_id, type, state_key, event_id
-                FROM current_state_delta_stream
-                WHERE ? < stream_id AND stream_id <= ?
-                ORDER BY stream_id ASC LIMIT ?
-            """
-            txn.execute(sql, (from_token, to_token, limit))
-            return txn.fetchall()
-
-        return self.db.runInteraction(
-            "get_all_updated_current_state_deltas",
-            get_all_updated_current_state_deltas_txn,
-        )
-
    def insert_labels_for_event_txn(
        self, txn, event_id, labels, room_id, topological_ordering
    ):
--- a/synapse/storage/data_stores/main/events_worker.py
+++ b/synapse/storage/data_stores/main/events_worker.py
@@ -963,3 +963,117 @@ class EventsWorkerStore(SQLBaseStore):
        complexity_v1 = round(state_events / 500, 2)

        return {"v1": complexity_v1}
+
+    def get_current_backfill_token(self):
+        """The current minimum token that backfilled events have reached"""
+        return -self._backfill_id_gen.get_current_token()
+
+    def get_current_events_token(self):
+        """The current maximum token that events have reached"""
+        return self._stream_id_gen.get_current_token()
+
+    def get_all_new_forward_event_rows(self, last_id, current_id, limit):
+        if last_id == current_id:
+            return defer.succeed([])
+
+        def get_all_new_forward_event_rows(txn):
+            sql = (
+                "SELECT e.stream_ordering, e.event_id, e.room_id, e.type,"
+                " state_key, redacts, relates_to_id"
+                " FROM events AS e"
+                " LEFT JOIN redactions USING (event_id)"
+                " LEFT JOIN state_events USING (event_id)"
+                " LEFT JOIN event_relations USING (event_id)"
+                " WHERE ? < stream_ordering AND stream_ordering <= ?"
+                " ORDER BY stream_ordering ASC"
+                " LIMIT ?"
+            )
+            txn.execute(sql, (last_id, current_id, limit))
+            new_event_updates = txn.fetchall()
+
+            if len(new_event_updates) == limit:
+                upper_bound = new_event_updates[-1][0]
+            else:
+                upper_bound = current_id
+
+            sql = (
+                "SELECT event_stream_ordering, e.event_id, e.room_id, e.type,"
+                " state_key, redacts, relates_to_id"
+                " FROM events AS e"
+                " INNER JOIN ex_outlier_stream USING (event_id)"
+                " LEFT JOIN redactions USING (event_id)"
+                " LEFT JOIN state_events USING (event_id)"
+                " LEFT JOIN event_relations USING (event_id)"
+                " WHERE ? < event_stream_ordering"
+                " AND event_stream_ordering <= ?"
+                " ORDER BY event_stream_ordering DESC"
+            )
+            txn.execute(sql, (last_id, upper_bound))
+            new_event_updates.extend(txn)
+
+            return new_event_updates
+
+        return self.db.runInteraction(
+            "get_all_new_forward_event_rows", get_all_new_forward_event_rows
+        )
+
+    def get_all_new_backfill_event_rows(self, last_id, current_id, limit):
+        if last_id == current_id:
+            return defer.succeed([])
+
+        def get_all_new_backfill_event_rows(txn):
+            sql = (
+                "SELECT -e.stream_ordering, e.event_id, e.room_id, e.type,"
+                " state_key, redacts, relates_to_id"
+                " FROM events AS e"
+                " LEFT JOIN redactions USING (event_id)"
+                " LEFT JOIN state_events USING (event_id)"
+                " LEFT JOIN event_relations USING (event_id)"
+                " WHERE ? > stream_ordering AND stream_ordering >= ?"
+                " ORDER BY stream_ordering ASC"
+                " LIMIT ?"
+            )
+            txn.execute(sql, (-last_id, -current_id, limit))
+            new_event_updates = txn.fetchall()
+
+            if len(new_event_updates) == limit:
+                upper_bound = new_event_updates[-1][0]
+            else:
+                upper_bound = current_id
+
+            sql = (
+                "SELECT -event_stream_ordering, e.event_id, e.room_id, e.type,"
+                " state_key, redacts, relates_to_id"
+                " FROM events AS e"
+                " INNER JOIN ex_outlier_stream USING (event_id)"
+                " LEFT JOIN redactions USING (event_id)"
+                " LEFT JOIN state_events USING (event_id)"
+                " LEFT JOIN event_relations USING (event_id)"
+                " WHERE ? > event_stream_ordering"
+                " AND event_stream_ordering >= ?"
+                " ORDER BY event_stream_ordering DESC"
+            )
+            txn.execute(sql, (-last_id, -upper_bound))
+            new_event_updates.extend(txn.fetchall())
+
+            return new_event_updates
+
+        return self.db.runInteraction(
+            "get_all_new_backfill_event_rows", get_all_new_backfill_event_rows
+        )
+
+    def get_all_updated_current_state_deltas(self, from_token, to_token, limit):
+        def get_all_updated_current_state_deltas_txn(txn):
+            sql = """
+                SELECT stream_id, room_id, type, state_key, event_id
+                FROM current_state_delta_stream
+                WHERE ? < stream_id AND stream_id <= ?
+                ORDER BY stream_id ASC LIMIT ?
+            """
+            txn.execute(sql, (from_token, to_token, limit))
+            return txn.fetchall()
+
+        return self.db.runInteraction(
+            "get_all_updated_current_state_deltas",
+            get_all_updated_current_state_deltas_txn,
+        )
--- a/synapse/storage/data_stores/main/presence.py
+++ b/synapse/storage/data_stores/main/presence.py
@@ -60,7 +60,7 @@ class PresenceStore(SQLBaseStore):
                    "status_msg": state.status_msg,
                    "currently_active": state.currently_active,
                }
-                for state in presence_states
+                for stream_id, state in zip(stream_orderings, presence_states)
            ],
        )

@@ -73,19 +73,22 @@ class PresenceStore(SQLBaseStore):
            )
            txn.execute(sql + clause, [stream_id] + list(args))

-    def get_all_presence_updates(self, last_id, current_id):
+    def get_all_presence_updates(self, last_id, current_id, limit):
        if last_id == current_id:
            return defer.succeed([])

        def get_all_presence_updates_txn(txn):
-            sql = (
-                "SELECT stream_id, user_id, state, last_active_ts,"
-                " last_federation_update_ts, last_user_sync_ts, status_msg,"
-                " currently_active"
-                " FROM presence_stream"
-                " WHERE ? < stream_id AND stream_id <= ?"
-            )
-            txn.execute(sql, (last_id, current_id))
+            sql = """
+                SELECT stream_id, user_id, state, last_active_ts,
+                    last_federation_update_ts, last_user_sync_ts,
+                    status_msg,
+                currently_active
+                FROM presence_stream
+                WHERE ? < stream_id AND stream_id <= ?
+                ORDER BY stream_id ASC
+                LIMIT ?
+            """
+            txn.execute(sql, (last_id, current_id, limit))
            return txn.fetchall()

        return self.db.runInteraction(
--- a/synapse/storage/data_stores/main/room.py
+++ b/synapse/storage/data_stores/main/room.py
@@ -732,6 +732,26 @@ class RoomWorkerStore(SQLBaseStore):

        return total_media_quarantined

+    def get_all_new_public_rooms(self, prev_id, current_id, limit):
+        def get_all_new_public_rooms(txn):
+            sql = """
+                SELECT stream_id, room_id, visibility, appservice_id, network_id
+                FROM public_room_list_stream
+                WHERE stream_id > ? AND stream_id <= ?
+                ORDER BY stream_id ASC
+                LIMIT ?
+            """
+
+            txn.execute(sql, (prev_id, current_id, limit))
+            return txn.fetchall()
+
+        if prev_id == current_id:
+            return defer.succeed([])
+
+        return self.db.runInteraction(
+            "get_all_new_public_rooms", get_all_new_public_rooms
+        )
+

 class RoomBackgroundUpdateStore(SQLBaseStore):
    REMOVE_TOMESTONED_ROOMS_BG_UPDATE = "remove_tombstoned_rooms_from_directory"
@@ -1249,26 +1269,6 @@ class RoomStore(RoomBackgroundUpdateStore, RoomWorkerStore, SearchStore):
    def get_current_public_room_stream_id(self):
        return self._public_room_id_gen.get_current_token()

-    def get_all_new_public_rooms(self, prev_id, current_id, limit):
-        def get_all_new_public_rooms(txn):
-            sql = """
-                SELECT stream_id, room_id, visibility, appservice_id, network_id
-                FROM public_room_list_stream
-                WHERE stream_id > ? AND stream_id <= ?
-                ORDER BY stream_id ASC
-                LIMIT ?
-            """
-
-            txn.execute(sql, (prev_id, current_id, limit))
-            return txn.fetchall()
-
-        if prev_id == current_id:
-            return defer.succeed([])
-
-        return self.db.runInteraction(
-            "get_all_new_public_rooms", get_all_new_public_rooms
-        )
-
    @defer.inlineCallbacks
    def block_room(self, room_id, user_id):
        """Marks the room as blocked. Can be called multiple times.
--- a/tests/config/test_database.py
+++ b/tests/config/test_database.py
@@ -21,9 +21,9 @@ from tests import unittest


 class DatabaseConfigTestCase(unittest.TestCase):
-    def test_database_configured_correctly_no_database_conf_param(self):
+    def test_database_configured_correctly(self):
        conf = yaml.safe_load(
-            DatabaseConfig().generate_config_section("/data_dir_path", None)
+            DatabaseConfig().generate_config_section(data_dir_path="/data_dir_path")
        )

        expected_database_conf = {
@@ -32,21 +32,3 @@ class DatabaseConfigTestCase(unittest.TestCase):
        }

        self.assertEqual(conf["database"], expected_database_conf)
-
-    def test_database_configured_correctly_database_conf_param(self):
-
-        database_conf = {
-            "name": "my super fast datastore",
-            "args": {
-                "user": "matrix",
-                "password": "synapse_database_password",
-                "host": "synapse_database_host",
-                "database": "matrix",
-            },
-        }
-
-        conf = yaml.safe_load(
-            DatabaseConfig().generate_config_section("/data_dir_path", database_conf)
-        )
-
-        self.assertEqual(conf["database"], database_conf)
--- a/tests/replication/slave/storage/_base.py
+++ b/tests/replication/slave/storage/_base.py
@@ -15,9 +15,10 @@

 from mock import Mock, NonCallableMock

-from synapse.replication.tcp.client import (
-    ReplicationClientFactory,
+from synapse.replication.tcp.client import ReplicationClientFactory
+from synapse.replication.tcp.handler import (
    ReplicationClientHandler,
+    WorkerReplicationDataHandler,
 )
 from synapse.replication.tcp.resource import ReplicationStreamProtocolFactory
 from synapse.storage.database import make_conn
@@ -51,16 +52,19 @@ class BaseSlavedStoreTestCase(unittest.HomeserverTestCase):
        self.event_id = 0

        server_factory = ReplicationStreamProtocolFactory(self.hs)
-        self.streamer = server_factory.streamer
+        self.streamer = hs.get_replication_streamer()

-        handler_factory = Mock()
-        self.replication_handler = ReplicationClientHandler(self.slaved_store)
-        self.replication_handler.factory = handler_factory
-
-        client_factory = ReplicationClientFactory(
-            self.hs, "client_name", self.replication_handler
+        # We now do some gut wrenching so that we have a client that is based
+        # off of the slave store rather than the main store.
+        self.replication_handler = ReplicationClientHandler(self.hs)
+        self.replication_handler.store = self.slaved_store
+        self.replication_handler.replication_data_handler = WorkerReplicationDataHandler(
+            self.slaved_store
        )

+        client_factory = ReplicationClientFactory(self.hs, "client_name")
+        client_factory.handler = self.replication_handler
+
        server = server_factory.buildProtocol(None)
        client = client_factory.buildProtocol(None)

--- a/tests/replication/tcp/streams/_base.py
+++ b/tests/replication/tcp/streams/_base.py
@@ -12,9 +12,10 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+
 from mock import Mock

-from synapse.replication.tcp.commands import ReplicateCommand
+from synapse.replication.tcp.handler import ReplicationClientHandler
 from synapse.replication.tcp.protocol import ClientReplicationStreamProtocol
 from synapse.replication.tcp.resource import ReplicationStreamProtocolFactory

@@ -25,23 +26,46 @@ from tests.server import FakeTransport
 class BaseStreamTestCase(unittest.HomeserverTestCase):
    """Base class for tests of the replication streams"""

+    def make_homeserver(self, reactor, clock):
+        self.test_handler = Mock(wraps=TestReplicationClientHandler())
+        return self.setup_test_homeserver(replication_data_handler=self.test_handler)
+
    def prepare(self, reactor, clock, hs):
        # build a replication server
-        server_factory = ReplicationStreamProtocolFactory(self.hs)
-        self.streamer = server_factory.streamer
-        server = server_factory.buildProtocol(None)
+        server_factory = ReplicationStreamProtocolFactory(hs)
+        self.streamer = hs.get_replication_streamer()
+        self.server = server_factory.buildProtocol(None)

-        # build a replication client, with a dummy handler
-        handler_factory = Mock()
-        self.test_handler = TestReplicationClientHandler()
-        self.test_handler.factory = handler_factory
+        repl_handler = ReplicationClientHandler(hs)
+        repl_handler.handler = self.test_handler
        self.client = ClientReplicationStreamProtocol(
-            "client", "test", clock, self.test_handler
+            hs, "client", "test", clock, repl_handler,
        )

-        # wire them together
-        self.client.makeConnection(FakeTransport(server, reactor))
-        server.makeConnection(FakeTransport(self.client, reactor))
+        self._client_transport = None
+        self._server_transport = None
+
+    def reconnect(self):
+        if self._client_transport:
+            self.client.close()
+
+        if self._server_transport:
+            self.server.close()
+
+        self._client_transport = FakeTransport(self.server, self.reactor)
+        self.client.makeConnection(self._client_transport)
+
+        self._server_transport = FakeTransport(self.client, self.reactor)
+        self.server.makeConnection(self._server_transport)
+
+    def disconnect(self):
+        if self._client_transport:
+            self._client_transport = None
+            self.client.close()
+
+        if self._server_transport:
+            self._server_transport = None
+            self.server.close()

    def replicate(self):
        """Tell the master side of replication that something has happened, and then
@@ -50,29 +74,22 @@ class BaseStreamTestCase(unittest.HomeserverTestCase):
        self.streamer.on_notifier_poke()
        self.pump(0.1)

-    def replicate_stream(self, stream, token="NOW"):
-        """Make the client end a REPLICATE command to set up a subscription to a stream"""
-        self.client.send_command(ReplicateCommand(stream, token))
-
-
-class TestReplicationClientHandler(object):
-    """Drop-in for ReplicationClientHandler which just collects RDATA rows"""

+class TestReplicationClientHandler:
    def __init__(self):
-        self.received_rdata_rows = []
+        self.streams = set()
+        self._received_rdata_rows = []

    def get_streams_to_replicate(self):
-        return {}
-
-    def get_currently_syncing_users(self):
-        return []
-
-    def update_connection(self, connection):
-        pass
-
-    def finished_connecting(self):
-        pass
+        positions = {s: 0 for s in self.streams}
+        for stream, token, _ in self._received_rdata_rows:
+            if stream in self.streams:
+                positions[stream] = max(token, positions.get(stream, 0))
+        return positions

    async def on_rdata(self, stream_name, token, rows):
        for r in rows:
-            self.received_rdata_rows.append((stream_name, token, r))
+            self._received_rdata_rows.append((stream_name, token, r))
+
+    async def on_position(self, stream_name, token):
+        pass
--- a/tests/replication/tcp/streams/test_receipts.py
+++ b/tests/replication/tcp/streams/test_receipts.py
@@ -12,35 +12,68 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-from synapse.replication.tcp.streams._base import ReceiptsStreamRow
+from synapse.replication.tcp.streams._base import ReceiptsStream

 from tests.replication.tcp.streams._base import BaseStreamTestCase

 USER_ID = "@feeling:blue"
-ROOM_ID = "!room:blue"
-EVENT_ID = "$event:blue"


 class ReceiptsStreamTestCase(BaseStreamTestCase):
    def test_receipt(self):
+        self.reconnect()
+
        # make the client subscribe to the receipts stream
-        self.replicate_stream("receipts", "NOW")
+        self.test_handler.streams.add("receipts")

        # tell the master to send a new receipt
        self.get_success(
            self.hs.get_datastore().insert_receipt(
-                ROOM_ID, "m.read", USER_ID, [EVENT_ID], {"a": 1}
+                "!room:blue", "m.read", USER_ID, ["$event:blue"], {"a": 1}
            )
        )
        self.replicate()

        # there should be one RDATA command
-        rdata_rows = self.test_handler.received_rdata_rows
+        self.test_handler.on_rdata.assert_called_once()
+        stream_name, token, rdata_rows = self.test_handler.on_rdata.call_args[0]
+        self.assertEqual(stream_name, "receipts")
        self.assertEqual(1, len(rdata_rows))
-        self.assertEqual(rdata_rows[0][0], "receipts")
-        row = rdata_rows[0][2]  # type: ReceiptsStreamRow
-        self.assertEqual(ROOM_ID, row.room_id)
+        row = rdata_rows[0]  # type: ReceiptsStream.ReceiptsStreamRow
+        self.assertEqual("!room:blue", row.room_id)
        self.assertEqual("m.read", row.receipt_type)
        self.assertEqual(USER_ID, row.user_id)
-        self.assertEqual(EVENT_ID, row.event_id)
+        self.assertEqual("$event:blue", row.event_id)
        self.assertEqual({"a": 1}, row.data)
+
+        # Now let's disconnect and insert some data.
+        self.disconnect()
+
+        self.test_handler.on_rdata.reset_mock()
+
+        self.get_success(
+            self.hs.get_datastore().insert_receipt(
+                "!room2:blue", "m.read", USER_ID, ["$event2:foo"], {"a": 2}
+            )
+        )
+        self.replicate()
+
+        # Nothing should have happened as we are disconnected
+        self.test_handler.on_rdata.assert_not_called()
+
+        self.reconnect()
+        self.pump(0.1)
+
+        # We should now have caught up and get the missing data
+        self.test_handler.on_rdata.assert_called_once()
+        stream_name, token, rdata_rows = self.test_handler.on_rdata.call_args[0]
+        self.assertEqual(stream_name, "receipts")
+        self.assertEqual(token, 3)
+        self.assertEqual(1, len(rdata_rows))
+
+        row = rdata_rows[0]  # type: ReceiptsStream.ReceiptsStreamRow
+        self.assertEqual("!room2:blue", row.room_id)
+        self.assertEqual("m.read", row.receipt_type)
+        self.assertEqual(USER_ID, row.user_id)
+        self.assertEqual("$event2:foo", row.event_id)
+        self.assertEqual({"a": 2}, row.data)
--- a/tests/storage/test_devices.py
+++ b/tests/storage/test_devices.py
@@ -88,51 +88,6 @@ class DeviceStoreTestCase(tests.unittest.TestCase):
        # Check original device_ids are contained within these updates
        self._check_devices_in_updates(device_ids, device_updates)

-    @defer.inlineCallbacks
-    def test_get_device_updates_by_remote_limited(self):
-        # Test breaking the update limit in 1, 101, and 1 device_id segments
-
-        # first add one device
-        device_ids1 = ["device_id0"]
-        yield self.store.add_device_change_to_streams(
-            "user_id", device_ids1, ["someotherhost"]
-        )
-
-        # then add 101
-        device_ids2 = ["device_id" + str(i + 1) for i in range(101)]
-        yield self.store.add_device_change_to_streams(
-            "user_id", device_ids2, ["someotherhost"]
-        )
-
-        # then one more
-        device_ids3 = ["newdevice"]
-        yield self.store.add_device_change_to_streams(
-            "user_id", device_ids3, ["someotherhost"]
-        )
-
-        #
-        # now read them back.
-        #
-
-        # first we should get a single update
-        now_stream_id, device_updates = yield self.store.get_device_updates_by_remote(
-            "someotherhost", -1, limit=100
-        )
-        self._check_devices_in_updates(device_ids1, device_updates)
-
-        # Then we should get an empty list back as the 101 devices broke the limit
-        now_stream_id, device_updates = yield self.store.get_device_updates_by_remote(
-            "someotherhost", now_stream_id, limit=100
-        )
-        self.assertEqual(len(device_updates), 0)
-
-        # The 101 devices should've been cleared, so we should now just get one device
-        # update
-        now_stream_id, device_updates = yield self.store.get_device_updates_by_remote(
-            "someotherhost", now_stream_id, limit=100
-        )
-        self._check_devices_in_updates(device_ids3, device_updates)
-
    def _check_devices_in_updates(self, expected_device_ids, device_updates):
        """Check that an specific device ids exist in a list of device update EDUs"""
        self.assertEqual(len(device_updates), len(expected_device_ids))
Author	SHA1	Message	Date
Erik Johnston	83ecaeecbf	dkjfhsdklfhsdlkjf	2020-03-25 14:55:02 +00:00
Erik Johnston	0473f87a17	Pass instance name through to rdata	2020-03-25 14:05:53 +00:00
Erik Johnston	092b62ee7b	fixup! Thread through instance name to replication client	2020-03-25 11:41:38 +00:00
Erik Johnston	b6f6f5c399	Add replication listeners to wall workers	2020-03-25 11:34:56 +00:00
Erik Johnston	f7da931d62	PEP8 ???	2020-03-25 11:34:43 +00:00
Erik Johnston	9f15bffd72	Thread through instance name to replication client	2020-03-25 11:34:10 +00:00
Erik Johnston	6da24f2d5f	Merge branch 'erikj/catchup_on_worker' of github.com:matrix-org/synapse into erikj/split_out_fed_stream	2020-03-25 10:55:23 +00:00
Erik Johnston	5473f1806a	Change stream_positions to include instance name	2020-03-25 10:51:46 +00:00
Erik Johnston	f6e7daaac3	Add instance name to command	2020-03-25 10:21:22 +00:00
Erik Johnston	309c7eb1a1	Add some type aliases	2020-03-24 17:43:42 +00:00
Erik Johnston	f8038f4670	Fix HTTP update_function	2020-03-24 17:31:51 +00:00
Erik Johnston	9ea391054f	DFSDJFDSLKF	2020-03-24 17:27:50 +00:00
Erik Johnston	604f57f1bd	Merge branch 'erikj/catchup_on_worker' into erikj/split_out_typing	2020-03-24 17:21:26 +00:00
Erik Johnston	bd64b8fcd5	Fixup push rules stream	2020-03-24 16:52:17 +00:00
Erik Johnston	309aee4636	Move calling http replication out of base stream	2020-03-24 16:20:05 +00:00
Erik Johnston	e4c5b1d9d6	Review comments	2020-03-24 16:00:54 +00:00
Erik Johnston	7eec84bfbe	Shuffle around code typing handlers	2020-03-24 15:54:38 +00:00
Erik Johnston	4dd08f2501	Make ReplicationStreamer work on workers	2020-03-24 15:53:52 +00:00
Erik Johnston	55dfcd2f09	Add redis support	2020-03-24 15:04:18 +00:00
Erik Johnston	11fb08ffa9	mypy	2020-03-24 15:03:59 +00:00
Erik Johnston	ef4f063687	Move command processing out of transport	2020-03-24 14:17:18 +00:00
Erik Johnston	2380e401e4	Remove import loop	2020-03-24 11:47:57 +00:00
Erik Johnston	5d810c36a8	mypy	2020-03-24 10:06:15 +00:00
Erik Johnston	ea17e939df	Add CLEAR_USER_SYNCS command that is sent on shutdown. This should help with the case where a synchrotron gets restarted gracefully, rather than rely on 5 minute timeout.	2020-03-23 18:55:58 +00:00
Erik Johnston	225b993cf6	Remove `conn_id` usage for UserSyncCommand. Each tcp replication connection is assigned a "conn_id", which is used to give an ID to a remotely connected worker. In a redis world, there will no longer be a one-to-one mapping between connection and instance, so instead we need to replace such usages with an ID generated by the remote instances and included in the replication commands. This really only affects UserSyncCommand.	2020-03-23 18:52:24 +00:00
Erik Johnston	3204b0e79f	Handle connection closing under us	2020-03-23 18:29:21 +00:00
Erik Johnston	ba1a8be930	Review comments	2020-03-23 16:13:12 +00:00
Erik Johnston	a2070a2c4e	Remove unused 'stream' param of REPLICATE and update docs	2020-03-23 14:56:22 +00:00
Erik Johnston	4f2a803c66	Merge branch 'develop' of github.com:matrix-org/synapse into erikj/catchup_on_worker	2020-03-23 14:49:07 +00:00
Richard van der Hoff	a564b92d37	Convert `*StreamRow` classes to inner classes (#7116 ) This just helps keep the rows closer to their streams, so that it's easier to see what the format of each stream is.	2020-03-23 13:59:11 +00:00
Richard van der Hoff	5126cb1253	Merge branch 'master' into develop	2020-03-23 13:54:29 +00:00
Richard van der Hoff	229eb81498	Merge tag 'v1.12.0' Synapse 1.12.0 (2020-03-23) =========================== No significant changes since 1.12.0rc1. Debian packages and Docker images are rebuilt using the latest versions of dependency libraries, including Twisted 20.3.0. Please see security advisory below. Security advisory ----------------- Synapse may be vulnerable to request-smuggling attacks when it is used with a reverse-proxy. The vulnerabilties are fixed in Twisted 20.3.0, and are described in [CVE-2020-10108](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-10108) and [CVE-2020-10109](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-10109). For a good introduction to this class of request-smuggling attacks, see https://portswigger.net/research/http-desync-attacks-request-smuggling-reborn. We are not aware of these vulnerabilities being exploited in the wild, and do not believe that they are exploitable with current versions of any reverse proxies. Nevertheless, we recommend that all Synapse administrators ensure that they have the latest versions of the Twisted library to ensure that their installation remains secure. * Administrators using the [`matrix.org` Docker image](https://hub.docker.com/r/matrixdotorg/synapse/) or the [Debian/Ubuntu packages from `matrix.org`](https://github.com/matrix-org/synapse/blob/master/INSTALL.md#matrixorg-packages) should ensure that they have version 1.12.0 installed: these images include Twisted 20.3.0. * Administrators who have [installed Synapse from source](https://github.com/matrix-org/synapse/blob/master/INSTALL.md#installing-from-source) should upgrade Twisted within their virtualenv by running: ```sh <path_to_virtualenv>/bin/pip install 'Twisted>=20.3.0' ``` * Administrators who have installed Synapse from distribution packages should consult the information from their distributions. The `matrix.org` Synapse instance was not vulnerable to these vulnerabilities. 
Advance notice of change to the default `git` branch for Synapse ---------------------------------------------------------------- Currently, the default `git` branch for Synapse is `master`, which tracks the latest release. After the release of Synapse 1.13.0, we intend to change this default to `develop`, which is the development tip. This is more consistent with common practice and modern `git` usage. Although we try to keep `develop` in a stable state, there may be occasions where regressions creep in. Developers and distributors who have scripts which run builds using the default branch of `Synapse` should therefore consider pinning their scripts to `master`. Synapse 1.12.0rc1 (2020-03-19) ============================== Features -------- - Changes related to room alias management ([MSC2432](https://github.com/matrix-org/matrix-doc/pull/2432)): - Publishing/removing a room from the room directory now requires the user to have a power level capable of modifying the canonical alias, instead of the room aliases. ([\#6965](https://github.com/matrix-org/synapse/issues/6965)) - Validate the `alt_aliases` property of canonical alias events. ([\#6971](https://github.com/matrix-org/synapse/issues/6971)) - Users with a power level sufficient to modify the canonical alias of a room can now delete room aliases. ([\#6986](https://github.com/matrix-org/synapse/issues/6986)) - Implement updated authorization rules and redaction rules for aliases events, from [MSC2261](https://github.com/matrix-org/matrix-doc/pull/2261) and [MSC2432](https://github.com/matrix-org/matrix-doc/pull/2432). ([\#7037](https://github.com/matrix-org/synapse/issues/7037)) - Stop sending m.room.aliases events during room creation and upgrade. ([\#6941](https://github.com/matrix-org/synapse/issues/6941)) - Synapse no longer uses room alias events to calculate room names for push notifications. 
([\#6966](https://github.com/matrix-org/synapse/issues/6966)) - The room list endpoint no longer returns a list of aliases. ([\#6970](https://github.com/matrix-org/synapse/issues/6970)) - Remove special handling of aliases events from [MSC2260](https://github.com/matrix-org/matrix-doc/pull/2260) added in v1.10.0rc1. ([\#7034](https://github.com/matrix-org/synapse/issues/7034)) - Expose the `synctl`, `hash_password` and `generate_config` commands in the snapcraft package. Contributed by @devec0. ([\#6315](https://github.com/matrix-org/synapse/issues/6315)) - Check that server_name is correctly set before running database updates. ([\#6982](https://github.com/matrix-org/synapse/issues/6982)) - Break down monthly active users by `appservice_id` and emit via Prometheus. ([\#7030](https://github.com/matrix-org/synapse/issues/7030)) - Render a configurable and comprehensible error page if something goes wrong during the SAML2 authentication process. ([\#7058](https://github.com/matrix-org/synapse/issues/7058), [\#7067](https://github.com/matrix-org/synapse/issues/7067)) - Add an optional parameter to control whether other sessions are logged out when a user's password is modified. ([\#7085](https://github.com/matrix-org/synapse/issues/7085)) - Add prometheus metrics for the number of active pushers. ([\#7103](https://github.com/matrix-org/synapse/issues/7103), [\#7106](https://github.com/matrix-org/synapse/issues/7106)) - Improve performance when making HTTPS requests to sygnal, sydent, etc, by sharing the SSL context object between connections. ([\#7094](https://github.com/matrix-org/synapse/issues/7094)) Bugfixes -------- - When a user's profile is updated via the admin API, also generate a displayname/avatar update for that user in each room. ([\#6572](https://github.com/matrix-org/synapse/issues/6572)) - Fix a couple of bugs in email configuration handling. 
([\#6962](https://github.com/matrix-org/synapse/issues/6962)) - Fix an issue affecting worker-based deployments where replication would stop working, necessitating a full restart, after joining a large room. ([\#6967](https://github.com/matrix-org/synapse/issues/6967)) - Fix `duplicate key` error which was logged when rejoining a room over federation. ([\#6968](https://github.com/matrix-org/synapse/issues/6968)) - Prevent user from setting 'deactivated' to anything other than a bool on the v2 PUT /users Admin API. ([\#6990](https://github.com/matrix-org/synapse/issues/6990)) - Fix py35-old CI by using native tox package. ([\#7018](https://github.com/matrix-org/synapse/issues/7018)) - Fix a bug causing `org.matrix.dummy_event` to be included in responses from `/sync`. ([\#7035](https://github.com/matrix-org/synapse/issues/7035)) - Fix a bug that renders UTF-8 text files incorrectly when loaded from media. Contributed by @TheStranjer. ([\#7044](https://github.com/matrix-org/synapse/issues/7044)) - Fix a bug that would cause Synapse to respond with an error about event visibility if a client tried to request the state of a room at a given token. ([\#7066](https://github.com/matrix-org/synapse/issues/7066)) - Repair a data-corruption issue which was introduced in Synapse 1.10, and fixed in Synapse 1.11, and which could cause `/sync` to return with 404 errors about missing events and unknown rooms. ([\#7070](https://github.com/matrix-org/synapse/issues/7070)) - Fix a bug causing account validity renewal emails to be sent even if the feature is turned off in some cases. ([\#7074](https://github.com/matrix-org/synapse/issues/7074)) Improved Documentation ---------------------- - Updated CentOS8 install instructions. Contributed by Richard Kellner. ([\#6925](https://github.com/matrix-org/synapse/issues/6925)) - Fix `POSTGRES_INITDB_ARGS` in the `contrib/docker/docker-compose.yml` example docker-compose configuration. 
([\#6984](https://github.com/matrix-org/synapse/issues/6984)) - Change date in [INSTALL.md](./INSTALL.md#tls-certificates) for last date of getting TLS certificates to November 2019. ([\#7015](https://github.com/matrix-org/synapse/issues/7015)) - Document that the fallback auth endpoints must be routed to the same worker node as the register endpoints. ([\#7048](https://github.com/matrix-org/synapse/issues/7048)) Deprecations and Removals ------------------------- - Remove the unused query_auth federation endpoint per [MSC2451](https://github.com/matrix-org/matrix-doc/pull/2451). ([\#7026](https://github.com/matrix-org/synapse/issues/7026)) Internal Changes ---------------- - Add type hints to `logging/context.py`. ([\#6309](https://github.com/matrix-org/synapse/issues/6309)) - Add some clarifications to `README.md` in the database schema directory. ([\#6615](https://github.com/matrix-org/synapse/issues/6615)) - Refactoring work in preparation for changing the event redaction algorithm. ([\#6874](https://github.com/matrix-org/synapse/issues/6874), [\#6875](https://github.com/matrix-org/synapse/issues/6875), [\#6983](https://github.com/matrix-org/synapse/issues/6983), [\#7003](https://github.com/matrix-org/synapse/issues/7003)) - Improve performance of v2 state resolution for large rooms. ([\#6952](https://github.com/matrix-org/synapse/issues/6952), [\#7095](https://github.com/matrix-org/synapse/issues/7095)) - Reduce time spent doing GC, by freezing objects on startup. ([\#6953](https://github.com/matrix-org/synapse/issues/6953)) - Minor perfermance fixes to `get_auth_chain_ids`. ([\#6954](https://github.com/matrix-org/synapse/issues/6954)) - Don't record remote cross-signing keys in the `devices` table. ([\#6956](https://github.com/matrix-org/synapse/issues/6956)) - Use flake8-comprehensions to enforce good hygiene of list/set/dict comprehensions. ([\#6957](https://github.com/matrix-org/synapse/issues/6957)) - Merge worker apps together. 
([\#6964](https://github.com/matrix-org/synapse/issues/6964), [\#7002](https://github.com/matrix-org/synapse/issues/7002), [\#7055](https://github.com/matrix-org/synapse/issues/7055), [\#7104](https://github.com/matrix-org/synapse/issues/7104)) - Remove redundant `store_room` call from `FederationHandler._process_received_pdu`. ([\#6979](https://github.com/matrix-org/synapse/issues/6979)) - Update warning for incorrect database collation/ctype to include link to documentation. ([\#6985](https://github.com/matrix-org/synapse/issues/6985)) - Add some type annotations to the database storage classes. ([\#6987](https://github.com/matrix-org/synapse/issues/6987)) - Port `synapse.handlers.presence` to async/await. ([\#6991](https://github.com/matrix-org/synapse/issues/6991), [\#7019](https://github.com/matrix-org/synapse/issues/7019)) - Add some type annotations to the federation base & client classes. ([\#6995](https://github.com/matrix-org/synapse/issues/6995)) - Port `synapse.rest.keys` to async/await. ([\#7020](https://github.com/matrix-org/synapse/issues/7020)) - Add a type check to `is_verified` when processing room keys. ([\#7045](https://github.com/matrix-org/synapse/issues/7045)) - Add type annotations and comments to the auth handler. ([\#7063](https://github.com/matrix-org/synapse/issues/7063))	2020-03-23 13:54:17 +00:00
Richard van der Hoff	b3cee0ce67	Fix processing of `groups` stream, and use symbolic names for streams (#7117) `groups` != `receipts` Introduced in #6964	2020-03-23 11:39:36 +00:00
Dionysis Grigoropoulos	96071eea8f	Set Referrer-Policy to no-referrer for media (#7009)	2020-03-23 09:48:28 +00:00
Patrick Cloke	477c4f5b1c	Clean-up some auth/login REST code (#7115)	2020-03-20 16:22:47 -04:00
Erik Johnston	259cdffa96	Newsfile	2020-03-20 15:31:53 +00:00
Erik Johnston	32c656865a	Always subscribe to all streams. This already happens since the worker merge.	2020-03-20 15:31:52 +00:00
Erik Johnston	8734b75ca8	Remove unused token param from REPLICATE cmd	2020-03-20 15:31:51 +00:00
Erik Johnston	1f83255de1	Move stream catchup to workers.	2020-03-20 15:31:49 +00:00
Erik Johnston	ba90596687	Add ability to catchup on stream by talking to master.	2020-03-20 15:31:47 +00:00
Erik Johnston	811d2ecf2e	Don't panic if streams get behind. The catchup will in future happen on workers, so master process won't need to protect itself by dropping the connection.	2020-03-20 15:31:45 +00:00
Erik Johnston	7233d38690	Move stream fetch DB queries to worker stores.	2020-03-20 15:31:43 +00:00
Richard van der Hoff	c165c1233b	Improve database configuration docs (#6988) Attempts to clarify the sample config for databases, and add some stuff about tcp keepalives to `postgres.md`.	2020-03-20 15:24:22 +00:00
Erik Johnston	fdb1344716	Remove concept of a non-limited stream. (#7011)	2020-03-20 14:40:47 +00:00
Patrick Cloke	caec7d4fa0	Convert some of the media REST code to async/await (#7110)	2020-03-20 07:20:02 -04:00
Patrick Cloke	c2db6599c8	Fix a bug in the federation API which could cause occasional "Failed to get PDU" errors (#7089).	2020-03-19 08:22:56 -04:00
Erik Johnston	a319cb1dd1	Change device list streams to have one row per ID (#7010) * Add 'device_lists_outbound_pokes' as extra table. This makes sure we check all the relevant tables to get the current max stream ID. Currently not doing so isn't problematic as the max stream ID in `device_lists_outbound_pokes` is the same as in `device_lists_stream`, however that will change. * Change device lists stream to have one row per id. This will make it possible to process the streams more incrementally, avoiding having to process large chunks at once. * Change device list replication to match new semantics. Instead of sending down batches of user ID/host tuples, send down a row per entity (user ID or host). * Newsfile * Remove handling of multiple rows per ID * Fix worker handling * Comments from review	2020-03-19 11:36:53 +00:00
Erik Johnston	6e6476ef07	Comments from review	2020-03-18 10:13:55 +00:00
Richard van der Hoff	4ce50519cd	Update postgres.md: fix broken link	2020-03-17 18:08:43 +00:00
Erik Johnston	65a941d1f8	Merge branch 'develop' of github.com:matrix-org/synapse into erikj/fixup_devices_stream	2020-03-02 16:55:55 +00:00
Erik Johnston	e53744c737	Fix worker handling	2020-03-02 12:52:28 +00:00
Erik Johnston	f70f44abc7	Remove handling of multiple rows per ID	2020-02-28 11:45:35 +00:00
Erik Johnston	59ad93d2a4	Newsfile	2020-02-28 11:27:37 +00:00
Erik Johnston	9ce4e344a8	Change device list replication to match new semantics. Instead of sending down batches of user ID/host tuples, send down a row per entity (user ID or host).	2020-02-28 11:25:34 +00:00
Erik Johnston	f5caa1864e	Change device lists stream to have one row per id. This will make it possible to process the streams more incrementally, avoiding having to process large chunks at once.	2020-02-28 11:21:25 +00:00
Erik Johnston	c3c6c0e622	Add 'device_lists_outbound_pokes' as extra table. This makes sure we check all the relevant tables to get the current max stream ID. Currently not doing so isn't problematic as the max stream ID in `device_lists_outbound_pokes` is the same as in `device_lists_stream`, however that will change.	2020-02-28 11:15:11 +00:00
				`@@ -0,0 +1 @@`
				`Improve the documentation for database configuration.`
				`@@ -0,0 +1 @@`
				Set `Referrer-Policy` header to `no-referrer` on media downloads.
				`@@ -0,0 +1 @@`
				`Change device list streams to have one row per ID.`
				`@@ -0,0 +1 @@`
				`Move catchup of replication streams logic to worker.`
				`@@ -0,0 +1 @@`
				`Fix a bug in the federation API which could cause occasional "Failed to get PDU" errors.`
				`@@ -0,0 +1 @@`
				`Convert some of synapse.rest.media to async/await.`
				`@@ -0,0 +1 @@`
				`De-duplicate / remove unused REST code for login and auth.`
				`@@ -0,0 +1 @@`
				Convert `*StreamRow` classes to inner classes.
				`@@ -0,0 +1 @@`
				`Fix a bug which meant that groups updates were not correctly replicated between workers.`