Tidy up documentation a bit
+9
-118
@@ -1,135 +1,27 @@
TODO: Update with final contents of README after PR #70 merged in rust-synapse-compress-state repo

# State compressor

The state compressor is an **experimental** tool that attempts to reduce the number of rows
in the `state_groups_state` table inside of a postgres database.

## Introduction to the state tables and compression
### What is state?
State covers things like who is in a room, what the room topic/name is, and who has
what privilege levels. Synapse keeps track of it so that it can spot invalid
events (e.g. ones sent by banned users, or by people with insufficient privilege).

### What is a state group?

Synapse needs to keep track of the state at the moment of each event. A state group
corresponds to a unique state. The database table `event_to_state_groups` keeps track
of the mapping from event ids to state group ids.

Consider the following simplified example:
```
State group id | State
_____________________________________________
       1       | Alice in room
       2       | Alice in room, Bob in room
       3       | Bob in room


Event id | What the event was
______________________________________
    1    | Alice sends a message
    3    | Bob joins the room
    4    | Bob sends a message
    5    | Alice leaves the room
    6    | Bob sends a message


Event id | State group id
_________________________
    1    |       1
    2    |       1
    3    |       2
    4    |       2
    5    |       3
    6    |       3
```
### What are deltas and predecessors?
When a new state event happens (e.g. Bob joins the room) a new state group is created.
BUT instead of copying all of the state from the previous state group, we just store
the change from the previous group (saving on lots of storage space!). The difference
from the previous state group is called the "delta".

So for the previous example we would have the following (note that only rows 1 and 2 will
make sense at this point):

```
State group id | Previous state group id | Delta
____________________________________________________________
       1       |          NONE           | Alice in room
       2       |            1            | Bob in room
       3       |          NONE           | Bob in room
```
So why is state group 3's previous state group NONE and not 2? Well, the way that deltas
work in Synapse is that they can only add in new state or overwrite old state, but they
cannot remove it. (So if the room topic is changed then that is just overwriting state,
but removing Alice from the room is neither an addition nor an overwrite.) If it is
impossible to find a delta, then you just start from scratch again with a "snapshot" of
the entire state.

(NOTE: this is not documentation on how Synapse handles leaving rooms but is purely for illustrative
purposes.)

The state of a state group is worked out by following the chain of previous state groups and adding
together all of the deltas (with the most recent taking precedence).

The mapping from state group to previous state group is stored in `state_group_edges`
and the deltas are stored in `state_groups_state`.
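
To make the lookup concrete, here is a minimal sketch of resolving a state group by walking its predecessors and layering the deltas. The dictionaries `EDGES` and `DELTAS` are hypothetical in-memory stand-ins for `state_group_edges` and `state_groups_state`, filled with the simplified example above; this is not how Synapse itself queries the database.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for `state_group_edges`: state group -> previous state group.
EDGES: Dict[int, Optional[int]] = {1: None, 2: 1, 3: None}

# Hypothetical stand-in for `state_groups_state`: state group -> delta.
DELTAS: Dict[int, Dict[str, str]] = {
    1: {"Alice": "in room"},
    2: {"Bob": "in room"},
    3: {"Bob": "in room"},
}


def resolve_state(
    group: int,
    edges: Dict[int, Optional[int]],
    deltas: Dict[int, Dict[str, str]],
) -> Dict[str, str]:
    """Follow predecessors back to a snapshot, then apply deltas oldest first."""
    chain: List[int] = []
    current: Optional[int] = group
    while current is not None:
        chain.append(current)
        current = edges[current]

    state: Dict[str, str] = {}
    # Oldest delta first, so that more recent deltas take precedence.
    for state_group in reversed(chain):
        state.update(deltas[state_group])
    return state


print(resolve_state(2, EDGES, DELTAS))  # {'Alice': 'in room', 'Bob': 'in room'}
print(resolve_state(3, EDGES, DELTAS))  # {'Bob': 'in room'}
```

The number of loop iterations equals the number of hops in the chain, which is why long chains make lookups slower.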

### What are we compressing then?
In order to speed up the conversion from state group id to state, there is a limit of 100
hops set by Synapse (that is: we will only ever have to look up the deltas for a maximum of
100 state groups). It does this by taking another "snapshot" every 100 state groups.

However, it is these snapshots that take up the bulk of the storage in a Synapse database,
so we want to find a way to reduce the number of them without dramatically increasing the
maximum number of hops needed to do lookups.


## Compression Algorithm

The algorithm works by attempting to create a *tree* of deltas, produced by
appending state groups to different "levels". Each level has a maximum size, where
each state group is appended to the lowest level that is not full. This tool calls a
state group "compressed" once it has been added to
one of these levels.

This produces a graph that looks approximately like the following, in the case
of having two levels with the bottom level (L1) having a maximum size of 3:

```
L2 <-------------------- L2 <---------- ...
^--- L1 <--- L1 <--- L1  ^--- L1 <--- L1 <--- L1

NOTE: A <--- B means that state group B's predecessor is A
```
The structure that Synapse creates by default would be equivalent to having one level with
a maximum length of 100.

**Note**: Increasing the sum of the sizes of levels will increase the time it
takes to query the full state of a given state group.
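
The level scheme can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the compressor's actual implementation: the function name `assign_predecessors`, the choice to restart a full level with the incoming state group as its new head, and the example sizes are all illustrative.

```python
from typing import Dict, Iterable, List, Optional


def assign_predecessors(
    groups: Iterable[int], level_sizes: List[int]
) -> Dict[int, Optional[int]]:
    """Pick a predecessor for each state group using per-level chains.

    level_sizes[0] is the lowest level (L1). Assumption: when a level is
    full it is restarted with the incoming group as its head, and the
    group is then offered to the next level up.
    """
    heads: List[Optional[int]] = [None] * len(level_sizes)
    lengths = [0] * len(level_sizes)
    predecessors: Dict[int, Optional[int]] = {}

    for group in groups:
        predecessor: Optional[int] = None
        for i, max_size in enumerate(level_sizes):
            if lengths[i] < max_size:
                # Lowest level with space: chain onto its current head.
                predecessor = heads[i]
                heads[i] = group
                lengths[i] += 1
                break
            # Level is full: restart it from this group and try the next level.
            heads[i] = group
            lengths[i] = 0
        predecessors[group] = predecessor

    return predecessors


# Two levels, with L1 holding at most 3 state groups and L2 at most 2.
preds = assign_predecessors(range(1, 10), [3, 2])
print(preds[4], preds[5], preds[8])  # None 4 4
```

In this run, state group 4 lands in L2 and groups 5 to 7 chain off it in L1; group 8 is the next L2 entry and points back at 4, which mirrors the shape of the diagram above.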
in the `state_groups_state` table inside of a postgres database. Documentation on how it works
can be found on [its github repository](https://github.com/matrix-org/rust-synapse-compress-state).

## Enabling the state compressor

The state compressor requires the python library for the `auto_compressor` tool to be
installed. Instructions for this can be found in the `README.md` file
in the <a href=https://github.com/matrix-org/rust-synapse-compress-state>source repo</a> .
installed. Instructions for this can be found in [the `python.md` file in the source
repo](https://github.com/matrix-org/rust-synapse-compress-state/blob/main/docs/python.md).

The following configuration options are provided:

- `chunk_size`
  The rough number of state groups to work on at once. All of the entries from
  The number of state groups to work on at once. All of the entries from
  `state_groups_state` are requested from the database for state groups that are
  worked on. Therefore small chunk sizes may be needed on machines with low memory.
  Note: if the compressor fails to find space savings on the chunk as a whole
  (which may well happen in rooms with lots of backfill in) then the entire chunk
  is skipped. This defaults to 500
  is skipped. This defaults to 500

- `number_of_rooms`
  The compressor will identify the rooms with the most uncompressed state and run on
  this many of them. This defaults to 5

- `number_of_chunks`
  The compressor will stop once it has finished compressing this many chunks. Defaults to 100

- `default_levels`
  Sizes of each new level in the compression algorithm, as a comma separated list.
@@ -140,7 +32,6 @@ the levels effect the performance of fetching the state from the database, as th
  sum of the sizes is the upper bound on number of iterations needed to fetch a
  given set of state. This defaults to "100,50,25"

- `time_between_runs`
  This controls how often the state compressor is run. This defaults to once every
  day.
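
As a rough illustration of the `default_levels` option, the following sketch parses the comma separated string and computes the upper bound on lookup iterations described above (the sum of the level sizes). The helper names `parse_levels` and `max_lookup_hops` are hypothetical and are not part of Synapse or the compressor.

```python
from typing import List


def parse_levels(levels: str) -> List[int]:
    """Parse a comma separated list of level sizes such as "100,50,25"."""
    sizes = [int(part.strip()) for part in levels.split(",")]
    if any(size <= 0 for size in sizes):
        raise ValueError(f"level sizes must be positive: {levels!r}")
    return sizes


def max_lookup_hops(levels: str) -> int:
    """Upper bound on iterations needed to fetch a given set of state."""
    return sum(parse_levels(levels))


print(max_lookup_hops("100,50,25"))  # 175
```

With the default of "100,50,25" the bound is 175 iterations, compared with 100 for a single level of size 100.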
@@ -150,7 +41,7 @@ An example configuration:
state_compressor:
  enabled: true
  chunk_size: 500
  number_of_rooms: 5
  number_of_chunks: 5
  default_levels: 100,50,25
  time_between_runs: 1d
```
@@ -36,7 +36,7 @@ class StateCompressorConfig(Config):
            raise ConfigError from e

        self.compressor_chunk_size = compressor_config.get("chunk_size") or 500
        self.compressor_number_of_chunks = compressor_config.get("number_of_chunks") or 50
        self.compressor_number_of_chunks = compressor_config.get("number_of_chunks") or 100
        self.compressor_default_levels = (
            compressor_config.get("default_levels") or "100,50,25"
        )
@@ -67,7 +67,7 @@ class StateCompressorConfig(Config):
#
#chunk_size: 1000

# The number of chunks to compress on each run. Defaults to 50.
# The number of chunks to compress on each run. Defaults to 100.
#
#number_of_chunks: 1
@@ -87,6 +87,7 @@ _STATE_COMPRESSOR_SCHEMA = {
    "properties": {
        "enabled": {"type": "boolean"},
        "chunk_size": {"type": "number"},
        "number_of_chunks": {"type": "number"},
        "default_levels": {"type": "string"},
        "time_between_runs": {"type": "string"},
    },