Spawning a background process comes with a bunch of overhead, so let's
try to reduce the number of background processes we need to spawn when
handling inbound fed.
Currently, we seem to be doing roughly one per command. Instead, lets
keep the background process alive for a bit waiting for a new command to
come in.
I noticed this in some profiling. Basically, we prune the ratelimiters
by copying and iterating over every entry every 60 seconds. Instead,
let's use a wheel timer to track when we should potentially prune a
given key, and then we a) check fewer keys, and b) can run more
frequently. Hopefully this should mean we don't have a large pause
everytime we prune a ratelimiter with lots of keys.
Also fixes a bug where we didn't prune entries that were added via
`record_action` and never subsequently updated. This affected the media
and joins-per-room ratelimiter.
This regressed in https://github.com/element-hq/synapse/pull/19121. I
moved things in https://github.com/element-hq/synapse/pull/19121 because
I thought that it made sense to redirect anything printed to
`stdout`/`stderr` to the logs as early as possible. But we actually want
to log any immediately apparent problems during initialization to
`stderr` in the terminal so that they are obvious and visible to the
operator.
Now, I've moved `redirect_stdio_to_logs()` back to where it was
previously along with some proper comment context for why we have it
there.
- Move `register_start` (calls `os._exit(1)`) out of `setup` (our
composable function)
- We want to avoid `exit(...)` because we use these composable functions
in Synapse Pro for small hosts where we have multiple Synapse instances
running in the same process. We don't want a problem from one homeserver
tenant causing the entire Python process to exit and affect all of the
other homeserver tenants.
- Continuation of https://github.com/element-hq/synapse/pull/19116
- Align our app entrypoints: `homeserver` (main), `generic_worker`
(worker), and `admin_cmd`
### Background
As part of Element's plan to support a light form of vhosting (virtual
host) (multiple instances of Synapse in the same Python process) (c.f
Synapse Pro for small hosts), we're currently diving into the details
and implications of running multiple instances of Synapse in the same
Python process.
"Clean tenant provisioning" tracked internally by
https://github.com/element-hq/synapse-small-hosts/issues/48
Move exception handling up the stack (avoid `exit(1)` in our composable
functions)
Relevant to Synapse Pro for small hosts as we don't want to exit the
entire Python process and affect all homeserver tenants.
### Background
As part of Element's plan to support a light form of vhosting (virtual
host) (multiple instances of Synapse in the same Python process) (c.f
Synapse Pro for small hosts), we're currently diving into the details
and implications of running multiple instances of Synapse in the same
Python process.
"Clean tenant provisioning" tracked internally by
https://github.com/element-hq/synapse-small-hosts/issues/48
Be mindful that Synapse can be run alongside other code in the same
Python process. We shouldn't overwrite fields on given log record unless
we know it's relevant to Synapse.
(no clobber)
### Background
As part of Element's plan to support a light form of vhosting (virtual
host) (multiple instances of Synapse in the same Python process), we're
currently diving into the details and implications of running multiple
instances of Synapse in the same Python process.
"Per-tenant logging" tracked internally by
https://github.com/element-hq/synapse-small-hosts/issues/48
The schema lint tries to make sure we don't add or remove indices in
schema files (rather than as background updates), *unless* the table was
created in the same schema file.
The regex to pull out the `CREATE TABLE` SQL incorrectly didn't
recognise `IF NOT EXISTS`.
There is a test delta file that shows that we accept different types of
`CREATE TABLE` and `CREATE INDEX` statements, as well as an index
creation that doesn't have a matching create table (to show that we do
still catch it). The test delta should be removed before merge.
Fix lost logcontext when using `timeout_deferred(...)` and things
actually timeout.
Fix https://github.com/element-hq/synapse/issues/19087 (our HTTP client
times out requests using `timeout_deferred(...)`
Fix https://github.com/element-hq/synapse/issues/19066 (`/sync` uses
`notifier.wait_for_events()` which uses `timeout_deferred(...)` under
the hood)
### When/why did these lost logcontext warnings start happening?
```
synapse.logging.context - 107 - WARNING - sentinel - Expected logging context call_later but found POST-2453
synapse.logging.context - 107 - WARNING - sentinel - Expected logging context call_later was lost
```
In https://github.com/element-hq/synapse/pull/18828, we switched
`timeout_deferred(...)` from using `reactor.callLater(...)` to
[`clock.call_later(...)`](3b59ac3b69/synapse/util/clock.py (L224-L313))
under the hood. This meant it started dealing with logcontexts but our
`time_it_out()` callback didn't follow our [Synapse logcontext
rules](3b59ac3b69/docs/log_contexts.md).