Rewrite docs and remove log context rules (no longer relevant)

2025-08-27 22:46:58 -05:00
parent 0c8759bbb6
commit 3e66e0a1b8
1 changed files with 64 additions and 283 deletions
--- a/docs/log_contexts.md
+++ b/docs/log_contexts.md
@@ -1,12 +1,12 @@
 # Log Contexts

 To help track the processing of individual requests, synapse uses a
-'`log context`' to track which request it is handling at any given
-moment. This is done via a thread-local variable; a `logging.Filter` is
-then used to fish the information back out of the thread-local variable
+`LoggingContext` to track which request it is handling at any given
+moment. This is done via a `ContextVar` variable; a `logging.Filter` is
+then used to fish the information back out of the `ContextVar` variable
 and add it to each log record.

-Logcontexts are also used for CPU and database accounting, so that we
+Log contexts are also used for CPU and database accounting, so that we
 can track which requests were responsible for high CPU use or database
 activity.

@@ -14,18 +14,11 @@ The `synapse.logging.context` module provides facilities for managing
 the current log context (as well as providing the `LoggingContextFilter`
 class).

-Asynchronous functions make the whole thing complicated, so this document describes
-how it all works, and how to write code which follows the rules.
-
-In this document, "awaitable" refers to any object which can be `await`ed. In the context of
-Synapse, that normally means either a coroutine or a Twisted 
+In this document, "awaitable" refers to any object which can be `await`ed. In the
+context of Synapse, that normally means either a coroutine or a Twisted
 [`Deferred`](https://twistedmatrix.com/documents/current/api/twisted.internet.defer.Deferred.html).

-## Logcontexts without asynchronous code
-
-In the absence of any asynchronous voodoo, things are simple enough. As with
-any code of this nature, the rule is that our function should leave
-things as it found them:
+## Basic usage

 ```python
 from synapse.logging import context         # omitted from future snippets
@@ -45,7 +38,7 @@ def do_request_handling():
    logger.debug("phew")  # this will be logged against request_id
 ```

-LoggingContext implements the context management methods, so the above
+`LoggingContext` implements the context management methods, so the above
 can be written much more succinctly as:

 ```python
@@ -59,197 +52,76 @@ def do_request_handling():
    logger.debug("phew")
 ```

-## Using logcontexts with awaitables
+### The `sentinel` context

-Awaitables break the linear flow of code so that there is no longer a single entry point
-where we should set the logcontext and a single exit point where we should remove it.
+The default context is `context.SENTINEL_CONTEXT`, which is a sentinel value to
+represent the root context. This is what is used when there is no other context set.

-Consider the example above, where `do_request_handling` needs to do some
-blocking operation, and returns an awaitable:
+No CPU/database usage metrics are recorded against the `sentinel` context.
+
+Ideally, nothing from the Synapse homeserver would be logged against the `sentinel`
+context as we want to know where the logs came from. In practice, this is not always the
+case yet especially outside of request handling.
+
+Previously, the `sentinel` context played a bigger role when we had to carefully deal
+with thread-local storage; as we had to make sure to not leak another context to another
+task after we gave up control to the reactor so we set the 
+
+
+
+### `PreserveLoggingContext`
+
+In a similar vein of no longer as relevant, `PreserveLoggingContext` is another context
+manager helper and a little bit of syntactic sugar to set the current log context
+(without finishing it) and restore the previous context on exit.

 ```python
-async def handle_request(request_id):
-    with context.LoggingContext() as request_context:
-        request_context.request = request_id
-        await do_request_handling()
+import logging
+from synapse.logging.context import LoggingContext
+
+logger = logging.getLogger(__name__)
+
+def main() -> None:
+    with context.LoggingContext("main"):
+        task_context = context.LoggingContext("task")
+
+        with task_context:
+            logger.debug("foo")
+
+        # Bad: will throw an error because `task_context` is already finished
+        with task_context:
+            logger.debug("bar")
+
        logger.debug("finished")
 ```

-In the above flow:
-
-   The logcontext is set
-   `do_request_handling` is called, and returns an awaitable
-   `handle_request` awaits the awaitable
-   Execution of `handle_request` is suspended
-
-So we have stopped processing the request (and will probably go on to
-start processing the next), without clearing the logcontext.
-
-To circumvent this problem, synapse code assumes that, wherever you have
-an awaitable, you will want to `await` it. To that end, wherever
-functions return awaitables, we adopt the following conventions:
-
-**Rules for functions returning awaitables:**
-
-> -   If the awaitable is already complete, the function returns with the
->     same logcontext it started with.
-> -   If the awaitable is incomplete, the function clears the logcontext
->     before returning; when the awaitable completes, it restores the
->     logcontext before running any callbacks.
-
-That sounds complicated, but actually it means a lot of code (including
-the example above) "just works". There are two cases:
-
-   If `do_request_handling` returns a completed awaitable, then the
-    logcontext will still be in place. In this case, execution will
-    continue immediately after the `await`; the "finished" line will
-    be logged against the right context, and the `with` block restores
-    the original context before we return to the caller.
-   If the returned awaitable is incomplete, `do_request_handling` clears
-    the logcontext before returning. The logcontext is therefore clear
-    when `handle_request` `await`s the awaitable.
-
-    Once `do_request_handling`'s awaitable completes, it will reinstate
-    the logcontext, before running the second half of `handle_request`,
-    so again the "finished" line will be logged against the right context,
-    and the `with` block restores the original context.
-
-As an aside, it's worth noting that `handle_request` follows our rules
- though that only matters if the caller has its own logcontext which it
-cares about.
-
-The following sections describe pitfalls and helpful patterns when
-implementing these rules.
-
-Always await your awaitables
----------------------------
-
-Whenever you get an awaitable back from a function, you should `await` on
-it as soon as possible. Do not pass go; do not do any logging; do not
-call any other functions.
+This can be fixed by using `PreserveLoggingContext`:

 ```python
-async def fun():
-    logger.debug("starting")
-    await do_some_stuff()       # just like this
+import logging
+from synapse.logging.context import LoggingContext

-    coro = more_stuff()
-    result = await coro         # also fine, of course
+logger = logging.getLogger(__name__)

-    return result
+def main() -> None:
+    with context.LoggingContext("main"):
+        task_context = context.LoggingContext("task")
+
+        with PreserveLoggingContext(task_context):
+            logger.debug("foo")
+        with PreserveLoggingContext(task_context):
+            logger.debug("bar")
+
+        logger.debug("finished") # this will be logged against main
 ```

-Provided this pattern is followed all the way back up to the callchain
-to where the logcontext was set, this will make things work out ok:
-provided `do_some_stuff` and `more_stuff` follow the rules above, then
-so will `fun`.
+Or you could equivalently just manage the log context manually via
+`set_current_context`.

-It's all too easy to forget to `await`: for instance if we forgot that
-`do_some_stuff` returned an awaitable, we might plough on regardless. This
-leads to a mess; it will probably work itself out eventually, but not
-before a load of stuff has been logged against the wrong context.
-(Normally, other things will break, more obviously, if you forget to
-`await`, so this tends not to be a major problem in practice.)
-
-Of course sometimes you need to do something a bit fancier with your
-awaitable - not all code follows the linear A-then-B-then-C pattern.
-Notes on implementing more complex patterns are in later sections.
-
-## Where you create a new awaitable, make it follow the rules
-
-Most of the time, an awaitable comes from another synapse function.
-Sometimes, though, we need to make up a new awaitable, or we get an awaitable
-back from external code. We need to make it follow our rules.
-
-The easy way to do it is by using `context.make_deferred_yieldable`. Suppose we want to implement
-`sleep`, which returns a deferred which will run its callbacks after a
-given number of seconds. That might look like:
-
-```python
-# not a logcontext-rules-compliant function
-def get_sleep_deferred(seconds):
-    d = defer.Deferred()
-    reactor.callLater(seconds, d.callback, None)
-    return d
-```
-
-That doesn't follow the rules, but we can fix it by calling it through
-`context.make_deferred_yieldable`:
-
-```python
-async def sleep(seconds):
-    return await context.make_deferred_yieldable(get_sleep_deferred(seconds))
-```

 ## Fire-and-forget

-Sometimes you want to fire off a chain of execution, but not wait for
-its result. That might look a bit like this:
-
-```python
-async def do_request_handling():
-    await foreground_operation()
-
-    # *don't* do this
-    background_operation()
-
-    logger.debug("Request handling complete")
-
-async def background_operation():
-    await first_background_step()
-    logger.debug("Completed first step")
-    await second_background_step()
-    logger.debug("Completed second step")
-```
-
-The above code does a couple of steps in the background after
-`do_request_handling` has finished. The log lines are still logged
-against the `request_context` logcontext, which may or may not be
-desirable. There are two big problems with the above, however. The first
-problem is that, if `background_operation` returns an incomplete
-awaitable, it will expect its caller to `await` immediately, so will have
-cleared the logcontext. In this example, that means that 'Request
-handling complete' will be logged without any context.
-
-The second problem, which is potentially even worse, is that when the
-awaitable returned by `background_operation` completes, it will restore
-the original logcontext. There is nothing waiting on that awaitable, so
-the logcontext will leak into the reactor and possibly get attached to
-some arbitrary future operation.
-
-There are two potential solutions to this.
-
-One option is to surround the call to `background_operation` with a
-`PreserveLoggingContext` call. That will reset the logcontext before
-starting `background_operation` (so the context restored when the
-deferred completes will be the empty logcontext), and will restore the
-current logcontext before continuing the foreground process:
-
-```python
-async def do_request_handling():
-    await foreground_operation()
-
-    # start background_operation off in the empty logcontext, to
-    # avoid leaking the current context into the reactor.
-    with PreserveLoggingContext():
-        background_operation()
-
-    # this will now be logged against the request context
-    logger.debug("Request handling complete")
-```
-
-Obviously that option means that the operations done in
-`background_operation` would be not be logged against a logcontext
-(though that might be fixed by setting a different logcontext via a
-`with LoggingContext(...)` in `background_operation`).
-
-The second option is to use `context.run_in_background`, which wraps a
-function so that it doesn't reset the logcontext even when it returns
-an incomplete awaitable, and adds a callback to the returned awaitable to
-reset the logcontext. In other words, it turns a function that follows
-the Synapse rules about logcontexts and awaitables into one which behaves
-more like an external function --- the opposite operation to that
-described in the previous section. It can be used like this:
+To drive an awaitable in the background, you can use `context.run_in_background`:

 ```python
 async def do_request_handling():
@@ -261,104 +133,13 @@ async def do_request_handling():
    logger.debug("Request handling complete")
 ```

-## Passing synapse deferreds into third-party functions
-
-A typical example of this is where we want to collect together two or
-more awaitables via `defer.gatherResults`:
-
-```python
-a1 = operation1()
-a2 = operation2()
-a3 = defer.gatherResults([a1, a2])
-```
-
-This is really a variation of the fire-and-forget problem above, in that
-we are firing off `a1` and `a2` without awaiting on them. The difference
-is that we now have third-party code attached to their callbacks. Anyway
-either technique given in the [Fire-and-forget](#fire-and-forget)
-section will work.
-
-Of course, the new awaitable returned by `gather` needs to be
-wrapped in order to make it follow the logcontext rules before we can
-yield it, as described in [Where you create a new awaitable, make it
-follow the
-rules](#where-you-create-a-new-awaitable-make-it-follow-the-rules).
-
-So, option one: reset the logcontext before starting the operations to
-be gathered:
-
-```python
-async def do_request_handling():
-    with PreserveLoggingContext():
-        a1 = operation1()
-        a2 = operation2()
-        result = await defer.gatherResults([a1, a2])
-```
-
-In this case particularly, though, option two, of using
-`context.run_in_background` almost certainly makes more sense, so that
-`operation1` and `operation2` are both logged against the original
-logcontext. This looks like:
-
 ```python
 async def do_request_handling():
    a1 = context.run_in_background(operation1)
    a2 = context.run_in_background(operation2)

-    result = await make_deferred_yieldable(defer.gatherResults([a1, a2]))
+    result = await defer.gatherResults([a1, a2])
 ```

-## A note on garbage-collection of awaitable chains
-
-It turns out that our logcontext rules do not play nicely with awaitable
-chains which get orphaned and garbage-collected.
-
-Imagine we have some code that looks like this:
-
-```python
-listener_queue = []
-
-def on_something_interesting():
-    for d in listener_queue:
-        d.callback("foo")
-
-async def await_something_interesting():
-    new_awaitable = defer.Deferred()
-    listener_queue.append(new_awaitable)
-
-    with PreserveLoggingContext():
-        await new_awaitable
-```
-
-Obviously, the idea here is that we have a bunch of things which are
-waiting for an event. (It's just an example of the problem here, but a
-relatively common one.)
-
-Now let's imagine two further things happen. First of all, whatever was
-waiting for the interesting thing goes away. (Perhaps the request times
-out, or something *even more* interesting happens.)
-
-Secondly, let's suppose that we decide that the interesting thing is
-never going to happen, and we reset the listener queue:
-
-```python
-def reset_listener_queue():
-    listener_queue.clear()
-```
-
-So, both ends of the awaitable chain have now dropped their references,
-and the awaitable chain is now orphaned, and will be garbage-collected at
-some point. Note that `await_something_interesting` is a coroutine, 
-which Python implements as a generator function.  When Python
-garbage-collects generator functions, it gives them a chance to 
-clean up by making the `await` (or `yield`) raise a `GeneratorExit`
-exception. In our case, that means that the `__exit__` handler of
-`PreserveLoggingContext` will carefully restore the request context, but
-there is now nothing waiting for its return, so the request context is
-never cleared.
-
-To reiterate, this problem only arises when *both* ends of a awaitable
-chain are dropped. Dropping the the reference to an awaitable you're
-supposed to be awaiting is bad practice, so this doesn't
-actually happen too much. Unfortunately, when it does happen, it will
-lead to leaked logcontexts which are incredibly hard to track down.
+`background_process_metrics.run_as_background_process` also exists if you want some
+automatic tracing and metrics for the background task.