Compare commits

6 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Olivier 'reivilibre' | eeb5ef89c4 | Trace cache names | 2025-08-28 17:58:59 +01:00 |
| Olivier 'reivilibre' | 0f390dc0aa | Cache tracing (actually use u32 for size logging) | 2025-08-28 13:10:23 +01:00 |
| Erik Johnston | f8a44638eb | 1.137.0 | 2025-08-26 10:23:44 +01:00 |
| Andrew Morgan | 40edb10a98 | Linkify MSC and CVE in the changelog | 2025-08-19 11:01:21 +01:00 |
| Andrew Morgan | 3d7e39b2ea | add backticks to changelog | 2025-08-19 11:00:15 +01:00 |
| Andrew Morgan | c51da9bac0 | 1.137.0rc1 | 2025-08-19 10:55:42 +01:00 |
16 changed files with 415 additions and 10 deletions

CHANGES.md

@@ -1,3 +1,47 @@
# Synapse 1.137.0 (2025-08-26)

No significant changes since 1.137.0rc1.

# Synapse 1.137.0rc1 (2025-08-19)

### Bugfixes

- Fix a bug which could corrupt auth chains making it impossible to perform state resolution. ([\#18746](https://github.com/element-hq/synapse/issues/18746))
- Fix error message in `register_new_matrix_user` utility script for empty `registration_shared_secret`. ([\#18780](https://github.com/element-hq/synapse/issues/18780))
- Allow enabling [MSC4108](https://github.com/matrix-org/matrix-spec-proposals/pull/4108) when the stable Matrix Authentication Service integration is enabled. ([\#18832](https://github.com/element-hq/synapse/issues/18832))

### Improved Documentation

- Include IPv6 networks in `denied-peer-ips` of coturn setup. Contributed by @litetex. ([\#18781](https://github.com/element-hq/synapse/issues/18781))

### Internal Changes

- Update tests to ensure all database tables are emptied when purging a room. ([\#18794](https://github.com/element-hq/synapse/issues/18794))
- Instrument the `encode_response` part of Sliding Sync requests for more complete traces in Jaeger. ([\#18815](https://github.com/element-hq/synapse/issues/18815))
- Tag Sliding Sync traces when we `wait_for_events`. ([\#18816](https://github.com/element-hq/synapse/issues/18816))
- Fix `portdb` CI by hardcoding the new `pg_dump` restrict key that was added due to [CVE-2025-8714](https://nvd.nist.gov/vuln/detail/cve-2025-8714). ([\#18824](https://github.com/element-hq/synapse/issues/18824))

### Updates to locked dependencies

* Bump actions/add-to-project from 5b1a254a3546aef88e0a7724a77a623fa2e47c36 to 0c37450c4be3b6a7582b2fb013c9ebfd9c8e9300. ([\#18557](https://github.com/element-hq/synapse/issues/18557))
* Bump actions/cache from 4.2.3 to 4.2.4. ([\#18799](https://github.com/element-hq/synapse/issues/18799))
* Bump actions/checkout from 4.2.2 to 4.3.0. ([\#18800](https://github.com/element-hq/synapse/issues/18800))
* Bump actions/download-artifact from 4.3.0 to 5.0.0. ([\#18801](https://github.com/element-hq/synapse/issues/18801))
* Bump docker/metadata-action from 5.7.0 to 5.8.0. ([\#18773](https://github.com/element-hq/synapse/issues/18773))
* Bump mypy from 1.16.1 to 1.17.1. ([\#18775](https://github.com/element-hq/synapse/issues/18775))
* Bump phonenumbers from 9.0.10 to 9.0.11. ([\#18797](https://github.com/element-hq/synapse/issues/18797))
* Bump pygithub from 2.6.1 to 2.7.0. ([\#18779](https://github.com/element-hq/synapse/issues/18779))
* Bump serde_json from 1.0.141 to 1.0.142. ([\#18776](https://github.com/element-hq/synapse/issues/18776))
* Bump slab from 0.4.10 to 0.4.11. ([\#18809](https://github.com/element-hq/synapse/issues/18809))
* Bump tokio from 1.47.0 to 1.47.1. ([\#18774](https://github.com/element-hq/synapse/issues/18774))
* Bump types-pyyaml from 6.0.12.20250516 to 6.0.12.20250809. ([\#18798](https://github.com/element-hq/synapse/issues/18798))
* Bump types-setuptools from 80.9.0.20250529 to 80.9.0.20250809. ([\#18796](https://github.com/element-hq/synapse/issues/18796))

# Synapse 1.136.0 (2025-08-12)

Note: This release includes the security fixes from `1.135.2` and `1.136.0rc2`, detailed below.

changelog.d/18746.bugfix (deleted)

@@ -1 +0,0 @@
Fix a bug which could corrupt auth chains making it impossible to perform state resolution.

changelog.d/18780.bugfix (deleted)

@@ -1 +0,0 @@
Fix error message in `register_new_matrix_user` utility script for empty `registration_shared_secret`.

changelog.d/18781.doc (deleted)

@@ -1 +0,0 @@
Include IPv6 networks in denied-peer-ips of coturn setup. Contributed by @litetex.

changelog.d/18794.misc (deleted)

@@ -1 +0,0 @@
Update tests to ensure all database tables are emptied when purging a room.

changelog.d/18815.misc (deleted)

@@ -1 +0,0 @@
Instrument the `encode_response` part of Sliding Sync requests for more complete traces in Jaeger.

changelog.d/18816.misc (deleted)

@@ -1 +0,0 @@
Tag Sliding Sync traces when we `wait_for_events`.

changelog.d/18824.misc (deleted)

@@ -1 +0,0 @@
Fix portdb CI by hardcoding the new pg_dump restrict key that was added due to CVE-2025-8714.

changelog.d/18832.bugfix (deleted)

@@ -1 +0,0 @@
Allow enabling MSC4108 when the stable Matrix Authentication Service integration is enabled.

debian/changelog (vendored)

@@ -1,3 +1,15 @@
matrix-synapse-py3 (1.137.0) stable; urgency=medium

  * New Synapse release 1.137.0.

 -- Synapse Packaging team <packages@matrix.org>  Tue, 26 Aug 2025 10:23:41 +0100

matrix-synapse-py3 (1.137.0~rc1) stable; urgency=medium

  * New Synapse release 1.137.0rc1.

 -- Synapse Packaging team <packages@matrix.org>  Tue, 19 Aug 2025 10:55:22 +0100

matrix-synapse-py3 (1.136.0) stable; urgency=medium

  * New Synapse release 1.136.0.

pyproject.toml

@@ -101,7 +101,7 @@ module-name = "synapse.synapse_rust"
[tool.poetry]
name = "matrix-synapse"
version = "1.136.0"
version = "1.137.0"
description = "Homeserver for the Matrix decentralised comms protocol"
authors = ["Matrix.org Team and Contributors <packages@matrix.org>"]
license = "AGPL-3.0-or-later"

rust/src/lib.rs

@@ -14,6 +14,7 @@ pub mod matrix_const;
pub mod push;
pub mod rendezvous;
pub mod segmenter;
pub mod tmp_cachetrace;

lazy_static! {
    static ref LOGGING_HANDLE: ResetHandle = pyo3_log::init();
@@ -55,6 +56,7 @@ fn synapse_rust(py: Python<'_>, m: &Bound<'_, PyModule>) -> PyResult<()> {
    http_client::register_module(py, m)?;
    rendezvous::register_module(py, m)?;
    segmenter::register_module(py, m)?;
    tmp_cachetrace::register_module(py, m)?;
    Ok(())
}

rust/src/tmp_cachetrace.rs (new file)

@@ -0,0 +1,299 @@
use std::{
    collections::BTreeMap,
    fs::File,
    io::{BufWriter, Write},
    sync::{
        atomic::{AtomicBool, Ordering},
        mpsc::{self, Receiver, SyncSender},
        Arc, OnceLock,
    },
    time::{SystemTime, UNIX_EPOCH},
};

use anyhow::{bail, Context};
use pyo3::{
    pyclass, pymethods,
    types::{PyAnyMethods, PyModule, PyModuleMethods},
    Bound, PyAny, PyResult, Python,
};

/// A single cache trace event, handed off to the background writer thread.
struct Row {
    cache: u16,
    time_ms: i64,
    hash: u64,
    op: Op,
}

/// The kind of cache event being traced.
enum Op {
    /// Maps a numeric cache id to its human-readable name.
    Register { cache_name: String },
    /// A new entry was inserted, with approximate key/value sizes in bytes.
    New { key_size: u64, value_size: u64 },
    Request,
    Invalidate,
    Evict,
}

#[pyclass]
pub struct CacheTracer {
    tx: SyncSender<Row>,
    error_flag: Arc<AtomicBool>,
    cache_names: BTreeMap<String, u16>,
}

impl CacheTracer {
    /// Current wall-clock time in milliseconds since the Unix epoch.
    fn now_ms() -> i64 {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .unwrap()
            .as_millis() as i64
    }

    /// Look up the numeric id for a cache name, registering it (and emitting
    /// a `Register` row) the first time the name is seen.
    fn cache_id(&mut self, cache: &str) -> u16 {
        if let Some(cache_id) = self.cache_names.get(cache) {
            return *cache_id;
        }
        let new = self.cache_names.len() as u16;
        self.cache_names.insert(cache.to_owned(), new);
        let _ = self.tx.try_send(Row {
            cache: new,
            op: Op::Register {
                cache_name: cache.to_owned(),
            },
            hash: 0,
            time_ms: Self::now_ms(),
        });
        new
    }

    /// Queue a row for the writer thread, flagging an error (rather than
    /// blocking the Python caller) if the channel is full.
    fn send(&mut self, cache: &str, hash: u64, op: Op) {
        let cache_id = self.cache_id(cache);
        if self
            .tx
            .try_send(Row {
                cache: cache_id,
                op,
                hash,
                time_ms: Self::now_ms(),
            })
            .is_err()
        {
            self.error_flag.store(true, Ordering::Relaxed);
        }
    }
}

#[pymethods]
impl CacheTracer {
    #[new]
    #[pyo3(signature = ())]
    pub fn py_new() -> Self {
        // Rows are handed to a dedicated writer thread over a bounded channel
        // so that tracing never blocks the Python side.
        let (tx, rx) = mpsc::sync_channel(2048);
        let error_flag = Arc::new(AtomicBool::new(false));
        std::thread::spawn({
            let error_flag = Arc::clone(&error_flag);
            move || {
                if let Err(err) = receive_and_log_traces(rx, error_flag) {
                    eprintln!("error in cache tracer: {err}");
                }
            }
        });
        CacheTracer {
            tx,
            error_flag,
            cache_names: BTreeMap::new(),
        }
    }

    #[pyo3(signature = (cache, key, value))]
    pub fn on_new(
        &mut self,
        py: Python<'_>,
        cache: &str,
        key: Bound<'_, PyAny>,
        value: Bound<'_, PyAny>,
    ) {
        let key_hash = key.hash().unwrap() as u64;
        let key_size = get_size_of(py, &key);
        let value_size = get_size_of(py, &value);
        self.send(
            cache,
            key_hash,
            Op::New {
                key_size,
                value_size,
            },
        );
    }

    #[pyo3(signature = (cache, key))]
    pub fn on_request(&mut self, _py: Python<'_>, cache: &str, key: Bound<'_, PyAny>) {
        let key_hash = key.hash().unwrap() as u64;
        self.send(cache, key_hash, Op::Request);
    }

    #[pyo3(signature = (cache, key))]
    pub fn on_invalidate(&mut self, _py: Python<'_>, cache: &str, key: Bound<'_, PyAny>) {
        let key_hash = key.hash().unwrap() as u64;
        self.send(cache, key_hash, Op::Invalidate);
    }

    #[pyo3(signature = (cache, key))]
    pub fn on_evict(&mut self, _py: Python<'_>, cache: &str, key: Bound<'_, PyAny>) {
        let key_hash = key.hash().unwrap() as u64;
        self.send(cache, key_hash, Op::Evict);
    }
}

static GETSIZEOF: OnceLock<pyo3::Py<pyo3::PyAny>> = OnceLock::new();

/// Approximate the in-memory size of a Python object by calling Synapse's
/// `_get_size_of` helper from `synapse.util.caches.lrucache`.
fn get_size_of(py: Python<'_>, obj: &Bound<'_, PyAny>) -> u64 {
    let getsizeof = GETSIZEOF.get_or_init(|| {
        let lrucache = PyModule::import(py, "synapse.util.caches.lrucache").unwrap();
        lrucache.getattr("_get_size_of").unwrap().unbind()
    });
    getsizeof.call1(py, (obj,)).unwrap().extract(py).unwrap()
}

/// Drain rows from the channel and append them, in a compact binary format,
/// to `/tmp/syncachetrace-<pid>`. Each row is: a big-endian i16 delta in
/// milliseconds since the previous row, a u16 cache id, a u64 key hash, and
/// a one-byte opcode (`*`, `N`, `R`, `I` or `E`) followed by any payload.
fn receive_and_log_traces(rx: Receiver<Row>, error_flag: Arc<AtomicBool>) -> anyhow::Result<()> {
    let pid = std::process::id();
    let f = File::create_new(format!("/tmp/syncachetrace-{pid}"))
        .context("failed to start cache tracer")?;
    let mut bw = BufWriter::new(f);
    let mut last_time = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_millis() as i64;
    while let Ok(row) = rx.recv() {
        if error_flag.load(Ordering::Relaxed) {
            // A row was dropped on the Python side; the trace is no longer
            // complete, so mark the file and stop.
            bw.write_all(b"DEADBEEF")?;
            bw.flush()?;
            bail!("error flagged");
        }
        let time_delta = row.time_ms.saturating_sub(last_time);
        last_time = row.time_ms;
        bw.write_all(&(time_delta as i16).to_be_bytes())?;
        bw.write_all(&row.cache.to_be_bytes())?;
        bw.write_all(&row.hash.to_be_bytes())?;
        match row.op {
            Op::Register { cache_name } => {
                bw.write_all(b"*")?;
                bw.write_all(&(cache_name.len() as u32).to_be_bytes())?;
                bw.write_all(cache_name.as_bytes())?;
            }
            Op::New {
                key_size,
                value_size,
            } => {
                // Sizes are clamped to u32 to keep rows fixed-width.
                let key_size = key_size.min(u32::MAX as u64) as u32;
                let value_size = value_size.min(u32::MAX as u64) as u32;
                bw.write_all(b"N")?;
                bw.write_all(&key_size.to_be_bytes())?;
                bw.write_all(&value_size.to_be_bytes())?;
            }
            Op::Request => {
                bw.write_all(b"R")?;
            }
            Op::Invalidate => {
                bw.write_all(b"I")?;
            }
            Op::Evict => {
                bw.write_all(b"E")?;
            }
        }
    }
    Ok(())
}

pub fn register_module(py: Python<'_>, m: &Bound<'_, PyModule>) -> PyResult<()> {
    let child_module = PyModule::new(py, "tmp_cachetrace")?;
    child_module.add_class::<CacheTracer>()?;
    m.add_submodule(&child_module)?;
    // Register under the fully-qualified name so that
    // `import synapse.synapse_rust.tmp_cachetrace` works from Python.
    py.import("sys")?
        .getattr("modules")?
        .set_item("synapse.synapse_rust.tmp_cachetrace", child_module)?;
    Ok(())
}
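To make the on-disk format concrete, here is a minimal decoder sketch in Python. It assumes only what `receive_and_log_traces` writes above; `read_trace` itself is a hypothetical helper, not part of this changeset.

```python
import struct


def read_trace(path: str) -> None:
    """Print the rows of a /tmp/syncachetrace-<pid> file.

    Each row starts with a 12-byte big-endian header (i16 ms delta since the
    previous row, u16 cache id, u64 key hash), then a one-byte opcode and an
    optional payload.
    """
    names: dict[int, str] = {}
    with open(path, "rb") as f:
        while True:
            header = f.read(12)
            if len(header) < 12:
                # The writer appends b"DEADBEEF" and stops if a row was dropped.
                if header.endswith(b"DEADBEEF"):
                    print("trace truncated: writer flagged an error")
                break
            delta, cache_id, key_hash = struct.unpack(">hHQ", header)
            name = names.get(cache_id, str(cache_id))
            op = f.read(1)
            if op == b"*":  # Register: maps the cache id to its name
                (name_len,) = struct.unpack(">I", f.read(4))
                names[cache_id] = f.read(name_len).decode()
            elif op == b"N":  # New entry, with clamped key/value sizes
                key_size, value_size = struct.unpack(">II", f.read(8))
                print(f"+{delta}ms {name} NEW {key_hash:#x} "
                      f"key={key_size}B value={value_size}B")
            elif op in (b"R", b"I", b"E"):
                kind = {b"R": "REQUEST", b"I": "INVALIDATE", b"E": "EVICT"}[op]
                print(f"+{delta}ms {name} {kind} {key_hash:#x}")
            else:
                print(f"unknown opcode {op!r}, stopping")
                break
```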

schema/synapse-config.schema.yaml

@@ -1,5 +1,5 @@
$schema: https://element-hq.github.io/synapse/latest/schema/v1/meta.schema.json
-$id: https://element-hq.github.io/synapse/schema/synapse/v1.136/synapse-config.schema.json
+$id: https://element-hq.github.io/synapse/schema/synapse/v1.137/synapse-config.schema.json
type: object
properties:
  modules:

synapse/synapse_rust/tmp_cachetrace.pyi (new file)

@@ -0,0 +1,6 @@
class CacheTracer:
    def __init__(self) -> None: ...
    def on_new(self, cache: str, key: object, value: object) -> None: ...
    def on_request(self, cache: str, key: object) -> None: ...
    def on_invalidate(self, cache: str, key: object) -> None: ...
    def on_evict(self, cache: str, key: object) -> None: ...
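A quick sketch of driving the stubbed interface by hand; the cache name and keys here are purely illustrative. Note that the constructor immediately spawns the writer thread and creates the trace file.

```python
from synapse.synapse_rust.tmp_cachetrace import CacheTracer

tracer = CacheTracer()  # starts the writer thread, creates /tmp/syncachetrace-<pid>

# Keys must be hashable: the tracer records hash(key), never the key itself.
tracer.on_new("example_cache", ("!room:example.org",), {"some": "value"})
tracer.on_request("example_cache", ("!room:example.org",))
tracer.on_invalidate("example_cache", ("!room:example.org",))
```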

synapse/util/caches/lrucache.py

@@ -21,6 +21,7 @@
import logging
import math
import os
import threading
import weakref
from enum import Enum
@@ -64,6 +65,7 @@ from synapse.util.linked_list import ListNode
if TYPE_CHECKING:
    from synapse.server import HomeServer
    from synapse.synapse_rust.tmp_cachetrace import CacheTracer

logger = logging.getLogger(__name__)
@@ -102,6 +104,24 @@ VT = TypeVar("VT")
# a general type var, distinct from either KT or VT
T = TypeVar("T")

_tracer: Optional["CacheTracer"] = None
_should_trace = "SYNTRACE" in os.environ


def get_tracer() -> Optional["CacheTracer"]:
    # Local import: the Rust tracer in turn looks up `_get_size_of` from
    # this module.
    from synapse.synapse_rust.tmp_cachetrace import CacheTracer

    global _tracer
    if _tracer:
        return _tracer
    if _should_trace:
        _tracer = CacheTracer()
        return _tracer
    return None


class _TimedListNode(ListNode[T]):
    """A `ListNode` that tracks last access time."""
@@ -493,6 +513,7 @@ class LruCache(Generic[KT, VT]):
        Note: The new key does not have to be unique.
        """

        # Default `clock` to something sensible. Note that we rename it to
        # `real_clock` so that mypy doesn't think it's still `Optional`.
        if clock is None:
@@ -504,6 +525,11 @@ class LruCache(Generic[KT, VT]):
        self.cache = cache  # Used for introspection.
        self.apply_cache_factor_from_config = apply_cache_factor_from_config

        # Only plain (non-TreeCache) caches are traced.
        if not isinstance(cache, TreeCache):
            self._tracer = get_tracer()
        else:
            self._tracer = None

        # Save the original max size, and apply the default size factor.
        self._original_max_size = max_size
        # We previously didn't apply the cache factor here, and as such some caches were
@@ -542,6 +568,8 @@ class LruCache(Generic[KT, VT]):
        extra_index: Dict[KT, Set[KT]] = {}

        self._cache_name = cache_name or str(id(self))

        def evict() -> None:
            while cache_len() > self.max_size:
                # Get the last node in the list (i.e. the oldest node).
@@ -559,6 +587,10 @@ class LruCache(Generic[KT, VT]):
                evicted_len = delete_node(node)
                cache.pop(node.key, None)

                if self._tracer:
                    self._tracer.on_evict(self._cache_name, node.key)

                if metrics:
                    metrics.inc_evictions(EvictionReason.size, evicted_len)
@@ -675,6 +707,10 @@ class LruCache(Generic[KT, VT]):
                to False if this fetch should *not* prevent a node from
                being expired.
            """
            if self._tracer:
                self._tracer.on_request(self._cache_name, key)

            node = cache.get(key, None)
            if node is not None:
                if update_last_access:
@@ -750,6 +786,10 @@ class LruCache(Generic[KT, VT]):
            key: KT, value: VT, callbacks: Collection[Callable[[], None]] = ()
        ) -> None:
            node = cache.get(key, None)

            if self._tracer:
                self._tracer.on_new(self._cache_name, key, value)

            if node is not None:
                # We sometimes store large objects, e.g. dicts, which cause
                # the inequality check to take a long time. So let's only do
@@ -792,6 +832,8 @@ class LruCache(Generic[KT, VT]):
        @synchronized
        def cache_pop(key: KT, default: Optional[T] = None) -> Union[None, T, VT]:
            if self._tracer:
                self._tracer.on_invalidate(self._cache_name, key)

            node = cache.get(key, None)
            if node:
                evicted_len = delete_node(node)
@@ -813,6 +855,8 @@ class LruCache(Generic[KT, VT]):
            may be of lower cardinality than the TreeCache - in which case the whole
            subtree is deleted.
            """
            if self._tracer:
                self._tracer.on_invalidate(self._cache_name, key)

            popped = cache.pop(key, None)
            if popped is None:
                return
@@ -824,6 +868,8 @@ class LruCache(Generic[KT, VT]):
        @synchronized
        def cache_clear() -> None:
            for node in cache.values():
                if self._tracer:
                    self._tracer.on_invalidate(self._cache_name, node.key)
                node.run_and_clear_callbacks()
                node.drop_from_lists()
@@ -841,6 +887,8 @@ class LruCache(Generic[KT, VT]):
        @synchronized
        def cache_contains(key: KT) -> bool:
            if self._tracer:
                self._tracer.on_request(self._cache_name, key)

            return key in cache

        @synchronized
@@ -857,6 +905,8 @@ class LruCache(Generic[KT, VT]):
                return

            for key in keys:
                if self._tracer:
                    self._tracer.on_invalidate(self._cache_name, key)

                node = cache.pop(key, None)
                if not node:
                    continue
continue