Replace PyICU with Rust icu_segmenter crate (#18553)
Co-authored-by: anoa's Codex Agent <codex@amorgan.xyz> Co-authored-by: Quentin Gliech <quenting@element.io>
This commit is contained in:
@@ -29,8 +29,6 @@ easiest way of installing the latest version is to use [rustup](https://rustup.r
|
||||
|
||||
Synapse can connect to PostgreSQL via the [psycopg2](https://pypi.org/project/psycopg2/) Python library. Building this library from source requires access to PostgreSQL's C header files. On Debian or Ubuntu Linux, these can be installed with `sudo apt install libpq-dev`.
|
||||
|
||||
Synapse has an optional, improved user search with better Unicode support. For that you need the development package of `libicu`. On Debian or Ubuntu Linux, this can be installed with `sudo apt install libicu-dev`.
|
||||
|
||||
The source code of Synapse is hosted on GitHub. You will also need [a recent version of git](https://github.com/git-guides/install-git).
|
||||
|
||||
For some tests, you will need [a recent version of Docker](https://docs.docker.com/get-docker/).
|
||||
|
||||
@@ -164,10 +164,7 @@ $ poetry cache clear --all .
|
||||
# including the wheel artifacts which is not covered by the above command
|
||||
# (see https://github.com/python-poetry/poetry/issues/10304)
|
||||
#
|
||||
# This is necessary in order to rebuild or fetch new wheels. For example, if you update
|
||||
# the `icu` library in on your system, you will need to rebuild the PyICU Python package
|
||||
# in order to incorporate the correct dynamically linked library locations otherwise you
|
||||
# will run into errors like: `ImportError: libicui18n.so.75: cannot open shared object file: No such file or directory`
|
||||
# This is necessary in order to rebuild or fetch new wheels.
|
||||
$ rm -rf $(poetry config cache-dir)
|
||||
```
|
||||
|
||||
|
||||
@@ -286,7 +286,7 @@ Installing prerequisites on Ubuntu or Debian:
|
||||
```sh
|
||||
sudo apt install build-essential python3-dev libffi-dev \
|
||||
python3-pip python3-setuptools sqlite3 \
|
||||
libssl-dev virtualenv libjpeg-dev libxslt1-dev libicu-dev
|
||||
libssl-dev virtualenv libjpeg-dev libxslt1-dev
|
||||
```
|
||||
|
||||
##### ArchLinux
|
||||
@@ -295,7 +295,7 @@ Installing prerequisites on ArchLinux:
|
||||
|
||||
```sh
|
||||
sudo pacman -S base-devel python python-pip \
|
||||
python-setuptools python-virtualenv sqlite3 icu
|
||||
python-setuptools python-virtualenv sqlite3
|
||||
```
|
||||
|
||||
##### CentOS/Fedora
|
||||
@@ -305,8 +305,7 @@ Installing prerequisites on CentOS or Fedora Linux:
|
||||
```sh
|
||||
sudo dnf install libtiff-devel libjpeg-devel libzip-devel freetype-devel \
|
||||
libwebp-devel libxml2-devel libxslt-devel libpq-devel \
|
||||
python3-virtualenv libffi-devel openssl-devel python3-devel \
|
||||
libicu-devel
|
||||
python3-virtualenv libffi-devel openssl-devel python3-devel
|
||||
sudo dnf group install "Development Tools"
|
||||
```
|
||||
|
||||
@@ -333,7 +332,7 @@ dnf install python3.12 python3.12-devel
|
||||
```
|
||||
Finally, install common prerequisites
|
||||
```bash
|
||||
dnf install libicu libicu-devel libpq5 libpq5-devel lz4 pkgconf
|
||||
dnf install libpq5 libpq5-devel lz4 pkgconf
|
||||
dnf group install "Development Tools"
|
||||
```
|
||||
###### Using venv module instead of virtualenv command
|
||||
@@ -365,20 +364,6 @@ xcode-select --install
|
||||
|
||||
Some extra dependencies may be needed. You can use Homebrew (https://brew.sh) for them.
|
||||
|
||||
You may need to install icu, and make the icu binaries and libraries accessible.
|
||||
Please follow [the official instructions of PyICU](https://pypi.org/project/PyICU/) to do so.
|
||||
|
||||
If you're struggling to get icu discovered, and see:
|
||||
```
|
||||
RuntimeError:
|
||||
Please install pkg-config on your system or set the ICU_VERSION environment
|
||||
variable to the version of ICU you have installed.
|
||||
```
|
||||
despite it being installed and having your `PATH` updated, you can omit this dependency by
|
||||
not specifying `--extras all` to `poetry`. If using postgres, you can install Synapse via
|
||||
`poetry install --extras saml2 --extras oidc --extras postgres --extras opentracing --extras redis --extras sentry`.
|
||||
ICU is not a hard dependency on getting a working installation.
|
||||
|
||||
On ARM-based Macs you may also need to install libjpeg and libpq:
|
||||
```sh
|
||||
brew install jpeg libpq
|
||||
@@ -400,8 +385,7 @@ Installing prerequisites on openSUSE:
|
||||
```sh
|
||||
sudo zypper in -t pattern devel_basis
|
||||
sudo zypper in python-pip python-setuptools sqlite3 python-virtualenv \
|
||||
python-devel libffi-devel libopenssl-devel libjpeg62-devel \
|
||||
libicu-devel
|
||||
python-devel libffi-devel libopenssl-devel libjpeg62-devel
|
||||
```
|
||||
|
||||
##### OpenBSD
|
||||
|
||||
@@ -117,6 +117,13 @@ each upgrade are complete before moving on to the next upgrade, to avoid
|
||||
stacking them up. You can monitor the currently running background updates with
|
||||
[the Admin API](usage/administration/admin_api/background_updates.html#status).
|
||||
|
||||
# Upgrading to v1.134.0
|
||||
|
||||
## ICU bundled with Synapse
|
||||
|
||||
Synapse now uses the Rust `icu` library for improved user search. Installing the
|
||||
native ICU library on your system is no longer required.
|
||||
|
||||
# Upgrading to v1.130.0
|
||||
|
||||
## Documented endpoint which can be delegated to a federation worker
|
||||
|
||||
@@ -77,14 +77,11 @@ The user provided search term is lowercased and normalized using [NFKC](https://
|
||||
this treats the string as case-insensitive, canonicalizes different forms of the
|
||||
same text, and maps some "roughly equivalent" characters together.
|
||||
|
||||
The search term is then split into words:
|
||||
|
||||
* If [ICU](https://en.wikipedia.org/wiki/International_Components_for_Unicode) is
|
||||
available, then the system's [default locale](https://unicode-org.github.io/icu/userguide/locale/#default-locales)
|
||||
will be used to break the search term into words. (See the
|
||||
[installation instructions](setup/installation.md) for how to install ICU.)
|
||||
* If unavailable, then runs of ASCII characters, numbers, underscores, and hyphens
|
||||
are considered words.
|
||||
The search term is then split into segments using the [`icu_segmenter`
|
||||
Rust crate](https://crates.io/crates/icu_segmenter). This crate ships with its
|
||||
own dictionary and Long Short Term-Memory (LSTM) machine learning models
|
||||
per-language to segment words. Read more [in the crate's
|
||||
documentation](https://docs.rs/icu/latest/icu/segmenter/struct.WordSegmenter.html#method.new_auto).
|
||||
|
||||
The queries for PostgreSQL and SQLite are detailed below, but their overall goal
|
||||
is to find matching users, preferring users who are "real" (e.g. not bots,
|
||||
|
||||
Reference in New Issue
Block a user