1
0
Commit Graph

110 Commits

Author SHA1 Message Date
Eric Eastwood
5a9ca1e3d9 Introduce Clock.call_when_running(...) to include logcontext by default (#18944)
Introduce `Clock.call_when_running(...)` to wrap startup code in a
logcontext, ensuring we can identify which server generated the logs.

Background:

>  Ideally, nothing from the Synapse homeserver would be logged against the `sentinel` 
>  logcontext as we want to know which server the logs came from. In practice, this is not 
>  always the case yet especially outside of request handling. 
>   
>  Global things outside of Synapse (e.g. Twisted reactor code) should run in the 
>  `sentinel` logcontext. It's only when it calls into application code that a logcontext 
>  gets activated. This means the reactor should be started in the `sentinel` logcontext, 
>  and any time an awaitable yields control back to the reactor, it should reset the 
>  logcontext to be the `sentinel` logcontext. This is important to avoid leaking the 
>  current logcontext to the reactor (which would then get picked up and associated with 
>  the next thing the reactor does). 
>
> *-- `docs/log_contexts.md`

Also adds a lint to prefer `Clock.call_when_running(...)` over
`reactor.callWhenRunning(...)`

Part of https://github.com/element-hq/synapse/issues/18905
2025-09-22 10:27:59 -05:00
reivilibre
a31d53b28f Use twisted.internet.testing module in tests instead of deprecated twisted.test.proto_helpers. (#18728)
Follows: #18727

---------

Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
2025-07-30 12:32:10 +01:00
dependabot[bot]
9d43bec326 Bump ruff from 0.7.3 to 0.11.10 (#18451)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Morgan <andrew@amorgan.xyz>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2025-05-20 15:23:30 +01:00
Will Hunt
d17295e5c3 Store hashes of media files, and allow quarantining by hash. (#18277)
This PR makes a few radical changes to media. This now stores the SHA256
hash of each file stored in the database (excluding thumbnails, more on
that later). If a set of media is quarantined, any additional uploads of
the same file contents or any other files with the same hash will be
quarantined at the same time.

Currently this does NOT:
 - De-duplicate media, although a future extension could be to do that.
- Run any background jobs to identify the hashes of older files. This
could also be a future extension, though the value of doing so is
limited to combat the abuse of recent media.
- Hash thumbnails. It's assumed that thumbnails are parented to some
form of media, so you'd likely be wanting to quarantine the media and
the thumbnail at the same time.
2025-03-27 17:26:34 +00:00
Travis Ralston
d0a474d312 Enable authenticated media by default (#17889)
Co-authored-by: Olivier 'reivilibre <oliverw@matrix.org>
2024-11-20 14:48:22 +00:00
Erik Johnston
bb5a692946 Fix slipped logging context when media rejected (#17239)
When a module rejects a piece of media we end up trying to close the
same logging context twice.

Instead of fixing the existing code we refactor to use an async context
manager, which is easier to write correctly.
2024-05-29 11:14:42 +01:00
Erik Johnston
23740eaa3d Correctly mention previous copyright (#16820)
During the migration the automated script to update the copyright
headers accidentally got rid of some of the existing copyright lines.
Reinstate them.
2024-01-23 11:26:48 +00:00
Erik Johnston
eaad9bb156 Merge remote-tracking branch 'gitlab/clokep/license-license' into new_develop 2023-12-13 15:11:56 +00:00
David Robertson
0619c2bbd2 Move media retention tests out of rest tests (#16684)
* Move media retention tests out of rest tests

AFAICS this doesn't make any HTTP requests and so it ought not to belong
in `tests.rest`.

* Changelog
2023-11-27 01:29:46 +00:00
Patrick Cloke
8e1e62c9e0 Update license headers 2023-11-21 15:29:58 -05:00
Patrick Cloke
ff716b483b Return attrs for more media repo APIs. (#16611) 2023-11-09 11:00:30 -05:00
Patrick Cloke
26b960b08b Register media servlets via regex. (#16419)
This converts the media servlet URLs in the same way as
(most) of the rest of Synapse. This will give more flexibility
in the versions each endpoint exists under.
2023-10-06 07:22:55 -04:00
Patrick Cloke
6f18812bb0 Add stubs package for lxml. (#15697)
The stubs have some issues so this has some generous cast
and ignores in it, but it is better than not having stubs.

Note that confusing that Element is a function which creates
_Element instances (and similarly for Comment).
2023-05-31 17:06:57 +00:00
Patrick Cloke
1e89976b26 Rename blacklist/whitelist internally. (#15620)
Avoid renaming configuration settings for now and rename internal code
to use blocklist and allowlist instead.
2023-05-19 12:25:25 +00:00
Patrick Cloke
4ee82c0576 Apply url_preview_url_blacklist to oEmbed and pre-cached images (#15601)
There are two situations which were previously not properly checked:

1. If the requested URL was replaced with an oEmbed URL, then the
   oEmbed URL was not checked against url_preview_url_blacklist.
2. Follow-up URLs (either via autodiscovery of oEmbed or to pre-cache
   images) were not checked against url_preview_url_blacklist.
2023-05-16 16:25:01 -04:00
Travis Ralston
ab4535b608 Add config option to prevent media downloads from listed domains. (#15197)
This stops media (and thumbnails) from being accessed from the
listed domains. It does not delete any already locally cached media,
but will prevent accessing it.

Note that admin APIs are unaffected by this change.
2023-05-09 14:08:51 -04:00
Patrick Cloke
a5fb382a29 Separate HTTP preview code and URL previewer. (#15269)
Separates REST layer code from the actual URL previewing.
2023-03-20 14:32:26 -04:00
Patrick Cloke
4fc8875876 Refactor media modules. (#15146)
* Removes the `v1` directory from `test.rest.media.v1`.
* Moves the non-REST code from `synapse.rest.media.v1` to `synapse.media`.
* Flatten the `v1` directory from `synapse.rest.media`,  but leave compatiblity
  with 3rd party media repositories and spam checkers.
2023-02-27 08:26:05 -05:00
Patrick Cloke
682151a464 Do not fail completely if oEmbed autodiscovery fails. (#15092)
Previously if an autodiscovered oEmbed request failed (e.g. the
oEmbed endpoint is down or does not exist) then the entire URL
preview would fail. Instead we now return everything we can, even
if this additional request fails.
2023-02-23 16:08:53 -05:00
dependabot[bot]
9bb2eac719 Bump black from 22.12.0 to 23.1.0 (#15103) 2023-02-22 15:29:09 -05:00
David Robertson
2dff93099b Typecheck tests.rest.media.v1.test_media_storage (#15008)
* Fix MediaStorage type hint

* Typecheck tests.rest.media.v1.test_media_storage

* Changelog

* Remove assert and make the comment succinct

* Fix syntax for olddeps
2023-02-07 15:24:44 +00:00
Jeyachandran Rathnam
babeeb4e7a Unescape HTML entities in oEmbed titles. (#14781)
It doesn't seem valid that HTML entities should appear in
the title field of oEmbed responses, but a popular WordPress
plug-in seems to do it.

There should not be harm in unescaping these.
2023-01-09 14:22:02 +00:00
Patrick Cloke
00c93d2e7e Be more lenient in the oEmbed response parsing. (#14089)
Attempt to parse any valid information from an oEmbed response
(instead of bailing at the first unexpected data). This should allow
for more partial oEmbed data to be returned, resulting in better /
more URL previews, even if those URL previews are only partial.
2022-10-07 09:29:43 -04:00
Andrew Morgan
918c74bfb5 Add a MXCUri class to make working with mxc uri's easier. (#13162) 2022-09-15 12:57:16 +00:00
Eric Eastwood
7b67e93d49 Provide more info why we don't have any thumbnails to serve (#13038)
Fix https://github.com/matrix-org/synapse/issues/13016

## New error code and status

### Before

Previously, we returned a `404` for `/thumbnail` which isn't even in the spec.

```json
{
  "errcode": "M_NOT_FOUND",
  "error": "Not found [b'hs1', b'tefQeZhmVxoiBfuFQUKRzJxc']"
}
```

### After

What does the spec say?

> 400: The request does not make sense to the server, or the server cannot thumbnail the content. For example, the client requested non-integer dimensions or asked for negatively-sized images.
>
> *-- https://spec.matrix.org/v1.1/client-server-api/#get_matrixmediav3thumbnailservernamemediaid*

Now with this PR, we respond with a `400` when we don't have thumbnails to serve and we explain why we might not have any thumbnails.

```json
{
    "errcode": "M_UNKNOWN",
    "error": "Cannot find any thumbnails for the requested media ([b'example.com', b'12345']). This might mean the media is not a supported_media_format=(image/jpeg, image/jpg, image/webp, image/gif, image/png) or that thumbnailing failed for some other reason. (Dynamic thumbnails are disabled on this server.)",
}
```

> Cannot find any thumbnails for the requested media ([b'example.com', b'12345']). This might mean the media is not a supported_media_format=(image/jpeg, image/jpg, image/webp, image/gif, image/png) or that thumbnailing failed for some other reason. (Dynamic thumbnails are disabled on this server.)


---

We still respond with a 404 in many other places. But we can iterate on those later and maybe keep some in some specific places after spec updates/clarification: https://github.com/matrix-org/matrix-spec/issues/1122

We can also iterate on the bugs where Synapse doesn't thumbnail when it should in other issues/PRs.
2022-07-15 11:42:21 -05:00
David Teller
11f811470f Uniformize spam-checker API, part 5: expand other spam-checker callbacks to return Tuple[Codes, dict] (#13044)
Signed-off-by: David Teller <davidt@element.io>
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
2022-07-11 16:52:10 +00:00
Andrew Morgan
6cba6a51af Merge branch 'master' into develop 2022-06-28 15:19:48 +01:00
reivilibre
fa13080618 Merge pull request from GHSA-22p3-qrh9-cx32
* Make _iterate_over_text easier to read by using simple data structures

* Prefer a set of tags to ignore

In my tests, it's 4x faster to check for containment in a set of this size

* Add a stack size limit to _iterate_over_text

* Continue accepting the case where there is no body element

* Use an early return instead for None

Co-authored-by: Richard van der Hoff <richard@matrix.org>
2022-06-28 14:29:08 +01:00
Robert Long
9b683ea80f Add Cross-Origin-Resource-Policy header to thumbnail and download media endpoints (#12944) 2022-06-27 14:44:05 +01:00
Patrick Cloke
0fcc0ae37c Improve URL previews for sites with only Twitter card information. (#13056)
Pull out `twitter:` meta tags when generating a preview and
use it to augment any `og:` meta tags.

Prefers Open Graph information over Twitter card information.
2022-06-16 07:41:57 -04:00
Andrew Morgan
a47636c570 Prevent local quarantined media from being claimed by media retention (#12972) 2022-06-07 10:53:47 +00:00
Patrick Cloke
148fe58a24 Do not break URL previews if an image is unreachable. (#12950)
Avoid breaking a URL preview completely if the chosen image 404s
or is unreachable for some other reason (e.g. DNS).
2022-06-06 07:46:04 -04:00
Patrick Cloke
01df5bacac Improve URL previews for some pages (#12951)
* Skip `og` and `meta` tags where the value is empty.
* Fallback to the favicon if there are no other images.
* Ignore tags meant for navigation.
2022-06-03 12:09:12 -04:00
Andrew Morgan
2fc787c341 Add config options for media retention (#12732) 2022-05-31 16:35:29 +00:00
Brendan Abolivier
f96b85eca8 Ensure the type of URL attributes is always str when matching against preview blacklist (#12333) 2022-03-31 11:49:49 +02:00
Patrick Cloke
4587b35929 Clean-up logic for rebasing URLs during URL preview. (#12219)
By using urljoin from the standard library and reducing the number
of places URLs are rebased.
2022-03-16 07:21:36 -04:00
Dirk Klimpel
32c828d0f7 Add type hints to tests/rest. (#12208)
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
2022-03-11 12:42:22 +00:00
Dirk Klimpel
7e91107be1 Add type hints to tests/rest (#12146)
* Add type hints to `tests/rest`

* newsfile

* change import from `SigningKey`
2022-03-03 16:05:44 +00:00
Patrick Cloke
02d708568b Replace assertEquals and friends with non-deprecated versions. (#12092) 2022-02-28 07:12:29 -05:00
Richard van der Hoff
e24ff8ebe3 Remove HomeServer.get_datastore() (#12031)
The presence of this method was confusing, and mostly present for backwards
compatibility. Let's get rid of it.

Part of #11733
2022-02-23 11:04:02 +00:00
Denis Kasak
337f38cac3 Implement a content type allow list for URL previews (#11936)
This implements an allow list for content types for which Synapse will attempt URL preview. If a URL resolves to a resource with a content type which isn't in the list, the download will terminate immediately.

This makes sense given that Synapse would never successfully generate a URL preview for such files in the first place, and helps prevent issues with streaming media servers, such as #8302.

Signed-off-by: Denis Kasak dkasak@termina.org.uk
2022-02-10 15:43:01 +00:00
Patrick Cloke
807efd26ae Support rendering previews with data: URLs in them (#11767)
Images which are data URLs will no longer break URL
previews and will properly be "downloaded" and
thumbnailed.
2022-01-24 08:58:18 -05:00
Patrick Cloke
eb39da6782 Move HTML parsing to a separate file for URL previews. (#11566)
* Splits the logic for parsing HTML from the resource handling code.
* Fix a circular import in the oEmbed code (which uses the HTML parsing code).
* Renames some of the HTML parsing methods to:
  * Make it clear which methods are "internal" to the module.
  * Clarify what the methods do.
2021-12-13 17:55:07 +00:00
Sean Quah
858d80bf0f Fix media repository failing when media store path contains symlinks (#11446) 2021-12-02 16:05:24 +00:00
Sean Quah
91f2bd0907 Prevent the media store from writing outside of the configured directory
Also tighten validation of server names by forbidding invalid characters
in IPv6 addresses and empty domain labels.
2021-11-19 13:39:15 +00:00
Shay
f5c6a80886 Handle missing Content-Type header when accessing remote media (#11200)
* add code to handle missing content-type header and a test to verify that it works

* add handling for missing content-type in the /upload endpoint as well

* slightly refactor test code to put private method in approriate place

* handle possible null value for content-type when pulling from the local db

* add changelog

* refactor test and add code to handle missing content-type in cached remote media

* requested changes

* Update changelog.d/11200.bugfix

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
2021-11-01 10:26:02 -07:00
Patrick Cloke
732bbf6737 Be more lenient when parsing the version for oEmbed responses. (#11065) 2021-10-13 07:00:07 -04:00
Sean Quah
84f5d83257 Add tests for MediaFilePaths (#11057) 2021-10-12 18:19:35 +01:00
Patrick Cloke
1b112840d2 Autodiscover oEmbed endpoint from returned HTML (#10822)
Searches the returned HTML for an oEmbed endpoint using the
autodiscovery mechanism (`<link rel=...>`), and will request it
to generate the preview.
2021-10-08 14:14:42 -04:00
Sean Quah
2be0fde3d6 Fix empty url_cache_thumbnails/yyyy-mm-dd/ directories being left behind (#10924) 2021-09-29 10:24:37 +01:00