The spec defines that the media endpoints should return 504 when a file
is not yet uploaded, which has been interpreted to include when a file
was deleted. Modifies the /media/v3/download/ and /media/r0/thumbnail
endpoints.
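For illustration, a missing file (never uploaded, or since deleted) is
mapped to the spec's M_NOT_YET_UPLOADED error rather than a generic 404.
A minimal sketch with hypothetical types, not grapevine's actual media
code:

```rust
/// Hypothetical error type; the variant maps to HTTP 504 with errcode
/// "M_NOT_YET_UPLOADED".
enum MediaError {
    NotYetUploaded,
}

/// `file` is the result of a hypothetical storage lookup; `None` covers both
/// "never uploaded" and "deleted", per the interpretation above.
fn download(file: Option<Vec<u8>>) -> Result<Vec<u8>, MediaError> {
    file.ok_or(MediaError::NotYetUploaded)
}
```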
This deduplicates some code from src/database/key_value/rooms/timeline.rs
for handling errors from `pdus_since`/`pdus_until`. The error message
appears to have been copy-pasted directly from there, since it referred
to the wrong function.
Previously, we only fetched keys once, only requesting them again if any were missing, allowing ancient keys to be used to sign PDUs and transactions
Now we refresh keys that either have expired or are about to expire, preventing attacks that make use of leaked private keys of a homeserver
We also ensure that, when validating PDUs or transactions, the keys are valid at the origin_server_ts or at the time we received the transaction, respectively
So as not to break event authorization for old rooms, we need to keep old keys around
We move verify_keys that we no longer see in direct requests to the origin into old_verify_keys
We keep old_verify_keys indefinitely, as mentioned above, so as not to break event authorization (at least until a future MSC addresses this)
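As a rough sketch of the expiry logic this describes (the refresh margin
and helper names are made up; `valid_until_ts` is the millisecond
timestamp from the server keys response):

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical margin before `valid_until_ts` at which we start re-fetching.
const REFRESH_MARGIN: Duration = Duration::from_secs(60 * 60);

/// Returns true if a key set should be re-fetched from the origin because it
/// has expired or is about to expire.
fn needs_refresh(valid_until_ts_ms: u64) -> bool {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before unix epoch");
    now + REFRESH_MARGIN >= Duration::from_millis(valid_until_ts_ms)
}

/// When validating, a key must have been valid at the event's
/// `origin_server_ts` (or at the time we received the transaction).
fn valid_at(valid_until_ts_ms: u64, timestamp_ms: u64) -> bool {
    timestamp_ms <= valid_until_ts_ms
}
```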
Original patch by Matthias. Benjamin just rebased it onto grapevine and
fixed clippy/rustc warnings.
Co-authored-by: Benjamin Lee <benjamin@computer.surgery>
The previous code would drop some events entirely if any events between
`skip` and `skip + limit` were not visible to the user. This would cause
the set of events skipped by the `skip(skip)` method to extend past
`skip` in the raw result set, because `skip(skip)` was being called
*after* filtering out invisible events.
This bug will become much more severe with a full filtering
implementation, because it will be more likely for events to be filtered
out. Currently, it is only possible to trigger with rooms that have
history visibility set to "invited" or "joined".
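In iterator terms, the fix is to apply `skip(skip)` to the raw result set
before the visibility filter, not after it; a rough sketch with a
hypothetical `is_visible` check, not grapevine's actual code:

```rust
/// Hypothetical visibility check standing in for the real per-user one.
fn is_visible(event_id: &str) -> bool {
    !event_id.is_empty()
}

/// `skip` is an offset into the *raw* result set, so it has to be applied
/// before dropping events the user cannot see.
fn visible_window(raw_results: Vec<String>, skip: usize, limit: usize) -> Vec<String> {
    raw_results
        .into_iter()
        .skip(skip) // positions in the unfiltered results
        .filter(|event_id| is_visible(event_id))
        .take(limit)
        .collect()
}
```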
The previous code would fail to return next_batch if any of the events
in the window were not visible to the user. It would also return an
unnecessary next_batch when no more results are available if the total
number of results is exactly `skip + limit`.
This bug will become much more severe with a full filtering
implementation, because we will be more likely to trigger it by
filtering out events in a search call. Currently, it is only possible to
trigger with rooms that have history visibility set to "invited" or
"joined".
Unfortunately we need to pull tracing-opentelemetry from git because
there hasn't been a release including the dependency bump on the other
opentelemetry crates.
Previously, we were returning redundant member count updates or encrypted
device updates from the /sync endpoint in some cases. The extra member
count updates are spec-compliant, but unnecessary, while the extra
encrypted device updates violate the spec.
The refactor necessary to fix this bug is also necessary to support
filtering on state events in sync.
Details:
Joined room incremental sync needs to examine state events for four
purposes:
1. determining whether we need to return an update to room member counts
2. determining the set of left/joined devices for encrypted rooms
(returned in `device_lists`)
3. returning state events to the client (in `rooms.joined.*.state`)
4. tracking which member events we have sent to the client, so they can
be omitted on future requests when lazy-loading is enabled.
The state events that we need to examine for the first two cases are member
events in the delta between `since` and the end of `timeline`. For the
second two cases, we need the delta between `since` and the start of
`timeline`, plus contextual member events for any senders that occur in
`timeline`. The second list is subject to filtering, while the first is
not.
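As a rough sketch of the split this implies (the helper names and types
below are hypothetical, not grapevine's actual API):

```rust
// Hypothetical stand-ins for the real types.
struct StateEvent;
struct Filter;

fn member_events_between(_from: &str, _to: &str) -> Vec<StateEvent> {
    Vec::new()
}

fn state_delta(_from: &str, _to: &str) -> Vec<StateEvent> {
    Vec::new()
}

fn apply_filter(events: Vec<StateEvent>, _filter: &Filter) -> Vec<StateEvent> {
    events
}

fn joined_room_state(
    since: &str,
    timeline_start: &str,
    timeline_end: &str,
    filter: &Filter,
) -> (Vec<StateEvent>, Vec<StateEvent>) {
    // Cases 1/2: member events in the delta between `since` and the *end* of
    // the timeline. Never filtered, because member counts and device lists
    // must stay correct regardless of the client's filter.
    let counting_delta = member_events_between(since, timeline_end);

    // Cases 3/4: state returned to the client, i.e. the delta between `since`
    // and the *start* of the timeline (plus contextual member events for
    // timeline senders, omitted here). This one is subject to filtering.
    let client_delta = apply_filter(state_delta(since, timeline_start), filter);

    (counting_delta, client_delta)
}
```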
Before this change, we were using the same set of state events that we are
returning to the client (cases 3/4) to do the analysis for cases 1/2.
In a compliant implementation, this would result in us missing some
relevant member events in 1/2 in addition to seeing redundant member
events. In current grapevine this is not the case because the set of
events that we return to the client is always a superset of the set that
is needed for cases 1/2. This is because we don't support filtering, and
we have an existing bug[1] where we are returning the delta between
`since` and the end of `timeline` rather than the start.
[1]: https://gitlab.computer.surgery/matrix/grapevine-fork/-/issues/5
Fixing this is necessary to implement filtering because otherwise
we would start missing some member events for member count or encrypted
device updates if the relevant member events are rejected by the filter.
This would be much worse than our current behavior.
Previously, `Content-Disposition` was always set to `inline`, even for
HTML, which means that XSS could be easily achieved by uploading
malicious HTML and getting someone to click on the Matrix HTTP API link
for that piece of media. Now, we have an allowlist of safe values for
`Content-Type` that use `inline` while everything else defaults to
`attachment`, including HTML and SVG, which prevents XSS.
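A minimal sketch of the allowlist idea, with an illustrative list of
types rather than grapevine's actual one:

```rust
/// Returns the `Content-Disposition` to use for a given `Content-Type`.
/// Only formats that are safe to render in the browser get `inline`;
/// anything that can run script falls through to `attachment`.
fn content_disposition_for(content_type: &str) -> &'static str {
    match content_type {
        "text/plain"
        | "image/jpeg"
        | "image/png"
        | "image/gif"
        | "image/webp"
        | "video/mp4"
        | "audio/ogg" => "inline",
        // Includes "text/html" and "image/svg+xml", which can execute
        // script and would otherwise enable XSS.
        _ => "attachment",
    }
}
```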
We also set the `Content-Security-Policy` header because why not.
A `set_header_or_panic` function is introduced to do what it says in
case Ruma begins providing better or worse values for the relevant
headers in the future. The safest way to handle such a case is simply
to panic.
This allows us to use the `ruma_route` convenience function even when we
need to add our own hacks into the responses, thus making us less
reliant on Ruma.
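One plausible shape for such a helper, under the assumption that the
"or panic" case is Ruma already having set a value (an interpretation for
illustration, not the actual implementation):

```rust
use http::header::{HeaderMap, HeaderName, HeaderValue};

/// Sets `name` to `value`, panicking if the header is already present.
/// The panic is deliberate: if Ruma starts populating this header itself,
/// we want to notice immediately instead of silently clobbering (or keeping)
/// whatever value it chose.
fn set_header_or_panic(headers: &mut HeaderMap, name: HeaderName, value: HeaderValue) {
    if let Some(existing) = headers.get(&name) {
        panic!("{name} already set to {existing:?}, refusing to override");
    }
    headers.insert(name, value);
}
```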
This cache can serve invalid responses, and has an extremely low hit
rate.
It serves invalid responses because it's only keyed off
the `since` parameter, but many of the other request parameters also
affect the response or its side effects. This will become worse once we
implement filtering, because there will be a wider space of parameters
with different responses. This problem is fixable, but not worth it
because of the low hit rate.
The low hit rate is because normal clients will always issue the next
sync request with `since` set to the `prev_batch` value of the previous
response. The only time we expect to see multiple requests with the same
`since` is when the response is empty, but we don't cache empty
responses.
This was confirmed experimentally by logging cache hits and misses over
15 minutes with a wide variety of clients. This test was run on
matrix.computer.surgery, which has only a few active users, but a
large volume of sync traffic from many rooms. Over the test period, we
had 3 hits and 5309 misses. All hits occurred in the first minute, so I
suspect that they had something to do with client recovery from an
offline state. The clients that were connected during the test are:
- element web
- schildichat web
- iamb
- gomuks
- nheko
- fractal
- fluffychat web
- fluffychat android
- cinny web
- element android
- element X android
Fixes: #2
This change is fully automated, except for the `rustfmt.toml` changes and
a few clippy directives allowing specific functions with too many lines,
now that they are longer.
I manually expanded the HTML into a more readable format; the rest was
rustfmt's doing. It's beyond me why/how someone would willingly write a
pile of HTML like that...
Doing this will allow `rustfmt` to collapse lines more efficiently.
Specifically, a lot of these lines fail to wrap to 80 columns without
these changes.