Commit graph

336 commits

Author SHA1 Message Date
Charles Hall
9d8e1a1490
fix api/server_server events 2024-07-16 11:12:07 -07:00
Charles Hall
db666fe903
fix api/client_server/directory events 2024-07-16 11:12:07 -07:00
Charles Hall
b6cba0c4ae
extract closure into a function
This was mostly written by using rust-analyzer's "extract to function"
and "extract to variable" functionality.
2024-07-16 11:12:07 -07:00
Charles Hall
e49fe04f10
fix api/appservice_server events 2024-07-16 11:12:07 -07:00
Charles Hall
f5e10f5a8f
fix api/ruma_wrapper/axum events 2024-07-16 11:12:07 -07:00
Charles Hall
ca4f780c93
fix api/client_server/membership events 2024-07-16 11:12:07 -07:00
Charles Hall
cb036593ea
refactor send_request in api/server_server
Seriously, what is going on with the control flow in this codebase?
2024-06-24 12:43:28 -07:00
Charles Hall
e83a30af4b
reduce duplicate events
I hate `log_error`. A better way to do this would be to not reuse the
same error type literally everywhere, so you could distinguish, in
`crate::service::sending::Service::handle_response`, whether to emit an
event based on which function created the error. Fixing that is a lot
more work, though.
2024-06-24 12:40:59 -07:00
Charles Hall
12b0fb7f91
don't write KBs of html to the logs
Handing this to tracing as a String makes it automatically escape
newlines and such.
2024-06-24 12:40:57 -07:00
Charles Hall
c7e03a06f7
refuse admin room alias changes unless admin bot
I.e. don't allow the `#admins:example.com` alias to be set or unset by
any user other than `@grapevine:example.com`.
2024-06-12 18:36:55 -07:00
Matthias Ahouansou
9087da91db
fix(keys): only use keys valid at the time of PDU or transaction, and actually refresh keys
Previously, we only fetched keys once, only requesting them again if we have any missing, allowing for ancient keys to be used to sign PDUs and transactions
Now we refresh keys that either have or are about to expire, preventing attacks that make use of leaked private keys of a homeserver
We also ensure that when validating PDUs or transactions, that they are valid at the origin_server_ts or time of us receiving the transaction respectfully
As to not break event authorization for old rooms, we need to keep old keys around
We move verify_keys which we no longer see in direct requests to the origin to old_verify_keys
We keep old_verify_keys indefinitely as mentioned above, as to not break event authorization (at least until a future MSC addresses this)

Original patch by Matthias. Benjamin just rebased it onto grapevine and
fixed clippy/rustc warnings.

Co-authored-by: Benjamin Lee <benjamin@computer.surgery>
2024-06-12 11:10:50 -07:00
Matthias Ahouansou
da99b0706e
fix(edus): ensure sender server is the same as the user in the content
Original patch by Matthias. Benjamin modified the logic to include
logging when an event was rejected, for consistency with the existing
check on device key updates.

Co-authored-by: Benjamin Lee <benjamin@computer.surgery>
2024-06-12 10:36:41 -07:00
Benjamin Lee
83cdc9c708
drop redacted events from search results 2024-06-12 10:32:36 -07:00
Charles Hall
0c2094a56f
record FoundIn with metrics instead of traces
This is much more efficient in terms of network use and data storage,
and also easier to visualize.
2024-06-06 20:56:36 -07:00
Charles Hall
22dd7f1a54
move FoundIn to observability.rs 2024-06-05 17:41:36 -07:00
Charles Hall
71f3d84115
rename password-related utils functions 2024-06-04 19:35:25 -07:00
Charles Hall
aa4cd8b1e1
switch to RustCrypto's argon2 crate 2024-06-04 19:35:21 -07:00
Lambda
7dbae9ac2b
Fix tracing in send_request() 2024-06-04 13:32:32 -07:00
Lambda
c6f75a1b93
Add tracing span for Ar::from_request()
The fact that this is called for every request is somewhat obscured, it
should be obvious in tracing at least.
2024-06-04 13:32:32 -07:00
Lambda
f7a0e3012b
server_server: log ignored signing key updates 2024-06-04 13:32:32 -07:00
Lambda
88bb2ea600
Remove redundant span attributes
There's no need to record attributes that are already present in all
callers.
2024-06-04 13:32:31 -07:00
Lambda
f35cbfd89e
More tracing spans 2024-06-04 13:32:31 -07:00
Lambda
148df18989
Stop debug-logging every incoming request 2024-06-04 13:32:31 -07:00
Benjamin Lee
3551a6ef7a
fix dropped events in search
The previous code would drop some events entirely if any events between
`skip` and `skip + limit` were not visible to the user. This would cause
the set of events skipped by the `skip(skip)` method to extend past
`skip` in the raw result set, because `skip(skip)` was being called
*after* filtering out invisible events.

This bug will become much more severe with a full filtering
implementation, because it will be more likely for events to be filtered
out. Currently, it is only possible to trigger with rooms that have
history visibility set to "invited" or "joined".
2024-06-04 01:12:53 -07:00
Benjamin Lee
0cdf03288a
fix missing next_batch for search
The previous code would fail to return next_batch if any of the events
in the window were not visible to the user. It would also return an
unnecessary next_batch when no more results are available if the total
number of results is exactly `skip + limit`.

This bug will become much more severe with a full filtering
implementation, because we will be more likely to trigger it by
filtering out events in a search call. Currently, it is only possible to
trigger with rooms that have history visibility set to "invited" or
"joined".
2024-06-04 01:12:53 -07:00
Benjamin Lee
c64a474954
workaround to fix search in element
We inherited a similar workaround from conduit, but removed it in
71c48f66c4. At the time, it was not clear
that this had broken search.

Fixes: !26
2024-06-03 18:02:41 -07:00
Lambda
5c39c7c5ff Use destination field in X-Matrix Authorization header
Both validating and sending it is a MUST since Matrix v1.3.
2024-06-03 20:50:49 +00:00
Lambda
62dd097f49 Use Ruma XMatrix type instead of rolling our own
Both the hand-rolled parser and serialization were wrong in countless
ways. The current Ruma parser is much better, and the Ruma serialization
will be fixed by https://github.com/ruma/ruma/pull/1830.
2024-06-03 20:50:49 +00:00
Lambda
bf1d54defc axum: factor out non-generic parts of request conversion
This saves ~10% in binary size!
2024-05-31 10:42:47 +00:00
Benjamin Lee
ec1b086a35
very minor cleanup in the sync endpoint
I meant to do this in 146465693e, but
looks like I forgot.
2024-05-30 10:19:24 -07:00
Charles Hall
8f0fdfb2f2
upgrade all cargo dependencies
Unfortunately we need to pull tracing-opentelemetry from git because
there hasn't been a release including the dependency bump on the other
opentelemetry crates.
2024-05-26 19:47:00 -07:00
Charles Hall
3daf2229d6
enable option_as_ref_cloned lint 2024-05-26 19:47:00 -07:00
Charles Hall
eaeb7620d9
enable multiple_bound_locations lint 2024-05-26 19:47:00 -07:00
Charles Hall
92d9f81a78
enable mixed_attributes_style lint 2024-05-26 19:47:00 -07:00
Charles Hall
c9859a9b2d
enable assigning_clones lint 2024-05-26 19:47:00 -07:00
Charles Hall
793d809ac6
enable unused_qualifications lint 2024-05-26 19:47:00 -07:00
Lambda
67cb6f817d Instrument caches 2024-05-22 20:10:42 +00:00
Charles Hall
a60a9551e1
Revert "Merge branch 'check-if-membership-is-case-endpoints' into 'next'"
This reverts commit 7ace9b0dff, reversing
changes made to 624654a88b.
2024-05-21 16:34:26 -07:00
Benjamin Lee
8d09a7e490 don't return extra member count or e2ee device updates from sync
Previously, we were returning redundant member count updates or encrypted
device updates from the /sync endpoint in some cases. The extra member
count updates are spec-compliant, but unnecessary, while the extra
encrypted device updates violate the spec.

The refactor necessary to fix this bug is also necessary to support
filtering on state events in sync.

Details:

Joined room incremental sync needs to examine state events for four
purposes:

 1. determining whether we need to return an update to room member counts
 2. determining the set of left/joined devices for encrypted rooms
    (returned in `device_lists`)
 3. returning state events to the client (in `rooms.joined.*.state`)
 4. tracking which member events we have sent to the client, so they can
    be omitted on future requests when lazy-loading is enabled.

The state events that we need to examine for the first two cases is member
events in the delta between `since` and the end of `timeline`. For the
second two cases, we need the delta between `since` and the start of
`timeline`, plus contextual member events for any senders that occur in
`timeline`. The second list is subject to filtering, while the first is
not.

Before this change, we were using the same set of state events that we are
returning to the client (cases 3/4) to do the analysis for cases 1/2.
In a compliant implementation, this would result in us missing some
relevant member events in 1/2 in addition to seeing redundant member
events. In current grapevine this is not the case because the set of
events that we return to the client is always a superset of the set that
is needed for cases 1/2. This is because we don't support filtering, and
we have an existing bug[1] where we are returning the delta between
`since` and the end of `timeline` rather than the start.

[1]: https://gitlab.computer.surgery/matrix/grapevine-fork/-/issues/5

Fixing this is necessary to implement filtering because otherwise
we would start missing some member events for member count or encrypted
device updates if the relevant member events are rejected by the filter.
This would be much worse than our current behavior.
2024-05-20 21:13:13 +00:00
Charles Hall
a60501189d
prevent xss via user-uploaded media
Previously, `Content-Disposition` was always set to `inline`, even for
HTML, which means that XSS could be easily acheived by uploading
malicious HTML and getting someone to click on the Matrix HTTP API link
for that piece of media. Now, we have an allowlist of safe values for
`Content-Type` that use `inline` while everything else defaults to
`attachment`, including HTML and SVG, which prevents XSS.

We also set the `Content-Security-Policy` header because why not.

A `set_header_or_panic` function is introduced to do what it says in
case Ruma begins providing better or worse values for the relevant
headers in the future. The safest way to handle such a case is simply
to panic.
2024-05-19 21:05:02 -07:00
Charles Hall
f8961d5578
rename Ruma to Ar
This follows the pattern of the previous commit.
2024-05-19 19:04:20 -07:00
Charles Hall
7ea98dac72
rename RumaResponse to Ra
It's very commonly used so having a short name is worthwhile, I think.
2024-05-19 19:03:45 -07:00
Charles Hall
230ebd3884
don't automatically wrap in RumaResponse
This allows us to use the `ruma_route` convenience function even when we
need to add our own hacks into the responses, thus making us less
reliant on Ruma.
2024-05-18 18:31:36 -07:00
Charles Hall
87ac0e2a38
don't log that federation is disabled
This mostly just spams the logs with useless information when doing
cursed local testing.
2024-05-16 22:33:37 -07:00
Charles Hall
93a2bf9c93
change FedDest::{into -> to}_*
They don't need to take ownership of `self`.
2024-05-16 22:15:06 -07:00
Benjamin Lee
146465693e
remove sync response cache
This cache can serve invalid responses, and has an extremely low hit
rate.

It serves invalid responses because because it's only keyed off
the `since` parameter, but many of the other request parameters also
affect the response or it's side effects. This will become worse once we
implement filtering, because there will be a wider space of parameters
with different responses. This problem is fixable, but not worth it
because of the low hit rate.

The low hit rate is because normal clients will always issue the next
sync request with `since` set to the `prev_batch` value of the previous
response. The only time we expect to see multiple requests with the same
`since` is when the response is empty, but we don't cache empty
responses.

This was confirmed experimentally by logging cache hits and misses over
15 minutes with a wide variety of clients. This test was run on
matrix.computer.surgery, which has only a few active users, but a
large volume of sync traffic from many rooms. Over the test period, we
had 3 hits and 5309 misses. All hits occurred in the first minute, so I
suspect that they had something to do with client recovery from an
offline state. The clients that were connected during the test are:

 - element web
 - schildichat web
 - iamb
 - gomuks
 - nheko
 - fractal
 - fluffychat web
 - fluffychat android
 - cinny web
 - element android
 - element X android

Fixes: #2
2024-05-16 21:33:06 -07:00
Charles Hall
5cb2551422
enable error_on_line_overflow and fix errors
These required some manual intervention.
2024-05-16 19:11:40 -07:00
Charles Hall
0afc1d2f50
change rustfmt configuration
This change is fully automated, except the `rustfmt.toml` changes and
a few clippy directives to allow specific functions with too many lines
because they are longer now.
2024-05-16 19:11:40 -07:00
Charles Hall
40d6ce230d
reformat report formatting
I manually expanded the HTML into a more readable format, the rest was
rustfmt's doing. It's beyond me why/how someone would willing write a
pile of HTML like that...
2024-05-16 19:10:52 -07:00
Charles Hall
ac53948450
use more, qualify less
Doing this will allow `rustfmt` to collapse lines more efficiently.
Specifically, a lot of these lines fail to wrap to 80 columns without
these changes.
2024-05-16 19:09:10 -07:00