Previously we were only using trust-dns for resolving SRV records in
server discovery, and then for resolving the hostname from the SRV
record target if one exists. With the previous behavior, admins need to
ensure that both their system resolver and trust-dns are working
correctly in order for outgoing traffic to work reliably. This can be
confusing to debug, because it's not obvious to the admin if or when
each resolver are being used. Now, everything goes through trust-dns and
outgoing federation DNS should fail/succeed more predictably.
I also expect some performance improvement from having an in-process DNS
cache, but haven't taken measurements yet.
This used to be supported, as we explicitly call std::fs::create_dir_all
when initializing these, but it was broken in
b01b70fc20, which attempts to canonicalize
the paths to check for overlap before creating them.
Previously, read receipts would only be forwarded via federation
incidentally when some PDU was later sent to the destination server.
Trigger a send without any event to collect EDUs and get read receipts
out directly.
For example, if the database path is `/foo` and the media path is
`/foo/bar`, but `/foo/bar` is a symlink or hardlink to `/baz`, the
previous check would pass. The whole point of this check is to ensure
that the database and media data can't step on each other, so this check
is needed to deny that kind of situation as well.
It would probably be good to add a test for this behavior.
The primary motivation for this change is to support databases that
don't take a path, e.g. out of process databases.
This configuration structure leaves the door open for other media
storage mechanisms in the future, such as S3.
It's also structured to avoid `#[serde(flatten)]` so that we can use
`#[serde(deny_unknown_fields)]`.
Previously we required every alias in a canonical alias event sent by a
client to be valid, and would only validate local aliases. This
prevented clients from adding/removing canonical aliases if there were
existing remote or invalid aliases.
Previously, we would only attempt to validate the aliases in the event
content if we were able to parse the event, and would silently allow it
otherwise.
The previous logic would increment the backoff counter both when a
request actually fails and when we do not make a request because the
server was already in backoff. This lead to a positive feedback loop
where every request made while a server is in backoff increases the
backoff delay, making it impossible to recover from backoff unless the
entire backoff delay elapses with zero requests.
Failing to reset the backoff state resulted in a monotonically
increasing backoff delay. If a remote server was temporarily
unavailable, we would have a persistently increased rate of key query
failures until the backoff state was reset by a server restart. If
enough key queries were attempted while the remote was unavailable, it
can accumulate an arbitrarily long backoff delay and effectively block
all future key queries to this server.
This is pure code-motion, with no behavior changes. The new structure
will make it easier to fix the backoff behavior, and makes the code
somewhat less of a nightmare to follow.
Previously attempting to delete an MXC that is only associated with
dangling thumbnails would fail, because it assumes that every thumbnail
must have a corresponding original in the db, and errors out if it can't
find the original. This is incorrect because we create dangling
thumbnails when requesting a remote thumbnail over federation when we
don't have the original file.
When requesting remote thumbnails over federation, we can end up with a
thumbnail in the media db without an associated original file. Because
of this, skipping thumbnails is insufficient to get a list of all MXCs.
Previously we were only skipping thumbnails that had both dimensions
nonzero. I think the assumption was that only the dimensions allowed by
services::media::thumbnail_properties would be used. This is not the
case because we have used arbitrary thumbnail dimensions when requesting
remote thumbnails.
Now that we are able to distinguish between corrupted media keys and
missing files, it makes more sense to propagate the corrupted keys up to
the caller.
This is useful to easily distinguish missing files from corrupted keys.
All existing usage sites have been modified so there is no behavior
change in this commit.
We *should* ensure that media deletion is always successful, but when a
bug causes a single object to fail deletion it's better to try to delete
the remaining objects than to give up entirely.