reset device key query backoff after a successful request

Failing to reset the backoff state resulted in a monotonically
increasing backoff delay. If a remote server was temporarily
unavailable, we would have a persistently increased rate of key query
failures until the backoff state was reset by a server restart. If
enough key queries were attempted while the remote was unavailable, the
backoff delay could grow arbitrarily long and effectively block all
future key queries to that server.
Olivia Lee 2024-12-01 16:06:09 -08:00
parent 79cedccdb6
commit 4ee8312068
GPG key ID: 54D568A15B9CD1F9
2 changed files with 15 additions and 3 deletions

@@ -205,6 +205,10 @@ This will be the first release of Grapevine since it was forked from Conduit
 21. Fix tiebreaking comparisons between events during state resolution. This
     will reduce the rate at which servers disagree about the state of rooms.
     ([!141](https://gitlab.computer.surgery/matrix/grapevine/-/merge_requests/141))
+22. Fix bug where the backoff state for remote device key queries was not reset
+    after a successful request, causing an increasing rate of key query failures
+    over time until a server restart.
+    ([!149](https://gitlab.computer.surgery/matrix/grapevine/-/merge_requests/149))

 ### Added

@@ -486,6 +486,11 @@ async fn back_off_key_requests(server: OwnedServerName) {
     }
 }

+/// Stops backing off remote device key requests to a server after a success.
+async fn reset_key_request_back_off(server: &ServerName) {
+    services().globals.bad_query_ratelimiter.write().await.remove(server);
+}
+
 /// Requests device keys from a remote server, unless the server is in backoff.
 ///
 /// Updates backoff state depending on the result of the request.
@@ -494,9 +499,12 @@ async fn request_keys_from(
     keys: Vec<(&UserId, &Vec<OwnedDeviceId>)>,
 ) -> Result<federation::keys::get_keys::v1::Response> {
     let result = request_keys_from_inner(server, keys).await;
-    if let Err(error) = &result {
-        debug!(%server, %error, "remote device key query failed");
-        back_off_key_requests(server.to_owned()).await;
+    match &result {
+        Ok(_) => reset_key_request_back_off(server).await,
+        Err(error) => {
+            debug!(%server, %error, "remote device key query failed");
+            back_off_key_requests(server.to_owned()).await;
+        }
     }
     result
 }
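
The failure mode described in the commit message can be illustrated with a minimal, standalone sketch. It does not reproduce Grapevine's `bad_query_ratelimiter`: the `BackoffTracker` type, its methods, the 30-second base delay, and the linear delay formula are all assumptions made for illustration. The point is only that the delay grows with each failure and returns to zero once a success removes the server's entry.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical per-server backoff state: a count of recorded failures.
#[derive(Default)]
struct BackoffTracker {
    failures: HashMap<String, u32>,
}

impl BackoffTracker {
    /// Hypothetical base delay; not taken from Grapevine.
    const BASE_DELAY: Duration = Duration::from_secs(30);

    /// Records a failed request, lengthening the delay for this server.
    fn record_failure(&mut self, server: &str) {
        let count = self.failures.entry(server.to_owned()).or_insert(0);
        *count = count.saturating_add(1);
    }

    /// Records a successful request by dropping the server's entry,
    /// analogous to what `reset_key_request_back_off` does in the diff.
    fn record_success(&mut self, server: &str) {
        self.failures.remove(server);
    }

    /// Delay to wait before the next request (zero if no failures are recorded).
    fn delay(&self, server: &str) -> Duration {
        let count = self.failures.get(server).copied().unwrap_or(0);
        Self::BASE_DELAY.saturating_mul(count)
    }
}

fn main() {
    let mut tracker = BackoffTracker::default();

    // Two failures while the remote is unavailable.
    tracker.record_failure("example.org");
    tracker.record_failure("example.org");
    println!("after two failures: {:?}", tracker.delay("example.org"));

    // Without this call, the delay would only ever grow.
    tracker.record_success("example.org");
    println!("after a success: {:?}", tracker.delay("example.org"));
}
```

If `record_success` were never called, every later failure would add to the count accumulated during the earlier outage, which is the behaviour the commit message attributes to the code before this change.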