remove sync response cache

This cache can serve invalid responses and has an extremely low hit
rate.

It serves invalid responses because it is keyed only off the `since`
parameter, but many of the other request parameters also affect the
response or its side effects. This will become worse once we implement
filtering, because there will be a wider space of parameters producing
different responses. The problem is fixable, but not worth fixing given
the low hit rate.
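
To make the keying problem concrete, here is a hedged sketch (not code
from this codebase) of what a correct cache key would have to cover;
the field set is an assumption drawn from the sync endpoint's query
parameters:

    // Hypothetical sketch only: a correct cache key must include every
    // request parameter that can change the response or its side
    // effects, not just `since`.
    #[derive(Clone, PartialEq, Eq, Hash)]
    struct SyncCacheKey {
        since: Option<String>,  // the only field the old cache used
        filter: Option<String>, // filter ID or inline filter JSON
        full_state: bool,       // changes which state events are returned
        set_presence: String,   // side effect: updates the user's presence
    }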

The hit rate is low because normal clients always issue the next sync
request with `since` set to the `next_batch` value of the previous
response. The only time we expect to see multiple requests with the same
`since` is when the response is empty, but we don't cache empty
responses.
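
As a sketch of why `since` values almost never repeat, this is the
standard client sync loop; `client.sync` and `process` are hypothetical
stand-ins, not APIs from any client listed below:

    // Each response's `next_batch` token becomes the next request's
    // `since`, so a well-behaved client sends any given `since` value
    // at most once and a cache keyed on it is essentially never hit.
    let mut since: Option<String> = None;
    loop {
        let response = client.sync(since.as_deref(), TIMEOUT).await?;
        process(&response)?;
        since = Some(response.next_batch);
    }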

This was confirmed experimentally by logging cache hits and misses over
15 minutes with a wide variety of clients. The test was run on
matrix.computer.surgery, which has only a few active users but a large
volume of sync traffic from many rooms. Over the test period, there were
3 hits and 5309 misses. All hits occurred in the first minute, so I
suspect they were related to clients recovering from an offline state.
The clients connected during the test were:

 - element web
 - schildichat web
 - iamb
 - gomuks
 - nheko
 - fractal
 - fluffychat web
 - fluffychat android
 - cinny web
 - element android
 - element X android

Fixes: #2
Benjamin Lee 2024-05-16 21:02:05 -07:00
parent 5cb2551422
commit 146465693e
GPG key ID: FB9624E2885D55A4
2 changed files with 10 additions and 130 deletions

@@ -23,15 +23,12 @@ use hyper::{
 };
 use reqwest::dns::{Addrs, Resolve, Resolving};
 use ruma::{
-    api::{
-        client::sync::sync_events,
-        federation::discovery::{ServerSigningKeys, VerifyKey},
-    },
+    api::federation::discovery::{ServerSigningKeys, VerifyKey},
     serde::Base64,
-    DeviceId, OwnedDeviceId, OwnedEventId, OwnedRoomId, OwnedServerName,
-    OwnedServerSigningKeyId, OwnedUserId, RoomVersionId, ServerName, UserId,
+    DeviceId, OwnedEventId, OwnedRoomId, OwnedServerName,
+    OwnedServerSigningKeyId, RoomVersionId, ServerName, UserId,
 };
-use tokio::sync::{broadcast, watch::Receiver, Mutex, RwLock, Semaphore};
+use tokio::sync::{broadcast, Mutex, RwLock, Semaphore};
 use tracing::{error, info};
 use trust_dns_resolver::TokioAsyncResolver;
@@ -41,12 +38,6 @@ type WellKnownMap = HashMap<OwnedServerName, (FedDest, String)>;
 type TlsNameMap = HashMap<String, (Vec<IpAddr>, u16)>;
 // Time if last failed try, number of failed tries
 type RateLimitState = (Instant, u32);
-type SyncHandle = (
-    // since
-    Option<String>,
-    // rx
-    Receiver<Option<Result<sync_events::v3::Response>>>,
-);
 
 pub(crate) struct Service {
     pub(crate) db: &'static dyn Data,
@@ -70,8 +61,6 @@ pub(crate) struct Service {
         Arc<RwLock<HashMap<OwnedServerName, RateLimitState>>>,
     pub(crate) servername_ratelimiter:
         Arc<RwLock<HashMap<OwnedServerName, Arc<Semaphore>>>>,
-    pub(crate) sync_receivers:
-        RwLock<HashMap<(OwnedUserId, OwnedDeviceId), SyncHandle>>,
     pub(crate) roomid_mutex_insert:
         RwLock<HashMap<OwnedRoomId, Arc<Mutex<()>>>>,
     pub(crate) roomid_mutex_state: RwLock<HashMap<OwnedRoomId, Arc<Mutex<()>>>>,
@@ -237,7 +226,6 @@ impl Service {
             roomid_mutex_federation: RwLock::new(HashMap::new()),
             roomid_federationhandletime: RwLock::new(HashMap::new()),
             stateres_mutex: Arc::new(Mutex::new(())),
-            sync_receivers: RwLock::new(HashMap::new()),
             rotate: RotationHandler::new(),
             shutdown: AtomicBool::new(false),
         };