Currently we have exponential backoff logic scattered across different
locations, with multiple distinct, flawed implementations. We should
centralize the backoff logic in one place and actually do it correctly.
This backoff logic is similar to Synapse's implementation[1], with a
couple of fixes:
- we wait until we observe 5 consecutive failures before we start
delaying requests, to avoid being sensitive to a small fraction of
failed requests on an otherwise healthy server.
- Synapse's implementation has something similar to our "only increment
the failure count once per batch of concurrent requests" behavior: they
base the retry state written to the store on the state observed at the
beginning of the request, rather than on the state observed at the end
of the request. Their implementation has a bug where a success is
ignored if a failure occurs in the same batch. We do not replicate this
bug.
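A minimal sketch of the two behaviors above, assuming one mutex-guarded state per server. The names `Backoff`, `RetryState`, `Ticket`, and the `epoch` counter are hypothetical and not the actual code in this change. Each request snapshots an epoch when it starts; a failure increments the count only if nothing has changed since that snapshot, so a batch of concurrent failures counts once, and a later failure from the same batch cannot overwrite a success.

```rust
use std::sync::Mutex;

/// Consecutive failures required before requests start being delayed.
const FAILURE_THRESHOLD: u32 = 5;

/// Hypothetical per-server retry state.
#[derive(Debug, Default, Clone, Copy)]
struct RetryState {
    /// Consecutive failures, counting each batch at most once.
    failures: u32,
    /// Bumped on every state change; used to detect concurrent updates.
    epoch: u64,
}

/// Snapshot taken when a request starts (hypothetical).
#[derive(Debug, Clone, Copy)]
struct Ticket {
    epoch: u64,
}

#[derive(Debug, Default)]
struct Backoff {
    state: Mutex<RetryState>,
}

impl Backoff {
    /// Record the state observed at the beginning of a request.
    fn begin(&self) -> Ticket {
        Ticket { epoch: self.state.lock().unwrap().epoch }
    }

    /// Only delay once the failure threshold has been reached.
    fn should_delay(&self) -> bool {
        self.state.lock().unwrap().failures >= FAILURE_THRESHOLD
    }

    /// A success always resets the count and invalidates older tickets.
    fn record_success(&self) {
        let mut s = self.state.lock().unwrap();
        s.failures = 0;
        s.epoch += 1;
    }

    /// A failure counts only if the state is unchanged since the request
    /// began, i.e. no other request in the batch already failed or succeeded.
    fn record_failure(&self, ticket: Ticket) {
        let mut s = self.state.lock().unwrap();
        if s.epoch == ticket.epoch {
            s.failures += 1;
            s.epoch += 1;
        }
    }

    fn failures(&self) -> u32 {
        self.state.lock().unwrap().failures
    }
}

fn main() {
    let backoff = Backoff::default();

    // Three concurrent requests fail: the count goes up by one, not three.
    let batch: Vec<Ticket> = (0..3).map(|_| backoff.begin()).collect();
    for ticket in batch {
        backoff.record_failure(ticket);
    }
    assert_eq!(backoff.failures(), 1);

    // Four more sequential failures reach the threshold of 5.
    for _ in 0..4 {
        let ticket = backoff.begin();
        backoff.record_failure(ticket);
    }
    assert!(backoff.should_delay());

    // A success and a failure in the same batch, with the failure
    // finishing last: the stale failure is discarded.
    let ticket = backoff.begin();
    backoff.record_success();
    backoff.record_failure(ticket);
    assert_eq!(backoff.failures(), 0);
    assert!(!backoff.should_delay());
}
```

If the failure finishes first instead, it increments the count and the success then resets it, so the batch still ends with zero failures.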
Our parameter choices are significantly less aggressive than Synapse's[2],
which start at a 10m delay, use a multiplier of 2, and saturate at a 4d
delay.
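For comparison, the schedule implied by Synapse's parameters can be worked out directly. This is a simplified sketch: it ignores any jitter Synapse may apply, and the function name `synapse_like_delay` is hypothetical. Under those assumptions the delay doubles from 10 minutes (reaching 5120 minutes, about 85 hours, at the 10th consecutive failure) and is capped at 4 days from the 11th failure onward.

```rust
use std::time::Duration;

const INITIAL: Duration = Duration::from_secs(10 * 60); // 10m
const MULTIPLIER: u32 = 2;
const MAX: Duration = Duration::from_secs(4 * 24 * 60 * 60); // 4d

/// Delay after `n` consecutive failures, without jitter (simplified).
fn synapse_like_delay(n: u32) -> Duration {
    if n == 0 {
        return Duration::ZERO;
    }
    let mut delay = INITIAL;
    for _ in 1..n {
        delay = delay.saturating_mul(MULTIPLIER);
        if delay >= MAX {
            return MAX;
        }
    }
    delay
}

fn main() {
    // Print the schedule in minutes for the first dozen failures.
    for n in 1..=12 {
        let d = synapse_like_delay(n);
        println!("failure {n:>2}: {:>5} min", d.as_secs() / 60);
    }
}
```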
[1]: 70b0e38603/synapse/util/retryutils.py
[2]: 70b0e38603/synapse/config/federation.py (L83)
Mainly to make it easier to initialize the SERVICES global correctly in
more than one place.
Also, this code really shouldn't live at the crate root anyway.
ReloadHandle is taken from conduwuit commit
8a5599adf9eafe9111f3d1597f8fb333b8b76849, authored by Benjamin.
Co-authored-by: Benjamin Lee <benjamin@computer.surgery>
This change is fully automated, except for the `rustfmt.toml` changes and
a few clippy directives that allow specific functions to exceed the
line-count limit, since they are longer now.