'end_token' and 'start_token' have been refactored a bit because we need
to take the bounds of the examined events *before* filtering, otherwise
we'll send a pagination token to the client that is inside the set of
events we examined on this call. In extreme situations, this may leave a
client unable to make progress at all, because the first event that
matches the filter is more than 'load_limit' away from the base event.
This bug was present before the filtering implementation, but was less
significant because we only dropped events when they were not visible to
the user.
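
To illustrate, here's a minimal sketch of the intended token handling; `raw_events`, `matches_filter`, and the `(count, event)` representation are all hypothetical stand-ins, not the actual code:

```rust
// Hypothetical sketch: `raw_events` is the window of events examined on
// this call, in pagination order, keyed by their position counts.
fn paginate(raw_events: &[(u64, String)], limit: usize) -> (Vec<String>, Option<u64>) {
    let mut events = Vec::new();
    let mut end_token = None;

    for (count, event) in raw_events {
        // Advance the token for every event we *examine*, not just those
        // that survive the filter; otherwise a filter that rejects a long
        // run of events would hand the client a token inside that run,
        // and the client could never paginate past it.
        end_token = Some(*count);
        if matches_filter(event) {
            events.push(event.clone());
            if events.len() >= limit {
                break;
            }
        }
    }

    (events, end_token)
}

// Placeholder predicate standing in for the real filter checks.
fn matches_filter(_event: &str) -> bool {
    true
}
```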
This seems to be completely redundant with the 'limit' body parameter,
with the only difference being:
> The filter may be applied before or/and after the limit parameter -
> whichever the homeserver prefers.
This sentence seems to apply to the 'limit' body parameter, but not to
the 'limit' field on the filter. This was probably unintentional on the
part of the spec authors, but I've switched to using the same 'load_limit'
pattern we're using elsewhere anyway.
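
As a sketch, the assumed 'load_limit' pattern looks something like this (all names are illustrative):

```rust
// Server-side cap on how many events are examined per call, independent
// of the client-requested `limit`.
const LOAD_LIMIT: usize = 1000;

fn window<'a>(raw: &'a [String], client_limit: usize) -> Vec<&'a String> {
    raw.iter()
        .take(LOAD_LIMIT)                         // bound the work per request
        .filter(|ev| matches_filter(ev.as_str())) // apply the spec filter
        .take(client_limit)                       // client's `limit`, applied last
        .collect()
}

// Placeholder predicate standing in for the real filter checks.
fn matches_filter(_ev: &str) -> bool {
    true
}
```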
I've asked a few times for clarification on whether the `senders` field
in the filter applies to user IDs mentioned in typing/receipt ephemeral
events, and never got a response. Synapse does not filter these user IDs
by sender, so we're going to match that behavior.
This is the filter.room.timeline.{senders,types,contains_url} fields, and
their associated not_* pairs.
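
A hypothetical sketch of how these fields might be checked for a single event (the structs and semantics here are simplified stand-ins; the real `types` matching also supports wildcards):

```rust
// Simplified stand-ins for the filter and event types.
struct TimelineFilter {
    senders: Option<Vec<String>>, // allowlist; `None` means allow all
    not_senders: Vec<String>,     // denylist; takes precedence
    types: Option<Vec<String>>,
    not_types: Vec<String>,
    contains_url: Option<bool>,
}

struct Event {
    sender: String,
    kind: String,
    has_url: bool,
}

fn timeline_filter_allows(f: &TimelineFilter, e: &Event) -> bool {
    let sender_ok = f.senders.as_ref().map_or(true, |s| s.contains(&e.sender))
        && !f.not_senders.contains(&e.sender);
    let type_ok = f.types.as_ref().map_or(true, |t| t.contains(&e.kind))
        && !f.not_types.contains(&e.kind);
    let url_ok = f.contains_url.map_or(true, |want| want == e.has_url);
    sender_ok && type_ok && url_ok
}
```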
I decided not to change the `prev_batch` calculation for sliding-sync to
use the new `oldest_event_count` value, because I'm not confident in the
correct behavior. The current sliding-sync behavior gives `prev_batch
= oldest_event_count`, except when there are no new events. In this
case, `oldest_event_count` is `None`, but the current sliding-sync
implementation uses `prev_batch = since`. This is definitely wrong,
because both `since` and `prev_batch` are exclusive bounds. If the
correct thing to do is to return the lower exclusive bound of the range
of events that may have been included in the timeline, then we would
want `since - 1`. The other option would be to return `prev_batch =
None`, like we have in sync v3. I don't know which of these is correct,
so I'm just gonna keep the current (definitely incorrect) behavior to
avoid making things worse.
Before this change we were just returning an empty object for left or
invited rooms that don't have any updates. This is valid according to
the spec, but it's nicer to debug if they are omitted, and it results in
a little less network traffic. For joined rooms, we are already skipping
empty updates.
With filtering support, it's much more common to have sync responses where
many rooms are empty, because all of the state/timeline events may be
filtered out.
I asked in #matrix-spec:matrix.org and got clarification that we should be
omitting the timeline field completely for rooms that are filtered out
by the timeline.(not_)rooms filter. Ruma's skip_serializing_if attribute
on the timeline field will currently cause it to be omitted when events is
empty. If [this fix][1] is merged, it will be omitted only when events is
empty, prev_batch is None, and limited is false.
[1]: https://github.com/ruma/ruma/pull/1796
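
For reference, a minimal sketch of that serde pattern, with simplified stand-ins for Ruma's types:

```rust
use serde::Serialize;

#[derive(Serialize)]
struct Timeline {
    events: Vec<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    prev_batch: Option<String>,
    limited: bool,
}

#[derive(Serialize)]
struct RoomUpdate {
    // With this attribute, `timeline` is omitted from the JSON whenever
    // the predicate returns true.
    #[serde(skip_serializing_if = "Timeline::is_empty")]
    timeline: Timeline,
}

impl Timeline {
    // Today the effective check is only on `events`; the linked fix
    // would also require `prev_batch` to be `None` and `limited` to be
    // `false` before the field is omitted.
    fn is_empty(&self) -> bool {
        self.events.is_empty()
    }
}
```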
TODO: maybe do something about clippy::too_many_arguments
These are the fields at filter.room.{rooms,not_rooms}, which apply to all
categories. The category-specific room filters are in
filter.room.{state,timeline,ephemeral}.{rooms,not_rooms}.
One thing I'm a little worried about with this implementation is that
it's possible for some wildcard expressions to result in a
regex::Error::CompiledTooBig error. It seems like rejecting patterns that
would result in a ReDoS is a good idea, but the Matrix spec doesn't say
anything about it.
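
A sketch of how the wildcard patterns might be compiled defensively with the `regex` crate (the glob-to-regex conversion and the limit value here are illustrative):

```rust
use regex::{Error, Regex, RegexBuilder};

// Convert a Matrix-style glob (`*` and `?` wildcards) into an anchored
// regex, escaping every other character.
fn glob_to_pattern(glob: &str) -> String {
    let mut pattern = String::from("^");
    for c in glob.chars() {
        match c {
            '*' => pattern.push_str(".*"),
            '?' => pattern.push('.'),
            c => pattern.push_str(&regex::escape(&c.to_string())),
        }
    }
    pattern.push('$');
    pattern
}

fn compile_glob(glob: &str) -> Result<Regex, Error> {
    RegexBuilder::new(&glob_to_pattern(glob))
        // Cap the compiled program size so a pathological pattern fails
        // with `Error::CompiledTooBig` instead of eating server memory.
        .size_limit(1 << 20)
        .build()
}
```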
The plan is to move all of the per-event checks into the
pdu_event_allowed function.
TODO: split the `visibility_filter` function into its own commit. I
think this one was inherited from an earlier conduwuit commit.
As far as I can tell, 'filter.limit' and the 'limit' query parameter are
completely redundant. I've moved the 'take(limit)' call to after
filtering, to ensure that we can return up to 'limit' events even when
some are rejected by the filter. In a future commit, I will add a global
limit on loaded events to avoid DoS.
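
In sketch form, the ordering change looks like this (`allowed` stands in for the real filter check):

```rust
fn example(raw: &[u64], limit: usize) {
    // Before: `take(limit)` ran before filtering, so events rejected by
    // the filter reduced the number returned below `limit`.
    let _before: Vec<_> = raw.iter().take(limit).filter(|e| allowed(e)).collect();

    // After: filter first, then take, so up to `limit` matching events
    // are returned even when some loaded events are rejected.
    let _after: Vec<_> = raw.iter().filter(|e| allowed(e)).take(limit).collect();
}

// Placeholder for the real filter check.
fn allowed(_e: &u64) -> bool {
    true
}
```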
I really doubt anybody is sending /messages requests with a filter that
rejects the entire request, but it's the first step in the filter
implementation.
Now it includes the user, room, and event ID. As a bonus, the sync
function is now slightly less gigantic.
TODO: put this in a separate MR, and include a similar change for
invited rooms
The previous code would drop some events entirely if any events between
`skip` and `skip + limit` were not visible to the user. This would cause
the set of events skipped by the `skip(skip)` method to extend past
`skip` in the raw result set, because `skip(skip)` was being called
*after* filtering out invisible events.
This bug will become much more severe with a full filtering
implementation, because it will be more likely for events to be filtered
out. Currently, it is only possible to trigger with rooms that have
history visibility set to "invited" or "joined".
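
A sketch of the corrected ordering (names hypothetical): `skip` has to consume raw results, so it must run before the visibility filter.

```rust
fn search_window<'a>(
    raw: &'a [String],
    skip: usize,
    limit: usize,
    visible: impl Fn(&str) -> bool,
) -> Vec<&'a String> {
    // Buggy ordering: filtering first meant `skip(skip)` discarded the
    // first `skip` *visible* events, i.e. more than `skip` raw events:
    //     raw.iter().filter(|e| visible(e)).skip(skip).take(limit)

    // Fixed ordering: skip raw results first, then filter.
    raw.iter()
        .skip(skip)
        .filter(|e| visible(e.as_str()))
        .take(limit)
        .collect()
}
```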
The previous code would fail to return next_batch if any of the events
in the window were not visible to the user. It would also return an
unnecessary next_batch when no more results were available because the
total number of results was exactly `skip + limit`.
This bug will become much more severe with a full filtering
implementation, because we will be more likely to trigger it by
filtering out events in a search call. Currently, it is only possible to
trigger with rooms that have history visibility set to "invited" or
"joined".
* Extract into reusable function (we'll need this later)
* Merge the values we want to override over the defaults, instead of
dropping the defaults completely
* Don't unnecessarily allocate (the `vec![]` usage is gone)
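
A sketch of the merge-over-defaults idea, with a made-up config type (field names are illustrative):

```rust
#[derive(Clone)]
struct Settings {
    max_connections: u32,
    log_level: String,
}

// Optional overrides; `None` means "keep the default".
#[derive(Default)]
struct Overrides {
    max_connections: Option<u32>,
    log_level: Option<String>,
}

fn merge(defaults: &Settings, over: Overrides) -> Settings {
    Settings {
        // Each override replaces only its own field; every other field
        // falls through to the default instead of being dropped.
        max_connections: over.max_connections.unwrap_or(defaults.max_connections),
        log_level: over.log_level.unwrap_or_else(|| defaults.log_level.clone()),
    }
}
```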