Thoughts on Improving Messaging Protocols — Part 2, Matrix
A Proposal to Simplify and Speed Up Matrix Merge Operations
In the first part of this blog post, I explained how the Matrix protocol works, contrasted its design philosophy with XMPP, and discussed why these differences lead to performance costs in Matrix. Matrix processes each conversation as a graph of events, merged in real-time[1].
Merge operations can be costly in Matrix for large rooms, affecting both database storage and load and disk usage when memory is exhausted, reaching swap level.
That said, there is still room for improvement in the protocol. We have designed and tested slight changes that could make Matrix much more efficient for large rooms.
A Proposal to Simplify and Speed Up Merge Operations
Here is the rationale behind a proposal we have made to simplify and speed up merge operations:
State resolution v2 uses certain graph algorithms, which can result in at least linear processing time for the number of state events in a room’s DAG, creating a significant load on servers.
The goal of this issue is to discuss and develop changes to state resolution to achieve O(n log n) total processing time when handling a room with n state events (i.e., O(log n) on average) in realistic scenarios, while maintaining a good user experience.
The approach described below is closer to state resolution v1 but seeks to address state resets in a different way.
For more detail, you can read our proposal on the Matrix spec tracker: Make state resolution faster.
In simpler terms, we propose adding a version associated with each event_id
to simplify conflict management and introduce a heuristic that skips traversal of large parts of the graph.
Impact of the Proposal
From our initial assessment, in a very large room — such as one with 100,000 members — our approach could improve processing performance by 100x to 1000x, as the current processing cost scales with the number of users in the room. This improvement would enable smoother conversations, reduced lag, and more responsive interactions for end-users, while also reducing server infrastructure load and resource usage.
While our primary goal is to improve performance in very large rooms, these changes benefit all users by reducing overall server load and improving processing times across various room sizes.
We plan to implement this improvement in our own code to evaluate its real-world effectiveness while the Matrix team considers its potential value for the reference protocol.
- For those who remember, a conversation in Matrix is similar to the collaborative editing protocol built on top of XMPP for the Google Wave platform.