7:00 AM — Our engineering team detected a sharp rise in database session pool timeouts in the Conversations service and began investigating. This was the same symptom that caused the outage on Friday, April 24th.
Root cause — As with Friday's incident, we traced the degradation to session exhaustion in the primary database for Conversations. This time, we isolated the exhaustion to the message hot path: internal operations were holding database sessions longer than necessary, so under load the pool could not keep up and requests across the service timed out.
Response — We shipped a set of targeted fixes that reduce session pressure going forward. We removed a redundant secondary-lookup path so that each message performs less database work, and we tightened request cancellation in our conversation-details endpoint so that failing sub-requests release their sessions immediately rather than waiting on a full timeout.
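For readers who want a concrete picture of the cancellation change, here is a minimal illustrative sketch in Python. It is not the production code, and the service's actual language and libraries are not stated in this report. The names `SessionPool`, `sub_request`, and `conversation_details` are hypothetical, and the pool is modeled with a plain semaphore as an assumption. The idea shown is that when one sub-request fails, its siblings are cancelled right away, so their sessions return to the pool instead of being held until a timeout.

```python
import asyncio
from contextlib import asynccontextmanager


class SessionPool:
    """Hypothetical stand-in for a database session pool (assumption)."""

    def __init__(self, size: int) -> None:
        self._slots = asyncio.Semaphore(size)
        self.in_use = 0

    @asynccontextmanager
    async def session(self):
        await self._slots.acquire()
        self.in_use += 1
        try:
            yield object()  # placeholder for a real session handle
        finally:
            # Runs on success, failure, and cancellation alike,
            # so the slot always goes back to the pool.
            self.in_use -= 1
            self._slots.release()


async def sub_request(pool: SessionPool, delay: float, fail: bool = False) -> float:
    """Hypothetical sub-request that holds a session while it works."""
    async with pool.session():
        await asyncio.sleep(delay)  # simulated database work
        if fail:
            raise RuntimeError("sub-request failed")
        return delay


async def conversation_details(pool: SessionPool, specs):
    """Run sub-requests concurrently; cancel the rest on the first failure."""
    tasks = [asyncio.ensure_future(sub_request(pool, d, f)) for d, f in specs]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
    for task in pending:
        task.cancel()  # stop siblings now rather than waiting for a timeout
    # Let cancelled tasks unwind so their sessions are actually released.
    await asyncio.gather(*pending, return_exceptions=True)
    for task in done:
        task.result()  # re-raises the failure, if there was one
    return [task.result() for task in tasks]
```

In this sketch, a request with one slow sub-request and one quickly failing sub-request returns an error almost immediately, and `pool.in_use` drops back to zero as soon as the error is raised. Without the explicit cancellation, the slow sub-request would keep its session until it finished or timed out.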
We believe this issue is now permanently resolved. We will also be monitoring our core systems with a different pattern so that we can detect this issue earlier and prevent it from happening again.
We apologize once more for any inconvenience this may have caused.