perf: fix notification read replica scan path#983
Merged
Conversation
37f2f10 to
f84ad2d
Compare
f84ad2d to
e8d453a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This started as notification timeout protection, but the deeper replica issue is the notification read path for high-fanout users.
Live diagnosis on the read replica showed:
notificationhas about 23.5M rows/v1/notifications/..., includinglimit=0unread pollingARRAY[@user_id] && user_idsfall off the GIN path into a parallel scan of the whole notification tablepg_statshad{51}as the topuser_idsMCV, around 7 percent of the tablefollow:51alone had about 843k notification rowsThis PR now addresses the scan path directly:
(user_ids[1], timestamp DESC, group_id DESC, type) WHERE array_length(user_ids, 1) = 1limit=0unread-poll fast path so polling does not hydrate full notification payloadsI deliberately did not trim grouped
actionsin this API-only PR. The apps adapter derivesuserIdsand exact user-list counts from the action array, so capping that payload safely needs an API/client contract change.Verification
go test ./api -run "TestV1Notifications" -count=1go test ./...ddl/migrations/0227_notification_single_recipient_user_timestamp_idx.sqlagainst localtest_apiEXPLAINusesnotification_single_recipient_user_timestamp_idxfor the single-recipient predicate