perf: fix notification read replica scan path #983

Merged
raymondjacobson merged 1 commit into main from codex/replica-safe-notifications on Jul 1, 2026

@raymondjacobson raymondjacobson commented Jun 30, 2026

Member

Summary

This started as notification timeout protection, but the deeper replica issue is the notification read path for high-fanout users.

Live diagnosis on the read replica showed:

  • the notification table has about 23.5M rows
  • the hot endpoint is /v1/notifications/..., including limit=0 unread polling
  • high-fanout recipients can make ARRAY[@user_id] && user_ids fall off the GIN path into a parallel scan of the whole notification table
  • pg_stats had {51} as the top user_ids MCV (most common value), covering around 7 percent of the table
  • follow:51 alone had about 843k notification rows
  • these long read snapshots match the replica conflict pattern driving block_diff up
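To put the numbers above in proportion, here is a rough back-of-the-envelope sketch in Go. It only multiplies the figures quoted in the list; `estimatedRows` is a hypothetical helper, and treating the MCV frequency as a plain row fraction is a simplification of how the Postgres planner actually estimates selectivity.

```go
package main

import "fmt"

// estimatedRows is a hypothetical helper: the number of rows expected to
// carry a value that appears with the given frequency in a table of
// totalRows rows.
func estimatedRows(totalRows, frequency float64) float64 {
	return totalRows * frequency
}

func main() {
	const totalRows = 23.5e6   // "about 23.5M rows"
	const topMCVFreq = 0.07    // "{51} ... around 7 percent of the table"
	const follow51Rows = 843e3 // "follow:51 alone had about 843k notification rows"

	// Rows expected to match the most common recipient array.
	fmt.Printf("rows for top MCV: ~%.0f\n", estimatedRows(totalRows, topMCVFreq))

	// Share of the whole table taken by the follow:51 group alone.
	fmt.Printf("follow:51 share of table: ~%.1f%%\n", follow51Rows/totalRows*100)
}
```

On these figures a single recipient accounts for well over a million rows, which is consistent with the planner preferring a parallel scan over the GIN index for that user.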

This PR now addresses the scan path directly:

  • adds a concurrent expression btree index for the common single-recipient shape: (user_ids[1], timestamp DESC, group_id DESC, type) WHERE array_length(user_ids, 1) = 1
  • rewrites full notification reads and unread polling to split matching into:
    • a single-recipient branch that can use the new btree index
    • a multi-recipient fallback branch that keeps using the existing array overlap semantics
  • preserves multi-recipient notifications with regression coverage
  • keeps the limit=0 unread-poll fast path so polling does not hydrate full notification payloads
  • keeps the 8s timeout as defense-in-depth, not the primary fix
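The split described above can be modelled in memory as a minimal sketch. This is not the PR's SQL: the `Notification` type, the function names, and the sample rows are hypothetical. It illustrates why a single-recipient branch plus a multi-recipient fallback should return the same rows as the original `ARRAY[@user_id] && user_ids` overlap predicate.

```go
package main

import "fmt"

// Notification is a hypothetical, trimmed-down stand-in for a row of the
// notification table; only the recipient array matters here.
type Notification struct {
	GroupID string
	UserIDs []int
}

// matchOverlap models the original predicate: ARRAY[@user_id] && user_ids.
func matchOverlap(n Notification, userID int) bool {
	for _, id := range n.UserIDs {
		if id == userID {
			return true
		}
	}
	return false
}

// matchSplit models the rewritten predicate as two disjoint branches.
func matchSplit(n Notification, userID int) bool {
	// Single-recipient branch, the shape the new partial btree index covers:
	// array_length(user_ids, 1) = 1 AND user_ids[1] = @user_id.
	// Postgres arrays are 1-based; Go slices are 0-based.
	if len(n.UserIDs) == 1 {
		return n.UserIDs[0] == userID
	}
	// Multi-recipient fallback: unchanged overlap semantics.
	return matchOverlap(n, userID)
}

func main() {
	// Sample rows are invented for illustration.
	rows := []Notification{
		{"follow:51", []int{51}},
		{"follow:7", []int{7}},
		{"announcement:1", []int{7, 51, 99}},
		{"empty", nil},
	}
	for _, n := range rows {
		fmt.Println(n.GroupID, matchOverlap(n, 51), matchSplit(n, 51))
	}
}
```

Because the two branches partition rows by recipient count, no row can match both, so combining them should not produce duplicates. The multi-recipient regression coverage mentioned above guards the fallback branch.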

I deliberately did not trim grouped actions in this API-only PR. The apps adapter derives userIds and exact user-list counts from the action array, so capping that payload safely needs an API/client contract change.

Verification

  • go test ./api -run "TestV1Notifications" -count=1
  • go test ./...
  • Applied ddl/migrations/0227_notification_single_recipient_user_timestamp_idx.sql against local test_api
  • Confirmed EXPLAIN uses notification_single_recipient_user_timestamp_idx for the single-recipient predicate

@raymondjacobson raymondjacobson force-pushed the codex/replica-safe-notifications branch from 37f2f10 to f84ad2d Compare June 30, 2026 22:35
@raymondjacobson raymondjacobson changed the title perf: reduce notification read replica pressure perf: fix notification read replica scan path Jun 30, 2026
@raymondjacobson raymondjacobson force-pushed the codex/replica-safe-notifications branch from f84ad2d to e8d453a Compare July 1, 2026 06:06
@raymondjacobson raymondjacobson merged commit e36b83c into main Jul 1, 2026
5 checks passed
@raymondjacobson raymondjacobson deleted the codex/replica-safe-notifications branch July 1, 2026 06:26
