fix: suffixes_prefixes_titles always reflects current set state #166

Open
gaoflow wants to merge 1 commit into derek73:master from gaoflow:fix/suffixes-prefixes-titles-stale-cache

gaoflow commented Jun 25, 2026

Problem

Constants.suffixes_prefixes_titles cached its result in _pst after the
first access and never invalidated that cache. Any subsequent add() or
remove() call on titles, prefixes, suffix_acronyms, or
suffix_not_acronyms was silently ignored by the stale cache.

This creates an observable inconsistency: is_title() / is_prefix() /
is_suffix() query the live sets and return the correct answer, but
is_rootname() — which delegates to suffixes_prefixes_titles — keeps
returning the stale cached answer, causing it to contradict the other helpers.

Minimal reproduction:

from nameparser import HumanName
from nameparser.config import Constants

C = Constants()
# Access the property to prime the cache.
_ = C.suffixes_prefixes_titles

C.prefixes.add('xpfx')
hn = HumanName("", C)

# A known prefix is not a rootname, so these should print True and False.
# Before this fix they both print True:
print(hn.is_prefix('xpfx'))   # True  (correct: queries the live set)
print(hn.is_rootname('xpfx')) # True  (wrong: reads the stale cache, should be False)

The same inconsistency affects titles and suffixes added or removed after the
property is first read. join_on_conjunctions relies on is_rootname to
count rootname_pieces and choose whether to join single-letter conjunctions,
so a stale _pst can silently skew that decision.
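
For readers without nameparser at hand, the stale-cache pattern described above can be sketched in isolation. The `StaleConstants` class and the two helper functions below are hypothetical stand-ins built on plain Python sets. They are not nameparser's actual code, and the real `is_rootname` may apply further checks that are omitted here.

```python
class StaleConstants:
    """Hypothetical stand-in for the pre-fix behaviour; not nameparser's code."""

    def __init__(self):
        # Invented sample entries, for illustration only.
        self.titles = {"dr", "mr"}
        self.prefixes = {"van", "de"}
        self.suffix_acronyms = {"phd"}
        self.suffix_not_acronyms = {"jr"}
        self._pst = None  # filled on first access, never invalidated

    @property
    def suffixes_prefixes_titles(self):
        if not self._pst:
            # The union builds a new set, so later changes to the
            # source sets are not visible through the cached copy.
            self._pst = (self.prefixes | self.suffix_acronyms
                         | self.suffix_not_acronyms | self.titles)
        return self._pst


def is_prefix(constants, piece):
    # Queries the live set, so it sees later add() calls.
    return piece.lower() in constants.prefixes


def is_rootname(constants, piece):
    # Goes through the cached union, so it can miss later add() calls.
    return piece.lower() not in constants.suffixes_prefixes_titles


c = StaleConstants()
_ = c.suffixes_prefixes_titles  # prime the cache
c.prefixes.add("xpfx")

stale_prefix = is_prefix(c, "xpfx")
stale_rootname = is_rootname(c, "xpfx")
print(stale_prefix, stale_rootname)  # True True: the helpers contradict each other
```

The contradiction only appears when the property is read before the set is modified. If `add()` runs first, the first read builds the union from the updated sets and the two helpers are consistent.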

Fix

Drop _pst entirely and compute the set union fresh on every access. Taking
the union of the four SetManager instances is O(n) in the total number of
entries, with a small constant, and it happens only during name parsing, so
there is no meaningful performance cost.
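
As a rough sketch of the approach, assuming plain Python sets in place of SetManager, a property that recomputes the union on each access could look like the following. `FreshConstants` and its sample entries are hypothetical and only illustrate the idea; the timing at the end uses invented set sizes and is a quick sanity check, not a benchmark of nameparser.

```python
import timeit


class FreshConstants:
    """Hypothetical stand-in showing an uncached union; not nameparser's code."""

    def __init__(self):
        # Invented sample entries, for illustration only.
        self.titles = {"dr", "mr"}
        self.prefixes = {"van", "de"}
        self.suffix_acronyms = {"phd"}
        self.suffix_not_acronyms = {"jr"}

    @property
    def suffixes_prefixes_titles(self):
        # No cache: every access reflects the current state of the four sets.
        return (self.prefixes | self.suffix_acronyms
                | self.suffix_not_acronyms | self.titles)


c = FreshConstants()
before = "xpfx" in c.suffixes_prefixes_titles
c.prefixes.add("xpfx")
after_add = "xpfx" in c.suffixes_prefixes_titles
c.prefixes.remove("xpfx")
after_remove = "xpfx" in c.suffixes_prefixes_titles
print(before, after_add, after_remove)  # False True False

# Rough cost check with an assumed 1000-entry title set.
big = FreshConstants()
big.titles = {f"title{i}" for i in range(1000)}
seconds = timeit.timeit(lambda: big.suffixes_prefixes_titles, number=1000)
print(f"{seconds / 1000 * 1e6:.1f} microseconds per access")
```

The printed timing depends on the machine and on the set sizes, so it is only an indication of the order of magnitude.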

Tests

Five new tests in tests/test_constants.py verify that:

  • suffixes_prefixes_titles reflects titles and prefixes added after
    construction.
  • suffixes_prefixes_titles no longer contains a title that was removed.
  • is_rootname and is_title/is_prefix remain consistent after
    add()/remove() calls.

All 682 tests pass (672 pre-existing + 10 from the dual-run fixture applied
to the 5 new test methods).
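
The three properties listed above could be checked along the following lines. This is a sketch against a hypothetical stand-in class (`LiveConstants`) and a simplified `is_rootname` helper so that it runs without nameparser installed. The test names are invented, and the actual tests in tests/test_constants.py exercise the real `Constants` and `HumanName` and may be structured differently.

```python
import unittest


class LiveConstants:
    """Hypothetical stand-in with an uncached union; not nameparser's code."""

    def __init__(self):
        self.titles = {"dr", "mr"}
        self.prefixes = {"van", "de"}
        self.suffix_acronyms = {"phd"}
        self.suffix_not_acronyms = {"jr"}

    @property
    def suffixes_prefixes_titles(self):
        return (self.prefixes | self.suffix_acronyms
                | self.suffix_not_acronyms | self.titles)


def is_rootname(constants, piece):
    # Simplified: a rootname is anything not in the combined set.
    return piece.lower() not in constants.suffixes_prefixes_titles


class SuffixesPrefixesTitlesTests(unittest.TestCase):
    def setUp(self):
        self.c = LiveConstants()
        # Read the property once, which is what would prime a cache.
        _ = self.c.suffixes_prefixes_titles

    def test_reflects_entries_added_after_construction(self):
        self.c.titles.add("xtitle")
        self.c.prefixes.add("xpfx")
        self.assertIn("xtitle", self.c.suffixes_prefixes_titles)
        self.assertIn("xpfx", self.c.suffixes_prefixes_titles)

    def test_drops_removed_title(self):
        self.c.titles.remove("dr")
        self.assertNotIn("dr", self.c.suffixes_prefixes_titles)

    def test_is_rootname_consistent_with_live_sets(self):
        self.c.prefixes.add("xpfx")
        self.assertIn("xpfx", self.c.prefixes)
        self.assertFalse(is_rootname(self.c, "xpfx"))
        self.c.prefixes.remove("xpfx")
        self.assertTrue(is_rootname(self.c, "xpfx"))


# Run the suite programmatically so the script does not exit on completion.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SuffixesPrefixesTitlesTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Reading the property in `setUp` matters: without that first access, a cached implementation would also pass these checks.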


This pull request was prepared with the assistance of AI, under my direction and review.

The `suffixes_prefixes_titles` property on `Constants` cached its result
in `_pst` after the first access.  Any subsequent `add()` or `remove()`
call on `titles`, `prefixes`, `suffix_acronyms`, or `suffix_not_acronyms`
was silently ignored by the cache, so `is_rootname()` kept returning the
stale result.

Concretely, a word added to `C.titles` after the property was first
accessed would still be treated as a rootname (first/middle/last) by
`join_on_conjunctions`, even though `is_title()` correctly returned
`True` for it.  This contradicts the documented behaviour of per-instance
config customisation described in AGENTS.md.

Fix: drop the `_pst` cache entirely and compute the union fresh on every
access.  The four-set union is cheap and the simplest correct approach.

Add five tests that assert the property and `is_rootname` stay consistent
with the live sets after `add()`/`remove()` calls.