Skip to content

pgn: accept tag pairs with leading whitespace (fixes #1115)#1195

Open
gaoflow wants to merge 2 commits into
niklasf:masterfrom
gaoflow:fix/1115-pgn-whitespace-headers
Open

pgn: accept tag pairs with leading whitespace (fixes #1115)#1195
gaoflow wants to merge 2 commits into
niklasf:masterfrom
gaoflow:fix/1115-pgn-whitespace-headers

Conversation

@gaoflow

@gaoflow gaoflow commented Jun 25, 2026

Copy link
Copy Markdown

Summary

read_game() (and therefore read_headers()) failed to parse game
headers that are preceded by horizontal whitespace, e.g. a PGN string
that begins with " [Event ...]".

The PGN standard (section 2.1) specifies that white space is not
significant, so a tag pair indented with spaces or tabs should be
accepted.

Root cause

Two checks in the header-parsing loop used the raw line string:

  1. if not line.startswith("["): — breaks out of the header loop for any
    line that doesn't begin immediately with [.
  2. TAG_REGEX.match(line) — the regex is anchored at ^ and will not
    match a line with leading whitespace.

Together these caused the parser to treat an indented first tag as the
start of the movetext section, so no headers were ever parsed and the
game was returned with default header values.

Fix

Strip leading whitespace into a local stripped variable before the
startswith guard and the regex match. The original line value is
preserved for the movetext tokeniser that runs after the header loop.

Reproducer (from #1115)

import chess.pgn, io

pgn = io.StringIO(' [Event "Test"]\n[White "Alice"]\n[Black "Bob"]\n[Result "*"]\n\n1. e4 e5 *')
game = chess.pgn.read_game(pgn)
assert game.headers["Event"] == "Test"   # previously returned "?"
assert game.headers["White"] == "Alice"  # previously returned "?"

This pull request was prepared with the assistance of AI, under my direction and review.

The PGN standard specifies that whitespace is not significant.
`read_game()` was checking `line.startswith("[")` and matching
`TAG_REGEX` against the raw line, both of which fail when a tag pair
is preceded by horizontal whitespace (e.g. a file that begins with
" [Event ...]").

Strip leading whitespace before the `startswith` guard and the regex
match so that indented tag pairs are recognised correctly.
@niklasf

niklasf commented Jun 25, 2026

Copy link
Copy Markdown
Owner

Looks good. Can you please add the reproducer as a test?

@gaoflow

gaoflow commented Jun 26, 2026

Copy link
Copy Markdown
Author

Added in b133b5f6.

The new test keeps the reproducer's important shape: the first PGN tag pair has leading whitespace, and the White header contains Patricia 3.1_dev_ca7ef0a3, which previously could leak into movetext parsing. It asserts the headers, first move, and that no parser errors were collected.

Local verification:

python -m unittest test.PgnTestCase.test_read_game_with_leading_whitespace_before_header
python -m unittest test.PgnTestCase
python -m unittest test
git diff --check

The full test run passed with 292 tests, 22 skipped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants