gh-152052: Fix misleading json error for \uXXXX escape at end of input #152053

Merged
StanFromIreland merged 3 commits into python:main from tonghuaroot:fix-json-uXXXX-eof-unterminated
Jun 26, 2026

Conversation

@tonghuaroot tonghuaroot commented Jun 24, 2026

Contributor

The C accelerator of `json` reported `Invalid \uXXXX escape` where the pure-Python decoder reported `Unterminated string starting at`, for a `\uXXXX` escape whose final hex digit is the last character of the input. The two decoders disagreed on the error message, and the C one pointed the user at a non-existent invalid escape rather than the missing closing quote.
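A minimal sketch for reproducing the divergence described above, assuming a CPython build where `json.decoder.c_scanstring` is available (it is `None` when the `_json` accelerator is missing). The helper name `scan_error` and the sample input are illustrative only and are not taken from the PR. The pure-Python result is stable; the C result depends on whether the interpreter includes this fix.

```python
import json.decoder
from json import JSONDecodeError


def scan_error(scanstring, text, start=1):
    """Return (msg, pos) of the JSONDecodeError raised, or None if nothing was raised."""
    try:
        scanstring(text, start)
    except JSONDecodeError as exc:
        return exc.msg, exc.pos
    return None


# Opening quote, one complete \uXXXX escape, and no closing quote.
doc = '"\\u0041'

py_result = scan_error(json.decoder.py_scanstring, doc)
print("pure Python  :", py_result)

# The accelerator may be absent, so guard the call.
if json.decoder.c_scanstring is not None:
    print("C accelerator:", scan_error(json.decoder.c_scanstring, doc))
```

On an interpreter without the fix, the two printed messages are expected to differ; with the fix they should agree.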

Root cause in `Modules/_json.c`, `scanstring_unicode()`: the bounds check used `>=` where `>` is correct. The four hex digits are read at indices `end-4 .. end-1`; when `end == len` those are all in bounds, so the escape is complete and valid and the string is merely unterminated. Changing `>=` to `>` lets control fall through to the existing `Unterminated string starting at` path, matching the pure-Python decoder (the two json error messages were synchronized in bpo-5067).
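The off-by-one can be illustrated with a toy Python model of the bounds check. This is not the C source: the function name `classify_u_escape`, its parameters, and the `"ok"` return value are hypothetical, and only the `end` versus `len` comparison mirrors the description above.

```python
import string


def classify_u_escape(text, first_digit, fixed):
    """Toy model of the check; first_digit is the index of the first hex digit."""
    end = first_digit + 4          # one past the last hex digit
    length = len(text)

    # Old check: end >= length. Fixed check: end > length.
    out_of_bounds = (end > length) if fixed else (end >= length)
    if out_of_bounds:
        return "Invalid \\uXXXX escape"

    # Digits occupy indices end-4 .. end-1, all in bounds at this point.
    if any(ch not in string.hexdigits for ch in text[first_digit:end]):
        return "Invalid \\uXXXX escape"

    # A valid escape that consumes the rest of the input leaves the
    # string without its closing quote.
    if end == length:
        return "Unterminated string starting at"
    return "ok"  # scanning would continue past the escape


doc = '"\\u0041'  # len == 7, hex digits at indices 3..6, so end == len
print(classify_u_escape(doc, 3, fixed=False))
print(classify_u_escape(doc, 3, fixed=True))
```

With `fixed=False` the model rejects the complete escape, and with `fixed=True` it reports the missing closing quote. A truncated escape such as `'"\\u004'` is rejected either way because `end` exceeds `len`.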

Tests cover both decoders. `test_fail.py::test_truncated_input` asserts the exact message and position under `TestCFail` and `TestPyFail`, including a high surrogate at EOF, a surrogate pair at EOF, and a nested case with a non-zero offset. `test_scanstring.py::test_bad_escapes` gains two no-regression cases (a genuinely truncated escape and four non-hex characters at EOF), confirming that the change does not weaken rejection of real invalid escapes.
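The kinds of inputs listed above can be sketched as a small table-driven check. This is not the test code from the PR: the `CASES` table, the sample strings, and the helper `raised_message` are assumptions chosen for illustration, and the check runs only against `json.decoder.py_scanstring` so that it does not depend on whether the C accelerator includes the fix.

```python
from json import JSONDecodeError
from json.decoder import py_scanstring

UNTERMINATED = "Unterminated string starting at"
INVALID = "Invalid \\uXXXX escape"

# (input, index just past the opening quote, expected message)
CASES = [
    ('"\\u0041', 1, UNTERMINATED),         # complete escape at EOF
    ('"\\ud834', 1, UNTERMINATED),         # high surrogate at EOF
    ('"\\ud834\\udd20', 1, UNTERMINATED),  # surrogate pair at EOF
    ('["\\u0041', 2, UNTERMINATED),        # nested, non-zero offset
    ('"\\u004', 1, INVALID),               # genuinely truncated escape
    ('"\\uZZZZ', 1, INVALID),              # four non-hex characters at EOF
]


def raised_message(text, start):
    """Return the JSONDecodeError message, or None if scanning succeeded."""
    try:
        py_scanstring(text, start)
    except JSONDecodeError as exc:
        return exc.msg
    return None


for text, start, expected in CASES:
    assert raised_message(text, start) == expected, text
```

The last two rows correspond to the no-regression cases: a complete escape at EOF should be reported as an unterminated string, while malformed escapes should still be rejected as invalid.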

Fixes #152052

…f input

The C accelerator of json reported "Invalid \uXXXX escape" instead of
"Unterminated string starting at" for a complete, valid \uXXXX escape
whose last hex digit is the final character of the input, diverging from
the pure-Python decoder. The bounds check used `>=` where `>` is correct:
when end == len the four hex digits at indices len-4..len-1 are all in
bounds, so the escape is valid and the string is merely unterminated.
Comment thread Misc/NEWS.d/next/Library/2026-06-24-12-00-00.gh-issue-152052.yBssDE.rst Outdated

@StanFromIreland StanFromIreland left a comment

Member


LGTM

@StanFromIreland StanFromIreland added the needs backport to 3.13 (bugs and security fixes), needs backport to 3.14 (bugs and security fixes), and needs backport to 3.15 (pre-release feature fixes, bugs and security fixes) labels Jun 26, 2026
@StanFromIreland StanFromIreland merged commit 588be7a into python:main Jun 26, 2026
64 of 65 checks passed
@miss-islington-app


Thanks @tonghuaroot for the PR, and @StanFromIreland for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14, 3.15.
🐍🍒⛏🤖

@bedevere-app

bedevere-app Bot commented Jun 26, 2026


GH-152283 is a backport of this pull request to the 3.15 branch.

@bedevere-app bedevere-app Bot removed the needs backport to 3.15 (pre-release feature fixes, bugs and security fixes) label Jun 26, 2026
@bedevere-app

bedevere-app Bot commented Jun 26, 2026


GH-152284 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app Bot removed the needs backport to 3.14 (bugs and security fixes) label Jun 26, 2026
@bedevere-app

bedevere-app Bot commented Jun 26, 2026


GH-152285 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app Bot removed the needs backport to 3.13 (bugs and security fixes) label Jun 26, 2026
StanFromIreland added a commit that referenced this pull request Jun 26, 2026
…the end of input (GH-152053) (#152285)

(cherry picked from commit 588be7a)

Co-authored-by: tonghuaroot (童话) <tonghuaroot@gmail.com>
Co-authored-by: Stan Ulbrych <stan@python.org>
StanFromIreland added a commit that referenced this pull request Jun 26, 2026
…the end of input (GH-152053) (#152283)

(cherry picked from commit 588be7a)

Co-authored-by: tonghuaroot (童话) <tonghuaroot@gmail.com>
Co-authored-by: Stan Ulbrych <stan@python.org>
StanFromIreland added a commit that referenced this pull request Jun 26, 2026
…the end of input (GH-152053) (#152284)

(cherry picked from commit 588be7a)

Co-authored-by: tonghuaroot (童话) <tonghuaroot@gmail.com>
Co-authored-by: Stan Ulbrych <stan@python.org>
Development

Successfully merging this pull request may close these issues.

json: C decoder reports "Invalid \uXXXX escape" instead of "Unterminated string starting at" for a complete \uXXXX escape at end of input

2 participants