gh-152052: Fix misleading json error for \uXXXX escape at end of input #152053
Merged
StanFromIreland merged 3 commits on Jun 26, 2026
…f input The C accelerator of json reported "Invalid \uXXXX escape" instead of "Unterminated string starting at" for a complete, valid \uXXXX escape whose last hex digit is the final character of the input, diverging from the pure-Python decoder. The bounds check used `>=` where `>` is correct: when end == len the four hex digits at indices len-4..len-1 are all in bounds, so the escape is valid and the string is merely unterminated.
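The off-by-one described in this commit message can be modeled in a few lines of Python. This is only a sketch of the index arithmetic, not the C source: the function names and the `first_digit` parameter are hypothetical, and `end` is assumed to be one past the last hex digit, as the message implies.

```python
def old_check_rejects(length: int, first_digit: int) -> bool:
    """Model of the old test: report an invalid escape when end >= length."""
    end = first_digit + 4  # one past the fourth hex digit
    return end >= length


def new_check_rejects(length: int, first_digit: int) -> bool:
    """Model of the fixed test: report an invalid escape only when end > length."""
    end = first_digit + 4
    return end > length


# A complete escape whose last hex digit is the final character of the input.
complete = '"\\u0041'
complete_first_digit = complete.index("u") + 1

# A genuinely truncated escape: only three hex digits are present.
truncated = '"\\u004'
truncated_first_digit = truncated.index("u") + 1

# The four digit indices of the complete escape are all in bounds.
digits_in_bounds = all(
    i < len(complete)
    for i in range(complete_first_digit, complete_first_digit + 4)
)
```

In this model the two checks disagree only for `complete`, where `end == len(complete)`; for `truncated` both still reject the escape.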
Thanks @tonghuaroot for the PR, and @StanFromIreland for merging it 🌮🎉. I'm working now to backport this PR to: 3.13, 3.14, 3.15.
GH-152283 is a backport of this pull request to the 3.15 branch.
GH-152284 is a backport of this pull request to the 3.14 branch.
GH-152285 is a backport of this pull request to the 3.13 branch.
StanFromIreland added 3 commits that referenced this pull request on Jun 26, 2026
The C accelerator of `json` reported `Invalid \uXXXX escape` where the pure-Python decoder reported `Unterminated string starting at`, for a `\uXXXX` escape whose final hex digit is the last character of the input. The two decoders disagreed on the error class, and the C one pointed the user at a non-existent invalid escape rather than the missing closing quote.

Root cause in `Modules/_json.c` `scanstring_unicode()`: the bounds check used `>=` where `>` is correct. The four hex digits are read at indices `end-4 .. end-1`; when `end == len` those are all in bounds, so the escape is complete and valid and the string is merely unterminated. Changing `>=` to `>` lets control fall through to the existing `Unterminated string starting at` path, matching the pure-Python decoder (the two `json` error messages were synchronized in bpo-5067).

Tests cover both decoders: `test_fail.py::test_truncated_input` asserts the exact message/position under `TestCFail` and `TestPyFail` (including a high surrogate and a surrogate pair at EOF, and a nested non-zero offset), and `test_scanstring.py::test_bad_escapes` gains two no-regression cases (a genuinely truncated escape and four non-hex chars at EOF) confirming the change does not weaken rejection of real invalid escapes.

Fixes #152052
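
For reference, the pure-Python behavior that the C accelerator is brought in line with can be observed directly through `json.decoder.py_scanstring`. The snippet below is a minimal sketch, not the test code from this PR: the `scan_error` helper and the sample inputs are illustrative assumptions.

```python
import json.decoder


def scan_error(doc: str):
    """Scan the string that starts after the opening quote at index 0.

    Returns (msg, pos) from the JSONDecodeError raised by the pure-Python
    scanner, or None if the string scans cleanly. Hypothetical helper.
    """
    try:
        json.decoder.py_scanstring(doc, 1)
    except json.JSONDecodeError as err:
        return err.msg, err.pos
    return None


# Complete escape at end of input: valid escape, missing closing quote.
complete_at_eof = scan_error('"\\u0041')

# Genuinely truncated escape: only three hex digits before end of input.
truncated_escape = scan_error('"\\u004')

# Four non-hex characters at end of input: still an invalid escape.
non_hex_escape = scan_error('"\\uzzzz')

print(complete_at_eof)
print(truncated_escape)
print(non_hex_escape)
```

The first case should report the unterminated string at the opening quote, while the other two should keep reporting an invalid escape. Calling `json.loads` on the same inputs exercises the C accelerator where it is available, so on builds without this fix the first case may show the old message there.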