Stop a short-row PAM RAT from crashing open_geotiff#3522
Conversation
read_pam_sidecar() is contracted to never raise on a bad sidecar.
open_geotiff() reads the PAM .aux.xml for any local string source, and
a malformed one should degrade to {} rather than break the read.
_parse_rat() indexes RAT cells by column index (fields[name_col] and
friends). A <Row> with fewer <F> cells than the highest column index
raises IndexError, which the read_pam_sidecar() except clause did not
cover (it caught OSError, ValueError, TypeError). The IndexError escaped
and crashed open_geotiff for any TIFF carrying such a sidecar.
Add IndexError to the caught exceptions so a short-row RAT falls back to
{} like every other malformed case.
Closes #3520
The IndexError fix left a sibling gap: safe_fromstring() raises
xml.etree.ElementTree.ParseError on any non-well-formed sidecar, and
ParseError subclasses SyntaxError, so it was not covered by the
(OSError, ValueError, TypeError, IndexError) tuple. A truncated or
corrupt .aux.xml therefore still escaped read_pam_sidecar and crashed
the open_geotiff call -- the same contract violation, reached through a
different exception and far easier to trigger than a hand-crafted short
RAT row.
Add ParseError to the caught exceptions (it is the same class for both
the defusedxml and stdlib parse paths) so a non-well-formed sidecar
degrades to {} like every other malformed case. Adds
test_non_well_formed_xml_sidecar_returns_empty, which fails with
ParseError before the change.
Surfaced by the follow-up review on this PR.
|
review-pr: one blocker, now fixed. The original IndexError fix is correct but incomplete. read_pam_sidecar still let a parse error escape: safe_fromstring() raises xml.etree.ElementTree.ParseError on any non-well-formed sidecar, and ParseError subclasses SyntaxError, which was not in the caught tuple (OSError, ValueError, TypeError, IndexError). So a truncated or corrupt .aux.xml, which is more common than a crafted short-RAT row, still escaped and crashed open_geotiff: the same contract violation this PR set out to fix, reached through a different exception. defusedxml 0.7.1 raises the same ET.ParseError, so catching it covers both parse paths. Fix: added ParseError to the caught tuple plus a regression test (truncated XML returns empty). TestPamHelpers now 7 passed, flake8 clean. The test fails with ParseError before the fix and passes after. Non-blocking: whole-sidecar discard vs per-row skip is consistent with existing behavior, left as-is. |
read_pam_sidecar()is supposed to never raise on a bad sidecar.open_geotiff()reads the PAM.aux.xmlfor any local string source, and a malformed one should degrade to{}rather than break the read._parse_rat()indexes RAT cells by column index (fields[name_col]and friends). A<Row>with fewer<F>cells than the highest column index raisesIndexError, which theread_pam_sidecar()except clause didn't cover (it caughtOSError,ValueError,TypeError). TheIndexErrorescaped and crashedopen_geotifffor any TIFF carrying such a sidecar.This adds
IndexErrorto the caught exceptions so a short-row RAT falls back to{}like every other malformed case.Test:
test_short_row_thematic_rat_returns_emptybuilds a thematic RAT with the Name column at index 1 and a single-cell row, then assertsread_pam_sidecarreturns{}. It fails withIndexErrorbefore the fix and passes after.Closes #3520.