Skip to content

fix(encoding/json): type-drive Unmarshal into Map/Slice/Pointer containers#145

Merged
paralin merged 3 commits into
s4wave:devfrom
mck:fix/json-unmarshal-typed-containers
Jul 3, 2026
Merged

fix(encoding/json): type-drive Unmarshal into Map/Slice/Pointer containers#145
paralin merged 3 commits into
s4wave:devfrom
mck:fix/json-unmarshal-typed-containers

Conversation

@mck

@mck mck commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Part of #142 (items 3, 4 and 6).

Touches the same file as #144 but a different area (Unmarshal's decode path vs Marshal's encode path). gs/encoding/json/index.ts itself merges cleanly between the two branches; only the shared test file (index.test.ts) has a textual conflict, since both PRs happen to insert new test fixture classes right after the existing FieldAlias class, so whichever merges second needs a trivial resolution there (keep both class blocks). No logic overlap. #146 (item 5, anonymous struct json tags) is stacked directly on this branch, since it extends the same goTypeDescriptor/decodeValueForType this PR introduces, so please merge this one first.

Unmarshal only recognized a struct field/target as needing special handling when its runtime value was already an instance of the right shape (an existing struct instance, a Uint8Array, etc). Everything else fell through to a generic "objects become Maps" decode with no awareness of the field's declared Go element type. Three related gaps, all fixed by the same missing piece: decoding needs to walk the field/element's TypeInfo, not just the JSON shape it happens to see.

Gap 1: map[string]Struct and map[string][]Struct fields (and further nesting) left plain JS objects/arrays in place instead of real struct instances, so generated struct-field accessors read back undefined.

type Ref struct{ Name string `json:"name"` }
type Container struct {
    Contexts map[string][]Ref `json:"contexts"`
}
json.Unmarshal([]byte(`{"contexts":{"a":[{"name":"x"}]}}`), &c)
// c.Contexts["a"][0].Name read back "": plain object, not a Ref

Gap 2: an interface{}-typed array of objects left its element objects as raw plain JS objects instead of this override's Map representation, so a case map[string]any: type switch on an element panicked calling Map-only methods like .entries().

var v any
json.Unmarshal([]byte(`[{"x":1},{"y":2}]`), &v)
// v.([]any)[0] was a plain object, not the Map every other
// interface{}-typed object value decodes to

Gap 3: json.RawMessage wasn't recognized as a map/slice element type (only as a direct field type), and there was no Pointer branch in the type-driven path at all, so pointer elements and fields (map[string]*Property, Items *ItemsSpec) leaked plain objects with undefined Go field reads.

The fix is a recursive decodeValueForType(decoded, typeInfo, opts) that decodes a raw JSON value per its Go TypeInfo: Struct constructs a real instance via the registered ctor, Slice/Map recurse per elemType, Pointer decodes as its pointee, json.RawMessage captures the raw fragment as bytes, and anything else (including interface{}) falls back to a new decodeInterfaceValue, which replaces objectToMap and additionally recurses into array elements, fixing gap 2 for both nested and top-level arrays. It is wired in at both call sites: assignDecodedFieldValue (struct fields) and assignDecodedValue's new top-level branch, which resolves a var's own type via the __goType spelling the runtime boxes onto Unmarshal's any target, needed for e.g. var m map[string]json.RawMessage where there is no struct field to hang type info off of.

A pointer-to-struct pointee is constructed directly in the Pointer branch (not by recursing into the Struct branch, which always marks its result via $.markAsStructValue): the runtime's pointer type matching (matchesPointerType in gs/builtin/type.ts) requires struct pointees to stay unmarked, so both assignDecodedFieldValue's field routing and decodeValueForType's own Pointer branch share this construction path. goTypeDescriptor also preserves a nested pointer element type (e.g. the *test.Property in map[string]*test.Property) instead of stripping every leading * while parsing; only the outermost address-of star (the Unmarshal target's own &var) is stripped, by unmarshalTargetGoType, before this function ever runs.

Tests extend gs/encoding/json/index.test.ts with one case per gap: map[string][]Struct field, interface{} array-of-objects, pointer-to-struct map value and field, RawMessage as both a map and slice element, and two cases verifying decoded pointer values match *Struct (not Struct by value) via the public $.is() API. All new tests fail on master (or the prior commit, for the pointer-marking cases) and pass with this fix, verified by temporarily reverting just the source change and confirming they fail with the errors described above.

Verify:

bun run vitest run gs/encoding/json/index.test.ts

24 passed. Full suite: bun run typecheck clean, bun run test:js 1277 passed / 4 skipped / 0 failed, go test ./... all packages green, bun run lint:js clean.

Found while transpiling and running a real ~100k-LOC Go interpreter/parser package through goscript.

…iners

Unmarshal only special-cased struct fields/targets whose runtime value was
already an instance of the right shape; anything else fell through to a
generic "objects become Maps" decode with no awareness of Go element
types. Three related gaps, all traced to the same missing piece — the
decoder needs to walk a field/element's declared TypeInfo, not just the
JSON shape it sees:

1. map[string]Struct / map[string][]Struct fields (and further nesting,
   e.g. map[string][]Struct) left plain JS objects/arrays in place instead
   of real struct instances, so generated struct-field accessors read back
   undefined:

       type Ref struct{ Name string `json:"name"` }
       type Container struct {
           Contexts map[string][]Ref `json:"contexts"`
       }
       json.Unmarshal([]byte(`{"contexts":{"a":[{"name":"x"}]}}`), &c)
       // c.Contexts["a"][0].Name read back "" (plain object, not a Ref)

2. An interface{}-typed array of objects left its element objects as raw
   plain JS objects instead of this override's Map representation, so a
   `case map[string]any:` type switch on an element panicked calling
   Map-only methods like .entries():

       var v any
       json.Unmarshal([]byte(`[{"x":1},{"y":2}]`), &v)
       // v.([]any)[0] was a plain object, not the Map every other
       // interface{}-typed object value decodes to

3. json.RawMessage wasn't recognized as a map/slice ELEMENT type (only as
   a direct field type), and there was no Pointer branch in the
   type-driven path at all, so pointer elements/fields
   (map[string]*Property, Items *ItemsSpec) leaked plain objects with
   undefined Go field reads.

Fix: a recursive decodeValueForType(decoded, typeInfo, opts) that decodes
a raw JSON value per its Go TypeInfo — Struct constructs a real instance
via the registered ctor, Slice/Map recurse per elemType, Pointer decodes
as its pointee, RawMessage captures the raw fragment as bytes, and
anything else (including interface{}) falls back to a new
decodeInterfaceValue that replaces objectToMap and additionally recurses
into array elements (fixing case 2 for nested and top-level arrays alike).
Wired in at both call sites: assignDecodedFieldValue (struct fields) and
assignDecodedValue's new top-level branch, which resolves a var's own
type via the __goType spelling the runtime boxes onto Unmarshal's `any`
target — needed for e.g. `var m map[string]json.RawMessage` where there
is no struct field to hang type info off of.

Extends gs/encoding/json/index.test.ts with one case per bug (map[string]
[]Struct field, interface{} array-of-objects, pointer-to-struct map value
+ field, and RawMessage as both a map and slice element). All four fail
on master and pass with this fix; verified by temporarily reverting just
the source change and confirming the new tests fail with the same errors
described above.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Marco Christian Krenn <marco.krenn@gmail.com>
Copilot AI review requested due to automatic review settings July 3, 2026 05:02

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the gs/encoding/json override’s Unmarshal decode path by decoding JSON values according to Go TypeInfo (including nested map/slice element types), instead of relying purely on the runtime JSON shape. It targets correctness for nested map[string]T / []T containers, pointer-to-struct elements/fields, interface{} arrays-of-objects, and json.RawMessage as an element type.

Changes:

  • Add a recursive, type-driven decoder (decodeValueForType) and use it for struct fields and top-level map/slice targets when Go type spellings are available via __goType.
  • Replace the previous object-only recursion with decodeInterfaceValue, which also recurses into arrays so objects inside []any decode into Maps consistently.
  • Extend index.test.ts with new fixtures and test cases covering the reported nested map/slice, interface-array, pointer, and RawMessage scenarios.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
gs/encoding/json/index.ts Adds type-driven decoding helpers and wires them into Unmarshal’s assignment paths.
gs/encoding/json/index.test.ts Adds regression tests for nested typed decode (map/slice), interface arrays, pointers, and RawMessage elements.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread gs/encoding/json/index.ts
Comment on lines +1113 to +1118
if ($.isPointerTypeInfo(info) && isPlainObject(decoded)) {
const pointee = resolveTypeInfo(info.elemType)
if (pointee !== null && $.isStructTypeInfo(pointee)) {
target.value = decodeValueForType(decoded, info.elemType, opts)
return
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, this was a real bug. Fixed in 1cc7a96: the field-routing path now passes the Pointer TypeInfo itself to decodeValueForType (not elemType), so it goes through the Pointer branch's now-correct unmarked construction instead of bypassing it.

Comment thread gs/encoding/json/index.ts
Comment on lines +1176 to +1178
if ($.isPointerTypeInfo(info)) {
return decodeValueForType(decoded, info.elemType, opts)
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1cc7a96: the Pointer branch now constructs struct pointees directly instead of recursing into the Struct branch, so the result stays unmarked. Added a test verifying decoded pointer values match *Struct (not Struct) via $.is().

Comment thread gs/encoding/json/index.ts Outdated
Comment on lines +1261 to +1264
let s = spelling.trim()
while (s.startsWith('*')) {
s = s.slice(1).trim()
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed in 1cc7a96: goTypeDescriptor now returns a Pointer descriptor for a leading '*' instead of stripping it, so nested pointer element types (e.g. the *test.Property in map[string]*test.Property) are preserved. Added a regression test for this exact spelling.

Addresses review feedback on s4wave#145: pointer-to-struct fields/elements
decoded to instances marked via markAsStructValue, but the runtime's
pointer type matching (matchesPointerType in gs/builtin/type.ts) requires
a struct pointee to stay UNMARKED, since marking is reserved for genuine
non-pointer struct values. A decoded *Struct value therefore matched a
Struct-by-value type assertion and failed a *Struct assertion, the
inverse of correct Go semantics, which would break type switches and
pointer-receiver method-set checks on JSON-decoded values.

Three related spots, all part of the same gap:

- decodeValueForType's Pointer branch recursed into its elemType and let
  the generic Struct branch mark the result. Now it constructs the
  pointee directly (mirroring the Struct branch's construction, minus the
  mark) for struct pointees, and only recurses for non-struct pointees
  (which carry no marking).
- assignDecodedFieldValue's pointer-to-struct field routing called
  decodeValueForType with the field's elemType, bypassing the Pointer
  branch (and its now-correct handling) entirely. Now it passes the
  Pointer TypeInfo itself, so both call sites share one code path.
- goTypeDescriptor unconditionally stripped every leading '*' while
  recursively parsing a __goType spelling, so a nested pointer element
  type (e.g. the *test.Property in map[string]*test.Property) lost its
  pointer-ness before a TypeInfo was ever built for it. Now a leading '*'
  produces a Pointer descriptor instead of being discarded; only the
  outermost address-of star (the Unmarshal target's own &var) is
  stripped, by unmarshalTargetGoType, before this function ever runs.

Extends gs/encoding/json/index.test.ts with two cases verifying the
decoded value matches *Struct and not Struct via the public $.is() API
(one for inline struct-field TypeInfo, one for a __goType spelling with a
nested pointer element). Both fail on the prior commit and pass with this
fix, verified by temporarily reverting just the source change.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Marco Christian Krenn <marco.krenn@gmail.com>
paralin
paralin previously requested changes Jul 3, 2026

@paralin paralin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes: the new pointer-to-struct decode path bypasses UnmarshalJSON for pointer fields.

At gs/encoding/json/index.ts:1116, assignDecodedFieldValue returns through the new pointer-to-struct branch before it can call assignDecodedValue. That skips the existing unmarshalJSONTarget(target.value) hook for a non-nil *T field whose pointee implements UnmarshalJSON. The result is field population into a fresh T instead of invoking the custom decoder that Go’s encoding/json would prioritize.

Focused repro on this head:

expected hook:{"value":"decoded"}
received decoded

Fix direction: check unmarshalJSONTarget(target.value) before the pointer-to-struct early return. For nil pointer fields where the pointee type implements UnmarshalJSON, allocate the pointee and invoke the hook on that new instance instead of directly field-populating it.

paralin's review on s4wave#145 flagged that the pointer-to-struct decode path
added in this branch bypasses UnmarshalJSON: assignDecodedFieldValue
routed straight into decodeValueForType and returned, so a *T field
whose T implements UnmarshalJSON got field-by-field populated into a
fresh instance instead of decoded through the custom hook Go's
encoding/json would always prefer.

Two spots needed the hook check, matching the fix direction from the
review:

- assignDecodedFieldValue now checks unmarshalJSONTarget(target.value)
  before routing into the type-driven constructor. A field that already
  holds a non-nil pointer whose pointee implements UnmarshalJSON falls
  through to assignDecodedValue, which invokes the hook on that existing
  instance in place (matching Go, which decodes into the pointee rather
  than allocating a replacement).
- decodeValueForType's Pointer branch now checks the freshly allocated
  pointee for UnmarshalJSON before falling back to assignStructFields.
  This covers the nil-field case (nothing exists yet to check the hook
  against until after allocation) and also map/slice pointer-to-struct
  elements, which go through this same branch.

Adds two regression tests to gs/encoding/json/index.test.ts: a nil
pointer field decoding through the hook, and a non-nil pointer field
decoding into the existing pointee in place (checked via object
identity, not just field content). Both fail on the prior commit and
pass with this fix, verified by temporarily reverting just the source
change.

Verify: bunx vitest run gs/encoding/json/index.test.ts
Signed-off-by: Marco Christian Krenn <marco.krenn@gmail.com>
@mck

mck commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for catching that. Fixed in c18d36a.

The pointer-to-struct branch had two spots that needed the hook check, matching your fix direction. assignDecodedFieldValue now checks unmarshalJSONTarget(target.value) before routing into the type-driven constructor: a field that already holds a non-nil pointer whose pointee implements UnmarshalJSON falls through to assignDecodedValue, which invokes the hook on that existing instance in place, the same way Go decodes into the pointee rather than allocating a replacement. decodeValueForType's Pointer branch also checks the freshly allocated pointee for UnmarshalJSON before falling back to assignStructFields, which covers the nil-field case (nothing exists to check the hook against until after allocation) and map/slice pointer-to-struct elements, which share this same branch.

Added two regression tests: a nil pointer field decoding through the hook, and a non-nil pointer field decoding into the existing pointee in place, checked via object identity so a fix that just re-hooks a fresh replacement instance would not pass it. Both fail on the parent commit and pass here, verified by temporarily reverting just the source change.

Verify: bunx vitest run gs/encoding/json/index.test.ts

mck added a commit to mck/goscript that referenced this pull request Jul 3, 2026
…gets

Part of s4wave#142 (item 5). Stacked on s4wave#145 (fix/json-unmarshal-typed-containers):
both need the __goType-spelling parser (goTypeDescriptor) and the
type-driven decoder (decodeValueForType) that PR introduces, so this PR
extends them rather than duplicating them. Please merge s4wave#145 first.

Unmarshal only knew how to populate a target carrying struct-field
runtime metadata (a registered, named struct type). An anonymous (inline,
unnamed) struct target carries no such metadata:

    var loose struct {
        Name string `json:"name"`
        Age  int    `json:"age"`
    }
    json.Unmarshal([]byte(`{"name":"Ada","age":30}`), &loose)
    // loose.Name and loose.Age read back "" and 0 -- every field empty

so it fell through to the untyped interface{} decode, which replaces the
plain-object struct value with this override's Map representation
entirely, dropping every field.

Fix: an anonymous struct target still carries its Go type spelling
(including field names, types, and json tags) via __goType, e.g.
`*struct{Name string "json:\"name\""; Age int "json:\"age\""}`. Parses it
into the same fieldMetadata shape a registered struct's fields use
(parseAnonymousStructFields, splitting on the tag's Go-quoted string
literal so a ';' inside a tag can't split a field early), then decodes
each field through the existing per-field path
(assignDecodedFieldValue/assignAnonymousStructFields) -- tags, nested
slices/maps, and RawMessage fields included. Wired into both
assignDecodedValue (a top-level anonymous-struct var) and
decodeValueForType (an anonymous struct used as a map/slice element
type, e.g. `map[string]struct{...}`).

Extends gs/encoding/json/index.test.ts with two cases: a top-level
anonymous struct target, and one used as a map value type. Both fail on
the parent commit (empty fields / a Map instead of the plain object) and
pass with this fix.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Marco Christian Krenn <marco.krenn@gmail.com>
@paralin paralin dismissed their stale review July 3, 2026 20:27

RE review

@paralin paralin self-assigned this Jul 3, 2026
@paralin paralin self-requested a review July 3, 2026 20:28

@paralin paralin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No findings on the current head.

I re-checked the prior pointer-to-struct UnmarshalJSON issue. The current code checks the existing field value's hook before routing into the type-driven pointer constructor, and the Pointer branch checks the freshly allocated pointee before default struct population. That covers both non-nil and nil pointer field cases from the prior review.

Local focused checks on current head:

bun run vitest run gs/encoding/json/index.test.ts
# 26 passed

bun run typecheck
# passed

CI metadata also shows CodeQL, DCO, dependency review, and Tests green; browser-smoke is skipped.

@paralin paralin changed the base branch from master to dev July 3, 2026 22:34
@paralin paralin merged commit 8907b2f into s4wave:dev Jul 3, 2026
6 checks passed
@paralin

paralin commented Jul 4, 2026

Copy link
Copy Markdown
Member

Merged through to master in c3933e59b4db361036c050c229e4d7580bb26204.

The final dev/master commit includes the reviewed PR changes plus the follow-up review fixes:

  • omitempty now decides pointer/interface emptiness from the wrapper itself before any VarRef unwrap, so non-nil pointers/interfaces are emitted even when their pointed or dynamic value is zero/nil.
  • Anonymous struct decode now skips unexported fields even when they carry json tags, and DisallowUnknownFields rejects keys that only match those skipped fields.

Verified locally after applying the changes:

bun run vitest run gs/encoding/json/index.test.ts
# 41 passed

bun run typecheck
# passed

bun run test
# go test ./... passed
# vitest: 156 files, 1322 passed, 4 skipped

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants