Skip to content

fix: infer surviving dims in to_dataset for aggregations (closes #189)#218

Merged
alxmrs merged 4 commits into
xqlsystems:mainfrom
ghostiee-11:fix/189-infer-dims-groupby
Jul 4, 2026
Merged

fix: infer surviving dims in to_dataset for aggregations (closes #189)#218
alxmrs merged 4 commits into
xqlsystems:mainfrom
ghostiee-11:fix/189-infer-dims-groupby

Conversation

@ghostiee-11

Copy link
Copy Markdown
Contributor

to_dataset() inferred dims only when a registered Dataset's entire dim set was present in the result, so a GROUP BY that aggregates dims away (the reporter's SELECT "time", AVG("air") ... GROUP BY "time") raised dims cannot be inferred instead of using the surviving group-by dim. Inference now uses the registered dims that survive into the result columns (in data-variable axis order), so such aggregations round-trip on the remaining dim(s). Closes #189.

Also: when several registered Datasets imply the same surviving dims, inference returns them rather than raising "ambiguous" (only genuinely differing dims still raise).

Before / after

Before (main), .to_dataset() on the group-by result:

ValueError: dims cannot be inferred: no registered Dataset has all of
its dims present in the result columns. Pass dims=[...] explicitly.

After (this PR):

<xarray.Dataset>
Dimensions:  (time: 4)
Coordinates:
  * time     (time) int64 0 1 2 3
Data variables:
    air      (time) float64 ...

Full non-integration test suite passes locally.

…ystems#189)

to_dataset() inferred dims only when a registered Dataset's entire dim set
appeared in the result, so a GROUP BY that aggregates dims away (e.g.
SELECT "time", AVG("air") ... GROUP BY "time" over a time/lat/lon grid)
raised "dims cannot be inferred" instead of using the surviving group-by dim.

_infer_dimension_columns now uses the registered dims that survive into the
result columns, in the data-variable axis order, so such aggregations
round-trip on the remaining dim(s). Several registered Datasets that imply
different surviving dims still raise; a global aggregation with no surviving
dim raises a clearer message.

Repurposes the now-inferable test to a global-aggregation case and adds an
auto-inference regression test.

@alxmrs alxmrs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes, overall it's looking great. Happy to have this feature.

Comment thread tests/test_ds.py Outdated


def test_aggregation_infers_dims(air_dataset_small):
"""#189: to_dataset() infers the surviving GROUP BY dims without dims=."""

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a Claude.md or Agents.md file to the project in this PR, and add a summary of common points of feedback I've given to you in our reviews.

For example, we shouldn't include GH issue numbers in docstrings (documentation). These should be self contained and not rely on access to the issue tracker.

Comment thread tests/test_ds.py
Comment thread tests/test_ds.py Outdated
Comment thread tests/test_ds.py Outdated
Comment thread tests/test_ds.py
Drop now-redundant explicit dims= from single-registration aggregation tests
(inference resolves them), make the inference test self-contained and
deterministic via ORDER BY, and remove issue-number and conversation
references from test docstrings and comments.

Add AGENTS.md summarizing the maintainer's review conventions: self-contained
docs, private helpers, public-contract tests, ORDER BY in tests, and
conventional commits.

@alxmrs alxmrs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideas about the agent file. Otherwise, this LGTM! Thanks for the fix!

Comment thread AGENTS.md Outdated
Comment thread AGENTS.md Outdated
Comment thread AGENTS.md Outdated
@alxmrs

alxmrs commented Jul 4, 2026

Copy link
Copy Markdown
Collaborator

One more omission in my review! Let's update the README and other docs given this change.

Document that to_dataset() infers dimensions from the registered table's
surviving dims (README and examples): a GROUP BY on a real dimension needs no
dims=, while a derived column like month is still named explicitly. Drop the
now-redundant dims= from the era5 example.

Refine AGENTS.md per maintainer feedback: drop the too-specific entry-point
rule, split imports into its own section noting transitive dependencies are
safe to import non-locally, and remove the non-essential commit-prefix rule.

@alxmrs alxmrs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick note

Comment thread README.md Outdated
Shorten the climatology comment in the README quickstart; the fuller
explanation of dimension inference lives in docs/examples.md.

@alxmrs alxmrs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! You may merge when ready.

@ghostiee-11

Copy link
Copy Markdown
Contributor Author

Yupp.. I also feel this is good now.. and mergeable. But @alxmrs, I dont have merge access.

@alxmrs alxmrs merged commit 9609a7d into xqlsystems:main Jul 4, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Can dims be auto-inferred during to_dataset?

2 participants