fix: infer surviving dims in to_dataset for aggregations (closes #189)#218
Merged
Merged
Conversation
…ystems#189) to_dataset() inferred dims only when a registered Dataset's entire dim set appeared in the result, so a GROUP BY that aggregates dims away (e.g. SELECT "time", AVG("air") ... GROUP BY "time" over a time/lat/lon grid) raised "dims cannot be inferred" instead of using the surviving group-by dim. _infer_dimension_columns now uses the registered dims that survive into the result columns, in the data-variable axis order, so such aggregations round-trip on the remaining dim(s). Several registered Datasets that imply different surviving dims still raise; a global aggregation with no surviving dim raises a clearer message. Repurposes the now-inferable test to a global-aggregation case and adds an auto-inference regression test.
alxmrs
reviewed
Jul 2, 2026
alxmrs
left a comment
Collaborator
There was a problem hiding this comment.
Some notes, overall it's looking great. Happy to have this feature.
|
|
||
|
|
||
| def test_aggregation_infers_dims(air_dataset_small): | ||
| """#189: to_dataset() infers the surviving GROUP BY dims without dims=.""" |
Collaborator
There was a problem hiding this comment.
Please add a Claude.md or Agents.md file to the project in this PR, and add a summary of common points of feedback I've given to you in our reviews.
For example, we shouldn't include GH issue numbers in docstrings (documentation). These should be self contained and not rely on access to the issue tracker.
Drop now-redundant explicit dims= from single-registration aggregation tests (inference resolves them), make the inference test self-contained and deterministic via ORDER BY, and remove issue-number and conversation references from test docstrings and comments. Add AGENTS.md summarizing the maintainer's review conventions: self-contained docs, private helpers, public-contract tests, ORDER BY in tests, and conventional commits.
Collaborator
|
One more omission in my review! Let's update the README and other docs given this change. |
Document that to_dataset() infers dimensions from the registered table's surviving dims (README and examples): a GROUP BY on a real dimension needs no dims=, while a derived column like month is still named explicitly. Drop the now-redundant dims= from the era5 example. Refine AGENTS.md per maintainer feedback: drop the too-specific entry-point rule, split imports into its own section noting transitive dependencies are safe to import non-locally, and remove the non-essential commit-prefix rule.
Shorten the climatology comment in the README quickstart; the fuller explanation of dimension inference lives in docs/examples.md.
alxmrs
approved these changes
Jul 4, 2026
alxmrs
left a comment
Collaborator
There was a problem hiding this comment.
LGTM! You may merge when ready.
Contributor
Author
|
Yupp.. I also feel this is good now.. and mergeable. But @alxmrs, I dont have merge access. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
to_dataset()inferred dims only when a registered Dataset's entire dim set was present in the result, so aGROUP BYthat aggregates dims away (the reporter'sSELECT "time", AVG("air") ... GROUP BY "time") raiseddims cannot be inferredinstead of using the surviving group-by dim. Inference now uses the registered dims that survive into the result columns (in data-variable axis order), so such aggregations round-trip on the remaining dim(s). Closes #189.Also: when several registered Datasets imply the same surviving dims, inference returns them rather than raising "ambiguous" (only genuinely differing dims still raise).
Before / after
Before (main),
.to_dataset()on the group-by result:After (this PR):
Full non-integration test suite passes locally.