pybullet_domino open-loop oracle (5/5) + CI greening #39
Open
yichao-liang wants to merge 257 commits into
Extract repeated wait-termination check into _check_wait_termination helper and unify the three _terminal branches into a single definition with config checks inside the function body.
- Remove dead/commented-out code and stale self-question comments
- Add _VIRTUAL_OBJECT_TYPES constant to replace hardcoded type-name skip lists in _set_state and _get_state
- Move env-specific _get_robot_state_dict branches to subclass overrides in pybullet_cover and pybullet_blocks
- Extract _get_camera_matrices helper to deduplicate render methods
- Extract _get_object_state_dict from _get_state for per-object logic
- Move create_pybullet_block/sphere to pybullet_helpers/objects.py
- Merge _create_task_specific_objects into _set_domain_specific_state
- Rename: _reset_state -> _set_state, _reset_custom_env_state -> _set_domain_specific_state, _extract_feature -> _get_domain_specific_feature
- Add docstrings explaining where each method is called from
Reorganize methods into labeled sections (Setup, Public API, Core Loop, State Write/Read, Grasp Management, Action Helpers, Rendering, Utilities) so related functions are adjacent. Update module docstring to document the main public API and state synchronization methods.
Add _step_base() and _domain_specific_step() to PyBulletEnv base class. step() now calls _step_base (robot control, physics, grasp) then _domain_specific_step (water filling, heating, etc.), gated by _skip_domain_specific_dynamics flag for kinematics-only mode. Migrate all 15 domain envs to override _domain_specific_step() instead of step(). Envs with pre-step logic (coffee, switch, blocks, cover) still override step() for the pre-step part only.
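The split described above follows a template-method shape, which can be sketched roughly as follows. This is an illustration only, not the repository's class: `SketchEnv`, `SketchKettleEnv`, the dict-based state, and the constructor argument are hypothetical stand-ins. Only the names `step`, `_step_base`, `_domain_specific_step`, and `_skip_domain_specific_dynamics` come from the commit message.

```python
from typing import Dict


class SketchEnv:
    """Hypothetical stand-in for a PyBulletEnv-style base class."""

    def __init__(self, skip_domain_specific_dynamics: bool = False) -> None:
        self._skip_domain_specific_dynamics = skip_domain_specific_dynamics
        self._state: Dict[str, float] = {"robot_x": 0.0, "water": 0.0}

    def step(self, dx: float) -> Dict[str, float]:
        # Shared part: robot control, physics, grasp handling.
        self._step_base(dx)
        # Env-specific part, skipped in kinematics-only mode.
        if not self._skip_domain_specific_dynamics:
            self._domain_specific_step()
        return dict(self._state)

    def _step_base(self, dx: float) -> None:
        self._state["robot_x"] += dx

    def _domain_specific_step(self) -> None:
        """Optional override; the default does nothing."""


class SketchKettleEnv(SketchEnv):
    """Hypothetical domain env that only overrides the domain hook."""

    def _domain_specific_step(self) -> None:
        self._state["water"] += 0.1
```

With this shape a subclass never needs to re-implement `step()` unless it has pre-step logic of its own, and a kinematics-only instance simply never reaches the domain hook.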
Document the step_base → domain_specific_step → get_observation flow, _skip_domain_specific_dynamics flag, and _domain_specific_step as an optional override.
Replace direct access to private _skip_domain_specific_dynamics attribute with a public constructor parameter, so callers declare kinematics-only mode at creation time instead of mutating internal state after construction.
…ging
Both AgentSessionMixin and AgentExplorer had near-identical wrappers that ran session.query() synchronously via nest_asyncio or asyncio.run. Move that logic into a module-level run_query_sync helper in session_manager and have both callers delegate to it.
…y and maintainability
Distinguishes the grounded-plan explorer from upcoming bilevel variants. AgentExplorer -> AgentPlanExplorer, get_name() 'agent' -> 'agent_plan', file moved to agent_plan_explorer.py, and all callers / docstrings / YAML config examples updated accordingly.
The mixin is pure agent-session plumbing (session creation, lifecycle, explorer factory) and has no approach-specific logic, so it belongs next to session_manager.py, tools.py, and the sandbox managers rather than in approaches/.
The explorer asks a Claude agent for a plan sketch, refines it against the approach's current (possibly learned) option model, and rolls the refined plan out in the real env. When the mental model disagrees with reality (e.g. the sketch expects JugFilled after a Wait but the mental model's process dynamics can't produce it), the explorer truncates the plan at the deepest unsatisfiable subgoal (inclusive) so the real-env rollout ends exactly where the disagreement occurs, maximising signal per experiment.
Key pieces:
- predicators/agent_sdk/bilevel_sketch.py: extracted the sketch build / parse / refine helpers from AgentBilevelApproach as module-level functions so both the approach (solve path) and the new explorer (exploration path) can share them. refine_sketch gains truncate_on_subgoal_fail: the on_step_fail callback snapshots the deepest subgoal failure seen during backtracking, and on exhaustion the captured prefix is returned as the experiment plan.
- predicators/explorers/agent_bilevel_explorer.py: new explorer. Reads option_model from tool_context (synced by the approach), builds the sketch prompt via bilevel_sketch, runs refine_sketch with check_subgoals=True, check_final_goal=False, truncate_on_subgoal_fail=True, wraps the result in an option_plan_to_policy that converts OptionExecutionFailure into RequestActPolicyFailure so the episode cleanly terminates at the point of real-env divergence. Stashes the sketch subgoals/options on ToolContext for downstream diffing by the learning approach.
- predicators/approaches/agent_bilevel_approach.py: shim methods over bilevel_sketch; behaviour unchanged.
- predicators/approaches/agent_planner_approach.py: _create_explorer dispatches both "agent_plan" and "agent_bilevel" through the agent factory path and forwards CFG.explorer as the name.
- predicators/explorers/__init__.py: factory branch merged for the two agent-session-backed explorers.
- predicators/agent_sdk/tools.py: ToolContext gains last_sketch_subgoals / last_sketch_options fields, populated by the explorer and marked TODO for the learning approach to consume.
- tests/explorers/test_agent_bilevel_explorer.py: happy-path, fallback, wait-memory-injection, and deepest-subgoal-failure truncation tests.
- New setting agent_bilevel_explorer_max_samples_per_step (default 50), separate from the solve-path budget, so the explorer's backtracking cost is independently tunable.
- Log the actual experiment plan (option names, objects, params) after refinement so the explorer's output is visible alongside the existing sketch/truncation log lines.
- Test config updated to set both budgets explicitly.
AgentSimLearningApproach extends AgentBilevelApproach to learn process dynamics online. Each cycle: the agent synthesizes parameterized process rules via Claude (using run_python / evaluate_simulator / test_simulator MCP tools), parameters are fitted via emcee MCMC, and the learned dynamics are composed with a kinematics-only PyBullet oracle into a combined option model for plan refinement.
Key pieces:
- predicators/approaches/agent_sim_learning_approach.py: the approach. Initialises with a kinematics-only option model (so AgentBilevelExplorer sees disagreements at process-dynamic subgoals like JugFilled/Boiled), and replaces it with the kin+learned model after each successful synthesis cycle.
- predicators/agent_sdk/tools.py: create_synthesis_tools() builds the three MCP tools the synthesis agent uses; extra_mcp_tools field and get_allowed_tool_list(extra_names=) plumbing lets the approach inject them into the session.
- predicators/code_sim_learning/: ParamSpec, fit_params (emcee MCMC), compute_mse, LearnedSimulator.
- predicators/ground_truth_models/boil/gt_simulator.py: ground-truth process-dynamics simulator for the boil environment.
- tests/: approach and param-fitting tests.
- agents.yaml: comment out agent_bilevel preset, add agent_sim_learning with explorer=agent_bilevel and skip_test_until_last_ite_or_early_stopping.
- common.yaml: disable failure/test video recording, set num_online_learning_cycles=1 for faster iteration.
Simulation primitives (code_sim_learning/utils.py):
- apply_rules(state, rules, params) → ProcessUpdate
- merge_updates(base_state, updates, process_features) → State
- simulate_step(state, action, base_env, rules, params, features) → State
These replace _build_fitted_step_fn, merge_process_updates, _sim_fn_from_rules, and the body of _build_combined_simulator.
GT simulator factory (ground_truth_models):
- GroundTruthSimulatorFactory ABC + get_gt_simulator(env_name) discovery, following the existing get_gt_options / get_gt_nsrts pattern.
- PyBulletBoilGroundTruthSimulatorFactory registered in boil/.
- Replaces the hardcoded _load_oracle_simulator in the approach.
Oracle ablation flags (settings.py):
- agent_sim_learn_oracle_sim_program: load GT rules, skip synthesis.
- agent_sim_learn_oracle_sim_params: use GT param values, skip MCMC.
Also: kin_env → base_env rename throughout, redundant self._types assignment removed, process_features computed once in __init__.
- yapf + isort autoformatting applied to all touched files.
- pylint: fix logging-not-lazy in agent_bilevel_explorer, add broad-except and reimported disables in agent_sim_learning_approach.
- mypy: fix base/env variable name collision, add type: ignore on lambda inference, add return type annotations to GT factory methods.
Use utils.abstract to evaluate expected atoms in low-level search so that DerivedPredicates — which require a Set[GroundAtom] rather than a State — are handled correctly alongside regular predicates.
When sequential simulate calls differ only in process features (as in the combined kinematic+learned simulator), reapplying joint positions and tearing down/recreating grasp constraints causes visible arm jitter. Compare robot poses first and skip the kinematic reset path when they already match.
Factor simulator synthesis into a shared _learn_simulator helper so that both learn_from_offline_dataset and learn_from_interaction_results can trigger it on their respective trajectory sources. Also create a separate headless env for parameter fitting so MCMC's thousands of _set_state calls don't thrash the GUI env during training.
Replace the silent run_mcmc call with a manual sample loop that logs step count and best log-probability roughly five times per run, and flushes handlers so the updates appear promptly under buffered logging.
Type-annotate **kwargs on PyBullet env __init__ overrides so mypy doesn't flag them. Initialize attrs used by _domain_specific_step in __init__ (pybullet_coffee, pybullet_switch) to silence defined-outside-init. Type-ignore the emcee import. Fix encoding, unused, protected-access, and redefined-outer-name warnings in the sim-learning tests and agent-SDK tooling.
When a held object's grasp constraint is recreated via _set_state, the gripper frame must match the original world pose exactly — otherwise the recorded base_link->object offset is rotated and the object lands at the wrong world position when the gripper next moves. The State representation only carries (x, y, z, tilt, wrist), so IK during reset can pick a different wrist-roll solution and corrupt the constraint. Thread joint_positions from PyBulletState.simulator_state through reset_state so we skip IK and restore the exact arm configuration. Falls back to IK when joints aren't available (plain State). Also wire wait-termination so refinement and execution can stop Wait when expected atoms hold instead of running to max_num_steps_option_rollout: set _abstract_function on the option model in BilevelPlanningApproach (mirrors AgentPlannerApproach), pass abstract_function into option_plan_to_policy in BilevelProcessPlanningApproach, and inject wait_target_atoms per sample in run_low_level_search.
After resetJointState, PyBullet's getLinkState returns a stale link pose from the previous FK cache, producing 50-500μm drift in the EE pose readback. Pass computeForwardKinematics=1 so world poses are recomputed from current joints on every call. Also skip the explicit finger reset in reset_state when joint_positions are provided: arm_joints already includes the finger joints, so set_joints has restored them to their exact continuous values, and the subsequent loop was overwriting them with the discrete-snapped value from _fingers_state_to_joint. The finger reset still runs on the IK path where set_joints leaves fingers untouched. Together these eliminate the "Could not reconstruct state exactly in reset" warning noise (24 -> 0 on the boil-oracle run).
…apping
Delete agent_abstraction_learning_approach.py and agent_closed_loop_approach.py (no longer used; auto-discovery picks up the rest). Refactor the remaining agent approaches for readability:
- Add AgentPlannerApproach._wrap_option_failures so the open-loop planner and bilevel _plan_to_policy share the OptionExecutionFailure -> ApproachFailure adapter.
- Factor save/load onto the base via a _save_suffix attribute plus _extra_save_state / _load_extra_save_state hooks; AgentOptionLearning now only declares its suffix and extra options field.
- Drop the redundant _agent_session_id assignments already handled by AgentSessionMixin._init_agent_session_state.
get_gt_simulator("pybullet_domino") previously raised
NotImplementedError because no GroundTruthSimulatorFactory was bound to
the env. Add a minimal no-op simulator: a single identity process rule,
one placeholder ParamSpec (the component loader rejects empty
rule/spec lists), and empty PROCESS_FEATURES. Register the factory in
the domino package __init__ so the registry can discover it.
When pybullet_ik_validate is disabled, a single unvalidated IK call can return joints whose EE pose matches numerically but whose carried object penetrates the table, so collision-aware BiRRT finds no path and Place looks infeasible. Retry once with validated IK (which iterates to a better Cartesian solution) before giving up, preserving the fast path for the common case. Also raise the domino drop Z from 0.5695 to 0.58: with the skill-factory Pick grasp transform the legacy height left the held domino penetrating the table at the collision-aware Place goal. Add an integration test covering the seed-0 second-bridge placement with ik_validate disabled.
…tep refinement
Replace the residual-tie-break Place sampler with one that enumerates the exact placements the task generator could lay next to a reference (_generator_placements: straight / +-45-deg turn blocks in either chain direction, mirroring _place_straight_domino / _place_turn90_domino), scores each by subgoal atoms satisfied, and draws uniformly among the best-scoring ties. Randomizing lets backtracking that re-draws the step reach a turn when the lone subgoal is satisfied equally by straight and turn but a later step needs the bend.
Add a future-target-bridge tie-break so the first placement is chosen to keep a purple-target completion reachable.
Flag the constant Pick/Push samplers as deterministic and have backtracking refinement cap a deterministic step's retries at 1: re-drawing a constant sampler yields the identical option, so re-descending through it on every backtrack is wasted budget.
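The "score by subgoal atoms satisfied, then draw uniformly among the best" step can be sketched as below. This is a minimal illustration under stated assumptions: atoms are plain strings, each candidate is a `(label, achieved_atoms)` pair, and `sample_best_placement` is a hypothetical name, not the repository's sampler.

```python
import random
from typing import FrozenSet, List, Sequence, Tuple

Atom = str  # assumption: atoms are plain strings in this sketch
Candidate = Tuple[str, FrozenSet[Atom]]  # (label, atoms the placement achieves)


def sample_best_placement(candidates: Sequence[Candidate],
                          subgoal: FrozenSet[Atom],
                          rng: random.Random) -> str:
    """Draw uniformly among candidates that satisfy the most subgoal atoms."""
    if not candidates:
        raise ValueError("no candidate placements")
    scores = [len(achieved & subgoal) for _, achieved in candidates]
    best = max(scores)
    ties: List[str] = [
        label for (label, _), score in zip(candidates, scores) if score == best
    ]
    # A uniform draw means a re-draw during backtracking can land on a
    # different tied candidate (e.g. a turn instead of a straight block).
    return rng.choice(ties)
```

Because the draw is random only among equally good candidates, the subgoal score is never sacrificed; the randomness just gives backtracking a chance to explore the other ties.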
… domino excluded_predicates
Set bilevel_plan_without_sim for the oracle_process_planning demonstrator in agents.yaml, and uncomment the domino excluded_predicates (InitialBlock,MovableBlock,Tilting,Upright) in envs/all.yaml.
A slow LLM sketch query (minutes) could overrun the solve timeout, making the refine loop's remaining-budget guard skip _refine_sketch entirely and fail without ever refining. Track query time separately and exclude it from the refinement budget; report actual sketches tried in the failure message.
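The budget accounting described here amounts to subtracting query time from elapsed time before comparing against the timeout. A rough sketch follows; `solve_with_budget`, its parameters, and the injectable `clock` are hypothetical and chosen only to make the idea testable, they are not the approach's actual `_solve` signature.

```python
import time
from typing import Callable


def solve_with_budget(query_sketch: Callable[[], str],
                      refine: Callable[[str, float], bool],
                      timeout: float,
                      max_sketches: int,
                      clock: Callable[[], float] = time.perf_counter) -> int:
    """Return how many sketches were tried before one refined.

    Raises RuntimeError if the refinement budget runs out first.
    """
    start = clock()
    query_time = 0.0
    tried = 0
    for _ in range(max_sketches):
        t0 = clock()
        sketch = query_sketch()
        query_time += clock() - t0
        # Only time spent outside the LLM query counts against the budget.
        remaining = timeout - ((clock() - start) - query_time)
        if remaining <= 0:
            break
        tried += 1
        if refine(sketch, remaining):
            return tried
    raise RuntimeError(f"refinement failed after {tried} sketch(es)")
```

Counting `tried` only when refinement actually runs is what lets the failure message report the real number of sketches attempted.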
Add wall-clock timing to AgentSessionManager.query(), the single funnel all agent interactions route through (planner approaches and explorers).
- Per-interaction total logged at INFO: [agent-interaction] kind=... took Ns
- Per-step [+Ds] prefix on each tool-call/thinking/text DEBUG line, the delta since the previous response message (model latency / tool exec).
- Also echo thinking blocks to the live log; previously they were saved to the .md transcript but dropped from debug.log.
… session per test task
Two related improvements to the agent_bilevel solve loop, motivated by sketches that the backtracking search could not refine being re-emitted unchanged on every retry.
Refinement-failure feedback:
- _refine_sketch now forwards an on_step_fail callback; _solve aggregates, across a skeleton's refine retries, the deepest step the search reached and a tally of the distinct failure reasons (e.g. a Place/MoveToDrop BiRRT collision).
- On a fully-failed sketch, _record_refinement_failure writes a per-step log to <sandbox>/refinement_logs/sketch_NN_refine.md and returns a preview + pointer block.
- build_solve_prompt gains a prior_failures section so the next sketch query sees what already failed and revises the dead skeleton instead of repeating it. No effect on the fixed-sketch-file path.
Fresh session per test task:
- New CFG flag agent_fresh_session_per_test_task (default False, unchanged behavior: all test tasks share one continuous agent conversation).
- When True, reset_for_new_episode closes the agent session at the start of each test task so its solve begins with a fresh conversation; the sandbox filesystem and learned artifacts are untouched. Gated to the test phase (via a new _in_test_phase marker) so exploration episodes keep their shared session, and fires once per task, not on mid-episode replans.
…eset
_reset_single_object built object orientation from yaw only, dropping the roll/pitch features. A toppled object (e.g. a fallen domino with roll~pi) was therefore reset upright; _get_state read the angle back as 0, the mismatch exceeded _reconstruction_raise_atol, and _set_state raised an uncaught ValueError. During bilevel refinement this crashed whole runs (BiRRT's _plan_with_simulator seeds its sim via _set_state on the current, possibly-toppled state). Now rebuild the quaternion from whichever Euler angles the type carries; yaw-only types are unchanged.
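Rebuilding the orientation from whichever Euler angles are present, with missing ones defaulting to zero, might look roughly like this. The feature names `roll`, `pitch`, `yaw`, the mapping-based input, and the function name are assumptions for illustration; the formula is the standard roll-pitch-yaw to quaternion conversion, returned in (x, y, z, w) order.

```python
import math
from typing import Mapping, Tuple


def quaternion_from_features(
        features: Mapping[str, float]) -> Tuple[float, float, float, float]:
    """Return (x, y, z, w) from whichever of roll/pitch/yaw are present."""
    # Angles a type does not carry default to zero, so a yaw-only type
    # produces the same quaternion as a yaw-only construction would.
    roll = features.get("roll", 0.0)
    pitch = features.get("pitch", 0.0)
    yaw = features.get("yaw", 0.0)
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return (
        sr * cp * cy - cr * sp * sy,
        cr * sp * cy + sr * cp * sy,
        cr * cp * sy - sr * sp * cy,
        cr * cp * cy + sr * sp * sy,
    )
```

A toppled object with roll near pi then maps to a quaternion close to (1, 0, 0, 0) rather than the identity, so the state read back after the reset matches what was written.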
Session-log filenames for kind=test queries now carry a _task<idx> segment (e.g. 001_test_task0_<ts>.md) so each logged query/response is attributable to a test task. The index mirrors main.py's test_task_idx by counting test episodes in reset_for_new_episode, which fires once per test task and not on bilevel mid-episode replans.
…_mcp_tools into builders
Rename the test_option_plan tool to evaluate_option_plan and object_augmentor→task_augmentor across prompts, settings, and tests. Split the monolithic create_mcp_tools into per-group _build_* helpers (_build_inspection_tools, _build_proposal_tools, _build_retraction_tools, _build_testing_tools, _build_planning_tools, _build_scene_tools).
…seeds
Rename the hybrid-sim approach to agent_oracle_hybrid_sim_oracle_samplers and add a commented no-oracle-samplers variant; bump NUM_SEEDS 1→5.
When an agent numbers its sketch lines (e.g. "0: Pick(...)", mirroring the format the system prints in logs and prior-failure previews), the option name was no longer the first token, so the whole sketch parsed as empty. parse_model_output_into_option_plan and parse_subgoal_annotations now strip a leading enumeration prefix (0:, 1., 2)) via the new utils.strip_enumeration_prefix, keeping option/subgoal lists aligned. Prose bullets like "- Step 1:" are deliberately left untouched. Adds regression tests for both parsers.
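One way such a prefix stripper could work is a single anchored regular expression. The function name matches the one mentioned above, but the body below is a guess at the behaviour described (strip `0:`, `1.`, `2)`; leave prose bullets alone), not the code in `utils`.

```python
import re

# Leading digits followed by ':', '.' or ')'. The negative lookahead keeps
# a decimal such as "3.14" from being mistaken for an enumeration prefix.
_ENUM_PREFIX = re.compile(r"^\s*\d+[:.)](?!\d)\s*")


def strip_enumeration_prefix(line: str) -> str:
    """Drop a leading '0:', '1.' or '2)' style prefix, if present."""
    return _ENUM_PREFIX.sub("", line, count=1)
```

For example, `"0: Pick(robot, domino_1)"` becomes `"Pick(robot, domino_1)"`, while `"- Step 1: pick"` is returned unchanged because it does not start with a number.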
The SDK reports total_cost_usd as the cumulative cost of the reused session, so the session managers were summing already-cumulative values into _total_cost_usd (a large over-count) and logging the running total as if it were per-iteration. Track the last value seen to derive each query's marginal cost, accumulate the marginals, and log both "this solve" and "total so far". Surface both in the markdown logs.
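Deriving a per-query cost from a cumulative counter only needs the last value seen. A minimal sketch, assuming a hypothetical `CostTracker` class; the handling of a counter that goes backwards (treated as a new session) is an added assumption, not something the commit states.

```python
from typing import Optional


class CostTracker:
    """Hypothetical helper: turns cumulative session cost into marginals."""

    def __init__(self) -> None:
        self._last_cumulative = 0.0
        self.total_cost_usd = 0.0

    def record(self, cumulative: Optional[float]) -> float:
        """Record a cumulative reading and return this query's cost."""
        if cumulative is None:
            return 0.0
        if cumulative < self._last_cumulative:
            # Assumption: a smaller reading means a new session started.
            marginal = cumulative
        else:
            marginal = cumulative - self._last_cumulative
        self._last_cumulative = cumulative
        self.total_cost_usd += marginal
        return marginal
```

With readings of 0.10 and then 0.25, the marginals are 0.10 and 0.15 and the total is 0.25, whereas naively summing the readings would report 0.35.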
Reproduces the exact domino test tasks from a run (same seed, test_env_seed_offset, and domino flags) and saves a PNG of each test task's initial state, labeling solved vs failed tasks.
Extract resolve_refine_timeout and refine_and_validate_report into bilevel_sketch as the shared refinement + forward-validation + report core. Synthesis (run_refinement_for_synthesis) and the new planner refine_plan_sketch tool both call it, differing only in setup glue: synthesis fits PARAM_SPECS and rebuilds the option model per call, while the planner uses the prebuilt ctx.option_model. Wire refine_plan_sketch into the planner's solve tools when a simulator is available.
…ve prompt
At the start of each _solve, render the task's initial state to
test_images/{taskNNN_}initial_state.png so the agent sees the scene
layout before planning. The prompt now includes a '## Initial State
Image' section pointing to the file when available.
Handles both PyBullet envs (_set_state + render()) and general envs
(render_state) with graceful fallback on failure.
After grasping, the held object may start in shallow penetration from grasp settling. Add allow_shallow_held_object_contacts flag to Phase and wire it through make_move_to_phase, PhaseSkill, and BiRRT. When enabled, initial contacts shallower than the configurable pybullet_birrt_shallow_held_contact_margin (-0.003) are excluded from collision checking so the lift can escape without failing. Applied to the LiftSlightly phase of pick skills. Also adds min contact distance to collision log messages for easier debugging.
Replace the fixed-row staging layout with a grid search that uses oriented-rectangle overlap tests to avoid placing movable dominoes on top of start/target blocks. Returns None (triggering retry) when no collision-free slot is found. Adds _placement_collides, _placement_rect, and _rectangles_overlap helpers with a separating-axis overlap test.
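A separating-axis test for two oriented rectangles can be sketched as follows. The helper names and the `(cx, cy, half_len, half_wid, yaw)` parameterisation are hypothetical and do not mirror the signatures of `_placement_rect` or `_rectangles_overlap`.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rect_corners(cx: float, cy: float, half_len: float, half_wid: float,
                 yaw: float) -> List[Point]:
    """Corners of a rectangle centred at (cx, cy) and rotated by yaw."""
    c, s = math.cos(yaw), math.sin(yaw)
    offsets = ((half_len, half_wid), (-half_len, half_wid),
               (-half_len, -half_wid), (half_len, -half_wid))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def rectangles_overlap(a: List[Point], b: List[Point]) -> bool:
    """Separating-axis test: True unless some axis separates a and b."""
    for rect in (a, b):
        # A rectangle has two distinct edge directions; each one is also
        # the normal of the other pair of edges, so two axes suffice.
        for i in range(2):
            ax = rect[i + 1][0] - rect[i][0]
            ay = rect[i + 1][1] - rect[i][1]
            proj_a = [px * ax + py * ay for px, py in a]
            proj_b = [px * ax + py * ay for px, py in b]
            if max(proj_a) < min(proj_b) or max(proj_b) < min(proj_a):
                return False  # found a separating axis
    return True
```

A grid search over candidate slots would then reject any slot whose rectangle overlaps a start or target block and return `None` when every slot is rejected.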
Update domino env __main__ test defaults (seed=1, 1 test task, unfinished state). Rename agent config entry for clarity.
The unfinished-state staging loop placed movable dominoes with an overlap-only collision check, which could leave one inside the gripper's swept grasp footprint of the start block or a target -- especially a perpendicular neighbor a few cm away in y. The domino then lands placed but un-pickable: BiRRT finds no collision-free descent for Pick/MoveToGrasp. Add a grasp-clearance check (_grasp_clearance_blocked): reject a staging spot unless the gripper's swept footprint -- an oriented rectangle with half-extents 0.85x domino width along the long axis and 1.45x along the finger/depth axis, measured from the Fetch gripper -- is clear of every other object. Verified across seeds 0-4: previously seed1 t3, seed2 t4 and seed2 t5 each had an un-pickable movable domino; now every movable domino in all 25 tasks is graspable from init, with no generation slowdown.
Debugging/repro tooling for the domino oracle-samplers runs:
- reproduce_domino_failures.py: deterministic, LLM-free reproduction of grasp/place BiRRT infeasibility and the Push parser-drop bug.
- replay_domino_sketches.py: replay recorded LLM sketches through the real bilevel refinement to reproduce solve-time failures.
- render_unsolved_domino_states.py: annotated init-state PNGs for the unsolved tasks.
- plan_sketches/domino_repro_s1t0.txt: example sketch for --agent_bilevel_plan_sketch_file.
Keep these predicates in oracle.yaml (test oracle) but drop them for agent runs. Achieved via a deep-merged ENVS.domino override in agents.yaml instead of the shared envs/all.yaml.
Add a per-phase Phase.validate_ik flag and set it for Pick's MoveToGrasp.
When CFG.pybullet_ik_validate is False, unvalidated PyBullet IK can return a
grasp goal config whose EE pose is numerically close but whose gripper finger
slightly penetrates the very domino being grasped (~1-11mm). BiRRT then rejects
the otherwise-reachable grasp ("no collision-free path"), failing the option
mid-plan even though the grasp pose is feasible (validated IK clears it).
_plan_with_simulator now validates the goal IK when the phase requests it,
without globally enabling validation (which slows transport/place/retreat and
introduces Place/Retreat collision + refinement-budget regressions). Replaying
the recorded domino oracle-samplers sketches confirms this clears the mid-plan
Pick/MoveToGrasp failures (e.g. no_demo seed1.t3 4/5 -> 5/5) with no new
regressions, where global ik_validate=True regressed the same seed to 3/5.
…name precedence
Takes oracle_process_planning from 3/5 to 5/5 on pybullet_domino (seed 0)
via two independent fixes:
- _place_sampler (domino/processes.py): rank-sum three signals
(future-target bridge, planner grid-cell distance, planner angle error)
over the generator-faithful candidate placements and pick the
cascade-correct pose deterministically, instead of the bare grid cell
(which omits the generator's inward domino_width/2 corner offset and
stalls corner cascades).
- BilevelProcessPlanningApproach.__init__: drop any base predicate whose
name a helper predicate already provides before unioning, so the grid's
derived InFront fully replaces the env's position-based InFront. A plain
set union kept both (==-equal but different hashes), and abstract() then
evaluated the looser position one, hallucinating adjacencies that let
the planner build a physically impossible single-block bridge.
oracle.yaml runs open-loop (bilevel_plan_without_sim); the deterministic
sampler reaches the cascade-correct pose on the first try, so no per-step
sim rollout / backtracking is needed. Adds dbg_domino_{tasks,infront}.py.
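The predicate-precedence fix hinges on how Python sets treat objects that compare equal but hash differently: both survive a union. A small sketch of the failure mode and of the name-based filter follows; `SketchPredicate`, its `source` field, and `union_with_precedence` are hypothetical stand-ins, not the repository's Predicate class or the code in `__init__`.

```python
from typing import Iterable, Set


class SketchPredicate:
    """Hypothetical predicate: equal by name, hashed by (name, source)."""

    def __init__(self, name: str, source: str) -> None:
        self.name = name
        self.source = source

    def __eq__(self, other: object) -> bool:
        return isinstance(other, SketchPredicate) and self.name == other.name

    def __hash__(self) -> int:
        return hash((self.name, self.source))


def union_with_precedence(base: Iterable[SketchPredicate],
                          helpers: Iterable[SketchPredicate]
                          ) -> Set[SketchPredicate]:
    """Union where a helper predicate replaces any same-named base one."""
    helper_list = list(helpers)
    helper_names = {p.name for p in helper_list}
    kept = {p for p in base if p.name not in helper_names}
    return kept | set(helper_list)
```

Filtering by name before the union makes the outcome independent of how equality and hashing happen to be defined on the predicate class.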
Non-behavioral cleanup so yapf/isort/docformatter/mypy pass (tool versions matched to CI: yapf 0.32.0, docformatter 1.4, isort 5.10.1, mypy 1.8.0):
- yapf / docformatter / isort reformatting across the domino + agent_sdk files added by earlier commits on this branch.
- agent_sdk cost logging: annotate `cost: Optional[float]` so mypy can narrow the entry.get() result (was `float + None`).
- maple_q / human_option_control _solve: reconcile with the base-class signature (type: ignore[override]; add the unused _allow_replan param).
- agent_planner: type: ignore[no-untyped-call] for PIL Image.fromarray.
- test_domino_gt_samplers: cast the classifier stub to DominoComponent.
- mypy.ini: relax strict def/call typing for the PIL-heavy domino debug/analysis scripts, mirroring the existing per-script carve-outs.
Mechanical, non-behavioral fixes so `pytest --pylint` passes repo-wide under .predicators_pylintrc:
- line-too-long: wrap long comments/prompt-strings to <=79 (settings.py, pybullet_env.py, agent_bilevel/agent_planner, base.py); import DominoTaskGenerator from its package re-export to shorten the line.
- skill_factories/base.py: drop the unused top-level get_link_state import (the deferred in-function import already provides it); `del` the unused `objects` argument.
- agent_planner: block-disable attribute-defined-outside-init for the mixin-initialized _agent_session_id set during checkpoint reload.
- domino debug scripts: add docstrings, mark the intentional deferred imports / seed shadowing with pylint disables, drop an unused import/variable, use rsplit(maxsplit=1).
The 2 remaining unit-test failures (test_push_second_switch_boil_position_mode, test_human_option_control_scripted_domino_solves_task) are pre-existing on this branch (fail identically on the parent commit) and are left as-is.
Both are pre-existing failures on this branch (fail on the parent commit a4f1a9a), surfaced by CI shard 5:
- agent_bilevel `_query_agent_for_plan_sketch`: the sketch path was built by unconditionally prepending `scripts/<plan_sketch_dir>/`, which corrupts an ABSOLUTE `plan_sketch_file` (what the test passes) into `scripts/plan_sketches//abs/path` -> FileNotFoundError. Use the path as-is when absolute, else join under `scripts/<dir>/` (consistent with synthesis_validation, which opens the file directly). Fixes test_sketch_from_file.
- scripts/scripted_option_policies/domino2.txt: the hardcoded Place poses no longer matched seed-0 task-1's solution chain, so the cascade never reached the target. Replace with the generator's solution-chain placements (domino_1 -> (0.890, 1.327, 45deg), domino_2 -> (0.821, 1.361, 90deg); drop z = _DOMINO_DROP_Z=0.58). Pick/Push params already matched the samplers. Fixes test_human_option_control_scripted_domino_solves_task.
(test_push_second_switch_boil_position_mode fails only on macOS but passes on CI's Linux, a known platform divergence, so it is left untouched.)
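The absolute-path handling for the sketch file can be illustrated with a short helper. `resolve_sketch_path` and its default directory argument are hypothetical; only the rule itself (use an absolute path as-is, otherwise join under `scripts/<dir>/`) comes from the commit message.

```python
import os


def resolve_sketch_path(plan_sketch_file: str,
                        plan_sketch_dir: str = "plan_sketches") -> str:
    """Return the sketch path: absolute paths as-is, others under scripts/."""
    if os.path.isabs(plan_sketch_file):
        return plan_sketch_file
    return os.path.join("scripts", plan_sketch_dir, plan_sketch_file)
```

A test that writes its sketch to a temporary directory and passes the absolute path therefore opens exactly that file, while a bare filename still resolves under `scripts/plan_sketches/`.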
Summary
This is the agent simulation-learning + `pybullet_domino` feature line (a large, long-lived branch). The commits on top of `a4f1a9ae1` make the open-loop `oracle_process_planning` demonstrator solve all five domino test tasks at seed 0 and bring the branch's CI checks to green.

Domino oracle (open-loop): 3/5 → 5/5
Two independent fixes:
- `_place_sampler` (`ground_truth_models/domino/processes.py`): rank-sum three signals (future-target bridge, planner grid-cell distance, planner angle error) over the generator-faithful candidate placements and pick the cascade-correct pose deterministically, instead of the bare grid cell (which omits the generator's inward `domino_width/2` corner offset and stalls corner cascades).
- `BilevelProcessPlanningApproach.__init__`: enforce helper-predicate name precedence so the grid's derived `InFront` fully replaces the env's position-based `InFront`. A plain set union kept both (they are `==`-equal but hash differently), so `abstract()` evaluated the looser position-based one, hallucinating adjacencies that let the planner build a physically impossible single-block bridge.

`predicatorv3/oracle.yaml` runs open-loop (`bilevel_plan_without_sim`): the deterministic sampler reaches the cascade-correct pose on the first try, so no per-step sim rollout / backtracking is needed. Adds `scripts/dbg_domino_{tasks,infront}.py` diagnostics.

CI: all checks green
yapf (0.32.0), isort (5.10.1), docformatter (1.4), mypy (1.8.0, incl. `--platform linux`), pylint, and the 8 unit-test shards all pass. Bringing the branch to green required:
- Formatting, lint, and type cleanup across the domino + `agent_sdk` files added earlier on the branch (all non-behavioral: line-length wraps, unused-import/arg cleanup, `cost: Optional[float]` narrowing, `_solve` override reconciliation, per-script mypy carve-outs for the PIL-heavy domino debug scripts).
- Fixes for two test failures that were already present on the parent commit (`a4f1a9ae1`):
  - `test_sketch_from_file`: `_query_agent_for_plan_sketch` built the sketch path by unconditionally prepending `scripts/<dir>/`, corrupting an absolute `plan_sketch_file`. Now uses an absolute path as-is, else joins under `scripts/<dir>/`.
  - `test_human_option_control_scripted_domino_solves_task`: the hardcoded `domino2.txt` Place poses no longer matched seed-0 task-1's solution chain, so the cascade never reached the target. Replaced with the generator's solution-chain placements.
- `test_push_second_switch_boil_position_mode` fails only on macOS (passes on CI's Linux), a known platform divergence in switch-push physics, so it is left untouched.

🤖 Generated with Claude Code