pybullet_domino open-loop oracle (5/5) + CI greening #39

Open
yichao-liang wants to merge 257 commits into master from domino-oracle-sim


yichao-liang commented Jun 24, 2026

Collaborator

Summary

This is the agent simulation-learning + pybullet_domino feature line (a large, long-lived branch). The commits on top of a4f1a9ae1 make the open-loop oracle_process_planning demonstrator solve all five domino test tasks at seed 0 and bring the branch's CI checks to green.

Domino oracle (open-loop): 3/5 → 5/5

Two independent fixes:

  • _place_sampler (ground_truth_models/domino/processes.py): rank-sum three signals — future-target bridge, planner grid-cell distance, planner angle error — over the generator-faithful candidate placements and pick the cascade-correct pose deterministically, instead of the bare grid cell (which omits the generator's inward domino_width/2 corner offset and stalls corner cascades).
  • BilevelProcessPlanningApproach.__init__: enforce helper-predicate name precedence so the grid's derived InFront fully replaces the env's position-based InFront. A plain set union kept both (they are ==-equal but hash differently), so abstract() evaluated the looser position-based one, hallucinating adjacencies that let the planner build a physically-impossible single-block bridge.

predicatorv3/oracle.yaml runs open-loop (bilevel_plan_without_sim): the deterministic sampler reaches the cascade-correct pose on the first try, so no per-step sim rollout / backtracking is needed. Adds scripts/dbg_domino_{tasks,infront}.py diagnostics.
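
For readers unfamiliar with rank-sum selection, here is a minimal sketch of the idea behind the `_place_sampler` change. It is not the code in `processes.py`: the function names, the lower-is-better convention for all three signals, and the example numbers are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def ranks(values: Sequence[float]) -> List[int]:
    """Dense 0-based rank of each value; equal values share a rank."""
    order = sorted(set(values))
    position = {v: i for i, v in enumerate(order)}
    return [position[v] for v in values]


def pick_by_rank_sum(signals: Sequence[Tuple[float, float, float]]) -> int:
    """Return the index of the candidate with the smallest rank sum.

    Each tuple holds three lower-is-better signals for one candidate
    (assumed here: bridge miss, grid-cell distance, angle error).
    Ties break toward the lowest index, so the result is deterministic.
    """
    per_signal = [ranks([s[k] for s in signals]) for k in range(3)]
    totals = [sum(r[i] for r in per_signal) for i in range(len(signals))]
    return min(range(len(signals)), key=lambda i: (totals[i], i))


# Hypothetical candidates: (bridge miss, grid distance, angle error).
candidates = [
    (1.0, 0.00, 0.30),  # on the bare grid cell, but misses the bridge
    (0.0, 0.02, 0.05),  # keeps the bridge, small offset from the cell
    (0.0, 0.10, 0.10),  # keeps the bridge, farther from the cell
]
best = pick_by_rank_sum(candidates)
```

Ranking each signal separately before summing keeps any one signal's units from dominating, which is one reason to prefer it over a weighted sum of raw values.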

CI — all checks green

yapf (0.32.0), isort (5.10.1), docformatter (1.4), mypy (1.8.0, incl. --platform linux), pylint, and the 8 unit-test shards all pass. Bringing the branch to green required:

  • Clearing pre-existing lint/type debt across the domino + agent_sdk files added earlier on the branch (all non-behavioral: line-length wraps, unused-import/arg cleanup, cost: Optional[float] narrowing, _solve override reconciliation, per-script mypy carve-outs for the PIL-heavy domino debug scripts).
  • Fixing two pre-existing failing unit tests (both fail on a4f1a9ae1):
    • test_sketch_from_file: _query_agent_for_plan_sketch built the sketch path by unconditionally prepending scripts/<dir>/, corrupting an absolute plan_sketch_file. Now uses an absolute path as-is, else joins under scripts/<dir>/.
    • test_human_option_control_scripted_domino_solves_task: the hardcoded domino2.txt Place poses no longer matched seed-0 task-1's solution chain, so the cascade never reached the target. Replaced with the generator's solution-chain placements.
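
The sketch-path fix amounts to the following, shown as a minimal standalone sketch. `resolve_sketch_path` and its parameter names are hypothetical and are not the helper used in `_query_agent_for_plan_sketch`.

```python
import os


def resolve_sketch_path(plan_sketch_file: str, plan_sketch_dir: str) -> str:
    """Return the file to open for a plan sketch.

    An absolute path is used as-is; a relative one is looked up under
    scripts/<plan_sketch_dir>/.
    """
    if os.path.isabs(plan_sketch_file):
        return plan_sketch_file
    return os.path.join("scripts", plan_sketch_dir, plan_sketch_file)


absolute = os.path.abspath("my_sketch.txt")
resolved_abs = resolve_sketch_path(absolute, "plan_sketches")
resolved_rel = resolve_sketch_path("my_sketch.txt", "plan_sketches")
```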

test_push_second_switch_boil_position_mode fails only on macOS (passes on CI's Linux) — a known platform divergence in switch-push physics — so it is left untouched.

🤖 Generated with Claude Code

yichao-liang and others added 30 commits April 7, 2026 12:11
Extract repeated wait-termination check into _check_wait_termination helper
and unify the three _terminal branches into a single definition with
config checks inside the function body.
- Remove dead/commented-out code and stale self-question comments
- Add _VIRTUAL_OBJECT_TYPES constant to replace hardcoded type-name
  skip lists in _set_state and _get_state
- Move env-specific _get_robot_state_dict branches to subclass overrides
  in pybullet_cover and pybullet_blocks
- Extract _get_camera_matrices helper to deduplicate render methods
- Extract _get_object_state_dict from _get_state for per-object logic
- Move create_pybullet_block/sphere to pybullet_helpers/objects.py
- Merge _create_task_specific_objects into _set_domain_specific_state
- Rename: _reset_state -> _set_state,
  _reset_custom_env_state -> _set_domain_specific_state,
  _extract_feature -> _get_domain_specific_feature
- Add docstrings explaining where each method is called from
Reorganize methods into labeled sections (Setup, Public API, Core Loop,
State Write/Read, Grasp Management, Action Helpers, Rendering, Utilities)
so related functions are adjacent. Update module docstring to document
the main public API and state synchronization methods.
Add _step_base() and _domain_specific_step() to PyBulletEnv base class.
step() now calls _step_base (robot control, physics, grasp) then
_domain_specific_step (water filling, heating, etc.), gated by
_skip_domain_specific_dynamics flag for kinematics-only mode.

Migrate all 15 domain envs to override _domain_specific_step() instead
of step(). Envs with pre-step logic (coffee, switch, blocks, cover)
still override step() for the pre-step part only.
Document the step_base → domain_specific_step → get_observation flow,
_skip_domain_specific_dynamics flag, and _domain_specific_step as an
optional override.
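
A minimal sketch of the step flow described above, with a toy class standing in for PyBulletEnv. The class names, the `log` attribute, and the constructor parameter name are illustrative assumptions; only the `_step_base` / `_domain_specific_step` split and the kinematics-only gate come from the commit messages.

```python
from typing import List


class ToyEnv:
    """Toy stand-in for the base env: step() is a fixed template."""

    def __init__(self, skip_domain_specific_dynamics: bool = False) -> None:
        self._skip_domain_specific_dynamics = skip_domain_specific_dynamics
        self.log: List[str] = []

    def step(self, action: str) -> List[str]:
        self._step_base(action)  # robot control, physics, grasp
        if not self._skip_domain_specific_dynamics:
            self._domain_specific_step()  # e.g. water filling, heating
        return self._get_observation()

    def _step_base(self, action: str) -> None:
        self.log.append(f"base:{action}")

    def _domain_specific_step(self) -> None:
        """Optional override; the base implementation does nothing."""

    def _get_observation(self) -> List[str]:
        return list(self.log)


class ToyBoilEnv(ToyEnv):
    """A domain env overrides only the domain-specific hook."""

    def _domain_specific_step(self) -> None:
        self.log.append("domain:heat")


full_obs = ToyBoilEnv().step("move")
kinematic_obs = ToyBoilEnv(skip_domain_specific_dynamics=True).step("move")
```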
Replace direct access to private _skip_domain_specific_dynamics
attribute with a public constructor parameter, so callers declare
kinematics-only mode at creation time instead of mutating internal
state after construction.
…ging

Both AgentSessionMixin and AgentExplorer had near-identical wrappers that
ran session.query() synchronously via nest_asyncio or asyncio.run. Move
that logic into a module-level run_query_sync helper in session_manager
and have both callers delegate to it.
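
For illustration, one way to write such a synchronous wrapper using only the standard library. This is a swapped-in technique, not the branch's `run_query_sync`: the commit uses nest_asyncio, whereas this sketch falls back to a worker thread when an event loop is already running. The name `run_coroutine_sync` is hypothetical.

```python
import asyncio
import threading
from typing import Any, Awaitable, Callable, Dict


def run_coroutine_sync(coro_fn: Callable[..., Awaitable[Any]],
                       *args: Any) -> Any:
    """Run an async function to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run is safe.
        return asyncio.run(coro_fn(*args))

    # A loop is already running here; use a fresh loop in another thread.
    box: Dict[str, Any] = {}

    def worker() -> None:
        try:
            box["value"] = asyncio.run(coro_fn(*args))
        except BaseException as exc:  # re-raised in the caller below
            box["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in box:
        raise box["error"]
    return box["value"]


async def _echo_twice(x: int) -> int:
    await asyncio.sleep(0)
    return 2 * x


async def _call_from_running_loop() -> int:
    return run_coroutine_sync(_echo_twice, 5)


direct = run_coroutine_sync(_echo_twice, 21)
nested = asyncio.run(_call_from_running_loop())
```

Note that the threaded fallback blocks the calling loop until the query finishes, which matches the synchronous contract but would stall any other tasks on that loop.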
Distinguishes the grounded-plan explorer from upcoming bilevel variants.
AgentExplorer -> AgentPlanExplorer, get_name() 'agent' -> 'agent_plan',
file moved to agent_plan_explorer.py, and all callers / docstrings /
YAML config examples updated accordingly.
The mixin is pure agent-session plumbing (session creation, lifecycle,
explorer factory) and has no approach-specific logic, so it belongs
next to session_manager.py, tools.py, and the sandbox managers rather
than in approaches/.
The explorer asks a Claude agent for a plan sketch, refines it against
the approach's current (possibly learned) option model, and rolls the
refined plan out in the real env. When the mental model disagrees with
reality — e.g. the sketch expects JugFilled after a Wait but the mental
model's process dynamics can't produce it — the explorer truncates the
plan at the deepest unsatisfiable subgoal (inclusive) so the real-env
rollout ends exactly where the disagreement occurs, maximising signal
per experiment.

Key pieces:

- predicators/agent_sdk/bilevel_sketch.py: extracted the sketch build
  / parse / refine helpers from AgentBilevelApproach as module-level
  functions so both the approach (solve path) and the new explorer
  (exploration path) can share them. refine_sketch gains
  truncate_on_subgoal_fail: the on_step_fail callback snapshots the
  deepest subgoal failure seen during backtracking, and on exhaustion
  the captured prefix is returned as the experiment plan.

- predicators/explorers/agent_bilevel_explorer.py: new explorer.
  Reads option_model from tool_context (synced by the approach),
  builds the sketch prompt via bilevel_sketch, runs refine_sketch with
  check_subgoals=True, check_final_goal=False, truncate_on_subgoal_fail
  =True, wraps the result in an option_plan_to_policy that converts
  OptionExecutionFailure into RequestActPolicyFailure so the episode
  cleanly terminates at the point of real-env divergence. Stashes the
  sketch subgoals/options on ToolContext for downstream diffing by
  the learning approach.

- predicators/approaches/agent_bilevel_approach.py: shim methods over
  bilevel_sketch; behaviour unchanged.

- predicators/approaches/agent_planner_approach.py: _create_explorer
  dispatches both "agent_plan" and "agent_bilevel" through the agent
  factory path and forwards CFG.explorer as the name.

- predicators/explorers/__init__.py: factory branch merged for the
  two agent-session-backed explorers.

- predicators/agent_sdk/tools.py: ToolContext gains
  last_sketch_subgoals / last_sketch_options fields, populated by the
  explorer and marked TODO for the learning approach to consume.

- tests/explorers/test_agent_bilevel_explorer.py: happy-path, fallback,
  wait-memory-injection, and deepest-subgoal-failure truncation tests.
- New setting agent_bilevel_explorer_max_samples_per_step (default 50),
  separate from the solve-path budget, so the explorer's backtracking
  cost is independently tunable.
- Log the actual experiment plan (option names, objects, params) after
  refinement so the explorer's output is visible alongside the
  existing sketch/truncation log lines.
- Test config updated to set both budgets explicitly.
AgentSimLearningApproach extends AgentBilevelApproach to learn process
dynamics online. Each cycle: the agent synthesizes parameterized
process rules via Claude (using run_python / evaluate_simulator /
test_simulator MCP tools), parameters are fitted via emcee MCMC, and
the learned dynamics are composed with a kinematics-only PyBullet
oracle into a combined option model for plan refinement.

Key pieces:
- predicators/approaches/agent_sim_learning_approach.py: the approach.
  Initialises with a kinematics-only option model (so
  AgentBilevelExplorer sees disagreements at process-dynamic subgoals
  like JugFilled/Boiled), and replaces it with the kin+learned model
  after each successful synthesis cycle.
- predicators/agent_sdk/tools.py: create_synthesis_tools() builds the
  three MCP tools the synthesis agent uses; extra_mcp_tools field and
  get_allowed_tool_list(extra_names=) plumbing lets the approach
  inject them into the session.
- predicators/code_sim_learning/: ParamSpec, fit_params (emcee MCMC),
  compute_mse, LearnedSimulator.
- predicators/ground_truth_models/boil/gt_simulator.py: ground-truth
  process-dynamics simulator for the boil environment.
- tests/: approach and param-fitting tests.
- agents.yaml: comment out agent_bilevel preset, add agent_sim_learning
  with explorer=agent_bilevel and skip_test_until_last_ite_or_early_stopping.
- common.yaml: disable failure/test video recording, set
  num_online_learning_cycles=1 for faster iteration.
Simulation primitives (code_sim_learning/utils.py):
- apply_rules(state, rules, params) → ProcessUpdate
- merge_updates(base_state, updates, process_features) → State
- simulate_step(state, action, base_env, rules, params, features) → State
These replace _build_fitted_step_fn, merge_process_updates,
_sim_fn_from_rules, and the body of _build_combined_simulator.

GT simulator factory (ground_truth_models):
- GroundTruthSimulatorFactory ABC + get_gt_simulator(env_name) discovery,
  following the existing get_gt_options / get_gt_nsrts pattern.
- PyBulletBoilGroundTruthSimulatorFactory registered in boil/.
- Replaces the hardcoded _load_oracle_simulator in the approach.

Oracle ablation flags (settings.py):
- agent_sim_learn_oracle_sim_program: load GT rules, skip synthesis.
- agent_sim_learn_oracle_sim_params: use GT param values, skip MCMC.

Also: kin_env → base_env rename throughout, redundant self._types
assignment removed, process_features computed once in __init__.
- yapf + isort autoformatting applied to all touched files.
- pylint: fix logging-not-lazy in agent_bilevel_explorer, add
  broad-except and reimported disables in agent_sim_learning_approach.
- mypy: fix base/env variable name collision, add type: ignore on
  lambda inference, add return type annotations to GT factory methods.
Use utils.abstract to evaluate expected atoms in low-level search so
that DerivedPredicates — which require a Set[GroundAtom] rather than a
State — are handled correctly alongside regular predicates.
When sequential simulate calls differ only in process features (as in
the combined kinematic+learned simulator), reapplying joint positions
and tearing down/recreating grasp constraints causes visible arm
jitter. Compare robot poses first and skip the kinematic reset path
when they already match.
Factor simulator synthesis into a shared _learn_simulator helper so
that both learn_from_offline_dataset and learn_from_interaction_results
can trigger it on their respective trajectory sources. Also create a
separate headless env for parameter fitting so MCMC's thousands of
_set_state calls don't thrash the GUI env during training.
Replace the silent run_mcmc call with a manual sample loop that logs
step count and best log-probability roughly five times per run, and
flushes handlers so the updates appear promptly under buffered
logging.
Type-annotate **kwargs on PyBullet env __init__ overrides so mypy
doesn't flag them. Initialize attrs used by _domain_specific_step in
__init__ (pybullet_coffee, pybullet_switch) to silence
defined-outside-init. Type-ignore the emcee import. Fix encoding,
unused, protected-access, and redefined-outer-name warnings in the
sim-learning tests and agent-SDK tooling.
When a held object's grasp constraint is recreated via _set_state, the
gripper frame must match the original world pose exactly — otherwise
the recorded base_link->object offset is rotated and the object lands
at the wrong world position when the gripper next moves. The State
representation only carries (x, y, z, tilt, wrist), so IK during reset
can pick a different wrist-roll solution and corrupt the constraint.

Thread joint_positions from PyBulletState.simulator_state through
reset_state so we skip IK and restore the exact arm configuration.
Falls back to IK when joints aren't available (plain State).

Also wire wait-termination so refinement and execution can stop Wait
when expected atoms hold instead of running to
max_num_steps_option_rollout: set _abstract_function on the option
model in BilevelPlanningApproach (mirrors AgentPlannerApproach), pass
abstract_function into option_plan_to_policy in
BilevelProcessPlanningApproach, and inject wait_target_atoms per
sample in run_low_level_search.
After resetJointState, PyBullet's getLinkState returns a stale link
pose from the previous FK cache, producing 50-500μm drift in the EE
pose readback. Pass computeForwardKinematics=1 so world poses are
recomputed from current joints on every call.

Also skip the explicit finger reset in reset_state when joint_positions
are provided: arm_joints already includes the finger joints, so
set_joints has restored them to their exact continuous values, and the
subsequent loop was overwriting them with the discrete-snapped value
from _fingers_state_to_joint. The finger reset still runs on the IK
path where set_joints leaves fingers untouched.

Together these eliminate the "Could not reconstruct state exactly in
reset" warning noise (24 -> 0 on the boil-oracle run).
…apping

Delete agent_abstraction_learning_approach.py and
agent_closed_loop_approach.py (no longer used; auto-discovery picks up
the rest). Refactor the remaining agent approaches for readability:

- Add AgentPlannerApproach._wrap_option_failures so the open-loop
  planner and bilevel _plan_to_policy share the OptionExecutionFailure
  -> ApproachFailure adapter.
- Factor save/load onto the base via a _save_suffix attribute plus
  _extra_save_state / _load_extra_save_state hooks; AgentOptionLearning
  now only declares its suffix and extra options field.
- Drop the redundant _agent_session_id assignments already handled by
  AgentSessionMixin._init_agent_session_state.
get_gt_simulator("pybullet_domino") previously raised
NotImplementedError because no GroundTruthSimulatorFactory was bound to
the env. Add a minimal no-op simulator: a single identity process rule,
one placeholder ParamSpec (the component loader rejects empty
rule/spec lists), and empty PROCESS_FEATURES. Register the factory in
the domino package __init__ so the registry can discover it.
When pybullet_ik_validate is disabled, a single unvalidated IK call can
return joints whose EE pose matches numerically but whose carried object
penetrates the table, so collision-aware BiRRT finds no path and Place
looks infeasible. Retry once with validated IK (which iterates to a
better Cartesian solution) before giving up, preserving the fast path
for the common case.

Also raise the domino drop Z from 0.5695 to 0.58: with the skill-factory
Pick grasp transform the legacy height left the held domino penetrating
the table at the collision-aware Place goal. Add an integration test
covering the seed-0 second-bridge placement with ik_validate disabled.
…tep refinement

Replace the residual-tie-break Place sampler with one that enumerates
the exact placements the task generator could lay next to a reference
(_generator_placements: straight / +-45-deg turn blocks in either chain
direction, mirroring _place_straight_domino / _place_turn90_domino),
scores each by subgoal atoms satisfied, and draws uniformly among the
best-scoring ties. Randomizing lets backtracking that re-draws the step
reach a turn when the lone subgoal is satisfied equally by straight and
turn but a later step needs the bend. Add a future-target-bridge
tie-break so the first placement is chosen to keep a purple-target
completion reachable.

Flag the constant Pick/Push samplers as deterministic and have
backtracking refinement cap a deterministic step's retries at 1 --
re-drawing a constant sampler yields the identical option, so
re-descending through it on every backtrack is wasted budget.
… domino excluded_predicates

Set bilevel_plan_without_sim for the oracle_process_planning demonstrator
in agents.yaml, and uncomment the domino excluded_predicates
(InitialBlock,MovableBlock,Tilting,Upright) in envs/all.yaml.
A slow LLM sketch query (minutes) could overrun the solve timeout, making
the refine loop's remaining-budget guard skip _refine_sketch entirely and
fail without ever refining. Track query time separately and exclude it from
the refinement budget; report actual sketches tried in the failure message.
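
The budget accounting can be sketched as below. The function name and the example numbers are assumptions; the point is only that time spent waiting on the agent query is subtracted before comparing against the solve timeout.

```python
def remaining_refine_budget(timeout: float, elapsed: float,
                            query_time: float) -> float:
    """Seconds left for refinement, without charging agent query time."""
    refine_time_used = elapsed - query_time
    return max(0.0, timeout - refine_time_used)


# Hypothetical numbers: a 300 s timeout and a sketch query that took 400 s.
old_style = max(0.0, 300.0 - 420.0)  # query time charged: nothing left
new_style = remaining_refine_budget(300.0, elapsed=420.0, query_time=400.0)
```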
Add wall-clock timing to AgentSessionManager.query(), the single funnel
all agent interactions route through (planner approaches and explorers).

- Per-interaction total logged at INFO: [agent-interaction] kind=... took Ns
- Per-step [+Ds] prefix on each tool-call/thinking/text DEBUG line, the
  delta since the previous response message (model latency / tool exec).
- Also echo thinking blocks to the live log; previously they were saved
  to the .md transcript but dropped from debug.log.
… session per test task

Two related improvements to the agent_bilevel solve loop, motivated by
sketches that the backtracking search could not refine being re-emitted
unchanged on every retry.

Refinement-failure feedback:
- _refine_sketch now forwards an on_step_fail callback; _solve aggregates,
  across a skeleton's refine retries, the deepest step the search reached
  and a tally of the distinct failure reasons (e.g. a Place/MoveToDrop
  BiRRT collision).
- On a fully-failed sketch, _record_refinement_failure writes a per-step
  log to <sandbox>/refinement_logs/sketch_NN_refine.md and returns a
  preview + pointer block.
- build_solve_prompt gains a prior_failures section so the next sketch
  query sees what already failed and revises the dead skeleton instead of
  repeating it. No effect on the fixed-sketch-file path.

Fresh session per test task:
- New CFG flag agent_fresh_session_per_test_task (default False, unchanged
  behavior: all test tasks share one continuous agent conversation).
- When True, reset_for_new_episode closes the agent session at the start of
  each test task so its solve begins with a fresh conversation; the sandbox
  filesystem and learned artifacts are untouched. Gated to the test phase
  (via a new _in_test_phase marker) so exploration episodes keep their
  shared session, and fires once per task, not on mid-episode replans.
…eset

_reset_single_object built object orientation from yaw only, dropping the
roll/pitch features. A toppled object (e.g. a fallen domino with roll~pi)
was therefore reset upright; _get_state read the angle back as 0, the
mismatch exceeded _reconstruction_raise_atol, and _set_state raised an
uncaught ValueError. During bilevel refinement this crashed whole runs
(BiRRT's _plan_with_simulator seeds its sim via _set_state on the current,
possibly-toppled state). Now rebuild the quaternion from whichever Euler
angles the type carries; yaw-only types are unchanged.
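
A minimal sketch of building the reset orientation from whichever Euler angles are present. It is pure Python rather than the PyBullet call the env presumably uses, and the feature names `roll`, `pitch`, `yaw` and the dict-based interface are assumptions. The quaternion is returned as (x, y, z, w) with the usual roll-about-x, pitch-about-y, yaw-about-z convention.

```python
import math
from typing import Dict, Tuple


def quaternion_from_features(
        features: Dict[str, float]) -> Tuple[float, float, float, float]:
    """Return (x, y, z, w) from the Euler angles the object carries.

    Missing angles default to 0, so a yaw-only type gets the same
    orientation it would have had from yaw alone.
    """
    roll = features.get("roll", 0.0)
    pitch = features.get("pitch", 0.0)
    yaw = features.get("yaw", 0.0)
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    w = cr * cp * cy + sr * sp * sy
    return (x, y, z, w)


upright = quaternion_from_features({"yaw": 0.0})
toppled = quaternion_from_features({"roll": math.pi, "yaw": 0.0})
```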
Session-log filenames for kind=test queries now carry a _task<idx> segment
(e.g. 001_test_task0_<ts>.md) so each logged query/response is attributable
to a test task. The index mirrors main.py's test_task_idx by counting test
episodes in reset_for_new_episode, which fires once per test task and not on
bilevel mid-episode replans.
…_mcp_tools into builders

Rename the test_option_plan tool to evaluate_option_plan and
object_augmentor→task_augmentor across prompts, settings, and tests.
Split the monolithic create_mcp_tools into per-group _build_* helpers
(_build_inspection_tools, _build_proposal_tools, _build_retraction_tools,
_build_testing_tools, _build_planning_tools, _build_scene_tools).
…seeds

Rename the hybrid-sim approach to agent_oracle_hybrid_sim_oracle_samplers
and add a commented no-oracle-samplers variant; bump NUM_SEEDS 1→5.
When an agent numbers its sketch lines (e.g. "0: Pick(...)", mirroring the
format the system prints in logs and prior-failure previews), the option
name was no longer the first token, so the whole sketch parsed as empty.
parse_model_output_into_option_plan and parse_subgoal_annotations now
strip a leading enumeration prefix (0:, 1., 2)) via the new
utils.strip_enumeration_prefix, keeping option/subgoal lists aligned.
Prose bullets like "- Step 1:" are deliberately left untouched. Adds
regression tests for both parsers.
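
A minimal regex-based sketch of the prefix stripping described above. It is not the implementation of `utils.strip_enumeration_prefix`; the function name and pattern below are assumptions chosen to match the three prefix styles the commit lists.

```python
import re

# Leading digits followed by ':', '.' or ')', e.g. "0:", "1.", "2)".
_ENUM_PREFIX = re.compile(r"^\s*\d+\s*[:.)]\s*")


def strip_enumeration_prefix_sketch(line: str) -> str:
    """Drop one leading enumeration prefix, if the line has one."""
    return _ENUM_PREFIX.sub("", line, count=1)


numbered = [
    strip_enumeration_prefix_sketch("0: Pick(domino_1)"),
    strip_enumeration_prefix_sketch("1. Place(domino_1)"),
    strip_enumeration_prefix_sketch("2) Push(domino_0)"),
]
prose = strip_enumeration_prefix_sketch("- Step 1:")
```

Because the pattern is anchored at a leading digit, a bullet such as "- Step 1:" does not match and passes through unchanged.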
The SDK reports total_cost_usd as the cumulative cost of the reused
session, so the session managers were summing already-cumulative values
into _total_cost_usd (a large over-count) and logging the running total
as if it were per-iteration. Track the last value seen to derive each
query's marginal cost, accumulate the marginals, and log both "this
solve" and "total so far". Surface both in the markdown logs.
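
The marginal-cost bookkeeping can be sketched as follows. The class and method names are hypothetical, and treating a drop in the reported value as the start of a new session is an assumption of this sketch, not something the commit states.

```python
from typing import Optional


class CostTracker:
    """Turn cumulative per-session cost reports into marginal costs."""

    def __init__(self) -> None:
        self._last_cumulative = 0.0
        self.total_cost_usd = 0.0

    def record(self, cumulative: Optional[float]) -> float:
        """Record one query's report and return its marginal cost."""
        if cumulative is None:
            return 0.0
        if cumulative < self._last_cumulative:
            # Assumed: the counter restarted with a new session.
            marginal = cumulative
        else:
            marginal = cumulative - self._last_cumulative
        self._last_cumulative = cumulative
        self.total_cost_usd += marginal
        return marginal


tracker = CostTracker()
first = tracker.record(0.10)   # session reports 0.10 so far
second = tracker.record(0.25)  # same session, now 0.25 cumulative
```

Summing the raw reports here would give 0.35, whereas the accumulated marginals give the actual 0.25.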
Reproduces the exact domino test tasks from a run (same seed,
test_env_seed_offset, and domino flags) and saves a PNG of each test
task's initial state, labeling solved vs failed tasks.
Extract resolve_refine_timeout and refine_and_validate_report into
bilevel_sketch as the shared refinement + forward-validation + report
core. Synthesis (run_refinement_for_synthesis) and the new planner
refine_plan_sketch tool both call it, differing only in setup glue:
synthesis fits PARAM_SPECS and rebuilds the option model per call,
while the planner uses the prebuilt ctx.option_model. Wire
refine_plan_sketch into the planner's solve tools when a simulator
is available.
…ve prompt

At the start of each _solve, render the task's initial state to
test_images/{taskNNN_}initial_state.png so the agent sees the scene
layout before planning. The prompt now includes a '## Initial State
Image' section pointing to the file when available.

Handles both PyBullet envs (_set_state + render()) and general envs
(render_state) with graceful fallback on failure.
After grasping, the held object may start in shallow penetration from
grasp settling. Add allow_shallow_held_object_contacts flag to Phase
and wire it through make_move_to_phase, PhaseSkill, and BiRRT. When
enabled, initial contacts shallower than the configurable
pybullet_birrt_shallow_held_contact_margin (-0.003) are excluded from
collision checking so the lift can escape without failing.

Applied to the LiftSlightly phase of pick skills. Also adds min contact
distance to collision log messages for easier debugging.
Replace the fixed-row staging layout with a grid search that uses
oriented-rectangle overlap tests to avoid placing movable dominoes on
top of start/target blocks. Returns None (triggering retry) when no
collision-free slot is found.

Adds _placement_collides, _placement_rect, and _rectangles_overlap
helpers with a separating-axis overlap test.
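
For reference, a separating-axis overlap test for two oriented rectangles can be written as below. This is a generic sketch, not the `_rectangles_overlap` helper from the branch; the function names and the (center, half-extents, yaw) parameterization are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rect_corners(cx: float, cy: float, half_w: float, half_h: float,
                 yaw: float) -> List[Point]:
    """Corners of a rectangle rotated by yaw about its center, in order."""
    c, s = math.cos(yaw), math.sin(yaw)
    local = ((half_w, half_h), (-half_w, half_h), (-half_w, -half_h),
             (half_w, -half_h))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in local]


def rectangles_overlap(a: List[Point], b: List[Point]) -> bool:
    """True unless some edge normal of a or b separates the two shapes.

    Rectangles that merely touch are reported as overlapping.
    """
    for rect in (a, b):
        for i in range(2):  # a rectangle has two distinct edge directions
            (x1, y1), (x2, y2) = rect[i], rect[i + 1]
            axis = (-(y2 - y1), x2 - x1)  # normal to this edge
            proj_a = [p[0] * axis[0] + p[1] * axis[1] for p in a]
            proj_b = [p[0] * axis[0] + p[1] * axis[1] for p in b]
            if max(proj_a) < min(proj_b) or max(proj_b) < min(proj_a):
                return False  # found a separating axis
    return True


square = rect_corners(0.0, 0.0, 1.0, 1.0, 0.0)
near = rect_corners(0.5, 0.0, 1.0, 1.0, 0.0)
far = rect_corners(3.0, 0.0, 1.0, 1.0, 0.0)
diagonal = rect_corners(1.8, 1.8, 1.0, 1.0, math.pi / 4)
```

The `diagonal` case is the one an axis-aligned bounding-box check gets wrong: the boxes intersect, but the rotated square's own edge normal separates it from `square`.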
Update domino env __main__ test defaults (seed=1, 1 test task,
unfinished state). Rename agent config entry for clarity.
The unfinished-state staging loop placed movable dominoes with an
overlap-only collision check, which could leave one inside the gripper's
swept grasp footprint of the start block or a target -- especially a
perpendicular neighbor a few cm away in y. The domino then lands placed
but un-pickable: BiRRT finds no collision-free descent for
Pick/MoveToGrasp.

Add a grasp-clearance check (_grasp_clearance_blocked): reject a staging
spot unless the gripper's swept footprint -- an oriented rectangle with
half-extents 0.85x domino width along the long axis and 1.45x along the
finger/depth axis, measured from the Fetch gripper -- is clear of every
other object.

Verified across seeds 0-4: previously seed1 t3, seed2 t4 and seed2 t5
each had an un-pickable movable domino; now every movable domino in all
25 tasks is graspable from init, with no generation slowdown.
Debugging/repro tooling for the domino oracle-samplers runs:
- reproduce_domino_failures.py: deterministic, LLM-free reproduction of
  grasp/place BiRRT infeasibility and the Push parser-drop bug.
- replay_domino_sketches.py: replay recorded LLM sketches through the
  real bilevel refinement to reproduce solve-time failures.
- render_unsolved_domino_states.py: annotated init-state PNGs for the
  unsolved tasks.
- plan_sketches/domino_repro_s1t0.txt: example sketch for
  --agent_bilevel_plan_sketch_file.
Keep these predicates in oracle.yaml (test oracle) but drop them for
agent runs. Achieved via a deep-merged ENVS.domino override in
agents.yaml instead of the shared envs/all.yaml.
Add a per-phase Phase.validate_ik flag and set it for Pick's MoveToGrasp.
When CFG.pybullet_ik_validate is False, unvalidated PyBullet IK can return a
grasp goal config whose EE pose is numerically close but whose gripper finger
slightly penetrates the very domino being grasped (~1-11mm). BiRRT then rejects
the otherwise-reachable grasp ("no collision-free path"), failing the option
mid-plan even though the grasp pose is feasible (validated IK clears it).

_plan_with_simulator now validates the goal IK when the phase requests it,
without globally enabling validation (which slows transport/place/retreat and
introduces Place/Retreat collision + refinement-budget regressions). Replaying
the recorded domino oracle-samplers sketches confirms this clears the mid-plan
Pick/MoveToGrasp failures (e.g. no_demo seed1.t3 4/5 -> 5/5) with no new
regressions, where global ik_validate=True regressed the same seed to 3/5.
…name precedence

Takes oracle_process_planning from 3/5 to 5/5 on pybullet_domino (seed 0)
via two independent fixes:

- _place_sampler (domino/processes.py): rank-sum three signals
  (future-target bridge, planner grid-cell distance, planner angle error)
  over the generator-faithful candidate placements and pick the
  cascade-correct pose deterministically, instead of the bare grid cell
  (which omits the generator's inward domino_width/2 corner offset and
  stalls corner cascades).
- BilevelProcessPlanningApproach.__init__: drop any base predicate whose
  name a helper predicate already provides before unioning, so the grid's
  derived InFront fully replaces the env's position-based InFront. A plain
  set union kept both (==-equal but different hashes), and abstract() then
  evaluated the looser position one, hallucinating adjacencies that let
  the planner build a physically impossible single-block bridge.

oracle.yaml runs open-loop (bilevel_plan_without_sim); the deterministic
sampler reaches the cascade-correct pose on the first try, so no per-step
sim rollout / backtracking is needed. Adds dbg_domino_{tasks,infront}.py.
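
The predicate-precedence bug can be reproduced in isolation. The sketch below uses a toy `Pred` class whose equality looks only at the name while the hash also depends on the source, which is an assumption standing in for how the real predicate classes behave; `merge_with_helper_precedence` is likewise a hypothetical name.

```python
from typing import Iterable, Set


class Pred:
    """Toy predicate: equal by name, but hashed by name and source."""

    def __init__(self, name: str, source: str) -> None:
        self.name = name
        self.source = source

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Pred) and self.name == other.name

    def __hash__(self) -> int:
        return hash((self.name, self.source))


def merge_with_helper_precedence(base: Iterable[Pred],
                                 helpers: Iterable[Pred]) -> Set[Pred]:
    """Union the two sets, dropping base predicates a helper renames."""
    helper_list = list(helpers)
    helper_names = {p.name for p in helper_list}
    kept = {p for p in base if p.name not in helper_names}
    return kept | set(helper_list)


base = {Pred("InFront", "env"), Pred("Touching", "env")}
helpers = {Pred("InFront", "grid")}

plain_union = base | helpers  # keeps both InFront variants
merged = merge_with_helper_precedence(base, helpers)
in_front_sources = sorted(p.source for p in merged if p.name == "InFront")
```

A Python set compares hashes before calling `==`, so two `==`-equal objects with different hashes are stored as separate members, which is why the plain union kept both `InFront` predicates.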
Non-behavioral cleanup so yapf/isort/docformatter/mypy pass (tool versions
matched to CI: yapf 0.32.0, docformatter 1.4, isort 5.10.1, mypy 1.8.0):

- yapf / docformatter / isort reformatting across the domino + agent_sdk
  files added by earlier commits on this branch.
- agent_sdk cost logging: annotate `cost: Optional[float]` so mypy can
  narrow the entry.get() result (was `float + None`).
- maple_q / human_option_control _solve: reconcile with the base-class
  signature (type: ignore[override]; add the unused _allow_replan param).
- agent_planner: type: ignore[no-untyped-call] for PIL Image.fromarray.
- test_domino_gt_samplers: cast the classifier stub to DominoComponent.
- mypy.ini: relax strict def/call typing for the PIL-heavy domino
  debug/analysis scripts, mirroring the existing per-script carve-outs.
Mechanical, non-behavioral fixes so `pytest --pylint` passes repo-wide
under .predicators_pylintrc:

- line-too-long: wrap long comments/prompt-strings to <=79 (settings.py,
  pybullet_env.py, agent_bilevel/agent_planner, base.py); import
  DominoTaskGenerator from its package re-export to shorten the line.
- skill_factories/base.py: drop the unused top-level get_link_state import
  (the deferred in-function import already provides it); `del` the unused
  `objects` argument.
- agent_planner: block-disable attribute-defined-outside-init for the
  mixin-initialized _agent_session_id set during checkpoint reload.
- domino debug scripts: add docstrings, mark the intentional deferred
  imports / seed shadowing with pylint disables, drop an unused
  import/variable, use rsplit(maxsplit=1).

The 2 remaining unit-test failures (test_push_second_switch_boil_position_mode,
test_human_option_control_scripted_domino_solves_task) are pre-existing on
this branch (fail identically on the parent commit) and are left as-is.
Both are pre-existing failures on this branch (fail on the parent commit
a4f1a9a), surfaced by CI shard 5:

- agent_bilevel `_query_agent_for_plan_sketch`: the sketch path was built by
  unconditionally prepending `scripts/<plan_sketch_dir>/`, which corrupts an
  ABSOLUTE `plan_sketch_file` (what the test passes) into
  `scripts/plan_sketches//abs/path` -> FileNotFoundError. Use the path as-is
  when absolute, else join under `scripts/<dir>/` (consistent with
  synthesis_validation, which opens the file directly). Fixes
  test_sketch_from_file.

- scripts/scripted_option_policies/domino2.txt: the hardcoded Place poses no
  longer matched seed-0 task-1's solution chain, so the cascade never reached
  the target. Replace with the generator's solution-chain placements (domino_1
  -> (0.890, 1.327, 45deg), domino_2 -> (0.821, 1.361, 90deg); drop z =
  _DOMINO_DROP_Z=0.58). Pick/Push params already matched the samplers. Fixes
  test_human_option_control_scripted_domino_solves_task.

(test_push_second_switch_boil_position_mode fails only on macOS but passes on
CI's Linux -- a known platform divergence -- so it is left untouched.)