Policy runner

bar_policy.remote_policy_runner is the Python rclpy node that drives bar::RemotePolicyController from an ONNX policy + (optionally) a LeRobot reference dataset. This page documents the launch surface, the ONNX metadata schema the runner reads, and the observation-term registry that maps names from the ONNX to ObservationTerm instances.

Why "self-describing ONNX"

The training pipeline writes 13 fields into the ONNX file's custom_metadata_map at export time — joint order, per-joint PD gains, default pose, action scale, observation term names, body names, the LeRobot dataset id, the policy tick period. The runner parses those fields into a typed PolicyMetadata object and never relies on YAML to restate them.

This matters because any mismatch between the training-time graph and the deployment-time YAML breaks the policy silently (wrong joint order or off-by-one observation slice will still produce well-shaped tensors that just behave terribly). Embedding the metadata in the ONNX file means training and deployment cannot drift independently.

Composition

Module	Responsibility
`OnnxPolicyRunner`	`onnxruntime.InferenceSession` wrapper. Single-step inference, shape contract `float32 (1, N) -> float32 (1, M)`.
`PolicyMetadata`	Frozen dataclass decoded from `custom_metadata_map`. Helpers like `joint_stiffness_map()`, `default_joint_pos_map()`, `action_scale_map()` return `dict[str, float]` keyed by joint name.
`term_builders.build_observation_terms`	Resolves each name in `observation_names` to an `ObservationTerm` via three-stage lookup: built-in proprioception → reference-provider term → constant fallback. Unknown names raise.
`ObservationManager`	Concatenates configured terms into the policy input vector each tick. Owns the reference-provider lifecycle (`reset` on activation, `step` after each `record_action`).
`TrackingReferenceProvider`	Loads a LeRobot dataset (local path or HF repo id) and serves per-frame body-pos / body-ori / joint references on the manager's cadence. Mirrors the training-side `pianist_tracking.loader.teleop_sequence` layout exactly. Activated when `task_type='tracking'`.
`_build_reference_provider` hook	`RemotePolicyRunner` extracts the provider-construction step as an overridable method. Base class handles `tracking`; downstream task packages (e.g. `pianist_ros2`'s `pianist_policy.PianoPolicyRunner`) subclass and override only this hook, inheriting all ONNX loading / observation packing / action decoding / MITCommand publishing. See Task packages below.
`PolicyActionDecoder`	`target[name] = default[name] + action[name] * scale[name]`.
`ActionMapper`	Assembles a full-articulation `MITCommand` from any `{joint_name: target_position}` map. Undriven joints (those not in `action_joint_names`) are pinned to `position=0` with the same `K`/`D` the policy ships, so the controller never holds an arbitrary pose.

ONNX metadata schema

The 13 required keys (parsed from session.get_modelmeta().custom_metadata_map):

Key	Type	Meaning
`task_type`	`"tracking"` \| `"piano"` \| `"locomotion"`	Selects the reference provider class.
`joint_names`	`tuple[str, ...]`	Full articulation list (17 for Lite). Indexes `joint_stiffness`, `joint_damping`, `default_joint_pos`.
`action_joint_names`	`tuple[str, ...]`	The subset the policy actually emits actions for. Joints in `joint_names` but not here are "undriven" and get pinned to zero.
`joint_stiffness`	`tuple[float, ...]`	Per-joint `K_p`, aligned to `joint_names`.
`joint_damping`	`tuple[float, ...]`	Per-joint `K_d`, aligned to `joint_names`.
`default_joint_pos`	`tuple[float, ...]`	Per-joint default position (the `q_default` in `out = (q - q_default) * scale`).
`observation_names`	`tuple[str, ...]`	The flat observation vector, term by term. See Observation term registry.
`command_names`	`tuple[str, ...]`	Future-use; runtime command terms (e.g. velocity targets) that aren't part of the proprio observation.
`action_scale`	`float` \| `tuple[float, ...]`	Per-action-joint scale. Scalar broadcasts across all joints.
`policy_dt`	`float`	Policy tick period in seconds. The runner uses this for its `create_timer` unless `policy_dt_override` is set.
`body_names`	`tuple[str, ...]`	Reference-tracked bodies (for `motion_body_pos_b` / `motion_body_ori_b` terms).
`dataset_repo_id`	`str`	Default HF repo id for the reference dataset; overridable at launch.
`lookahead_steps`	`tuple[int, ...]`	Piano-only: frame offsets the policy was trained to look ahead at.

Observation term registry

observation_names is a CSV like motion_body_pos_b, motion_body_ori_b, joint_pos, joint_vel, actions (tracking task) or target_keys, progress, key_pressed, joint_pos, joint_vel, actions (piano task). Names match upstream pianist-tracking-mj actor terms directly. Each name maps to one of three sources, resolved in this order:

Built-in proprioception terms — instantiated directly from the metadata:

Name	Term class	Slice from
`joint_pos`	`JointPositionTerm(default, scale=1)`	`state.joint_position - default`
`joint_vel`	`JointVelocityTerm(num_joints, scale=1)`	`state.joint_velocity`
`actions`	`LastActionTerm(action_dim)`	last action recorded via `record_action()`
`imu_quaternion`	`ImuFieldTerm("imu_quaternion")`	`state.imu_quat` `(w, x, y, z)`
`imu_angular_velocity`	`ImuFieldTerm(...)`	`state.imu_gyro`
`imu_linear_acceleration`	`ImuFieldTerm(...)`	`state.imu_accel`

Reference-provider terms — delegated to the attached provider. The provider declares its own supported terms via supports(name):
- TrackingReferenceProvider (bar_policy, task_type='tracking'): motion_body_pos_b, motion_body_ori_b.
- PianoCompositeProvider (pianist_policy, task_type='piano'): target_keys, target_keys_future, progress, key_pressed. (target_keys_future is the K-frame lookahead used by the song provider; upstream piano_env_cfg currently keeps it commented out, so its slot in observation_names is conditional on the trained policy.)
Constant terms — caller-supplied fallback. Useful for running without a dataset (zero-fill motion_body_pos_b) during smoke tests.

Unknown names raise ValueError at runner startup — the build-time guarantee is that every observation slot has a deterministic source.

Launch parameters

remote_policy_runner (entry point: remote_policy_runner = bar_policy.remote_policy_runner:main):

Param	Default	Effect
`checkpoint_file`	(empty)	Path to a local `.onnx` file. Mutually exclusive with `wandb_run_path`.
`wandb_run_path`	(empty)	W&B run path (`entity/project/run-id`); the runner downloads the ONNX from the run.
`wandb_checkpoint_name`	(empty)	Specific checkpoint filename inside the W&B run (e.g. `model_2000.onnx`).
`motion_file`	`''`	Piano-only: explicit local LeRobot directory. Wins over registry and metadata.
`registry_name`	`''`	Piano-only: HF repo id to use instead of the ONNX-baked one. Useful when iterating on dataset variants.
`episode_index`	`0`	Reference-dataset episode index to track.
`command_topic`	`/remote_policy_controller/command`	Where to publish `MITCommand`. Must match `RemotePolicyController`'s subscription.
`joint_state_topic`	`/lite/joint_states`	Source for proprio state (driven by `joint_state_broadcaster`, owner-prefixed at bringup).
`policy_dt_override`	`0.0`	Force a different tick period than `PolicyMetadata.policy_dt`. `0.0` means use metadata.

Argument names match mjlab's play CLI vocabulary so the same string moves between training-time and deployment-time scripts.

Dataset resolution order

The runner picks the reference dataset source in this priority:

motion_file:=<local path> — air-gapped deployments pre-fetch via hf download and point here.
registry_name:=<hf_repo_id> — for testing alternative datasets without re-exporting the ONNX.
PolicyMetadata.dataset_repo_id — the default. First run will download from Hugging Face Hub.

Logged at startup so you always know which one was used.

Task-specific runner subclasses live downstream

RemotePolicyRunner is generic. The piano task's runner — pianist_policy.PianoPolicyRunner in T-K-233/pianist_ros2 — subclasses it and overrides only _build_reference_provider() to attach a PianoSongProvider that fetches the LeRobot dataset declared in meta.dataset_repo_id from Hugging Face Hub. Every other piece of the runner (ONNX load, observation pack, action decode, MITCommand publish) is inherited; the subclass adds ~50 LOC.

New task families follow the same pattern: a sibling <task>_ros2 repo with its own <task>_policy package whose runner is a thin RemotePolicyRunner subclass.

Per-tick flow

Any ReferenceProvider (the built-in TrackingReferenceProvider, the downstream PianoSongProvider, or any other) is interchangeable from the runner's perspective — the same per-tick ordering applies regardless of which one is plugged in.

Each timer fire (at policy_dt):

Build the observation: every ObservationTerm.pack(state, slice) runs against the current MITState (refreshed from /lite/joint_states) and the current reference frame (from the provider).
Run the ONNX forward pass on the flat observation vector.
Decode the raw action via target[name] = default[name] + action[name] * scale[name].
Assemble the MITCommand with the configured K_p / K_d and undriven joints pinned to position=0.
Publish on command_topic.
obs_mgr.record_action(action) — refreshes LastActionTerm and steps the reference provider so the dataset frame advances exactly once.

Steps 5 and 6 are intentionally ordered so the policy never observes the post-action dataset frame on the same tick it commanded against — every tick is (t, t+1) in the dataset, never (t+1, t+1).

Policy runner

Why "self-describing ONNX"​

Composition​

ONNX metadata schema​

Observation term registry​

Launch parameters​

Dataset resolution order​

Task-specific runner subclasses live downstream​

Per-tick flow​

See also​