Tutorial: Run a tracking policy

End-to-end: take a trained ONNX tracking policy, run it through bar_policy/remote_policy_runner, and watch the arms follow a LeRobot teleop dataset under MuJoCo physics. By the end you'll know what the runner reads from where, how the metadata schema makes everything self-describing, and how to swap in your own policy without rewriting any YAML.

Time + materials

20 minutes
A working workspace build
One of: a local .onnx checkpoint, OR W&B credentials with access to a tracking run, OR access to the canonical Lite-tracking checkpoint (see policy_runner reference for the resolution path)
Network access for the first run (pulls the LeRobot dataset from the HF Hub if you're using the W&B path)

The mental model

Three actors and the data between them:

                   ┌──────────────────────┐
                   │ remote_policy_runner │   (Python, bar_policy)
                   │   - load ONNX         │
                   │   - read 13 metadata  │
                   │     fields            │
                   │   - subscribe /joint_ │
                   │     states + /imu     │
                   │   - step inference @  │
                   │     policy_dt         │
                   └──────────┬───────────┘
                              │ /remote_policy_controller/command
                              │ (bar_msgs/MITCommand)
                              ▼
                   ┌──────────────────────┐
                   │ RemotePolicyController│  (C++, bar_controllers)
                   │   - validate joint    │
                   │     order             │
                   │   - hand off via      │
                   │     RealtimeBuffer    │
                   └──────────┬───────────┘
                              │ 5 MIT command interfaces per joint
                              ▼
                   ┌──────────────────────┐
                   │ MujocoSystem          │  (or RobstrideSystem)
                   │   - apply τ = K·err   │
                   │     + D·erṙ + ff      │
                   └──────────────────────┘

The policy itself is self-describing: every constant it needs (joint order, K_p, K_d, default pose, action scale, observation terms, dataset id, tick rate) is baked into the ONNX's custom_metadata_map. You don't write any of this into YAML.

Step 1 — Bring up MuJoCo

The same launch as the previous tutorial, no gamepad needed:

ros2 launch bar_bringup_lite mujoco.launch.py

Wait for zero_torque_controller to come active.

Step 2 — Walk to STANDBY

remote_policy_controller expects the arms in a sane starting pose. Walk the FSM. In a second terminal, drop into the env:

cd bar_ws
pixi shell

Then:

# Either via the gamepad if you have one:
#   X  → DAMPING
#   L1+A → STANDBY (wait ~4 s for is_finished:true)
#
# Or via direct switch_controllers:
ros2 control switch_controllers --deactivate zero_torque_controller --activate damping_controller
ros2 control switch_controllers --deactivate damping_controller    --activate standby_controller

# Wait for is_finished:
ros2 topic echo /standby_controller/state
# ... is_finished: true

Step 3 — Start the policy runner

In a third terminal:

# Option (a): local ONNX
ros2 launch bar_policy lite_policy.launch.py task:=tracking \
    checkpoint_file:=/path/to/policy.onnx

# Option (b): W&B run path (downloads + caches to ~/.cache/bar_policy/wandb/)
ros2 launch bar_policy lite_policy.launch.py task:=tracking \
    wandb_run_path:=IsaacLab-Training/mjlab/<run_id>

# Option (c): W&B run with a specific checkpoint pinned
ros2 launch bar_policy lite_policy.launch.py task:=tracking \
    wandb_run_path:=IsaacLab-Training/mjlab/<run_id> \
    wandb_checkpoint_name:=model_4000.onnx

What happens behind the scenes:

checkpoint_loader.resolve_checkpoint resolves the checkpoint to a local file (downloading from W&B if needed, ~1 s once cached).

The runner parses the ONNX custom_metadata_map. You'll see logs like:

[remote_policy_runner]: joint_names=[14 entries]
[remote_policy_runner]: policy_dt=0.02  (50 Hz)
[remote_policy_runner]: observation terms: motion_body_pos_b, motion_body_ori_b, joint_pos, joint_vel, actions
[remote_policy_runner]: dataset_repo_id=Berkeley-Humanoids/lite-teleop

The runner pulls the LeRobot reference motion (parquet shards from HF Hub) keyed off episode_index.
The runner subscribes to /lite/joint_states and (if the observation needs it) /imu/data.
The runner starts a timer at policy_dt; each tick it packs the observation, runs ONNX inference, decodes the action, builds an MITCommand, and publishes.

At this point the runner is publishing to /remote_policy_controller/command but remote_policy_controller isn't active yet, so nothing changes physically.

Step 4 — STANDBY → REMOTE

In the FSM-walk terminal (inside pixi shell):

ros2 control switch_controllers \
    --deactivate standby_controller \
    --activate   remote_policy_controller

The motors immediately track the policy. In MuJoCo you'll see the arms move through the tracking dataset.

Verify the publish rate:

ros2 topic hz /remote_policy_controller/command
# average rate: 50.0

If the runner crashes or the dataset finishes, the stale-command policy kicks in within 100 ms and the arms go limp. If the runner is healthy but the rate drops below ~10 Hz, the controller will also flag stale.

Step 5 — Inspect the data flow

# What does an MITCommand actually look like?
ros2 topic echo --once /remote_policy_controller/command | head -30

# How fast is observation packing?
ros2 topic hz /lite/joint_states                 # broadcaster rate
ros2 topic hz /remote_policy_controller/command  # runner rate (policy_dt)

The two rates match because the runner ticks at policy_dt = 0.02, which equals the controller_manager's update rate.

Step 6 — Shut down

DAMP first, then exit (still in the pixi shell terminal):

ros2 control switch_controllers \
    --deactivate remote_policy_controller \
    --activate   damping_controller

# Kill the runner first (Ctrl+C in its terminal), then the launch.

The runner's finally clause closes the publishers cleanly; the launch's on_deactivate cascades through controllers.

What you came away with

Skill	Page where it's documented in detail
Resolving an ONNX checkpoint (local / W&B)	Reference → Policy runner
The 13 metadata fields	same
LeRobot dataset path	same
The `MITCommand` schema	Reference → Messages
`RemotePolicyController` lifecycle	Reference → Controllers

How-to → Promote a Python policy to in-process C++ — the lower-latency path, once you've validated in Python.
Concepts → Frozen schemas — what about this contract is frozen, and what's tunable.

Tutorial: Run a tracking policy

Time + materials​

The mental model​

Step 1 — Bring up MuJoCo​

Step 2 — Walk to STANDBY​

Step 3 — Start the policy runner​

Step 4 — STANDBY → REMOTE​

Step 5 — Inspect the data flow​

Step 6 — Shut down​

What you came away with​

Next​