Camera Setup
At least one camera is required for imitation learning, but two cameras significantly improve policy performance:
- Top-down camera — mounted 60–80 cm above the workspace on a camera arm or ceiling mount. Captures object positions and gripper state clearly. 640×480 @ 30 fps is sufficient; 1280×720 preferred.
- Wrist camera (optional) — mounted near the end-effector. Provides fine-grained view for contact and grasping tasks. Intel RealSense D405 or equivalent.
# Test camera streams before recording
python3 -c "
import cv2
cap = cv2.VideoCapture(0)  # /dev/video0; change the index to test each camera
if not cap.isOpened():
    raise SystemExit('Camera FAILED to open')
ret, frame = cap.read()
print('Camera OK:', frame.shape if ret else 'FAILED')
cap.release()
"
Recording Workflow
Use the LeRobot record script. It handles multi-camera sync, joint state logging, and HuggingFace Hub upload in one command:
python -m lerobot.scripts.control_robot \
  --robot.type=linker_bot_o6 \
  --control.type=record \
  --control.fps=30 \
  --control.repo_id=your-username/o6-pick-place \
  --control.tags='[o6,pick-place,tabletop]' \
  --control.warmup_time_s=5 \
  --control.episode_time_s=30 \
  --control.reset_time_s=10 \
  --control.num_episodes=60
The workflow per episode:
- Warm-up period (5 s) — set up the scene (place the object in starting position).
- Recording (up to 30 s) — perform the task using your teleoperation method.
- Reset period (10 s) — return arm to home, reset objects, prepare for next episode.
- Repeat for all 60 planned episodes.
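The timings above make it easy to budget a recording session in advance. A quick sketch (plain arithmetic, not part of LeRobot):

```python
# Estimate total session time from the per-episode timings in the record command.
warmup_s, episode_s, reset_s = 5, 30, 10  # matches the flags above
num_episodes = 60

per_episode_s = warmup_s + episode_s + reset_s
total_min = num_episodes * per_episode_s / 60
print(f"~{per_episode_s} s per episode, ~{total_min:.0f} min total")
# → ~45 s per episode, ~45 min total
```

In practice budget extra time, since some episodes will be discarded during quality filtering and re-recorded.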
LeRobot Dataset Format
Each episode is stored as a Parquet file with synchronized observations and actions. Key fields:
# Dataset structure (per episode)
{
  "observation.state": [T, 12],              # joint positions [6] + velocities [6]
  "action": [T, 6],                          # commanded joint positions
  "observation.images.top": [T, H, W, 3],    # top camera frames (uint8 RGB)
  "observation.images.wrist": [T, H, W, 3],  # wrist camera (if configured)
  "timestamp": [T],                          # seconds from episode start
  "episode_index": scalar,
  "frame_index": [T],
  "task": str                                # task description
}
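A quick structural sanity check can catch shape mismatches before training. This sketch builds a dummy episode in the layout above and validates it with NumPy; the `check_episode` helper is a hypothetical illustration, not part of LeRobot:

```python
import numpy as np

def check_episode(ep, state_dim=12, action_dim=6):
    """Validate shapes and timestamps of one episode dict (illustrative helper)."""
    T = ep["observation.state"].shape[0]
    assert ep["observation.state"].shape == (T, state_dim)
    assert ep["action"].shape == (T, action_dim)
    assert ep["observation.images.top"].shape[0] == T
    assert ep["observation.images.top"].dtype == np.uint8
    # Timestamps should start near zero and increase strictly.
    ts = ep["timestamp"]
    assert ts.shape == (T,) and np.all(np.diff(ts) > 0)

# Dummy 90-frame episode (3 s at 30 fps) in the documented layout.
T = 90
episode = {
    "observation.state": np.zeros((T, 12), dtype=np.float32),
    "action": np.zeros((T, 6), dtype=np.float32),
    "observation.images.top": np.zeros((T, 480, 640, 3), dtype=np.uint8),
    "timestamp": np.arange(T, dtype=np.float32) / 30.0,
}
check_episode(episode)
print("episode structure OK")
```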
Task Definition Tips
- Start simple. "Pick up the red cube and place it in the bin" is a good first task. Avoid compound multi-step tasks until you have a working baseline.
- Consistent starting position. Place objects in the same location (±1 cm) for every episode. High variation requires more data.
- Complete the task end-to-end. Each episode should go from a clear start state to a clear goal state. Do not record partial completions.
- Smooth motions. Avoid jerky, corrective motions if possible. Smooth demonstrations produce better policies.
Quality Filtering Protocol
After your recording session, review and filter before training:
- Watch each episode's playback:
  python -m lerobot.scripts.visualize_dataset \
    --repo-id your-username/o6-pick-place \
    --episode-index 0
- Mark episodes where the task was NOT completed end-to-end for deletion.
- Check for dropped camera frames (visual stutters in playback). Discard affected episodes.
- Confirm joint state data is continuous — no large discontinuities between consecutive frames.
- Verify object starting position was consistent across kept episodes.
- After filtering, confirm you have ≥50 episodes remaining.
- Back up the dataset to a second location before proceeding.
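The joint-continuity check above can be automated. A sketch using NumPy that flags frames where any joint jumps more than a threshold between consecutive samples; the `find_discontinuities` helper and the 0.2 rad threshold are illustrative assumptions, not LeRobot defaults:

```python
import numpy as np

def find_discontinuities(joint_pos, max_step_rad=0.2):
    """Return indices of frames where any joint jumps by more than
    max_step_rad relative to the previous frame (tune threshold per robot)."""
    steps = np.abs(np.diff(joint_pos, axis=0))   # [T-1, num_joints]
    bad = np.where(steps.max(axis=1) > max_step_rad)[0]
    return bad + 1  # index of the frame after each jump

# Smooth 6-joint trajectory with one injected glitch at frame 50.
t = np.linspace(0, 1, 100)
traj = np.stack([np.sin(2 * np.pi * t + k) for k in range(6)], axis=1) * 0.5
traj[50] += 1.0  # simulated dropped/garbled sample
print(find_discontinuities(traj))  # → [50 51] (jump into and out of the glitch)
```

Episodes with any flagged frames are candidates for deletion, or at least manual review of the playback.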
Uploading to HuggingFace Hub
# Push to Hub (requires huggingface-cli login)
huggingface-cli login
python -m lerobot.scripts.push_dataset_to_hub \
  --repo-id your-username/o6-pick-place
Uploading to the Hub makes your dataset available for team collaboration, reproducible training runs, and contribution to the shared robotics dataset registry at roboticscenter.ai/datasets.
Unit 4 Complete When...
You have 50 or more high-quality demonstration episodes in your LeRobot dataset. All episodes pass the quality checklist. Camera streams are complete and joint state data is continuous. The dataset is backed up and optionally pushed to HuggingFace Hub. You have identified your task description string and confirmed it is consistent across all episodes.