HOW WE ENSURE QUALITY

Most manipulation datasets ship raw trajectories with no quality guarantee. You find out the data is bad when your policy doesn't improve. We verify before we deliver.

STAGE 01

MULTI-MODAL CAPTURE

Synchronized multi-camera arrays, depth sensing, and tactile instrumentation. Every modality time-aligned to sub-10ms.

EGOCENTRIC VIDEO

60 fps, multi-camera

First-person perspective from expert workers performing industrial manipulation tasks. Multi-camera RGB synchronized to sub-10ms.

DEPTH

15-30 Hz, metric-scale

LiDAR depth maps providing world-space 3D geometry. Accurate hand and object positioning for policy training.

TACTILE

12 sensors, 25 Hz

Per-finger contact dynamics including grip force and contact events. The signal that cameras cannot capture.

KINEMATICS

6DoF + MANO pose

Full 3D hand articulation with depth-refined metric-scale positioning. 21-joint mesh recovery on every frame.
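
Because the streams above run at different rates, downstream code usually has to pair each video frame with a sample from each slower stream. The following is a minimal sketch of nearest-timestamp pairing, not the capture pipeline itself. The function names and the synthetic 60 fps and 25 Hz clocks are assumptions for illustration. The nearest sample of a slower stream can sit up to half that stream's sample period away from a frame, which is a separate quantity from the clock synchronization accuracy quoted above.

```python
from bisect import bisect_left

def nearest_index(timestamps, t):
    """Index of the sample in sorted `timestamps` closest to time `t`."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1

def align_to_frames(frame_ts, stream_ts):
    """For each video frame time, return (stream index, offset in seconds)."""
    pairs = []
    for t in frame_ts:
        j = nearest_index(stream_ts, t)
        pairs.append((j, stream_ts[j] - t))
    return pairs

# Synthetic clocks (assumed): 60 fps video and 25 Hz tactile over one second.
frame_ts = [k / 60 for k in range(60)]
tactile_ts = [k / 25 for k in range(25)]

pairs = align_to_frames(frame_ts, tactile_ts)
worst = max(abs(offset) for _, offset in pairs)  # largest pairing offset, seconds
```

Interpolating between the two neighbouring samples is a common alternative when the signal is smooth; nearest-sample pairing is shown here because it also works for discrete contact events.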

[Chart: success rate on contact-rich benchmarks, one square per 1%. Vision only: 21%.]

BY TASK TYPE

Task                   Vision only   With tactile
Peg insertion          18%           76%
Cable routing          12%           64%
Surface wiping         31%           82%
Connector seating      15%           68%
Object reorientation   29%           65%

These are tasks where cameras cannot see the contact dynamics.

STAGE 02

AUTOMATED ANNOTATION + EXPERT REVIEW

Hand pose estimation, object segmentation, and action classification on every frame. Human reviewers handle the edge cases.

HAND POSE

Dense 3D hand mesh recovery on every frame. Fused with tactile signals to disambiguate during occlusion.

OBJECT TRACKING

Instance segmentation with depth-informed boundaries. Consistent object identity across the full episode.

ACTION SEGMENTS

Temporal action labels from tactile events, hand motion, and audio. Language-annotated task descriptions per episode.
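
As an illustration of how tactile events can bound action segments, here is a minimal sketch that turns a single grip-force trace into contact intervals using two thresholds (hysteresis), so small dips during a grasp do not split the segment. The function name, the threshold values, and the sample trace are hypothetical; the actual annotation pipeline also uses hand motion and audio, which this sketch ignores.

```python
def contact_segments(force, on=0.5, off=0.2):
    """Return (start, end) index pairs where contact is held.

    Contact starts when force rises to `on` or above and ends when it
    falls below `off`. `end` is exclusive. Both thresholds are hypothetical.
    """
    segments = []
    start = None
    for i, f in enumerate(force):
        if start is None and f >= on:
            start = i  # contact onset
        elif start is not None and f < off:
            segments.append((start, i))  # contact release
            start = None
    if start is not None:
        # Trace ended while still in contact.
        segments.append((start, len(force)))
    return segments

# Made-up force trace in arbitrary units.
force = [0.0, 0.1, 0.6, 0.9, 0.4, 0.3, 0.1, 0.0, 0.7, 0.8]
segments = contact_segments(force)  # [(2, 6), (8, 10)]
```

Dividing each index by the stream's sample rate converts the segment boundaries to seconds.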

STAGE 03

INTEGRITY & SYNC QA GATE

Every episode passes a QA gate before delivery. Broken episodes don't get shipped.

WHAT WE CHECK

  • All modalities temporally aligned to within one frame
  • Monotonic timestamps and regular sampling on every stream
  • NaN, Inf, and dropout detection on every channel
  • Hand pose and action smoothness: velocity-spike detection flags glitches
  • Per-feature statistics (mean, std, min, max) on state and action vectors
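
Several of the checks listed above can be sketched in a few lines. This is an illustrative version under stated assumptions, not the production gate: the function names, the failure-reason strings, the 50% regularity tolerance, and the spike factor of 10 are all hypothetical, and dropout detection and cross-modality alignment are not covered.

```python
import math
from statistics import mean, median, pstdev

def check_timestamps(ts, tol=0.5):
    """Return failure reasons for one stream's timestamps (seconds)."""
    steps = [b - a for a, b in zip(ts, ts[1:])]
    if any(s <= 0 for s in steps):
        return ["non_monotonic"]
    if steps:
        typical = median(steps)
        # Irregular if any step differs from the median step by more than tol.
        if any(abs(s - typical) > tol * typical for s in steps):
            return ["irregular_sampling"]
    return []

def check_channel(values, spike_factor=10.0):
    """Return failure reasons for one numeric channel."""
    if any(not math.isfinite(v) for v in values):
        return ["nan_or_inf"]
    deltas = [abs(b - a) for a, b in zip(values, values[1:])]
    if deltas:
        typical = median(deltas)
        # A step far larger than the typical step is treated as a glitch.
        if typical > 0 and max(deltas) > spike_factor * typical:
            return ["velocity_spike"]
    return []

def feature_stats(values):
    """Mean, population std, min, and max for one feature."""
    return {
        "mean": mean(values),
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
    }

# Made-up inputs exercising each outcome.
ok_ts = check_timestamps([0.0, 0.04, 0.08, 0.12])
gap_ts = check_timestamps([0.0, 0.04, 0.08, 0.20, 0.24])
back_ts = check_timestamps([0.0, 0.04, 0.03])
spike = check_channel([0.0, 0.1, 0.2, 5.0, 0.4, 0.5])
bad = check_channel([0.0, float("nan"), 0.2])
stats = feature_stats([1.0, 2.0, 3.0])
```

Returning a list of reason strings rather than a boolean keeps each failure classifiable per episode.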

WHAT HAPPENS ON FAILURE

  • QA failures halt the pipeline run before delivery
  • Failure reason is classified and logged per episode
  • Per-dataset QA statistics ride along with delivery
  • Deliberate failure + recovery episodes are captured by design and segmented separately

STAGE 04

WHAT YOU RECEIVE

EPISODE STRUCTURE

episode_001/
├─ video/
│ ├─ cam_ego.mp4
│ └─ cam_exo.mp4
├─ depth/
│ └─ frame_0001.png … N.png
├─ tactile.parquet
├─ kinematics.parquet
├─ metadata.json
└─ qa_report.json
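
A quick way to sanity-check a delivered episode is to confirm the folder matches the listing above before loading anything. The sketch below uses only the Python standard library. The file names come from the listing; the function name, the "at least one PNG" rule for the depth folder, and the temporary demo directory are assumptions for illustration.

```python
import tempfile
from pathlib import Path

# Files named in the episode listing above.
REQUIRED_FILES = [
    "video/cam_ego.mp4",
    "video/cam_exo.mp4",
    "tactile.parquet",
    "kinematics.parquet",
    "metadata.json",
    "qa_report.json",
]

def missing_entries(episode_dir):
    """Return relative paths from the listing that are absent on disk."""
    root = Path(episode_dir)
    missing = [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
    # Assumed rule: the depth folder should hold at least one PNG frame.
    depth = root / "depth"
    if not depth.is_dir() or not any(depth.glob("*.png")):
        missing.append("depth/*.png")
    return missing

# Demo on a deliberately incomplete episode built in a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "episode_001"
    (root / "video").mkdir(parents=True)
    (root / "video" / "cam_ego.mp4").touch()
    incomplete = missing_entries(root)
```

An empty return value means every listed entry is present; it says nothing about whether the contents are valid, which is what qa_report.json is for.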

QA REPORT

Threaded pipe fitting

EP-0847 · 42s

PASS
Modality sync       all streams < 1 frame drift
Timestamps          monotonic, regular
NaN / dropout       0 across 24 channels
Action smoothness   p99/max 0.42
Pose continuity     both hands every frame

Hardware mounting with drill

EP-0923 · 67s

REJECTED
Modality sync       tactile drift 3 frames
Timestamps          monotonic, regular
NaN / dropout       1 channel intermittent
Action smoothness   p99/max 0.38
Pose continuity     both hands every frame

FORMAT

LeRobot v3 schema (synchronized MP4 + Parquet)

GR00T N1 and Isaac Lab compatible

Camera extrinsics and calibration data included

DELIVERY

Chunked storage for efficient streaming

Delivered via bucket or direct transfer

Per-episode and per-dataset statistics

SEE IT IN YOUR PIPELINE

Request a sample episode. Drop it into your training stack. If it works, we talk.