RESEARCH
The field has converged on human video as the foundation for robot learning. The remaining question is quality.
THE CONVERGENCE ON HUMAN VIDEO
Every major robotics lab that published results in 2025-2026 is using egocentric human video as the pretraining foundation layer before touching robot demonstrations. NVIDIA discovered a log-linear scaling law. Physical Intelligence found that at sufficient scale, human hands and robot grippers converge to the same internal representation. Multiple companies have demonstrated that human video alone can bootstrap manipulation policies.
The question has shifted from “should we use human video?” to “how much, how to bridge the embodiment gap, and how to combine it with targeted robot data?” That shift is a consensus, not a trend.
But vision-only capture misses half the story. In contact-rich benchmarks, vision-only systems average a 21% success rate on tasks like insertion and assembly. Adding tactile feedback pushes that to 71%. For the manipulation tasks that matter most — force-sensitive, bimanual, contact-rich — the data must include what cameras cannot see.
SCALING EGOCENTRIC VIDEO FOR ROBOT LEARNING
How large-scale human video trains better robot policies
EgoScale: Scaling Dexterous Manipulation with Diverse Egocentric Human Data
NVIDIA GEAR Lab
First published scaling law for human video in robotics. 20,854 hours of egocentric video. Log-linear relationship between data volume and policy performance (R²=0.9983). Two-stage transfer recipe: massive human pretraining followed by lightweight robot mid-training.
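A log-linear scaling law means policy performance grows linearly with the logarithm of pretraining hours. The sketch below fits such a trend with NumPy; the hours and success rates are illustrative placeholders, not figures from EgoScale, and only the R² computation mirrors how such a fit would be evaluated.

```python
import numpy as np

# Hypothetical (pretraining hours, success rate) pairs illustrating a
# log-linear trend; these numbers are placeholders, not EgoScale results.
hours = np.array([100, 500, 2_000, 8_000, 20_000], dtype=float)
success = np.array([0.22, 0.35, 0.46, 0.57, 0.64])

# Fit: success = a * log10(hours) + b
a, b = np.polyfit(np.log10(hours), success, deg=1)

# Goodness of fit (R^2) on the same points
pred = a * np.log10(hours) + b
ss_res = np.sum((success - pred) ** 2)
ss_tot = np.sum((success - success.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"slope={a:.3f}, intercept={b:.3f}, R^2={r2:.4f}")
```

A high R² on log-transformed data is what "log-linear scaling" cashes out to: each multiplicative increase in data buys a roughly constant additive gain in performance.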
In-N-On: Scaling Egocentric Manipulation with In-the-Wild and On-Task Data
Scales egocentric manipulation by curating 1,000+ hours into in-the-wild and on-task datasets. Trains Human0 — a language-conditioned flow matching policy with domain adaptation for few-shot learning and robust humanoid transfer.
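A flow matching policy learns a velocity field that transports noise samples to demonstrated actions. The sketch below shows only the conditional flow-matching regression target in NumPy; the dimensions and the random linear "network" are hypothetical stand-ins, not Human0's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
action_dim = 7   # illustrative action size, e.g. end-effector pose + gripper
batch = 32

# Toy stand-ins: demonstrated actions (x1) and Gaussian noise sources (x0)
x1 = rng.normal(size=(batch, action_dim))
x0 = rng.normal(size=(batch, action_dim))
t = rng.uniform(size=(batch, 1))             # interpolation times in [0, 1]

# Linear interpolation path between noise and action,
# whose velocity along the path is constant: x1 - x0.
x_t = (1 - t) * x0 + t * x1
v_target = x1 - x0

# Hypothetical velocity model: a random linear map over (x_t, t),
# standing in for a language-conditioned network.
W = rng.normal(scale=0.1, size=(action_dim + 1, action_dim))
v_pred = np.concatenate([x_t, t], axis=1) @ W

# Flow-matching loss: regress predicted velocity onto the target velocity.
loss = np.mean((v_pred - v_target) ** 2)
print(f"flow-matching loss: {loss:.3f}")
```

Training minimizes this loss over many (noise, action, time) samples; at inference, integrating the learned velocity field from noise yields an action.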
Masquerade: Learning from In-the-Wild Human Videos using Data-Editing
Closes the visual embodiment gap by inpainting human arms with rendered robot overlays. Co-training on edited human video achieves 5-6x better zero-shot generalization in novel environments with only 50 robot demos per task.
EgoMimic: Scaling Imitation Learning via Egocentric Video
Full-stack framework pairing egocentric human video with 3D hand tracking via Project Aria. Treats human and robot data equally, co-training a unified policy that substantially improves long-horizon task performance.
EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
VLA model leveraging massive egocentric human video to address data scarcity. Maps human demonstrations to a unified action space using the MANO hand model. Enables zero-shot generalization and humanoid transfer with minimal fine-tuning.
CROSSING THE EMBODIMENT GAP
Transferring human manipulation skills to robot hardware
Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration
HUMAN2SIM2ROBOT framework trains robust dexterous manipulation policies from a single human RGB-D video. Extracts object 6D pose trajectory and pre-manipulation hand pose to guide robot configuration in a digital twin, enabling zero-shot real-world deployment.
HERMES: Human-to-Robot Embodied Learning from Multi-Source Motion Data
Four-stage pipeline for mobile robots with dual dexterous hands. Integrates learning from teleoperation, mocap, and unstructured raw video with vision-based sim-to-real transfer and closed-loop navigation.
UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations
Learns universal, embodiment-agnostic skill representations from massive datasets of unaligned human and robot videos. Inverse and forward skill dynamics models allow robots to imitate complex compositional behaviors without paired demonstrations.

VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos
Extracts temporally consistent 3D hand trajectories from monocular RGB-only video using depth models and structure-from-motion. Coarse-to-fine affordance learning enables zero-shot deployment across novel scenes and robot embodiments.
Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids
Practical recipe for multi-fingered bimanual manipulation using sim-to-real RL. Overcomes environment modeling gaps through autotuned robot modeling, hybrid object representations, and generalizable contact-goal reward designs.
THE MISSING MODALITY: TOUCH
Why vision alone is insufficient for contact-rich manipulation
OpenTouch: Bringing Full-Hand Touch to Real-World Interaction
First in-the-wild egocentric full-hand tactile dataset. 5.1 hours of synchronized video, 3D hand pose, and dense touch signals captured with an open-source tactile sensing glove. Establishes benchmarks for cross-sensory retrieval and demonstrates how multimodal touch data grounds robotic perception.
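Cross-sensory retrieval asks: given a video clip, find the matching tactile segment (or vice versa) in a shared embedding space. The sketch below uses random vectors as stand-ins for trained encoder outputs and scores retrieval by cosine similarity; the sizes and the noisy-copy construction are illustrative assumptions, not OpenTouch's protocol.

```python
import numpy as np

rng = np.random.default_rng(1)
n_clips, dim = 100, 64   # illustrative dataset size and embedding width

# Stand-in embeddings: in practice these come from trained video and
# tactile encoders projected into a shared space.
video_emb = rng.normal(size=(n_clips, dim))
# Make each tactile embedding a noisy copy of its paired video embedding.
touch_emb = video_emb + 0.1 * rng.normal(size=(n_clips, dim))

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Cosine similarity between every video clip and every tactile segment
sims = normalize(video_emb) @ normalize(touch_emb).T

# Retrieval: for each video clip, take the best-matching tactile segment
nearest = sims.argmax(axis=1)
recall_at_1 = np.mean(nearest == np.arange(n_clips))
print(f"recall@1: {recall_at_1:.2f}")
```

Recall@1, the fraction of queries whose top match is the true pair, is a standard metric for this kind of benchmark.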
MAPLE: Encoding Dexterous Robotic Manipulation Priors from Egocentric Videos
Learns manipulation priors from large-scale egocentric video to predict object contact points and hand poses at the moment of contact. Significantly improves sample efficiency for downstream dexterous manipulation in simulation and real-world settings.