In a bold leap toward general-purpose humanoid intelligence, robotics company Figure today unveiled Project Go-Big, a massive new initiative to train its Helix model exclusively on egocentric human video. Through a partnership with Brookfield, whose real estate portfolio spans 100,000+ residences, 500 million sq ft of commercial offices, and 160 million sq ft of logistics space, Figure aims to create the world's largest and most diverse humanoid "pretraining" dataset.
What sets Go-Big apart is its ambition: to enable zero-shot transfer from human video to robot behavior. In early results, Helix can respond to natural language commands like “Walk to the kitchen table” in cluttered home environments — without any robot-specific demonstration data.
Background & Motivation
In vision and language, model performance has scaled dramatically with large datasets such as ImageNet, WebText, and YouTube. Robotics, and humanoid robotics in particular, has lacked an analogous large-scale, diverse training corpus. Human environments are messy, varied, and full of subtle affordances: clutter to navigate, doors to open, obstacles to avoid. Traditional robot training regimes (hand-coded paths, curated motion capture, simulation-to-real pipelines) struggle to generalize to them.
Because humanoid robots share humans' basic kinematic structure (e.g. eye-height camera placement, leg geometry), Figure aims to exploit this alignment: by training on first-person ("egocentric") human video, Helix can learn navigation policies and behaviors that transfer directly, with no robot-specific demonstration or imitation data needed.
Project Go-Big thus aspires to become the "YouTube for robot behavior": a broad, real-world dataset of human activity in homes, offices, logistics facilities, and beyond.
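For readers who want a concrete picture, here is a minimal sketch of what a language-conditioned, vision-only navigation policy could look like: an egocentric camera frame plus an embedded text command in, a planar velocity command out. The class name, layer sizes, and three-value output convention are illustrative assumptions, not details of Figure's Helix model.

```python
# Hypothetical sketch only: maps (egocentric frame, command embedding) -> velocity.
# Nothing here is taken from Figure's actual Helix architecture.
import torch
import torch.nn as nn

class EgocentricNavPolicy(nn.Module):
    """Toy language-conditioned navigation policy for illustration."""

    def __init__(self, text_dim: int = 512, hidden_dim: int = 256):
        super().__init__()
        # Small CNN stand-in for whatever visual backbone a real system would use.
        self.visual = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fuse = nn.Sequential(
            nn.Linear(64 + text_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 3),  # (forward velocity, lateral velocity, yaw rate)
        )

    def forward(self, frame: torch.Tensor, command_emb: torch.Tensor) -> torch.Tensor:
        vis = self.visual(frame)                       # (B, 64)
        fused = torch.cat([vis, command_emb], dim=-1)  # (B, 64 + text_dim)
        return self.fuse(fused)                        # (B, 3) velocity command

# Example: one 224x224 frame and a placeholder embedding of "walk to the kitchen table".
policy = EgocentricNavPolicy()
frame = torch.randn(1, 3, 224, 224)
command = torch.randn(1, 512)
print(policy(frame, command).shape)  # torch.Size([1, 3])
```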
Key Features & Claims
Here are the standout elements of this announcement:
| Feature | Description / Claim |
| --- | --- |
| Egocentric human video as sole training input | Figure says Helix is trained 100% on human first-person video for navigation, with no robot demonstrations. |
| Zero-shot human-to-robot transfer for navigation | Helix can interpret natural language ("go to the fridge") and navigate cluttered, unfamiliar spaces. |
| Unified model for manipulation + navigation | The same Helix network outputs both manipulation and navigation commands, avoiding separate modules (see the sketch after this table). |
| Massive real-world environments via Brookfield | Brookfield's real estate footprint (residential, commercial, logistics) gives access to diverse physical settings for data collection. |
| Strategic & financial backing | Brookfield is not just a data source: it has also invested in Figure's Series C round, aligning incentives. |
| Scaling infrastructure beyond data | The partnership may also involve GPU data center planning, robotic training infrastructure, and deployment studies across Brookfield's properties. |
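The "unified model" row is worth unpacking: one network emitting both manipulation and navigation commands avoids stitching together separate subsystems. The fragment below is a hedged illustration of that idea, a shared trunk feeding two output heads; the head names, dimensions, and 14-joint arm assumption are for illustration only, not details of Helix.

```python
# Hedged illustration of one network producing both command types.
# Head names and dimensions are assumptions, not Figure's actual design.
import torch
import torch.nn as nn

class UnifiedPolicyHeads(nn.Module):
    """Shared trunk feeding separate navigation and manipulation heads."""

    def __init__(self, feat_dim: int = 256, arm_dof: int = 14):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
        self.nav_head = nn.Linear(feat_dim, 3)          # planar velocity (vx, vy, yaw rate)
        self.manip_head = nn.Linear(feat_dim, arm_dof)  # joint targets for the arms

    def forward(self, features: torch.Tensor) -> dict:
        h = self.trunk(features)
        return {"navigation": self.nav_head(h), "manipulation": self.manip_head(h)}

heads = UnifiedPolicyHeads()
out = heads(torch.randn(1, 256))
print(out["navigation"].shape, out["manipulation"].shape)  # (1, 3) and (1, 14)
```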
Challenges, Uncertainties & Open Questions
While ambitious, Project Go-Big faces numerous challenges and caveats:
- **Domain shift & embodiment mismatch.** Even though humans and humanoid robots have similar kinematics, differences in sensor placement, dynamics, and actuation may introduce subtle mismatches. How reliably can video-derived policies generalize across those divergences?
- **Data quality, labeling, and annotation.** Raw video carries no supervision by itself; extracting the right signals (object localization, affordances, obstacle mapping, semantics) may require heavy annotation or automated labeling pipelines (a sketch of such a pass follows this list). The quality of data curation will matter a great deal.
- **Safety, failure modes, and edge conditions.** Real-world homes contain unpredictable clutter, pets, stairs, delicate objects, and people. Navigation policies must handle failures, collisions, and unsafe states gracefully.
- **Privacy & ethics of video capture.** Capturing egocentric human video in residential and commercial spaces raises privacy, consent, and security concerns. How will Figure handle permissions, anonymization, and ethical oversight?
- **Deployment scaling & cost.** Building and maintaining a pipeline spanning video collection, storage, compute, and robot deployment is expensive. Demonstrating real-world returns (e.g. useful robot tasks) will be essential.
- **Comparative alternatives.** Other robotics teams rely more on simulation, reinforcement learning, or mixed sensing modalities (LiDAR, depth sensors, mapping). How this approach compares in robustness and scalability remains to be seen.
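To make the data-curation point concrete, here is a minimal sketch of an automated pseudo-labeling pass over raw egocentric video: off-the-shelf perception models generate weak labels (detected objects, a crude free-space estimate) without human annotators. The label schema and stub models are assumptions for illustration, not Figure's actual pipeline.

```python
# Hedged sketch of automated pseudo-labeling for raw egocentric video.
# The label schema and stub models are illustrative assumptions only.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class FrameLabel:
    frame_index: int
    objects: List[str]       # e.g. ["kitchen table", "chair"]
    free_space_ratio: float  # crude navigability signal in [0, 1]

def pseudo_label(
    frames: List[object],
    detector: Callable[[object], List[str]],
    free_space_estimator: Callable[[object], float],
) -> List[FrameLabel]:
    """Run perception models over frames to produce weak supervision without humans."""
    labels = []
    for i, frame in enumerate(frames):
        labels.append(FrameLabel(i, detector(frame), free_space_estimator(frame)))
    return labels

# Toy usage with stand-in models in place of real detectors.
frames = ["frame_0", "frame_1"]
labels = pseudo_label(
    frames,
    detector=lambda f: ["table", "chair"],
    free_space_estimator=lambda f: 0.6,
)
print(labels[0])  # FrameLabel(frame_index=0, objects=['table', 'chair'], free_space_ratio=0.6)
```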
Implications & Potential Impact
If Project Go-Big succeeds, its ramifications could reshape humanoid robotics and embodied AI:
- **Generalist robots in homes and offices.** Robots could be instructed in ordinary language ("bring me water", "go to the window") and navigate complex, cluttered environments, a big step beyond task-specific robots.
- **Reduced reliance on robot-specific data.** The ability to train from human video reduces the need for expensive robot demonstrations or specialized lab environments.
- **Acceleration of cross-domain embodied AI.** This approach might inspire similar methods for drones, mobile robots, or assistive devices that use human video for embodied learning.
- **New business models & real estate integration.** With Brookfield's real estate portfolio as a testbed, we may see robots deployed in apartments, office buildings, and logistics centers, bringing robotics closer to daily life.
- **AI infrastructure & data advantage.** The sheer scale of the dataset (spanning many buildings, layouts, and styles) could become a competitive moat for Figure, much as large language models have benefited from scale and data diversity.
The Team & Backing
The announcement credits several individuals: Joumana Kourani, Rohit Naik, AJ T., Max P., Aidan Plenn, Irizarry S., Lilah Noyes, Ciara Hurley, Rogebson Pieroni, Thomas Salovitch. These appear to be contributors across engineering, research, and operations, working alongside Figure's core leadership.
In the broader context, Figure has closed over $1 billion in Series C funding at a post-money valuation of $39 billion as part of its scaling push.