Everyone’s Hyped About Sora 2 — So We Went Back to Veo 3 to See Where AI Video Stands Today
- Qiuyan Xu
- Oct 6, 2025
Everyone’s talking about Sora 2 — the upcoming leap in AI video generation promising longer, more coherent, and physics-grounded scenes. While waiting to test it ourselves, we decided to look back — and experiment with what’s already here: Veo 3.
At Gravitate AI, we turned one photo from a recent team-building day at a local climbing gym into two short Veo 3 clips. No corporate talks — only trust, focus, and fun. But what the AI produced told us something bigger about how machines see and imagine the world.
🎥 Version 1 — Synchronized Motion
Everyone climbs at once — cinematic, energetic, perfectly smooth. It feels coordinated, but almost too perfect — a scene of teamwork rather than a moment within it.
🎥 Version 2 — Sequential Motion
We added nuance to the prompt:
“Don’t start with everyone climbing. Let the left climber go first, another adjust, another pause.”
The pacing became more human, yet the AI made bold choices:
🧩 It invented a new climber on the front left.
🧩 It created a hold near the top that didn’t exist.
🧠 Why This Happens
Veo 3 doesn’t really know what’s real. It appears to build a 3D illusion from the photo, then animate statistically plausible motion. So when the prompt asked for the left climber to go first, it conjured one on the front left to satisfy the instruction.
The model followed semantic logic, not physical logic. It satisfied the story — not the physics.
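One rough way to catch this kind of invention is to compare how many subjects appear in each generated frame against the source photo. The sketch below assumes the per-frame counts come from somewhere else (for example, a person detector run on each frame); the function name `flag_invented_subjects` and the numbers are illustrative only, not measurements from our clips.

```python
from typing import List


def flag_invented_subjects(source_count: int, frame_counts: List[int]) -> List[int]:
    """Return the indices of frames that show more subjects than the source photo.

    source_count: number of people visible in the original photo.
    frame_counts: number of people counted in each generated frame
                  (assumed to come from a separate detector, not shown here).
    """
    return [i for i, n in enumerate(frame_counts) if n > source_count]


# Made-up example: a photo with 4 climbers and a clip where a 5th appears mid-way.
example_counts = [4, 4, 5, 5, 4]
print(flag_invented_subjects(source_count=4, frame_counts=example_counts))
```

A check like this only catches extra people. An invented hold or drifting wall geometry would need a comparison against the scene itself, which is a much harder problem.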
⚖️ What This Reveals
Aspect | Current Strength | Limitation
Cinematography | Smooth camera motion | Lacks subject focus
Human Motion | Plausible gestures | No true contact realism
Scene Continuity | Stable tone | Geometry drifts
Prompt Control | Understands sequence words | Over-literal meaning
AI video sits between image synthesis and simulation. It can create believable motion — but not yet consistent worlds.
🔭 Looking Ahead — and Toward Sora 2
If Sora 2 delivers what OpenAI has previewed — stronger temporal coherence, physics-aware environments, and multi-scene continuity — it could mark the moment when AI video begins to reason about space and time.
We’re excited to test it when available — not for spectacle, but to benchmark how close AI video is getting to true spatial understanding.
🧗 Closing Thought
Climbing — like building AI — is about testing every hold. Some are solid, some imagined. Each step upward reveals what’s real.

