I Tested 12 AI Models on the Same Video. The Results Were Wild.
Native video, frame-based, open-weight, proprietary. One hallucinated. Four failed. Here's the full breakdown.
Most AI projects don't fail because of the model. They fail at the messy middle: data pipelines, production infrastructure, organizational alignment. I've spent 15+ years working in that middle, and I write about what I've learned.
Currently at Airbnb. Previously at Capital One and Walmart Global Tech. MBA from NYU Stern. MS in Analytics from Georgia Tech. Adjunct faculty at Northeastern University.
Building workflows that make AI agents genuinely useful in day-to-day work.
Lessons from building pipelines, platforms, and ML systems at scale.
What it actually takes to move from prototype to reliable, running systems.
Notes on building products, finding ideas, and the lessons from YC and beyond.
Navigating tech careers, staying sharp, and thinking clearly under uncertainty.
How non-technical leaders can evaluate, adopt, and invest in AI with confidence.
It's not "how good is it?" It's "what can I attempt now that I couldn't before?"
I ran one AI task through three harness architectures: solo agent, generator + evaluator, and full planner pipeline. Here's what broke and what worked.