Amazon’s head of AI argues that chasing leaderboard rankings for large language models (LLMs) is a distraction from the real goal: creating AI that solves specific business problems. Rohit Prasad, Amazon’s SVP of AGI, believes current benchmarks are unreliable and don’t reflect a model’s true utility. He says the focus should shift from theoretical performance to demonstrable real-world results.
The Problem with AI Benchmarks
The AI industry has become obsessed with benchmark scores, but these metrics are flawed. Models are trained on different datasets, and evaluations are often biased or inconsistent, which makes direct comparisons close to meaningless. Prasad argues that the only fair comparison would require every model to be trained on identical data, which is impractical. The deeper issue is that benchmarks don’t measure the value AI delivers in practical applications.
Introducing Nova Forge: Custom AI at Scale
Amazon’s response is Nova Forge, a new service allowing companies to train custom AI models without the usual massive costs. Forge gives businesses access to Amazon’s Nova model checkpoints at various stages of training. This lets them inject their own proprietary data early in the process, when the model is most receptive to learning. This approach avoids the pitfalls of fine-tuning closed models or retraining open-weight models, both of which can degrade performance.
Forge essentially democratizes advanced AI development by offering access to tools previously available only to major tech companies. Amazon built Forge internally because its teams needed this capability. The company’s pattern has always been to first solve its own problems before turning the solutions into a business.
Reddit’s Early Success with Forge
Reddit is one of the first companies using Forge to build custom safety models trained on 23 years of community moderation data. Reddit’s CTO, Chris Slowe, says the tool is “revolutionary,” enabling them to create a model that understands the nuances of their platform’s unique culture. Their goal is to replace multiple existing safety systems with a single, highly specialized model that can better enforce community rules.
Slowe admits that Nova isn’t a top-ranked model, but says that doesn’t matter. What matters is the model’s ability to perform its intended function: in this case, understanding what constitutes “jerk” behavior on Reddit.
The Future of AI: Specialization Over General Intelligence
Amazon is betting that the race for the most intelligent AI is less important than the ability to build useful AI. The company is positioning itself as the platform for businesses that need custom solutions, rather than competing directly with OpenAI and Anthropic on pure model capability. This strategy aligns with AWS’s core philosophy: providing infrastructure and tools that empower others to innovate.
Put differently, the company’s wager is that the model race has become commoditized, and that it can succeed by being the place where companies build specialized AI for specific business problems.
Ultimately, the success of this approach will depend on developer adoption. But if Amazon is right, the future of AI isn’t about who has the highest benchmark score; it’s about who can deliver real-world value.