Reinforcement learning (RL) has become the dominant paradigm for fine-tuning large language models. It has enabled major advances in alignment and reasoning, but it also comes with significant tradeoffs. RL post-training requires complex reward pipelines, careful hyperparameter tuning, and substantial computational infrastructure. As models scale, these systems can become expensive to run, sensitive to instability, and prone to unintended behaviors such as reward hacking.

A few months ago, Cognizant AI Lab published groundbreaking research that challenged that dominance and quickly gained widespread attention. In that work, we introduced the first successful use of Evolution Strategies (ES) to fine-tune the full parameter set of large language models without backpropagation. ES was able to efficiently search over billions of parameters and outperform state-of-the-art reinforcement learning methods. It demonstrated:

Stronger sample efficiency

Greater tolerance to long-horizon rewards

Improved robustness across base models

More stable performance across runs

Significantly lower training cost
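
For readers new to the idea, the sketch below shows what gradient-free optimization looks like in its simplest form. It is a generic, textbook-style ES loop on a three-number toy problem, not the method or code from our paper: the function names (`es_step`, `toy_reward`, `run_es`), the target vector, and the hyperparameters are all illustrative assumptions. The point is only that the update uses reward scores from perturbed parameters and never calls backpropagation.

```python
import random

# Hypothetical toy objective: reward is highest when params match TARGET.
TARGET = [1.0, -2.0, 0.5]

def toy_reward(params):
    """Negative squared distance to TARGET (higher is better)."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

def es_step(params, reward_fn, rng, population=50, sigma=0.1, lr=0.01):
    """One ES update: perturb the parameters, score each perturbation,
    and move toward the perturbations that scored above average."""
    noises, rewards = [], []
    for _ in range(population):
        eps = [rng.gauss(0.0, 1.0) for _ in params]
        candidate = [p + sigma * e for p, e in zip(params, eps)]
        noises.append(eps)
        rewards.append(reward_fn(candidate))

    # Normalize rewards so the update does not depend on the reward scale.
    mean = sum(rewards) / population
    std = (sum((r - mean) ** 2 for r in rewards) / population) ** 0.5 or 1.0
    advantages = [(r - mean) / std for r in rewards]

    # Reward-weighted average of the noise; no gradients are computed.
    updated = []
    for j, p in enumerate(params):
        direction = sum(a * eps[j] for a, eps in zip(advantages, noises))
        updated.append(p + lr * direction / (population * sigma))
    return updated

def run_es(steps=200, seed=0):
    """Run the toy ES loop from an all-zeros starting point."""
    rng = random.Random(seed)
    params = [0.0, 0.0, 0.0]
    for _ in range(steps):
        params = es_step(params, toy_reward, rng)
    return params

if __name__ == "__main__":
    tuned = run_es()
    print("start reward:", toy_reward([0.0, 0.0, 0.0]))
    print("final reward:", toy_reward(tuned))
```

The only signal the loop needs from the task is a scalar score per candidate, which is why this family of methods can work with rewards that are hard to differentiate through.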

But the deeper impact of that breakthrough was not simply that ES worked at scale. It expanded the scope of what fine-tuning could target and where it could operate.

If large models can be adapted without gradients, what new kinds of capabilities and objectives become reachable? And if optimization is no longer tied to backpropagation, how far can fine-tuning extend across more complex tasks, different alignment goals, and even new hardware constraints?

Those questions directly shaped the next phase of our research.

Today, we are launching the second phase of our ES fine-tuning work: four new papers that each explore a distinct research direction made possible by the original breakthrough.

These directions span:

Expanding ES to more complex and structured reasoning domains

Improving metacognitive alignment in language models

Enabling fine-tuning directly in quantized, low-precision environments

Building a theoretical foundation for ES scalability in high-dimensional systems

Together, they demonstrate that ES fine-tuning is not a single breakthrough result, but an expanding research trajectory with growing practical applications and deeper scientific implications.