Fortytwo: Swarm Inference with Peer-Ranked Consensus

Vladyslav Larin, Ihor Naumenko, Aleksei Ivashov, Ivan Nikitin, Alexander Firsov

2025-10-30


Summary

This paper introduces Fortytwo, a protocol for running AI inference that doesn't rely on a single large model on one super-powerful computer. Instead, a network of many smaller AI 'nodes' works together to produce better answers.

What's the problem?

Current AI systems are hitting a limit in how much they can improve just by getting bigger and using more computing power. Also, relying on a single, central AI creates a bottleneck and makes it hard for everyone to access high-quality AI services. It's like everyone trying to get through one doorway at once – it gets crowded and slow.

What's the solution?

Fortytwo uses a 'swarm intelligence' approach, similar to how bees work together in a hive. Each AI node gives its own answer to a question, and the nodes then compare the answers in pairs to rank them. Nodes that prove accurate over time gain 'reputation', and their judgments are given more weight. The system also discourages cheating through many fake nodes: to take part in ranking, a node must pass calibration tests that prove its abilities and put its reputation at stake. It's like a peer-review system for AI, where the best answers rise to the top.
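
The pairwise comparison and reputation idea described above can be illustrated with a small sketch. This is not the paper's protocol: the function names, the signed tally, and the update rule with its `lr` parameter are all assumptions chosen for illustration. Each judgment is a (judge, winner, loser) triple, and a judge's vote counts in proportion to its current reputation.

```python
from collections import defaultdict

def rank_answers(judgments, reputation):
    """Rank answers by a reputation-weighted tally of pairwise judgments.

    judgments: list of (judge, winner, loser) triples (hypothetical format).
    reputation: dict mapping judge name to a weight in [0, 1].
    """
    score = defaultdict(float)
    for judge, winner, loser in judgments:
        weight = reputation.get(judge, 0.0)  # unknown judges carry no weight
        score[winner] += weight
        score[loser] -= weight
    return sorted(score, key=score.get, reverse=True)

def update_reputation(reputation, judgments, best, lr=0.1):
    """Assumed update rule: move a judge's reputation toward 1 when it
    preferred the consensus-best answer, toward 0 when it ranked it lower.
    Judgments that do not involve the best answer leave reputation unchanged."""
    updated = dict(reputation)
    for judge, winner, loser in judgments:
        if best in (winner, loser):
            target = 1.0 if winner == best else 0.0
            updated[judge] = (1 - lr) * updated.get(judge, 0.0) + lr * target
    return updated

reputation = {"a": 0.9, "b": 0.5, "c": 0.1}
judgments = [
    ("a", "ans1", "ans2"),
    ("b", "ans1", "ans3"),
    ("c", "ans3", "ans1"),
    ("b", "ans2", "ans3"),
]
ranking = rank_answers(judgments, reputation)
new_reputation = update_reputation(reputation, judgments, best=ranking[0])
```

In this toy round the low-reputation judge "c" disagrees with the two higher-reputation judges, so its vote barely moves the tally and its reputation drops slightly afterwards.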

Why does it matter?

This research is important because it paves the way for decentralized AI systems. This means more people can access powerful AI without needing massive, expensive computers. It also makes the system more robust and secure, as it's harder to disrupt a network of many nodes than a single central point. Ultimately, it aims to democratize access to high-quality AI and make it more reliable.

Abstract

As centralized AI hits compute ceilings and diminishing returns from ever-larger training runs, meeting demand requires an inference layer that scales horizontally in both capacity and capability. We present Fortytwo, a novel protocol that leverages swarm intelligence principles and distributed pairwise ranking consensus to achieve superior performance in AI inference. Our approach reimagines collaboration among AI nodes using swarm inference: a peer-ranked, reputation-weighted consensus across heterogeneous models that surfaces the highest-quality responses. Using pairwise ranking with a custom Bradley-Terry-style aggregation model, we demonstrate that swarm inference substantially outperforms majority voting, achieving 85.90% on GPQA Diamond versus 68.69% for majority voting with the same model set - an improvement of +17.21 percentage points (approximately +25.1% relative). The protocol incorporates on-chain reputation so node influence adapts to demonstrated accuracy over time, yielding a meritocratic consensus that filters low-quality or malicious participants. To resist Sybil attacks, Fortytwo employs proof-of-capability in its consensus: nodes must successfully complete calibration/test requests and stake reputation to enter ranking rounds, making multi-identity attacks economically unattractive while preserving openness. Across six challenging benchmarks, including GPQA Diamond, LiveCodeBench, and AIME, our evaluation indicates higher accuracy and strong resilience to adversarial and noisy free-form prompting (e.g., prompt-injection degradation of only 0.12% versus 6.20% for a monolithic single-model baseline), while retaining practical deployability. Together, these results establish a foundation for decentralized AI systems - democratizing access to high-quality inference through collective intelligence without sacrificing reliability or security.
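
The abstract mentions a custom Bradley-Terry-style aggregation but does not specify it. As a rough illustration only, the sketch below fits the textbook Bradley-Terry model to weighted pairwise outcomes using the standard minorization-maximization update. The function name, the `prior` pseudo-count, and the use of a per-comparison weight as a stand-in for judge reputation are assumptions, not details from the paper. The last lines recompute the reported GPQA Diamond improvement from the two accuracies quoted in the abstract.

```python
def bradley_terry(n_items, comparisons, iters=200, prior=0.01):
    """Estimate Bradley-Terry strengths from weighted pairwise outcomes.

    comparisons: list of (winner_index, loser_index, weight) triples; the
    weight is an assumed stand-in for the judging node's reputation.
    prior: small pseudo-count so that no item has zero wins or zero losses.
    """
    # wins[i][j] holds the weighted number of times item i beat item j.
    wins = [[prior] * n_items for _ in range(n_items)]
    for winner, loser, weight in comparisons:
        wins[winner][loser] += weight

    p = [1.0 / n_items] * n_items
    for _ in range(iters):
        new_p = []
        for i in range(n_items):
            others = [j for j in range(n_items) if j != i]
            total_wins = sum(wins[i][j] for j in others)
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j]) for j in others)
            new_p.append(total_wins / denom)
        norm = sum(new_p)
        p = [x / norm for x in new_p]  # keep strengths summing to 1
    return p

# Toy data: three candidate responses, five weighted judgments.
comparisons = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (0, 1, 0.5), (2, 1, 0.2)]
strengths = bradley_terry(3, comparisons)
best = max(range(3), key=lambda i: strengths[i])

# Arithmetic check of the figures quoted in the abstract.
swarm_acc, majority_acc = 85.90, 68.69
abs_gain = round(swarm_acc - majority_acc, 2)                      # percentage points
rel_gain = round((swarm_acc - majority_acc) / majority_acc * 100, 1)  # percent
```

Unlike majority voting, which only counts how often each final answer appears, this kind of model uses every head-to-head judgment, so a response that few nodes produced can still rank first if it consistently wins its comparisons.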
