
Transformers meet Neural Algorithmic Reasoners

Wilfried Bounsi, Borja Ibarz, Andrew Dudzik, Jessica B. Hamrick, Larisa Markeeva, Alex Vitvitskyi, Razvan Pascanu, Petar Veličković

2024-06-14


Summary

This paper presents a new model called TransNAR, which combines the strengths of Transformers, known for understanding language, with neural algorithmic reasoners (NARs), graph neural networks trained to execute classical algorithms reliably. This hybrid approach aims to improve how AI reasons about tasks that require precise, step-by-step computation.

What's the problem?

While Transformers have become very effective at understanding and generating human language, they remain fragile on tasks that demand precise, robust algorithmic reasoning. This is a problem because many real-world applications need AI to combine language understanding with exact computation.

What's the solution?

To solve this problem, the authors developed TransNAR, a hybrid architecture that merges the capabilities of Transformers with those of GNN-based NARs. The language model gains access to the NAR's algorithmic reasoning by letting its tokens cross-attend to the node embeddings produced by the NAR, and the combined model is trained with a two-phase procedure. The researchers evaluated TransNAR on a benchmark called CLRS-Text and found that it performed significantly better than Transformer-only models on algorithmic reasoning tasks. A rough sketch of the cross-attention mechanism is given below.
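To make the idea concrete, here is a minimal PyTorch sketch of how one hybrid block could let Transformer token states cross-attend to NAR node embeddings. This is not the authors' implementation; the class name, layer sizes, and normalization choices are illustrative assumptions.

```python
# Hypothetical sketch of a TransNAR-style block (not the paper's code):
# language tokens cross-attend to node embeddings produced by a graph-based
# NAR, which is assumed to be pre-trained and kept frozen.
import torch
import torch.nn as nn

class TransNARBlock(nn.Module):
    def __init__(self, d_model: int, d_nar: int, n_heads: int = 8):
        super().__init__()
        # Standard self-attention over the token sequence.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Cross-attention: queries come from token states,
        # keys/values come from the NAR's node embeddings (dimension d_nar).
        self.cross_attn = nn.MultiheadAttention(
            d_model, n_heads, kdim=d_nar, vdim=d_nar, batch_first=True
        )
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, tokens: torch.Tensor, nar_nodes: torch.Tensor) -> torch.Tensor:
        # tokens:    (batch, seq_len, d_model)  -- Transformer token states
        # nar_nodes: (batch, n_nodes, d_nar)    -- node embeddings from the NAR
        t = self.norm1(tokens)
        h = tokens + self.self_attn(t, t, t)[0]
        # Tokens query the NAR's node embeddings.
        h = h + self.cross_attn(self.norm2(h), nar_nodes, nar_nodes)[0]
        return h + self.ffn(self.norm3(h))
```

In the paper's two-phase setup, the NAR is first trained on graph-form algorithmic tasks; only afterwards are cross-attention layers like the one sketched above trained to fuse its node embeddings into the language model.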

Why it matters?

This research is important because it shows how combining different types of AI can lead to better performance in solving complex problems. By integrating language understanding with robust reasoning, TransNAR could enhance applications in fields like education, finance, and robotics, where both language skills and precise calculations are essential.

Abstract

Transformers have revolutionized machine learning with their simple yet effective architecture. Pre-training Transformers on massive text datasets from the Internet has led to unmatched generalization for natural language understanding (NLU) tasks. However, such language models remain fragile when tasked with algorithmic forms of reasoning, where computations must be precise and robust. To address this limitation, we propose a novel approach that combines the Transformer's language understanding with the robustness of graph neural network (GNN)-based neural algorithmic reasoners (NARs). Such NARs proved effective as generic solvers for algorithmic tasks, when specified in graph form. To make their embeddings accessible to a Transformer, we propose a hybrid architecture with a two-phase training procedure, allowing the tokens in the language model to cross-attend to the node embeddings from the NAR. We evaluate our resulting TransNAR model on CLRS-Text, the text-based version of the CLRS-30 benchmark, and demonstrate significant gains over Transformer-only models for algorithmic reasoning, both in and out of distribution.