Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations
A. Bochkov
2025-07-11
Summary
This paper shows that transformer language models using fixed visual symbols instead of trainable, meaning-laden embeddings can actually perform better on reasoning tasks.
What's the problem?
Conventional wisdom holds that models must learn rich semantic meanings in their input embeddings to understand and reason well, yet training these embeddings adds cost and complexity.
What's the solution?
The researchers showed that with frozen (fixed) visual Unicode-based token embeddings carrying no explicit semantic meaning, models can still develop strong understanding and reasoning abilities purely from their architecture and training process.
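The core idea can be sketched as an embedding table whose vectors are fixed before training and never updated. This is a minimal, hypothetical illustration, not the paper's implementation: the paper derives embeddings from visual renderings of Unicode glyphs, while this sketch substitutes a deterministic stand-in (the codepoint's bit pattern), since rasterizing real glyphs would require font assets. The key property is preserved: the vectors are non-semantic and frozen.

```python
def frozen_visual_embedding(ch: str, dim: int = 32) -> list[float]:
    """Deterministic, non-trainable vector for one character.

    Stand-in for a glyph bitmap: the bit pattern of the Unicode
    codepoint. A real pipeline would render the glyph with a font
    rasterizer and flatten the pixels instead.
    """
    cp = ord(ch)
    return [float((cp >> i) & 1) for i in range(dim)]

# Build a fixed embedding table for a tiny vocabulary.
vocab = "abc"
table = {c: frozen_visual_embedding(c) for c in vocab}
# During training, only the transformer weights above this table
# are updated; the table itself stays frozen.
```

Because the table is deterministic, any semantic structure the model exhibits must emerge in the transformer layers above it, which is the paper's central point.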
Why does it matter?
This finding challenges the assumption that input embeddings must be semantic for good performance, suggesting that a transformer's power comes chiefly from its architecture and training process, which could simplify future AI designs.
Abstract
Transformer models with fixed, non-semantic visual embeddings outperform those with trainable semantic embeddings on reasoning tasks, suggesting that high-level semantics emerge from the model's architecture rather than the embeddings.