PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference
Burc Gokden
2025-02-24
Summary
This paper shows that PLDR-LLM, a large language model built from power law decoder representations, learns a generalizable tensor operator whose outputs are so stable that the once-inferred operator can replace the model's own deep neural network at inference time.
What's the problem?
Standard large language models must run their full attention deep neural network at every inference step, which is costly. It has been unclear whether a model could instead learn a fixed, generalizable operator that captures this computation well enough for the network producing it to be bypassed at inference.
What's the solution?
The researcher shows that PLDR-LLM learns a singularity condition for its deductive outputs, so the energy-curvature tensor G_{LM}, once inferred, can replace the deep neural network of the power law graph attention (PLGA) module that generates those outputs. A cache for G_{LM} (G-cache) is implemented alongside the usual KV-cache to improve inference time, with cached outputs matching the originals to 15 decimal places in RMSE and determinant values and zero-shot benchmark scores left unchanged.
Why it matters?
This matters because it reveals an asymmetry between training and inference that can be exploited for speed: once G_{LM} is cached, part of the network no longer needs to run. It also shows that an LLM with scaled dot-product attention (SDPA) is a special case of PLDR-LLM where G_{LM} is the identity, placing standard transformers within a broader family of models. The author provides a training and inference framework implementing both KV-cache and G-cache.
Abstract
We show that the Large Language Model from Power Law Decoder Representations (PLDR-LLM) is a foundational model whose deductive outputs are invariant tensors up to a small perturbation. PLDR-LLM learns a singularity condition for the deductive outputs that enables the once-inferred energy-curvature tensor G_{LM} to replace the deep neural network of power law graph attention (PLGA) that generates the deductive outputs at inference. We demonstrate that a cache for G_{LM} (G-cache) and a KV-cache can be implemented in a straightforward manner to improve inference time. The invariance and generalizable nature of the deductive outputs is of very high fidelity: deductive outputs have the same RMSE and determinant values up to 15 decimal places after caching, and zero-shot benchmark scores remain unchanged. Ablation studies show that learned deductive outputs have distinct loss and accuracy characteristics from models pretrained with transferred, randomly initialized, or identity tensors as a constant tensor operator, and that an LLM with scaled dot-product attention (SDPA) is a special case of PLDR-LLM where G_{LM} is predefined as identity. The observed invariance characteristic introduces a novel asymmetry between the training and inference phases with caching. We outline observed common characteristics of the deductive outputs for the learned singularity condition. We provide an implementation of a training and inference framework for PLDR-LLM with KV-cache and G-cache.
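The core mechanism can be illustrated with a minimal sketch of the G-cache idea: a deep network produces the tensor operator once, the result is cached, and subsequent inference steps reuse the cached operator instead of re-running the network. This is an illustrative toy, not the paper's implementation; `deep_net`, `GCache`, and the 2x2 matrix stand in for the PLGA network and G_{LM}.

```python
# Illustrative sketch of a G-cache (assumed names; not the paper's code).
# The operator G_LM is produced by a deep net once, then the cached value
# replaces the net for all later inference calls.

def deep_net(x):
    """Stand-in for the PLGA deep neural net that infers G_LM.
    Here it just builds a 2x2 diagonal matrix from the input sum."""
    s = sum(x)
    return [[s, 0.0], [0.0, s]]

class GCache:
    def __init__(self):
        self._g = None  # cached G_LM, empty until first inference

    def get(self, x):
        if self._g is None:       # first call: run the deep net
            self._g = deep_net(x)
        return self._g            # later calls: skip the net, reuse G_LM

cache = GCache()
g1 = cache.get([1.0, 2.0])       # deep net runs, G_LM is cached
g2 = cache.get([9.0, 9.0])       # deep net is bypassed entirely
print(g1 == g2)                  # True: same cached operator
```

The paper's central claim is that this replacement is safe because the learned deductive outputs are invariant up to a small perturbation; in this toy, invariance is trivially exact since the cached matrix is reused verbatim.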