MLP-KAN: Unifying Deep Representation and Function Learning

Yunhong He, Yifeng Xie, Zhengqing Yuan, Lichao Sun

2024-10-07

Summary

This paper presents MLP-KAN, a unified method that combines Multi-Layer Perceptrons (MLPs) for representation learning and Kolmogorov-Arnold Networks (KANs) for function learning within a single Mixture-of-Experts architecture, removing the need to manually choose between the two model types.

What's the problem?

Representation learning models such as MLPs and function learning models such as KANs each excel at different kinds of tasks. As a result, practitioners currently have to inspect a dataset's characteristics and manually decide which paradigm to apply. This manual model selection requires expertise, is error-prone, and does not scale when one system must handle diverse tasks across domains.

What's the solution?

To address this issue, the authors developed MLP-KAN, which places MLP experts (for representation learning) and KAN experts (for function learning) side by side in a Mixture-of-Experts architecture embedded within a transformer-based framework. A routing mechanism dynamically dispatches each input to the experts best suited to it, so the model adapts to the characteristics of the task at hand without any manual intervention. Evaluated on four widely used datasets from diverse domains, MLP-KAN delivers competitive performance on both deep representation and function learning tasks.
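The Mixture-of-Experts idea above can be sketched in a few lines. This is a hedged toy illustration, not the paper's implementation: the expert and gate weights are random, the "KAN" expert uses fixed radial basis functions as a stand-in for the learnable spline edge functions of a real KAN, and all function names (`mlp_expert`, `kan_expert`, `moe_forward`) are invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_expert(x, W1, W2):
    """Tiny MLP expert: fixed ReLU activations, learned linear weights."""
    return np.maximum(x @ W1, 0.0) @ W2

def kan_expert(x, coef, centers, width=1.0):
    """KAN-style expert: each output is a learned combination of
    univariate basis functions of the inputs (a stand-in for the
    learnable splines on the edges of a real KAN)."""
    # basis shape: (batch, in_dim, n_centers)
    basis = np.exp(-((x[:, :, None] - centers) / width) ** 2)
    return np.einsum('bic,ioc->bo', basis, coef)

def moe_forward(x, gate_W, experts):
    """Soft mixture-of-experts: a softmax gate weights each expert's output."""
    logits = x @ gate_W                        # (batch, n_experts)
    logits -= logits.max(axis=1, keepdims=True)
    gates = np.exp(logits)
    gates /= gates.sum(axis=1, keepdims=True)
    outs = np.stack([f(x) for f in experts], axis=1)  # (batch, n_experts, out)
    return (gates[:, :, None] * outs).sum(axis=1)

in_dim, hid, out_dim, n_centers = 4, 8, 3, 5
W1 = rng.normal(size=(in_dim, hid))
W2 = rng.normal(size=(hid, out_dim))
coef = rng.normal(size=(in_dim, out_dim, n_centers))
centers = np.linspace(-2.0, 2.0, n_centers)
gate_W = rng.normal(size=(in_dim, 2))

x = rng.normal(size=(6, in_dim))
experts = [lambda z: mlp_expert(z, W1, W2),
           lambda z: kan_expert(z, coef, centers)]
y = moe_forward(x, gate_W, experts)
print(y.shape)  # (6, 3)
```

In training, the gate would learn to send representation-heavy inputs to the MLP experts and function-approximation inputs to the KAN experts, which is the adaptivity the paper attributes to MLP-KAN.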

Why it matters?

This research is important because it removes a manual step from the machine learning workflow: instead of deciding in advance whether a task calls for a representation learning model or a function learning model, practitioners can use one architecture that adapts on its own. By unifying the two paradigms, MLP-KAN simplifies model selection and offers a single, adaptable solution that works across diverse domains.

Abstract

Recent advancements in both representation learning and function learning have demonstrated substantial promise across diverse domains of artificial intelligence. However, the effective integration of these paradigms poses a significant challenge, particularly in cases where users must manually decide whether to apply a representation learning or function learning model based on dataset characteristics. To address this issue, we introduce MLP-KAN, a unified method designed to eliminate the need for manual model selection. By integrating Multi-Layer Perceptrons (MLPs) for representation learning and Kolmogorov-Arnold Networks (KANs) for function learning within a Mixture-of-Experts (MoE) architecture, MLP-KAN dynamically adapts to the specific characteristics of the task at hand, ensuring optimal performance. Embedded within a transformer-based framework, our work achieves remarkable results on four widely-used datasets across diverse domains. Extensive experimental evaluation demonstrates its superior versatility, delivering competitive performance across both deep representation and function learning tasks. These findings highlight the potential of MLP-KAN to simplify the model selection process, offering a comprehensive, adaptable solution across various domains. Our code and weights are available at https://github.com/DLYuanGod/MLP-KAN.
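For context on the KAN side of the architecture, KANs are motivated by the Kolmogorov-Arnold representation theorem, which states that any continuous multivariate function can be written as f(x) = Σ_q Φ_q(Σ_p φ_{q,p}(x_p)): outer univariate functions applied to sums of inner univariate functions. A minimal sketch of that structure, where every function is an illustrative placeholder (a real KAN learns the φ's as splines):

```python
import math

def ka_form(x, inner, outer):
    """Kolmogorov-Arnold form: f(x) = sum_q outer[q](sum_p inner[q][p](x[p])).
    The inner/outer functions here are fixed toy choices, not learned splines."""
    return sum(outer[q](sum(inner[q][p](x[p]) for p in range(len(x))))
               for q in range(len(outer)))

# Two outer terms, two inputs; all functions are arbitrary illustrations.
inner = [[math.sin, math.cos], [math.tanh, abs]]
outer = [lambda t: t ** 2, math.exp]
print(ka_form([0.3, -0.7], inner, outer))
```

Replacing these fixed univariate functions with learnable splines, and stacking such layers, yields the KAN experts that MLP-KAN mixes with MLP experts.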