From CISC to RISC: language-model guided assembly transpilation
Ahmed Heakl, Chaimaa Abi, Rania Hossam, Abdulrahman Mahmoud
2024-11-26

Summary
This paper introduces CRT, a lightweight tool that uses a language model to convert x86 assembly code into ARM assembly code, making it easier to run existing x86 software on ARM-based systems.
What's the problem?
Many devices are shifting from the x86 architecture, long dominant in PCs, to the ARM architecture, which is valued for its energy efficiency. This transition is difficult because a large body of existing software was built for x86. Programs compiled for x86 cannot run natively on ARM hardware, so they must be recompiled, emulated, or translated before they work on the new devices.
What's the solution?
The authors introduce CRT, a lightweight tool that uses a language model to automatically translate x86 assembly code (the low-level instructions that a processor executes) into ARM assembly code. The translation has to bridge the gap between x86's CISC design and ARM's RISC design while preserving what the original program does. On the authors' test suite, CRT reaches 79.25% translation accuracy from x86 to ARMv5 and 88.68% from x86 to RISC-V. On Apple M2 hardware, the translated code runs 1.73× faster than the same programs under Apple's Rosetta 2, with 2.41× better memory efficiency and 1.47× better energy consumption.
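To make the CISC/RISC gap concrete, here is a minimal sketch of one well-known mismatch: x86 can add a register directly to a value in memory with a single instruction, while 32-bit ARM has to load the value, add to it, and store it back. This is an illustration only and is not how CRT works (CRT uses a language model, not hand-written rules). The function name `expand_mem_add`, the register mapping, and the choice of `r3` as a scratch register are all assumptions made for this example, and the sketch ignores the fact that the x86 `add` also updates the condition flags.

```python
import re

# Hypothetical register mapping for this example only (not taken from the paper).
REG_MAP = {"eax": "r0", "ebx": "r1", "ecx": "r2"}
SCRATCH = "r3"  # assumed to be free for use as a temporary

# Matches Intel-syntax instructions of the form: add dword ptr [base], src
_MEM_ADD = re.compile(r"add\s+dword\s+ptr\s+\[(\w+)\],\s*(\w+)", re.IGNORECASE)


def expand_mem_add(x86_line: str) -> list[str]:
    """Expand one x86 memory-destination add into an ARM load/add/store sequence."""
    match = _MEM_ADD.fullmatch(x86_line.strip())
    if match is None:
        raise ValueError(f"unsupported instruction: {x86_line!r}")
    base = REG_MAP[match.group(1).lower()]
    src = REG_MAP[match.group(2).lower()]
    return [
        f"ldr {SCRATCH}, [{base}]",          # load the memory operand
        f"add {SCRATCH}, {SCRATCH}, {src}",  # do the arithmetic in registers
        f"str {SCRATCH}, [{base}]",          # write the result back to memory
    ]


if __name__ == "__main__":
    for line in expand_mem_add("add dword ptr [ebx], eax"):
        print(line)
```

A single CISC instruction becomes three RISC instructions here, and a real translator must also track flags, free registers, and calling conventions, which is part of why a hand-written rule table does not scale well.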
Why does it matter?
This research is important because it facilitates the transition from x86 to ARM architecture, which is becoming increasingly common in various devices. By making it easier to run older software on newer systems, CRT can help users take advantage of the benefits of ARM technology without losing access to their existing applications.
Abstract
The transition from x86 to ARM architecture is becoming increasingly common across various domains, primarily driven by ARM's energy efficiency and improved performance across traditional sectors. However, this ISA shift poses significant challenges, mainly due to the extensive legacy ecosystem of x86 software and lack of portability across proprietary ecosystems and software stacks. This paper introduces CRT, a lightweight LLM-based transpiler that automatically converts x86 assembly to ARM assembly. Our approach bridges the fundamental architectural gap between x86's CISC-based and ARM's RISC-based computing paradigms while preserving program semantics and optimizing performance. We evaluate CRT on diverse real-world applications, achieving 79.25% translation accuracy from x86 to ARMv5 on our comprehensive test suite, and an 88.68% accuracy from x86 to RISC-V. In practical deployments on Apple M2 hardware (ARMv8), our transpiled code achieves 1.73× speedup compared to Apple's Rosetta 2 virtualization engine, while delivering 2.41× memory efficiency and 1.47× better energy consumption. Through testing and analysis, we show that CRT successfully navigates the CISC/RISC divide and generates correctly executable RISC code despite machine "language" barriers. We release our code, models, training datasets, and benchmarks at: https://ahmedheakl.github.io/asm2asm/.