Speculative Ad-hoc Querying

Haoyu Li, Srikanth Kandula, Maria Angels de Luis Balaguer, Aditya Akella, Venkat Arun

2025-03-04

Summary

This paper introduces SpeQL, a system that predicts the database query a user is typing and starts processing it before the user finishes, so results appear almost instantly.

What's the problem?

Running database queries on large datasets can be slow, which hurts most when users need quick responses to explore data. Traditional systems wait until a query is fully written before they start processing it, so the time the user spends typing is wasted and exploratory tasks slow down.

What's the solution?

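The researchers created SpeQL, which uses Large Language Models (LLMs) to predict what the user is likely to type, based on the database schema, the user's past queries, and the incomplete query. Because exact prediction is infeasible, it prepares results early in two ways: it predicts the query structure so the query can be compiled and planned in advance, and it precomputes temporary tables that are much smaller than the original database but are predicted to contain all the data the final query needs. SpeQL also displays results for speculated queries and subqueries in real time while the user is still typing.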
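The temporary-table idea can be illustrated with a minimal sketch. This is not SpeQL's implementation: it uses SQLite as a stand-in engine, replaces the LLM prediction with a hard-coded guess, and every name in it (`orders`, `spec_orders`, `precompute_superset`, and so on) is hypothetical. It assumes the user has typed a filter on `region` and is still typing a second predicate on `amount`, so the sketch materializes only the rows matching the part already typed and later answers the finished query from that smaller table.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    rows = [(i, "EU" if i % 2 == 0 else "US", float(i)) for i in range(1000)]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def precompute_superset(conn):
    """Materialize a temp table for the partial query (assumed speculation).

    Assumed partial input: ... FROM orders WHERE region = 'EU' AND amount >
    The unfinished predicate is dropped, so the temp table is a superset
    of whatever rows the final query will need.
    """
    conn.execute("DROP TABLE IF EXISTS temp.spec_orders")
    conn.execute(
        "CREATE TEMP TABLE spec_orders AS "
        "SELECT * FROM orders WHERE region = 'EU'"
    )


def answer_final(conn, min_amount):
    """Answer the finished query from the smaller speculative table."""
    cur = conn.execute(
        "SELECT COUNT(*), SUM(amount) FROM spec_orders "
        "WHERE region = 'EU' AND amount > ?",
        (min_amount,),
    )
    return cur.fetchone()


def answer_baseline(conn, min_amount):
    """Answer the same query directly against the full table."""
    cur = conn.execute(
        "SELECT COUNT(*), SUM(amount) FROM orders "
        "WHERE region = 'EU' AND amount > ?",
        (min_amount,),
    )
    return cur.fetchone()


if __name__ == "__main__":
    conn = build_demo_db()
    precompute_superset(conn)  # runs while the user is still typing
    print(answer_final(conn, 500.0) == answer_baseline(conn, 500.0))
```

The sketch returns the correct answer only because the temporary table is a superset of the rows the final query touches. If the user goes on to change the `region` filter, the speculation is wasted and the query has to fall back to the base table.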

Why does it matter?

This matters because it makes working with large datasets much faster. In the paper's user study, SpeQL cut query latency by up to 289×, and participants reported that seeing speculative results helped them discover patterns in the data more quickly. This could be especially useful in fields like business analytics, scientific research, or any area that relies on fast data exploration.

Abstract

Analyzing large datasets requires responsive query execution, but executing SQL queries on massive datasets can be slow. This paper explores whether query execution can begin even before the user has finished typing, allowing results to appear almost instantly. We propose SpeQL, a system that leverages Large Language Models (LLMs) to predict likely queries based on the database schema, the user's past queries, and their incomplete query. Since exact query prediction is infeasible, SpeQL speculates on partial queries in two ways: 1) it predicts the query structure to compile and plan queries in advance, and 2) it precomputes temporary tables that are much smaller than the original database, but are still predicted to contain all information necessary to answer the user's final query. Additionally, SpeQL continuously displays results for speculated queries and subqueries in real time, aiding exploratory analysis. A utility/user study showed that SpeQL improved task completion time, and participants reported that its speculative display of results helped them discover patterns in the data more quickly. In the study, SpeQL improved users' query latency by up to 289× and kept the overhead reasonable, at $4 per hour.
