Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction

Sergio Burdisso, Srikanth Madikeri, Petr Motlicek

2024-10-29

Summary

This paper introduces Dialog2Flow (D2F), a new method for automatically extracting structured workflows from conversations without needing pre-labeled data.

What's the problem?

Extracting useful workflows from conversations can be very challenging, especially when the dialogues are not pre-annotated. Traditional methods often require a lot of manual work to create structured workflows, which is time-consuming and inefficient. This makes it hard to quickly adapt workflows to new tasks or domains.

What's the solution?

The authors developed D2F embeddings that group conversation sentences by their communicative function (the action they perform) rather than by surface meaning alone, allowing dialogues to be organized into action-related regions of a latent space. To train the model, they built a large dataset by unifying twenty task-oriented dialogue datasets with normalized per-turn action annotations. D2F is trained with a novel soft contrastive loss that uses the semantic similarity between action labels to guide representation learning, which outperforms the standard supervised contrastive loss. By clustering the resulting embeddings, D2F converts each dialogue into a sequence of action IDs, making it straightforward to extract the underlying workflow.
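The clustering step can be sketched in a few lines: quantize the embedding space (here with a toy k-means on made-up vectors; in the paper, the embeddings come from the pre-trained D2F encoder), rewrite each dialogue as a sequence of cluster/action IDs, and count transitions between IDs to recover a workflow graph. This is an illustrative sketch, not the authors' code; the function names and the farthest-point initialization are assumptions.

```python
import numpy as np
from collections import Counter

def kmeans(X, k, iters=50):
    # Minimal k-means with deterministic farthest-point initialization;
    # returns a cluster ("action") id for every row of X.
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None] - centroids[None], axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

def dialogs_to_action_sequences(dialog_embeddings, k):
    # Quantize the shared embedding space, then rewrite each dialogue
    # (a list of per-utterance embedding vectors) as action IDs.
    flat = np.vstack([e for d in dialog_embeddings for e in d])
    labels = kmeans(flat, k)
    seqs, i = [], 0
    for d in dialog_embeddings:
        seqs.append(labels[i:i + len(d)].tolist())
        i += len(d)
    return seqs

def transition_counts(seqs):
    # Workflow edges: how often action a is immediately followed by b.
    return Counter((s[i], s[i + 1]) for s in seqs for i in range(len(s) - 1))
```

The transition counts form a weighted directed graph over action IDs, which is one simple way to read off the extracted workflow.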

Why it matters?

This research is important because it simplifies the process of creating structured workflows from conversations, which can be applied in various fields like customer service, education, and automation. By enabling faster and more efficient workflow extraction, D2F can help organizations adapt to new tasks more quickly and improve their operational efficiency.

Abstract

Efficiently deriving structured workflows from unannotated dialogs remains an underexplored and formidable challenge in computational linguistics. Automating this process could significantly accelerate the manual design of workflows in new domains and enable the grounding of large language models in domain-specific flowcharts, enhancing transparency and controllability. In this paper, we introduce Dialog2Flow (D2F) embeddings, which differ from conventional sentence embeddings by mapping utterances to a latent space where they are grouped according to their communicative and informative functions (i.e., the actions they represent). D2F allows for modeling dialogs as continuous trajectories in a latent space with distinct action-related regions. By clustering D2F embeddings, the latent space is quantized, and dialogs can be converted into sequences of region/action IDs, facilitating the extraction of the underlying workflow. To pre-train D2F, we build a comprehensive dataset by unifying twenty task-oriented dialog datasets with normalized per-turn action annotations. We also introduce a novel soft contrastive loss that leverages the semantic information of these actions to guide the representation learning process, showing superior performance compared to standard supervised contrastive loss. Evaluation against various sentence embeddings, including dialog-specific ones, demonstrates that D2F yields superior qualitative and quantitative results across diverse domains.
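One way to read "a soft contrastive loss that leverages the semantic information of these actions" is as a cross-entropy between two distributions over the other utterances in a batch: a target distribution derived from action-label similarity (so semantically close actions count as partial positives, not hard negatives) and the model's distribution derived from utterance-embedding similarity. The sketch below is an illustrative numpy version under that assumption; the temperatures and the exact weighting are not the paper's formula.

```python
import numpy as np

def soft_contrastive_loss(z, label_emb, tau=0.1, tau_label=0.5):
    # z:         (n, d) L2-normalized utterance embeddings
    # label_emb: (n, m) L2-normalized embeddings of each utterance's action label
    n = len(z)
    mask = ~np.eye(n, dtype=bool)  # exclude self-pairs

    # Target weights: softmax over label-embedding similarities, so
    # utterances with semantically close actions get partial positive weight.
    w = np.where(mask, label_emb @ label_emb.T / tau_label, -np.inf)
    w = np.exp(w - w.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)

    # Predicted log-probabilities from utterance-embedding similarities
    # (numerically stable log-softmax over the non-self pairs).
    s = np.where(mask, z @ z.T / tau, -np.inf)
    m = s.max(axis=1, keepdims=True)
    logp = s - (m + np.log(np.exp(s - m).sum(axis=1, keepdims=True)))

    # Cross-entropy between target and predicted distributions, per anchor.
    return -(w * np.where(mask, logp, 0.0)).sum(axis=1).mean()
```

With identical labels this reduces to an ordinary supervised contrastive objective; the softening only matters when different action labels are semantically related.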