Generating a Low-code Complete Workflow via Task Decomposition and RAG
Orlando Marquez Ayala, Patrice Béchard
2024-12-04

Summary
This paper formalizes two techniques, Task Decomposition and Retrieval-Augmented Generation (RAG), as design patterns for GenAI-based systems, and describes the authors' industry experience applying them to build a workflow generation application for enterprise users.
What's the problem?
AI technologies are moving quickly from research to production, and systems built on Foundation Models (FMs) that generate text, images, and video are growing more complex. Compared to traditional AI-based software, these GenAI-based systems are harder to design because of their scale and versatility. This creates a need for documented best practices, known in software engineering as design patterns, that can be reused across GenAI applications.
What's the solution?
The authors formalize Task Decomposition and RAG as design patterns, discuss their trade-offs in terms of software quality attributes, and comment on alternative approaches. They then describe how they applied both patterns to Workflow Generation, a task that takes a user requirement as input and produces a specific plan using data from the system environment. They also explain how the two patterns affected dataset creation, model training, model evaluation, and deployment.
Why it matters?
The paper encourages AI practitioners to evaluate these techniques not only from a scientific perspective but also in terms of engineering properties such as flexibility, maintainability, safety, and security. Because the two patterns affect the entire AI development cycle, the experience reported here may help teams building similar GenAI applications for enterprise users.
Abstract
AI technologies are moving rapidly from research to production. With the popularity of Foundation Models (FMs) that generate text, images, and video, AI-based systems are increasing in complexity. Compared to traditional AI-based software, systems employing FMs, or GenAI-based systems, are more difficult to design due to their scale and versatility. This makes it necessary to document best practices, known as design patterns in software engineering, that can be used across GenAI applications. Our first contribution is to formalize two techniques, Task Decomposition and Retrieval-Augmented Generation (RAG), as design patterns for GenAI-based systems. We discuss their trade-offs in terms of software quality attributes and comment on alternative approaches. We recommend that AI practitioners consider these techniques not only from a scientific perspective but also from the standpoint of desired engineering properties such as flexibility, maintainability, safety, and security. As a second contribution, we describe our industry experience applying Task Decomposition and RAG to build a complex real-world GenAI application for enterprise users: Workflow Generation. The task of generating workflows entails generating a specific plan using data from the system environment, taking as input a user requirement. As these two patterns affect the entire AI development cycle, we explain how they impacted the dataset creation, model training, model evaluation, and deployment phases.
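
To make the two patterns concrete, the following is a minimal sketch of how Task Decomposition and RAG might combine for workflow generation. It is not the authors' implementation: the step catalog, the function names, and the token-overlap retriever are all hypothetical stand-ins, and the outline generator is a simple text splitter where a real system would presumably call a model.

```python
import re

# Hypothetical catalog of steps available in the system environment.
# In a real deployment this data would come from the platform, not a literal.
STEP_CATALOG = {
    "create_record": "create a new record in a table",
    "send_email": "send an email notification to a user",
    "ask_approval": "ask a manager for approval",
    "update_record": "update fields of an existing record",
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, catalog, k=1):
    """RAG stand-in: rank catalog entries by token overlap with the query."""
    q = tokenize(query)
    ranked = sorted(
        catalog,
        key=lambda name: len(q & tokenize(catalog[name])),
        reverse=True,
    )
    return ranked[:k]


def generate_outline(requirement):
    """Sub-task 1: turn the requirement into step descriptions.

    Assumption: a model would do this; here we split on commas and 'then'.
    """
    parts = re.split(r",|\bthen\b", requirement)
    return [p.strip() for p in parts if p.strip()]


def populate_steps(outline, catalog):
    """Sub-task 2: ground each outline step in a retrieved catalog entry."""
    return [
        {"description": desc, "step": retrieve(desc, catalog)[0]}
        for desc in outline
    ]


def generate_workflow(requirement, catalog=STEP_CATALOG):
    """Chain the two sub-tasks into one workflow-generation call."""
    return populate_steps(generate_outline(requirement), catalog)


workflow = generate_workflow(
    "Create a record for the request, then ask the manager for approval, "
    "then send an email to the user"
)
for item in workflow:
    print(item["step"], "<-", item["description"])
```

Splitting generation into an outline stage and a grounding stage means each stage can be tested and replaced on its own, and restricting step names to retrieved catalog entries keeps the output tied to what exists in the environment. These are the kinds of engineering properties the abstract points to, though the paper itself should be consulted for the actual trade-offs.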